Unicode in filenames
C. Scott Ananian
cscott at laptop.org
Thu Nov 15 23:51:46 EST 2007
On Nov 15, 2007 2:43 PM, Albert Cahalan <acahalan at gmail.com> wrote:
> C. Scott Ananian writes:
> The accepted standard is to use precomposed glyphs. This is
> compatible with the Linux kernel, with Windows, and with many
> other things that you aren't about to change.
The Linux kernel doesn't care one way or another. Filenames are just
byte strings.
Windows doesn't play well with Unicode, period -- case insensitivity
causes a real mess -- and is pretty irrelevant anyway.
There is no 'accepted standard'. The W3C recommends the use of NFC
for information exchange; that's as close to a specific recommendation
as I could find. NFC is "form NFD, then do canonical composition", so
there's an efficiency argument to be made in favor of using NFD.
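
To make the NFC/NFD relationship concrete, here is a minimal sketch using
Python's standard unicodedata module. The sample character (e with an acute
accent) and the variable names are my own choices for illustration, not
anything taken from sugar.

```python
import unicodedata

# Illustrative sample: "é" in its two canonically equivalent spellings.
precomposed = "\u00e9"   # single code point, LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# NFD splits the precomposed character into base + combining mark.
nfd = unicodedata.normalize("NFD", precomposed)

# NFC composes the decomposed sequence back into one code point.
nfc = unicodedata.normalize("NFC", decomposed)

print(nfd == decomposed)    # True
print(nfc == precomposed)   # True
print(len(nfc), len(nfd))   # 1 2

# Composing the NFD form yields the same result as NFC of the original.
print(unicodedata.normalize("NFC", nfd) == nfc)   # True
```

Whichever form is picked, both spellings normalize to the same string; the
forms differ only in which spelling they settle on.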
> Normalization form C or KC would be far better, but I still don't
> think this is something that should be enforced.
As I described in my original mail, not enforcing a standard -- at
least for core files and for things written by sugar and activities --
will lead to madness: "identical" filenames/urls which aren't found
when expected.
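
A rough sketch of that failure mode, again in Python. The filename, the
normalize_name helper, and the choice of NFD as the enforced form are all
hypothetical; a set of names stands in for a directory listing so the example
doesn't depend on any particular filesystem's behavior.

```python
import unicodedata


def normalize_name(name: str, form: str = "NFD") -> str:
    """Hypothetical helper: bring a filename to one agreed normalization form."""
    return unicodedata.normalize(form, name)


# Two spellings of the same visible name (illustrative only).
stored = "caf\u00e9.txt"       # written with a precomposed "é"
requested = "cafe\u0301.txt"   # typed as "e" + combining acute accent

# The kernel compares byte strings, and these byte strings differ.
bytes_differ = stored.encode("utf-8") != requested.encode("utf-8")

# A set stands in for a directory listing.
listing = {stored}
found_raw = requested in listing

# If both the writer and the reader normalize, the lookup succeeds.
normalized_listing = {normalize_name(n) for n in listing}
found_normalized = normalize_name(requested) in normalized_listing

print(bytes_differ)       # True
print(found_raw)          # False
print(found_normalized)   # True
```

The point is only that the lookup works once everybody agrees on a form; it
doesn't matter much for this purpose whether that form is NFC or NFD.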
> Files in the base system should only use characters that can be found
> on **all** keyboards and in **all** fonts. Probably this means ASCII.
I believe SJ would beg to differ: the only files we currently have
which are non-ASCII are in the library.
--scott
--
( http://cscott.net/ )