Unicode in filenames

C. Scott Ananian cscott at laptop.org
Thu Nov 15 23:51:46 EST 2007


On Nov 15, 2007 2:43 PM, Albert Cahalan <acahalan at gmail.com> wrote:
> C. Scott Ananian writes:
> The accepted standard is to use precomposed glyphs. This is
> compatible with the Linux kernel, with Windows, and with many
> other things that you aren't about to change.

The Linux kernel doesn't care one way or another.  Filenames are just
byte strings.
Windows doesn't play well with Unicode, period -- case insensitivity
causes a real mess -- and is pretty irrelevant anyway.

There is no 'accepted standard'.  The W3C recommends the use of NFC
for information exchange; that's as close to a specific recommendation
as I could find.  NFC is "form NFD, then do canonical composition", so
there's an efficiency argument to be made in favor of using NFD.
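
To make the NFC/NFD distinction concrete, here is a minimal sketch using
Python's standard unicodedata module.  The filename "café.txt" is only an
illustrative assumption, not a file from the system under discussion.  It
shows that the two forms are different code point sequences (and so
different byte strings to the kernel), and that composing the NFD form
gives back the NFC form.

```python
import unicodedata

# Hypothetical filename, written with a precomposed e-acute (U+00E9).
name = "caf\u00e9.txt"

# NFD: canonical decomposition into "e" + COMBINING ACUTE ACCENT (U+0301).
nfd = unicodedata.normalize("NFD", name)

# NFC: canonical decomposition followed by canonical composition.
nfc = unicodedata.normalize("NFC", nfd)

# The two forms render identically but differ as code point sequences...
print(len(name), len(nfd))
print(name == nfd)

# ...and therefore as the UTF-8 byte strings a filesystem would store.
print(name.encode("utf-8"))
print(nfd.encode("utf-8"))

# Composing the decomposed form recovers the precomposed one.
print(nfc == name)
```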

> Normalization form C or KC would be far better, but I still don't
> think this is something that should be enforced.

As I described in my original mail, not enforcing a standard -- at
least for core files and for things written by Sugar and activities --
will lead to madness: "identical" filenames/URLs which aren't found
when expected.
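
A small sketch of the failure mode described above, and of what enforcing
a single form would buy.  The helper canonical(), the stored name, and the
choice of NFC as the default are all assumptions made for illustration;
the thread does not settle on a form, and NFD would work the same way as
long as everyone uses it consistently.

```python
import unicodedata

def canonical(name: str, form: str = "NFC") -> bytes:
    """Hypothetical helper: normalize a name, then encode it as UTF-8."""
    return unicodedata.normalize(form, name).encode("utf-8")

# A name that some tool wrote in decomposed form (e + U+0301).
stored = {"re\u0301sume\u0301.txt".encode("utf-8")}

# The same name as typed by a user, in precomposed form (U+00E9).
query = "r\u00e9sum\u00e9.txt"

# Raw byte comparison: the "identical" name is not found.
raw_hit = query.encode("utf-8") in stored

# With one enforced form on both sides, the lookup succeeds.
stored_norm = {canonical(b.decode("utf-8")) for b in stored}
norm_hit = canonical(query) in stored_norm

print(raw_hit, norm_hit)
```

The point of the sketch is only that the normalization has to happen on
both the writing and the reading side; which form is picked matters less
than picking one.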

> Files in the base system should only use characters that can be found
> on **all** keyboards and in **all** fonts. Probably this means ASCII.

I believe SJ would beg to differ: the only files we currently have
which are non-ASCII are in the library.
 --scott

-- 
                         ( http://cscott.net/ )


