Unicode in filenames

Albert Cahalan acahalan at gmail.com
Thu Nov 15 14:43:12 EST 2007


C. Scott Ananian writes:

> Joyride-277 doesn't validate, because it contains a file from the
> library with a filename in non-normalized unicode.  The file is named
> 'Annobo?n_Bioko-thumb.jpg', where the ? should be a separated accent
> on the o, but it is actually stored on the filename with a combined
> 'o+accent' glyph.

The accepted standard is to use precomposed characters. This is
compatible with the Linux kernel, with Windows, and with many
other things that you aren't about to change.

> My proposal is to ensure that all filenames in the base system (at
> least) are in normalization form D. I will write a checker in the
> build process to ensure this, and we should probably eventually write
> checkers for the activity/library bundle tools that will do the same.

Normalization form C or KC would be far better, but I still don't
think this is something that should be enforced.
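
To make the difference between the forms concrete, here is a minimal
sketch using Python's standard unicodedata module. The filename is an
assumption: it supposes the accented letter is an o with an acute
accent (U+00F3), which the garbled quote above does not confirm. The
helper is_normalized() is a hypothetical name, not part of any build
tool.

```python
import unicodedata

# Assumed filename: precomposed o-acute (U+00F3), i.e. already in form C.
name_nfc = "Annob\u00f3n_Bioko-thumb.jpg"

# Form D splits the letter into a plain 'o' plus U+0301 (combining acute).
name_nfd = unicodedata.normalize("NFD", name_nfc)


def is_normalized(name, form="NFC"):
    """Return True if name is unchanged by normalizing to the given form."""
    return unicodedata.normalize(form, name) == name


if __name__ == "__main__":
    print(len(name_nfc), len(name_nfd))
    print(is_normalized(name_nfc, "NFC"), is_normalized(name_nfc, "NFD"))
```

The two strings look the same when displayed but differ as byte
sequences, so a filesystem that compares names byte for byte treats
them as two different files.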

Files in the base system should only use characters that can be found
on **all** keyboards and in **all** fonts. Probably this means ASCII.
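
If an ASCII-only rule were adopted for the base system, a check could
look roughly like the sketch below. This is only an illustration: the
function names is_ascii_name() and find_non_ascii_names() are
hypothetical, and nothing here reflects how the actual build process
works.

```python
import os


def is_ascii_name(name):
    """Return True if every character in name is plain ASCII."""
    try:
        name.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


def find_non_ascii_names(root):
    """Walk root and return the paths whose final component is not ASCII."""
    offenders = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if not is_ascii_name(name):
                offenders.append(os.path.join(dirpath, name))
    return sorted(offenders)


if __name__ == "__main__":
    # Hypothetical usage: report offending names under the current directory.
    for path in find_non_ascii_names("."):
        print(repr(path))
```

An empty result would mean every file and directory name under the
given root passes the ASCII-only rule.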


