Unicode in filenames
Dan Williams
dcbw at redhat.com
Thu Nov 15 14:15:57 EST 2007
On Thu, 2007-11-15 at 09:54 -0500, C. Scott Ananian wrote:
> Joyride-277 doesn't validate, because it contains a file from the
> library with a filename in non-normalized unicode. The file is named
> 'Annobo?n_Bioko-thumb.jpg', where the ? should be a separate combining
> accent on the o, but it is actually stored in the filename as a single
> combined 'o+accent' character.
>
> Now, at first blush this is a bug in the (fast) contents verifier,
> which I will fix: all strings should be unicode-normalized before they
> are compared. But it seems like this raises issues with (for example)
> URLs to library content. Should we enforce the constraint that all
> filenames are unicode-normalized on disk, so that we can guarantee
> that a (unicode-normalized) URL will always resolve correctly?
Everything on disk should be UTF-8. Anything that's not UTF-8 is not
guaranteed to work. Filenames need to be converted to UTF-8 before
the file is opened, created, renamed, etc.
Dan
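
A rough sketch of why UTF-8 alone may not settle the question raised in
the quoted text: the combined and the separated spelling of a name are
both valid UTF-8, yet they differ byte for byte. The filename below is
only illustrative and assumes the accent is an acute; same_name() is a
hypothetical helper, not the verifier's actual code.

```python
import unicodedata

# Illustrative name (assumption): "o with acute" as one precomposed code point.
composed = "Annob\u00f3n_Bioko-thumb.jpg"
# Same visible name, but a plain "o" followed by a combining acute accent.
decomposed = unicodedata.normalize("NFD", composed)

def same_name(a, b, form="NFD"):
    """Compare two names after normalizing both to the same form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

# Both spellings encode to valid UTF-8, but to different byte sequences.
composed_bytes = composed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")

print(composed == decomposed)           # raw comparison of the two strings
print(same_name(composed, decomposed))  # comparison after normalization
```

A byte-wise or raw string comparison treats these as two different
names; normalizing both sides first makes them compare equal.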
> Otherwise we run the risk of someone editing a file and resaving it
> with a name which *appears* identical, but is actually encoded
> differently on disk, and having URLs to the file mysteriously break.
>
> For the technically-minded, we're talking about using the UTF-8
> encoding of Unicode Normalization Form D, as discussed (briefly) at
> http://wiki.laptop.org/go/Canonical_JSON. The problem has arisen
> because the old libraries used normalized filenames, but we've
> switched to installing the libraries from RPMs, and apparently
> non-normalized filenames have snuck in. If I were to hazard a guess,
> I'd say that the tar command normalizes filenames as they are
> archived, while RPM does not.
>
> My proposal is to ensure that all filenames in the base system (at
> least) are in normalization form D. I will write a checker in the
> build process to ensure this, and we should probably eventually write
> checkers for the activity/library bundle tools that will do the same.
> --scott
>
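
A minimal sketch of the kind of checker proposed above, assuming Python 3
and a plain walk over a directory tree. The names is_nfd() and
non_nfd_paths() are hypothetical; how such a check would hook into the
build process is not specified in this thread.

```python
import os
import unicodedata

def is_nfd(name):
    """Return True if name is already in Unicode Normalization Form D."""
    return unicodedata.normalize("NFD", name) == name

def non_nfd_paths(root):
    """Yield every path under root whose own name is not in NFD."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Check directory names as well as file names.
        for name in dirnames + filenames:
            if not is_nfd(name):
                yield os.path.join(dirpath, name)

if __name__ == "__main__":
    import sys
    # Tree to check; defaults to the current directory (an assumption).
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    bad = list(non_nfd_paths(root))
    for path in bad:
        print("not NFD:", path)
    # A nonzero exit status lets a build script fail on offending names.
    sys.exit(1 if bad else 0)
```

Note that some filesystems normalize names on their own, so results can
differ depending on where the tree is unpacked.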