[Localization] Omitting format specifiers in plural form translations

Khaled Hosny khaledhosny at eglug.org
Thu Aug 7 17:02:12 EDT 2008


On Thu, Aug 07, 2008 at 03:23:12PM -0400, Alexander Dupuy wrote:
> However, in a string like the following "There are %d files in the %s  
> directory" having a plural form translation "There are a pair of files  
> in the %s directory" is likely to cause an application written in C to  
> crash.  Using the positional format "There are a pair of files in the  
> %2$s directory" might work in some cases, but I would not want to depend  
> on it, since the printf documentation says:

Yes, we have already encountered this in Arabic, and we haven't figured
out a good workaround yet (I usually put %Id in brackets after the plural,
like "There are a pair (%1$d) of files..."). I'll try the %.0s trick (it
did work with a simple printf on my system).


>> There may be no gaps in the numbers of arguments specified using '$';  
>> for example, if arguments 1 and 3 are specified, argument 2 must also  
>> be specified somewhere in the format string.
>
> I don't see any great solution for these sorts of strings; hopefully,  
> they are rare, and in the few cases where they occur, the trick that you  
> came up with for Python could be used, and might work on at least some  
> systems.

It isn't that rare in Arabic translations, actually; I think we have tens
of strings like this.

>> I think this is a bug in Python's gettext implementation, since this is
>> allowed in C.
>>   
>
> The issue here is not with gettext itself - either in Python or in C,  
> gettext does not interpret or replace the %d format specifier - in C,  
> the substitution of %d is done by a call to one of the printf functions;  
> in Python, the substitution is performed by the % string formatting  
> operator.

Yes, I just realized that.
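
To illustrate the point, here is a minimal Python sketch. It assumes the
standard library's gettext.NullTranslations fallback catalog (which simply
returns the msgid) and made-up example strings; a real application would
load a compiled .mo catalog instead.

```python
import gettext

# NullTranslations is the stdlib fallback catalog: with no translation
# loaded, ngettext() just picks between the two msgids based on n.
catalog = gettext.NullTranslations()

n = 2
template = catalog.ngettext(
    "There is %d file in the %s directory",
    "There are %d files in the %s directory",
    n,
)

# gettext only selected a string; the format specifiers are untouched.
assert "%d" in template and "%s" in template

# The substitution happens here, in the % string formatting operator.
message = template % (n, "tmp")
print(message)
```

So any error about a missing or extra specifier comes from the % operator
(or from printf in C), not from the gettext lookup.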

>>
>> I tried %.d, which I supposed would suppress printing the number, but
>> it made no difference; however, %.s does the trick. Now I'm wondering how
>> bad that is, since msgfmt -c gives "fatal errors" but Python hasn't
>> complained so far.
>>   
>
> Python is much more flexible than C when it comes to implicit type  
> conversion, so it's quite reasonable to use %.s to print a zero-width  
> representation of a number.  I would suggest using %.0s to make it more  
> explicit that this is what you are doing and that it is intentional.   

OK, I'm going to fix the translations to use %.0s instead.
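
For reference, a small Python sketch of what I mean. The strings are
English stand-ins for the real translations, and render() is a
hypothetical helper that only mimics how an application would apply the
% operator.

```python
def render(template, count, directory):
    """Apply the % operator the way an application would (hypothetical helper)."""
    return template % (count, directory)


with_count = "There are %d files in the %s directory"
# Dual form with the number specifier simply dropped.
dual_dropped = "There are a pair of files in the %s directory"
# Dual form with %.0s consuming the number as a zero-width string.
dual_padded = "There are a pair of files%.0s in the %s directory"

print(render(with_count, 5, "tmp"))
print(render(dual_padded, 2, "tmp"))

try:
    # %s swallows the count, leaving the directory argument unused.
    render(dual_dropped, 2, "tmp")
except TypeError as exc:
    print("dropped specifier fails:", exc)
```

The %.0s form converts the number with str() and then truncates it to zero
characters, so nothing is printed but the argument is still consumed. This
relies on Python's implicit conversion and says nothing about C's printf.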

> It's also probably not a great idea to use %.0s for localizing C  
> applications (although it works on my Fedora 7 system) since some  
> implementations of printf may cause an application crash when formatting  
> a numeric value as if it were a string (even if it is zero-width).

I see, I have to do more testing for this.

> These changes would allow %.0s to be used as a placeholder when omitting  
> format specifiers in plural form translations for Python applications,  
> without triggering undesired errors from the msgfmt and  
> translate-toolkit checking.

Right.

Thanks very much for your informative replies.

Regards,
 Khaled

>
> @alex
> -- 
> mailto:alex.dupuy at mac.com

-- 
 Khaled Hosny
 Arabic localizer and member of Arabeyes.org team