[sugar] python hash function limited to specific table size?

Bert Freudenberg bert at freudenbergs.de
Thu Mar 15 15:11:52 EDT 2007


On Mar 15, 2007, at 19:24 , Mike C. Fletcher wrote:

> Dan Williams wrote:
>> Hmm; right.  I was thinking that would reduce the spread/ 
>> effectiveness
>> of the hash algorithm, rather than allowing the algorithm to  
>> understand
>> the size required and possibly incorporate that in a better  
>> manner.  I
>> appear to be overthinking.  Bad Dan.
>>
> You already got here, but just to amplify your conclusion:
>
>    Python's dictionaries are extremely well optimised for most
>    reasonable hash table sizes (i.e. below millions, and they're even
>    reasonable at those sizes in my experience).  Python uses
>    dictionaries *everywhere* internally, so they are given a lot of
>    attention by python-dev.  Trying to code around them, e.g. by
>    computing hashes yourself, is almost always going to slow you down
>    compared to the highly optimised C code implementing the  
> dictionaries.
>
>    That is, if you have strings and want to index by them, just use  
> the
>    strings as keys and move onto the next task.  That isn't to say the
>    dictionary implementation is perfect, but it's pretty hard to beat
>    *in Python* without some rare and significant application-specific
>    restrictions.
>
>    The same largely holds true for lists.  There are cases where you
>    might want to use a different implementation, but most of the time
>    you want to assume "use a list" and only replace with a different
>    implementation if you discover a problem with the list-based
>    implementation.

Well, what I read between Dan's lines was that he does not actually  
have a dictionary, but wants to select one of 500 values based on a  
string hash. Another use would be, for example, to make a seed for a  
random generator that then would make a repeatable sequence, perhaps  
to assign a color for a nickname.

- Bert -




More information about the Sugar mailing list