[sugar] python hash function limited to specific table size?

Dan Williams dcbw at redhat.com
Thu Mar 15 22:29:28 EDT 2007


On Thu, 2007-03-15 at 20:11 +0100, Bert Freudenberg wrote:
> On Mar 15, 2007, at 19:24 , Mike C. Fletcher wrote:
> 
> > Dan Williams wrote:
> >> Hmm; right.  I was thinking that would reduce the spread/ 
> >> effectiveness
> >> of the hash algorithm, rather than allowing the algorithm to  
> >> understand
> >> the size required and possibly incorporate that in a better  
> >> manner.  I
> >> appear to be overthinking.  Bad Dan.
> >>
> > You already got here, but just to amplify your conclusion:
> >
> >    Python's dictionaries are extremely well optimised for most
> >    reasonable hash table sizes (i.e. below millions, and they're even
> >    reasonable at those sizes in my experience).  Python uses
> >    dictionaries *everywhere* internally, so they are given a lot of
> >    attention by python-dev.  Trying to code around them, e.g. by
> >    computing hashes yourself, is almost always going to slow you down
> >    compared to the highly optimised C code implementing the  
> > dictionaries.
> >
> >    That is, if you have strings and want to index by them, just use  
> > the
> >    strings as keys and move onto the next task.  That isn't to say the
> >    dictionary implementation is perfect, but it's pretty hard to beat
> >    *in Python* without some rare and significant application-specific
> >    restrictions.
> >
> >    The same largely holds true for lists.  There are cases where you
> >    might want to use a different implementation, but most of the time
> >    you want to assume "use a list" and only replace with a different
> >    implementation if you discover a problem with the list-based
> >    implementation.
> 
> Well, what I read between Dan's lines was that he does not actually  
> have a dictionary, but wants to select one of 500 values based on a  
> string hash. Another use would be, for example, to make a seed for a  
> random generator that then would make a repeatable sequence, perhaps  
> to assign a color for a nickname.

Almost exactly right.  We have 520 color pairs that look good in both
B&W and Color modes.  And we have wireless access points with an SSID
and capabilities which never change from boot to boot.

So, I want to assign the _same_ color pair to the hash of (SSID+caps)
every single time I see that wireless network.  hash(ssid+str(caps))
works, except that's within the range [-sys.maxint, sys.maxint], which
is waaay out of the range of [0, 519] which I can use to index into the
color table.  As Bert suggested, hash(ssid+str(caps)) % len(color_table)
works more or less.

Dan




More information about the Sugar mailing list