[sugar] python hash function limited to specific table size?
Dan Williams
dcbw at redhat.com
Thu Mar 15 22:29:28 EDT 2007
On Thu, 2007-03-15 at 20:11 +0100, Bert Freudenberg wrote:
> On Mar 15, 2007, at 19:24 , Mike C. Fletcher wrote:
>
> > Dan Williams wrote:
> >> Hmm; right. I was thinking that would reduce the spread/
> >> effectiveness
> >> of the hash algorithm, rather than allowing the algorithm to
> >> understand
> >> the size required and possibly incorporate that in a better
> >> manner. I
> >> appear to be overthinking. Bad Dan.
> >>
> > You already got here, but just to amplify your conclusion:
> >
> > Python's dictionaries are extremely well optimised for most
> > reasonable hash table sizes (i.e. below millions, and they're even
> > reasonable at those sizes in my experience). Python uses
> > dictionaries *everywhere* internally, so they are given a lot of
> > attention by python-dev. Trying to code around them, e.g. by
> > computing hashes yourself, is almost always going to slow you down
> > compared to the highly optimised C code implementing the
> > dictionaries.
> >
> > That is, if you have strings and want to index by them, just use
> > the
> > strings as keys and move onto the next task. That isn't to say the
> > dictionary implementation is perfect, but it's pretty hard to beat
> > *in Python* without some rare and significant application-specific
> > restrictions.
> >
> > The same largely holds true for lists. There are cases where you
> > might want to use a different implementation, but most of the time
> > you want to assume "use a list" and only replace with a different
> > implementation if you discover a problem with the list-based
> > implementation.
>
> Well, what I read between Dan's lines was that he does not actually
> have a dictionary, but wants to select one of 500 values based on a
> string hash. Another use would be, for example, to make a seed for a
> random generator that then would make a repeatable sequence, perhaps
> to assign a color for a nickname.
Almost exactly right. We have 520 color pairs that look good in both
B&W and Color modes. And we have wireless access points with an SSID
and capabilities which never change from boot to boot.
So, I want to assign the _same_ color pair to the hash of (SSID+caps)
every single time I see that wireless network. hash(ssid+str(caps))
works, except that's within the range [-sys.maxint, sys.maxint], which
is waaay out of the range of [0, 519] which I can use to index into the
color table. As Bert suggested, hash(ssid+str(caps)) % len(color_table)
works more or less.
Dan
More information about the Sugar
mailing list