simple datastore replacement, take two

Tomeu Vizoso tomeu at tomeuvizoso.net
Tue Sep 23 16:25:14 EDT 2008


On Tue, Sep 23, 2008 at 8:58 PM, Benjamin M. Schwartz
<bmschwar at fas.harvard.edu> wrote:
> Tomeu Vizoso wrote:
> | 2.- all metadata properties are just strings.
>
> I think this is a good decision (especially since by strings you mean
> "byte arrays").  However, it's not quite true.  Your design actually has
> two classes of metadata properties: strings in "metadata" and files in
> "extra_metadata".

Well, that was supposed to be an implementation detail. Users of the
DS only know about properties, and they all look the same.

> I like this design, but I think we can make it both simpler and more
> powerful.  Consider, for example, using a single djb-style database.
>
> metadata/
>        author
>        difficulty
>        sessionkey
>        preview
>
> Each item is stored in a file whose name is the key, and whose contents
> are the value.

I would like that much better than the current approach, but I'm
worried about performance. Retrieving the metadata for an entry is
currently the limiting factor in query speed. Do you think this could
be faster than decoding a single file with a simple format (see link
below)?

http://dev.laptop.org/git?p=users/tomeu/datastore;a=blob;f=src/olpc/datastore/metadatastore.py;h=5b607be40f8c2c7b080efd02eb759af3cb477e61;hb=HEAD#l98

I was considering rewriting that function in C to reduce query time
by 20-30%, but if people think that storing each property in its own
file would make sense from a performance POV, I could give that a
quick try.
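
For reference, a minimal sketch of what the one-file-per-property
layout could look like in Python. This is not the current datastore
code; the function names write_properties() and read_properties() and
the key validation are assumptions made up for illustration. Values
are treated as plain bytestrings, as discussed above.

```python
import os


def write_properties(entry_dir, properties):
    """Store each property in its own file: the file name is the key,
    the file contents are the value (a bytestring)."""
    os.makedirs(entry_dir, exist_ok=True)
    for key, value in properties.items():
        # Keys become file names, so reject names that could escape
        # the entry directory. (Hypothetical rule, not from the DS.)
        if not key or "/" in key or key in (".", ".."):
            raise ValueError("invalid property name: %r" % (key,))
        with open(os.path.join(entry_dir, key), "wb") as f:
            f.write(value)


def read_properties(entry_dir):
    """Read every property of an entry back into a dict of bytestrings.
    This costs one open/read/close per property."""
    properties = {}
    for key in os.listdir(entry_dir):
        with open(os.path.join(entry_dir, key), "rb") as f:
            properties[key] = f.read()
    return properties
```

Timing read_properties() over a few thousand entries against the
single-file decoder would answer the performance question directly.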

> The Datastore can then provide two accessor functions:
> get_by_value(key) and get_by_reference(key).  get_by_value() returns the
> contents of the file as a bytestring in memory.  get_by_reference()
> returns the path to the metadata file, or another path linked (soft or
> hard) to that file.  This provides all the needed functionality for large
> and small metadata entries.

That sounds interesting. I guess that with a POSIX-like API
implemented with FUSE we would get equivalent functionality?
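
To make sure I understand the two accessors, here is a rough sketch
on top of a one-file-per-property directory. The entry_dir and
link_dir parameters are assumptions of mine (the proposal only takes
a key), and linking into a separate directory is just one possible
reading of "another path linked to that file".

```python
import os


def get_by_value(entry_dir, key):
    """Return the contents of the property file as a bytestring."""
    with open(os.path.join(entry_dir, key), "rb") as f:
        return f.read()


def get_by_reference(entry_dir, key, link_dir=None):
    """Return a path to the property file. If link_dir is given,
    return a link inside it instead of the file's own path."""
    path = os.path.join(entry_dir, key)
    if not os.path.isfile(path):
        raise KeyError(key)
    if link_dir is None:
        return path
    os.makedirs(link_dir, exist_ok=True)
    link_path = os.path.join(link_dir, key)
    # Replace a stale link left over from an earlier call.
    if os.path.lexists(link_path):
        os.remove(link_path)
    try:
        # Hard link: no copy, but only works within one file system.
        os.link(path, link_path)
    except OSError:
        # Fall back to a soft link pointing at the absolute path.
        os.symlink(os.path.abspath(path), link_path)
    return link_path
```

Small properties like author would go through get_by_value(), while
something like preview could be handed out as a path without ever
being loaded into memory.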

> If the API requires file-like and string-like metadata to be completely
> distinct, for example to create a dict for the string-like metadata, then
> we can achieve this by using two such databases:

No, no need for that.

> string_metadata/
>        author
>        difficulty
> file_metadata/
>        sessionkey
>        preview
>
> I hope that these designs may allow your datastore to become even simpler.

Much appreciated!

Thanks,

Tomeu


