[PATCH] Maintain a metadata copy outside the index (was Re: Datastore & backup - request for help)

Tomeu Vizoso tomeu at tomeuvizoso.net
Wed May 21 14:14:47 EDT 2008


Hi,

On Wed, May 21, 2008 at 1:37 PM, Martin Langhoff
<martin.langhoff at gmail.com> wrote:
>
> Any idea if someone can lend a hand with the DS issues I mentioned in
> my opening post? To recap:
>
>  - Add a "dump all metadata to a file" mechanism in
> datastore/xapianindex.py that is fast. It could be one file per
> document, that wouldn't bother me in the least. As long as the
> resulting format is a JSON dump of a reasonable datastructure, I'm a
> happy camper.
>
>  - Sort out the story with pause()/unpause(). The functions in
> datastore.py are meant to "support backup", but I think they are
> broken. Reading through the implementation, they call stop() on the
> backends, which in the case of Xapian, means that the datastore is
> dead in the water while paused, and normal usage will fail.

The attached patch maintains a copy of each object's metadata outside
the Xapian index. How it works:

- at every create and update, a JSON file is created next to the
object's file (see the first sketch below),

- it's also deleted along with the object,

- at startup, if the file <datastore_path>/.metadata.exported doesn't
exist, check which objects still need their metadata exported
(about 0.8s for 3000 entries),

- in an idle callback, export each of those objects, one per iteration
(3ms per entry with simplejson, 2ms with cjson; see the second sketch
below).
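
To make the first two points concrete, here's a minimal sketch of the
sidecar write/delete. The helper names (_metadata_path and friends) are
placeholders of mine, not what the patch actually uses:

    import os
    import simplejson

    def _metadata_path(file_path):
        # the JSON copy lives next to the object's file
        return file_path + '.metadata'

    def export_metadata(file_path, metadata):
        # called on every create and update
        f = open(_metadata_path(file_path), 'w')
        try:
            simplejson.dump(metadata, f)
        finally:
            f.close()

    def delete_metadata(file_path):
        # called when the object itself is deleted
        path = _metadata_path(file_path)
        if os.path.exists(path):
            os.remove(path)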
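
And a sketch of the startup/idle part, using gobject.idle_add (assuming
the gobject main loop the datastore runs in); pending_uids(),
metadata_for() and file_path_for() are placeholders, not the real
datastore API:

    import os
    import gobject
    import simplejson

    FLAG_NAME = '.metadata.exported'

    def schedule_export(datastore_path, pending_uids, metadata_for,
                        file_path_for):
        flag = os.path.join(datastore_path, FLAG_NAME)
        if os.path.exists(flag):
            # a previous run already exported everything
            return
        uids = pending_uids()

        def export_one():
            if not uids:
                # done: write the flag so the scan is skipped at the
                # next startup
                open(flag, 'w').close()
                return False    # unregister the idle callback
            uid = uids.pop()
            f = open(file_path_for(uid) + '.metadata', 'w')
            try:
                simplejson.dump(metadata_for(uid), f)
            finally:
                f.close()
            return True         # one entry per iteration, come back

        gobject.idle_add(export_one)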

In my tests this has worked quite well, but I have one concern: can
something bad happen if we have 20k files in the same dir (for a
journal with 10k entries)?

One side effect of this is that when (if) we agree on a new on-disk
data structure for the DS, it will be easier to convert than if we had
to extract all the metadata from the index.

Regards,

Tomeu


