[PATCH] Maintain a metadata copy outside the index (was Re: Datastore & backup - request for help)
tomeu at tomeuvizoso.net
Wed May 21 14:14:47 EDT 2008
On Wed, May 21, 2008 at 1:37 PM, Martin Langhoff
<martin.langhoff at gmail.com> wrote:
> Any idea if someone can lend a hand with the DS issues I mentioned in
> my opening post? To recap:
> - Add a "dump all metadata to a file" mechanism in
> datastore/xapianindex.py that is fast. It could be one file per
> document, that wouldn't bother me in the least. As long as the
> resulting format is a JSON dump of a reasonable datastructure, I'm a
> happy camper.
> - Sort out the story with pause()/unpause(). The functions in
> datastore.py are meant to "support backup", but I think they are
> broken. Reading through the implementation, they call stop() on the
> backends, which in the case of Xapian, means that the datastore is
> dead in the water while paused, and normal usage will fail.
The attached patch maintains a copy of each object's metadata outside
the Xapian index. How it works:
- on every create and update, a JSON file is written next to the object's file,
- the JSON file is deleted along with the object,
- at startup, if the file <datastore_path>/.metadata.exported doesn't
exist, check how many objects need their metadata exported
(0.8s for 3000 entries),
- in an idle callback, process those objects one per iteration
(3ms per entry with simplejson, 2ms with cjson).
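The steps above can be sketched roughly as follows. This is not the
patch itself, just an illustration of the mechanism: the function names,
the ".metadata" suffix, and the pending list are made up for the
example, and the idle callback is shown as a plain function that
returns True while work remains (the return convention GLib idle
sources use), so it runs without a main loop.

```python
import json
import os

def write_metadata_sidecar(object_path, metadata):
    # On create/update: dump the metadata as JSON next to the object's file.
    sidecar = object_path + '.metadata'
    with open(sidecar, 'w') as f:
        json.dump(metadata, f)
    return sidecar

def delete_metadata_sidecar(object_path):
    # On delete: remove the sidecar along with the object.
    sidecar = object_path + '.metadata'
    if os.path.exists(sidecar):
        os.remove(sidecar)

def export_pending_metadata(datastore_path, pending):
    # Idle-callback body: export one entry per call; return True while
    # more work remains, False when done (so an idle source would stop).
    if not pending:
        # Everything exported: create the marker file so the next
        # startup skips the scan.
        marker = os.path.join(datastore_path, '.metadata.exported')
        open(marker, 'w').close()
        return False
    uid, object_path, metadata = pending.pop()
    write_metadata_sidecar(object_path, metadata)
    return True
```

In the real datastore the per-iteration function would be registered
with the GObject main loop (e.g. an idle callback), which is what keeps
the export from blocking normal datastore usage.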
In my tests this has worked quite well, but I have one concern: can
something bad happen if we have 20k files in the same directory (for a
journal with 10k entries)?
One side effect of this is that when (if) we agree on a new on-disk
data structure for the DS, it will be easier to convert than if we had
to extract all the metadata from the index.