[PATCH] Maintain a metadata copy outside the index (was Re: Datastore & backup - request for help)

Thu May 22 04:47:31 EDT 2008

On Thu, May 22, 2008 at 1:13 AM, Martin Langhoff
<martin.langhoff at gmail.com> wrote:
> On Thu, May 22, 2008 at 6:14 AM, Tomeu Vizoso <tomeu at tomeuvizoso.net> wrote:
>> the patch attached maintains a copy of the metadata of each object
>> outside the xapian index. How it works:
>
> Fantastic. Except that... erm... arhm... you forgot the patch ;-)

Ouch.

>> - at every create and update, a json file is created next to the object's file,
>>
>> - it's also deleted along with the object,
>>
>> - at startup, if the file <datastore_path>/.metadata.exported doesn't
>> exist, check how many objects need to get their metadata exported
>> (0.8s for 3000 entries)
>
> That's pretty good.
>
>> - in an idle callback, process each of those objects one per iteration
>> (3ms per entry with simplejson, 2ms with cjson).
>
> Exporting a few 100 per iteration probably is more efficient ;-)

Yes, but does it need to be? We are balancing the speed at which
metadata will get exported _the first time after the update to 8.2_
against the usability of Sugar during that limited period of time.

Anyway, feel free to try and play with the number of entries exported
at each go; the code allows for this quite easily.
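
For anyone who wants to experiment with the batch size, here is a
minimal sketch of the scheme described above. It is not the code from
the patch. It assumes a flat layout where each object's metadata lives
in <datastore_path>/<uid>.metadata; that suffix, the function names and
the `lookup` callable (standing in for a query against the index) are
made up for illustration, and it targets a current Python 3 standard
library.

```python
import json
import os

MARKER = ".metadata.exported"   # marker file named in the description above
SUFFIX = ".metadata"            # hypothetical suffix for the JSON copy


def write_metadata(root, uid, props):
    """Write props as JSON next to the object's file (on create/update)."""
    path = os.path.join(root, uid + SUFFIX)
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(props, f)
    os.replace(tmp, path)  # rename into place so readers never see a partial file


def delete_metadata(root, uid):
    """Remove the JSON copy along with the object; a missing file is fine."""
    try:
        os.remove(os.path.join(root, uid + SUFFIX))
    except FileNotFoundError:
        pass


def pending_uids(root, all_uids):
    """At startup: list the objects that still need their metadata exported."""
    if os.path.exists(os.path.join(root, MARKER)):
        return []
    return [uid for uid in all_uids
            if not os.path.exists(os.path.join(root, uid + SUFFIX))]


def export_step(root, pending, lookup, batch_size=1):
    """Export up to batch_size entries; return True while work remains."""
    for _ in range(min(batch_size, len(pending))):
        uid = pending.pop()
        write_metadata(root, uid, lookup(uid))
    if pending:
        return True
    # Everything is exported: drop the marker so the next startup skips the scan.
    open(os.path.join(root, MARKER), "w").close()
    return False
```

export_step returns True while entries remain, which is the convention
GLib idle callbacks use to stay scheduled, so batch_size is the only
knob to turn when trading export speed against responsiveness.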

>> In my tests this has worked quite well, but I have one concern: can
>> something bad happen if we have 20k files in the same dir (for a
>> journal with 10k entries)?
>
> Ok, we can split it into a subdir (which will only have 10K files then).

Yes, this is the easiest option. I would like to hear from Dave if
there could be any problem with 10k files in the same dir.

> If there's a cost to large dirs in jffs2 then we can use hashed dirs,
> and that change will be needed for both the main datastore storage
> _and_ the metadata files.

Yes, that's the approach I used in my DS rewrite, but I would prefer
to leave this change for a future release if it's possible.
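
To make the hashed-dirs idea concrete, here is a rough sketch of one
way to do it. It is not what my DS rewrite does verbatim: the function
name, the choice of SHA-1 and the two-character bucket are assumptions
for illustration. Both the object's file and its metadata file go
through the same function, so they end up in the same bucket.

```python
import hashlib
import os


def hashed_path(root, uid, suffix=""):
    """Return <root>/<bucket>/<uid><suffix>, with bucket derived from uid.

    The bucket is the first two hex digits of the SHA-1 of the uid, which
    gives 256 subdirectories regardless of what the uids look like.
    """
    bucket = hashlib.sha1(uid.encode("utf-8")).hexdigest()[:2]
    return os.path.join(root, bucket, uid + suffix)
```

With 256 buckets, the 20k files from the example above would average
roughly 80 per directory.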

>> One side effect of this is that when (if) we agree on a new on-disk
>> data structure for the DS, it will be easier to convert than if we had
>> to extract all the metadata from the index.
>
> Yes. And as you said earlier, easy recovery if xapian goes to la-la land.

Yeah, I'm wondering if metadata should be retrieved from the json file
instead of from the index; that may give us a performance improvement,
as well as increased robustness.
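
A sketch of what reading from the files could look like, under the
same assumptions as before: a flat directory and a made-up ".metadata"
suffix, neither taken from the patch. The second function walks every
file, which is what a recovery pass would do if the index were lost.

```python
import json
import os

SUFFIX = ".metadata"  # hypothetical suffix for the JSON copy


def read_metadata(root, uid):
    """Load one object's metadata straight from its JSON file."""
    with open(os.path.join(root, uid + SUFFIX)) as f:
        return json.load(f)


def iter_metadata(root):
    """Yield (uid, props) for every JSON copy, e.g. to rebuild the index."""
    for name in sorted(os.listdir(root)):
        if name.endswith(SUFFIX):
            uid = name[:-len(SUFFIX)]
            yield uid, read_metadata(root, uid)
```

Whether this is actually faster than asking the index is something
that would need measuring.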

Thanks,

Tomeu
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Maintain-a-metadata-copy-outside-the-index.patch
Type: text/x-patch
Size: 7336 bytes
Desc: not available
URL: <http://lists.laptop.org/pipermail/devel/attachments/20080522/a6b5c1fe/attachment.bin>