[PATCH] Maintain a metadata copy outside the index (was Re: Datastore & backup - request for help)

Thu May 22 04:53:38 EDT 2008

On Thu, May 22, 2008 at 5:45 AM, Jameson Chema Quinn
<jquinn at cs.oberlin.edu> wrote:
> Yay, I am happy about this patch (when there is a patch :)
>>
>> > - at every create and update, a json file is created next to the
>> > object's file,
>
> I definitely think it should be in the same directory as the object file,
> with a related name. It might even be worth using the macintosh ._name
> naming convention.
>
> (Note that when we have directories as bundles, bundle-level metadata can
> live in a ._. file. If all bundles had some kind of manifest, then any
> subfiles which are used separately could grow their own metadata in
> ._subfile ; as long as that file were not in the manifest, it would not be
> packed up when exporting the bundle to foreign storage.)

In the proposed solution, the storage dir is still not very friendly
to browsing: filenames are not titles, and the structure is flat, with
no search or filtering. AFAIK, these OS X conventions are meant to
make life easier for people who browse the actual files with normal
tools.
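
To make the create/update/delete part concrete, here is a minimal
sketch, not the actual patch. The '.metadata' suffix and the function
names are made up for illustration, and it uses the stdlib json module
where the patch uses simplejson or cjson:

```python
import json
import os


def metadata_path(object_path):
    """Return the sidecar path for an object (hypothetical '.metadata' suffix)."""
    return object_path + '.metadata'


def write_metadata(object_path, metadata):
    """Write the metadata as JSON next to the object file.

    Called on every create and update. The copy is written to a
    temporary file first and then renamed over the old one, so a crash
    never leaves a half-written metadata file behind.
    """
    target = metadata_path(object_path)
    tmp = target + '.tmp'
    with open(tmp, 'w') as f:
        json.dump(metadata, f)
    os.replace(tmp, target)


def delete_object(object_path):
    """Delete the object together with its metadata copy."""
    for path in (object_path, metadata_path(object_path)):
        if os.path.exists(path):
            os.remove(path)
```

Swapping the suffix for a ._name prefix would only change
metadata_path().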

>> > - it's also deleted along with the object,
>> >
>> > - at startup, if the file <datastore_path>/.metadata.exported doesn't
>> > exist, check how many objects need to get their metadata exported
>> > (0.8s for 3000 entries)
>>
>> That's pretty good.
>>
>> > - in an idle callback, process each of those objects one per iteration
>> > (3ms per entry with simplejson, 2ms with cjson).
>>
>> Exporting a few 100 per iteration probably is more efficient ;-)
>
> This brings up the issue of TamTam's imperfect timing - it would be great if
> there were some way to turn off all unnecessary background CPU use for cases
> like TamTam. If so, I'd say 12*3ms is about the right size for a background
> click every second or two.

Remember that this export process will happen only on the first boot
after an upgrade and should hopefully be short.
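
The startup check and the batched export could look roughly like the
sketch below. It is only an illustration under assumptions: the entries
are passed in as a plain dict of uid -> metadata, the '.metadata'
suffix and function names are hypothetical, and the main loop is left
out. The True/False return value follows the usual GLib idle callback
convention, where returning False removes the callback.

```python
import json
import os

MARKER = '.metadata.exported'


def pending_exports(datastore_path, entries):
    """Return the uids that still need a metadata copy.

    If the marker file exists, the export already completed on an
    earlier startup and nothing needs to be scanned.
    """
    if os.path.exists(os.path.join(datastore_path, MARKER)):
        return []
    return [uid for uid in sorted(entries)
            if not os.path.exists(
                os.path.join(datastore_path, uid + '.metadata'))]


def export_step(datastore_path, entries, pending, batch_size=1):
    """Export up to batch_size entries; return True while work remains."""
    for _ in range(min(batch_size, len(pending))):
        uid = pending.pop()
        path = os.path.join(datastore_path, uid + '.metadata')
        with open(path, 'w') as f:
            json.dump(entries[uid], f)
    if pending:
        return True
    # Everything is exported: write the marker so that later startups
    # skip the scan entirely.
    open(os.path.join(datastore_path, MARKER), 'w').close()
    return False
```

At 3ms per entry, 3000 entries add up to about 9s of work in total, so
batch_size only decides how that time is sliced: 1 gives 3ms per
iteration, 12 gives about 36ms.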

>> > In my tests this has worked quite well, but I have one concern: can
>> > something bad happen if we have 20k files in the same dir (for a
>> > journal with 10k entries)?
>>
>> Ok, we can split it into a subdir (which will only have 10K files then).
>>
>> If there's a cost to large dirs in jffs2 then we can use hashed dirs,
>> and that change will be needed for both the main datastore storage
>> _and_ the metadata files.
>
> +1

But using a hashed dir will make browsing the actual files more
cumbersome for the occasional observer.
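
For reference, a hashed layout could be as simple as the sketch below.
This is an assumption about what "hashed dirs" would mean here, not an
agreed design; the function name and the choice of SHA-1 with a
two-character prefix are only illustrative.

```python
import hashlib
import os


def hashed_path(root, uid, width=2):
    """Place uid in a subdirectory named after the start of its hash.

    With width=2 there are 256 possible subdirectories, so 20k files
    would average about 78 per directory.
    """
    digest = hashlib.sha1(uid.encode('utf-8')).hexdigest()
    return os.path.join(root, digest[:width], uid)
```

The subdirectory name has no relation to the entry's title, so someone
looking for a given entry by hand has to compute the hash or search
every subdirectory.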

Thanks,

Tomeu