[Server-devel] [PATCH] ds_backup - major rework

Michael Stone michael at laptop.org
Mon Jun 16 17:27:39 EDT 2008


On Mon, Jun 16, 2008 at 01:46:25PM -0400, martin.langhoff at gmail.com wrote:
> From: Martin Langhoff <martin at laptop.org>
> +    # setup datastore connection
> +    bus = dbus.SessionBus()
> +    obj = bus.get_object(DS_DBUS_SERVICE, DS_DBUS_PATH)
> +    datastore = dbus.Interface(obj, DS_DBUS_INTERFACE)

Might it be appropriate to cache the resulting DS D-Bus object/interface
for later use?
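
If so, a module-level cache would be enough. Rough, untested sketch (the
helper name is mine):

    _datastore = None

    def _get_datastore():
        # create the proxy once, hand back the same interface on later calls
        global _datastore
        if _datastore is None:
            bus = dbus.SessionBus()
            obj = bus.get_object(DS_DBUS_SERVICE, DS_DBUS_PATH)
            _datastore = dbus.Interface(obj, DS_DBUS_INTERFACE)
        return _datastore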

> +    # name the backup file
> +    # and open a tmpfile in the same dir
> +    # to ensure an atomic replace
> +    md_path = os.path.join(ds_path,
> +                           'metadata.json')
> +    (md_fd, md_tmppath) = tempfile.mkstemp(suffix='.json',
> +                                           prefix='.metadata-',
> +                                           dir=ds_path)

Where is the resulting file, which lives on NAND, deleted?
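
If the answer is "nowhere on error", a try/finally-style block around the
write would keep a failed run from leaving it behind. Rough sketch, with the
write loop folded into a hypothetical helper:

    (md_fd, md_tmppath) = tempfile.mkstemp(suffix='.json',
                                           prefix='.metadata-',
                                           dir=ds_path)
    try:
        md_fh = os.fdopen(md_fd, 'w')
        write_metadata_entries(md_fh)      # hypothetical: the find()/json loop below
        md_fh.close()
        os.rename(md_tmppath, md_path)     # atomic replace on success
    except:
        os.unlink(md_tmppath)              # don't leave stale temp files on NAND
        raise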



> +
> +    # preview contains binary data we
> +    # don't actually want...
> +    drop_properties = ['preview']
> +
> +    query = {}

Why do we want to ditch previews? Just because we can?


>      entries, count = datastore.find(query, [], byte_arrays=True)
>      print 'Writing metadata and file indexes for %d entries.' % len(entries)
>      for entry in entries:
> -        for prop in external_properties:
> +        for prop in drop_properties:
>              if prop in entry:
>                  del entry[prop]
...
> +        md_fh.write(json.write(_sanitize_dbus_dict(entry))+'\n')
> +    md_fh.close()
>  
...
> +    os.rename(md_tmppath, md_path)

Are we expecting concurrent writers here? If so, then we need to be
aware that our cleanup_stale_metadata is racing with them. Writers who
lose the race with the winning writer's cleanup attempt will die when
their rename fails as their source path was unlinked by the winner.
Writers who lose the race with the winning writer but who beat its
cleanup attempt will overwrite the metadata to be transmitted. Is this
correct?
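
If concurrent writers really are expected, an exclusive lock taken before the
rename and the cleanup would serialize them. Rough sketch; the lock file name
is made up:

    import fcntl

    lock_fh = open(os.path.join(ds_path, '.backup.lock'), 'w')
    fcntl.flock(lock_fh, fcntl.LOCK_EX)    # block until we're the only writer
    try:
        os.rename(md_tmppath, md_path)
        cleanup_stale_metadata(ds_path)
    finally:
        fcntl.flock(lock_fh, fcntl.LOCK_UN)
        lock_fh.close()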

> +def cleanup_stale_metadata(ds_path):
> +    files = glob.glob(os.path.join(ds_path, '.metadata-*.json'))
> +    for file in files:
> +        os.unlink(file)
> +    return files.count;

Are we concerned about the cleanup code failing, for example because
permissions were changed on the ds_path dir? If it permanently failed
then I think our backup script would also permanently fail, which
doesn't seem quite right. (If something's broken with my permissions,
then I _want_ my backup script to take what data it can get and move it
off the system, right?)
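
A best-effort variant, sketched below, would log the failure and move on
instead of propagating it (and return the number of files actually removed):

    def cleanup_stale_metadata(ds_path):
        removed = 0
        for path in glob.glob(os.path.join(ds_path, '.metadata-*.json')):
            try:
                os.unlink(path)
                removed += 1
            except OSError, e:
                # cleanup is best-effort; don't let it kill the backup
                print 'could not remove %s: %s' % (path, e)
        return removed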


> +        % (keyfile, user)
> +    rsync = """/usr/bin/rsync -az --partial --timeout=160 -e '%s' '%s' '%s' """ % \
> +            (ssh, from_path, to_path)
> +    print rsync
> +    rsync_p = popen2.Popen3(rsync, True)
> +
> +    # here we could track progress with a
> +    # for line in pipe:
> +    # (an earlier version had it)
> +
> +    # TODO: wait returns a 16-bit int, we want the lower
> +    # byte of that.
> +    rsync_exit = rsync_p.wait()
> +    if rsync_exit != 0:
> +        raise TransferError('rsync error code %s, message:' % rsync_exit, rsync_p.childerr.read())

Same comments as before about subprocess vs. popen2.Popen3. Why are we
removing the progress-tracking code? Does it not work?
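
For comparison, a subprocess-based version would sidestep both issues:
wait() gives the real exit status with no bit-shifting, and the loop over
stdout could give per-file progress once -v or --progress is added to the
flags. Untested sketch of the same command:

    import subprocess

    rsync_p = subprocess.Popen(['/usr/bin/rsync', '-az', '--partial',
                                '--timeout=160', '-e', ssh,
                                from_path, to_path],
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)
    for line in rsync_p.stdout:
        print line,                 # progress lines show up here with -v/--progress
    rsync_exit = rsync_p.wait()     # real exit code, no 16-bit packing
    if rsync_exit != 0:
        raise TransferError('rsync error code %s, message:' % rsync_exit,
                            rsync_p.stderr.read())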

Michael

