Announce: Compressed cache alpha-007 (kernel 2.6.18)

Nitin Gupta nitingupta.mail at gmail.com
Wed Sep 20 08:53:01 EDT 2006


On 9/20/06, Peter Zijlstra <a.p.zijlstra at chello.nl> wrote:
> On Wed, 2006-09-20 at 14:38 +0530, Nitin Gupta wrote:
> > ------------
> > Project: Compressed Caching for Linux
> > Project Home: http://linuxcompressed.sourceforge.net
> > Git (web): http://dev.laptop.org/git.do?p=projects/linux-mm-cc
> > Git (git): git://dev.laptop.org/projects/linux-mm-cc
> > ------------
> >
> > Hi!
> >
> > This patch significantly changes the locking scheme in
> > the lookup functions, to solve race problems that
> > surfaced while testing on an SMP system.
> >
> > [Patch download]:
> > http://prdownloads.sourceforge.net/linuxcompressed/patch-ccache-alpha-007-2.6.18?download
> >
<snip>

> Wasn't there some effort to stick LZO into the crypto API?
>
I'm not aware of that.

> Also, I think the consensus on including sources like that is to clean
> up the sources as much as possible; eg.
>  #ifdef __cplusplus
> is a no go. Also the LZO source is riddled with ifdeffery.
>
> See the thread here:
>   http://lkml.org/lkml/2006/8/26/96
>

Yes, cleaning up LZO is on my TODO list. I have already removed several
KLOC of ifdeffery. It is extremely painful to clean up the original LZO
code and port it to kernel space.


> Perhaps use the LZF crypto API patches from suspend2?
>
I will look into this.


> Could you, for the next version, please split this huge patch into
> pieces, for example like so:
>
>   add LZO compressor
>   add WK compressor
>   add ccache core
>   add bits and pieces to glue ccache into kernel
>
> the LZO/WK code in the current patch is huge and that pretty much
> clutters it for readability.
>
> This too would make commenting easier, since you'd then be able to post
> the patches inline.

Sure, will do this for the next patch.

>
> @@ -117,6 +117,7 @@ enum {
>         SWP_ACTIVE      = (SWP_USED | SWP_WRITEOK),
>                                         /* add others here before... */
>         SWP_SCANNING    = (1 << 8),     /* refcount in scan_swap_map */
> +       SWP_COMPRESSED  = (1 << 10),    /* it's compressed cache for anon pages */
>  };
>
> I think you missed the 'add others here before...' comment, you're now
> stomping onto the swap scan counter space.
>
Yes -- I missed it :) Will make it (1 << 2).
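To make the layout concrete, here is a small userspace model of the swap flags discussed above (values mirror the quoted hunk; the SWP_COMPRESSED value reflects the (1 << 2) fix, and the names are the kernel's, but this is only an illustration, not the actual header):

```c
#include <assert.h>

/* Bits at and above SWP_SCANNING hold a refcount for scan_swap_map(),
 * which is why the "add others here before..." comment matters: new
 * flag bits must stay below bit 8. */
enum {
	SWP_USED	= (1 << 0),
	SWP_WRITEOK	= (1 << 1),
	SWP_COMPRESSED	= (1 << 2),	/* moved below the scan-counter space */
	SWP_ACTIVE	= (SWP_USED | SWP_WRITEOK),
	SWP_SCANNING	= (1 << 8),	/* refcount in scan_swap_map */
};
```

With this placement the new flag no longer overlaps the scan counter and stays distinct from the existing flag bits.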

> For ccache.c; could you stick all those globals into a struct? That
> reads nicer.

OK, will group them under a struct (or a few structs).

> Also, no need to assign 0 to statics.

ok.

>
> +       /*
> +        * This allocation must never sleep and must never fail.
> +        * -- Doing GFP_ATOMIC will guarantee that it doesn't sleep
> +        * but the alloc may fail!
> +        * -- Doing GFP_KERNEL gives higher chances that the alloc will
> +        * be successful but it may sleep (and hence doesn't work)!
> +        * -- What to do??
> +        */
> +       comp_page = alloc_page(GFP_ATOMIC);
>
> unsigned long pflags = current->flags;
> current->flags |= PF_MEMALLOC;
> alloc_page(GFP_NOWAIT);
> current->flags = pflags;
>
A dumb question: IIRC, 'current' points to the currently executing
task? What if 'current' changes between these statements -- is any
locking required to prevent this (like disabling IRQs to keep
'current' current)?

> or steal my __GFP_EMERGENCY patch.
>
Is it in mainline now, or is it planned to be mainlined soon?

> +       tmp_page = alloc_pages(GFP_KERNEL, 1);  /* page may expand */
>
> 1 order alloc? No way to not do that? Compression will fail if you don't
> provide enough output space, right? perhaps just discard the page,
> negative savings just don't make sense.
>
Yes, an order-1 alloc!! It is a failure point for ccache. Pages that
expand are already discarded, but the problem is providing a temporary
buffer to receive the compressed output in the first place. The
alternative is to insert size checks within the compressor's tight
loop -- even that seemed hard in the strange LZO code.

> BTW, what's the deal with these chunks? is that a sort of heap allocator
> on top of the page allocator, and then link enough chunks to find all
> data again?

Yes. A single compressed page is stored scattered over many physical
pages, and each of these 'chunks' is tracked by keeping its start
address and size (struct chunk).
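A simplified userspace model of that scheme: one compressed page is split into several chunks, each recording where its bytes live and how many there are, and the chunk list is walked to reassemble the page. The names and helpers here are illustrative, not the actual ccache identifiers:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct chunk {
	void *start_addr;	/* where this chunk's bytes live */
	unsigned short size;	/* number of bytes in this chunk */
	struct chunk *next;	/* next chunk of the same compressed page */
};

/* Split 'len' bytes of 'data' into chunks of at most 'max' bytes;
 * each malloc() here stands in for free space inside some page. */
static struct chunk *store_scattered(const void *data, size_t len, size_t max)
{
	struct chunk *head = NULL, **link = &head;
	size_t off = 0;

	while (off < len) {
		size_t n = len - off < max ? len - off : max;
		struct chunk *c = malloc(sizeof(*c));

		c->start_addr = malloc(n);
		memcpy(c->start_addr, (const char *)data + off, n);
		c->size = (unsigned short)n;
		c->next = NULL;
		*link = c;
		link = &c->next;
		off += n;
	}
	return head;
}

/* Walk the chunk list to reassemble the compressed page. */
static size_t gather(const struct chunk *c, void *out)
{
	size_t off = 0;

	for (; c; c = c->next) {
		memcpy((char *)out + off, c->start_addr, c->size);
		off += c->size;
	}
	return off;
}
```

The point of the design is that the compressed page never needs a contiguous allocation: any free space left by previously stored pages can hold a chunk.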

>
> Can't you pull that out and abstract that? It will be another patch in
> the series, and you might end up with cleaner code.
>
Abstracting this seems good -- will try this.

> Also, merge_chunk() why does it loop, if you look for adjacent chunks on
> free, you can only ever find 1 before and 1 after.
>
It does not loop. It just looks one chunk forward, then one chunk
backward, and returns. I will clean up this function so it becomes
clearer.
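The one-forward, one-backward shape follows from the free list being address-sorted: a freed extent can coalesce with at most its immediate predecessor and successor. A toy model of that logic (names and the extent representation are illustrative, not merge_chunk() itself):

```c
#include <assert.h>
#include <stddef.h>

struct extent { unsigned long start, size; };

/* Two extents can merge only if one ends exactly where the other begins. */
static int adjacent(const struct extent *a, const struct extent *b)
{
	return a->start + a->size == b->start;
}

/* Merge 'freed' with its list neighbours where possible; prev/next
 * may be NULL. No loop is needed: at most one merge on each side. */
static struct extent coalesce(const struct extent *prev,
			      struct extent freed,
			      const struct extent *next)
{
	if (prev && adjacent(prev, &freed)) {
		freed.start = prev->start;
		freed.size += prev->size;
	}
	if (next && adjacent(&freed, next))
		freed.size += next->size;
	return freed;
}
```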

> Also:
>
> +struct chunk {
> +       void *start_addr;       /* page addr + offset within page
> +                                * where chunk starts */
> +       unsigned short size;    /* size: 12 LSB bits, flags: rest 4 bits */
> +       struct chunk *next;     /* link to next 'related' chunk */
> +       struct list_head chunks;        /* 'master chunk list': every
> +                                        * chunk is linked */
> +};
>
> that unsigned short is a waste, the compiler will align the next field
> to 32 bit anyway (unless you add the packed attribute to the struct) and
> the size of struct chunk would be 20 bytes and struct chunk is
> kmalloc'ed and the smallest slab is 32 bytes, so you waste heaps of
> memory here, just be clear and don't fiddle with bitfields, it doesn't
> gain you anything.
>

So, there's no way to allocate just 20 bytes? I need to keep this
struct as small as possible, and 32 bytes will be too much for struct
chunk! Declaring it as packed and then kmalloc()'ing it will still use
32 bytes, right?
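Right -- as the review notes, the generic kmalloc() slabs of this era start at 32 bytes and grow in powers of two, so any request up to 32 bytes consumes a full 32-byte object regardless of packing. A quick sketch of that rounding (a simplified model of the size classes, not the actual slab code):

```c
#include <assert.h>
#include <stddef.h>

/* Round a request up to the kmalloc size class that would serve it,
 * assuming 32-byte minimum, power-of-two general-purpose slabs. */
static size_t kmalloc_class(size_t size)
{
	size_t c = 32;

	while (c < size)
		c <<= 1;
	return c;
}
```

So both the 20-byte unpacked layout and any packed variant land in the same 32-byte slab; shrinking the struct only pays off if it drops the allocation into a smaller, dedicated kmem_cache.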

> Don't like the bit_spin_trylock/bit_spin_unlock in handle_ccache_fault;
> what's wrong with TestSetPageLocked() and unlock_page() ?
>

If PageCompressed(page) is true, then 'page' is actually a 'struct
chunk_head', not a 'struct page', and you cannot use unlock_page() on
a chunk_head.

> you have duplicated the radix_tree_lookup in handle_ccache_fault(), you
> can easily unify those two.
>
Yes! I see now. I will do that :)

> PageWillCompress() doesn't look necessary, can't you just add a
> parameter to pageout() instead?
>
hm...PageWillCompress() will now disappear :)

> Why not make ccache a proper address_space and modify do_swap_page()
> that should get rid of a lot of the intrusiveness in the find_get_page()
> family. Of course that would mean finding another bit in the swp_entry_t
> (perhaps you can use some of the page_migration bits and make ccache and
> page migration mutually exclusive).
>

That would make it hard to handle page-cache pages, so these intrusive
changes in the 4-5 lookup functions looked better to me....


Thanks for reviewing the patch!  :)


Cheers,
Nitin
