[linux-mm-cc] I guess you have been following ksm.

Sun Apr 19 01:12:16 EDT 2009

On Fri, Apr 17, 2009 at 2:45 PM, Nitin Gupta <ngupta at vflare.org> wrote:
> On Fri, Apr 17, 2009 at 4:02 AM, Peter Dolding <oiaohm at gmail.com> wrote:
>> The copy on write system also appears to provide something else
>> interesting.  ksm and compcache are both after allocation.   The
>> interesting question is if Linux kernel should provide a calloc
>> function.   So that on commit it is automatically stacked.  This
>> would massively reduce the numbers of blank matching pages.  Linux
>> system already has something that deals with malloc allowing over
>> commits until accessed.
>>
>
> Not sure if I understand you here. You mean all new allocation should
> be zeroed to play better with KSM?

Not sure either, but it seems similar to my suggestion that we could
use existing techniques to zero garbage. The suggested purpose of
these techniques was security, but this would presumably also improve
the compression ratio of compcache.  Apparently they require only ~1%
overhead and we may be able to do even better than this if the goal is
performance rather than security:
http://www.usenix.org/events/sec05/tech/full_papers/chow/chow_html/index.html

Unfortunately they have lost the code, so we would have to reimplement
it from scratch.
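
To illustrate why zeroing garbage should help the compression ratio,
here is a rough sketch. It is not the code from the paper: zlib stands
in for compcache's compressor, and stale freed memory is modelled as
random bytes, which is a pessimistic assumption of mine (real garbage
is often more compressible than that).

```python
import os
import zlib

PAGE_SIZE = 4096  # assumed page size


def compressed_size(page: bytes) -> int:
    """Size of the page after compression (zlib used as a stand-in)."""
    return len(zlib.compress(page))


def make_page(live: bytes, zero_garbage: bool) -> bytes:
    """Build one page: live data followed by freed (garbage) space."""
    free_len = PAGE_SIZE - len(live)
    # Assumption: un-zeroed garbage looks like random bytes.
    garbage = bytes(free_len) if zero_garbage else os.urandom(free_len)
    return live + garbage


live = b"live data " * 100  # 1000 bytes still in use
dirty = make_page(live, zero_garbage=False)
clean = make_page(live, zero_garbage=True)
print("garbage left in place:", compressed_size(dirty))
print("garbage zeroed:       ", compressed_size(clean))
```

The zeroed page compresses to a small fraction of the dirty one, since
the compressor no longer has to encode the stale bytes.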

> For simplicity, code currently in SVN
> decompresses individual objects in a page before writing out to backing
> swap device.

One complexity is that compressed pages could get fragmented. I am not
sure if pages being adjacent on the swap device means that they are
related, but even if not, there would be some bookkeeping regarding
free space fragmentation.

As an aside, with decent wear leveling, swap on SSD is feasible, but
compressing pages first would seem a good idea to reduce wear of the
SSD device. I was thinking of some algorithms to write out pages to
SSD in an optimized way. One obvious technique would be to write to
pages in a round robin fashion, consolidating free space as we go.
This would theoretically lead to perfect wear leveling, although
skipping over sectors that have little free space would seem a good
idea. However most PCs have SSD devices that do their own wear
leveling. I am not sure what the best strategy for these devices would
be.
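
A toy model of the round robin idea, to make it concrete. Everything
here is hypothetical: the class name, the notion of fixed-size "blocks"
with page slots, and the min_free threshold are my own simplifications,
and it ignores what the drive's own wear leveling does underneath.

```python
from typing import Dict, List, Optional


class RoundRobinSwap:
    """Device modelled as a ring of blocks, each with a few page slots."""

    def __init__(self, num_blocks: int, slots_per_block: int,
                 min_free: int = 1) -> None:
        self.blocks: List[List[Optional[int]]] = [
            [None] * slots_per_block for _ in range(num_blocks)]
        self.rewrites = [0] * num_blocks   # wear per block
        self.where: Dict[int, int] = {}    # page id -> block index
        self.cursor = 0                    # next block to visit
        self.min_free = min_free           # skip blocks with less free space

    def free(self, page_id: int) -> None:
        """Mark a page as no longer needed, leaving a hole in its block."""
        block = self.blocks[self.where.pop(page_id)]
        block[block.index(page_id)] = None

    def write(self, page_ids: List[int]) -> None:
        """Write pages out in ring order, consolidating free space."""
        pending = list(page_ids)
        visited = 0
        while pending:
            if visited == len(self.blocks):
                # Pages placed before this point stay written.
                raise RuntimeError("no block has enough free space")
            index = self.cursor
            self.cursor = (self.cursor + 1) % len(self.blocks)
            visited += 1
            block = self.blocks[index]
            live = [p for p in block if p is not None]
            free_slots = len(block) - len(live)
            if free_slots < self.min_free:
                continue  # not worth a rewrite for so little space
            new, pending = pending[:free_slots], pending[free_slots:]
            for p in new:
                self.where[p] = index
            # Rewrite the block: live pages packed first, then new ones.
            packed = live + new
            self.blocks[index] = packed + [None] * (len(block) - len(packed))
            self.rewrites[index] += 1


swap = RoundRobinSwap(num_blocks=4, slots_per_block=4, min_free=2)
swap.write(list(range(16)))      # fill the device
for page in (1, 4, 5):
    swap.free(page)              # block 0 has 1 hole, block 1 has 2
swap.write([100, 101])           # block 0 is skipped, block 1 is rewritten
print(swap.rewrites, swap.blocks[1])
```

Because the cursor only ever moves forward, rewrites are spread around
the ring, and the min_free threshold trades a little wear-leveling
evenness for fewer rewrites that gain almost no space.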

However it does seem to me that as SSD devices become more common, SSD
optimized swap would be useful, as despite the obvious disadvantages
SSD does have the advantage of fast random reads, so swapping to SSD is
less likely to kill performance than to HDD. Modern SSD drives can
survive years of properly wear levelled writes and although SSD is
reasonably expensive per MB, a particular machine may well happen to
have substantial free SSD space but limited memory. Perhaps I should
find some SSD related people to ask about SSD optimized writes?

> For duplicate page removal COW has lower space/time overhead than
> compressing all individual copies with ramzswap approach. But this

If we wanted, we could keep only a single copy of duplicated pages in
compcache. Since we compress the pages anyway, we may be able to
assume that the non-duplicated pages are fairly random, allowing us to
implement a hash table with minor overhead. This may be worthwhile if
we have many VMs of the same OS.
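
Roughly what I have in mind, as a sketch only. The names (Blob,
DedupStore, slot numbers) are made up for illustration and this is not
how compcache is structured. I use zlib in place of the real compressor
and a cheap CRC of the compressed bytes as the hash, with a full
comparison on a hit so that a collision can never merge two different
pages.

```python
import zlib
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Blob:
    data: bytes  # compressed page contents
    refs: int    # number of swap slots sharing this blob


class DedupStore:
    def __init__(self) -> None:
        self.buckets: Dict[int, List[Blob]] = {}  # cheap hash -> blobs
        self.slots: Dict[int, Blob] = {}          # swap slot -> blob

    def store(self, slot: int, page: bytes) -> None:
        self.discard(slot)  # overwriting a slot drops its old contents
        data = zlib.compress(page)
        bucket = self.buckets.setdefault(zlib.crc32(data), [])
        for blob in bucket:
            if blob.data == data:  # verify, do not trust the hash alone
                blob.refs += 1
                break
        else:
            blob = Blob(data, 1)
            bucket.append(blob)
        self.slots[slot] = blob

    def load(self, slot: int) -> bytes:
        return zlib.decompress(self.slots[slot].data)

    def discard(self, slot: int) -> None:
        blob = self.slots.pop(slot, None)
        if blob is None:
            return
        blob.refs -= 1
        if blob.refs == 0:  # last reference gone, free the blob
            key = zlib.crc32(blob.data)
            self.buckets[key].remove(blob)
            if not self.buckets[key]:
                del self.buckets[key]

    def unique_blobs(self) -> int:
        return sum(len(bucket) for bucket in self.buckets.values())


store = DedupStore()
page_a = b"A" * 4096
page_b = b"B" * 4096
store.store(0, page_a)
store.store(1, page_a)  # duplicate: shares the blob stored for slot 0
store.store(2, page_b)
print(store.unique_blobs())
```

The extra cost per page is one cheap hash over data that has already
been compressed, plus a reference count per stored blob.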

However maybe it would be better to just run compcache and KSM
together and let each one handle its own strengths. (Hopefully this
would also mean less work for Nitin :)

> virtual block device approach has big advantages - we need not patch
> kernel and keeps things simple. Going forward, we can better take
> advantage of stacked block devices to provide compressed disk based
> swapping, generic R/W compressed caches for arbitrary block devices
> (compressed cached /tmpfs comes to mind) etc.

I understand tmpfs will swap out unused pages, so with compcache as
the swap device we already have a compressed cached tmpfs of sorts. I
can see a number of advantages to an explicitly cc'd tmpfs though,
e.g. the option of larger blocks with a better compression ratio, and
smarter decisions as to when to compress pages. However the current
setup has the
advantage that it is very simple, as compcache doesn't have to worry
about any of this, and presumably tmpfs is optimized to be very fast
when it does not need to be swapped out. I am not sure if a block
device can hold onto pages the kernel hands to it without needing to
memcpy (mostly because I know very little about the Linux kernel
internals).
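
On the larger blocks point, a small sketch of why the ratio can
improve. The pages here are constructed (my assumption) so that each
one is hard to compress on its own but nearly identical to its
neighbours, and zlib again stands in for the real compressor. How much
real tmpfs data behaves like this I do not know.

```python
import random
import zlib

PAGE_SIZE = 4096  # assumed page size
rng = random.Random(0)  # fixed seed so runs are repeatable


def similar_pages(count: int) -> list:
    """Pages that share one random base and differ in a single byte."""
    base = bytearray(rng.getrandbits(8) for _ in range(PAGE_SIZE))
    pages = []
    for i in range(count):
        page = bytearray(base)
        page[i] ^= 0xFF  # flip a different byte in each page
        pages.append(bytes(page))
    return pages


pages = similar_pages(4)
# Compressing page by page cannot exploit similarity between pages.
per_page = sum(len(zlib.compress(p)) for p in pages)
# Compressing the pages as one block lets later pages refer to earlier ones.
one_block = len(zlib.compress(b"".join(pages)))
print("page at a time:", per_page)
print("one large block:", one_block)
```

The catch is that reading back any one page then means decompressing
the whole block, so larger blocks trade access cost for ratio.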

-- 
John C. McCabe-Dansted
PhD Student
University of Western Australia