[linux-mm-cc] [PATCH 00/12] Avoid OOM-Killer with Ccache (V2)

IKEDA Munehiro m-ikeda at ds.jp.nec.com
Mon Jul 23 06:02:28 EDT 2007


Hi all,

I posted the "Avoid OOM-Killer with Ccache" patches last Friday
(Thursday for you?).  Since then, testing has turned up 4 bugs in them.
I have (probably) fixed 3 of them, but I have not been able to fix the
remaining one.  So the patches are still NOT perfect, but I am
re-posting the updated set as "V2" now because I don't want you to run
into the bugs that are already fixed.

Changes from V1
-----------------

[PATCH 01/12]
Added the missing code that increments anon_cc_size in
should_add_to_ccache().

[PATCH 07/12]
Corrected the condition that decides whether to free a page in
merge_chunk().

[PATCH 09/12]
The simple-free procedure used when PF_MEMALLOC is set was incorrect.
It is fixed now.


Known but unsolved bug
----------------------

When both fs_backed and anon ccache are in use, a BUG like the one
below occurs.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
kernel BUG at /home/ikeda/develop/ccache/src/linux-2.6.21-ccache/lib/radix-tree.c:447!
invalid opcode: 0000 [#1]
PREEMPT
CPU:    0
EIP:    0060:[<c01ec708>]    Not tainted VLI
EFLAGS: 00000087   (2.6.21-gde68b82c #32)
EIP is at radix_tree_tag_set+0x98/0xa0
eax: c036cfe4   ebx: 00000002   ecx: 00000001   edx: 000b7988
esi: c036cfe0   edi: 00000282   ebp: c0037dcc   esp: c0037db0
ds: 007b   es: 007b   fs: 00d8  gs: 0000  ss: 0068
Process kswapd0 (pid: 87, ti=c0036000 task=c004aa30 task.ti=c0036000)
Stack: c0d95920 00000001 000b7988 c036cfe4 c0018200 c036cfe0 00000282 c0037de4
       c01407a5 00000000 c0018200 00000001 c0037ec8 c0037e00 c015014a c0150030
       c0d95920 c0018200 c0018218 c0037ef0 c0037f10 c014341d c0037f00 00000006
Call Trace:
 [<c010482a>] show_trace_log_lvl+0x1a/0x30
 [<c01048e9>] show_stack_log_lvl+0xa9/0xd0
 [<c0104b27>] show_registers+0x217/0x390
 [<c0104dab>] die+0x10b/0x210
 [<c0104f32>] do_trap+0x82/0xb0
 [<c0105877>] do_invalid_op+0x97/0xb0
 [<c02f75b4>] error_code+0x74/0x7c
 [<c01407a5>] test_set_page_writeback+0xc5/0x150
 [<c015014a>] swap_writepage+0x8a/0xd0
 [<c014341d>] shrink_inactive_list+0x6cd/0x9d0
 [<c01437c4>] shrink_zone+0xa4/0x100
 [<c0143d92>] kswapd+0x312/0x420
 [<c012b743>] kthread+0xa3/0xd0
 [<c010441b>] kernel_thread_helper+0x7/0x1c
 =======================
Code: f0 8b 4d e8 8b 42 04 ba 01 00 00 00 83 c1 14 d3 e2 85 c2 75 08 8b 4d f0 09 d0 89 41 04 83 c4 10 89 f8 5b 5e 5f 5d c3 0f 0b eb fe <0f> 0b eb fe 8d 74 26 00 55 89 e5 57 56 53 83 ec 4c 89 45 b4 89
EIP: [<c01ec708>] radix_tree_tag_set+0x98/0xa0 SS:ESP 0068:c0037db0
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

I am probably missing something in how the radix tree must be handled,
but I have not found the cause yet.
This BUG tends to occur when there is a process whose memory has been
swapped out to ccache.  I cannot say that this is the only condition,
though, so I will investigate further.



General description of these patches
(Same as in the V1 post)
---------------------------------------

The following patches are intended to avoid the OOM-Killer with Ccache.

Current Ccache has no way to free fs_backed chunks unless they are
accessed.  This means that compressed page-cache pages which are used
only once are never freed from Ccache.  As a result, not only does
Ccache fill up with useless chunks, but Ccache also fails to allocate
memory for decompression and the OOM-Killer runs.
This is easily reproduced under a condition like
  - mem = 16MB
  - max_fs_backed_ccache = 1024 (== 4MB if PAGE_SIZE==4KB)

by running a script like the one below.
  #! /bin/sh
  find /var -type f -exec cat {} \; > /dev/null

* Needless to say, whether the OOM-Killer runs depends on how many
  and what kind of files are in /var.
  The more files there are, the more likely the OOM-Killer is to run.

To solve this problem, I implemented an "eat-ccache-allocator" which
frees the oldest fs_backed chunks when an allocation fails.  In
addition, I introduced a feature which simply frees a chunk without
decompressing it if the memory reclaim context is accessing the chunk
in order to free it.
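
As a rough illustration of the allocator idea described above, here is
a minimal userspace sketch in C.  It is NOT the code from the patches:
struct pool, pool_try_alloc(), pool_evict_oldest(), pool_alloc() and
POOL_LIMIT are hypothetical names, and a plain linked list plus malloc()
stand in for the real chunk list and page allocator.  It shows only the
retry logic: when an allocation fails, drop the oldest chunk without
decompressing it and try again.

```c
/* Userspace sketch only; every name here is hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_LIMIT 4            /* stand-in for max_fs_backed_ccache */

struct chunk {
    struct chunk *next;         /* points toward newer chunks */
    int id;
};

struct pool {
    struct chunk *oldest;       /* head of the age-ordered list */
    struct chunk *newest;       /* tail of the age-ordered list */
    int used;                   /* chunks currently held */
    int evicted;                /* chunks dropped to make room */
};

/* Drop the oldest chunk without decompressing it.
 * Returns 0 when there is nothing left to free. */
int pool_evict_oldest(struct pool *p)
{
    struct chunk *victim = p->oldest;

    if (!victim)
        return 0;
    p->oldest = victim->next;
    if (!p->oldest)
        p->newest = NULL;
    free(victim);
    p->used--;
    p->evicted++;
    return 1;
}

/* Plain allocation: fails when the pool is at its limit. */
struct chunk *pool_try_alloc(struct pool *p, int id)
{
    struct chunk *c;

    if (p->used >= POOL_LIMIT)
        return NULL;
    c = malloc(sizeof(*c));
    if (!c)
        return NULL;
    c->id = id;
    c->next = NULL;
    if (p->newest)
        p->newest->next = c;
    else
        p->oldest = c;
    p->newest = c;
    p->used++;
    return c;
}

/* "Eat" the pool: on failure, free the oldest chunk and retry. */
struct chunk *pool_alloc(struct pool *p, int id)
{
    struct chunk *c;

    assert(p != NULL);
    while (!(c = pool_try_alloc(p, id))) {
        if (!pool_evict_oldest(p))
            return NULL;        /* nothing left to evict */
    }
    return c;
}
```

With a zero-initialized struct pool, six calls to pool_alloc() leave
four chunks in the pool and two evicted, instead of failing on the
fifth call.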

This patch set consists of 12 patches in 3 parts.

- 1st part (01/12 - 04/12)
Clean-up part.
These patches don't change any functionality of Ccache, but
they are needed to apply the following patches.

- 2nd part (05/12 - 11/12)
Core functionality part of this patch series.
The features described above are implemented here.
(The last one (11/12) is in fact just clean-up afterwards.)

- 3rd part (12/12)
Only touches the proc entry (/proc/ccache_stats), for convenience.


These patches apply to a source tree of
linux-2.6.21
+ patch-ccache-alpha-008-2.6.21
+ the debug patch I posted on May 9th
  http://lists.laptop.org/pipermail/linux-mm-cc/2007-May/000094.html

The patch below helps you avoid a make error.
(But it is not mandatory.)
http://lists.laptop.org/pipermail/linux-mm-cc/2007-May/000093.html


Memorandum in my mind
(Same as in the V1 post)
---------------------

Below is the remaining work related to these patches that I have in
mind.

(1)
The simple-free behavior should also be applied to the process-exit
code path, just as in the memory reclaim path.

(2)
The current chunk_head locking mechanism is based on a spinlock.
The page locking mechanism in the mainline kernel is based on a
wait-on-bit lock.
Should we change the mechanism to match the mainline kernel?
We should probably evaluate both and decide which is preferable,
because the choice may affect performance, IMHO.
(In fact, I already have a trial implementation.)

(3)
These patches avoid the OOM-Killer caused by Ccache being full of
fs_backed chunks, at least on my system, which is UP x86.
More tests are needed on SMP and on architectures other than x86.

(4)
If you "cat /proc/ccache_stats" while reading many files, you will
probably see that a great many pages are freed when Ccache is shrunk,
even though only 1 or 2 pages are needed at the time.
This is because of chunk fragmentation: the compressed data of a
page is stored in many separate chunks which are located in many
different pages.
For better efficiency and performance, avoiding this kind of
fragmentation is the next important development target, IMHO.
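
To make the fragmentation effect in (4) concrete, here is a toy model
in C.  It is only a sketch under my own assumptions, not how Ccache
stores or counts anything: layout_t, NPAGES, NSLOTS, drop_owner() and
drops_needed() are hypothetical names.  Each physical page has a few
chunk slots, each slot records which compressed page owns it, and
compressed pages are dropped oldest first until enough physical pages
are completely empty.

```c
/* Toy model only; every name here is hypothetical. */
#include <assert.h>
#include <stddef.h>

#define NPAGES   4      /* physical pages backing the cache */
#define NSLOTS   4      /* chunk slots per physical page */
#define NO_OWNER (-1)   /* marks a free slot */

/* l[p][s] is the id of the compressed page whose chunk occupies
 * slot s of physical page p, or NO_OWNER when the slot is free. */
typedef int layout_t[NPAGES][NSLOTS];

/* Count physical pages that hold no chunk at all. */
int count_empty_pages(layout_t l)
{
    int empty = 0;

    for (int p = 0; p < NPAGES; p++) {
        int busy = 0;

        for (int s = 0; s < NSLOTS; s++)
            if (l[p][s] != NO_OWNER)
                busy = 1;
        if (!busy)
            empty++;
    }
    return empty;
}

/* Free every chunk that belongs to compressed page `id`. */
void drop_owner(layout_t l, int id)
{
    for (int p = 0; p < NPAGES; p++)
        for (int s = 0; s < NSLOTS; s++)
            if (l[p][s] == id)
                l[p][s] = NO_OWNER;
}

/* Drop compressed pages in age order (id 0 is the oldest) until
 * `want` physical pages are empty; return how many were dropped. */
int drops_needed(layout_t l, int nowners, int want)
{
    int dropped = 0;

    for (int id = 0; id < nowners && count_empty_pages(l) < want; id++) {
        drop_owner(l, id);
        dropped++;
    }
    return dropped;
}
```

In this model, if each compressed page has one chunk in every physical
page, all 4 compressed pages must be dropped before a single physical
page becomes empty.  If each compressed page sits entirely inside one
physical page, dropping 1 is enough.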


Comments, suggestions, bug reports, evaluation reports, and anything
else are welcome.

Happy hacking!


Best regards,
IKEDA, Munehiro

