[Server-devel] mnesia corruption with concurrent ejabberdctl usage

Martin Langhoff martin.langhoff at gmail.com
Mon Dec 28 06:26:08 EST 2009


Hi ejabberd list,

here at OLPC, we are syncing a user/group database held externally
into the mnesia database. The external DB is the 'master', so the sync
script reads the mnesia DB (by calling ejabberdctl "srg" commands) and
updates it to match.

On a particularly loaded/slow host, we ended up seeing really bad
mnesia corruption, leading to huge beam processes (~400MB, while
serving ~50 idle users). All of this with only a tiny number of
registered users.

This was due to the sync script running slowly (because of the load),
so that the next run overlapped with the previous one. I added a big
lock around the script to prevent overlapping runs, and things went
back to normal.
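
For illustration only, a minimal sketch of that kind of lock, assuming
the sync job is driven from Python on a POSIX host. The lock path and
the `job` callable are hypothetical names, not what our script
actually uses. A run that finds the lock already held simply skips
instead of overlapping:

```python
import fcntl
import os


def run_exclusive(lock_path, job):
    """Run job() only if no other run currently holds the lock file.

    Returns True if job() ran, False if another run held the lock.
    lock_path and job are hypothetical placeholders for this sketch.
    """
    # Each call opens its own descriptor; flock() locks conflict
    # between separately opened descriptors of the same file.
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        try:
            # Non-blocking: fail immediately instead of queueing up
            # behind a slow run.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False
        job()
        return True
    finally:
        # Closing the descriptor also releases the lock, if held.
        os.close(fd)
```

Skipping rather than waiting is a design choice in this sketch: on a
slow host it keeps runs from piling up, at the cost of a late sync.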

Testing experimentally with a couple of xterms open, calling
srg-list-groups or srg-get-info while groups are being manipulated
with srg-user-add/srg-user-del spews a ton of errors.
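
The xterm experiment could be scripted roughly as follows. This is
only a sketch: it assumes failures show up as non-zero exit codes,
which may not hold for ejabberdctl (the errors might only appear in
the output or the logs), and the command lists are placeholders to be
filled in with the real invocations.

```python
import subprocess
import threading


def run_repeatedly(cmd, times, failures):
    """Run cmd `times` times; record (cmd, returncode, stderr) on failure."""
    for _ in range(times):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        if proc.returncode != 0:
            failures.append((cmd, proc.returncode, proc.stderr))


def run_concurrently(commands, times=20):
    """Run each command in its own thread, all at once.

    commands is a list of argument lists. Returns the list of
    recorded failures (empty if every invocation exited with 0).
    """
    failures = []
    threads = [
        threading.Thread(target=run_repeatedly, args=(cmd, times, failures))
        for cmd in commands
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return failures
```

One would pass one argument list for a reader (an srg-list-groups or
srg-get-info call) and one for a writer (an srg-user-add or
srg-user-del call), then inspect whatever comes back.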

Is this normal? Expected? My reading of the Programming Erlang book
led me to believe mnesia ops would be sanely concurrent...

This is on:

 - Fedora 9
 - ejabberd 2.0.3 (with a couple of minor patches, none in the
   mnesia manipulation codepaths)
 - erlang-R12B-5.6.fc9.i386

cheers,


m
-- 
 martin.langhoff at gmail.com
 martin at laptop.org -- School Server Architect
 - ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 - http://wiki.laptop.org/go/User:Martinlanghoff

