[Server-devel] mnesia corruption with concurrent ejabberdctl usage
Martin Langhoff
martin.langhoff at gmail.com
Mon Dec 28 06:26:08 EST 2009
Hi ejabberd list,
Here at OLPC, we are sync'ing a user/group database held externally
into the mnesia database. The external DB is the 'master', so the sync
script reads the mnesia DB and updates it to match (calling the
ejabberdctl "srg" commands).
On a particularly loaded, slow host we ended up seeing really bad
mnesia corruption, leading to huge beam processes (~400MB while
serving ~50 idle users). All of this with a tiny userbase of
registered users.
The cause was the sync script running slowly under that load, so the
next run overlapped with the previous one. We added a big lock around
the script to prevent overlapping runs, and things went back to normal.
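The post doesn't say how the lock was implemented. Here is one possible version as a minimal sketch. It assumes a Python sync script on Linux and an advisory flock(2) lock on a local file. The lock path, acquire_lock(), and the run_sync() placeholder are all hypothetical names.

```python
import fcntl
import os
import sys
import tempfile

# Hypothetical lock file location; any path on a local filesystem works.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "srg-sync.lock")


def acquire_lock(path=LOCK_PATH):
    """Return an open file holding an exclusive lock, or None if another run holds it."""
    fh = open(path, "w")
    try:
        # LOCK_NB fails immediately instead of queueing behind a running sync.
        fcntl.flock(fh.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        fh.close()
        return None
    return fh


def run_sync():
    # Hypothetical placeholder for the real work (the ejabberdctl srg-* calls).
    pass


def main():
    lock = acquire_lock()
    if lock is None:
        print("previous sync still running; skipping this run", file=sys.stderr)
        return 1
    try:
        run_sync()
    finally:
        # Closing the file releases the flock.
        lock.close()
    return 0
```

When run from cron, the entry point would be `sys.exit(main())`. An overlapping run then exits with status 1 without touching mnesia. The sketch uses a non-blocking lock so that a slow run makes the next one skip rather than queue up behind it. That is a design choice here, not necessarily what the original lock did.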
Testing by hand with a couple of xterms open: calling
srg-list-groups or srg-get-info while groups are being modified
with srg-user-add/srg-user-del spews a ton of errors.
Is this normal? Expected? My reading of the Programming Erlang book
led me to believe mnesia ops would be sanely concurrent...
This is on:
- Fedora 9
- ejabberd - 2.0.3 (with a couple of minor patches, none in the
mnesia manipulation codepaths)
- erlang-R12B-5.6.fc9.i386
cheers,
m
--
martin.langhoff at gmail.com
martin at laptop.org -- School Server Architect
- ask interesting questions
- don't get distracted with shiny stuff - working code first
- http://wiki.laptop.org/go/User:Martinlanghoff