[Server-devel] ejabberd's mnesia breaking - (Re: Almost-released: XS-0.5.2)

Martin Langhoff martin.langhoff at gmail.com
Tue Mar 17 23:44:56 EDT 2009


On Wed, Mar 18, 2009 at 3:55 PM, Martin Langhoff
<martin.langhoff at gmail.com> wrote:
> I am mostly happy, but there is a nasty issue with upgrades of
> ejabberd. Investigating. Had not been an issue before, and may be
> related to a big erlang update that slipped in from fedora updates.

Some notes on what seems to be happening:

 - On an XS-0.5 that has had ejabberd configured, upgrading to 0.5.2
via anaconda triggers the problem. (I strongly suspect yum-driven
updates don't see it.)

 - Upgrade log (attached) shows that we upgrade ejabberd-xs, then
erlang, then xs-config

 - When xs-config is upgraded, it re-runs the config "preprocessor" to
expand @@SERVERNAME@@ on any new/updated config files. During that
stage, it touches (but doesn't actually change) the ejabberd-xs.cfg
file and calls 'service ejabberd condrestart' -- and right there we
see an erlang kernel crash.

 - *** ejabberd should not be running ***

 - the db in /var/lib/ejabberd/spool looks fairly b0rken

 - Downgrading erlang, in case the mnesia format had somehow changed,
does not help: the database is still unreadable. (The ejabberd 2.0.x
series all have the same db format; I've been upgrading and
downgrading all the time on my dev boxen without a single problem.)

 - downgrade erlang, init a new DB, feed it some data, upgrade erlang:
no DB corruption
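
For reference, the preprocessor step described above (expand
@@SERVERNAME@@, then condrestart) can be sketched roughly as below.
This is only an illustration, not the actual xs-config code, which is
not shown in this mail. The function names are hypothetical, and
skipping the rewrite when the content is identical is my assumption
about how one could avoid touching an unchanged file.

```python
import os

PLACEHOLDER = "@@SERVERNAME@@"


def expand_config(template_text, servername):
    """Replace every occurrence of the placeholder with the server name."""
    return template_text.replace(PLACEHOLDER, servername)


def install_config(template_path, target_path, servername):
    """Expand the template into target_path (hypothetical helper).

    Returns True only if the target's content actually changed, so a
    caller could skip the service restart when nothing changed.
    """
    with open(template_path) as f:
        new_text = expand_config(f.read(), servername)

    old_text = None
    if os.path.exists(target_path):
        with open(target_path) as f:
            old_text = f.read()

    if new_text == old_text:
        # Same content: leave the file (and its mtime) alone.
        return False

    with open(target_path, "w") as f:
        f.write(new_text)
    return True
```

With something like this, a caller would only issue the condrestart
when install_config() returns True.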

The main question is: why is ejabberd running under anaconda? There is
a very good chance that *that* is the source of the corruption.

 - All the %post scripts for the ejabberd-xs pkg and for xs-config
issue 'condrestart', which checks /var/lock/subsys/$svcname to see if
the service is running, and only restarts it if it is. Using
condrestart is the recommended practice in rpm packaging...

 - I checked carefully that we don't erroneously say 'start', and that
condrestart is well implemented in the ejabberd init script...

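 - During init, /var/lock/subsys is cleared, so even after a hard
poweroff the init scripts should not be confused about the state of
things. I am right now trying to see if anaconda does the same. If it
does not, a stale lock file could make condrestart believe ejabberd
is running and start it inside the installer. This is the only
working theory I have...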
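
To make the condrestart theory concrete, here is a minimal model of
the lock-file check. It is a Python sketch, not the real init script
(which is shell); the function name, the `restart` callable and the
`lock_dir` parameter are hypothetical stand-ins for illustration.

```python
import os


def condrestart(svcname, restart, lock_dir="/var/lock/subsys"):
    """Restart svcname only if its lock file exists.

    `restart` is a callable standing in for the init script's restart
    action. Returns True if a restart was attempted, False otherwise.
    """
    lock_file = os.path.join(lock_dir, svcname)
    if os.path.exists(lock_file):
        # The lock file is the only evidence consulted; nothing here
        # verifies that a process is actually alive.
        restart()
        return True
    return False
```

The point is that the check only looks at the lock file. If a stale
/var/lock/subsys/ejabberd survives into the upgrade environment,
condrestart behaves exactly as if the service were running, and the
restart brings ejabberd up.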

Any other suggestions? I don't think this is something I can pin on
ejabberd or erlang. Looks more like a gotcha with anaconda.

cheers,




m
-- 
 martin.langhoff at gmail.com
 martin at laptop.org -- School Server Architect
 - ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 - http://wiki.laptop.org/go/User:Martinlanghoff
-------------- next part --------------
A non-text attachment was scrubbed...
Name: upgrade.log
Type: application/octet-stream
Size: 4871 bytes
Desc: not available
Url : http://lists.laptop.org/pipermail/server-devel/attachments/20090318/090d1c81/attachment-0001.obj 

