[Server-devel] Ejabberd CPU/RAM Spike -> Crashes
Martin Langhoff
martin.langhoff at gmail.com
Thu Dec 17 19:59:52 EST 2009
On Thu, Dec 17, 2009 at 9:32 PM, Devon Connolly <devcon at gmail.com> wrote:
> The server had an uptime of about 50 days before this occurred. There were
> no problems and nothing has changed in the 2 or so days since this problem
> began. Like I had said previously, it seems to have occurred since reflashing
> and re-registering a student's XO, but I believe that to be a coincidence.
Hmmm, maybe something's gone wonky on the mnesia DB.
> We are using 5 wireless AP's. 4 of which are Linksys WRT54G's running
> DD-WRT and one is a D-Link modem/AP combo. DHCP is deactivated on all of
> the above.
Good.
>> - Did you also leave XOs running connected to it, or were XOs
>> completely disconnected?
>
> I believe all XO's were disconnected. It is possible some were left
> connected while in their charging cabinets, but doubtful.
Ok. Then ejabberd is getting messed up all on its own...
> Nothing non-standard really. eth0 is fixed.
Good.
> Although, this server came
> pre-installed from the folks involved with the Give One Get One program in
> Rwanda. I'm not sure what was modified from the stock server install. I am
> debating reinstalling the server from scratch.
Don't reinstall. If possible, let's try to debug this. If you're going
to give up, just
1 - Back up /var/lib/ejabberd -- just tar it up
2 - Use the 'domain_config' script to change the domain -- this will
re-generate the ejabberd mnesia database. What I'd do: change it to
'foo.com' and then back to the right domain.
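For the backup in step 1, something along these lines should do. It is only a sketch: backup_dir is a made-up helper name, /root as the destination is an assumption, and it is probably safer to stop ejabberd before archiving so the mnesia files are not changing underneath tar.

```shell
#!/bin/sh
# Hypothetical helper (not part of the school server tools):
# archive a directory into a timestamped tarball and print its path.
backup_dir() {
    src=$1        # directory to back up, e.g. /var/lib/ejabberd
    destdir=$2    # where to write the archive (assumed to exist)
    stamp=$(date -u +%Y%m%d-%H%M%S)
    archive="$destdir/$(basename "$src")-$stamp.tar.gz"
    # -C makes paths inside the archive relative to the parent directory,
    # so the tarball unpacks as ejabberd/... rather than an absolute path.
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    echo "$archive"
}

# Usage, as root (commented out so sourcing this file has no side effects):
# backup_dir /var/lib/ejabberd /root
```

Keeping the timestamp in the file name means a second attempt will not overwrite the first backup.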
> I attribute this behavior to the Linksys AP's as they only seem to
> handle about 20 connections per AP reliably.
Yeah, we've seen that plenty.
> There is also a good amount of
> wireless interference to contend with; however, the server was working
> well.
I assume you have the different APs on different channels, and
generally avoid channel 1 (as that's where XOs engage in 'mesh' by
default)...
>>while true; do (echo `date -u`; vmstat; ps_mem.py | grep ejabberd;
>>ejabberdctl connected-users | wc -l) >> mylog ; sleep 60 ; done;
>
> Tried the script at night with the high load, and it cannot complete as the
> ejabberd node has since crashed. ejabberdctl yields the following error:
Can you restart ejabberd and try that script?
> # ps_mem.py | grep ejabberd
>
> No output
Did you download ps_mem.py, and make it executable? (google the name
if needed) If so, you might want to grep for erl instead.
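A slightly more forgiving variant of the logging loop, as a rough sketch. log_sample is a made-up name, and the extra 'beam' pattern is my assumption about how the Erlang VM running ejabberd may show up in the process list; 'erl' is a broad match and can catch unrelated programs. Each command is allowed to fail, so a crashed node still leaves a timestamped entry in the log.

```shell
#!/bin/sh
# Hypothetical helper: append one timestamped sample to a log file.
log_sample() {
    logfile=$1
    {
        date -u
        # Fall back to a marker line if a tool is missing or fails.
        vmstat 2>/dev/null || echo "vmstat: unavailable"
        # ejabberd runs inside the Erlang VM, so match any of these names.
        ps_mem.py 2>/dev/null | grep -E 'ejabberd|erl|beam' \
            || echo "ps_mem: no matching process"
        # Counts output lines; prints 0 when ejabberdctl produces no output.
        ejabberdctl connected-users 2>/dev/null | wc -l
    } >> "$logfile"
}

# Usage (commented out so sourcing this file does not start the loop):
# while true; do log_sample mylog; sleep 60; done
```

A run of "ps_mem: no matching process" lines followed by a zero count would at least show when the node went away.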
> I've included a screenshot of htop for your viewing pleasure.
> http://omploader.org/vMzBvZQ/htop_screen.jpg
ejabberd sure looks busy there...
m
--
martin.langhoff at gmail.com
martin at laptop.org -- School Server Architect
- ask interesting questions
- don't get distracted with shiny stuff - working code first
- http://wiki.laptop.org/go/User:Martinlanghoff