The server had been up for about 50 days before this happened. There were no problems before then, and nothing has changed in the two or so days since the problem began. As I said previously, it seems to have started after I reflashed and re-registered a student's XO, but I believe that is a coincidence. <br>
<br>> - Are you perhaps using an AP that does its own DHCP? One way to<br>
> check for certain is to connect an XO, and then grep /var/lib/dhcpd/<br>
> (or is it /var/spool/dhcpd/ ?) for the MAC address of the XO....<br><br>We are using five wireless APs: four Linksys WRT54Gs running DD-WRT and one D-Link modem/AP combo. DHCP is disabled on all of them.<br>
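The lease-file check suggested above can be scripted. The function below is a hypothetical helper (`find_mac_in_leases` is not an XS tool), written as a sketch under the assumption that the XS runs ISC dhcpd and keeps its lease files under one of the two directories named in the question. Connect an XO, then search for its MAC: if the address never appears in any lease file, something other than the XS is probably answering DHCP requests.

```shell
#!/bin/bash
# Hypothetical helper: list lease files that mention a given MAC address.
# With no directories given, it checks the two locations named in the thread.
find_mac_in_leases() {
    local mac=$1
    shift
    local dirs=("$@")
    if [ "${#dirs[@]}" -eq 0 ]; then
        dirs=(/var/lib/dhcpd /var/spool/dhcpd)
    fi

    local found=1 d
    for d in "${dirs[@]}"; do
        [ -d "$d" ] || continue   # skip locations that do not exist here
        # -r: search recursively, -i: ignore case (MACs may be upper or
        # lower case), -l: print only the names of matching files
        if grep -ril -- "$mac" "$d"; then
            found=0
        fi
    done
    return "$found"   # 0 if any file matched, 1 otherwise
}
```

Usage, with a placeholder MAC: `find_mac_in_leases aa:bb:cc:dd:ee:ff || echo "MAC not found in XS leases"`. Reading the lease files usually requires root.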
<br><br>
> - Did you also leave XOs running connected to it, or were XOs<br>
> completely disconnected?<br>
<br>
I believe all XOs were disconnected. Some may have been left connected in their charging cabinets, but I doubt it.<br><br>>Is there anything else that could be odd or non-standard in your<br>
>setup? Are you in a VM? Is eth0 on the XS configured via dhcp with a<br>
>short lease? Is there anything in the network between the XOs and the<br>
>XS?<br><br>Nothing non-standard, really, and eth0 has a fixed (static) configuration. However, this server came pre-installed from the people involved with the Give One Get One program in Rwanda, so I'm not sure what was modified from the stock server install. I am considering reinstalling the server from scratch.<br>
<br>I haven't been paying as much attention to the server lately as I should have. Since it had been running for about 50 days, I only checked in with the school periodically. There were problems, but mainly with the presence service and with reliably connecting 30 to 100 laptops to the network at once. I attribute this to the Linksys APs, which only seem to handle about 20 connections each reliably. There is also a good deal of wireless interference to contend with; nevertheless, the server itself was working well. Because it is somewhat under-powered, its load averages generally stay in the 1.2 to 1.5 range.<br>
<br>As I write this, the server has been up for about 9 hours. All three load averages have reached 25. The dump files have consumed over a gigabyte, filling up the root partition. <br><br>>while true; do (echo `date -u `; vmstat; ps_mem.py | grep ejabberd;<br>
>ejabberdctl connected-users | wc -l) >> mylog ; sleep 60 ; done;<br><br>I tried the script at night under the high load, but it cannot complete because the ejabberd node has since crashed. ejabberdctl returns the following error:<br>
<br><pre>RPC failed on the node ejabberd@schoolserver: {'EXIT',
    {badarg,
     [{ets,lookup,[hooks,{ejabberd_ctl_process,global}]},
      {ejabberd_hooks,run_fold,4},
      {ejabberd_ctl,process,1},
      {rpc,'-handle_call/3-fun-0-',5}]}}</pre><br>Issuing the commands individually:<br><pre># vmstat
Thu Dec 17 20:07:19 UTC 2009
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
25  0 705768  63912 123132 239040   53   92   153   711 1089  539 61 38  0  1  0</pre><br><pre># ps_mem.py | grep ejabberd
No output</pre><br>
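One possible hardening of the per-minute logging loop from this thread is sketched below. It is an untested illustration, not a drop-in fix: `sample_once` and `monitor_loop` are hypothetical names, and it assumes bash, that ps_mem.py and ejabberdctl are on PATH (ps_mem.py normally needs root), and that ejabberdctl exits non-zero when its RPC to the node fails. Each command may fail on its own, so a crashed node shows up in the log as a marker line. The grep also matches `beam`, because ejabberd runs inside the Erlang VM and ps_mem.py may list it under that name, which could explain an empty `grep ejabberd` result.

```shell
#!/bin/bash
# Sketch only: a failure-tolerant variant of the per-minute logging loop.
# Assumes ps_mem.py and ejabberdctl are on PATH and that ejabberdctl exits
# non-zero when its RPC to the ejabberd node fails.

# Print one timestamped sample to stdout. Each step may fail independently,
# so a crashed node is recorded as a marker line instead of a gap.
sample_once() {
    echo "=== $(date -u)"
    vmstat || echo "vmstat: failed"

    # ejabberd runs inside the Erlang VM, so ps_mem.py may list it as
    # "beam" or "beam.smp"; match both names.
    ps_mem.py 2>/dev/null | grep -E 'ejabberd|beam' || echo "ps_mem: no ejabberd/beam lines"

    local users
    if users=$(ejabberdctl connected-users 2>/dev/null); then
        # One session per line; count the non-empty lines.
        echo "connected-users: $(printf '%s\n' "$users" | grep -c .)"
    else
        echo "connected-users: unavailable (ejabberdctl failed)"
    fi
}

# Hypothetical driver: append a sample to LOGFILE every INTERVAL seconds
# until interrupted (Ctrl-C). Defaults mirror the original one-liner.
monitor_loop() {
    local logfile=${1:-mylog}
    local interval=${2:-60}
    while true; do
        sample_once >> "$logfile" 2>&1
        sleep "$interval"
    done
}
```

Source the file and run `monitor_loop mylog 60` on a console. If ejabberdctl hangs instead of failing, wrapping that call in a timeout would be the next tweak.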
I've included a screenshot of htop for your viewing pleasure. <br><br><a href="http://omploader.org/vMzBvZQ/htop_screen.jpg">http://omploader.org/vMzBvZQ/htop_screen.jpg</a><br><br>I'll give you more relevant info tomorrow.<br>
<br><br><div class="gmail_quote">On Thu, Dec 17, 2009 at 12:16 PM, Martin Langhoff <span dir="ltr"><<a href="mailto:martin.langhoff@gmail.com">martin.langhoff@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
On Thu, Dec 17, 2009 at 1:12 PM, Martin Langhoff<br>
<<a href="mailto:martin.langhoff@gmail.com">martin.langhoff@gmail.com</a>> wrote<br>
<div class="im">> On Thu, Dec 17, 2009 at 11:35 AM, Devon Connolly <<a href="mailto:devcon@gmail.com">devcon@gmail.com</a>> wrote:<br>
>> XS Version: 0.6<br>
>> 1 GB Physical Ram, 2GB Swap<br>
><br>
> Ok - the RAM is on the low side for an XS but should handle 150 ok.<br>
><br>
>> # ejabberdctl connected-users<br>
> ...<br>
> I counted 12 lines in the output of connected-users. That should not<br>
> cause trouble.<br>
<br>
</div>Also, can you get your hands on ps_mem.py and run it when the<br>
machine is getting into trouble? I want to correlate the output of<br>
ps_mem.py for ejabberd with the number of connected users. Run something<br>
like this on a console:<br>
<br>
while true; do (echo `date -u `; vmstat; ps_mem.py | grep ejabberd;<br>
ejabberdctl connected-users | wc -l) >> mylog ; sleep 60 ; done;<br>
<br>
It is untested and may need tweaking to work properly. Running it both during the<br>
day and at night will be most interesting.<br>
<div><div></div><div class="h5"><br>
cheers,<br>
<br>
<br>
m<br>
--<br>
<a href="mailto:martin.langhoff@gmail.com">martin.langhoff@gmail.com</a><br>
<a href="mailto:martin@laptop.org">martin@laptop.org</a> -- School Server Architect<br>
- ask interesting questions<br>
- don't get distracted with shiny stuff - working code first<br>
- <a href="http://wiki.laptop.org/go/User:Martinlanghoff" target="_blank">http://wiki.laptop.org/go/User:Martinlanghoff</a><br>
</div></div></blockquote></div><br>