The server had an uptime of about 50 days before this occurred.  There were no problems before then, and nothing has changed in the 2 or so days since the problem began.  As I said previously, it seems to have started after reflashing and re-registering a student&#39;s XO, but I believe that to be a coincidence.<br>

<br>&gt; - Are you perhaps using an AP that does its own DHCP? One way to<br>


&gt; check for certain is to connect an XO, and then grep /var/lib/dhcpd/<br>


&gt; (or is it /var/spool/dhcpd/ ?) for the MAC address of the XO....<br><br>We are using 5 wireless APs: 4 are Linksys WRT54Gs running DD-WRT and one is a D-Link modem/AP combo.  DHCP is deactivated on all of them.<br>

<br><br>

&gt; - Did you also leave XOs running connected to it, or were XOs<br>

&gt; completely disconnected?<br>

<br>

I believe all XOs were disconnected.  It is possible, but doubtful, that some were left connected while in their charging cabinets.<br><br>&gt;Is there anything else that could be odd or non-standard in your<br>

&gt;setup? Are you in a VM? Is eth0 on the XS configured via dhcp with a<br>

&gt;short lease? Is there anything in the network between the XOs and the<br>

&gt;XS?<br><br>Nothing non-standard really.  eth0 is statically configured.  However, this server came pre-installed from the folks involved with the Give One Get One program in Rwanda, so I&#39;m not sure what was modified from the stock server install.  I am debating reinstalling the server from scratch.<br>

<br>I haven&#39;t been paying as much attention to the server lately as I should.  As it had been running for about 50 days, I only checked in with the school periodically.  There were problems, but mainly with the presence service and with reliably connecting 30 - 100 laptops to the network at one time.  I attribute this behavior to the Linksys APs, as each only seems to handle about 20 connections reliably.  There is also a good amount of wireless interference to contend with; however, the server was working well.  As it is a bit under-powered, load averages generally stayed within the 1.2-1.5 range.<br>

<br>As I write this, the server has an uptime of about 9 hours.  Load averages have reached 25 across the board.  The dump files have consumed over a gigabyte of space, filling up the root partition.  <br><br>&gt;while true; do (echo `date -u `; vmstat; ps_mem.py | grep ejabberd;<br>


&gt;ejabberdctl connected-users | wc -l) &gt;&gt; mylog ; sleep 60 ; done;<br><br>I tried the script at night under the high load, but it cannot complete because the ejabberd node has since crashed.  ejabberdctl yields the following error:<br>

<br>_________________________________________________________________________<br>RPC failed on the node ejabberd@schoolserver: {&#39;EXIT&#39;,<br>                                               {badarg,<br>                                                [{ets,lookup,<br>

                                                  [hooks,<br>                                                   {ejabberd_ctl_process,<br>                                                    global}]},<br>                                                 {ejabberd_hooks,run_fold,4},<br>

                                                 {ejabberd_ctl,process,1},<br>                                                 {rpc,<br>                                                  &#39;-handle_call/3-fun-0-&#39;,<br>

                                                  5}]}}<br>__________________________________________________________________<br><br>Individually issuing the commands:<br># vmstat<br>Thu Dec 17 20:07:19 UTC 2009<br>procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------<br>

 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st<br>25  0 705768  63912 123132 239040   53   92   153   711 1089  539 61 38  0  1  0<br><br># ps_mem.py | grep ejabberd<br><br>No output<br><br>
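Since ejabberdctl fails once the node is down, a variant of the logging loop that records the failure instead of a user count might look like the sketch below.  It is untested on this server.  The log file name mylog comes from the original loop; the function name log_sample and the assumption that ejabberdctl exits non-zero when the RPC fails are mine.<br>

```shell
#!/bin/sh
# Sketch only: same idea as the quoted loop, but each probe is allowed to fail.
LOG="${LOG:-mylog}"   # log file name taken from the original loop

# log_sample is a hypothetical helper name, not part of the XS tools.
log_sample() {
    {
        date -u
        # vmstat and ps_mem.py are assumed to be on PATH; note it when they are not
        vmstat 2>&1 || echo "vmstat: failed"
        ps_mem.py 2>/dev/null | grep ejabberd || echo "ps_mem: no ejabberd line"
        # Assumption: ejabberdctl exits non-zero when the node is down,
        # so log that fact instead of a misleading line count.
        if users=$(ejabberdctl connected-users 2>/dev/null); then
            printf 'connected-users: %s\n' "$(printf '%s\n' "$users" | grep -c .)"
        else
            echo "connected-users: ejabberdctl failed (node down?)"
        fi
    } >> "$LOG"
}

# To sample once a minute:
#   while true; do log_sample; sleep 60; done
```

Each sample then always ends with a connected-users line, so gaps in the log show when ejabberd was unreachable rather than leaving the loop output empty.<br>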

I&#39;ve included a screenshot of htop for your viewing pleasure.  <br><br><a href="http://omploader.org/vMzBvZQ/htop_screen.jpg">http://omploader.org/vMzBvZQ/htop_screen.jpg</a><br><br>I&#39;ll give you more relevant info tomorrow.<br>

<br><br><div class="gmail_quote">On Thu, Dec 17, 2009 at 12:16 PM, Martin Langhoff <span dir="ltr">&lt;<a href="mailto:martin.langhoff@gmail.com">martin.langhoff@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

On Thu, Dec 17, 2009 at 1:12 PM, Martin Langhoff<br>

&lt;<a href="mailto:martin.langhoff@gmail.com">martin.langhoff@gmail.com</a>&gt; wrote<br>

<div class="im">&gt; On Thu, Dec 17, 2009 at 11:35 AM, Devon Connolly &lt;<a href="mailto:devcon@gmail.com">devcon@gmail.com</a>&gt; wrote:<br>

&gt;&gt; XS Version: 0.6<br>

&gt;&gt; 1 GB Physical Ram, 2GB Swap<br>

&gt;<br>

&gt; Ok - the RAM is on the low side for an XS but should handle 150 ok.<br>

&gt;<br>

&gt;&gt; # ejabberdctl connected-users<br>

&gt; ...<br>

&gt; I counted 12 lines in the output of connected-users. That should not<br>

&gt; cause trouble.<br>

<br>

</div>Also - can you get your hands on ps_mem.py, and run it when the<br>

machine is getting into trouble? I want to correlate the output of<br>

ps_mem.py for ejabberd vs the number of connected users, run something<br>

like this on a console<br>

<br>

while true; do (echo `date -u `; vmstat; ps_mem.py | grep ejabberd;<br>

ejabberdctl connected-users | wc -l) &gt;&gt; mylog ; sleep 60 ; done;<br>

<br>

untested, may need tweaking to work properly. If you run it during the<br>

day and also during the night, will be most interesting.<br>

<div><div></div><div class="h5"><br>

cheers,<br>

<br>

<br>

m<br>

--<br>

 <a href="mailto:martin.langhoff@gmail.com">martin.langhoff@gmail.com</a><br>

 <a href="mailto:martin@laptop.org">martin@laptop.org</a> -- School Server Architect<br>

 - ask interesting questions<br>

 - don&#39;t get distracted with shiny stuff  - working code first<br>

 - <a href="http://wiki.laptop.org/go/User:Martinlanghoff" target="_blank">http://wiki.laptop.org/go/User:Martinlanghoff</a><br>

</div></div></blockquote></div><br>