
The virtual machine running the LUV server was killed by the kernel OOM killer 4 hours ago. I didn't notice immediately because the VM running my Jabber server (which notifies me of system problems) was also killed.

When we had the last problem I converted the virtual machines from Xen to KVM. With KVM the VMs are regular Linux processes and they share the same pool of memory as every other process on the host. So if another process allocates too much RAM it may cause KVM memory allocation to fail. Also, if the entire system runs out of RAM the kernel may stupidly decide to kill a KVM instance instead of something else.

I think that part of the problem was that BOINC was configured to use up to 90% of system RAM. That was an OK setting for a Xen server where the Dom0 had nothing of note running other than BOINC and the virtual machines had RAM reserved. With KVM it isn't a suitable setting.

I've configured BOINC to use only 40% of RAM and increased the swap size. This shouldn't happen again. Also I'm going to move the Jabber server to the Dom0 so that if the DomUs die I can still get alerts.

-- 
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/