I am running VMWare Server 2.0.2 (Build 203138) on a dual core Intel i5 with Ubuntu Server 10.04 LTS system (kernel
2.6.32-22-server #33-Ubuntu SMP
). Disk Subsystem is a software RAID5 array.The system has been set up for a little over a week. For the past 5 days I have been running at leat 3 VMs (Linux and a variety of Windows OSes) with no issues whatsoever. But while I was installing Linux onto a new VM, suddenly all VMs became unresponsive, including the one I was installing to. I could not log in to the VMWare Management Interface, and the system was somewhat unresponsive via SSH. When I looked at
top
, I saw:top - 16:14:51 up 6 days, 1:49, 8 users, load average: 24.29, 24.33 17.54 Tasks: 203 total, 7 running, 195 sleeping, 0 stopped, 1 zombie Cpu(s): 0.2%us, 25.6%sy, 0.0%ni, 74.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 8056656k total, 5927580k used, 2129076k free, 20320k buffers Swap: 7811064k total, 240216k used, 7570848k free, 5045884k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 21549 root 39 19 0 0 0 Z 100 0.0 15:02.44 [vmware-vmx] <defunct> 2115 root 20 0 0 0 0 S 1 0.0 170:32.08 [vmware-rtc] 2231 root 21 1 1494m 126m 100m S 1 1.6 892:58.05 /usr/lib/vmware/bin/vmware-vmx -# product=2; 2280 jnet 20 0 19320 1164 800 R 0 0.0 30:04.55 top 12236 root 20 0 833m 41m 34m S 0 0.5 88:34.24 /usr/lib/vmware/bin/vmware-vmx -# product=2; 1 root 20 0 23704 1476 920 S 0 0.0 0:00.80 /sbin/init 2 root 20 0 0 0 0 S 0 0.0 0:00.01 [kthreadd] 3 root RT 0 0 0 0 S 0 0.0 0:00.00 [migration/0] 4 root 20 0 0 0 0 S 0 0.0 0:00.84 [ksoftirqd/0] 5 root RT 0 0 0 0 S 0 0.0 0:00.00 [watchdog/0] 6 root RT 0 0 0 0 S 0 0.0 0:00.00 [migration/1]
The VMWare process for the virtual machine I was installing into became a zombie. Yet, it was still consuming 100% of the CPU time on one of the cores, and I couldn’t reach it or any other virtual machines. (I was logged in to one virtual machine over SSH, another via X11, and a third via VNC. All three connections died). When I ran
ps -ef
and similar commands, I found that the defunctvmware-vmx
process had it’s parent PID set toinit
(1). I also usedlsof -p 21549
and found that the defunct process had no open files. Yet it was using 100% of CPU time…I was unable to kill any
vmware-vmx
processes, including the defunct one, even withkill -9
. As a last resort to resolve the situation I tried to reboot the box, howevershutdown
,halt
,reboot
, andinit 6
all failed to reboot/shutdown, even when given appropriate--force
settings. ControlAltDel produced a message about rebooting on the console, but the system would not reboot. I had to hard power-cycle the box to resolve the situation. (See my other question, Should I worry about the integrity of my linux software RAID5 after a crash or kernel panic?)What would cause a scenario like this? What else could I have done to resolve it besides a hard reboot? What can I do to prevent such a situation in the future?
Answer
Checkout this VMWare forum post and see if it helps:
http://communities.vmware.com/message/531884#531884
Disabling memory sharing is a good idea in general if you have the RAM.
I’ve compiled some optimisations I use for VMWare Server 2 on Ubuntu here:
http://www.stress-free.co.nz/vmware_server_20_optimisations
I have never experienced the problem you have described and I am running production servers with Ubuntu Server 8.04LTS and 10.04LTS (both 32bit and 64bit).
Attribution
Source : Link , Question Author : Josh , Answer Author : David Harrison