ESXTOP is a fantastic tool for the VMware administrator when troubleshooting performance issues in a vSphere environment. ESXTOP has a somewhat steep learning curve, but it is well worth it. In this post I want to help you get a head start with ESXTOP. If you want a really good read, I recommend Duncan’s very comprehensive post on the same subject here.
ESXTOP is available in two ways: through the ESXi Shell, or through the vSphere Management Assistant with the command RESXTOP. In this article I will focus on ESXTOP from the ESXi Shell. It is very simple to get access to ESXTOP.
Step 1: Get access to the ESXi Shell. Open your vSphere Client, go to the host, then Configuration, then Security Profile, and start the ESXi Shell service on the specific ESXi host (start the SSH service as well, since the next step connects over SSH).
Step 2: Download PuTTY (or another SSH client) and create an SSH connection on port 22 to your ESXi host. Log in as root with your password.
Step 3: Type the command esxtop and hit return.
Step 4: You are now looking at ESXTOP. It should look similar to this:
What you are looking at is the CPU screen in ESXTOP, which shows the CPU-specific counters. You can browse through the different screens: type m to see memory metrics, n for network, and so on. Type h to see all available commands. By default ESXTOP shows a lot of “worlds”; a world is similar to a process in Windows Task Manager. To filter out the “vmkernel worlds”, type an upper case V. After that you only see the virtual machines running on this specific ESXi host.
Now that you are inside ESXTOP, let's focus on some good counters to use for performance troubleshooting.
When troubleshooting CPU performance for your virtual machines, the following counters are the most important:
%USED, %RDY, %CSTP
%USED tells you how much time the virtual machine spent executing CPU cycles on the physical CPU.
%RDY is a key performance indicator! Always start with this one. It tells you how much time your virtual machine wanted to execute CPU cycles but could not get access to the physical CPU; in other words, how much time it spent waiting in a “queue”. I normally expect this value to stay below 5% (this equals 1000ms in the vCenter Performance Graphs; read about it here).
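The 5% and 1000ms figures are the same number in two units. Here is a minimal sketch of the conversion, assuming the vCenter real-time chart interval of 20 seconds; the function names are my own and not part of any VMware tool.

```python
# Assumption: vCenter real-time performance charts sample every 20 seconds.
REALTIME_INTERVAL_S = 20


def ready_ms_to_percent(ready_ms, interval_s=REALTIME_INTERVAL_S):
    """Convert a CPU ready value in milliseconds to a percentage of the interval."""
    return ready_ms * 100 / (interval_s * 1000)


def percent_to_ready_ms(percent, interval_s=REALTIME_INTERVAL_S):
    """Convert a ready percentage (as shown by ESXTOP) to milliseconds per interval."""
    return percent * interval_s * 1000 / 100


if __name__ == "__main__":
    print(ready_ms_to_percent(1000))  # 1000ms out of 20000ms
    print(percent_to_ready_ms(5))     # 5% of 20000ms
```

If you read the value from a chart with a different interval, pass that interval in seconds as the second argument.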
%CSTP tells you how much time a vCPU of a virtual machine with multiple vCPUs spent stopped while waiting for the other vCPUs of the same virtual machine to catch up. If this number is higher than 3% you should consider lowering the number of vCPUs in your virtual machine.
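To make the CPU rules of thumb concrete, here is a small sketch that applies the 5% ready and 3% co-stop thresholds from this post to values you have read off the ESXTOP screen. The function name and the messages are hypothetical, and the thresholds are my own guidelines rather than hard limits.

```python
# Rule-of-thumb thresholds taken from this post, not from VMware.
RDY_LIMIT = 5.0   # %RDY should stay below this value
CSTP_LIMIT = 3.0  # %CSTP above this value suggests too many vCPUs


def cpu_findings(rdy, cstp):
    """Return a list of warnings for one virtual machine's %RDY and %CSTP."""
    findings = []
    if rdy >= RDY_LIMIT:
        findings.append("high %RDY: the VM is queuing for physical CPU")
    if cstp > CSTP_LIMIT:
        findings.append("high %CSTP: consider lowering the number of vCPUs")
    return findings


if __name__ == "__main__":
    # Values typed in by hand from the ESXTOP CPU screen.
    for warning in cpu_findings(rdy=7.5, cstp=4.0):
        print(warning)
```

An empty list means neither counter crossed its threshold.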
When troubleshooting memory performance, these are the counters you want to focus on from a virtual machine perspective:
MCTL?, MCTLSZ, SWCUR, SWR/s, SWW/s
MCTL? This column is either Y or N. Y means that the balloon driver is installed. The balloon driver is automatically installed with VMware Tools and should be in every virtual machine. If this column says N, figure out why.
MCTLSZ This column shows you how inflated the balloon is in the virtual machine. If it says 500MB, the balloon driver inside the guest operating system has “stolen” 500MB from Windows/Linux etc. You would expect to see a value of 0 (zero) in this column.
SWCUR tells you how much of the virtual machine's memory is in the .vswp file. If you see a number of 500MB here, it means that 500MB is in the swap file. This does not necessarily equal bad performance. To figure out if your virtual machine is suffering from hypervisor swapping you need to look at the next two counters. In a healthy environment you would want this value to be 0 (zero).
SWR/s This value tells you the read activity on your swap file. If you see a number above 0 here, then your virtual machine is suffering from hypervisor swapping.
SWW/s This value tells you the write activity on your swap file. You want to see the number 0 (zero) here. Every number above 0 is BAD.
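The five memory counters fit together as a small decision list. The sketch below encodes the rules described above for a single virtual machine; the function name, parameter names and messages are hypothetical, and the values are assumed to be copied by hand from the ESXTOP memory screen.

```python
def memory_findings(mctl, mctlsz_mb, swcur_mb, swr_s, sww_s):
    """Interpret one VM's memory counters using the rules from this post.

    mctl is True when the MCTL? column says the balloon driver is installed.
    """
    findings = []
    if not mctl:
        findings.append("balloon driver missing: check VMware Tools")
    if mctlsz_mb > 0:
        findings.append("ballooning: %d MB reclaimed from the guest" % mctlsz_mb)
    if swr_s > 0 or sww_s > 0:
        # Active reads or writes on the swap file are the real problem.
        findings.append("active hypervisor swapping")
    elif swcur_mb > 0:
        # Memory sits in the .vswp file but is not being touched right now.
        findings.append("%d MB in swap file, no current swap activity" % swcur_mb)
    return findings


if __name__ == "__main__":
    for line in memory_findings(True, 500, 200, 0.0, 0.0):
        print(line)
```

A healthy virtual machine returns an empty list: balloon driver present and every other counter at 0.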
If you have made it this far, I suggest you look at the following document that details ALL of the counters in ESXTOP. I call it the ESXTOP Bible 🙂