Hello everyone!  My name is Brett Impens (a.k.a Jeramiah’s favorite VMware Ninja).  Today I would like to share with you how we pulled the data from vCenter to support our thin provisioning analysis.

There are a few ways to get the data you need to determine how much storage you will reclaim by using thin provisioning in vSphere.  Many of these techniques are well documented on the web and can easily be found with a little bit of searching.  They range anywhere from using Power Shell Scripts, to vCenter Plug-ins, to off the shelf software.  Knowing that Jeramiah is in Excel every day, we decided to use Excel and Microsoft Query to pull the data directly from the vCenter database.  You may be more comfortable using another method, since messing around in the vCenter database on a regular basis is not recommended.

As you read this, you may ask yourself this question.  Everything I need to see is in the VPX_GUEST_DISK table in the vCenter database.  Why wouldn’t I just browse the table and be able to view the free space whenever I want?  You can definitely do that and get the data you need.  In our case, since we are a service provider, we have multiple clusters in every data center and need to filter the data based on data center and ESX host.  We don’t need thin provisioning data for every VM in vCenter.  This query builds relationships between the key columns that allow us to filter VMs based on the data center and ESX host.

Alright, let’s get to it.  In Excel, go to the data tab.  Click on From Other Sources and choose From Microsoft Query.  Now you need to create a new data source.  Give the data source a name and choose the database driver specific to your environment.  Click the connect button and put in the server name that runs your vCenter database.  Enter in your credentials and any other database driver specifics needed to connect to the vCenter database.  Now that you have a new data source in Microsoft Query, go ahead and open it to create your query.  I used the Query Wizard to create my query.  If you are a SQL whiz, you can skip right to the Microsoft Query interface by un-checking the Use Query Wizard box at the bottom of the window.

A list of database tables and the columns included in those tables will show up on the left.  Next, you need to choose the columns that you want in the output from your query.  I chose the following tables and columns.  The tables and columns highlighted below are the key columns used to build the relationships that join multiple tables together.  I left the key columns in the output to verify my table joins were setup correctly before returning the data to excel.


If you like you can filter and sort the data on the next screens.  On the last screen before clicking the Finish button, select View data or edit query in Microsoft Query, then click Finish.  We need to make some modifications to key columns and table joins before returning the data back to Excel.

Microsoft Query attempts to build the table join relationships for you.  Unfortunately it does it wrong for our needs.  Using the Microsoft Query interface you can move the table join relationships around.  To remove a table join, click on the line between the tables and columns and hit the delete key.  To add a table join, click on the column name in one table and drag it over to a column in another table.  Once you have the table joins looking like the picture below, your output will have the data center and host associated with every VM in vCenter, along with the total capacity and free space within the OS.

ScreenHunter_01 Mar. 10 00.02

Before returning the data back to Excel you can see a preview of the results in Microsoft Query.  Check any columns that are labeled ID or VM_ID.  The numbers in those columns should match for each row returned.  Also check the columns that are labeled HOST_ID.  The numbers in those columns should match for each row returned.  If the numbers do not match, then double check your table joins.  Now that everything looks good, close Microsoft Query.  You will be prompted by Excel to choose a location and view for the newly imported data.  Take your pick.  From here on out, it’s typical Excel work.

The process of pulling this data out of vCenter is really not to hard to figure out.  Putting the table joins together in a way that pulls the host and data center of a VM along with disk usage takes a whiteboard and a little bit of trial and error.  Knowing there is a table in the database that has guest OS disk usage is something that only the most curious engineers would find.  Knowing that the VPX_GUEST_DISK table has been in the vCenter database since version 2.0.1 is something only a few people know.


22,733 total views, 15 views today

Tagged with:

Comments are closed.