Thursday, Sep 05, 2013

A Heat Map for CloudStack Host Status

As part of my ongoing work around visualizing the status of a large CloudStack environment, I've been developing a heat map visualization for current host status. Here's what I've come up with so far:

I'm making use of d3.js, a fantastic library that makes it easy to bind data to DOM elements (in this case, SVG elements). You can view the relevant code on GitHub to see the implementation details.

In the above example, there is only one pod. Clusters (there are 8 in the example) are represented by the rows of cells, and each cell represents a host.
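To make the grid layout concrete, here is a minimal sketch of how hosts could be mapped to cell positions, with one row per cluster and one column per host. This is not the code from the GitHub repository; the field names (`name`, `cluster`), the sample data, and the cell dimensions are all assumptions made for illustration.

```javascript
// Hypothetical host records; the field names are assumptions for illustration.
const exampleHosts = [
  { name: "host-1", cluster: "cluster-1" },
  { name: "host-2", cluster: "cluster-1" },
  { name: "host-3", cluster: "cluster-2" },
];

const CELL_SIZE = 20; // pixels per cell (arbitrary)
const CELL_GAP = 2;   // pixels between cells (arbitrary)

// Assign each host an (x, y) position: one row per cluster, one column per host.
function layoutCells(hosts) {
  const rowOf = new Map();   // cluster name -> row index
  const nextCol = new Map(); // cluster name -> next free column in that row
  return hosts.map((host) => {
    if (!rowOf.has(host.cluster)) {
      rowOf.set(host.cluster, rowOf.size); // first cluster seen gets row 0
      nextCol.set(host.cluster, 0);
    }
    const row = rowOf.get(host.cluster);
    const col = nextCol.get(host.cluster);
    nextCol.set(host.cluster, col + 1);
    return {
      name: host.name,
      x: col * (CELL_SIZE + CELL_GAP),
      y: row * (CELL_SIZE + CELL_GAP),
    };
  });
}
```

The resulting positions could then be bound to SVG `rect` elements, which is the kind of data-to-element binding d3.js is built for.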

While it was interesting to get to know d3.js better, the more interesting challenge has been coming up with a good model for which information should drive the cell colors. Do I focus on one capacity variable like memory? Should I just focus on host status and resource state attributes? After considering the options, I arrived at a bit of a hybrid approach to the problem.

First, I created a view that combines a number of relevant metrics to determine the right color for each host. The primary decision point for color is the host's "resourcestate" attribute, where resource states like "Unmanaged" or "Maintenance" gray out the host's cell. Resource states like "Error" turn the host cell red. If the resource state is "Enabled", I move on to checking the status of the host. Similar to resource state, status values that represent transitional and down conditions are immediately reflected as either gray or red.
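The decision order described above can be sketched roughly as follows. This is an illustrative sketch rather than the actual view code: the `status` field name, the treatment of "Down" as red, and the choice to gray out any other state not named in the post are assumptions.

```javascript
const GRAY = "gray";
const RED = "red";

// Returns a fixed color when state alone decides it, or null when the host is
// healthy and its capacity metrics should determine the color instead.
function stateColor(host) {
  // Resource state is checked first.
  if (["Unmanaged", "Maintenance"].includes(host.resourcestate)) return GRAY;
  if (host.resourcestate === "Error") return RED;
  // Assumption: any other resource state that is not "Enabled" is grayed out.
  if (host.resourcestate !== "Enabled") return GRAY;

  // Only "Enabled" hosts reach the status check.
  if (host.status === "Up") return null; // fall through to capacity coloring
  // Assumption: "Down" maps to red; other statuses are treated as transitional.
  if (host.status === "Down") return RED;
  return GRAY;
}
```

Returning `null` for healthy hosts keeps the state checks separate from the capacity coloring, so either part can change without touching the other.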

After passing the resource state and status checks, a healthy and "Up" host then has its CPU and Allocated Memory capacity metrics checked. Whichever is at a higher percentage filled is then used to indicate the capacity of the host. The colors are a green -> orange -> red linear scale, with 0% being green, 50% being orange, 100% being red, and values in between scaled accordingly between the two closest set points.
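As a rough illustration of that scale, the sketch below interpolates linearly between the two nearest set points in RGB space. It is written in plain JavaScript and is not the actual implementation (d3's linear scales can also take several domain stops). The exact RGB values chosen for green, orange, and red, and the clamping of out-of-range input, are assumptions.

```javascript
// Set points of the scale; the RGB values are assumed (CSS green, orange, red).
const COLOR_STOPS = [
  { pct: 0, rgb: [0, 128, 0] },    // green
  { pct: 50, rgb: [255, 165, 0] }, // orange
  { pct: 100, rgb: [255, 0, 0] },  // red
];

// Color a host by whichever of its two capacity percentages is more full.
function capacityColor(cpuPct, memPct) {
  // Use the higher of the two metrics, clamped to the 0-100 range.
  const pct = Math.min(100, Math.max(0, Math.max(cpuPct, memPct)));
  // Find the first stop at or above pct; the stop before it is the lower bound.
  const hi = COLOR_STOPS.findIndex((stop, i) => i > 0 && pct <= stop.pct);
  const lower = COLOR_STOPS[hi - 1];
  const upper = COLOR_STOPS[hi];
  // Position of pct between the two stops, from 0 to 1.
  const t = (pct - lower.pct) / (upper.pct - lower.pct);
  const rgb = lower.rgb.map((c, i) => Math.round(c + t * (upper.rgb[i] - c)));
  return `rgb(${rgb.join(", ")})`;
}
```

For example, `capacityColor(10, 50)` uses the memory figure (the higher of the two) and lands exactly on the orange set point.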

One other thing I decided to do was make it possible to select specific capacity metrics (ignoring the "which is greater" check used for the overview visualization). For example:
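In code, that kind of metric selection might look like the sketch below. The metric names ("overview", "cpu", "memory") and the host fields (`cpuPct`, `memPct`) are hypothetical and only serve to show the idea.

```javascript
// Pick the percentage that drives a host's color for the selected view.
// "overview" keeps the "which is greater" behavior; the others ignore it.
function fillPercent(host, metric = "overview") {
  switch (metric) {
    case "cpu":
      return host.cpuPct;
    case "memory":
      return host.memPct;
    case "overview":
      return Math.max(host.cpuPct, host.memPct);
    default:
      throw new Error(`Unknown metric: ${metric}`);
  }
}
```

With a host at 30% CPU and 80% memory, the overview and memory views would both report 80, while the CPU view would report 30.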

I'm still prototyping here, so obviously expect more testing and changes over time, but I've added this visualization to my zone dashboard:

A current issue I have with the scheme above is that it's hard to tell the difference between a happy host that just happens to be filling up and a host with an error. I'll probably have to solve that by selecting different colors.

The other issue I have is dealing with scale (there's that challenge again). I haven't tested this out yet, but I suspect that a dashboard version of this heatmap would actually elevate the cells from being hosts to clusters (or even pods). Dynamically deciding what level of granularity to display would probably make sense.
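One untested way to sketch that dynamic granularity: roll hosts up to clusters, and then to pods, whenever there are more cells than the display can hold. Everything here is an assumption made for illustration, including the field names (`name`, `cluster`, `pod`, `fillPct`), the cell budget, and the choice to show the worst (highest) fill within each group.

```javascript
// Roll hosts up to one cell per group (cluster or pod), keeping the worst fill.
function rollUp(hosts, key) {
  const worst = new Map(); // group name -> highest fill percentage seen
  for (const host of hosts) {
    const group = host[key];
    const prev = worst.has(group) ? worst.get(group) : 0;
    worst.set(group, Math.max(prev, host.fillPct));
  }
  return Array.from(worst, ([name, fillPct]) => ({ name, fillPct }));
}

// Choose the finest level of detail whose cell count fits within maxCells.
// If even the pod level exceeds the budget, pods are returned anyway.
function chooseCells(hosts, maxCells) {
  if (hosts.length <= maxCells) {
    return hosts.map((h) => ({ name: h.name, fillPct: h.fillPct }));
  }
  const clusters = rollUp(hosts, "cluster");
  if (clusters.length <= maxCells) return clusters;
  return rollUp(hosts, "pod");
}
```

Taking the worst fill per group is only one option; an average would hide a single overloaded host, which seems like the wrong trade-off for a status dashboard.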