
Thursday, September 5, 2013

A Heat Map for CloudStack Host Status

As part of my ongoing work on visualizing the status of a large CloudStack environment, I've been developing a heat map visualization for current host status. Here's what I've come up with so far:

I'm using d3.js, a fantastic library that makes it easy to bind data to the page's DOM (in this case, SVG elements). You can view the relevant code on GitHub for the implementation details.

In the above example, there is only one pod. Clusters (there are 8 in the example) are represented by the rows of cells, and each cell represents a host.
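To make that layout concrete, here is a minimal Ruby sketch (not the actual dashboard code, which does this in d3.js) that turns a flat host list into one row of cells per cluster, grouped by pod. The hash keys `:pod`, `:cluster` and `:name` are assumptions chosen for illustration, not CloudStack's exact API field names.

```ruby
# Group a flat list of host hashes into pod => { cluster => [host names] }.
# Each cluster becomes one row of cells; each host name becomes one cell.
# The :pod, :cluster and :name keys are illustrative assumptions.
def heat_map_rows(hosts)
  hosts.group_by { |h| h[:pod] }.transform_values { |pod_hosts|
    pod_hosts.group_by { |h| h[:cluster] }
             .transform_values { |hs| hs.map { |h| h[:name] }.sort }
  }
end
```

Sorting host names within a row keeps each cell in a stable position between refreshes, which matters when the eye is scanning for color changes.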

While it was interesting to get to know d3.js better, the bigger challenge has been coming up with a good model for which information should drive the various colors. Should I focus on a single capacity variable like memory? Should I just focus on the host status and resource state attributes? After considering the options, I settled on a hybrid approach.

First, I created a view that combines a number of relevant metrics to determine the right color for each host. The primary decision point is the host's "resourcestate" attribute: resource states like "Unmanaged" or "Maintenance" gray out the host's cell, while resource states like "Error" turn it red. If the resource state is "Enabled", I move on to checking the host's status. As with resource state, status values that represent transitional or down conditions are immediately shown as either gray or red.
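The decision order described above can be sketched in Ruby as follows. This is an illustrative reconstruction, not the view's actual code: only "Unmanaged", "Maintenance", "Error", "Enabled" and "Up" come from the description, and the remaining state and status groupings are assumptions to adjust for whatever your CloudStack version reports.

```ruby
# Resource states and statuses grouped by the color they force.
# Only Unmanaged, Maintenance, Error, Enabled and Up come from the post;
# every other value below is an assumed grouping.
GRAY_RESOURCE_STATES = %w[Unmanaged Maintenance PrepareForMaintenance Disabled].freeze
RED_RESOURCE_STATES  = %w[Error ErrorInMaintenance].freeze
GRAY_STATUSES        = %w[Connecting Creating Rebalancing].freeze   # transitional
RED_STATUSES         = %w[Down Disconnected Alert Error].freeze     # down / broken

# Returns :gray, :red, or :capacity (meaning "color this cell by capacity").
def host_category(resource_state, status)
  # 1. Resource state is the primary decision point.
  return :gray if GRAY_RESOURCE_STATES.include?(resource_state)
  return :red  if RED_RESOURCE_STATES.include?(resource_state)
  return :gray unless resource_state == "Enabled" # unknown states: assume inactive

  # 2. For enabled hosts, check the host status next.
  return :gray if GRAY_STATUSES.include?(status)
  return :red  if RED_STATUSES.include?(status)

  # 3. Only healthy, "Up" hosts fall through to the capacity colors.
  status == "Up" ? :capacity : :gray
end
```

Returning a category rather than a color keeps the state rules separate from the color scale, so either can change without touching the other.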

Once a host passes the resource state and status checks (it's healthy and "Up"), its CPU and allocated memory capacity metrics are checked. Whichever is at the higher percentage is then used to indicate the host's capacity. The colors follow a linear green -> orange -> red scale: 0% is green, 50% is orange, 100% is red, and values in between are interpolated between the two nearest set points.
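Here is a rough Ruby sketch of that capacity coloring, under stated assumptions: the helper names are hypothetical, the exact RGB shades (the CSS "green", "orange" and "red" values) are a guess, and the real visualization does its scaling in d3.js rather than Ruby. It picks the fuller of the two metrics (with an optional override for a single metric) and interpolates piecewise-linearly between the 0/50/100% set points.

```ruby
# Set points: 0% green, 50% orange, 100% red.
# The RGB shades (CSS green/orange/red) are an assumption.
COLOR_STOPS = [
  [0.0,   [0, 128, 0]],
  [50.0,  [255, 165, 0]],
  [100.0, [255, 0, 0]]
].freeze

# Fill percentage for a host: the fuller of CPU and allocated memory,
# unless a single metric (:cpu or :memory) is requested explicitly.
def host_fill_pct(cpu_pct, mem_alloc_pct, metric: nil)
  case metric
  when :cpu    then cpu_pct
  when :memory then mem_alloc_pct
  else [cpu_pct, mem_alloc_pct].max
  end
end

# Interpolate between the two set points that bracket pct; returns "#rrggbb".
def capacity_color(pct)
  pct = pct.to_f.clamp(0.0, 100.0)
  lower, upper = COLOR_STOPS.each_cons(2).find { |_lo, hi| pct <= hi[0] }
  t = (pct - lower[0]) / (upper[0] - lower[0])
  rgb = lower[1].zip(upper[1]).map { |from, to| (from + (to - from) * t).round }
  format("#%02x%02x%02x", *rgb)
end
```

For example, `capacity_color(host_fill_pct(40, 70))` colors a host by its 70% memory allocation, landing between orange and red.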

I also added the ability to select a specific capacity metric (skipping the "which is greater" check used in the overview visualization). Ex:

I'm still prototyping here, so obviously expect more testing and changes over time, but I've added this visualization to my zone dashboard:

A current issue with the scheme above is that it's hard to tell the difference between a happy host that just happens to be filling up and a host with an error. Perhaps I'll solve that by choosing different colors.

The other issue is scale (there's that challenge again). I haven't tested this yet, but I suspect that a dashboard version of this heat map would actually elevate the cells from hosts to clusters (or even pods). Dynamically deciding which level of granularity to display would probably make sense.
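One possible way to decide the granularity dynamically, sketched in Ruby as a hypothetical rule rather than anything tested against a real environment: use the finest level (host, cluster, pod) whose cell count fits a budget, and aggregate each coarser cell by its worst (highest) fill percentage so a single hot host still shows up. The `max_cells` budget, the max-based aggregation and the host hash keys are all assumptions.

```ruby
# Granularity levels, finest first.
LEVELS = [:host, :cluster, :pod].freeze

# Key identifying the cell a host belongs to at a given level.
# The :pod, :cluster, :name and :pct keys are illustrative assumptions.
def cell_key(host, level)
  case level
  when :host    then [host[:pod], host[:cluster], host[:name]]
  when :cluster then [host[:pod], host[:cluster]]
  when :pod     then [host[:pod]]
  end
end

# Returns [level, { cell_key => fill_pct }] using the finest level whose
# cell count fits within max_cells (falling back to :pod). Each cell takes
# the maximum fill of its hosts so one hot host is not averaged away.
def heat_map_cells(hosts, max_cells:)
  level = LEVELS.find { |lvl| hosts.map { |h| cell_key(h, lvl) }.uniq.size <= max_cells } || :pod
  cells = hosts.group_by { |h| cell_key(h, level) }
               .transform_values { |group| group.map { |h| h[:pct] }.max }
  [level, cells]
end
```

Taking the maximum rather than the mean is a deliberate choice here: for an operator, a cluster with one nearly full host is more interesting than its average suggests.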

Thursday, August 29, 2013

Prototype Ops Dashboard for CloudStack

I've been thinking about the operator experience in the Apache CloudStack UI. Although it's great in small environments, I believe there's room for significant improvement in how operators review and manage larger clouds. The native CloudStack UI's layout focuses on a graphical experience that helps explain the internal structure to the viewer. In a larger environment, the operator is (most likely) less interested in visually seeing the relationships between the various infrastructure bits. What they want is a dashboard view into their capacity (all available aspects), with drilldowns. They also need a much cleaner way to review the event stream generated by the system. To that end, I started working on a prototype that might be one way to meet these needs. As with most of my postings these days, this is a fairly incomplete project at the moment, so YMMV quite a bit.

If I get the data views to a point where I'm happy, the next phase would be to consider how a large-scale operator might want to perform and / or track changes to the platform itself. That's a long way off, though.

The Screenshots

The first screenshot below will eventually be the zone listing page, whose goal is to make it easy to view the capacity and status of each zone. It really needs to be multi-region aware, but it isn't yet.

Drilling into an example zone, the zone properties and current capacity are the first elements visible. I'm not sure the properties should be this prominent, but that's what I started with. Below them are the capacity trends (go figure, trending helps with capacity planning!).

I've also started on a very raw view into the event stream from the management server:

Technical Summary

I'll readily admit that I picked Ruby and Ruby on Rails purely because I wanted some hands-on experience with Rails (and to get a little better at core Ruby). The prototype code is on GitHub in my cs-operator-dashboard repo.

The project is broken up into three parts:

  1. cs-operator-dash: contains the Rails UI
  2. cs_eventconsumer: a gem that can run a daemon process to pull CloudStack events out of a RabbitMQ broker and push them into MongoDB
  3. cs_capacityretriever: a gem that can run a daemon process to periodically pull various capacity information from a CloudStack management server via my cloudstack_ruby_client gem, again pushing that data into MongoDB
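Both daemons follow the same collect-and-store pattern. Here is a minimal Ruby sketch of a polling collector in the spirit of cs_capacityretriever; it is not that gem's code. To keep it self-contained, the `fetcher` (any callable returning an array of capacity hashes, which in a real daemon might wrap a cloudstack_ruby_client call) and the `store` (anything responding to `insert_one`, such as a `Mongo::Collection` from the mongo gem) are injected; both, along with the `collected_at` field name and the 300-second default interval, are assumptions.

```ruby
# Minimal polling collector. fetcher: callable returning an Array of Hashes.
# store: any object with #insert_one(hash), e.g. a Mongo::Collection.
# Field names and defaults here are illustrative assumptions.
class CapacityCollector
  def initialize(fetcher:, store:, interval: 300, clock: -> { Time.now.utc })
    @fetcher  = fetcher
    @store    = store
    @interval = interval
    @clock    = clock
  end

  # One poll: stamp every record with the same collection time so that
  # snapshots can later be compared over time. Returns the record count.
  def poll_once
    collected_at = @clock.call
    records = @fetcher.call
    records.each { |rec| @store.insert_one(rec.merge("collected_at" => collected_at)) }
    records.size
  end

  # Daemon loop; stops after max_polls iterations when given (useful in tests).
  def run(max_polls: nil)
    polls = 0
    loop do
      poll_once
      polls += 1
      break if max_polls && polls >= max_polls
      sleep @interval
    end
  end
end
```

Injecting the fetcher and store keeps the polling logic testable without a live management server or MongoDB instance.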

Obviously, this requires a running CloudStack management server (configured with the RabbitMQ event handler). It also requires MongoDB to store the data from the two daemon processes.

The basic concept is that the cs-operator-dash Rails app is largely read-only and works with data populated into MongoDB by the two daemon processes. cs-operator-dash uses the Mongoid ODM to map the appropriate models to the collections created by the back-end processes.

One of the main benefits of a "collector > datastore < UI" architecture is that the data is kept as a historical record (that's how I get the trend lines).
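As a rough illustration of how stored history becomes a trend line, here is a Ruby sketch that collapses timestamped capacity snapshots into one averaged point per day. The snapshot keys (`"collected_at"`, `"percentused"`) and the daily-average bucketing are assumptions, not necessarily how the dashboard computes its trends.

```ruby
require "date"

# Collapse timestamped snapshots into [[date, avg_percent_used], ...],
# one point per day, sorted by date. Snapshot keys are assumptions.
def daily_trend(snapshots)
  snapshots
    .group_by { |s| s["collected_at"].to_date }
    .map { |day, group| [day, (group.sum { |s| s["percentused"] } / group.size.to_f).round(1)] }
    .sort_by(&:first)
end
```

Because the collectors only ever append snapshots, the same data can be re-bucketed later (hourly, weekly) without changing the back-end processes.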

You can dig through the code if you're interested, and feel free to poke me with pull requests and / or GitHub issues if you want to see something else. Remember though... it's just a prototype. ;-)