
OpenStack Dashboard

What is OpenStack?

OpenStack is an open-source cloud-computing software platform. It is primarily deployed as infrastructure-as-a-service and can be likened to a version of Amazon Web Services that can be hosted anywhere. Originally developed as a joint project between Rackspace and NASA, OpenStack is about five years old and has a large number of high-profile corporate supporters, including Google, Hewlett-Packard, Comcast, IBM, and Intel.

Nova

The core of the OpenStack project lies in the Compute module, known as Nova. Nova is responsible for the provisioning and management of virtual machines. It features full support for KVM and QEMU out of the box, with partial support for other hypervisors including VMware, Xen, and Hyper-V.

OpenStack overview dashboard

Here are some of the things you’ll want to see in any OpenStack dashboard. If you’re a Datadog user, your OpenStack metrics will automatically populate an out-of-the-box dashboard in your Datadog account called “OpenStack - Overview” like in the screenshot below. If you’re not a current user, you can still follow along and craft your own dashboard with these useful metrics.
OpenStack dashboard

Nova metrics can be logically grouped into four categories:

Hypervisor metrics give a clear view of the work performed by your hypervisors. Nova server metrics give you a window into your virtual machine instances. Tenant metrics provide detailed information about user resource usage, including quotas. Finally, message queue metrics give you performance details about the underlying message-passing pipeline Nova uses to coordinate work.

Here’s a widget-by-widget breakdown of the graphs and query values in this dashboard.

Status counters

Nova, Neutron, and Keystone counters
Nova, Neutron, and Keystone APIs

These counters display the number of running Nova, Neutron, and Keystone API endpoints. Because your number of physical hosts should change infrequently, you can expect these numbers to be static. A drop in any of these counters means an API endpoint is down, which points to trouble in your deployment.

Hypervisor counter
Hypervisor count

The hypervisor counter reports the number of hypervisors that are up and running. This counter can also be said to reflect the number of Nova nodes running, as each Nova node has one hypervisor. Unexpected changes to this metric point to problems with your Nova cluster.

Nova server metrics

Compute nodes generally constitute the majority of nodes in an OpenStack deployment. The Nova server metrics group provides information on individual instances running on compute nodes.
HDD read rate by instance
HDD read rate

This timeseries graph reports the average rate of read requests per second per instance. Spikes in this metric indicate that a virtual machine may have low RAM, causing it to thrash the disk with constant memory paging.
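If you are building this graph yourself, a per-second rate is typically derived from two samples of a cumulative counter. The sketch below assumes your collector exposes a running total of read requests per instance; the function name and the sample values are hypothetical, and the reset handling is one reasonable choice rather than the way any particular agent does it.

```python
def read_rate_per_second(previous_count, current_count, interval_seconds):
    """Average read requests per second between two cumulative samples."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = current_count - previous_count
    if delta < 0:
        # The counter went backwards, which we assume means it was reset
        # (for example, the instance was rebooted). Count from zero.
        delta = current_count
    return delta / interval_seconds


# Hypothetical samples taken 60 seconds apart for one instance.
rate = read_rate_per_second(previous_count=12_000, current_count=12_600,
                            interval_seconds=60)
print(f"{rate:.1f} reads/s")
```

Treating a backwards jump as a reset avoids plotting a large negative spike, at the cost of slightly undercounting reads in that one interval.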

Hypervisor metrics

The hypervisor initiates and oversees the operation of virtual machines. Failure of this critical piece of software will cause tenants to experience issues provisioning and performing other operations on their virtual machines, so monitoring the hypervisor is crucial.
Top memory RSS
Top RSS
This toplist displays the current resident set size (RSS) of the nova-compute daemon (VM instance manager), grouped by host aggregate. Although this metric should fluctuate under normal conditions, any dramatic changes should be investigated.
Hypervisor load map

This Host Map represents the system load over the last minute by hypervisor. The darker the color, the higher the load.

Used vs Free disk space
Used vs free disk
This timeseries graph reports the amount of disk space (in gigabytes) currently available for allocation, aggregated by physical host. It is plotted against the amount of disk space in use. Maintaining ample disk space is critical, because the hypervisor will be unable to spawn new virtual machines if there isn’t enough available space.
Current workload by hypervisor
Hypervisor workload
This bar graph tracks hypervisor operations: Build, Snapshot, Migrate, and Resize.
Change in running VMs
Change in VMs
This change graph tracks changes in the number of instances running on each host. Depending on your use case, unexpected changes to this metric should be investigated.
VCPUs used vs available
Available vCPUs
This timeseries graph plots the number of virtual CPUs in use against the maximum number available. Remember, OpenStack allows you to overcommit RAM and CPU resources. This means you can increase the number of resources available to your instances, at the cost of performance.
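To make the overcommit arithmetic concrete, here is a minimal sketch. It assumes the schedulable vCPU count is the number of physical cores multiplied by an allocation ratio (Nova exposes this as the `cpu_allocation_ratio` setting); the function names, core counts, and ratio below are illustrative values, not recommendations.

```python
def vcpu_capacity(physical_cores, cpu_allocation_ratio):
    """Number of vCPUs that can be allocated on a host with overcommit."""
    return int(physical_cores * cpu_allocation_ratio)


def vcpu_headroom(physical_cores, cpu_allocation_ratio, vcpus_used):
    """vCPUs still available for new instances (negative if oversubscribed)."""
    return vcpu_capacity(physical_cores, cpu_allocation_ratio) - vcpus_used


# Illustrative host: 16 physical cores, 4x overcommit, 40 vCPUs in use.
capacity = vcpu_capacity(16, 4.0)
headroom = vcpu_headroom(16, 4.0, 40)
print(f"capacity={capacity} vCPUs, headroom={headroom} vCPUs")
```

A higher ratio raises the "available" line on the graph, but every additional vCPU competes for the same physical cores, which is where the performance cost comes from.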

RabbitMQ metrics

RabbitMQ serves as both a synchronous and an asynchronous communications channel for Nova. Failure of this component will disrupt operations across your deployment. Monitoring RabbitMQ is essential if you want the full picture of your OpenStack environment.
Queue memory
Memory by queue
This timeseries graph plots the memory usage of RabbitMQ, broken down by queue. Although not often an issue, a significant spike in queue memory could point to a large backlog of unreceived (“ready”) messages, or worse.
Consumer utilization
Queue consumer utilization
This timeseries graph reports the consumer utilization of each queue, represented as a percentage. Ideally, this metric will be 100 percent for each queue, meaning consumers get messages as quickly as they are published. This metric is only available in RabbitMQ 3.3 and later.
Consumers by queue
This toplist represents the current number of consumers per message queue. Your number of consumers should usually be non-zero for a given queue. Zero consumers means that producers are sending out messages into the void. Depending on your RabbitMQ configuration, those messages could be lost forever.
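The two consumer checks described above can be sketched as a simple filter over per-queue statistics. This assumes you have already collected the stats into dictionaries; the keys (`name`, `consumers`, `utilization`), the queue names, and the 0.9 threshold are hypothetical choices for illustration, not fields or defaults of any specific RabbitMQ API.

```python
def flag_queues(queues, min_utilization=0.9):
    """Return (queue name, reason) pairs for queues that need attention."""
    problems = []
    for queue in queues:
        if queue["consumers"] == 0:
            # Nothing is reading from this queue, so messages pile up or are lost.
            problems.append((queue["name"], "no consumers"))
        elif (queue["utilization"] is not None
              and queue["utilization"] < min_utilization):
            # Consumers exist but cannot keep up with the publish rate.
            problems.append((queue["name"], "low consumer utilization"))
    return problems


# Hypothetical snapshot; utilization is a fraction between 0 and 1,
# or None when the broker does not report it.
snapshot = [
    {"name": "compute", "consumers": 3, "utilization": 1.0},
    {"name": "scheduler", "consumers": 1, "utilization": 0.4},
    {"name": "conductor", "consumers": 0, "utilization": None},
]
for name, reason in flag_queues(snapshot):
    print(f"{name}: {reason}")
```

Checking for zero consumers first keeps the more serious condition from being masked by a missing utilization value.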

Tenant metrics

Tenant metrics are primarily focused on resource usage. Remember, tenants are just groups of users. In OpenStack, each tenant is allotted a specific amount of resources, subject to a quota. Monitoring these metrics allows you to fully exploit the available resources and can help inform requests for quota increases should the need arise.
Floating IPs used vs max
Floating IPs used
This timeseries graph plots the number of floating IPs used by the tenant against the maximum number of floating IPs allowed.
RAM used vs max
Total RAM used
This timeseries graph plots the amount of RAM used by the tenant against the maximum amount of RAM allowed.
Cores used vs max
Cores used by tenant
This timeseries graph plots the current number of cores in use against the maximum number of cores allocated.
Instances used vs max
Instances used
This timeseries graph plots the current number of instances running against the maximum number of instances allowed. Remember, if a tenant is close to their instance limit, they can always resize an existing instance to a larger one, if other resource quotas permit.
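Each of the used-vs-max graphs above boils down to the same ratio, so a single helper can flag any resource that is approaching its quota. The sketch below is a minimal example under stated assumptions: the resource names, the sample numbers, and the 80 percent threshold are hypothetical, and a non-positive limit is treated as "unlimited" and skipped.

```python
def near_quota(usage, threshold=0.8):
    """Map each resource at or above the threshold to its used/max fraction.

    usage maps a resource name to a (used, limit) pair.
    """
    flagged = {}
    for resource, (used, limit) in usage.items():
        if limit <= 0:
            # Assumed convention: a non-positive limit means no quota applies.
            continue
        fraction = used / limit
        if fraction >= threshold:
            flagged[resource] = fraction
    return flagged


# Hypothetical tenant usage: (used, max) per resource.
tenant_usage = {
    "instances": (9, 10),
    "cores": (10, 20),
    "ram_mb": (45_056, 51_200),
    "floating_ips": (2, -1),
}
for resource, fraction in near_quota(tenant_usage).items():
    print(f"{resource}: {fraction:.0%} of quota")
```

Running the check across all resources at once is useful because resizing an instance to relieve one quota (instances) consumes others (cores and RAM).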

Conclusion

We’ve walked you through a number of metrics that are good indicators of your cloud’s performance and health. If you’d like to see this dashboard for your own OpenStack metrics, it will be populated immediately after you enable the OpenStack integration.

For a deep dive on OpenStack Nova metrics and how to monitor them, check out our three-part series on how to monitor and collect OpenStack Nova metrics.
