Monitor Amazon SQS message traffic with Datadog

Published: August 28, 2014

We just released an update to our integration that makes it easier than ever to monitor Amazon SQS. If you use SQS, you now have better visibility into the health of your message queues, their traffic patterns, and the state of their messages, so you can identify and fix performance issues.

What is SQS?

Amazon’s Simple Queue Service (SQS) is a scalable, managed message queue in the AWS suite of services. It is well suited for applications that stream analytics. Like any message queue, SQS is best used to decouple services in your application and avoid a domino effect where one overloaded service takes the rest of the application down.

For instance, we use message queues at Datadog to separate metrics and event intake from processing, which smooths out spikes in incoming traffic. The key to operating SQS well is to ensure that queues are neither continuously growing in length nor sitting unused and starved of messages.
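To make the two failure modes concrete, here is a minimal sketch that labels a queue as growing, starving, or healthy from a handful of equally spaced samples. The function names, the labels, and the default threshold are illustrative assumptions, not part of the Datadog integration.

```python
def backlog_slope(samples):
    """Least-squares slope of equally spaced samples (change per sample interval)."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def classify_queue(visible, sent, growth_threshold=0.0):
    """Hypothetical health check.

    visible: samples of the visible-message backlog, oldest first.
    sent:    samples of messages sent per period, oldest first.
    """
    if sum(sent) == 0:
        return "starving"   # nothing is being published to the queue
    if backlog_slope(visible) > growth_threshold:
        return "growing"    # the backlog trends upward over the window
    return "healthy"
```

In practice you would probably raise growth_threshold above zero so that ordinary noise in the backlog does not count as growth.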

Key SQS metrics to monitor

All Amazon SQS metrics are prefixed with aws.sqs. in Datadog.

The SQS metrics are also tagged by queue name so that you can easily monitor your queues independently of one another (or monitor them as an aggregate).
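As a rough illustration of how the prefix and the queue tag combine, the sketch below builds a metric query string for a single queue or for all queues at once. The helper name is hypothetical, and the tag key queuename is an assumption; check the tags on your own metrics before relying on it.

```python
def sqs_query(metric, queue=None, aggregator="sum"):
    """Build a Datadog-style query string for an SQS metric.

    Assumes the queue tag key is "queuename" (an assumption, verify in your account).
    With no queue given, the query covers every queue as an aggregate.
    """
    scope = f"queuename:{queue}" if queue else "*"
    return f"{aggregator}:aws.sqs.{metric}{{{scope}}}"


# One queue on its own, then all queues as an aggregate
print(sqs_query("number_of_messages_sent", "orders"))
print(sqs_query("approximate_number_of_messages_visible", aggregator="avg"))
```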

number_of_messages_sent, number_of_messages_received

These metrics measure the flow of traffic into and out of your queue over the last five-minute period. A related metric, number_of_empty_receives, counts receive calls that returned no message. If it stays consistently above 0, you may have too many eager consumers; consider spacing out their calls to SQS or running fewer consumers.

As the queue is emptied, consumers find no available messages.
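One way to space out calls from an eager consumer is to back off exponentially while receives come back empty. The sketch below is one possible approach rather than a recommended implementation: receive and handle are hypothetical callables you would supply (for example, a wrapper around your SQS client), and the delay values are arbitrary.

```python
import time


def poll_with_backoff(receive, handle, base_delay=1.0, max_delay=60.0,
                      max_polls=None, sleep=time.sleep):
    """Poll receive() and wait longer after each consecutive empty receive.

    receive: hypothetical callable returning a (possibly empty) list of messages.
    handle:  hypothetical callable that processes one message.
    Returns the number of empty receives observed.
    """
    delay = base_delay
    polls = 0
    empty_receives = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        messages = receive()
        if messages:
            for message in messages:
                handle(message)
            delay = base_delay                  # traffic is back: reset the delay
        else:
            empty_receives += 1
            sleep(delay)                        # wait before asking again
            delay = min(delay * 2, max_delay)   # double the wait, up to a cap
    return empty_receives
```

Passing sleep as a parameter keeps the loop easy to test without real waiting; max_polls exists only so the loop can be bounded in examples.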

number_of_messages_deleted

The number of messages per period that have been successfully processed and removed from the queue. This is the real measure of your consumers’ throughput.

sent_message_size

The average size of messages that are being posted to a queue. SQS message size affects how much you pay for the service.
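To see why size matters, the sketch below assumes that SQS bills each 64 KB chunk of a message payload as one request, which is how AWS pricing has described it; confirm against current pricing before using the numbers. The function name is illustrative.

```python
import math

CHUNK_BYTES = 64 * 1024  # assumed billing chunk size (64 KB)


def billed_requests(message_bytes):
    """Number of requests a single message would count as, under the 64 KB assumption."""
    return max(1, math.ceil(message_bytes / CHUNK_BYTES))
```

Under this assumption, a message one byte over 64 KB counts as two requests, so a rising sent_message_size can raise costs even when message counts stay flat.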

approximate_number_of_messages_visible

The size of your active queue; a measure of the queue’s backlog. If this metric grows over time you may need to add more consumers to make a dent in the backlog (or speed up the existing consumers).
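A quick back-of-the-envelope check for whether consumers are making a dent is to compare what leaves the queue with what enters it. This sketch assumes you have read the backlog and the per-period sent and deleted counts from your metrics; the function name and the five-minute default are illustrative.

```python
def estimate_drain_minutes(backlog, sent_per_period, deleted_per_period,
                           period_minutes=5):
    """Rough time to clear the backlog at current rates, or None if it is not shrinking."""
    if backlog <= 0:
        return 0.0
    net_removed = deleted_per_period - sent_per_period
    if net_removed <= 0:
        return None  # consumers are not keeping up; the backlog will not drain
    return backlog / net_removed * period_minutes
```

For example, a backlog of 1,000 messages with 200 sent and 400 deleted per five-minute period would take roughly 25 minutes to clear; a result of None suggests adding or speeding up consumers.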

approximate_number_of_messages_delayed

The number of messages that are in the delay period and not yet ready to be picked up by consumers.

approximate_number_of_messages_not_visible

The number of messages that are currently being worked on by a consumer. Once the consumer deletes a message, it is removed from this tally.

A backlog of visible messages starts to fall

More information on SQS metrics can be found here.

Immediate insights into your message queues

We’ve created an out-of-the-box Datadog screenboard to help you stay on top of your SQS usage. All of these metrics and the dashboard are available in Datadog when you enable our SQS integration.

Datadog timeboards let you mix and match metrics from AWS SQS and your applications so you can visualize the performance of your queues and message-consuming applications in one place. Below, you can see a side-by-side comparison of SQS traffic flow and the throughput of an application consuming its messages.

Correlate AWS CloudTrail events with SQS performance

By overlaying AWS CloudTrail events (or events from Datadog’s other integrations), you can confirm that spinning up a consumer instance when the queue size grows improves message throughput in your application.

To monitor the performance of your AWS SQS message patterns in conjunction with your applications and other AWS services, sign up for Datadog and enable the AWS SQS integration along with the other available AWS integrations.

