
How to collect and monitor PostgreSQL data with Datadog

Author Emily Chang

Last updated: August 26, 2021

If you’ve already read Parts 1 and 2 of this series, you’ve learned about the key metrics to monitor in PostgreSQL, and how to start collecting this data with native and open source tools.

Datadog’s PostgreSQL integration helps you automatically collect PostgreSQL data from the statistics collector, so that you can monitor everything in one place. And, because Datadog integrates with more than 700 other technologies, you’ll be able to correlate metrics from your PostgreSQL servers with other services throughout your environment.

In this post, we’ll walk through the process of installing Datadog on your PostgreSQL servers, so you can visualize database performance in an out-of-the-box screenboard like the one shown below. We’ll also show you how you can leverage Datadog Database Monitoring to get query-level insights into your databases. Last but not least, we’ll discuss how to identify bottlenecks in your code by tracing application requests (including PostgreSQL queries) with Datadog APM.

postgresql data - postgresql dashboard


Datadog’s PostgreSQL integration

Instead of querying PostgreSQL metrics manually through the utilities covered in Part 2 of this series, you can use the Datadog Agent to automatically aggregate these metrics and make them visible in a customizable template dashboard that shows you how these metrics evolve over time.

Install the Datadog Agent

The Datadog Agent is open source software that aggregates and reports metrics from your servers, so that you can graph and alert on them in real time. Installing the Agent usually takes just a single command—to get started, follow the instructions for your platform here.

GRANT the Agent permission to monitor PostgreSQL

Next, you’ll need to give the Agent permission to access statistics from the pg_stat_database view, by following the instructions in our documentation. Basically, you’ll need to log into a psql session as a user who has CREATEROLE privileges, create a datadog user and password, and grant it read access to pg_stat_database:

CREATE USER datadog WITH PASSWORD '<PASSWORD>';
GRANT SELECT ON pg_stat_database TO datadog;

For security reasons, PostgreSQL restricts certain track_activities statistics to superusers (including information about the queries that other users are currently executing, available via the pg_stat_activity view). If you’re running PostgreSQL version 10.x+, and you would like to give your datadog user access to monitoring-related statistics and configuration settings that are normally reserved for superusers, you can grant it pg_monitor permissions:

GRANT pg_monitor TO datadog;

Run this command on your PostgreSQL server to confirm that the datadog user can access your metrics:

psql -h localhost -U datadog postgres -c \
"select * from pg_stat_database LIMIT(1);" \
&& echo -e "\e[0;32mPostgres connection - OK\e[0m" \
|| echo -e "\e[0;31mCannot connect to Postgres\e[0m"

You’ll be prompted to enter the password you just created for your datadog user; once you’ve done so, you should see the following output: Postgres connection - OK.

Configure the Agent to collect PostgreSQL metrics

After you’ve installed the Agent on each of your PostgreSQL servers, you’ll need to create a configuration file that provides the Agent with the information it needs in order to begin collecting PostgreSQL data. The location of this file varies according to your OS and platform; consult the documentation for more details.

Copy the example config file (postgres.d/conf.yaml.example) and save it as conf.yaml. Now you can customize the config file to provide Datadog with the correct information and any tags you’d like to add to your metrics.
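On a Linux host running Agent v6 or v7, for example, copying the sample file might look like this (paths differ on other platforms, so check the configuration documentation for yours):

sudo cp /etc/datadog-agent/conf.d/postgres.d/conf.yaml.example \
        /etc/datadog-agent/conf.d/postgres.d/conf.yaml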

The example below instructs the Agent to access metrics locally through port 5432, using the datadog user and password we just created. You also have the option to add custom tag(s) to your PostgreSQL metrics, and to limit metric collection to specific schemas, if desired.

conf.yaml

init_config:

instances:
  - host: localhost
    port: 5432
    username: datadog
    password: <PASSWORD>
    tags:
      - optional_tag

Save your changes, restart the Agent, and run the Agent’s status command (info on the older v5 Agent) to verify that the integration is properly configured. If all is well, you should see a section like this in the resulting output:


Checks
======

  [...]

    postgres
    -----------------
      - instance #0 [OK]
      - Collected 70 metrics, 0 events & 1 service check
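
On a Linux host managed by systemd, that restart-and-verify step might look like this (the exact commands vary by platform and Agent version):

sudo systemctl restart datadog-agent
sudo datadog-agent status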

Diving into your PostgreSQL data with dashboards

Now that you’ve integrated Datadog with PostgreSQL, you should see metrics populating an out-of-the-box PostgreSQL screenboard, located in your list of integration dashboards. This screenboard provides an overview of many of the key metrics covered in Part 1 of this series, including locks, index usage, and replication delay. You can also clone and customize it by adding your own custom PostgreSQL metrics. We’ll show you how to set up the Agent to collect custom metrics in the next section.

Quickly reference key metrics and commands in our PostgreSQL monitoring cheatsheet.

Collecting custom PostgreSQL metrics with Datadog

Datadog’s PostgreSQL integration provides you with an option to collect custom metrics that are mapped to specific queries. In the custom_queries section of the Datadog Agent’s example PostgreSQL configuration file, you’ll see some guidelines about the components you’ll need to provide:

  • metric_prefix (required): the prefix to use across every custom metric name (by default, this is postgresql)
  • query (required): the SQL query to run on your database
  • columns (required): an ordered list of every column returned by the query above. Each item in the list should have a name (the custom metric name that will be appended to the metric_prefix) and a type, which sets the submission method for the queried metric (gauge, count, rate, etc.). Alternatively, you can set the type to tag to tag the metric with the data contained in this column.
  • tags (optional): add this if you’d like to tag your custom metrics with additional metadata

For example, you can send a custom query to the pg_stat_activity view to continuously gauge the number of open connections (backends) to your database, broken down by application name and user. Normally you’d query the view with something like:

SELECT
    application_name, usename, COUNT(*)
FROM
    pg_stat_activity
WHERE
    application_name NOT LIKE 'psql' AND (application_name <> '') IS TRUE
GROUP BY
    application_name, usename;

You can set up the Agent to automatically query this data for you on a regular basis, and report the results as a custom metric in Datadog.

Add your query to the custom_queries section of the Datadog Agent’s PostgreSQL configuration file:

postgres.d/conf.yaml

[ ... ]
    custom_queries:
      - metric_prefix: postgresql
        query: SELECT application_name, usename, COUNT(*) FROM pg_stat_activity WHERE application_name NOT LIKE 'psql' AND (application_name <> '') IS TRUE GROUP BY application_name, usename
        columns:
        - name: application_name
          type: tag
        - name: pg_user
          type: tag
        - name: count_by_applications
          type: gauge
        # additional optional tags
        tags:
          - <TAG_KEY>:<TAG_VALUE>

Save and exit the file, and restart the Datadog Agent (find the command for your OS here). You should now be able to see the custom postgresql.count_by_applications metric in Datadog, tagged with application_name and pg_user.

postgresql dashboard

Consult the documentation and this article to see examples of other custom PostgreSQL metrics you can collect.

Detect and optimize slow queries with Datadog Database Monitoring

Now that you’ve set up the Agent to track high-level PostgreSQL data from your servers, you can dive even deeper and get query-level performance insights with Datadog Database Monitoring (DBM). To get started with DBM, you’ll need to add a few parameters to your PostgreSQL configuration file (postgresql.conf) and grant the Agent additional permissions. Then, enable DBM in your Agent configuration by setting dbm to true in conf.yaml, as shown below, and restart the Agent for the changes to take effect.
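
On the database side, that typically means enabling the pg_stat_statements extension and raising the tracked query size in postgresql.conf. The snippet below is a minimal sketch; consult the DBM setup documentation for the full, version-specific list of settings and grants.

postgresql.conf

# Requires a server restart to take effect
shared_preload_libraries = 'pg_stat_statements'
# Capture the full text of queries in pg_stat_activity
track_activity_query_size = 4096
# Track all statements, including those executed inside functions
pg_stat_statements.track = ALL
# Optionally collect block read/write timing statistics
track_io_timing = on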

conf.yaml

init_config:
instances:
  - dbm: true
    host: localhost
    port: 5432
    username: datadog
    password: <PASSWORD>
    tags:
      - optional_tag

You can now navigate to the Databases view to track the performance of normalized queries across all of your databases, and identify ones that can be optimized. For each normalized query, DBM shows you detailed explain plans, which list the exact steps taken by the query planner to execute the query, along with their estimated cost. These plans help surface inefficient operations, like sequential scans, which you can avoid by creating indexes on your most frequently accessed columns.
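
For example, if an explain plan shows repeated sequential scans on a column your queries frequently filter on, an index along the lines of the statement below could remove that bottleneck (the table and column names here are hypothetical):

-- Build the index without blocking concurrent writes to the table
CREATE INDEX CONCURRENTLY idx_orders_customer_id ON orders (customer_id);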

View explain plans for each normalized query in Datadog Database Monitoring

DBM also provides timeseries graphs of normalized query performance metrics, so you can monitor trends over the long term. You can add these graphs to the out-of-the-box PostgreSQL dashboard discussed earlier and correlate them with the rest of your monitoring data in Datadog.

To learn more about DBM, read our dedicated blog post.

Tracing PostgreSQL queries with APM

Distributed tracing is another important aspect of a comprehensive PostgreSQL database monitoring strategy. Datadog APM is bundled in the same lightweight, open source Datadog Agent we installed earlier. With all of your services, hosts, and containers reporting to one unified platform, you’ll be able to view key metrics from your applications in the same place as their underlying infrastructure. You’ll also be able to trace requests as they travel across service boundaries in your environment.

Distributed tracing and APM are designed to work with minimal configuration. For example, the Python tracing client auto-instruments web frameworks like Django and Flask, as well as commonly used client libraries for services like Redis and PostgreSQL. In the following example, we’ll show you how to start tracing a Django app that uses PostgreSQL as its database.

1. Install the Datadog Agent + Python tracing client

First, install the Datadog Agent on your app server, by following the instructions for your OS, as specified here. Now you will need to install the Python tracing client in your environment:

pip install ddtrace

2. Update your Django settings.py file

Add the tracing client’s Django integration to the INSTALLED_APPS section of your Django settings.py file:

settings.py

INSTALLED_APPS = [
[...]
    'ddtrace.contrib.django',
]

You’ll also need to add a DATADOG_TRACE section to your settings.py file, making sure to specify the name of your service, and any tags you’d like to add to your service-level metrics:

settings.py

DATADOG_TRACE = {
    'DEFAULT_SERVICE': '<MY_SERVICE>',
    'TAGS': {'env': 'myenv'},
}

And, if you haven’t done so already, make sure you’ve specified the name of your database and your user credentials in the DATABASES section. You can also add the name of your application, if desired:

settings.py

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': '<YOUR_DB>',
        'USER': os.environ['DATABASE_USER'],
        'PASSWORD': os.environ['DATABASE_PASSWORD'],
        'HOST': 'localhost',
        'PORT': '5432',
        'OPTIONS': {
            'application_name': '<MY_APP>'
        }
    }
}

Since we specified application_name, the Agent will include it in the metadata for each trace it collects and forwards to Datadog. Also note that DEBUG mode needs to be off in order for Datadog to trace your application.
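
In other words, your settings.py should contain something like:

settings.py

DEBUG = False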

Save and exit the file. We’re ready to trace!

3. Run the ddtrace-run command

You’re just one command away from seeing traces and service-level metrics from your app in Datadog. Add ddtrace-run as a wrapper around the usual command you use to start your app server:

ddtrace-run python manage.py runserver

Inspecting PostgreSQL database queries in Datadog

The Agent will quickly start collecting metrics and traces from your application, and forwarding them to Datadog for visualization and alerting. Navigate to the Datadog APM page, select the environment you specified in the DATADOG_TRACE section of your settings.py file, and click on your service to see a dashboard of key metrics (latency, errors, and hits).

Monitor PostgreSQL data with Datadog: postgresql and django service in Datadog APM dashboard

At the bottom of the dashboard, you can see a list of various endpoints recently accessed throughout your application. In this example, the endpoints map to views in our Django app. We can click into a trace to follow the path of an individual request as it travels across various components of our app, including multiple PostgreSQL database calls. In the flame graph below, we can click into any PostgreSQL span to view the exact query that was executed.

Monitor PostgreSQL with Datadog: flame graph of a Django web app request including PostgreSQL requests

You can also filter through your sampled traces for errors, and start debugging the issue. In the example below, the stack trace shows us that our app is trying to query a nonexistent field in the database.

Monitor PostgreSQL data with Datadog: flame graph of a Django web app request with error

Auto-instrumentation gives you a head start on collecting traces from popular libraries and frameworks, but you can also set up the Agent to collect custom traces by instrumenting and tagging specific spans of your code. The Python tracing client also includes support for distributed tracing, so you can follow the path of each request across different hosts. Read the documentation for more details.
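
As a rough sketch, manually creating and tagging a custom span with the Python tracer might look like the following (the service, resource, function, and tag names are hypothetical):

from ddtrace import tracer

# Trace this function as its own resource under a hypothetical 'billing' service
@tracer.wrap(service='billing', resource='generate_invoice')
def generate_invoice(account_id):
    # Wrap the expensive database work in a child span and tag it for filtering
    with tracer.trace('invoice.query_postgres') as span:
        span.set_tag('account_id', account_id)
        # ... run the PostgreSQL queries here ...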

Creating custom dashboards

You can combine APM metrics with infrastructure-wide metrics on any dashboard in order to identify and investigate bottlenecks—simply click on the button in the upper right corner of any APM graph to add it to an existing timeboard.

Monitor PostgreSQL with Datadog: Combining PostgreSQL APM and infrastructure metrics in the same dashboard

Once you create dashboards that combine service-level metrics with infrastructure metrics, you’ll be able to correlate across graphs to help investigate issues. In the dashboard above, it looks like many rows were fetched recently, and requests were spending a higher percentage of time executing PostgreSQL queries. We can investigate further by viewing a trace of a request that occurred around that time:

inspecting postgresql data in a flame graph of a Django web app request containing many PostgreSQL queries

This trace shows us that many different PostgreSQL queries were running sequentially within a single request. We can go even deeper by clicking on each span to see the exact SQL query that was executed, which can help us determine how to reduce the number of database calls and optimize performance.

Alerting

Once you’ve implemented APM and started tracing your applications, you can quickly enable default service-level alerts on latency, error rate, or throughput, or set up custom alerts.

Monitor PostgreSQL data with Datadog: Enable default alerts to monitor your PostgreSQL-powered services

You can also set up automated alerts on any of the PostgreSQL data you’re collecting. For example, you can configure Datadog to automatically notify you if replication delay increases beyond a certain level, or your databases accumulate too many dead rows.
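
As a rough illustration, metric monitors on those two signals might use queries like the ones below (the thresholds and tag filters are placeholders to tune for your environment):

avg(last_5m):avg:postgresql.replication_delay{*} > 60
avg(last_15m):avg:postgresql.dead_rows{db:mydb} > 100000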

Start monitoring PostgreSQL

In this post, we’ve shown you how to use Datadog to automatically collect, visualize, and alert on PostgreSQL data to ensure the health and performance of your databases. We’ve also walked through an example of how to auto-instrument traces from an application that relies on PostgreSQL. If you’ve followed along, you should now have increased visibility into your PostgreSQL databases, as well as the applications that rely on them.

If you’re new to Datadog, you can start monitoring PostgreSQL with a free trial.

Source Markdown for this post is available on GitHub. Questions, corrections, additions, etc.? Please let us know.