Monitor DNS logs for network and security analysis

Nicholas Thomson

The Domain Name System (DNS) translates domain names (e.g., datadoghq.com) into IP addresses via a process called DNS resolution. This translation facilitates all kinds of network communication, from enabling web browsers to connect to a desired page without requiring users to remember IP addresses, to internal communication across private infrastructure, such as Kubernetes environments. Because of containers' ephemeral lifecycles, Kubernetes regularly assigns new IP addresses to pods with each new deployment or autoscaling event. DNS lets clients keep connecting by name even when the underlying IP address changes.

DNS logs can give you critical insights into your network health, such as whether timeouts are occurring during the DNS resolution process and whether an authoritative server is returning a valid response. Additionally, DNS logs can help you stay ahead of malicious actors by providing data that can aid security investigations, such as unusual patterns or spikes in requests, inconsistencies between requested domains and resolved IP addresses, suspicious domains, long query lengths, and more.

In this post, we'll show you:

The anatomy of a DNS log
How to extract insights from DNS logs
How to identify and mitigate DNS attacks
How to troubleshoot network health issues with DNS logs
How to monitor DNS logs with Datadog

Understand the anatomy of a DNS log

DNS resolution is a multistep process. When a client machine sends a request (e.g., an attempt to connect to a server via a domain name), the request is picked up by a DNS resolver, a server that runs queries between client machines and the other DNS servers. The DNS resolver first queries a DNS root server, which responds with a referral to the appropriate top-level domain (TLD) server, which maintains information for all domain names with the same domain extension (e.g., .com, .net, etc.). The DNS resolver then sends a request to that TLD server, which responds with the appropriate authoritative nameserver (a server that stores the DNS records that map domain names to IP addresses). The authoritative nameserver sends the IP address of the target server to the DNS resolver (or returns an error if it can't find a matching record). Once the DNS resolver sends the IP address back to the client, the client can connect to the target website or application.
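
The lookup chain described above can be sketched as a toy simulation. This is an illustrative in-memory model, not a real resolver: the server tables, the server names, the IP address (taken from a documentation-only range), and the function name `resolve` are all assumptions made up for the example.

```python
from typing import List, Optional, Tuple

# Toy model of the resolution chain: root -> TLD server -> authoritative nameserver.
# Every name and address below is hypothetical.
ROOT = {"com": "tld-com", "net": "tld-net"}          # domain extension -> TLD server
TLD_SERVERS = {
    "tld-com": {"example.com": "ns-example"},         # domain -> authoritative nameserver
    "tld-net": {},
}
AUTHORITATIVE = {
    "ns-example": {"example.com": "203.0.113.10"},    # domain -> IP address
}


def resolve(domain: str) -> Tuple[Optional[str], List[str]]:
    """Return (ip_or_None, steps), following each referral in turn."""
    steps = []
    extension = domain.rsplit(".", 1)[-1]

    # Step 1: the root server refers the resolver to a TLD server.
    tld_server = ROOT.get(extension)
    steps.append(f"root -> {tld_server}")
    if tld_server is None:
        return None, steps

    # Step 2: the TLD server refers the resolver to an authoritative nameserver.
    nameserver = TLD_SERVERS[tld_server].get(domain)
    steps.append(f"{tld_server} -> {nameserver}")
    if nameserver is None:
        return None, steps

    # Step 3: the authoritative nameserver returns the record (or nothing).
    ip = AUTHORITATIVE[nameserver].get(domain)
    steps.append(f"{nameserver} -> {ip}")
    return ip, steps
```

Calling `resolve("example.com")` walks all three steps, while a domain with an unknown extension stops at the root. A real resolver also caches answers and handles many record types, which this sketch leaves out.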

A DNS log will contain a wealth of information about the request, the response, and all of the intermediary steps along the way. The table below lays out important fields and their values in a DNS log:

Field	Type	Description
`ts`	time	Timestamp
`proto`	enum	The transport protocol used for the DNS request (TCP or UDP). DNS traffic typically uses UDP; TCP is used when a response is too large for UDP (traditionally, more than 512 bytes) and for zone transfers. An unusual amount of TCP traffic can have legitimate causes (e.g., zone transfers) or illegitimate ones (e.g., a DDoS attack).
`trans_id`	count	A 16-bit identifier that the client assigns to a query and that the server copies into its response, so that responses can be matched to their queries. This is useful for troubleshooting because you can use this field to tie a request and its response together in your logs.
`rtt`	interval	Total time elapsed between the DNS query being sent and the answer being received.
`query`	string	The domain name whose underlying IP address is being asked for.
`qtype_name`	string	Descriptive name for the query type.
`rcode`	count	The response after a query. Most values will be 0 (no error) if a query successfully returns an answer. In the event that there is an error, there are a number of values indicating what type of error has occurred (e.g., 1: Format Error, 2: Server Failure, 3: Nonexistent Domain, etc.).
`AA`	boolean	Authoritative answer, meaning the response came back from an authoritative server.
`TC`	boolean	The response was truncated because it was too large for the transport (traditionally, more than 512 bytes over UDP), which prompts the client to retry over TCP.
`RD/RA`	boolean	In a recursive query, the client asks a server to resolve the name fully on its behalf and return the final answer (or an error). In an iterative query, the server returns the best answer it has, often a referral to another server, and the querier follows up itself. A DNS request to a domain outside of your local system is generally handled by a recursive resolver that iterates through a number of servers to get the appropriate response. Recursion desired (RD) indicates that the client requested recursion. Recursion available (RA) indicates that the responding server supports recursive queries.
`Z`	count	A reserved field in the DNS header. The DNS Security Extensions (DNSSEC) use some of the bits that were originally reserved here, so this value will usually be 0 if you don't have DNSSEC.
`answers`	vector	Usually names and IP addresses, although some vendors use DNS to exchange information, so the answer field can on rare occasions contain messages.
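
To make the field list above concrete, here is a minimal sketch of loading one log record into a typed structure. It assumes the logs are JSON lines keyed by the field names in the table (with `RD` and `RA` as separate keys) and that `rtt` and `answers` may be missing when no reply was seen; the `DnsRecord` class, the `parse_dns_log` function, and the sample record are all made up for illustration.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DnsRecord:
    ts: float
    proto: str
    trans_id: int
    query: str
    qtype_name: str
    rcode: int
    AA: bool
    TC: bool
    RD: bool
    RA: bool
    Z: int
    rtt: Optional[float] = None                        # assumed absent if no reply was seen
    answers: List[str] = field(default_factory=list)   # assumed absent if no reply was seen


def parse_dns_log(line: str) -> DnsRecord:
    """Parse one JSON-encoded DNS log line, ignoring keys that are not in DnsRecord."""
    raw = json.loads(line)
    known = {name: raw[name] for name in DnsRecord.__dataclass_fields__ if name in raw}
    return DnsRecord(**known)


# Hypothetical sample record, serialized the way a JSON log line would be.
SAMPLE_LINE = json.dumps({
    "ts": 1700000000.0, "proto": "udp", "trans_id": 4242, "rtt": 0.031,
    "query": "example.com", "qtype_name": "A", "rcode": 0,
    "AA": False, "TC": False, "RD": True, "RA": True, "Z": 0,
    "answers": ["203.0.113.10"],
})
```

`parse_dns_log(SAMPLE_LINE)` yields a record whose `query`, `rcode`, and `answers` attributes can be checked directly. Your own log source may use different key names or a tab-separated layout, in which case only the parsing step changes.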

Extract insights from DNS logs

Because DNS logs contain a wealth of information, it's important to know what you're looking for so you can quickly and easily extract the relevant information from a DNS log when troubleshooting an issue. Below, we've outlined some signals that you should take note of when monitoring your DNS logs.

Elevated rtt values can be a sign of network connectivity issues. For example, if you notice a spike in timeout errors that correlates with elevated rtt values in your DNS logs, you might infer that the timeout errors are occurring during the DNS resolution process, which would suggest that there is a problem with your DNS server.
The queried domain name (query field) lets you know what was requested. This can provide evidence of a threat if the queried domain name is on a list of malicious domains. Additionally, an excessive number of repeated lookups could be an indicator of malicious activity, such as a DoS attack, where a malicious actor overwhelms a target domain's servers with an abnormally high volume of DNS queries.
rcode:2 indicates a SERVFAIL error. This is a common type of error that arises when DNS cannot get a valid response from an authoritative nameserver. Logs with this value can help you root-cause an issue by determining its source (in this case, the server).
The answers field contains the information requested from DNS. IP addresses—a common value of the answers field—can help administrators locate compromised machines on a local network. On the public internet, these IP addresses can be checked against databases of malicious actors' IP addresses.
qtype_name, which contains the type of record requested, can provide useful context when searching out malicious activity. For example, text (TXT) records are frequently used for command-and-control (C2) attacks as well as for DNS tunneling.
A DNS message has numerous header flags (AA, TC, RD/RA, Z) that indicate things like whether the query is recursive, DNSSEC status, and more. These flags can provide important context for the DNS request, such as whether or not the requested DNS record comes from its authoritative nameserver, whether or not recursion was requested and available, and whether the data has been modified (as surfaced by DNSSEC in the Z field of your logs).
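
The signals listed above can be combined into a simple per-record check. The sketch below is one possible way to do it, assuming each record is a dictionary keyed by the field names from the table; the threshold, the set of record types, the label strings, and the blocklist contents are placeholders you would tune or replace.

```python
SUSPICIOUS_QTYPES = {"TXT"}   # assumption: record types that deserve a closer look
RTT_THRESHOLD_S = 0.5         # assumption: what counts as an "elevated" rtt, in seconds


def signals(record: dict, blocklist: set) -> list:
    """Return a label for each signal present in one DNS log record."""
    found = []
    if record.get("rtt", 0.0) > RTT_THRESHOLD_S:
        found.append("elevated_rtt")
    if record.get("query", "").lower() in blocklist:
        found.append("blocklisted_domain")
    if record.get("rcode") == 2:              # SERVFAIL
        found.append("servfail")
    if record.get("qtype_name") in SUSPICIOUS_QTYPES:
        found.append("suspicious_qtype")
    if any(answer in blocklist for answer in record.get("answers", [])):
        found.append("blocklisted_answer")
    return found
```

A record with a slow response, a blocklisted domain, an `rcode` of 2, and a TXT query type would return four labels, while an ordinary A-record lookup returns an empty list.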

Identify and mitigate DNS attacks

DNS logs can be an invaluable resource for discovering and addressing security issues in your system. We'll cover a number of common tactics employed by attackers and how you can use DNS logs to root them out.

Browser hijacking occurs when a bad actor redirects users to malicious websites by altering DNS resolutions. To stay ahead of this type of attack, you can monitor DNS logs for unusual patterns or spikes in requests. For example, a large number of requests to unfamiliar or suspicious domains could indicate a browser hijacking attempt.

A spike in requests to unfamiliar domains could be a sign of a browser hijacking attempt.

Additionally, you can identify redirections by monitoring DNS logs for inconsistencies between requested domains and resolved IP addresses.
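
One way to check for such inconsistencies is to compare each answer against the address ranges you expect a domain to resolve to. The sketch below assumes you maintain that baseline yourself; the `EXPECTED_NETWORKS` mapping, its contents, and the function name are hypothetical.

```python
import ipaddress

# Hypothetical baseline: the networks each domain is expected to resolve into.
EXPECTED_NETWORKS = {"example.com": ["203.0.113.0/24"]}


def unexpected_answers(records, expected=EXPECTED_NETWORKS):
    """Return (query, answer) pairs whose IP falls outside the expected networks."""
    mismatches = []
    for rec in records:
        cidrs = expected.get(rec["query"])
        if cidrs is None:
            continue  # no baseline for this domain, so nothing to compare against
        networks = [ipaddress.ip_network(cidr) for cidr in cidrs]
        for answer in rec.get("answers", []):
            try:
                ip = ipaddress.ip_address(answer)
            except ValueError:
                continue  # the answer is not an IP address (e.g., a CNAME target)
            if not any(ip in net for net in networks):
                mismatches.append((rec["query"], answer))
    return mismatches
```

A mismatch is only a lead, not proof of hijacking: providers that use CDNs or rotate addresses will legitimately resolve outside a narrow baseline.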

Another common type of malicious technique is DNS tunneling, where attackers bypass security protocols by sending data disguised in a DNS packet to a malicious server. You can root out DNS tunneling by monitoring DNS logs for suspicious domains (such as ones identified by threat intelligence, or unexpected or unusual TLDs issuing high volumes of requests). Another thing to keep an eye out for is long query lengths, because attackers often encode data in the queried name itself. A DNS request’s domain name can only be 253 characters, so each request can carry only a small amount of data, and an attacker will probably need to send out a lot of malicious DNS requests to carry out DNS tunneling. This means that an increase in DNS traffic may be a sign that DNS tunneling is occurring. Finally, long TXT values in the request or response can be a sign of DNS tunneling, so you should be on the lookout for them.
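
The tunneling indicators above (long queries, long TXT values, and many requests) can be sketched as two small checks. The length thresholds are assumptions, and counting distinct subdomains per parent domain is one illustrative way to spot a burst of requests; the parent domain is approximated naively as the last two labels.

```python
from collections import defaultdict

LONG_QUERY_LEN = 100   # assumption: queries longer than this deserve a look
LONG_TXT_LEN = 200     # assumption: TXT answers longer than this deserve a look


def tunneling_indicators(record: dict) -> list:
    """Return labels for tunneling-style indicators in one DNS log record."""
    found = []
    if len(record.get("query", "")) > LONG_QUERY_LEN:
        found.append("long_query")
    if record.get("qtype_name") == "TXT" and any(
        len(answer) > LONG_TXT_LEN for answer in record.get("answers", [])
    ):
        found.append("long_txt_answer")
    return found


def unique_subdomains(records) -> dict:
    """Count distinct subdomains seen under each parent domain (last two labels)."""
    seen = defaultdict(set)
    for rec in records:
        labels = rec.get("query", "").rstrip(".").split(".")
        if len(labels) < 3:
            continue  # no subdomain part to inspect
        parent = ".".join(labels[-2:])
        seen[parent].add(".".join(labels[:-2]))
    return {parent: len(subs) for parent, subs in seen.items()}
```

A parent domain with thousands of distinct, long subdomains in a short period is worth investigating, although some legitimate services also generate many unique names.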

Unusual qtype_name record types (e.g., TXT records) can be a sign of Denial of Service (DoS) attacks, which occur when malicious actors prevent legitimate users from accessing a site. This is usually done by flooding that particular site with a multitude of illegitimate information requests. To detect DoS attacks early, you should monitor your logs for a spike in DNS query rate (an indicator of an attacker's attempt to flood DNS servers with a high volume of queries). Additionally, it's a good idea to look out for a spike in repeated queries for the same domain, as this can be a sign of malicious actors flooding a target domain with requests. Finally, you should monitor DNS server response times because elevated response times can be an indication of flooded DNS servers, and thus of DoS attacks.
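
A spike in query rate or in repeated queries can be found with simple counting. The sketch below buckets records into fixed time windows and flags windows that are well above the median; the window size, the multiplier, and the function names are assumptions, and `ts` is assumed to be a Unix timestamp in seconds.

```python
import statistics
from collections import Counter


def queries_per_window(records, window_s=60) -> Counter:
    """Count queries in fixed windows, keyed by each window's start time."""
    counts = Counter()
    for rec in records:
        counts[int(rec["ts"] // window_s) * window_s] += 1
    return counts


def spike_windows(counts, factor=3.0) -> list:
    """Return the windows whose count exceeds `factor` times the median count."""
    if not counts:
        return []
    baseline = statistics.median(counts.values())
    return sorted(w for w, c in counts.items() if c > factor * baseline)


def top_repeated_queries(records, n=3) -> list:
    """Return the n most frequently queried domains with their counts."""
    return Counter(rec["query"] for rec in records).most_common(n)
```

Comparing against the median keeps a single busy window from inflating the baseline. A production detector would likely also account for daily traffic patterns.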

Command-and-control (C2) is a cyberattack technique typically associated with malware, where an attacker establishes a means to communicate between compromised machines and an attacker-owned C2 server. To stay ahead of C2 attacks, you should monitor your DNS logs for requests to suspicious or newly registered domains, which may look randomly generated. Additionally, you should be on the lookout for long-lived DNS sessions, as these can be a sign of maintained, persistent communication with the C2 server.
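
One common heuristic for spotting domains that look randomly generated is character entropy. The sketch below computes the Shannon entropy of the longest label in a domain and flags labels that are both long and high in entropy; the threshold, the minimum length, and the function names are assumptions, and the heuristic will produce false positives and false negatives.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(text).values()
    )


def looks_random(domain: str, threshold: float = 3.5, min_len: int = 10) -> bool:
    """Flag a domain whose longest label is long and has high character entropy."""
    label = max(domain.rstrip(".").split("."), key=len)
    return len(label) >= min_len and shannon_entropy(label) > threshold
```

A label such as `xk7q9zp2vw4m` has twelve distinct characters and scores above the threshold, while an ordinary word of similar length repeats letters and scores lower. Treat a hit as a reason to check the domain's registration date and reputation, not as a verdict.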

Typosquatting attacks involve registering domain names that closely resemble legitimate ones in order to deceive users. You can stay ahead of typosquatting attacks by monitoring your DNS logs for variations of well-known domain names, as well as for requests to newly registered domains.
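
Variations of well-known domain names can be found by measuring edit distance. The sketch below uses the standard Levenshtein distance and reports which known domains a queried name nearly matches; the distance cutoff, the function names, and the list of known domains are assumptions you would supply.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the characters match)
            ))
        prev = curr
    return prev[-1]


def possible_typosquats(query: str, known_domains, max_distance: int = 2) -> list:
    """Return the known domains that `query` closely resembles without matching exactly."""
    query = query.lower()
    return [d for d in known_domains if 0 < edit_distance(query, d) <= max_distance]
```

For example, `datad0ghq.com` is one substitution away from `datadoghq.com`, so it would be reported, while an exact match is not. Edit distance does not catch every trick (such as lookalike Unicode characters), so it works best alongside checks for newly registered domains.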

Troubleshoot network health issues with DNS logs

While DNS logs can be useful for detecting security breaches, they can also be used to diagnose and troubleshoot network performance issues in your system. We'll go over some common issues that can arise with network connectivity, and how you can use DNS logs to troubleshoot.

A timeout occurs when DNS does not respond within the configured timeout period. This likely indicates either that the target DNS server is unreachable or slow, or that there is a network issue. To troubleshoot, you can check the rtt in the DNS logs to determine whether the timeouts are occurring during the DNS resolution process, as this would suggest that there is a problem with the DNS server. Additionally, inconsistencies in DNS response times could indicate spikes in network congestion. Alternatively, you can check to see if your system is experiencing a surge in DNS queries, as this could put a strain on the DNS servers, leading to an increase in timeouts.
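
A quick summary of rtt values can show whether slow or missing DNS responses line up with the timeouts you're seeing. The sketch below assumes that a record with no `rtt` value means no reply was observed, and that `slow_s` is a threshold you choose; both are assumptions, as is the function name.

```python
import statistics


def rtt_summary(records, slow_s: float = 0.5) -> dict:
    """Summarize response times for a batch of DNS log records."""
    rtts = sorted(rec["rtt"] for rec in records if rec.get("rtt") is not None)
    unanswered = sum(1 for rec in records if rec.get("rtt") is None)
    # Nearest-rank 95th percentile, using integer arithmetic for the rank.
    p95 = rtts[(95 * len(rtts) + 99) // 100 - 1] if rtts else None
    return {
        "answered": len(rtts),
        "unanswered": unanswered,
        "median_rtt": statistics.median(rtts) if rtts else None,
        "p95_rtt": p95,
        "slow": sum(1 for rtt in rtts if rtt > slow_s),
    }
```

Running this per time window, or per DNS server, makes it easier to tell a single struggling server apart from network-wide congestion.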

SERVFAIL errors occur when the authoritative server isn't returning a valid response, which could be due to misconfiguration or a network connectivity issue. To troubleshoot, first filter for DNS logs with rcode:2 (indicating a SERVFAIL error). Then, search this subset of your DNS logs for common query patterns, such as common domain names, record types, or source IP addresses. These context clues can help you pinpoint the nature of the issue. For example, if the common denominator is the source IP address, you might infer that there is an issue with the DNS server. Based on the insights you're able to surface from your DNS logs, you can evaluate your DNS environment for misconfigurations, zone transfer issues, DNSSEC validation failures, and lack of resources.
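
The filter-then-group workflow above can be sketched in a few lines. The example assumes each record is a dictionary with the fields from the table plus a source address under the hypothetical key `src_ip`; your logs may name that field differently.

```python
from collections import Counter


def servfail_breakdown(records, top_n: int = 3) -> dict:
    """Filter for SERVFAIL (rcode 2) and show the most common values per dimension."""
    failures = [rec for rec in records if rec.get("rcode") == 2]
    return {
        "total": len(failures),
        "by_query": Counter(rec.get("query") for rec in failures).most_common(top_n),
        "by_qtype": Counter(rec.get("qtype_name") for rec in failures).most_common(top_n),
        # "src_ip" is an assumed field name for the source IP address.
        "by_source_ip": Counter(rec.get("src_ip") for rec in failures).most_common(top_n),
    }
```

If one value dominates a single dimension (for instance, one source IP accounts for nearly every failure), that dimension is a good place to start the investigation.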

Monitor DNS logs with Datadog

The best way to monitor your DNS logs is with an all-encompassing monitoring platform that combines your DNS logs with all the rest of your monitoring data from your distributed system.

Datadog integrates with a number of DNS providers, including Akamai, Cloudflare, Route 53, and Azure DNS. These integrations give you a unified view of your DNS logs across these providers.

Datadog also has a number of products that complement DNS log analysis, such as Logging without Limits™, which enables you to index only the logs that are of value to your investigations (e.g., DNS logs that meet the criteria discussed above). Pattern Inspector enables you to find patterns in your logs that may indicate suspicious activity.

In addition, out-of-the-box dashboards for DNS providers like Cloudflare provide high-level metrics such as response time, DNS query type, and top hostnames.

The out-of-the-box Cloudflare dashboard provides a high-level view of your DNS health.

You can also use Cloud SIEM analysis to troubleshoot security issues surfaced by DNS logs. For instance, you can use threat intelligence to find common attack indicators (suspicious/known malicious IP addresses, malware hashes, etc.). Once you have surfaced these indicators, you can search DNS logs for occurrences of them, then set custom alerts on these indicators and add these alerts to a custom dashboard.

Use Cloud SIEM to troubleshoot security issues surfaced by DNS logs.

Lastly, the DNS view in Cloud Network Monitoring (CNM) surfaces monitoring data from all of your DNS servers and managed services in one place, so you can analyze network-wide DNS performance without having to SSH into individual machines.

The DNS view in Datadog CNM allows you to monitor data from all your DNS servers in one place.

DNS is used for the vast majority of network requests, and most technology solutions use DNS to communicate between application services, servers, pods, and machines. As a result, DNS typically generates a high volume of logs. To manage this volume at scale, you can store and query DNS logs using Datadog Flex Logs, which enables you to enrich, parse, and archive 100 percent of your logs while storing only what you choose to.

Use DNS logs to troubleshoot network connectivity and security issues

DNS is integral to network communication, both across the public internet and within private infrastructure. Because of its centrality to network communication, DNS communication will generate a large volume of logs. It's important to familiarize yourself with the components of DNS logs in order to be able to leverage them for troubleshooting network connectivity issues and security issues alike.

In this post, we've shown you how to extract important insights from a DNS log, how to use this information to identify and mitigate DNS attacks, and how Datadog can make monitoring your DNS logs more impactful and efficient, including how Datadog Flex Logs provides a log management solution whose costs won’t multiply when your storage volumes do.

If you're new to Datadog and want to start easily monitoring your DNS logs from a single pane of glass, you can sign up for a 14-day free trial.
