
Logging without Limits™


Published: July 12, 2018

Connecting the dots

Hi, everyone.

It’s great to be here today.

So earlier this year, we introduced log management to Datadog, uniting it with the two other pillars of observability: metrics and APM.

It was built to deliver a seamless experience for DevOps engineers, so you can collect logs via the same Datadog Agent across servers, containers, and cloud services, with Datadog’s integrations that are now logs-aware.

But there was a bigger vision behind getting this processing right, and that was about connecting the dots.

Indeed, we were then able to build a user interface that makes it easy for you to pivot from metrics and traces to the relevant logs, so everyone gets more efficient during incidents.

So as illustrated here, from any monitoring dashboard widget, you are now able to jump to the most relevant logs in the Log Explorer and start troubleshooting and investigating.

This Log Explorer has been carefully designed, sticking to our simple, but not simplistic philosophy.

Easily understood by users, but extremely sophisticated, if necessary.

This helps to boost user adoption and overall efficiency in organizations.

While we continue to add many new features on top of this, I’m here to talk about one thing that everyone finds particularly painful in traditional log management.

Problems with traditional log management

We can all agree that logs are key.

Everyone uses them on a daily basis.

They allow you to troubleshoot and debug in real time, but more generally, they represent the history of your company’s operations.

So you can rely on them for any kind of audit—security, usage, technical, you name it.

However, usually we cannot log as much as we would like.

And there are a few reasons for that.

One, the sheer quantity of logs, as all your services, servers, frontend clients, and IoT devices are logging massively.

Two, large variations in volume, because of the seasonality of your business, some of your new applications, or of course, the incidents you might have.

But probably more importantly, their value varies by source, audience, and time.

Whether you belong to a security team, a support team, or a development team, you are going to analyze and interpret different logs in different ways depending on the situations you face.

So generally, we observe that everyone has the same question.

How do I choose which logs to collect, and which ones to drop?

And filtering at the server level is a real pain in the neck.

Not to mention that someone is always going to lose valuable information at some point.

Logging solutions today don’t try to understand the struggle.

You need to provision and pay for everything based on a daily volume, which requires cumbersome cost control, constantly asking teams to reduce logs.

We face the same issue ourselves using logs.

And we decided that it was too big of a problem to leave alone.

We wanted to rethink how we ingest and serve logs.

A new approach to log management

At Datadog, we always say that collecting data is easy.

But not having it in difficult moments is extremely expensive.

It should be true for logs as well.

So today, I am very excited to share Datadog’s new approach to Logging without Limits™.

Logging without Limits™ means that you no longer have to choose what to collect and what to ignore.

A hundred gigabytes, a terabyte, tens of terabytes per day: everything is ingested and processed.

And we let you decide afterwards what you do with it.

That has been made possible because we decoupled ingestion and indexing.

So you can ingest at 10 cents per gigabyte, a price designed to be so affordable that you can now send all your logs.

And then you can selectively index and retain your logs with surgical precision.

You get all the troubleshooting and analytics that you need for your daily operations.

But that’s not all.

We give you two additional key features on top of all this ingested data.

The first is archives to the storage of your choice.

We have put a lot of work into building archive mechanisms able to handle these volumes, so all of your organization’s history is now stored.

And you can always get back to it when you need it the most.

The second one is Live Tail. Whether you choose to index your logs or not, you can still observe everything that is happening in real time.

No need to SSH into your hosts anymore to observe a deployment, watch a user clicking somewhere, or tail your new application; it really depends on your needs.

Logging without Limits™ in action

Let me now illustrate this on a quick demo.

So I’m a Java developer, and I want to see what I archive, and what I index.

So I’m gonna go into the pipelines view here, and I’m gonna skip past the processing part of it and go straight to the index.

So here I can see two exclusion filters.

And the first one is filtering out the debug logs.

Okay.
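For readers who prefer to manage this outside the UI, here is a minimal sketch of what such an exclusion filter could look like through the Logs Indexes API. The index name, filter name, and the `status:debug` query below are illustrative assumptions, not the exact setup shown in the demo.

```python
import os

import requests

# Hypothetical sketch: define an exclusion filter on the "main" index that
# drops debug logs from indexing (they are still ingested, archived, and
# visible in Live Tail).
headers = {
    "DD-API-KEY": os.environ["DD_API_KEY"],
    "DD-APPLICATION-KEY": os.environ["DD_APP_KEY"],
}

index = {
    "filter": {"query": "*"},  # the index itself still accepts everything
    "exclusion_filters": [
        {
            "name": "Drop debug logs",
            "is_enabled": True,  # set to False to index debug logs again
            "filter": {"query": "status:debug", "sample_rate": 1.0},  # exclude 100% of matches
        }
    ],
}

response = requests.put(
    "https://api.datadoghq.com/api/v1/logs/config/indexes/main",
    headers=headers,
    json=index,
)
response.raise_for_status()
```

Toggling `is_enabled` is the API equivalent of the one-click switch demonstrated next.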

So let’s say now I have an incident.

Everybody is stressed out.

I just have to click here, that’s super easy.

And I start indexing the debug logs again, so I have the maximum visibility that I need to troubleshoot.

Now if I click again, everything is back to normal.

Let’s now focus on the second one.

I have a web application server here.

I have tons of requests hitting it.

And it doesn’t bring a lot of value to me.

So instead of filtering everything, I’m going to actually keep 10%.

Why?

Because I want to keep the trends.

I want to understand what’s happening, but I don’t want to see everything.

And that’s how I do it.
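As a rough sketch, the same exclusion-filter mechanism can express this sampling: a `sample_rate` of 0.9 excludes 90% of the matching requests, so roughly 10% remain indexed. The query below is an illustrative assumption about how these web access logs are tagged.

```python
# Hypothetical second exclusion filter: keep ~10% of the noisy web access logs
# so the trends stay visible without indexing every single request.
sampling_filter = {
    "name": "Sample web access logs",
    "is_enabled": True,
    "filter": {
        "query": "source:nginx status:info",  # assumed tagging of the access logs
        "sample_rate": 0.9,  # drop 90% of matching logs, index the remaining ~10%
    },
}

# This dictionary would be appended to the "exclusion_filters" list in the
# index definition from the earlier sketch.
```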

So, as I said, you can do everything that you want with the ingested logs, and you decide later.

So here, I decided to archive everything to the S3 bucket of my choice, in the directory of my choice.

And that’s how I do it.
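The demo configures this in the UI; as a rough sketch only, an equivalent archive definition could be submitted through the Logs Archives API. The bucket, path, and AWS integration values below are placeholders, and the payload shape is an assumption.

```python
import os

import requests

headers = {
    "DD-API-KEY": os.environ["DD_API_KEY"],
    "DD-APPLICATION-KEY": os.environ["DD_APP_KEY"],
}

# Hypothetical archive: write every ingested log to an S3 bucket and directory.
archive = {
    "data": {
        "type": "archives",
        "attributes": {
            "name": "All ingested logs",
            "query": "*",  # archive everything that is ingested
            "destination": {
                "type": "s3",
                "bucket": "my-log-archive-bucket",  # placeholder bucket name
                "path": "/datadog/archives",  # placeholder directory
                "integration": {
                    "account_id": "123456789012",  # placeholder AWS account
                    "role_name": "DatadogIntegrationRole",  # placeholder IAM role
                },
            },
        },
    }
}

response = requests.post(
    "https://api.datadoghq.com/api/v2/logs/config/archives",
    headers=headers,
    json=archive,
)
response.raise_for_status()
```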

Live Tail all your logs in real time

So now I’m gonna move into the Live Tail to see all the ingested logs in real time.

Let’s do it.

Here it is.

I have tons of them.

So I’m going to filter.

As you can see, I have fewer of them, and I’m going to pause the stream to look at it.

So as we can see here, the logs are very rich, and I’m going to show you.

Why?

Because they come in after the processing, after the pipelines.

So if I click on one log here, I can now see everything: the tags, the attributes that I have extracted and parsed, the logger name, the server name, and I have the same kind of actions that I have in the Log Explorer.

So now if I want to, for instance, focus only on the GET method on the requests here, that’s how I do it.

And then the Live Tail starts again, and I can see it.

I have tons of api/health requests here that don’t bring a lot of value, so I’m going to remove them, like this.

And I have some again, so I’m going to remove those as well.

Okay?

And now what is very interesting is that, thanks to this very powerful filtering capability of the stream, I now have something that I can analyze.

And that’s only the remaining GET requests here.
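Put together, the refinement above boils down to a single Live Tail search query, built up step by step. The facet names here are illustrative assumptions about how these access logs were parsed, not the exact attributes from the demo:

```
service:web-app
service:web-app @http.method:GET
service:web-app @http.method:GET -@http.url_details.path:/api/health
```

Each line narrows the previous one: first the service, then only GET requests, then everything except the health checks.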

That’s it.

Start Logging without Limits™

So I’m very happy today to say that Logging without Limits™ is generally available.

So if you go to the user interface, you’re gonna see it.

With that you’re gonna ingest, process, and Live Tail all of your logs.

You’re gonna archive to the storage of your choice.

And you’re gonna dynamically choose what to retain and what to index.

And so now, as you can imagine, we just opened the gates.

And I can tell you that we’re already working on new ways for you to leverage these new capabilities.

And it’s just getting started.