Datadog guide to AWS re:Invent 2019

Jason Yee

AWS re:Invent is an annual gathering of tens of thousands of AWS staff, partners, and users for a full week of keynote sessions, feature announcements, customer case studies, hands-on workshops, and more. As in years past, we will be there with dozens of engineers, ready to answer your monitoring questions and show you the newest additions to Datadog.

Learn more & be inspired

The sponsor hall is a great place to learn more about the newest AWS features and partner products, and the conference sessions can be educational and inspiring. AWS re:Invent hosts hundreds of sessions where you can hear about the challenges other organizations have faced and how they’ve solved them.

With so many sessions available, how do you know which ones to attend? Below, we’ve compiled a list of interesting sessions that we’re excited to watch:

Using deep learning to track wildfires and air quality (AIM329)

ALERTWildfire is a camera-based network infrastructure that captures satellite imagery of wildfires. In this chalk talk, we discuss deep-learning techniques that use this satellite imagery along with meteorological data to track wildfires and predict air quality in real time.

How to refactor a monolith to serverless in 8 steps (API310)

Refactoring a monolith to serverless can be intimidating, but there are discrete steps that you can take to simplify the process. In this chalk talk, we outline eight steps for successfully refactoring your monolith and highlight key decision points such as language and tooling choices. Through real-world examples of successful migrations, we uncover common mistakes, useful techniques for identifying components for migration and service boundaries, and processes for migrating large amounts of data without downtime. Bring your refactoring challenges to this interactive session to see how these techniques can be applied in the context of your own application.

Scaling to billions of requests the serverless way at Capital One (DEM34-S)

Stream processing tools like Apache Spark and Flink are the default choice for big data processing, but these frameworks also come with high development and operation costs. Serverless streaming architecture is an alternative solution that brings significant reduction in these costs and allows developers to focus on business delivery, not infrastructure management. This session explores how Capital One used serverless streaming architecture to provide real-time insights for millions of customers through its intelligent assistant Eno. Learn how high-throughput streaming loads can be handled with ease as well as how message-driven architecture can be implemented using Amazon API Gateway, AWS Lambda, and Amazon Kinesis for complex asynchronous applications.

Performing chaos engineering in a serverless world (CMY301)

The principles of chaos engineering have been battle-tested for years using traditional infrastructure and containerized microservices. But how do they work with serverless functions and managed services? In this session, we cover the motivations behind chaos engineering, how we perform chaos experiments, and what some of the common weaknesses are that we can test for in our serverless applications. We also run some actual experiments in a serverless AWS environment. Join us as we move from talking about principles to performing real chaos-engineering experiments for serverless.

BPF performance analysis (OPN303-R)

Extended BPF (eBPF) is an open-source Linux technology that powers a whole new class of software: mini programs that run on events. Among its many uses, BPF can be used to create powerful performance-analysis tools capable of analyzing everything: CPUs, memory, disks, file systems, networking, languages, applications, and more. In this session, Netflix’s Brendan Gregg tours BPF tracing capabilities, including many new open-source performance analysis tools he developed for his new book “BPF Performance Tools: Linux System and Application Observability.” The talk also includes examples of using these tools in the Amazon Elastic Compute Cloud (Amazon EC2) cloud.

The eBPF session is particularly interesting to us, as we’re using eBPF to power our new Network Performance Monitoring tool.

Sessions with Datadog

By taking care of the servers and monitoring infrastructure for you, AWS and Datadog enable you to focus on solving important business problems. Our customers are solving some truly interesting challenges, and we love helping them share their stories. Here are the sessions we’ve partnered on:

More than rubber on the road: Tires in an IoT world (IOT204-S)

Pirelli is known for creating cutting-edge, high-quality tires with a focus on the performance needs of both high-end consumer drivers and professional drivers. But today, Pirelli tires are much more than rubber on the road—they are connected IoT devices reporting telemetry data to help drivers achieve their safety and performance goals. Between this telemetry function and Pirelli’s other applications, a huge amount of data flows into Pirelli’s systems. Ensuring that these platforms are scalable and reliable is Pirelli’s biggest challenge. In this session, Pirelli shares how these systems are built using AWS and are made possible by modern observability tooling.

Breaking the monolith with style and speed (DOP206-S)

Microservices are here to stay, but nearly all of the most successful architectures originate from the classic monolith. The promised land of microservices is filled with treasures like decoupled deploys, scalability, resilience, development velocity, and more. However, the journey there can involve prolonged seasons of pain, suffering, and even regret. This talk is the story of how Stitch Fix used all three pillars of observability to build confidence, accelerate its migration, and collaborate with other teams. Learn about the strategies that Stitch Fix used and how it incorporated logs, metrics, and traces into these strategies.

Powering digital billboards with serverless (SVS209-S)

Digital billboards are everywhere, from buildings to signs to transit stops. Place Exchange, a prominent auction platform for digital billboards, runs over 50,000 concurrent auctions 24/7 for placements on connected billboards in the world’s largest cities. In this talk, the Place Exchange team shares the challenges of managing, monitoring, and scaling a hybrid environment of edge devices all powered by a 100 percent serverless auction platform.

How Auto Scaling lets Braze efficiently send 2B+ messages per day (STP09)

Braze, a digital customer engagement platform, currently scales more than 10,000 servers automatically each week and relies on Amazon EC2 Auto Scaling groups to cost-effectively handle spikes in data and messaging traffic. In this talk, Braze’s CTO and co-founder, Jon Hyman, discusses Braze’s system architecture for managing Auto Scaling, which isolates Braze’s customer base into separate “clusters” that are each tied to multiple EC2 Auto Scaling groups. Hyman also shares some lessons learned as Braze has grown over the past eight years.

See you at AWS re:Invent

There are hundreds more sessions available, and these are just a few that we think you’ll find interesting. Take a look at the full schedule and you’re sure to find many more that pique your interest. And remember to stop by our booths in the Aria and the Venetian to say hello!

