Datadog Guide to AWS re:Invent 2015

Author Ilan Rabinovitch

Published: September 30, 2015

With a week to go until AWS re:Invent and with over 380 events in the session catalog, navigating the schedule can be a bit daunting. But there’s no need to fear: Datadog is here with our session picks for the 2015 AWS re:Invent conference.

But don’t spend all your time at sessions; some of the most interesting advances will be at vendor booths. For example, if you stop by the Datadog booth (which, trust us, you can’t miss), we’ll give you a live demo of our newly announced outlier detection features and new AWS integrations such as EC2 Container Service and DynamoDB. We also invite you to join us at Aquaknox on Wednesday night during the AWS Pub Crawl.

Sessions by Datadog

DVO 205 - Monitoring Evolution

Wednesday, Oct 7, 11:00 AM – Delfino 4005

Learn how the AdRoll team became a data-driven organization using Datadog, EC2’s dynamic infrastructure, and related tooling. Speakers: Ilan Rabinovitch (Dir, Technical Community @ Datadog) and Brian Troutwine (Sr. Software Engineer @ AdRoll).

DVO 204 - Monitoring Strategies: Finding Signal in the Noise

Thursday, Oct 8, 11:00 AM – Murano 3305

Avoid pager fatigue with a framework for identifying what to monitor, what to alert on, and resources that will help you perform root-cause analysis. Speaker: our very own Matt Williams.

Our Schedule

So, where will you find Datadog team members when we are not at our booth? Here is our personal re:Invent schedule:

Wednesday

DVO 205 - Monitoring Evolution

Wednesday, Oct 7, 11:00 AM – Delfino 4005

Today, AdRoll runs its infrastructure by instrumentation: constantly asking empirical questions, analyzing data for answers, and designing new features with instrumentation in mind to understand how functionality will work upon release. AdRoll’s development methodology did not start out this way, however. It took a cultural shift and many new tools and processes to adopt this approach. In this session, AdRoll and Datadog will discuss how to evolve your organization from a state of “flying blind” to a culture focused on monitoring and data-based decisions.

DVO202 - DevOps at Amazon: A Look at Our Tools and Processes

Wednesday, Oct 7, 12:15 PM – Venetian H

As software teams transition to cloud-based architectures and adopt more agile processes, the tools they need to support their development cycles will change. In this session, we’ll take you through the transition that Amazon made to a service-oriented architecture over a decade ago. We will share the lessons we learned, the processes we adopted, and the tools we built to increase both our agility and reliability. We will also introduce you to AWS CodeCommit, AWS CodePipeline, and AWS CodeDeploy, three new services born out of Amazon’s internal DevOps experience.

CMP401 - Elastic Load Balancing Deep Dive and Best Practices

Wednesday, Oct 7, 1:30 PM – Palazzo N

Elastic Load Balancing automatically distributes incoming application traffic across multiple Amazon EC2 instances for fault tolerance and load distribution. In this session, we go into detail about Elastic Load Balancing’s configuration and day-to-day management, as well as its use in conjunction with Auto Scaling. We explain how to make decisions about the service and share best practices and useful tips for success.

CMP307 - Using Spot Instances for Production Workloads

Wednesday, Oct 7, 2:45 PM – San Polo 3506

Spot instances have come a long way since they were first introduced. Leveraging multiple Auto Scaling groups along with AWS functionality enhancements, you can even use them effectively for real-time production workloads. The higher the flexibility of the workload, the greater the cost savings are in comparison to a conventional combination of Reserved and On-Demand instances. Join us in this session to explore these techniques along with configuration approaches that allow you to tune the risk/reward balance.

DVO203 - A Day in the Life of a Netflix Engineer Using 37% of the Internet

Wednesday, Oct 7, 4:15 PM – Venetian H

Netflix is a large and ever-changing ecosystem made up of hundreds of production changes every hour, thousands of microservices, tens of thousands of instances, millions of concurrent customers, and billions of metrics every minute. And I’m the guy with the pager. This is an in-the-trenches look at what operating at Netflix scale in the cloud is really like. It covers how Netflix views the velocity of innovation, expected failures, high availability, engineer responsibility, and obsessing over the quality of the customer experience. It also explains why freedom and responsibility are key, trust is required, and chaos is your friend.

GEN102 - Pub Crawl

Wednesday, Oct 7, 5:30 PM–7:30 PM – Aquaknox

Join AWS and its sponsors at the best clubs and restaurants in the Venetian and Palazzo. Hosted bar and appetizers on the house.

Thursday

DVO 204 - Monitoring Strategies: Finding Signal in the Noise

Thursday, Oct 8, 11:00 AM – Murano 3305

You need to monitor only a few machines and applications before fixing issues in your environment becomes very complicated. Throw in the type of dynamic infrastructure provided by Amazon EC2, and your static monitoring strategies will most likely not scale. Knowing which metrics to watch and how to troubleshoot based on those metrics will help you solve problems more quickly. In this session, we will look at a framework for your metrics and how to use it to find solutions to the issues that come up. We will cover the three types of monitoring data; what to collect; what should trigger an alert (avoiding an alert storm); and how to follow the resources to find the root causes of problems.

CMP406 - Amazon ECS at Coursera: Modifying the ECS Agent for Production

Thursday, Oct 8, 1:30 PM – San Polo 3506

Come see how Coursera modified the Amazon EC2 Container Service (Amazon ECS) Agent to fit complex post-processing and security requirements. Anyone can sign up for a course on Coursera for free, so our offerings require that we defend against arbitrary code execution within our containers. To do so, we adopt a defense-in-depth approach where multiple layers of security are used to prevent bad actors from abusing the system. As part of this approach, we have modified the Amazon ECS Agent to run untrusted code within the Docker containers to grade programming assignment submissions from users around the world. We also modified the ECS agent in a separate fork to support running Docker within Docker, which we do to post-process uploaded grading templates from instructional teams to harden them against other attacks and prepare them for execution within our hardened grading environment. In this session, we outline our approach, providing ideas for your own post-processing and hardening requirements.

CMP302 - Amazon EC2 Container Service: Distributed Applications at Scale

Thursday, Oct 8, 2:45 PM – Venetian H

In recent years, containers have become a key component of modern application design. Increasingly, developers are breaking their applications apart into smaller components and distributing them across a pool of compute resources. It is relatively easy to run a few containers on your laptop, but building and maintaining an entire infrastructure to run and manage distributed applications is hard and requires a lot of undifferentiated heavy lifting. In this session, we discuss some of the core architectural principles underlying Amazon ECS, a highly scalable, high-performance service to run and manage distributed applications using the Docker container engine. We walk through a number of patterns used by our customers to run their microservices platforms, to run batch jobs, and for deployments and continuous integration. We explore the advanced scheduling capabilities of Amazon ECS and dive deep into the Amazon ECS Service Scheduler, which optimizes for long-running applications by monitoring container health, restarting failed containers, and load balancing across containers.

BDT403 - Best Practices for Building Real-time Streaming Applications with Amazon Kinesis

Thursday, Oct 8, 4:15 PM – Palazzo F

Amazon Kinesis is a fully managed, cloud-based service for real-time data processing over large, distributed data streams. Customers who use Amazon Kinesis can continuously capture and process real-time data such as website clickstreams, financial transactions, social media feeds, IT logs, location-tracking events, and more. In this session, we first focus on building a scalable, durable streaming data ingest workflow, from data producers like mobile devices, servers, or even a web browser, using the right tool for the right job. Then, we cover code design that minimizes duplicates and achieves exactly-once processing semantics in your elastic stream-processing application, built with the Kinesis Client Library. Attend this session to learn best practices for building a real-time streaming data architecture with Amazon Kinesis, and get answers to technical questions frequently asked by those starting to process streaming events.
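To make the "minimizes duplicates" idea above a little more concrete, here is a minimal sketch of one common approach: remember the highest sequence number already applied per shard and skip anything at or below it. This is not taken from the session or from the Kinesis Client Library; the Record and IdempotentProcessor names are hypothetical, and the sketch assumes records arrive in order within a shard and that the side effect and the checkpoint can be committed together.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    # Hypothetical stand-in for a stream record; not a Kinesis Client Library type.
    shard_id: str
    sequence_number: int
    payload: str


@dataclass
class IdempotentProcessor:
    # Highest sequence number already applied, keyed by shard (the "checkpoint").
    checkpoints: dict = field(default_factory=dict)
    # Side effects applied so far; a real application would write to a durable store.
    applied: list = field(default_factory=list)

    def process(self, record: Record) -> bool:
        """Apply the record once; return False if it was a redelivered duplicate."""
        last = self.checkpoints.get(record.shard_id, -1)
        if record.sequence_number <= last:
            return False  # already applied, so skip it
        self.applied.append(record.payload)
        self.checkpoints[record.shard_id] = record.sequence_number
        return True


processor = IdempotentProcessor()
batch = [
    Record("shard-0", 1, "click:a"),
    Record("shard-0", 2, "click:b"),
    Record("shard-0", 2, "click:b"),  # simulated redelivery after a retry
    Record("shard-1", 1, "click:c"),
]
results = [processor.process(r) for r in batch]
print(results, processor.applied)
```

In this toy run the redelivered record is skipped, so each payload is applied once. In a real consumer the checkpoint would need to live in durable storage and be updated atomically with the side effect for the guarantee to hold across restarts.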

Friday

ARC302 - Running Lean Architectures: How to Optimize for Cost Efficiency

Friday, Oct 9, 9:00 AM – Palazzo K

DAT304 - Amazon RDS MySQL: Best Practices

Friday, Oct 9, 10:15 AM – Delfino 4102

Learn how to monitor your database performance closely and troubleshoot database issues quickly using a variety of features provided by Amazon RDS and MySQL including database events, logs, and engine-specific features. You will also learn about the security best practices to use with Amazon RDS for MySQL as well as how to effectively move data between Amazon RDS and on-premises instances. Hear from Amazon RDS customer Airbnb about the best practices they have implemented in their RDS for MySQL architectures.

CMP310 - Building Robust Data Processing Pipelines Using Containers and Spot Instances

Friday, Oct 9, 11:30 AM – San Polo 3506

It’s difficult to find off-the-shelf, open-source solutions for creating lean, simple, and language-agnostic data-processing pipelines for machine learning (ML). This session shows you how to use Amazon S3, Docker, Amazon EC2, Auto Scaling, and a number of open-source libraries as cornerstones to build one. We also share our experience creating elastically scalable and robust ML infrastructure leveraging the Spot instance market.

Sessions by Datadog Customers

AdRoll

DVO 205 - Monitoring Evolution (Brian Troutwine)

Wednesday, Oct 7, 11:00 AM – Delfino 4005

CMP310 - Building Robust Data Processing Pipelines Using Containers and Spot Instances (Oleg Avdeev)

Friday, Oct 9, 11:30 AM – San Polo 3506

Coursera

CMP406 - Amazon ECS at Coursera: Modifying the ECS Agent for Production

Thursday, Oct 8, 1:30 PM – San Polo 3506

Team Internet

ARC302 - Running Lean Architectures: How to Optimize for Cost Efficiency

Friday, Oct 9, 9:00 AM – Palazzo K