Datadog's KubeCon 2020 Guide

Learning at KubeCon

With so many keynotes, announcements, and sessions to check out, KubeCon looks set to be another great learning experience this year. Below, we’ve compiled a list of sessions that we’re planning to attend, along with snippets of their abstracts and our take on what we find particularly interesting. You can also view the full schedule here.

Static Analysis of Kubernetes Manifests - Barak Schoster, Bridgecrew

Wednesday, November 18
3:00–3:35 p.m. EST

Planning, provisioning, and changing infrastructure are becoming vital to rapid cloud application development. Incorporating infrastructure as code into software development promotes transparency and immutability and helps prevent bad configurations upstream. In this talk, we’ll cover best practices for writing, testing, and maintaining infrastructure at scale using policy-as-code both in CI/CD and Kubernetes cluster runtimes. We’ll compare the two methods and review sample use cases that showcase the benefits of each. In addition, we’ll cover the current state of open source repositories and Kubernetes manifests found in the wild.

Datadog’s take: Having a good observability strategy is critical for catching potential issues in your production clusters, but ideally we should also detect at least some of those errors before they even get to production. We are looking forward to learning how static analysis of Kubernetes manifests can help users identify these issues more proactively.
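To make the idea of policy-as-code concrete, here is a minimal sketch of a static check over a manifest. It is not Bridgecrew's tooling or the method presented in the talk: the `check_manifest` function, the three rules, and the sample Pod are all hypothetical, and the manifest is written as a Python dictionary so the example needs no YAML library.

```python
from typing import Any, Dict, List


def check_manifest(manifest: Dict[str, Any]) -> List[str]:
    """Return policy findings for a Pod-like manifest (hypothetical rules)."""
    findings = []
    for container in manifest.get("spec", {}).get("containers", []):
        name = container.get("name", "<unnamed>")

        # Rule 1: the image should be pinned to a tag other than "latest".
        # Only the last path segment is inspected, so a registry port such
        # as example.com:5000 is not mistaken for a tag.
        last_segment = container.get("image", "").rsplit("/", 1)[-1]
        tag = last_segment.split(":", 1)[1] if ":" in last_segment else ""
        if tag in ("", "latest"):
            findings.append(f"{name}: image tag is missing or 'latest'")

        # Rule 2: every container should declare resource limits.
        if "limits" not in container.get("resources", {}):
            findings.append(f"{name}: no resource limits set")

        # Rule 3: containers should not run privileged.
        if container.get("securityContext", {}).get("privileged", False):
            findings.append(f"{name}: runs as privileged")
    return findings


# Sample manifest (made up for illustration).
pod = {
    "kind": "Pod",
    "spec": {
        "containers": [
            {
                "name": "web",
                "image": "nginx:latest",
                "securityContext": {"privileged": True},
            },
            {
                "name": "api",
                "image": "example.com:5000/api:1.4.2",
                "resources": {"limits": {"memory": "256Mi"}},
            },
        ]
    },
}

for finding in check_manifest(pod):
    print(finding)
```

A check like this could run in CI against every manifest in a repository, which is the "before production" half of the comparison the abstract describes.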

Enhancing the Kubernetes Scheduler for Diverse Workloads in Large Clusters - Yuan Chen & Yan Xu, Apple

Thursday, November 19
3:45–4:20 p.m. EST

As a wide diversity of workloads are being deployed in Kubernetes, the default scheduler has become insufficient in light of scheduling performance and functionality. In this talk, Yuan Chen and Yan Xu will present their experiences and results of leveraging the Kubernetes scheduling framework and developing new plugins to create a custom scheduler. The scheduler can meet different scheduling needs of diverse workloads in large-scale clusters, from stateless to stateful services, big data jobs, and machine learning applications. They will deep dive into (1) the design and implementation of the scheduling plugins for performance optimization, custom pod placement, and group scheduling, and (2) the use of plugins and scheduling profiles to achieve a better balance between scheduling performance and quality. New features and enhancements to the scheduling framework will also be discussed.

Datadog’s take: Using a custom scheduler is an interesting idea, and any chance to look inside a famously secretive company like Apple is fun. Extending the scheduler is incredibly powerful, and we look forward to seeing how they interpreted the new Scheduling Framework and why they decided to use extension points.

Kubelet Deep Dive: Writing a Kubelet in Rust - Kevin Flansburg, Moose Consulting

Thursday, November 19
5:40–6:15 p.m. EST

Kubelet is a critical part of the Kubernetes project. Kubernetes deployments can vary a great deal in terms of container runtime, self-hosted or static control plane, CNI provider, etc., but they must all have a kubelet running on each node. Many intermediate Kubernetes users could benefit from a deeper understanding of kubelet behavior. This talk discusses the development of a kubelet in Rust, and offers a deep dive into the expected behavior and implementation of the kubelet. The talk will begin with a discussion of how the kubelet fits into a Kubernetes deployment, and its relationship with Kubernetes Operators and the Container Runtime Interface (CRI). Next, the talk will cover important crates for Kubernetes development in Rust, as well as the development of Operators and use of gRPC. Finally, the talk will end with pros and cons of using Rust today for Kubernetes development.

Datadog’s take: Although most of the Kubernetes components are written in Go, Rust is becoming more and more popular within the Kubernetes community. We believe this session can help you understand the kubelet better and learn some Rust while you’re at it!

Panel: Linux in the Kubernetes Era: Does The OS Still Matter? - Tasha Drew, VMware; Kiko Reis, Canonical; Darren Shepherd, Rancher Labs; Dusty Mabe, Red Hat; and Vincent Batts, Kinvolk

Friday, November 20
4:00–4:35 p.m. EST

With the end-of-life of the original “Container Linux” (CoreOS), what is the future for the key underlying component of any Kubernetes deployment: the operating system? While many engineers are opting for general-purpose distributions like Ubuntu or CentOS, there is a healthy ecosystem of Kubernetes- and container-optimized distros, as well as new container-specific kernel features. This panel, bringing together representatives from the key Kubernetes Linux vendors, will review the current state of Linux for cloud-native workloads, the areas being worked on by the community, and the pros and cons of different approaches from an end-user perspective. Discussion topics will include improved kernel support for containers, minimal versus general purpose distros, and loosely versus tightly coupled approaches to distributing Linux+Kubernetes.

Datadog’s take: Kubernetes is sometimes referred to as “the Linux of the Orchestrators,” meaning that Kubernetes is no longer where innovation needs to happen, but it is now a commodity. But if that’s the case, do we care about the OS anymore? Let’s find out together in this interesting panel!

How to Effectively Manage Kubernetes in a Regulated Environment - Darien Ford, Capital One

Friday, November 20
4:00–4:35 p.m. EST

Kubernetes plays an important role when scaling containerized applications in a highly regulated environment. Capital One understands this firsthand, as they will complete a multi-year journey to exit on-prem data centers this year and move to the public cloud. Darien Ford will explain how Kubernetes container orchestration accelerates a safe and effective shift to cloud architecture with both developer experience and enterprise requirements in mind. As network boundaries broaden, he will cover ways container orchestration can help introduce workloads to the cloud while managing application development, testing, deployment and, of course, governance and policy compliance.

Datadog’s take: We’re interested to learn more about the challenges that Capital One faces and how they deal with them, since we are also building out GovCloud support and have a growing customer base in regulated environments.

In Search Of A `kubectl blame` Command - Nick Santos, Tilt

Friday, November 20
5:05–5:40 p.m. EST

Developers want understandable tools. Their tools should tell them, “This change here broke that pod there.” But control loops drive the Kubernetes worldview. In a control loop, Kubernetes updates the cluster to make the actual state match the desired state. Control loops do not track why the state changed. Nick Santos and the Tilt team tried to build a tool that traced the effects of each kubectl apply. He’ll tell stories about several attempts to propagate and assign blame across state changes. Most of them failed! Or broke Kubernetes updates in frustrating ways! Along the way, they learned about labels, informers, UIDs, owner refs, events, and how kubectl apply works internally. If you plan to write a tool that interprets Kubernetes API objects for humans, this talk is for you.

Datadog’s take: We have all used git blame at some point in our careers, but nowadays developers also deploy to production, so tools that let us audit our production clusters at any time are becoming more and more critical. Tilt has created some good developer tools in the past, so this will probably be another one to add to our toolset.
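The point about control loops not tracking why state changed can be illustrated with a toy reconcile function. This is a simplified sketch, not how Kubernetes controllers or Tilt are implemented: the `reconcile` function and the object names are hypothetical, and real controllers work against the API server rather than plain dictionaries.

```python
from typing import Dict, List, Tuple


def reconcile(desired: Dict[str, dict], actual: Dict[str, dict]) -> List[Tuple[str, str]]:
    """One pass of a toy control loop.

    Compares desired state with actual state and returns the actions
    needed to converge. Note what is missing: nothing here records
    which change to `desired` caused each action.
    """
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    for name in actual:
        if name not in desired:
            actions.append(("delete", name))
    return actions


# Hypothetical cluster state for illustration.
desired_state = {
    "web": {"image": "web:v2"},
    "worker": {"image": "worker:v1"},
}
actual_state = {
    "web": {"image": "web:v1"},
    "cache": {"image": "cache:v1"},
}

for action, name in reconcile(desired_state, actual_state):
    print(action, name)
```

The loop only ever sees two snapshots, so attributing the `update` of `web` to a particular `kubectl apply` requires extra bookkeeping outside the loop, which is the kind of problem the talk describes.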

Connect with Datadog during KubeCon

We always have a great time at KubeCon, and we look forward to seeing everyone at our (virtual) booth this year.

During the week of KubeCon, Datadog will also be hosting a few Kubernetes-related events and participating in two KubeCon sessions.

Datadog on Kubernetes Monitoring

Monday, November 16
12:00–1:00 p.m. EST

In this session, Ara Pulido, Technical Evangelist, will chat with Celene Chang and Charly Fontaine—both software engineers on the Container Integrations team at Datadog. This team is responsible for deploying and running the Datadog Agent in our Kubernetes clusters. We’ll cover how we are running the Datadog Agent in our clusters, which metrics we care about, and the monitors we have set up. By the end of the session, you will come away with new ideas and best practices for monitoring Kubernetes that you can apply in your own environment. After the session, we will host an interactive Q&A where we can all learn together.

Workshop: Autoscaling Applications Deployed to Kubernetes

Tuesday, November 17
10:00 a.m.–12:30 p.m. EST

Containers and orchestration promise greater resource efficiency and simplified deployments. Developers provide code in the form of a container, define the amount of resources required for operation, and the orchestrator does the rest. Sounds easy, but if your workloads are dynamic in nature, how can you ensure that they have sufficient resources to meet the performance and availability requirements of your customers? In this hands-on workshop with Ara Pulido and Charly Fontaine of Datadog, you’ll learn how to autoscale your application workloads on Kubernetes. We will walk through how to identify key work and resource metrics, as well as how to use horizontal and vertical pod autoscaling to maximize efficiency, while improving service reliability.
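As a rough illustration of horizontal pod autoscaling, the sketch below applies the replica calculation documented for the Kubernetes Horizontal Pod Autoscaler: the desired count is the current count scaled by the ratio of the observed metric to the target, rounded up, with no change when the ratio is within a tolerance of 1.0 (0.1 by default). It is not workshop material; the `desired_replicas` function, the replica bounds, and the example numbers are assumptions chosen for illustration.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Compute a replica count the way the HPA formula describes.

    desired = ceil(current_replicas * current_metric / target_metric),
    skipped when the ratio is close enough to 1.0, then clamped to the
    configured bounds.
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: no scaling
    wanted = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, wanted))


# 4 replicas averaging 90% CPU against a 60% target: scale out.
print(desired_replicas(4, 90, 60))
# 4 replicas averaging 30% CPU against a 60% target: scale in.
print(desired_replicas(4, 30, 60))
```

Vertical pod autoscaling takes the other approach: instead of changing the number of replicas, it adjusts the resource requests of each pod.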

KubeCon session: PKI the Wrong Way: Simple TLS Mistakes and Surprising Consequences

Wednesday, November 18
5:45–6:20 p.m. EST

Effective management of TLS certificates and keys is a serious challenge when running Kubernetes at scale. TLS mutual authentication secures all the Kubernetes control plane components, but there are many details that must be right. In this session, Tabitha Sable, Systems Security Engineer at Datadog, will look at some of the ways common mTLS configuration mistakes can be abused and how you can reduce that risk. The presentation will begin with a tour of the basics of TLS mutual authentication and how it is used by each control plane component. Then, Tabitha will demonstrate several example misconfigurations, exploit them for your education and amusement, and share recommendations for preventing them in your own clusters. You’ll leave with a stronger understanding of this essential element of Kubernetes cluster deployment.

KubeCon session: How the OOM-Killer Deleted My Namespace, and Other Kubernetes Tales

Thursday, November 19
4:50–5:25 p.m. EST

Running Kubernetes at scale is challenging, and you can often end up in situations where you have to debug complex and unexpected issues. This requires a detailed understanding of how the different components work and interact with each other. Over the last three years, Datadog has migrated most of its workloads to Kubernetes and now manages dozens of clusters consisting of thousands of nodes each. During this journey, engineers have debugged complex issues with root causes that were sometimes very surprising. In this talk, Laurent Bernaille and Tabitha Sable will share some of these stories, including a favorite: how a complex interaction between familiar Kubernetes components allowed an OOM-killer invocation to trigger the deletion of a namespace.

See you at KubeCon

With so many great sessions at KubeCon this year, we couldn’t possibly highlight all of our favorites. But hopefully, this guide can help you get started. Check out the full schedule here.

And of course, we are a sponsor this year, and we would love for you to take the time to visit our booth. You can see a demo of our latest additions to the platform and have a chance to win an Xbox Series X, as well as a 1-in-10 chance to win an iPhone 12 Pro.

Want to work with us? We're hiring!
