Virtual Event
August 17–August 20, 2020

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon Europe 2020 - Virtual to participate in the sessions. If you have not registered but would like to join us, please register here.

Please note: All times in this schedule are shown in Central European Summer Time (CEST). The schedule is subject to change.
Observability
Tuesday, August 18
 

14:30 CEST

From Alert Notification to Comparison of Good and Bad Requests in One Click - Shreyas Srivatsan, Chronosphere
Metrics are a great tool for notifying when something goes wrong. Distributed tracing provides the ability to drill down deeper into an issue when triaging an alert with a non-obvious root cause. It’s already difficult to jump from metrics raising an alert to a representative problematic trace, but even once there, users often want to compare a problematic trace with a non-problematic one to help root cause the issue. This talk demonstrates how to jump straight from an alert notification to displaying a problematic trace along with a comparison to a non-problematic trace.

This is accomplished with a combination of open source tools such as Prometheus, Jaeger, Grafana and M3. The audience will learn how recent advances in the community can enable them to reduce their time-to-mitigation by providing the relevant context of a bad request vs a good request directly from a graph.

Speakers

Shreyas Srivatsan

Technical Lead, Chronosphere
Shreyas is a technical lead at Chronosphere working on all things monitoring. Shreyas is greatly interested in monitoring of all kinds and has contributed to Prometheus, upstreaming exemplar support for the OpenMetrics parser. Prior to working on monitoring Shreyas was a senior software...



Tuesday August 18, 2020 14:30 - 15:05 CEST
InXpo https://onlinexperiences.com/Launch/Event.htm?ShowKey99259
  Observability

17:45 CEST

Tracing is For Everyone: Tracing User Events with GraphQL and OpenTelemetry - Nina Stawski, Splunk
There's been a lot of talk about the importance of observability and tracing for microservice-based applications. The use cases involved are focused on backend engineers and DevOps. But what about us front-end engineers? Often, we get blamed first when something breaks and the lack of consistent observability tools makes it difficult to debug issues.

With the emergence of OpenTelemetry for JavaScript, more front-end developers are looking to instrument their code and connect their traces with the backend. A growing number of teams are adopting GraphQL as their interface between UI and backend as well. This talk will illustrate the process of setting up your app for tracing with OpenTelemetry, show what’s common in GraphQL instrumentation compared to other libraries and describe the potential pitfalls of the approach. Building on that, we will discuss how tracing affects user experience.

Speakers

Nina Stawski

Senior Software Engineer, Splunk
Nina currently works as a Senior UI/UX Engineer, building the enterprise-grade distributed tracing and observability platform as a part of the front-end team at Omnition. Previously she was the Expert Developer / Team Lead at SAP Conversational AI, and the Head of UI/UX and Front...



Tuesday August 18, 2020 17:45 - 18:20 CEST
InXpo https://onlinexperiences.com/Launch/Event.htm?ShowKey99259
  Observability

18:30 CEST

Observability at Scale: Running OpenTelemetry Across an Enterprise - Jonah Back & Kranti Vikram, Intuit
Observability has been a huge topic of interest in the software industry over the last few years. One of the major components in observability is distributed tracing. Tools like Jaeger, Zipkin, OpenCensus, and OpenTelemetry have made it really easy to get started. Less easy, however, is getting your tracing infrastructure to a place where it can be fully leveraged across hundreds, if not thousands, of services.

This talk will cover Intuit's experience deploying tracing infrastructure using Kubernetes, Jaeger, and OpenTelemetry. It will cover a few key areas in Intuit's journey to running a highly available, multi-region tracing solution:

1) Scaling Elasticsearch to support 500M+ traces per day
2) Secure, automated on-boarding of OpenTelemetry agents to central collectors
3) Leveraging open-source libraries to provide high-quality trace data, enhanced with domain-specific attributes

Speakers

Kranti Vikram

Staff Software Engineer, Intuit
Kranti Vikram is a software engineer with microservice expertise who enjoys problems that are challenging in both their functional and non-functional requirements, and who focuses on tackling problems related to scale, security and performance. Currently, contributing to openzipkin...

Jonah Back

Principal Software Engineer, LegalZoom
Jonah Back is a Principal Software Engineer at LegalZoom.



Tuesday August 18, 2020 18:30 - 19:05 CEST
InXpo https://onlinexperiences.com/Launch/Event.htm?ShowKey99259
  Observability
 
Wednesday, August 19
 

13:00 CEST

Turn It Up to a Million: Ingesting Millions of Metrics with Thanos Receive - Lucas Servén Marín, Red Hat
Thanos is an open-source CNCF Sandbox project that builds upon Prometheus components to create a global-scale and highly available monitoring system. In this talk, Lucas Servén presents a solution for creating a multi-tenant, horizontally scalable metrics ingestion system using the newest addition to the Thanos toolset: the Thanos Receive component. The talk considers the motivations for building a system capable of ingesting metrics from thousands of clusters, including multi-cluster monitoring and cluster telemetry. Lucas discusses how Thanos Receive is able to satisfy these requirements and how its hash ring design allows it to scale and maintain ingestion availability even during upgrades. Finally, the talk demonstrates the practice of running an automatically scalable hash ring by leveraging the Thanos Receive Controller, Horizontal Pod Autoscaler, and the Prometheus Adapter.

Speakers

Lucas Servén Marín

Principal Software Engineer, Red Hat
Lucas Servén Marín is a principal software engineer from Spain currently working for Red Hat in Berlin. By trade he is an electrical engineer, with a Masters in robotics. After two years at CoreOS, he joined Red Hat where he works on the OpenShift Monitoring team and contributes...



Wednesday August 19, 2020 13:00 - 13:35 CEST
InXpo https://onlinexperiences.com/Launch/Event.htm?ShowKey99259
  Observability

13:45 CEST

Hubble - eBPF Based Observability for Kubernetes - Sebastian Wicki, Isovalent
Troubleshooting network issues in Kubernetes often requires deep insight into different layers of your stack. Hubble is a new open-source observability platform that aims to assist you in understanding what is going on in all layers of your Kubernetes network. Based on the Cilium CNI and the Linux kernel eBPF technology, it is able to obtain fine-grained visibility into network traffic and application behavior, with low overhead and without having to modify applications.

In this talk, you will get an introduction to Hubble and the technologies that power it: the Cilium CNI and eBPF. You will be presented with practical examples of how Hubble can be used to interactively troubleshoot complex network issues. The talk will show how to write custom Hubble metrics which allow you to benefit from eBPF's superpowers without having to write or understand any kernel code.

Speakers

Sebastian Wicki

Software Engineer, Isovalent
Sebastian Wicki is a software engineer currently working on Hubble and Cilium at Isovalent. Previously he worked on distributed stream processing systems for real-time data center network monitoring and analytics at ETH Zurich.



Wednesday August 19, 2020 13:45 - 14:20 CEST
InXpo https://onlinexperiences.com/Launch/Event.htm?ShowKey99259
  Observability

17:40 CEST

Monitoring GPUs at Scale for AI/ML and HPC Clusters - Bharti L Agrawal, NVIDIA
At NVIDIA we have several large GPU K8s clusters for running deep learning training (AI/ML) workloads. On these clusters we need monitoring to support a range of user personas. First, we have the end users (AI/ML researchers) who want to get an insight into how well their workloads used the GPUs and the system. Then we have the operations team, who would like to monitor the general health of the cluster and be alerted in real time to any issues. Finally, we have the stakeholders who would like to see the GPU utilization and saturation over time for capacity planning. These requirements cannot be satisfied by a standard “out of the box” setup.

In this presentation we will show how we used a combination of open source tools to address our requirements. We will discuss various deployment, maintenance, security and scale challenges we hit and how we resolved them for monitoring GPU data.

Speakers

Bharti L Agrawal

Senior Staff Engineer, NVIDIA
Bharti Agrawal has been in the software industry for over 20 years. Her career has taken her from working on mainframes, to web 2.0 sites, to SaaS applications, to advertising platforms. She has worked in a wide range of companies from small startups to Google and Yahoo. She currently...



Wednesday August 19, 2020 17:40 - 18:15 CEST
InXpo https://onlinexperiences.com/Launch/Event.htm?ShowKey99259
  Observability
 
Thursday, August 20
 

13:00 CEST

Make Prometheus Use Less Memory and Restart Faster - Ganesh Vernekar, Grafana Labs
These days, the most common reason for a Prometheus server to run out of memory is an excessive number of time series in the so-called head block, the part of the internal TSDB with the freshest data, which has to be kept in memory prior to consolidation into a block on disk. A large head block leads to a long restart time because the head block has to be rebuilt from the write-ahead log (WAL). On large servers, the restart time can be 10 minutes or more. Since restarts happen regularly to upgrade the binary or to change flags, the resulting interruption of sample collection is problematic. Even worse: After an OOM crash, the same replaying from the WAL has to happen, often causing another OOM crash immediately. Ganesh Vernekar will talk about the work started in late 2019 to persist parts of the head block earlier, thereby reducing both the memory footprint and the restart time.

Speakers

Ganesh Vernekar

Senior Software Engineer, Grafana Labs
Ganesh has been contributing to Prometheus for over 5 years and is a Prometheus team member and maintainer of its Time Series Database (TSDB). Most recently, he worked on the new native histograms in Prometheus. He has also contributed to Cortex, Grafana Mimir, and Grafana.



Thursday August 20, 2020 13:00 - 13:35 CEST
InXpo https://onlinexperiences.com/Launch/Event.htm?ShowKey99259
  Observability

13:45 CEST

Stateless Fluentd with Kafka - Steven McDonald, Usabilla
Fluentd is typically deployed as a central aggregator to which everything sends its logs for processing and routing. This superficially simple approach was found to be inadequate at Usabilla. Errors in one part of the processing chain often had knock-on effects elsewhere, leading Usabilla's SREs to search for a more failure-tolerant design.

Steven will introduce the new stateless Fluentd deployment at Usabilla, built around Kafka as a centralised, highly available log buffer. He will also introduce the new components that have been developed to adapt Fluentd to be completely stateless, as well as how logs are reliably fed into Kafka from hosts all over the world. Finally, there will be a brief overview of the challenges still remaining.

Speakers

Steven McDonald

Site Reliability & Infra Engineer, Usabilla
Steven is an experienced systems administrator turned SRE. In his experience doing traditional managed hosting, he developed a keen interest in reliable automation and failure tolerance. Today, he puts that experience to use deploying and maintaining cloud-native infrastructure at...



Thursday August 20, 2020 13:45 - 14:20 CEST
InXpo https://onlinexperiences.com/Launch/Event.htm?ShowKey99259
  Observability

14:30 CEST

Better Histograms for Prometheus - Björn Rabenstein, Grafana Labs
Robust histogram functionality was added to Prometheus long ago, to be precise in version 0.11.0. The histogram representation as we know it from Prometheus and now also OpenMetrics is simple yet powerful and enables many important use cases. However, it is also infuriatingly limited, mostly because of its high per-bucket cost. Björn “Beorn” Rabenstein released the above-mentioned Prometheus version in February 2015 and has been dreaming about better histograms ever since. In this talk, he will explain why it is so hard for Prometheus to adopt established techniques known from various academic publications and even practical applications in other metrics-processing systems. To add a silver lining, he will share his latest findings from reignited efforts to finally break through the barriers and bring efficient high-resolution histograms to Prometheus.

Speakers

Björn Rabenstein

Engineer, Grafana Labs
Björn “Beorn” Rabenstein is an engineer at Grafana Labs and a Prometheus developer. Previously, he was a Production Engineer at SoundCloud, a Site Reliability Engineer at Google, and a number cruncher for science.



Thursday August 20, 2020 14:30 - 15:05 CEST
InXpo https://onlinexperiences.com/Launch/Event.htm?ShowKey99259
  Observability

18:05 CEST

OpenTelemetry Auto-Instrumentation Deep Dive - Carlos Alberto Cortez & Alex Boten, LightStep
Auto-instrumentation allows users to monitor their applications without the need to modify the code base, and immediately start gathering observability data. As part of the OpenTelemetry initiative (resulting from the merge of OpenTracing and OpenCensus), auto-instrumentation libraries will become a core feature, and will be offered across different languages (Java, Python, Ruby, Node.js, .NET, etc.).

In this deep dive you will learn about the architecture of these auto-instrumentation libraries, out-of-the-box OSS library integrations (such as Spring, Django or Rails), how to configure them to export telemetry data to different tracing and metrics backends (such as Jaeger or Prometheus), as well as interesting challenges, such as the possibility to share OSS integrations between auto and manual instrumentation.

Speakers

Carlos Alberto Cortez

Open Source Software Engineer, LightStep
Carlos works as an Open Source Software Engineer at LightStep, where he is a maintainer of OpenTelemetry Java; previously he was a core maintainer of OpenTracing. Carlos has worked for the last 13 years in different areas, such as desktop frameworks, compilers, class libraries and distributed...

Alex Boten

Sr. Staff Software Engineer, ServiceNow Cloud Observability, formerly Lightstep
Alex Boten is a senior staff software engineer who has spent the last ten years helping organizations adapt to a cloud-native landscape by mashing keyboards. From building core network infrastructure to mobile client applications and everything in between, Alex has first-hand knowledge...



Thursday August 20, 2020 18:05 - 18:40 CEST
InXpo https://onlinexperiences.com/Launch/Event.htm?ShowKey99259
  Observability

18:50 CEST

Migrating to OpenTelemetry From a Custom Distributed Tracing Pipeline - Francis Bogsanyi, Shopify
Shopify built its own distributed tracing pipeline in 2016, including custom instrumentation, a custom propagation format, trace collection, downsampling, augmentation, cleansing and fanout to multiple analytics backends. Over the past year, they have been migrating their entire tracing pipeline to OpenTelemetry, after a brief sojourn with OpenCensus. This talk describes the motivation for the migration, the advantages of working with and building upon the OpenTelemetry project, and concrete details of the migration process.

Speakers

Francis Bogsanyi

Staff Production Engineer, Shopify
Francis Bogsanyi is a Technical Lead at Shopify responsible for distributed tracing and performance tooling. He previously led the team owning Redis and Memcached infrastructure at Shopify. Francis built and open-sourced a Lua implementation in Go and leveraged it to build Shopify's...



Thursday August 20, 2020 18:50 - 19:25 CEST
InXpo https://onlinexperiences.com/Launch/Event.htm?ShowKey99259
  Observability
 