KubeCon + CloudNativeCon Europe 2020: Full Schedule

Virtual Event
August 17–August 20, 2020

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon Europe 2020 - Virtual to participate in the sessions. If you have not registered but would like to join us, please register for the event.

Please note: This schedule is displayed in Central European Summer Time (CEST). The schedule is subject to change.

13:45 CEST

Is Sharing GPU to Multiple Containers Feasible? - Samed Güner, SAP

Provisioning GPUs for ML workloads in a data center can be very costly, and even more so if the GPUs are not fully utilized. Maximizing GPU utilization is therefore essential for ML workloads.

This session will show how a single GPU can be used to run multiple ML workloads in parallel, especially ML inference. It will take a deep dive into how GPUs are provisioned and attached using K8s device plugins. It will also show how the NVIDIA device plugin can be extended to schedule multiple ML workloads onto a single GPU and how to collect the desired GPU information with Prometheus.

The session focuses on native GPU sharing using a K8s device plugin, without additional technologies such as VMware vGPUs.

Speakers

Samed Güner

Senior DevOps Engineer, SAP

Samed Güner is currently working as a software engineer at SAP Artificial Intelligence, embedding large-scale AI into enterprise applications. He mainly works on infrastructure and K8s with a strong focus on leveraging DevOps principles. He previously worked on projects such as distributed...

Closed Caption Transcript ROBS 9800 txt

August 18 Is Sharing GPU to multiple Containers feasible pdf

Tuesday August 18, 2020 13:45 - 14:20 CEST
InXpo https://onlinexperiences.com/Launch/Event.htm?ShowKey99259

Machine Learning + Data

Content Experience Level: Intermediate (Mid-level experience)
Link to Video Recording: https://youtu.be/MDkltK5JLCU

14:30 CEST

Enabling Multi-user Machine Learning Workflows for Kubeflow Pipelines - Yannis Zarkadas, Arrikto & Yuan Gong, Google

Kubeflow is an open source machine learning platform built on Kubernetes. Every service in Kubeflow is implemented either as a Custom Resource Definition (CRD) (e.g., TensorFlow Job) or as a standalone service (e.g., Kubeflow Pipelines).

As enterprises start to adopt Kubeflow, the need for access control, authentication, and authorization is emerging. Kubernetes CRDs come with their own auth story, but what about services with their own API and database, like Kubeflow Pipelines? In this talk, we explore how we enabled multi-user workflows for Kubeflow Pipelines in a Kubernetes-native way.

We present how we combined open-source, cloud-native technologies to design and implement a flexible, Kubernetes-native solution for services with their own API and database. The talk will include a live demo.

Speakers

Yannis Zarkadas

Software Engineer, Arrikto

Yannis is a software engineer at Arrikto, working with Kubeflow and the Kubernetes SIG Storage group. He loves contributing to open source projects and has authored the Cassandra Operator in Rook and the official Scylla Operator, which he currently maintains.

Yuan Gong

Software Engineer, Google Cloud

I'm a software engineer at Google Cloud working on the Kubeflow Pipelines project (https://github.com/kubeflow/pipelines).

[kubecon amsterdam] Enabling Multi user Machine Learning Workflows for Kubeflow Pipelines pdf

Closed Caption Transcript ULND 1515 txt

Tuesday August 18, 2020 14:30 - 15:05 CEST
InXpo https://onlinexperiences.com/Launch/Event.htm?ShowKey99259

Machine Learning + Data

Content Experience Level: Intermediate (Mid-level experience)
Link to Video Recording: https://youtu.be/U8yWOKOhzes

13:45 CEST

Taming Data/State Challenges for ML Applications and Kubeflow - Skyler Thomas, Hewlett Packard Enterprise

The Kubeflow project brings incredibly powerful Machine Learning frameworks like TensorFlow and PyTorch to Kubernetes. The ability to parallelize training and to scale workflows up and down is revolutionary. However, state and persistent storage are a much bigger challenge for machine learning workloads because of their training data, library files, and models. We will discuss what it took to create AI/ML environments that run thousands of pods and request petabytes of training data.

We will explore the various state and storage challenges that crop up when you are building Kubeflow applications. We will discuss where distributed persistent storage solutions fit in the picture, and how various storage APIs, including POSIX/CSI solutions, NFS, S3, and HDFS, fit into these solutions. Data security and privacy issues will also be discussed.

Speakers

Skyler Thomas

Distinguished Technologist, Hewlett Packard Enterprise

Skyler Thomas is a Distinguished Technologist at Hewlett Packard Enterprise. He is the chief architect for Kubernetes-based Artificial Intelligence and Machine Learning at HPE. He joined HPE through the MapR acquisition, where he helped customers design ML simulation environments with...

Close Captioning Transcript DGDN 8464 txt

Taming State Challenges pdf

Wednesday August 19, 2020 13:45 - 14:20 CEST
InXpo https://onlinexperiences.com/Launch/Event.htm?ShowKey99259

Machine Learning + Data

Content Experience Level: Advanced (Expert level experience)
Link to Video Recording: https://youtu.be/fPpcodldlVg

14:30 CEST

How to Use Kubernetes to Build a Data Lake for AI Workloads - Peter MacKinnon & Uday Boppana, Red Hat

The recent popularity of hybrid cloud architectures has left organizations with traditional data lakes behind. How do you build a data architecture that is cloud-provider agnostic and flexible enough to run across public and private cloud data centers? How do you make this data available to data scientists and developers in a way that simplifies the creation of intelligent applications?

This talk will walk through a new way of building data lakes for the hybrid cloud using Rook and Ceph community projects running on Kubernetes. With a single data architecture deployment that can run in any cloud or across multiple clouds, IT and data engineers can use open source tools on Kubernetes such as Rook, Ceph, Hive metastore, Spark, and Presto to provide unified access to massive amounts of data across multiple data centers for data scientists and developers.

Speakers

Peter MacKinnon

Principal Software Engineer, Red Hat Inc.

Pete MacKinnon is a Principal Software Engineer in the AI Center of Excellence at Red Hat. He is actively involved in the Kubeflow and Open Data Hub open source projects. He works closely with Red Hat customers and partners to successfully bring their machine learning and analytics...

Uday Boppana

Senior Principal Product Manager, Red Hat

Uday Boppana is a Senior Principal Product Manager at Red Hat, responsible for Big Data and AI/ML data services. He has experience working in AI/ML, hybrid cloud, data center, data services, and storage solutions in different roles and with a variety of technologies. In prior roles...

August 19 How to Use Kubernetes to Build a Data Lake for AI Workloads pdf

Close Captioning Transcripts VOLQ 6664 txt

Wednesday August 19, 2020 14:30 - 15:05 CEST
InXpo https://onlinexperiences.com/Launch/Event.htm?ShowKey99259

Machine Learning + Data

Content Experience Level: Beginner (Very basic information)
Link to Video Recording: https://youtu.be/0HIelZ3qMLE

16:55 CEST

Production Multi-node Jobs with Gang Scheduling, K8s, GPUs and RDMA - Madhukar Korupolu & Sanjay Chatterjee, NVIDIA

With the growing scale of DL and ML applications, distributed execution of jobs across multiple nodes is becoming increasingly critical for solving bigger problems faster, as illustrated by the recent MLPerf results. However, running such workloads in a production K8s cluster shared by multiple jobs and users poses several challenges.

In this talk, we'll give an overview of this area, including distributed TensorFlow, PyTorch, Horovod, and MPI, and the use of GPU nodes with NCCL and RDMA for accelerated performance. We'll describe our end-to-end flow for multi-node jobs in K8s, including gang scheduling, quotas, fairness, and backfilling implemented in our custom scheduler for GPUs. Our cluster includes high-speed networking through RoCE and SR-IOV / Multus CNI. We'll share our design choices, learnings, and operational experience, including failure handling, performance, and telemetry.

Speakers

Madhukar Korupolu

Distinguished Engineer, NVIDIA

Madhukar is an architect at NVIDIA working on GPU clusters for AI and ML workloads. His areas of interest and experience include AI/ML infrastructure, GPU acceleration, cloud computing, distributed systems, Kubernetes, HPC, and CDNs, with previous stints at Google, IBM, and Akamai. He holds a...

Sanjay Chatterjee

Senior Engineer, NVIDIA

Sanjay Chatterjee is a senior engineer at NVIDIA. He works on runtime system infrastructure and core Kubernetes components to support highly scalable HPC and DL/AI workloads. Previously, he worked on DOE/DARPA-funded research and advanced technology projects for exascale systems. His...

CLOSE CAPTIONING TRANSCRIPT GURR 5501 txt

Multi node jobs with K8s Kubecon 2020 pdf

Wednesday August 19, 2020 16:55 - 17:30 CEST
InXpo https://onlinexperiences.com/Launch/Event.htm?ShowKey99259

Machine Learning + Data

Content Experience Level: Intermediate (Mid-level experience)
Link to Video Recording: youtu.be/yWOcsNv1tKo

17:40 CEST

Pwned By Statistics: How Kubeflow & MLOps Can Help Secure Your ML Workloads - David Aronchick, Microsoft

While machine learning is spreading like wildfire, very little attention has been paid to the ways it can go wrong when moving from development to production. Even when models work perfectly, they can be attacked or degrade quickly if the data changes. A well-understood MLOps process is necessary for ML security!

Using Kubeflow, we will demonstrate the common ways machine learning workflows go wrong and how to mitigate them using MLOps pipelines that provide reproducibility, validation, versioning/tracking, and safe/compliant deployment. We will also talk about the direction of MLOps as an industry, and how we can use it to move faster, with less risk, than ever before.

Speakers

David Aronchick

Head of OSS Machine Learning, Microsoft

David leads Open Source Machine Learning Strategy at Azure. This means he spends most of his time helping humans convince machines to be smarter. He is only moderately successful at this. Previously, he led product management for Kubernetes, launched Google Kubernetes Engine and...

Owned By Statistics Using MLOps to Make Machine Learning More Secure 15 min pptx

CLOSE CAPTIONING TRANSCRIPT YTVE 1893 txt

Owned By Statistics Using MLOps to Make Machine Learning More Secure 15 min pdf

Wednesday August 19, 2020 17:40 - 18:15 CEST
InXpo https://onlinexperiences.com/Launch/Event.htm?ShowKey99259

Machine Learning + Data

Content Experience Level: Beginner (Very basic information)

13:00 CEST

Hunting For New Particles Leveraging Legacy Infrastructure with Kubernetes - Clemens Lange, CERN

In the search for unknown particles in the CERN Large Hadron Collider’s particle collisions, billions of events need to be analysed. Even though large parts of CERN’s computing infrastructure are deployed using Kubernetes, physics analysis jobs are still run on classical high-throughput computing batch systems. While developing a fully cloud native computing approach, one still needs access to the tens of thousands of cores available on the legacy batch system to have sufficient resources for data processing.

In this presentation, Clemens will demonstrate how complex physics analysis workflows that are written and scheduled using Kubernetes can make use of classical batch systems. The audience will also learn what complexity a realistic physics analysis can reach, and the important role that software containers and Kubernetes play in the context of open science.

Speakers

Clemens Lange

Research Physicist, CERN

Clemens is a particle physicist working on the CMS experiment at the CERN Large Hadron Collider. He currently searches for new, yet unknown phenomena in the data recorded with the CMS detector, developing and calibrating algorithms for their reconstruction and identification. His...

Aug20 Hunting for New Particles Leveraging Legacy Infrastructure with Kubernetes Clemens Lange pdf

Aug20 Demo Videos pdf

Close Captioning Transcripts UDZP 4041 txt

Thursday August 20, 2020 13:00 - 13:35 CEST
InXpo https://onlinexperiences.com/Launch/Event.htm?ShowKey99259

Machine Learning + Data

Content Experience Level: Advanced (Expert level experience)
Link to Video Recording: https://youtu.be/a-bV1hazosI

14:30 CEST

Kubeflow 1.0 Update by a Kubeflow Community Product Manager - Josh Bottum, Arrikto

A Kubeflow Community Product Manager will provide an update on Kubeflow 1.0. The presentation will include a review of the Kubeflow Community and its feature development process, the Kubeflow user survey results, and Kubeflow 1.0 features. The talk will highlight significant business benefits and review use cases from top deployments. It will also include a live demonstration of a workflow to build, train, and deploy a versioned Kubeflow Pipeline.

Speakers

Josh Bottum

Vice President, Arrikto

I am a Kubeflow Community Product Manager and VP at Arrikto. We simplify storage architectures and operations for K8s platforms.

CLOSE CAPTION TRANSCRIPTS GOQH 1387 2 txt

Thursday August 20, 2020 14:30 - 15:05 CEST
InXpo https://onlinexperiences.com/Launch/Event.htm?ShowKey99259

Machine Learning + Data

Content Experience Level: Advanced (Expert level experience)
Link to Video Recording: https://youtu.be/99sA0KXmF7A

17:20 CEST

SpoK - Running Big Data Applications @ Scale on K8s - Srivathsan Canchi & Nagaraj Janardhana, Intuit

At Intuit, customer data sets are growing exponentially along with the business and the capabilities it offers. Processing this data and making it available for downstream applications such as ML, analytics, and exploration is crucial. Following the trend of running service workloads on Kubernetes, we built a data processing platform with Spark on Kubernetes as the backbone. This allowed us to reap all the benefits of well-established processes for CI/CD, security, and cluster management. As a result, we were able to reduce the cost footprint of our data processing jobs by 30%, while simultaneously increasing the speed to production.

Speakers

Nagaraj Janardhana

Software Engineer, Intuit

Nagaraj is a Principal Engineer at Intuit, Mountain View, responsible for designing and developing ML and featurization platforms. In the past, he has been involved in developing data ingestion and processing platforms, as well as identity and subscription platforms, at Intuit. He has contributed...

Srivathsan Canchi

Engineering Leader, ML Platform, Intuit

Srivathsan leads the machine learning platform engineering team at Intuit. The ML platform includes real-time distributed featurization, scoring, and feedback loops. He has a breadth of experience building high-scale, mission-critical platforms. Srivathsan also has extensive experience...

CLOSE CAPTION TRANSCRIPTS TQEM 1752 txt

Thursday August 20, 2020 17:20 - 17:55 CEST
InXpo https://onlinexperiences.com/Launch/Event.htm?ShowKey99259

Machine Learning + Data

Content Experience Level: Beginner (Very basic information)
Link to Video Recording: https://youtu.be/7ds0ad-EB2M

18:05 CEST

MLPerf Meets Kubernetes - Xinyuan Huang & Elvira Dzhuraeva, Cisco

Kubeflow is maturing as a cloud native Machine Learning (ML) platform that simplifies the development, deployment, and management of ML on Kubernetes. As Kubeflow is increasingly adopted, practitioners are looking beyond functionality and starting to explore its performance and cost efficiency in the real world. MLPerf is a state-of-the-art benchmark suite that aims to set an industry standard for end-to-end performance evaluation of ML systems, with real-world workloads covering both the training and inference phases of the ML lifecycle. This talk will give a brief overview of MLPerf, followed by a detailed discussion of how MLPerf can be adapted to evaluate ML performance on Kubeflow and Kubernetes, and how the performance results can guide the future design and optimization of cloud native ML platforms based on Kubeflow and Kubernetes.

Speakers

Xinyuan Huang

Technical Leader, Cisco

Xinyuan Huang is a Technical Leader at Cisco, where he leads the performance evaluations and optimizations in cloud and AI/ML systems. He is an active member of the MLPerf and Kubeflow communities.

Elvira Dzhuraeva

Technical Product Engineer AI/ML, Cisco

Elvira Dzhuraeva is an AI/ML Technical Product Engineer at Cisco, where she leads cloud and on-premise ML and AI strategy. She is a Technical Product Manager for Kubeflow and a member of the MLPerf community.

Thursday August 20, 2020 18:05 - 18:40 CEST
InXpo https://onlinexperiences.com/Launch/Event.htm?ShowKey99259

Machine Learning + Data

Content Experience Level: Intermediate (Mid-level experience)
Link to Video Recording: https://youtu.be/ZCffCr73-Zk