Virtual Event
August 17–August 20, 2020
Learn More and Register to Attend This Event

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon Europe 2020 - Virtual to participate in the sessions. If you have not registered but would like to join us, please register here.

Please note: This schedule is displayed in Central European Summer Time (CEST). The schedule is subject to change.
Machine Learning + Data
Tuesday, August 18
 

13:45 CEST

Is Sharing GPU to Multiple Containers Feasible? - Samed Güner, SAP
Provisioning GPUs for ML workloads in a data center can be very costly, and even more so if the GPUs are not fully utilized. Maximizing GPU utilization is therefore a must for ML workloads.

This session will show how a single GPU can be used to run multiple ML workloads, especially ML inference, in parallel, and will take a deep dive into how GPUs are provisioned and attached using K8s device plugins. It will show how the NVIDIA device plugin can be extended to schedule multiple ML workloads onto a single GPU and how to collect the desired GPU information with Prometheus.

This session will highlight and take a deep dive into native GPU sharing using a K8s device plugin, without additional technologies such as vGPUs from VMware.

Speakers

Samed Güner

Senior DevOps Engineer, SAP
Samed Güner is currently working as a software engineer at SAP Artificial Intelligence, embedding large-scale AI into enterprise applications. He mainly works on infrastructure and K8s with a strong focus on leveraging DevOps principles. He previously worked on projects such as distributed...



Tuesday August 18, 2020 13:45 - 14:20 CEST
InXpo https://onlinexperiences.com/Launch/Event.htm?ShowKey99259

14:30 CEST

Enabling Multi-user Machine Learning Workflows for Kubeflow Pipelines - Yannis Zarkadas, Arrikto & Yuan Gong, Google
Kubeflow is an open source machine learning platform built on Kubernetes. Every service in Kubeflow is implemented either as a Custom Resource Definition (CRD) (e.g., TensorFlow Job) or as a standalone service (e.g., Kubeflow Pipelines).

As enterprises start to adopt Kubeflow, the need for access control, authentication, and authorization is emerging. Kubernetes CRDs come with their own auth story, but what about Services with their own API and database, like Kubeflow Pipelines? In this talk, we explore how we enabled multi-user workflows for Kubeflow Pipelines, in a Kubernetes-native way.

We present how we combined open-source, cloud-native technologies to design and implement a flexible, Kubernetes-native solution for services with their own API and database. The talk will include a live demo.

Speakers

Yannis Zarkadas

Software Engineer, Arrikto
Yannis is a software engineer at Arrikto, working with Kubeflow and the Kubernetes sig-storage group. He loves contributing to open source projects and has authored the Cassandra Operator in Rook and the official Scylla Operator, which he is currently maintaining.

Yuan Gong

Software Engineer, Google Cloud
I'm a software engineer at Google Cloud working on the Kubeflow Pipelines project (https://github.com/kubeflow/pipelines).



Tuesday August 18, 2020 14:30 - 15:05 CEST
InXpo https://onlinexperiences.com/Launch/Event.htm?ShowKey99259
 
Wednesday, August 19
 

13:45 CEST

Taming Data/State Challenges for ML Applications and Kubeflow - Skyler Thomas, Hewlett Packard Enterprise
The Kubeflow project brings incredibly powerful machine learning frameworks like TensorFlow and PyTorch to Kubernetes. The ability to parallelize training and to scale workflows up and down is revolutionary. However, state and persistent storage are a much bigger challenge for machine learning workloads because of their training data, library files, and models. We will discuss what it took to create AI/ML environments that run thousands of pods and request petabytes of training data.

We will explore the various state and storage challenges that crop up when you are building Kubeflow applications. We will discuss where distributed persistent storage solutions fit in the picture. We will address how various storage APIs, including POSIX/CSI solutions, NFS, S3, and HDFS, fit into solutions. Data security and privacy issues will also be discussed.

Speakers

Skyler Thomas

Distinguished Technologist, Hewlett Packard Enterprise
Skyler Thomas is a Distinguished Technologist at Hewlett Packard Enterprise. He is the chief architect for Kubernetes-based Artificial Intelligence and Machine Learning at HPE. He joined HPE in the MapR acquisition, where he helped customers design ML simulation environments with...



Wednesday August 19, 2020 13:45 - 14:20 CEST
InXpo https://onlinexperiences.com/Launch/Event.htm?ShowKey99259

14:30 CEST

How to Use Kubernetes to Build a Data Lake for AI Workloads - Peter MacKinnon & Uday Boppana, Red Hat
The recent popularity of hybrid cloud architectures has left organizations with traditional data lakes behind. How do you build a data architecture that is cloud provider agnostic and flexible enough to run across public and private cloud data centers? How do you make this data available to data scientists and developers in a way that simplifies the creation of intelligent applications?

This talk will walk through a new way of building data lakes for the hybrid cloud using Rook and Ceph community projects running on Kubernetes. With a single data architecture deployment that can run in any cloud or across multiple clouds, IT and data engineers can use open source tools on Kubernetes such as Rook, Ceph, Hive metastore, Spark, and Presto to provide unified access to massive amounts of data across multiple data centers for data scientists and developers.

Speakers

Peter MacKinnon

Principal Software Engineer, Red Hat Inc.
Pete MacKinnon is a Principal Software Engineer in the AI Center of Excellence at Red Hat. He is actively involved in the Kubeflow and Open Data Hub open source projects. He works closely with Red Hat customers and partners to successfully bring their machine learning and analytics...

Uday Boppana

Senior Principal Product Manager, Red Hat
Uday Boppana is a Senior Principal Product Manager at Red Hat, responsible for Big Data and AI/ML data services. He has experience working in AI/ML, hybrid cloud, data center, data services, and storage solutions in different roles and with a variety of technologies. In prior roles...



Wednesday August 19, 2020 14:30 - 15:05 CEST
InXpo https://onlinexperiences.com/Launch/Event.htm?ShowKey99259

16:55 CEST

Production Multi-node Jobs with Gang Scheduling, K8s, GPUs and RDMA - Madhukar Korupolu & Sanjay Chatterjee, NVIDIA
With the growing scale of DL and ML applications, distributed execution of jobs across multiple nodes becomes increasingly critical -- to solve bigger problems faster -- as illustrated by the recent MLPerf results. However, running such workloads in a production K8s cluster shared by multiple jobs and users poses several challenges.

In this talk, we’ll give an overview of this area -- including distributed TensorFlow, PyTorch, Horovod, and MPI -- and the use of GPU nodes with NCCL and RDMA for accelerated performance. We’ll describe our end-to-end flow for multi-node jobs in K8s, including gang scheduling, quotas, fairness, and backfilling implemented in our custom scheduler for GPUs. Our cluster includes high-speed networking through RoCE and SR-IOV / Multus CNI. We’ll share our design choices, learnings, and operational experience, including failure handling, performance, and telemetry.

Speakers

Madhukar Korupolu

Distinguished Engineer, NVIDIA
Madhukar is an architect at NVIDIA working on GPU clusters for AI and ML workloads. Areas of interest and experience include AI/ML infrastructure, GPU acceleration, cloud computing, distributed systems, Kubernetes, HPC, and CDNs, with previous stints at Google, IBM, and Akamai. He holds a...

Sanjay Chatterjee

Senior Engineer, NVIDIA
Sanjay Chatterjee is a senior engineer at NVIDIA. He works on runtime system infrastructure and core Kubernetes components to support highly scalable HPC and DL/AI workloads. Previously he worked on DoE/DARPA funded research and advanced technology projects for exascale systems. His...



Wednesday August 19, 2020 16:55 - 17:30 CEST
InXpo https://onlinexperiences.com/Launch/Event.htm?ShowKey99259

17:40 CEST

Pwned By Statistics: How Kubeflow & MLOps Can Help Secure Your ML Workloads - David Aronchick, Microsoft
While machine learning is spreading like wildfire, very little attention has been paid to the ways that it can go wrong when moving from development to production. Even when models work perfectly, they can be attacked and/or degrade quickly if the data changes. Having a well understood MLOps process is necessary for ML security!

Using Kubeflow, we will demonstrate the common ways machine learning workflows go wrong, and how to mitigate them using MLOps pipelines that provide reproducibility, validation, versioning/tracking, and safe/compliant deployment. We will also talk about the direction of MLOps as an industry, and how we can use it to move faster, with less risk, than ever before.

Speakers

David Aronchick

Head of OSS Machine Learning, Microsoft
David leads Open Source Machine Learning Strategy at Azure. This means he spends most of his time helping humans to convince machines to be smarter. He is only moderately successful at this. Previously, he led product management for Kubernetes, launched Google Kubernetes Engine and...



Wednesday August 19, 2020 17:40 - 18:15 CEST
InXpo https://onlinexperiences.com/Launch/Event.htm?ShowKey99259
 
Thursday, August 20
 

13:00 CEST

Hunting For New Particles Leveraging Legacy Infrastructure with Kubernetes - Clemens Lange, CERN
In the search for unknown particles in the CERN Large Hadron Collider’s particle collisions, billions of events need to be analysed. Even though large parts of CERN’s computing infrastructure are deployed using Kubernetes, physics analysis jobs are still being run on classical high-throughput computing batch systems. While developing a fully cloud native computing approach, one still needs access to the tens of thousands of cores available on the legacy batch system to have sufficient resources for the data processing.

In this presentation, Clemens will demonstrate how complex physics analysis workflows that are written and scheduled using Kubernetes can make use of classical batch systems. The audience will also learn what complexity a realistic physics analysis can reach, and the important role that software containers and Kubernetes play in the context of open science.

Speakers

Clemens Lange

Research Physicist, CERN
Clemens is a particle physicist working on the CMS experiment at the CERN Large Hadron Collider. He currently searches for new, yet unknown phenomena in the data recorded with the CMS detector, developing and calibrating algorithms for their reconstruction and identification. His...



Thursday August 20, 2020 13:00 - 13:35 CEST
InXpo https://onlinexperiences.com/Launch/Event.htm?ShowKey99259

14:30 CEST

Kubeflow 1.0 Update by a Kubeflow Community Product Manager - Josh Bottum, Arrikto
This session will provide a Kubeflow 1.0 update by a Kubeflow Community Product Manager. The presentation will include a review of the Kubeflow Community and feature development process, the Kubeflow user survey results, and Kubeflow 1.0 features. The talk will highlight significant business benefits and review use cases from top deployments. It will also include a live demonstration of a workflow to build, train, and deploy a versioned Kubeflow Pipeline.

Speakers

Josh Bottum

Vice President, Arrikto
I am a Kubeflow Community Product Manager and VP at Arrikto. We simplify storage architectures and operations for K8s platforms.



Thursday August 20, 2020 14:30 - 15:05 CEST
InXpo https://onlinexperiences.com/Launch/Event.htm?ShowKey99259

17:20 CEST

SpoK - Running Big Data Applications @ Scale on K8s - Srivathsan Canchi & Nagaraj Janardhana, Intuit
At Intuit, customer data sets are growing exponentially with the growth of the business and the capabilities offered. Processing this data and making it available for downstream applications such as ML, analytics, and exploration is crucial. Following the trend of running service workloads on Kubernetes, we built a data processing platform with Spark on Kubernetes as the backbone. This allowed us to reap all the benefits of well-established processes for CI/CD, security, and cluster management. With these, we were able to reduce the cost footprint of our data processing jobs by 30%, while simultaneously increasing the speed to production.

Speakers

Nagaraj Janardhana

Software Engineer, Intuit
Nagaraj is a Principal Engineer at Intuit, Mountain View, responsible for designing and developing ML and Featurization platforms. In the past he has been involved with developing Data Ingestion and Processing platforms, and Identity and Subscription Platforms at Intuit. He has contributed...

Srivathsan Canchi

Engineering Leader, ML Platform, Intuit
Srivathsan leads the machine learning platform engineering team at Intuit. The ML platform includes real-time distributed featurization, scoring, and feedback loops. He has a breadth of experience building high-scale, mission-critical platforms. Srivathsan also has extensive experience...



Thursday August 20, 2020 17:20 - 17:55 CEST
InXpo https://onlinexperiences.com/Launch/Event.htm?ShowKey99259

18:05 CEST

MLPerf Meets Kubernetes - Xinyuan Huang & Elvira Dzhuraeva, Cisco
Kubeflow is maturing as a cloud native Machine Learning (ML) platform that simplifies the journey of development, deployment, and management of ML on Kubernetes. As Kubeflow is increasingly adopted, practitioners are looking beyond functions and starting to explore its performance and cost efficiency in the real world. MLPerf is a state-of-the-art benchmark suite that aims to set an industry standard for end-to-end performance evaluation of ML systems with real-world workloads covering both the training and inference phases of the ML lifecycle. This talk will provide a brief overview of MLPerf, followed by detailed discussions of how MLPerf can be adapted to evaluate ML performance on Kubeflow and Kubernetes, as well as how the performance results can be leveraged to guide the future design and optimization of cloud native ML platforms based on Kubeflow and Kubernetes.

Speakers

Xinyuan Huang

Technical Leader, Cisco
Xinyuan Huang is a Technical Leader at Cisco, where he leads performance evaluations and optimizations in cloud and AI/ML systems. He is an active member of the MLPerf and Kubeflow communities.

Elvira Dzhuraeva

Technical Product Engineer AI/ML, Cisco
Elvira Dzhuraeva is an AI/ML Technical Product Engineer at Cisco, where she leads cloud and on-premise ML and AI strategy. She is a Technical Product Manager at Kubeflow and a member of the MLPerf community.


Thursday August 20, 2020 18:05 - 18:40 CEST
InXpo https://onlinexperiences.com/Launch/Event.htm?ShowKey99259
 