Wednesday, August 19 • 16:55 - 17:30
Is There a Place For Distributed Storage For AI/ML on Kubernetes? - Diane Feddema & Kyle Bader, Red Hat

Containerized machine learning workloads running on Kubernetes gain benefits such as portability, declarative configuration, and less administrative toil, all with marginal performance impact. The best published results for performance-sensitive machine learning workloads, e.g. MLPerf v0.6, were obtained by reading the datasets from local SSDs. While the MLPerf datasets fit comfortably on a single SSD, that is a luxury not afforded to folks training models against petabyte-scale datasets. We’ll share our experience running MLPerf training jobs in Kubernetes against datasets stored by Kubernetes stateful storage services orchestrated by Rook. Highlights include the performance and scalability tradeoffs associated with local and open source distributed storage, and how machine learning formats like RecordIO and TFRecord provide performance utility and model validation flexibility.

Speakers
Kyle Bader

Data Foundation Architect, Red Hat
Kyle is the Data Foundation Architect covering both OpenShift Data Foundation and Red Hat Ceph Storage products at Red Hat. His focus is at the intersection of open source, distributed storage systems, data engineering, and machine learning.

Diane Feddema

Principal Software Engineer, Red Hat
Diane Feddema is a principal software engineer at Red Hat Inc Canada, in the AI Center of Excellence. Diane is currently focused on developing and applying machine learning techniques for performance analysis using hardware accelerators, automating these analyses and displaying data...
Wednesday August 19, 2020 16:55 - 17:30 CEST
InXpo https://onlinexperiences.com/Launch/Event.htm?ShowKey99259
  Storage