Virtual Event
August 17–August 20, 2020

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon Europe 2020 - Virtual to participate in the sessions.

Please note: This schedule is displayed in Central European Summer Time (CEST). The schedule is subject to change.
Thursday, August 20 • 13:00 - 13:35
Help! Please Rescue Not-ready Nodes Immediately - Xiaoyu Zhang, Alibaba & Di Xu, Ant Financial

In a Kubernetes cluster, nodes are crucial to keeping pods running properly, so it is indispensable to monitor node status and detect node problems. Node problem detector (NPD), an open source project in the Kubernetes community, is a good answer to this issue. NPD is now well accepted and widely used in production environments.

Identifying the problem, however, is only the first step. The next step is to handle those problems and rescue the nodes.

In this talk, we will list common node problems and share how we establish rules to decide whether a node is ready, and how we fix the problems that are recoverable. We will also introduce scenarios showing how we provide a 99.9% uptime guarantee with ten thousand nodes in a single cluster, and share our experience recovering nodes within 10 minutes.

Speakers
Xiaoyu Zhang

Principal Engineer, Tencent
Xiaoyu Zhang is a principal engineer at Tencent Cloud. He previously worked for Alibaba Cloud as a senior engineer. He is a member of the Kubernetes organization and works mainly on the Kubernetes project, focusing on docs, kubectl, controller-manager, storage, and runtime. He had multiple...
Di Xu

Senior Engineer, Tencent
Currently, he works at Tencent as a staff engineer, leading a small team working on open source cloud native projects and distributed cloud platform development. He is also a top 50 code contributor in the Kubernetes community. He had spoken many times at open source conferences...



Thursday August 20, 2020 13:00 - 13:35 CEST
InXpo https://onlinexperiences.com/Launch/Event.htm?ShowKey99259
  Operations