ElasticSky

5. October 20211. October 2021

GPU enabled K8s Clusters in vSphere with Tanzu

Using GPU in container workloads is an important demand by developers who work with machine learning and artificial intelligence.

You can create a custom VM class where a VI admin can define a vGPU specification for that class. Developers can use this class to assign GPU resources to the workload. The vm class will define node placement an vGPU profile.

This not only available to GPU enabled TKG clusters, but also for standalone VMs. The use of custom classes will simplify the consumption of GPU resources in ML/AI applications.

See a sample class below

kind: TanzuKubernetesCluster
apiVersion: run.tanzu.vmware.com/v1
metadata:
  name: GPU-Cluster
spec:
  topology:
    workers:
      count: 3
      class: gpu-vmclass
  distribution: v1.20.2

This class can be consumed for example in a VM

kind: VirtualMachine
metadata:
  name: gpu-vm
  namespace: tkg-dev
spec:
  networkInterfaces:
  - networkName: "dev-network"
    networkType: vsphere-distributed
  classname: gpu-vmclass
  imageName: ubuntu-custom-gpu    
  storageClass: GPU-vm-policy

This blogpost used to be part of my recent vSphere7 Update3 What’s new artice, but has been withdrawn at VMware’s request with an extended embargo until October 5 2021.

28. September 202115. March 2024

vSphere 7 Update 3 – What’s New

This blogpost was under embargo until 28th of September 2021 8:00am (PT) / 17:00 (CEST). The fact that you can read this now means that vSphere 7 Update 3 has (probably) already been released.

[Update 29th Sept 2021]: Download is not yet available. Maybe we need to wait until VMworld2021 next week.

What’s New

VMware vSphere 7 Update3 comes with a wide range of innovations. They can be categorized into the sections below:

Tanzu with Kubernetes
Lifecycle, Upgrade and Patching
Artificial Intelligence & Machine Learning
Resource Management
Availability & Resiliency
Security & Compliance
Guest OS and Workloads
Storage
Networking
vSphere Management & APIs

Another bunch of features goes into vSAN. But these features will be covered in an extra post.

5. September 20215. September 2021

My personal selection for VMworld 2021

VMworld 2021 is now taking place virtually for the second time. Even if this is not a real substitute for an on-site event, it is a good source of information in times of pandemic.

VMworld will take place virtually in several time zones between October 5 and 7, 2021. Participation is free of charge. All you need to do is register on the VMworld portal page.

My selection

This list are my selected favorites. The order is purely alphabetical and no ranking. A shift towards Kubernetes, Modern Apps and GPUaaS has been noticeable in recent years. The latter was added in the current year by a Bitfusion with Kubernetes project. However, some classics like Frank’s 60 Minutes of NUMA, talks by Cormac Hogan and Duncan Epping are still part of the must-attend program.

10 Things You Need to Know About Project Monterey [MCL1833] – Sudhanshu Jain, Niels Hagoort
60 Minutes of Non-Uniform Memory Access (NUMA) 3rd Edition [MCL1853] – Frank Denneman
Antrea and NSX-T update for Container Networking [CODE2743] – Tuan Loc Nguyen, Rahul Dondeti
Architect the Enterprise Data Center for AI with VMware and NVIDIA [VI1501] – James Brogan, Joe Cullen
Attach GPU Anywhere with vSphere Bitfusion Extension [VMTN2801] – Tiejun Chen
Build and Publish a PowerShell Module to the PowerShell Gallery [CODE2756] – David Stamen
Deep Dive on Logical Routing in NSX-T [NET1443] – Francois Tallet, Nicolas Michel
Deep Dive on vSphere with Tanzu Updates [APP2063] – Karthik Balachandran
Extreme Performance Series: Performance Best Practices [MCL1635] – Mark Achtemichuk, Valentin Bondzio
Get the Most Out of VMware NSX Data Center with Advanced Load Balancing [NET1791] – Dan Watson
How to measure and improve the performance of your HCX migrations? [VMTN3225] – Agnieszka Koziorowska
Live Coding: Terraforming Your vSphere Environment [CODE2755] – Kyle Ruddy
Loop, Swoop and Pull – PowerCLI Will be as Easy as Tying Your Shoes! [CODE2744] – Justin Sider
Maximize GPU Utilization with VMware Tanzu/Kubernetes and vSphere Bitfusion [VI1624] – Earl Ruby
Modernize Windows Apps: Introduction to Windows Containers on Kubernetes [APP1999] – Stuart Preston
NSX Advanced Threat Prevention: Deep Dive [SEC1376] – Stijn Vanveerdeghem
NVMe/TCP – The Future of Storage Connectivity [MCL2766S] – Paul Turner, Ihab Tarazi
Project Monterey: Present, Future and Beyond [MCL1401] – Sudhanshu Jain, Simer Singh
The Future of VM Provisioning – Enabling VM Lifecycle Through Kubernetes [APP1564] – Myles Gray, Nikitha Suryadevara
VEBA Revolutions – Unleashing the Power of Event-Driven Automation [CODE2773] – William Lam, Michael Gasch
VMware Cloud Foundation Tips and Tricks from the Trenches [MCL1025] – Dharmesh Bhatt, Paudie O’Riordan
VMware vSAN – Dynamic Volumes for Traditional and Modern Applications [MCL1084] – Duncan Epping, Cormac Hogan
vSAN Technical Deep Dive [MCL1654] – Biswapati Bhattacharjee, Junchi Zhang
Want to deploy your SDDCLab in about an hour? [VMTN3192] – Luis Chanu, Rutger Blom
What’s New in NSX-T [NET2354] – Varun Santosh, Soumee Phatak
What’s New in vSphere [APP1205] – Himanshu Singh, Ken Werneburg

8. July 20215. September 2021

VMware Bitfusion and Tanzu – Part 3: Utilize GPU from Kubernetes Pods and TKGS

This will be a multi-part post focused on the VMware Bitfusion product. I will give an introduction to the technology, how to set up a Bitfusion server and how to use its services from Kubernetes pods.

Part 1 : A primer to Bitfusion
Part 2 : Bitfusion server setup
Part 3 : Using Bitfusion from Kubernetes pods and TKGS. (this article)

We saw in parts 1 and 2 what Bitfusion is and how to set up a Bitfusion Server cluster. The challenging part is to make this Bitfusion cluster usable from Kubernetes pods.

In order for containers to access Bitfusion GPU resources, a few general conditions must be met.

I assume in this tutorial that we have a configured vSphere-Tanzu cluster available, as well as a namespace, a user, a storage class and the Kubernetes CLI tools. The network can be organized with either NSX-T or distributed vSwitches and a load balancer such as the AVI load balancer.

In the PoC described, Tanzu on vSphere was used without NSX-T for simplicity. The AVI load balancer, now officially called NSX-Advanced load balancer, was used.

We also need a Linux system with access to Github or a mirror to prepare the cluster.

The procedure in a nutshell:

Create TKGS cluster
Get Bitfusion baremetal token laden and create K8s secret
Load Git project and modify makefile
Deploy device-plugin to TKGS-cluster
Pod deployment