I see a lot of people at the moment looking at alternative ways to run Virtual Machines (VMs). Where VMs used to dominate, VMware vSphere was almost a no-brainer. As more people start dipping their toes into containers, it starts to make more sense to run VMs on that same platform… If only that were possible… And it is: Enter KubeVirt.
But with KubeVirt on the rise, I now see many people who lose sight of what is really happening under the covers. Are KubeVirt VMs “slower” because they run on top of a container platform? (Spoiler alert: They’re not 😉)
KubeVirt described in a short and sweet way
The clearest way I’ve seen KubeVirt VMs described (thank you Red Hat!) is actually pretty simple:
“A VM is a process running on Linux”
“A container manages a process running on Linux”
And so one plus one makes two. So to make this clear once and for all, there is no “layering” of different hypervisors of any kind unless you’d run the k8s cluster itself virtualized (which would not make a lot of sense unless you’re playing in a lab). KubeVirt has all the tools to configure, build and run a container that manages a VM process. The upshot of this is that managing a VM becomes pretty much the same as managing pods / containers in k8s.
In this blog post I won’t be diving any deeper into the inner workings of KubeVirt (like describing libvirt and QEMU), but it is very much parallel to running a VM through KVM or OpenStack.
Adding storage and networking to a KubeVirt VM
As a KubeVirt VM is embedded in a container, it can (actually must) use any resources provided to that container. So virtual networking: Extends to your VMs. CSI Storage: Extends to your VMs… So adding storage to a VM is nothing more than adding a PVC to a k8s pod.
So yes, you can still use your CSI-driver enabled external storage array to claim volumes that will run your VMs. Some “limits” apply though as we will see next.
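To make the "a VM disk is just a PVC" idea a bit more concrete, here is a minimal sketch in Python that builds a KubeVirt VirtualMachine manifest whose root disk is backed by an existing PVC. The VM name, claim name, disk name and memory size are made-up values for illustration only, and a real VM would typically need more (networking, CPU settings, a bootable image on the volume).

```python
import json


def vm_with_pvc(vm_name: str, claim_name: str, memory: str = "2Gi") -> dict:
    """Build a KubeVirt VirtualMachine manifest whose root disk is a PVC.

    All names passed in are illustrative assumptions, not values from a real cluster.
    """
    disk_name = "rootdisk"  # hypothetical name; it only has to match between disk and volume
    return {
        "apiVersion": "kubevirt.io/v1",
        "kind": "VirtualMachine",
        "metadata": {"name": vm_name},
        "spec": {
            "runStrategy": "Always",  # keep the VM running
            "template": {
                "spec": {
                    "domain": {
                        "devices": {
                            # The disk as the guest sees it
                            "disks": [{"name": disk_name, "disk": {"bus": "virtio"}}],
                        },
                        "resources": {"requests": {"memory": memory}},
                    },
                    # The volume behind that disk: a plain k8s PVC
                    "volumes": [
                        {
                            "name": disk_name,
                            "persistentVolumeClaim": {"claimName": claim_name},
                        },
                    ],
                },
            },
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON as well as YAML, so this output can be piped to `kubectl apply -f -`
    print(json.dumps(vm_with_pvc("demo-vm", "demo-vm-root"), indent=2))
```

The part to notice is the volumes section: it looks exactly like what you would write for a pod that mounts a PVC.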
Live migration of VM workloads in KubeVirt
Can you do live migration of a VM in KubeVirt? The answer is a definitive YES. However there is also a “but”: In order to perform live migrations (which is what you want if you are serious about running VMs in production) you need storage that supports RWX (ReadWriteMany, i.e. multi-writer) access.
The reason for this isn’t the fact that a migrating VM and its newly created twin want to perform writes to their disk in parallel; it is merely about the idea that both CAN write to the same volume so during the cut-over from original to new instance there is nothing specific to handle at the storage layer.
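As a rough illustration of that rule, the sketch below checks whether a PVC manifest asks for ReadWriteMany and builds a VirtualMachineInstanceMigration request for a running VM. The check is deliberately simplified (real clusters look at more than the access mode), and the function names and the VM name are my own assumptions for the example.

```python
import json


def supports_live_migration(pvc: dict) -> bool:
    """Simplified check: does this PVC manifest request ReadWriteMany (RWX)?"""
    return "ReadWriteMany" in pvc.get("spec", {}).get("accessModes", [])


def migration_request(vmi_name: str) -> dict:
    """Build a VirtualMachineInstanceMigration manifest for the given running VM."""
    return {
        "apiVersion": "kubevirt.io/v1",
        "kind": "VirtualMachineInstanceMigration",
        # generateName lets the API server pick a unique suffix;
        # this needs `kubectl create`, not `kubectl apply`
        "metadata": {"generateName": f"{vmi_name}-migration-"},
        "spec": {"vmiName": vmi_name},
    }


if __name__ == "__main__":
    rwo_pvc = {"spec": {"accessModes": ["ReadWriteOnce"]}}
    rwx_pvc = {"spec": {"accessModes": ["ReadWriteMany"]}}
    print("RWO volume migratable:", supports_live_migration(rwo_pvc))
    print("RWX volume migratable:", supports_live_migration(rwx_pvc))
    print(json.dumps(migration_request("demo-vm"), indent=2))
```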
Lots of people think of NFS-mounted storage as soon as they read RWX. But there is another way that we can support RWX, and that is the almost forgotten RAWBLOCK mode (a PVC with its volumeMode set to Block). In RAWBLOCK mode you get a DISK device surfaced up into the container with an option to enable it for Multi-Writer (RWX). This seemed pretty useless for most “modern” use cases; why would you have multiple containers write to the same disk (instead of a file system which is a more general use case for RWX) at the same time? A quorum disk might be a use case here, but in reality I’ve not seen this use case at customers. But when we project the RAWBLOCK idea onto a VM running in KubeVirt, everything comes together!
“KubeVirt is the first really good use case I’ve seen for RAWBLOCK PVCs”
A VM would actually PREFER to have a direct disk mounted, as it needs a block device to boot and an NFS volume would simply add a layer where you need to convert blocks into segments inside a file which can (and probably will) hinder performance. Now add the ability for RAWBLOCK to support RWX and you have a direct block device into a KubeVirt-managed VM that is live migration capable!
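For reference, this is roughly what such a claim looks like. The sketch below builds a PVC manifest that combines the two settings discussed here: ReadWriteMany as access mode and Block as volume mode. The claim name, size and storage class name are placeholders I made up; the storage class in particular depends entirely on which CSI driver is installed in your cluster.

```python
import json


def rwx_block_pvc(name: str, size: str, storage_class: str) -> dict:
    """Build a PVC manifest for a raw block volume that multiple nodes may write to."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteMany"],  # RWX: needed for live migration
            "volumeMode": "Block",             # raw block device, no file system on top
            "storageClassName": storage_class,  # hypothetical; use your CSI driver's class
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    print(json.dumps(rwx_block_pvc("demo-vm-root", "40Gi", "my-block-storage"), indent=2))
```

Whether the claim actually binds depends on the CSI driver behind the storage class supporting this combination.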
From a Dell perspective, we support RAWBLOCK in RWX mode on all our block-capable arrays: PowerStore, PowerMax, PowerFlex and Unity (XT).
The “need” to run KubeVirt on Bare Metal
Will KubeVirt run on a k8s cluster that runs on, for example, VMware vSphere? Yes it will, but… Would that make sense? For a lab environment yes, for production… Not so much: You’d simply run the workload on the underlying hypervisor and not stack two hypervisors on top of each other. If you still want to travel this path, you’d have to enable “Expose hardware assisted virtualization to the guest OS” at the CPU level (assuming VMware vSphere) to get this working, as you’d be nesting virtualization.
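If you want to verify from inside a (virtualized or bare-metal) Linux worker node whether hardware virtualization is actually visible, a quick check like the sketch below can help. It assumes an x86 node, where /proc/cpuinfo lists the vmx (Intel) or svm (AMD) flag, and it also looks for the /dev/kvm device; the function name is just something I picked for the example.

```python
from pathlib import Path


def has_hw_virtualization(cpuinfo_text: str) -> bool:
    """Return True if any x86 'flags' line lists vmx (Intel) or svm (AMD)."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags = value.split()
            if "vmx" in flags or "svm" in flags:
                return True
    return False


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print("CPU virtualization flags:", has_hw_virtualization(cpuinfo.read_text()))
    # /dev/kvm only shows up when the KVM kernel module is loaded and usable
    print("/dev/kvm present:", Path("/dev/kvm").exists())
```

On a nested node without the vSphere setting enabled, I would expect the flag check to come back False.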
The more “normal” way to run KubeVirt VMs is on bare metal. This would enable the VMs to run directly on the underlying bare-metal installed Linux OS, so performance (and architecture) would be similar to a KVM Hypervisor approach. Add CSI-enabled external storage to the setup and you can solidly run your VMs!
Running k8s on bare metal does add complexity though… Many customers have been running platforms like Red Hat OpenShift on a hypervisor just to avoid this added complexity of managing bare metal servers. Come to think of it… Running a K8s platform on bare metal with a storage array underneath… Dell just might have a solution for that 😉
Using the Dell APEX Cloud Platform (ACP) for Red Hat OpenShift to run VMs
The Dell APEX Cloud Platform is like a black box containing Dell PowerEdge MC node servers, Dell PowerFlex storage and managing “foundation software” ready to run a platform of choice. Currently available platforms are the ACP for MS Azure and ACP for Red Hat OpenShift (with the option for VMware coming soon).
This truly is a “scale-out black box” that deploys and maintains itself. Basically feed a new system a config file and the ACP’s foundation software will deploy the complete system (“day 1”). After this initial run, the system remains managed as one (“day 2”): Fully lifecycle managed from OpenShift right down to the firmware levels and everything in between! And as the exact configuration is known by both Dell and Red Hat, all update packages come fully pre-tested (so you don’t have to). You simply move from one “known good state” to the next. Add a flexible way to pay for the solution on top (either through CAPEX, OPEX or a mix with a “pay what you use” option). What’s not to like!! This is as close as you can get to running apps in a “cloud-like way” on-premises (or co-located).
By default the APEX solution for Red Hat OpenShift allows you to run containers only. But adding the Operator to enable Red Hat OpenShift Virtualization is very easy to do, after which you can run VMs at full speed, directly from RWX-enabled block volumes coming from the PowerFlex storage system underneath.
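Most people will simply install the Operator from the OperatorHub in the OpenShift console, but the same thing can be expressed as manifests. The sketch below builds the three objects generally used for this (a namespace, an OperatorGroup and a Subscription). The object names, the namespace and the "stable" channel are assumptions based on common OpenShift Virtualization installs; check the Red Hat documentation for your OpenShift version before using them.

```python
import json

NAMESPACE = "openshift-cnv"  # assumed default namespace for OpenShift Virtualization


def openshift_virtualization_install(channel: str = "stable") -> dict:
    """Build a v1 List with the objects that subscribe a cluster to the Operator."""
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": NAMESPACE},
    }
    operator_group = {
        "apiVersion": "operators.coreos.com/v1",
        "kind": "OperatorGroup",
        "metadata": {"name": "kubevirt-hyperconverged-group", "namespace": NAMESPACE},
        "spec": {"targetNamespaces": [NAMESPACE]},
    }
    subscription = {
        "apiVersion": "operators.coreos.com/v1alpha1",
        "kind": "Subscription",
        "metadata": {"name": "hco-operatorhub", "namespace": NAMESPACE},
        "spec": {
            "source": "redhat-operators",
            "sourceNamespace": "openshift-marketplace",
            "name": "kubevirt-hyperconverged",
            "channel": channel,  # assumption: channel names can differ per release
        },
    }
    return {
        "apiVersion": "v1",
        "kind": "List",
        "items": [namespace, operator_group, subscription],
    }


if __name__ == "__main__":
    # The JSON output can be piped to `oc apply -f -`
    print(json.dumps(openshift_virtualization_install(), indent=2))
```

Once the Operator is running, a HyperConverged custom resource still has to be created in the same namespace before VMs can be deployed; the console wizard walks you through that step.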
“SCALE OOOUUTTTT”
As both layers (Red Hat OpenShift and Dell PowerFlex) scale out linearly, you are set for both right now AND the future. I really should say SCALE OOOUUUTTT as we have PowerFlex instances in the field that have grown linearly in both capacity and performance and are now beyond the 1000 (yes one thousand!) node mark without ever breaking the linear scaling of performance. Imagine the amount of IOPS flying out of that 😎