Building a Home VM Infrastructure with KubeVirt
KubeVirt is a tool for managing VM infrastructure: it lets you manage VMs on Kubernetes in the same way as containers. I tried KubeVirt as an easy way to spin up VMs at home, so I’ll share how I set it up and my impressions.
What is KubeVirt?
When you describe VMs as manifests, KubeVirt’s Controller creates the VMs for you. The VMs exist on the same network as containers, so you can manage communication with containers and access control using Kubernetes mechanisms.
A CLI tool called virtctl (kubectl virt) is provided, which allows you to start and stop VMs, and connect to VMs via ssh, console, vnc, etc.
There is also a subproject called Containerized Data Importer (CDI). It provides a DataVolume resource that abstracts PersistentVolumeClaim (PVC), enabling you to download VM images and clone DataVolumes for use when starting VMs.
For architecture details, this slide deck is very helpful, so I’ll leave the details to that resource.
Environment
The cluster has two nodes, both with hardware virtualization enabled. All manifests for my home Kubernetes cluster are managed in my-k8s-cluster and deployed with ArgoCD. If you create a cluster by following its README, you should end up with nearly the same environment (apart from IP addresses and domains). Links to the code referenced in this article are included for reference.
NFS Server
As preparation, we need a StorageClass that can provide PersistentVolumes (PVs) for storing VM data. This time, we’ll set up an NFS server on one of the nodes and expose it as a StorageClass using the NFS CSI driver for Kubernetes. Run the following on that node:
sudo apt install nfs-kernel-server
sudo mkdir -p /export/nfs
sudo chmod 777 /export/nfs
cat << EOF | sudo tee -a /etc/exports
/export/nfs 192.168.0.0/24(rw,no_root_squash,no_subtree_check)
EOF
sudo systemctl enable --now nfs-server.service
sudo exportfs -a
192.168.0.0/24 is the network where the nodes reside.
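As far as I understand, the NFS mounts are performed from the nodes themselves, so every node address must fall inside the exported subnet. The sketch below is a minimal, simplified parser for the export line above that reports which options would apply to a given client address. It only handles IP/CIDR clients (no hostnames, wildcards, or netgroups), and the node address 192.168.0.10 is hypothetical.

```python
import ipaddress


def parse_export_line(line):
    """Split an /etc/exports line into (path, [(client_network, options), ...]).

    Simplified parser: each client must be an IP address or CIDR followed by
    "(opt,opt,...)"; hostnames, wildcards and netgroups are not handled.
    """
    path, *clients = line.split()
    entries = []
    for client in clients:
        spec, _, rest = client.partition("(")
        options = rest.rstrip(")").split(",") if rest else []
        entries.append((ipaddress.ip_network(spec), options))
    return path, entries


def options_for_client(line, node_ip):
    """Return the options of the first client entry covering node_ip, or [] if none does."""
    _, entries = parse_export_line(line)
    addr = ipaddress.ip_address(node_ip)
    for network, options in entries:
        if addr in network:
            return options
    return []


EXPORT = "/export/nfs 192.168.0.0/24(rw,no_root_squash,no_subtree_check)"
print(options_for_client(EXPORT, "192.168.0.10"))   # a node on the LAN (hypothetical address)
print(options_for_client(EXPORT, "10.10.141.152"))  # a Pod IP: not covered by this export
```

An empty result means the client would be refused, which is a quick way to catch a node that sits on a different subnet.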
Then apply the following manifest:
# https://github.com/hiroyaonoe/my-k8s-cluster/blob/30daa0da2767d6f4b490b781a1b3f119dd1ac427/argocd/manifests/storage-class/mandoloncello-nfs.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: mandoloncello-nfs
provisioner: nfs.csi.k8s.io
parameters:
  server: mandoloncello.node.internal.onoe.dev # Node address
  share: /export/nfs
  mountPermissions: "777"
reclaimPolicy: Retain
volumeBindingMode: Immediate
mountOptions:
  - nfsvers=4.2
allowVolumeExpansion: true
Since dynamic volume provisioning is enabled, there’s no need to prepare PVs manually. Whenever a PVC that uses this StorageClass is waiting to be bound, the driver creates a PV for it (backed by a subdirectory of the NFS share) and binds it.
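To make this concrete: a PVC that names the StorageClass is all that is needed. The following minimal sketch builds such a claim as JSON (kubectl accepts JSON manifests, so the output could be piped to kubectl apply -f -). The claim name scratch, the namespace, and the 1Gi size are hypothetical. With volumeBindingMode: Immediate, the PV should be provisioned as soon as the claim is created, and with reclaimPolicy: Retain the PV and its data on the share remain after the claim is deleted.

```python
import json


def nfs_pvc(name, namespace, size, storage_class="mandoloncello-nfs"):
    """Build a PersistentVolumeClaim that the NFS CSI driver would satisfy by provisioning a PV."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


# Hypothetical claim; print it as JSON so it can be applied with kubectl.
print(json.dumps(nfs_pvc("scratch", "playground", "1Gi"), indent=2))
```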
Multus
Since we want to connect VMs not only to the container network but also to the host network, we’ll set up Multus, a meta CNI plugin. Multus lets you attach multiple NICs to a container, and because KubeVirt supports Multus natively, you can attach multiple NICs to VMs as well.
First, create a bridge called br0 on all nodes (reference: Creating a bridge-connected VM with KVM). Then apply Multus following the official instructions, and also apply a NetworkAttachmentDefinition for br0.
# https://github.com/hiroyaonoe/my-k8s-cluster/blob/30daa0da2767d6f4b490b781a1b3f119dd1ac427/argocd/manifests/multus/bridge.yaml
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: underlay-bridge
  namespace: kube-public
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "name": "underlay-bridge",
      "type": "bridge",
      "bridge": "br0",
      "ipam": {
        "type": "host-local",
        "subnet": "192.168.0.0/24"
      }
    }
As an example, let’s add the following annotation to an arbitrary nginx Pod and apply it:
# https://github.com/hiroyaonoe/my-k8s-cluster/blob/30daa0da2767d6f4b490b781a1b3f119dd1ac427/argocd/manifests/example/nginx.yaml#L16-L25
annotations:
  k8s.v1.cni.cncf.io/networks: |
    [
      {
        "name": "underlay-bridge",
        "namespace": "kube-public",
        "interface": "eth1",
        "ips": [ "192.168.0.171" ]
      }
    ]
When you enter the nginx Pod and check the IP addresses, you’ll see the following:
$ kubectl exec -it nginx-55bb7d4dbd-n4blx -- /bin/bash
root@nginx-55bb7d4dbd-n4blx:/# apt update && apt install iproute2
...
root@nginx-55bb7d4dbd-n4blx:/# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0@if34: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
    link/ether fe:ee:b0:c3:79:f5 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.10.141.152/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::fcee:b0ff:fec3:79f5/64 scope link
       valid_lft forever preferred_lft forever
3: eth1@if35: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether aa:3a:8d:5c:bf:c1 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.0.171/24 brd 192.168.0.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 2400:2650:8022:3c00:a83a:8dff:fe5c:bfc1/64 scope global dynamic mngtmpaddr
       valid_lft 293sec preferred_lft 293sec
    inet6 fe80::a83a:8dff:fe5c:bfc1/64 scope link
       valid_lft forever preferred_lft forever
eth0@if34 (10.10.141.152) belongs to the regular container network. In addition, there’s eth1@if35 (192.168.0.171), the extra NIC attached to the host network through br0.
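When attaching more Pods (or, later, VMs) this way, the annotation can be generated rather than hand-written. The sketch below is a minimal helper, assuming the NetworkAttachmentDefinition above, that builds the k8s.v1.cni.cncf.io/networks value and rejects addresses that are not usable hosts in 192.168.0.0/24. It cannot detect collisions with other machines on the LAN or with the router’s DHCP pool, so picking a free address is still up to you.

```python
import ipaddress
import json

# Values taken from the NetworkAttachmentDefinition above.
NAD_NAME = "underlay-bridge"
NAD_NAMESPACE = "kube-public"
NAD_SUBNET = ipaddress.ip_network("192.168.0.0/24")
ANNOTATION_KEY = "k8s.v1.cni.cncf.io/networks"


def multus_annotation(interface, ip):
    """Return the annotation that attaches the bridge network with a fixed IP."""
    addr = ipaddress.ip_address(ip)
    reserved = (NAD_SUBNET.network_address, NAD_SUBNET.broadcast_address)
    if addr not in NAD_SUBNET or addr in reserved:
        raise ValueError(f"{ip} is not a usable host address in {NAD_SUBNET}")
    selection = [{
        "name": NAD_NAME,
        "namespace": NAD_NAMESPACE,
        "interface": interface,
        "ips": [str(addr)],
    }]
    return {ANNOTATION_KEY: json.dumps(selection, indent=2)}


print(multus_annotation("eth1", "192.168.0.171")[ANNOTATION_KEY])
```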
The same can be done for VMs.
Installing KubeVirt and CDI
Apply KubeVirt following the official instructions.
The configuration manifest (kubevirt.io/v1.KubeVirt) has been partially modified as follows:
# https://github.com/hiroyaonoe/my-k8s-cluster/blob/30daa0da2767d6f4b490b781a1b3f119dd1ac427/argocd/manifests/kubevirt/kubevirt-cr.yaml
apiVersion: kubevirt.io/v1
kind: KubeVirt
metadata:
  name: kubevirt
  namespace: kubevirt
spec:
  configuration:
    network:
      permitBridgeInterfaceOnPodNetwork: false
    developerConfiguration:
      featureGates:
        - ExpandDisks
  imagePullPolicy: IfNotPresent
The changes from the defaults are permitBridgeInterfaceOnPodNetwork: false and ExpandDisks. Both will be explained later.
CDI is also applied following the official instructions.
The configuration manifest (cdi.kubevirt.io/v1beta1.CDI) is as follows:
# https://github.com/hiroyaonoe/my-k8s-cluster/blob/30daa0da2767d6f4b490b781a1b3f119dd1ac427/argocd/manifests/kubevirt/cdi-cr.yaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: CDI
metadata:
  name: cdi
spec:
  config:
    podResourceRequirements:
      limits:
        cpu: '1'
        memory: 5Gi
  imagePullPolicy: IfNotPresent
  infra:
    nodeSelector:
      kubernetes.io/os: linux
    tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
  workload:
    nodeSelector:
      kubernetes.io/os: linux
The change from the defaults is config.podResourceRequirements. The default limits were too small, causing OOMKills during VM image downloads, so they were increased.
Downloading the VM Image
Before creating a VM, apply a DataVolume to download the VM image:
# https://github.com/hiroyaonoe/my-k8s-cluster/blob/30daa0da2767d6f4b490b781a1b3f119dd1ac427/argocd/manifests/playground/vm-image.yaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: ubuntu-image-2404
spec:
  storage:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 5Gi
    storageClassName: mandoloncello-nfs
  source:
    http:
      url: https://cloud-images.ubuntu.com/noble/current/noble-server-cloudimg-amd64.img
We’re using Ubuntu 24.04 (noble) this time. I initially tried using an ISO file, but booting didn’t work properly, so I’m using a cloud image (img file) instead.
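When choosing the DataVolume size, what matters is the image’s virtual disk size, not its download size. As far as I know, CDI refuses an import whose virtual size does not fit in the PVC. Ubuntu’s .img cloud images are, to my knowledge, in qcow2 format. The minimal sketch below reads the virtual size from the qcow2 header (big-endian fields: magic, version, backing file offset and size, cluster bits, then the 64-bit virtual size). qemu-img info reports the same value.

```python
import struct

QCOW2_MAGIC = b"QFI\xfb"
# Big-endian qcow2 header prefix: magic, version, backing_file_offset,
# backing_file_size, cluster_bits, size (virtual disk size in bytes).
HEADER = struct.Struct(">4sIQIIQ")


def qcow2_virtual_size(header: bytes) -> int:
    """Return the virtual disk size recorded in a qcow2 header."""
    if len(header) < HEADER.size:
        raise ValueError("header too short")
    magic, _version, _bf_offset, _bf_size, _cluster_bits, size = HEADER.unpack_from(header)
    if magic != QCOW2_MAGIC:
        raise ValueError("not a qcow2 image")
    return size


def virtual_size_of(path: str) -> int:
    """Read just the first 32 bytes of an image file and return its virtual size."""
    with open(path, "rb") as f:
        return qcow2_virtual_size(f.read(HEADER.size))
```

For example, virtual_size_of("noble-server-cloudimg-amd64.img") / 2**30 gives the size in GiB, which can then be compared with the 5Gi request above.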
By the way, KubeVirt also provides VM images as Container Disks. We won’t use them this time, but you can create a VM from a Container Disk as follows:
# https://github.com/hiroyaonoe/my-k8s-cluster/blob/30daa0da2767d6f4b490b781a1b3f119dd1ac427/argocd/manifests/example/vm.yaml#L44
volumes:
  - name: containerdisk
    containerDisk:
      image: quay.io/containerdisks/ubuntu:22.04
Creating the VM
Now let’s apply the manifest that defines the VM. The full manifest is here, but since it’s long, I’ll explain it section by section.
DataVolume
We clone the DataVolume for the VM image we created earlier and use it for VM creation. While you could define another DataVolume separately, DataVolumes can be described as templates within the VM manifest. You can clone as follows:
# https://github.com/hiroyaonoe/my-k8s-cluster/blob/30daa0da2767d6f4b490b781a1b3f119dd1ac427/argocd/manifests/playground/vm-pg-1.yaml#L9-L23
dataVolumeTemplates:
  - metadata:
      name: vm-pg-1
    spec:
      storage:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 64Gi
        storageClassName: mandoloncello-nfs
      source:
        pvc:
          name: ubuntu-image-2404
          namespace: playground
This is where the ExpandDisks feature gate comes into play. The VM image DataVolume is 5 GiB, while this DataVolume is 64 GiB. Without ExpandDisks, the PVC requests 64 GiB but the VM running on it sees only a 5 GiB disk, because the cloned disk image keeps its original size. With ExpandDisks enabled, the disk is grown to fill the PVC, so the VM can see the full 64 GiB.
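The arithmetic behind this is simple, but it is easy to get the units wrong. Here is a minimal sketch that converts integer Kubernetes quantities with binary suffixes (only Ki/Mi/Gi/Ti are handled here) to bytes and computes how much raw capacity ExpandDisks can add. It also rejects a target smaller than the source, which, as far as I know, CDI refuses for clones anyway. The guest’s root filesystem ends up somewhat smaller than the raw figure because of the other partitions and filesystem metadata.

```python
import re

_BINARY_SUFFIXES = {None: 1, "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def to_bytes(quantity: str) -> int:
    """Convert an integer Kubernetes quantity with a binary suffix ('64Gi') to bytes."""
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi|Ti)?", quantity)
    if not match:
        raise ValueError(f"unsupported quantity: {quantity}")
    number, suffix = match.groups()
    return int(number) * _BINARY_SUFFIXES[suffix]


def extra_capacity(source: str, target: str) -> int:
    """Raw bytes ExpandDisks can add when a clone of `source` fills a `target`-sized PVC."""
    src, dst = to_bytes(source), to_bytes(target)
    if dst < src:
        raise ValueError("clone target must be at least as large as the source")
    return dst - src


print(extra_capacity("5Gi", "64Gi") // 2**30)  # 59
```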
Resource
Configure the CPU and memory for the VM. The values specified here are what the VM sees.
# https://github.com/hiroyaonoe/my-k8s-cluster/blob/30daa0da2767d6f4b490b781a1b3f119dd1ac427/argocd/manifests/playground/vm-pg-1.yaml#L30-L34
domain:
  cpu:
    cores: 8
  memory:
    guest: 8Gi
You can also set requests and limits in a separate resources section, just like for regular Pods. These are used for scheduling the VM and do not determine the resources visible to the VM. The values of domain.cpu and domain.memory must lie between the requests and the limits. If domain.cpu and domain.memory are not set, the VM sees the resources.requests values instead.
# https://github.com/hiroyaonoe/my-k8s-cluster/blob/30daa0da2767d6f4b490b781a1b3f119dd1ac427/argocd/manifests/playground/vm-pg-1.yaml#L55-L61
resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: '8'
    memory: 8Gi
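The rule above (guest-visible values between requests and limits) is easy to break when editing one section and forgetting the other. Here is a minimal pre-apply check that encodes exactly that rule as this article states it. KubeVirt’s own validation may differ in detail. It assumes the CPU count is given by cores alone (no sockets or threads), and it parses only the quantity forms used here (millicores, plain integers, and integer memory with k/M/G/T or Ki/Mi/Gi/Ti suffixes).

```python
import re
from fractions import Fraction

_MEMORY_SUFFIXES = {
    "": 1,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
}


def parse_cpu(quantity: str) -> Fraction:
    """'500m' -> 1/2 core, '8' -> 8 cores."""
    if quantity.endswith("m"):
        return Fraction(int(quantity[:-1]), 1000)
    return Fraction(quantity)


def parse_memory(quantity: str) -> int:
    """'512Mi' -> bytes; only integer values with the suffixes above are handled."""
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi|Ti|k|M|G|T)?", quantity)
    if not match:
        raise ValueError(f"unsupported memory quantity: {quantity}")
    number, suffix = match.groups()
    return int(number) * _MEMORY_SUFFIXES[suffix or ""]


def check_vm_resources(cores: int, guest_memory: str, requests: dict, limits: dict) -> list:
    """Return violations of requests <= guest-visible value <= limits."""
    problems = []
    if not parse_cpu(requests["cpu"]) <= cores <= parse_cpu(limits["cpu"]):
        problems.append(f"cpu.cores={cores} outside [{requests['cpu']}, {limits['cpu']}]")
    memory = parse_memory(guest_memory)
    if not parse_memory(requests["memory"]) <= memory <= parse_memory(limits["memory"]):
        problems.append(
            f"memory.guest={guest_memory} outside [{requests['memory']}, {limits['memory']}]"
        )
    return problems


# The values from vm-pg-1 above.
print(check_vm_resources(8, "8Gi", {"cpu": "500m", "memory": "512Mi"},
                         {"cpu": "8", "memory": "8Gi"}))  # []
```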
Volume
Configure the disks as follows:
# https://github.com/hiroyaonoe/my-k8s-cluster/blob/30daa0da2767d6f4b490b781a1b3f119dd1ac427/argocd/manifests/playground/vm-pg-1.yaml#L36-L45
disks:
  - disk:
      bus: virtio
    name: disk0
    bootOrder: 1
  - cdrom:
      bus: sata
      readonly: true
    name: cloudinitdisk
    bootOrder: 2
The actual backing storage is configured as follows:
# https://github.com/hiroyaonoe/my-k8s-cluster/blob/30daa0da2767d6f4b490b781a1b3f119dd1ac427/argocd/manifests/playground/vm-pg-1.yaml#L68-L72
volumes:
  - name: disk0
    persistentVolumeClaim:
      claimName: vm-pg-1
  - cloudInitNoCloud:
The first disk, disk0, is the PVC from earlier. The second, cloudinitdisk, is used for CloudInit. It is configured as follows:
# https://github.com/hiroyaonoe/my-k8s-cluster/blob/30daa0da2767d6f4b490b781a1b3f119dd1ac427/argocd/manifests/playground/vm-pg-1.yaml#L74-L86
userData: |
  #cloud-config
  hostname: vm-pg-1
  users:
    - name: onoe
      ssh_import_id: gh:hiroyaonoe
      lock_passwd: false
      passwd: $6$salt$IxDD3jeSOb5eB1CX5LBsqZFVkJdido3OUILO5Ifz5iwMuTS4XMS130MTSuDDl3aCI6WouIL9AjRbLCelDCy.g.
      shell: /bin/bash
      sudo: ALL=(ALL) NOPASSWD:ALL
      uid: 1000
  ssh_pwauth: true
  disable_root: false
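Since only the hostname changes from VM to VM, the userData lends itself to templating when creating more VMs. The sketch below is a simplified version of the cloud-config above, rendered with Python’s string.Template. It deliberately omits the password lines so that login relies on the SSH keys imported from GitHub. The passwd value above is a SHA-512 crypt hash (the $6$ prefix), which openssl passwd -6 can generate if you want to keep password login. The second VM name, vm-pg-2, is hypothetical.

```python
from string import Template

# Simplified version of the userData above (password lines omitted).
CLOUD_CONFIG = Template("""\
#cloud-config
hostname: $hostname
users:
  - name: $user
    ssh_import_id: $ssh_import_id
    shell: /bin/bash
    sudo: ALL=(ALL) NOPASSWD:ALL
    uid: 1000
""")


def render_user_data(hostname: str, user: str, ssh_import_id: str) -> str:
    """Fill in per-VM values; substitute() raises KeyError if a value is missing."""
    return CLOUD_CONFIG.substitute(hostname=hostname, user=user, ssh_import_id=ssh_import_id)


print(render_user_data("vm-pg-2", "onoe", "gh:hiroyaonoe"))
```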
Network
As explained earlier, in addition to the regular container network, we connect to the host network using Multus. In addition to the manifest configuration, we also use CloudInit for setup.
# https://github.com/hiroyaonoe/my-k8s-cluster/blob/30daa0da2767d6f4b490b781a1b3f119dd1ac427/argocd/manifests/playground/vm-pg-1.yaml#L46-L52
interfaces:
  - name: default
    masquerade: {}
    bootOrder: 3
  - name: underlay
    bridge: {}
    bootOrder: 4
# https://github.com/hiroyaonoe/my-k8s-cluster/blob/30daa0da2767d6f4b490b781a1b3f119dd1ac427/argocd/manifests/playground/vm-pg-1.yaml#L62-L67
networks:
  - name: default
    pod: {}
  - name: underlay
    multus:
      networkName: kube-public/underlay-bridge
# https://github.com/hiroyaonoe/my-k8s-cluster/blob/30daa0da2767d6f4b490b781a1b3f119dd1ac427/argocd/manifests/playground/vm-pg-1.yaml#L87-L95
networkData: |
  version: 2
  ethernets:
    enp1s0:
      dhcp4: true
    enp2s0:
      dhcp4: false
      addresses: [192.168.0.162/24]
      gateway4: 192.168.0.1
The first interface, default (enp1s0), is the container network. It connects to the container network through the virt-launcher Pod that is created when the VM starts. In masquerade mode, the VM sits on a private network inside virt-launcher (10.0.2.0/24 by default), separate from the container network, and receives its IP address from virt-launcher via DHCP. Traffic between the VM and the container network is NATed by virt-launcher. The alternative bridge binding does not work properly with some CNI plugins on the pod network, so setting permitBridgeInterfaceOnPodNetwork: false forbids bridge binding on the pod network, leaving masquerade as the mode used there.
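To picture what masquerade does, here is a toy model of the address translation. It is only a conceptual sketch: as I understand it, the real work is done by packet-filter rules inside the virt-launcher Pod’s network namespace, with connection tracking and source ports that this model ignores. The VM address 10.0.2.2 is the one the VM receives (it appears in the VM’s ip a output later in this article). The virt-launcher Pod IP 10.10.0.50 is hypothetical.

```python
from dataclasses import dataclass

VM_IP = "10.0.2.2"      # address the VM receives from virt-launcher's DHCP
POD_IP = "10.10.0.50"   # hypothetical IP of the virt-launcher Pod on the container network


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    dport: int


def from_vm(pkt: Packet) -> Packet:
    """Source NAT: traffic leaving the VM appears to come from the Pod IP."""
    return Packet(POD_IP, pkt.dst, pkt.dport) if pkt.src == VM_IP else pkt


def to_vm(pkt: Packet) -> Packet:
    """Destination NAT: traffic addressed to the Pod IP is delivered to the VM."""
    return Packet(pkt.src, VM_IP, pkt.dport) if pkt.dst == POD_IP else pkt


# The nginx Pod from the Multus section reaching the VM's SSH port via the Pod IP,
# and the VM reaching nginx on port 80.
print(to_vm(Packet("10.10.141.152", POD_IP, 22)))
print(from_vm(Packet(VM_IP, "10.10.141.152", 80)))
```

From the container network’s point of view, the VM is therefore indistinguishable from an ordinary Pod with the virt-launcher Pod’s IP.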
The second interface, underlay (enp2s0), is the host network. It is associated with the NetworkAttachmentDefinition we created earlier (namespace: kube-public, name: underlay-bridge). The address is statically assigned using CloudInit.
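Each additional VM needs its own static address in this networkData. The minimal sketch below generates the same structure as the networkData above and checks that the gateway is on the same subnet as the static address. The address 192.168.0.163 for a second VM is hypothetical. cloud-init reads networkData as YAML, and JSON like this output is generally accepted by YAML parsers. The sketch keeps gateway4 to mirror the manifest, although newer netplan versions warn that it is deprecated in favour of a default route under routes.

```python
import ipaddress
import json


def network_data(static_cidr: str, gateway: str) -> dict:
    """cloud-init network config v2: DHCP on enp1s0 (pod network), static address on enp2s0 (bridge)."""
    iface = ipaddress.ip_interface(static_cidr)
    gw = ipaddress.ip_address(gateway)
    if gw not in iface.network:
        raise ValueError(f"gateway {gw} is not on {iface.network}")
    if iface.ip == gw:
        raise ValueError("static address must differ from the gateway")
    return {
        "version": 2,
        "ethernets": {
            "enp1s0": {"dhcp4": True},
            "enp2s0": {
                "dhcp4": False,
                "addresses": [str(iface)],
                "gateway4": str(gw),  # mirrors the manifest; newer netplan prefers `routes`
            },
        },
    }


print(json.dumps(network_data("192.168.0.163/24", "192.168.0.1"), indent=2))
```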
Starting the VM
Once the DataVolume download and clone have completed, setting running: true creates a VirtualMachineInstance, along with a virt-launcher Pod. This Pod runs libvirtd and QEMU, which create the actual VM. As explained earlier, virt-launcher also manages the VM’s network.
Let’s actually connect to the VM. There are several methods, including the console, VNC, and SSH; here we’ll use SSH. There are also several ways to reach the VM over SSH. The first is virtctl, with virtctl ssh vm-pg-1. The second is to SSH directly to the VM’s address on the host network (192.168.0.162). The third is to expose port 22 as a NodePort Service and SSH in through the container network. virtctl is generally the easiest, but if you want to use specific SSH options, the second or third method is more convenient.
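For the third method, the Service simply targets port 22 of the virt-launcher Pod, which masquerade then forwards to the VM. The minimal sketch below builds such a Service. It assumes a hypothetical label app: vm-pg-1 added to the VM template’s metadata (spec.template.metadata.labels), which, as far as I know, KubeVirt propagates to the virt-launcher Pod. The node port 30022 is also hypothetical and must lie in Kubernetes’ default NodePort range (30000-32767). virtctl also has an expose subcommand that creates a Service like this for you.

```python
import json
from typing import Optional

NODE_PORT_RANGE = range(30000, 32768)  # Kubernetes' default --service-node-port-range


def ssh_nodeport_service(vm_name: str, namespace: str, selector: dict,
                         node_port: Optional[int] = None) -> dict:
    """Build a NodePort Service forwarding to port 22 of the virt-launcher Pod."""
    port = {"name": "ssh", "protocol": "TCP", "port": 22, "targetPort": 22}
    if node_port is not None:
        if node_port not in NODE_PORT_RANGE:
            raise ValueError(f"nodePort {node_port} is outside the default range")
        port["nodePort"] = node_port
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": f"{vm_name}-ssh", "namespace": namespace},
        "spec": {"type": "NodePort", "selector": selector, "ports": [port]},
    }


svc = ssh_nodeport_service("vm-pg-1", "playground", {"app": "vm-pg-1"}, node_port=30022)
print(json.dumps(svc, indent=2))
```

After applying it, ssh -p 30022 onoe@<node address> should reach the VM through the container network.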
Let’s connect to the VM and check various things:
onoe@vm-pg-1:~$ lscpu
Architecture:            x86_64
CPU op-mode(s):          32-bit, 64-bit
Address sizes:           39 bits physical, 48 bits virtual
Byte Order:              Little Endian
CPU(s):                  8
On-line CPU(s) list:     0-7
...
onoe@vm-pg-1:~$ free -h
               total        used        free      shared  buff/cache   available
Mem:           7.8Gi       520Mi       7.2Gi       1.1Mi       300Mi       7.2Gi
Swap:             0B          0B          0B
onoe@vm-pg-1:~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           794M  1.1M  793M   1% /run
/dev/vda1        61G  1.5G   60G   3% /
tmpfs           3.9G     0  3.9G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
/dev/vda16      881M   61M  758M   8% /boot
/dev/vda15      105M  6.1M   99M   6% /boot/efi
tmpfs           794M   12K  794M   1% /run/user/1000
onoe@vm-pg-1:~$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc pfifo_fast state UP group default qlen 1000
    link/ether b6:29:60:82:0b:eb brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.2/24 metric 100 brd 10.0.2.255 scope global dynamic enp1s0
       valid_lft 86301118sec preferred_lft 86301118sec
    inet6 fe80::b429:60ff:fe82:beb/64 scope link
       valid_lft forever preferred_lft forever
3: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 62:d0:72:71:e8:f5 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.162/24 brd 192.168.0.255 scope global enp2s0
       valid_lft forever preferred_lft forever
    inet6 2400:2650:8022:3c00:60d0:72ff:fe71:e8f5/64 scope global dynamic mngtmpaddr noprefixroute
       valid_lft 297sec preferred_lft 297sec
    inet6 fe80::60d0:72ff:fe71:e8f5/64 scope link
       valid_lft forever preferred_lft forever
Everything is configured correctly.
Impressions of Using KubeVirt
One advantage of KubeVirt is the ability to run VMs on the container network. However, when migrating an existing VM infrastructure to Kubernetes, replacing the VMs with containers may be cheaper, both to migrate and to manage afterward, than moving the VMs as-is onto KubeVirt. KubeVirt may be valuable where VMs are truly unavoidable, but if there are only a few such VMs, building a separate network for them individually might be sufficient.
Another advantage is managing VMs as Infrastructure as Code. This is also possible with tools like Terraform, but it’s convenient to manage VMs within the familiar Kubernetes framework. However, managing stateful resources in Kubernetes, which relies on declarative configuration management, can be quite challenging, and this is not specific to KubeVirt: the same applies to PVs, StatefulSets, and so on. Kubernetes converges toward the desired state through reconciliation, but it doesn’t guarantee that a given resource actually exists at any moment. With careful management this may not be a problem, but personally I suspect it could become painful.
My impressions turned out somewhat negative, but I haven’t used OpenStack or similar tools, and those who operate large-scale VM infrastructure in production environments might have different opinions.
Conclusion
Working with KubeVirt, NFS, Multus, and related technologies was a great learning experience. While it’s overkill for a home VM infrastructure, it’s fun, so I plan to keep running it.
