KubeVirt is a tool for managing VM infrastructure: it lets you manage VMs on Kubernetes in the same way as containers. I tried KubeVirt as an easy way to spin up VMs at home, so I’ll share the setup and my impressions.

The Japanese version of this article is available here.

What is KubeVirt?

When you describe VMs as manifests, KubeVirt’s Controller creates the VMs for you. The VMs exist on the same network as containers, so you can manage communication with containers and access control using Kubernetes mechanisms.

A CLI tool called virtctl (also usable as the kubectl plugin kubectl virt) is provided; it lets you start and stop VMs and connect to them via SSH, serial console, VNC, and so on.
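As a sketch of typical virtctl usage (the VM name my-vm is illustrative, and these commands assume a cluster with KubeVirt installed):

```shell
virtctl start my-vm      # start the VM
virtctl stop my-vm       # stop it
virtctl console my-vm    # attach to the serial console
virtctl vnc my-vm        # open a VNC viewer
virtctl ssh my-vm        # SSH in via the Kubernetes API server
```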

There is also a subproject called Containerized Data Importer (CDI). It provides a DataVolume resource that abstracts PersistentVolumeClaim (PVC), enabling you to download VM images and clone DataVolumes for use when starting VMs.

For architecture details, this slide deck is very helpful, so I’ll leave the details to that resource.

Environment

There are two nodes, both with virtualization features enabled. All manifests for my home Kubernetes cluster are managed in my-k8s-cluster and deployed with ArgoCD. By following the README, you should be able to build a nearly identical environment (apart from IP addresses and domains). Links to the code referenced in this article are also included for your reference.

NFS Server

As preparation, we need to set up a StorageClass and PersistentVolume (PV) for storing VM data. This time, we’ll set up an NFS server on one of the nodes and make it available as a StorageClass using the NFS CSI driver for Kubernetes. Run the following on the node:

sudo apt install nfs-kernel-server
sudo mkdir -p /export/nfs
sudo chmod 777 /export/nfs
cat << EOF | sudo tee -a /etc/exports
/export/nfs 192.168.0.0/24(rw,no_root_squash,no_subtree_check)
EOF
sudo systemctl enable nfs-server --now
sudo exportfs -a

192.168.0.0/24 is the network where the nodes reside.
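To confirm the export is active, you can check on the node itself (showmount ships with the NFS packages):

```shell
sudo exportfs -v          # lists active exports with their options
showmount -e localhost    # queries the running NFS server for its export list
```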

Then apply the following manifest:

# https://github.com/hiroyaonoe/my-k8s-cluster/blob/30daa0da2767d6f4b490b781a1b3f119dd1ac427/argocd/manifests/storage-class/mandoloncello-nfs.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: mandoloncello-nfs
provisioner: nfs.csi.k8s.io
parameters:
  server: mandoloncello.node.internal.onoe.dev # Node address
  share: /export/nfs
  mountPermissions: "777"
reclaimPolicy: Retain
volumeBindingMode: Immediate
mountOptions:
  - nfsvers=4.2
allowVolumeExpansion: true

Since Dynamic Volume Provisioning is enabled, there’s no need to prepare PVs manually: when a PVC is waiting to be bound, the CSI driver creates a matching PV and binds it.
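For example, applying a PVC like the following (the name and size are illustrative) is enough; the driver provisions a directory under /export/nfs and binds a PV to it automatically:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: mandoloncello-nfs
```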

Multus

Since we want to connect VMs not only to the container network but also to the host network, we’ll set up Multus, a Meta CNI Plugin. With Multus, you can attach multiple NICs to a container. KubeVirt natively supports Multus, allowing you to attach multiple NICs to VMs as well.

First, create a bridge called br0 on all nodes (reference: Creating a bridge-connected VM with KVM). Then apply Multus following the official instructions, and also apply a NetworkAttachmentDefinition for br0.
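One way to create br0 itself is with netplan. A minimal sketch, assuming the node’s physical NIC is named eno1 and uses a static address (both are example values; adjust to your environment, then run sudo netplan apply):

```yaml
# /etc/netplan/99-br0.yaml (example values)
network:
  version: 2
  ethernets:
    eno1:
      dhcp4: false
  bridges:
    br0:
      interfaces: [eno1]
      addresses: [192.168.0.11/24]
      routes:
        - to: default
          via: 192.168.0.1
      nameservers:
        addresses: [192.168.0.1]
```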

# https://github.com/hiroyaonoe/my-k8s-cluster/blob/30daa0da2767d6f4b490b781a1b3f119dd1ac427/argocd/manifests/multus/bridge.yaml
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: underlay-bridge
  namespace: kube-public
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "name": "underlay-bridge",
      "type": "bridge",
      "bridge": "br0",
      "ipam": {
          "type": "host-local",
          "subnet": "192.168.0.0/24"
      }
    }    

As an example, let’s add the following annotation to an arbitrary nginx Pod and apply it:

# https://github.com/hiroyaonoe/my-k8s-cluster/blob/30daa0da2767d6f4b490b781a1b3f119dd1ac427/argocd/manifests/example/nginx.yaml#L16-L25
      annotations:
        k8s.v1.cni.cncf.io/networks: |
          [
            {
              "name": "underlay-bridge",
              "namespace": "kube-public",
              "interface": "eth1",
              "ips": [ "192.168.0.171" ]
            }
          ]          

When you enter the nginx Pod and check the IP addresses, you’ll see the following:

$ kubectl exec -it nginx-55bb7d4dbd-n4blx -- /bin/bash
root@nginx-55bb7d4dbd-n4blx:/# apt update && apt install iproute2
...
root@nginx-55bb7d4dbd-n4blx:/# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0@if34: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
    link/ether fe:ee:b0:c3:79:f5 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.10.141.152/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::fcee:b0ff:fec3:79f5/64 scope link
       valid_lft forever preferred_lft forever
3: eth1@if35: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether aa:3a:8d:5c:bf:c1 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.0.171/24 brd 192.168.0.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 2400:2650:8022:3c00:a83a:8dff:fe5c:bfc1/64 scope global dynamic mngtmpaddr
       valid_lft 293sec preferred_lft 293sec
    inet6 fe80::a83a:8dff:fe5c:bfc1/64 scope link
       valid_lft forever preferred_lft forever

eth0@if34 (10.10.141.152) is from the regular container network. In addition, there’s eth1@if35 (192.168.0.171), which is the host network NIC. The same can be done for VMs.
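Because eth1 sits directly on 192.168.0.0/24, this nginx Pod is now reachable from other hosts on the LAN without going through a Service. For example, from another machine on the same network:

```shell
curl http://192.168.0.171    # served by nginx over the underlay NIC
```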

Installing KubeVirt and CDI

Apply KubeVirt following the official instructions. The configuration manifest (kubevirt.io/v1.KubeVirt) has been partially modified as follows:

# https://github.com/hiroyaonoe/my-k8s-cluster/blob/30daa0da2767d6f4b490b781a1b3f119dd1ac427/argocd/manifests/kubevirt/kubevirt-cr.yaml
apiVersion: kubevirt.io/v1
kind: KubeVirt
metadata:
  name: kubevirt
  namespace: kubevirt
spec:
  configuration:
    network:
      permitBridgeInterfaceOnPodNetwork: false
    developerConfiguration:
      featureGates:
      - ExpandDisks
  imagePullPolicy: IfNotPresent

The changes from the defaults are permitBridgeInterfaceOnPodNetwork: false and ExpandDisks. Both will be explained later.

CDI is also applied following the official instructions. The configuration manifest (cdi.kubevirt.io/v1beta1.CDI) is as follows:

# https://github.com/hiroyaonoe/my-k8s-cluster/blob/30daa0da2767d6f4b490b781a1b3f119dd1ac427/argocd/manifests/kubevirt/cdi-cr.yaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: CDI
metadata:
  name: cdi
spec:
  config:
    podResourceRequirements:
      limits:
        cpu: '1'
        memory: 5Gi
  imagePullPolicy: IfNotPresent
  infra:
    nodeSelector:
      kubernetes.io/os: linux
    tolerations:
    - key: CriticalAddonsOnly
      operator: Exists
  workload:
    nodeSelector:
      kubernetes.io/os: linux

The change from the defaults is config.podResourceRequirements. The default limits were too small, causing OOMKills during VM image downloads, so they were increased.

Downloading the VM Image

Before creating a VM, apply a DataVolume to download the VM image:

# https://github.com/hiroyaonoe/my-k8s-cluster/blob/30daa0da2767d6f4b490b781a1b3f119dd1ac427/argocd/manifests/playground/vm-image.yaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: ubuntu-image-2404
spec:
  storage:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 5Gi
    storageClassName: mandoloncello-nfs
  source:
    http:
      url: https://cloud-images.ubuntu.com/noble/current/noble-server-cloudimg-amd64.img

We’re using Ubuntu 24.04 (noble) this time. I initially tried using an ISO file, but booting didn’t work properly, so I’m using a cloud image (img file) instead.

By the way, KubeVirt also provides VM images as Container Disks (container images that wrap a disk image). We won’t use them this time, but you can create a VM from a Container Disk as follows:

# https://github.com/hiroyaonoe/my-k8s-cluster/blob/30daa0da2767d6f4b490b781a1b3f119dd1ac427/argocd/manifests/example/vm.yaml#L44
      volumes:
        - name: containerdisk
          containerDisk:
            image: quay.io/containerdisks/ubuntu:22.04
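For reference, a minimal VirtualMachine built entirely on such a Container Disk might look like the following sketch (the name vm-example and the sizes are illustrative, not part of this article’s setup). Note that a containerDisk is ephemeral: anything written to it is lost when the VM restarts.

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: vm-example
spec:
  running: true
  template:
    spec:
      domain:
        devices:
          disks:
            - name: containerdisk
              disk:
                bus: virtio
        memory:
          guest: 2Gi
      volumes:
        - name: containerdisk
          containerDisk:
            image: quay.io/containerdisks/ubuntu:22.04
```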

Creating the VM

Now let’s apply the manifest that defines the VM. The full manifest is here, but since it’s long, I’ll explain it section by section.

DataVolume

We clone the VM-image DataVolume created earlier and use the clone as the VM’s disk. While you could define a separate DataVolume, a DataVolume can also be declared as a template within the VM manifest (dataVolumeTemplates). The clone is described as follows:

# https://github.com/hiroyaonoe/my-k8s-cluster/blob/30daa0da2767d6f4b490b781a1b3f119dd1ac427/argocd/manifests/playground/vm-pg-1.yaml#L9-L23
  dataVolumeTemplates:
    - metadata:
        name: vm-pg-1
      spec:
        storage:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 64Gi
          storageClassName: mandoloncello-nfs
        source:
          pvc:
            name: ubuntu-image-2404
            namespace: playground

This is where the ExpandDisks setting comes into play. The source DataVolume for the VM image is 5 GiB, while this DataVolume requests 64 GiB. Without ExpandDisks, the VM would see only the original 5 GiB even though the PVC is 64 GiB; with the feature gate enabled, the VM sees the full 64 GiB.

Resource

Configure the CPU and memory for the VM. The values specified here are what the guest actually sees.

# https://github.com/hiroyaonoe/my-k8s-cluster/blob/30daa0da2767d6f4b490b781a1b3f119dd1ac427/argocd/manifests/playground/vm-pg-1.yaml#L30-L34
      domain:
        cpu:
          cores: 8
        memory:
          guest: 8Gi

You can also set requests and limits in a separate section, just like regular Pods. These are used for VM scheduling and don’t represent the actual resources visible to the VM. The values of domain.cpu and domain.memory must be between the requests and limits. If domain.cpu and domain.memory are not set, the resources.requests values will be visible to the VM instead.

# https://github.com/hiroyaonoe/my-k8s-cluster/blob/30daa0da2767d6f4b490b781a1b3f119dd1ac427/argocd/manifests/playground/vm-pg-1.yaml#L55-L61
        resources:
          requests:
            cpu: 500m
            memory: 512Mi
          limits:
            cpu: '8'
            memory: 8Gi

Volume

Configure the disks as follows:

# https://github.com/hiroyaonoe/my-k8s-cluster/blob/30daa0da2767d6f4b490b781a1b3f119dd1ac427/argocd/manifests/playground/vm-pg-1.yaml#L36-L45
          disks:
          - disk:
              bus: virtio
            name: disk0
            bootOrder: 1
          - cdrom:
              bus: sata
              readonly: true
            name: cloudinitdisk
            bootOrder: 2

The actual backing storage is configured as follows:

# https://github.com/hiroyaonoe/my-k8s-cluster/blob/30daa0da2767d6f4b490b781a1b3f119dd1ac427/argocd/manifests/playground/vm-pg-1.yaml#L68-L72
      volumes:
      - name: disk0
        persistentVolumeClaim:
          claimName: vm-pg-1
      - cloudInitNoCloud:

The first disk, disk0, is the PVC from earlier. The second, cloudinitdisk, is used for CloudInit. It is configured as follows:

# https://github.com/hiroyaonoe/my-k8s-cluster/blob/30daa0da2767d6f4b490b781a1b3f119dd1ac427/argocd/manifests/playground/vm-pg-1.yaml#L74-L86
          userData: |
            #cloud-config
            hostname: vm-pg-1
            users:
            - name: onoe
              ssh_import_id: gh:hiroyaonoe
              lock_passwd: false
              passwd: $6$salt$IxDD3jeSOb5eB1CX5LBsqZFVkJdido3OUILO5Ifz5iwMuTS4XMS130MTSuDDl3aCI6WouIL9AjRbLCelDCy.g.
              shell: /bin/bash
              sudo: ALL=(ALL) NOPASSWD:ALL
              uid: 1000
            ssh_pwauth: true
            disable_root: false            
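The passwd field holds a SHA-512 crypt hash, not a plaintext password. One way to generate such a hash is with openssl; here a fixed salt (salt) and the password password reproduce the example hash used above, but in practice you would omit -salt and let openssl pick a random one:

```shell
# Generate a SHA-512 crypt hash for cloud-init's passwd field.
# The fixed salt is only for reproducibility of this example.
openssl passwd -6 -salt salt password
# matches the example hash above (password: "password", salt: "salt")
```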

Network

As explained earlier, in addition to the regular container network, we connect to the host network using Multus. In addition to the manifest configuration, we also use CloudInit for setup.

# https://github.com/hiroyaonoe/my-k8s-cluster/blob/30daa0da2767d6f4b490b781a1b3f119dd1ac427/argocd/manifests/playground/vm-pg-1.yaml#L46-L52
          interfaces:
          - name: default
            masquerade: {}
            bootOrder: 3
          - name: underlay
            bridge: {}
            bootOrder: 4
# https://github.com/hiroyaonoe/my-k8s-cluster/blob/30daa0da2767d6f4b490b781a1b3f119dd1ac427/argocd/manifests/playground/vm-pg-1.yaml#L62-L67
      networks:
      - name: default
        pod: {}
      - name: underlay
        multus:
          networkName: kube-public/underlay-bridge
# https://github.com/hiroyaonoe/my-k8s-cluster/blob/30daa0da2767d6f4b490b781a1b3f119dd1ac427/argocd/manifests/playground/vm-pg-1.yaml#L87-L95
          networkData: |
            version: 2
            ethernets:
              enp1s0:
                dhcp4: true
              enp2s0:
                dhcp4: false
                addresses: [192.168.0.162/24]
                gateway4: 192.168.0.1            

The first interface, default (enp1s0), is the container network. The VM reaches it through the virt-launcher Pod that is created when the VM starts. In masquerade mode, the VM sits on a private network shared with virt-launcher (separate from the container network, 10.0.2.0/24 by default) and receives its IP address from virt-launcher via DHCP; virt-launcher then NATs traffic between the VM and the container network. Without masquerade mode, some CNI plugins cannot communicate properly, so permitBridgeInterfaceOnPodNetwork: false is set to enforce masquerade mode on the Pod network.

The second interface, underlay (enp2s0), is the host network. It is associated with the NetworkAttachmentDefinition we created earlier (namespace: kube-public, name: underlay-bridge). The address is statically assigned using CloudInit.

Starting the VM

Once the DataVolume download and clone are complete, setting running: true creates a VirtualMachineInstance, along with a virt-launcher Pod. This Pod uses libvirt and QEMU to run the actual VM and, as explained earlier, also manages the VM’s network.

Let’s actually connect to the VM. Several methods are available, including console, VNC, and SSH; here we’ll use SSH, which itself can be done in a few ways. The first is virtctl ssh vm-pg-1. The second is SSHing over the host network. The third is exposing port 22 as a NodePort Service and SSHing through the container network. virtctl is generally the easiest, but if you want to use specific SSH options, the second or third method is better.
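As a sketch of the third method, virtctl can generate the NodePort Service for you (the Service name is illustrative, and the node port is allocated by Kubernetes):

```shell
# Expose the VM's SSH port as a NodePort Service
virtctl expose vm vm-pg-1 --name vm-pg-1-ssh --port 22 --type NodePort
# Look up the allocated node port, then SSH to any node's address
kubectl get service vm-pg-1-ssh
ssh onoe@mandoloncello.node.internal.onoe.dev -p <nodePort>
```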

Let’s connect to the VM and check various things:

onoe@vm-pg-1:~$ lscpu
Architecture:             x86_64
  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          39 bits physical, 48 bits virtual
  Byte Order:             Little Endian
CPU(s):                   8
  On-line CPU(s) list:    0-7
...

onoe@vm-pg-1:~$ free -h
               total        used        free      shared  buff/cache   available
Mem:           7.8Gi       520Mi       7.2Gi       1.1Mi       300Mi       7.2Gi
Swap:             0B          0B          0B

onoe@vm-pg-1:~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           794M  1.1M  793M   1% /run
/dev/vda1        61G  1.5G   60G   3% /
tmpfs           3.9G     0  3.9G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
/dev/vda16      881M   61M  758M   8% /boot
/dev/vda15      105M  6.1M   99M   6% /boot/efi
tmpfs           794M   12K  794M   1% /run/user/1000

onoe@vm-pg-1:~$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc pfifo_fast state UP group default qlen 1000
    link/ether b6:29:60:82:0b:eb brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.2/24 metric 100 brd 10.0.2.255 scope global dynamic enp1s0
       valid_lft 86301118sec preferred_lft 86301118sec
    inet6 fe80::b429:60ff:fe82:beb/64 scope link
       valid_lft forever preferred_lft forever
3: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 62:d0:72:71:e8:f5 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.162/24 brd 192.168.0.255 scope global enp2s0
       valid_lft forever preferred_lft forever
    inet6 2400:2650:8022:3c00:60d0:72ff:fe71:e8f5/64 scope global dynamic mngtmpaddr noprefixroute
       valid_lft 297sec preferred_lft 297sec
    inet6 fe80::60d0:72ff:fe71:e8f5/64 scope link
       valid_lft forever preferred_lft forever

Everything is configured correctly.

Impressions of Using KubeVirt

One advantage of KubeVirt is the ability to run VMs on the container network. When migrating an existing VM infrastructure to Kubernetes, however, it might be easier, in terms of both migration cost and future management cost, to replace the VMs with containers outright rather than move them as-is onto KubeVirt. KubeVirt may be valuable for use cases where VMs are absolutely necessary, but if the number of such VMs is small, building individual networks for them might be sufficient.

Another advantage is managing VMs as Infrastructure as Code. While this is also achievable with tools like Terraform, it’s convenient to manage VMs within the familiar Kubernetes framework. That said, this isn’t specific to KubeVirt (the same applies to PVs, StatefulSets, and so on): managing stateful resources under Kubernetes’s declarative configuration model can be quite challenging. Kubernetes converges toward the desired state through reconciliation, but it doesn’t guarantee that the resource actually exists. With careful management this might not be an issue, but personally it feels like it could become painful.

My impressions turned out somewhat negative, but I haven’t used OpenStack or similar tools, and those who operate large-scale VM infrastructure in production environments might have different opinions.

Conclusion

Working with KubeVirt, NFS, Multus, and related technologies was a great learning experience. While it’s overkill for a home VM infrastructure, it’s fun, so I plan to keep running it.