Revision 19 of 21 (Peter Amstutz, 05/08/2025 02:54 AM)

h1. Setting up systemd-nspawn VMs 

This page describes how to use systemd-nspawn to create VMs for development and testing. It is a guide, *not* step-by-step instructions. *If you just copy+paste commands without actually reading the instructions, you will BREAK YOUR OWN NETWORKING and I will not be held responsible.*

 {{toc}} 

 h2. One-time supervisor host setup 

 h3. Install systemd-nspawn and image build tools 

 <pre>sudo apt install systemd-container debootstrap 
 </pre> 

@systemd-container@ packages systemd-nspawn and friends (including @machinectl@). @debootstrap@ is used to build VM images.

 "Install Ansible":https://dev.arvados.org/projects/arvados/wiki/Hacking_prerequisites#Install-Ansible the same way we do for development. I'm fobbing you off to that page so you know what version of Ansible we're standardized on. 

 h3. Enable systemd network services 

Unsurprisingly, systemd-nspawn integrates well with other systemd components. The easiest way to get your VMs networked is to install systemd's network services:

 <pre>sudo systemctl enable --now systemd-networkd systemd-resolved 
 </pre> 

Note that systemd-networkd only manages interfaces that match one of its @.network@ files. On Debian, the default configuration should play nicely with NetworkManager, and systemd-resolved and NetworkManager also cooperate.

 If you refuse to do this, refer to the "Networking Options of systemd-nspawn":https://www.freedesktop.org/software/systemd/man/latest/systemd-nspawn.html#Networking%20Options to evaluate alternatives. 

 h3. NAT and firewall 

 systemd-networkd runs a DHCP server that provides private addresses to the virtual machines. You will need to configure your firewall to allow these DHCP requests, and to NAT traffic from those interfaces. These steps are specific to the host firewall; if yours isn't documented below, feel free to add it. 

 h4. ufw 

 For NAT, make sure these lines in @/etc/ufw/sysctl.conf@ are all set to @1@: 

 <pre>net/ipv4/ip_forward=1 
 net/ipv6/conf/default/forwarding=1 
 net/ipv6/conf/all/forwarding=1 
 </pre> 
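If you would rather not edit the file by hand, here is a minimal sketch that sets the three keys to @1@, uncommenting them if they ship commented out and appending any that are missing. It assumes GNU @sed@ (for @sed -i@); the function name @enable_ufw_forwarding@ is hypothetical and not part of ufw.

<pre><code class="sh">
# Hypothetical helper (not part of ufw): make sure the IPv4/IPv6 forwarding
# keys in a ufw sysctl.conf are set to 1, uncommenting them if necessary.
# Assumes GNU sed for in-place editing (sed -i).
enable_ufw_forwarding() {
    conf="${1:-/etc/ufw/sysctl.conf}"
    for key in net/ipv4/ip_forward \
               net/ipv6/conf/default/forwarding \
               net/ipv6/conf/all/forwarding; do
        # Matches "key=0", "#key=1", "# key = 1", etc.
        pattern="^[[:space:]]*#?[[:space:]]*${key}[[:space:]]*="
        if grep -Eq "$pattern" "$conf"; then
            # Rewrite every matching line (commented or not) as key=1.
            sed -i -E "s|${pattern}.*|${key}=1|" "$conf"
        else
            # Key absent entirely: append it.
            printf '%s=1\n' "$key" >> "$conf"
        fi
    done
}
</code></pre>

Run it from a root shell, for example @sudo sh -c '. ./ufw-forwarding.sh && enable_ufw_forwarding'@ (the file name is just wherever you saved the function), then restart ufw.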

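If you changed any of these, restart ufw. Then add the rules below. The trailing @+@ matches any interface name with that prefix: systemd-nspawn names the host side of a container's veth link @ve-MACHINE@ (or @vb-MACHINE@ when it is attached to a bridge), and names zone bridges @vz-ZONE@: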

 <pre><code class="sh">for iface in vb-+ ve-+ vz-+; do 
   sudo ufw rule    allow in on "$iface" proto udp to 0.0.0.0/0 port 67,68 comment "systemd-nspawn DHCP" 
   sudo ufw route allow in on "$iface" 
 done 
 </code></pre> 

 h3. Filesystem 

systemd-nspawn stores both images and containers under @/var/lib/machines@. It works with any filesystem, but on btrfs it can use subvolumes and copy-on-write snapshots, so operations like @machinectl clone@ are fast and take little extra space. "Here's a blog post outlining some of the gains":https://idle.nprescott.com/2022/systemd-nspawn-and-btrfs.html.

I recommend that every deployment, and especially every production deployment, have a btrfs filesystem mounted at @/var/lib/machines@. Since this directory is likely to grow large, a dedicated partition is a good idea too.
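To check what you have now, here is a small sketch assuming GNU coreutils @stat@, whose file-system mode (@-f@) prints the filesystem type with @%T@. The function name @is_btrfs@ is hypothetical.

<pre><code class="sh">
# Hypothetical helper: succeed only if PATH (default /var/lib/machines)
# lives on a btrfs filesystem. Uses GNU stat's file-system mode (-f),
# where %T prints the filesystem type in human-readable form.
is_btrfs() {
    target="${1:-/var/lib/machines}"
    fstype=$(stat -f -c %T "$target" 2>/dev/null) || return 1
    [ "$fstype" = "btrfs" ]
}
</code></pre>

For example, @is_btrfs || echo "/var/lib/machines is not on btrfs"@. A nonexistent path counts as "not btrfs".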

 h3. Resolving VM names 

You can configure your host system to resolve the names of running VMs so you can easily SSH into them, open them in your browser, write them in Ansible inventories, etc. Edit @/etc/nsswitch.conf@, find the @hosts@ line, and make sure that @mymachines@ appears before any @dns@ or @resolve@ entries. See "nss-mymachines(8)":https://www.freedesktop.org/software/systemd/man/latest/nss-mymachines.html.
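To check the ordering without eyeballing it, here is a small @awk@-based sketch; @check_mymachines_order@ is a hypothetical name. It only looks at the first @hosts:@ line and treats a missing @mymachines@ entry as a failure. On Debian, the module itself comes from the @libnss-mymachines@ package.

<pre><code class="sh">
# Hypothetical helper: succeed if the "hosts:" line of an nsswitch.conf lists
# "mymachines" before any "dns" or "resolve" source.
check_mymachines_order() {
    nsswitch="${1:-/etc/nsswitch.conf}"
    awk '
        $1 == "hosts:" {
            found = 1
            for (i = 2; i <= NF; i++) {
                if ($i == "mymachines") { ok = 1; exit }
                if ($i == "dns" || $i == "resolve") { exit }
            }
            exit    # only the first hosts: line counts
        }
        END { exit !(found && ok) }
    ' "$nsswitch"
}
</code></pre>

For example, @check_mymachines_order && echo "mymachines is in the right place"@.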

 h3. Alternative configuration: virtual bridge with your local network 

You can create a "virtual bridge" that acts as an Ethernet switch for your containers and virtual machines. Your containers then get virtual Ethernet devices with their own MAC addresses, which lets them request their own IP addresses from your home/office router (the router doesn't need any configuration). The nice thing about this is that it avoids the painful complexity of IP masquerading and NAT, and makes it much easier for other devices on your local network to reach services running in the container. The drawbacks are that your container is more exposed (that's the point) and that it may be harder to control how IP addresses are assigned than on a completely dedicated private network.
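Neither libvirt's bridge-mode network nor the @Bridge=@ option in a @.nspawn@ file creates the bridge device; both attach to an existing host bridge named @br0@ that carries your physical uplink. If systemd-networkd (not NetworkManager) manages your wired interface, one way to create it is with three networkd unit files. This is a sketch, not a tested recipe: the helper name @write_bridge_units@ and the uplink name @enp1s0@ in the usage example are hypothetical, and moving a live interface onto a bridge can cut off your host's network connection, which is exactly what the warning at the top of this page is about.

<pre><code class="sh">
# Hypothetical helper: write systemd-networkd units that create bridge br0,
# enslave the physical uplink to it, and run DHCP on the bridge.
# Assumes systemd-networkd (not NetworkManager) manages UPLINK.
# The "10-" prefix makes these sort before most distro-provided .network files.
write_bridge_units() {
    uplink="$1"
    dir="${2:-/etc/systemd/network}"
    if [ -z "$uplink" ]; then
        echo "usage: write_bridge_units UPLINK_IFACE [DIR]" >&2
        return 2
    fi
    mkdir -p "$dir" || return 1

    # The bridge device itself.
    cat > "$dir/10-br0.netdev" <<EOF
[NetDev]
Name=br0
Kind=bridge
EOF

    # The bridge gets its address from your LAN's DHCP server.
    cat > "$dir/10-br0.network" <<EOF
[Match]
Name=br0

[Network]
DHCP=yes
EOF

    # The physical NIC stops getting its own address and joins br0.
    cat > "$dir/10-uplink-br0.network" <<EOF
[Match]
Name=$uplink

[Network]
Bridge=br0
EOF
}
</code></pre>

For example, as root: @write_bridge_units enp1s0@, then @networkctl reload@. If NetworkManager manages the interface, create the bridge with NetworkManager instead.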

Create a file called @br0.xml@ with the following contents:

 <pre> 
 <network> 
   <name>br0</name> 
   <forward mode='bridge'/> 
   <bridge name='br0'/> 
 </network> 
 </pre> 

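Then use @virsh@ to define the bridge network. (This only matters for libvirt VMs; systemd-nspawn containers attach to the host bridge directly via @Bridge=br0@.)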

<pre>
sudo virsh net-define br0.xml --validate
sudo virsh net-start br0
sudo virsh net-autostart br0
sudo systemctl restart libvirtd.service
</pre>

For DNS, I recommend using @mDNS@ (generally implemented with the @avahi@ daemon) and having it publish each container's hostname on the local network. Install @avahi-daemon@ inside the container or VM, then edit its @/etc/avahi/avahi-daemon.conf@:

 <pre> 
 [publish] 
 publish-workstation=yes 
 </pre> 

Restart @avahi-daemon@ after the edit. Then all machines on your local network with @mDNS@ clients will see the container or VM as @HOSTNAME.local@.

 h2. Build a systemd-nspawn container image 

 The Arvados source includes an Ansible playbook to create an image from scratch with @debootstrap@. Write this variables file as @nspawn-image.yml@ and edit the values as you like: 

 <pre><code class="yaml"> 
 ### Stuff you probably want to customize ### 
 # The name of the user account to create in the VM.    The default value is "admin". 
 #image_username: "admin" 

 # A hash of the user's password. The default is no password. 
 # You need to do this or you won't be able to use 'sudo'. 
 # 
 # ansible all -i localhost, -m debug -a "msg={{ 'mypassword' | password_hash('sha512', 'mysecretsalt') }}" 
 # 
 # See also <https://docs.ansible.com/ansible/latest/reference_appendices/faq.html#how-do-i-generate-encrypted-passwords-for-the-user-module> 
 image_passhash: "!" 

# SSH public key string or URL which will be provisioned as an authorized key for
# the user account above. You probably want this.
#image_authorized_keys: "FIXME"

 ### Stuff you may want to customize. ### 
 # The codename of the release to install. 
 debootstrap_suite: bookworm 

 # The name of the image that will show up in "machinectl list-images" as well as 
 # "machinectl start" and "machinectl stop" 
 # Default name is the distribution version being set up (e.g. debian-bookworm), but  
 # you can also call this whatever you want, like "my-arvados-test" 
 image_name: "debian-{{ debootstrap_suite }}" 

 # The mirror to install the release from. 
 # The commented-out setting below is appropriate for Ubuntu. 
 debootstrap_mirror: "http://deb.debian.org/debian" 
 #debootstrap_mirror: "http://archive.ubuntu.com/ubuntu" 

 ### Additional user account customization ### 
 # Other settings for the created user. 
 #image_gecos: "" 
 #image_shell: /usr/bin/bash 
 </code></pre> 
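If you don't want to reach for Ansible just to hash a password, @openssl passwd -6@ (OpenSSL 1.1.1 or later) produces a SHA-512 crypt hash in the @$6$salt$hash@ format, which should work as @image_passhash@. A small sketch that prints a ready-to-paste line; the helper name @passhash_line@ is hypothetical.

<pre><code class="sh">
# Hypothetical helper: print an image_passhash line for nspawn-image.yml.
# Requires OpenSSL 1.1.1+ for "passwd -6" (SHA-512 crypt, the $6$ format).
# Reads the password from stdin so it doesn't land in your shell history.
passhash_line() {
    salt="$1"
    if [ -n "$salt" ]; then
        hash=$(openssl passwd -6 -salt "$salt" -stdin) || return 1
    else
        # No salt given: let openssl pick a random one.
        hash=$(openssl passwd -6 -stdin) || return 1
    fi
    printf 'image_passhash: "%s"\n' "$hash"
}
</code></pre>

Pipe the password in (or type it and press Ctrl-D; the terminal will echo it), then paste the printed line into @nspawn-image.yml@ in place of the @"!"@ default.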

 With your Ansible virtualenv activated, run: 

 <pre><code class="sh">ansible-playbook --ask-become-pass --extra-vars @nspawn-image.yml arvados/tools/ansible/build-debian-nspawn-vm.yml 
 </code></pre> 

If this succeeds, you have @/var/lib/machines/IMAGE_NAME@ (the @image_name@ you set, @debian-bookworm@ by default) with a base install and configuration.

 h3. Consider Cloning 

This is a good time to mention that you should think of these machine subdirectories more like VM disks than Docker images. If you simply boot your new VM and start making changes to it, those changes will be permanent. If you want an ephemeral VM, you need to explicitly ask for that (for example, with systemd-nspawn's @--ephemeral@ option). Personally, I never boot this bootstrapped VM directly. Instead, I run @machinectl clone BASE_NAME MACHINE@, then treat @BASE_NAME@ like an "image" that I never touch, and @MACHINE@ more like a traditional stateful VM.
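That clone-then-boot habit fits in a tiny shell function. In this sketch, @new_vm@ is a hypothetical name, and setting @RUN=echo@ prints the commands instead of running them.

<pre><code class="sh">
# Hypothetical helper: clone an untouched base image into a new, stateful
# machine and boot it. Set RUN=echo to print the commands instead.
new_vm() {
    base="$1"
    machine="$2"
    if [ -z "$base" ] || [ -z "$machine" ]; then
        echo "usage: new_vm BASE_NAME MACHINE" >&2
        return 2
    fi
    # On btrfs the clone is a cheap snapshot; elsewhere it falls back to copying.
    ${RUN:-sudo} machinectl clone "$base" "$machine" &&
        ${RUN:-sudo} machinectl start "$machine"
}
</code></pre>

For example, @new_vm debian-bookworm arvados-dev@ leaves @debian-bookworm@ untouched and boots @arvados-dev@.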

 h2. Configure the VM 

VMs are configured using the file at @/etc/systemd/nspawn/MACHINE.nspawn@. The defaults are pretty good, so you don't have to write much. The main things to do are tell it how to resolve DNS and decide on any other networking:

 <pre><code class="ini">[Exec] 
 ResolvConf=bind-uplink 

 [Network] 
 # If you want multiple VMs to be able to talk to each other, 
 # put them all in the same zone: 
 #Zone=YOURZONE 

 # If you set up a virtual bridge 
 #Bridge=br0 

 [Files] 
 # If you want to make things on the host available in the VM, 
 # do that here: 
 Bind=/dev/fuse 
 #BindReadOnly=/home/YOU/SUBDIR 
 </code></pre> 

 Refer to "systemd.nspawn":https://www.freedesktop.org/software/systemd/man/latest/systemd.nspawn.html for all the options. 
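If several VMs should share a zone, writing each file by hand gets repetitive. Here is a sketch that writes a @.nspawn@ file with the settings above; @write_nspawn_config@ is hypothetical, and it takes an optional target directory so you can inspect the output before writing to @/etc/systemd/nspawn@. Keep zone names short: the zone bridge is named @vz-ZONE@, and Linux limits interface names to 15 characters.

<pre><code class="sh">
# Hypothetical helper: write DIR/MACHINE.nspawn with the settings discussed
# above. ZONE is optional; machines sharing a zone can reach each other.
write_nspawn_config() {
    machine="$1"
    zone="$2"
    dir="${3:-/etc/systemd/nspawn}"
    if [ -z "$machine" ]; then
        echo "usage: write_nspawn_config MACHINE [ZONE] [DIR]" >&2
        return 2
    fi
    mkdir -p "$dir" || return 1
    {
        printf '[Exec]\nResolvConf=bind-uplink\n'
        if [ -n "$zone" ]; then
            printf '\n[Network]\nZone=%s\n' "$zone"
        fi
        printf '\n[Files]\nBind=/dev/fuse\n'
    } > "$dir/$machine.nspawn"
}
</code></pre>

For example, as root, @write_nspawn_config dev1 arvados@ and @write_nspawn_config dev2 arvados@ put both machines in the @arvados@ zone. The file is read when the machine starts, so restart a running machine to pick up changes.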

 h2. Privilege a Container 

 If you want to run FUSE, Docker, or Singularity inside your VM, that requires additional privileges. We have an Ansible playbook to automate that too. To grant privileges for all these services, with your Ansible virtualenv activated, run (@-e@ is the short version of @--extra-vars@): 

 <pre><code class="sh">ansible-playbook -e container_name=MACHINE arvados/tools/ansible/privilege-nspawn-vm.yml 
 </code></pre> 

 You can exclude some privileges by setting @SERVICE_privileges=absent@. For example, if you don't intend to run Singularity in this VM: 

 <pre><code class="sh">ansible-playbook -e "container_name=MACHINE singularity_privileges=absent" arvados/tools/ansible/privilege-nspawn-vm.yml 
 </code></pre> 

 See the comments at the top of source:tools/ansible/privilege-nspawn-vm.yml for details. 
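If you privilege several VMs, a thin wrapper saves retyping the @-e@ string. In this sketch, @privilege_vm@ is hypothetical; arguments after the machine name become @SERVICE_privileges=absent@ settings, and any service name other than @singularity@ is an assumption you should check against the comments in the playbook. @RUN=echo@ prints the command instead of running it, and @PLAYBOOK@ overrides the playbook path.

<pre><code class="sh">
# Hypothetical wrapper around the privilege playbook: grant the default
# privileges to MACHINE, minus any services listed after it.
# Service names must match the playbook's SERVICE_privileges variables;
# only "singularity" is confirmed on this page, check the playbook for others.
privilege_vm() {
    machine="$1"
    if [ -z "$machine" ]; then
        echo "usage: privilege_vm MACHINE [EXCLUDED_SERVICE ...]" >&2
        return 2
    fi
    shift
    vars="container_name=$machine"
    for service in "$@"; do
        vars="$vars ${service}_privileges=absent"
    done
    ${RUN:-} ansible-playbook -e "$vars" \
        "${PLAYBOOK:-arvados/tools/ansible/privilege-nspawn-vm.yml}"
}
</code></pre>

With your Ansible virtualenv activated, @privilege_vm MACHINE singularity@ runs the same command as the Singularity example above.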

 h2. Interacting with VMs 

 "machinectl":https://www.freedesktop.org/software/systemd/man/latest/machinectl.html is the primary command to interact with both containers and the underlying disk images: 

 <pre><code class="sh">machinectl start MACHINE 
 machinectl stop MACHINE 
 machinectl shell YOU@MACHINE 

 machinectl clone MACHINE1 MACHINE2 
 machinectl remove MACHINE [MACHINE2 ...] 
 </code></pre> 

Refer to the man page for full details. Note that each running container is managed by the <code>systemd-nspawn@MACHINE</code> systemd service, and you can interact with that service using all the usual tools. (Try <code>journalctl -u systemd-nspawn@MACHINE</code>.)