All of the code
My infra is constantly changing and this guide might fall out of date as it evolves, but you can check out the code at the most recent commit as of the time of writing: link to the commit. I used this exact version today to set up the local environment, so it should be OK.
Why
Having run Proxmox for quite some time now, I have become fed up with it.
After running way too much pointless stuff just to make my other stuff more secure:
- like setting up the aforementioned FreeIPA as my private CA, so I could issue my own certificates for all of my services and not have to deal with Let’s Encrypt’s very low rate limits
- like running a 3-node FreeIPA cluster on AlmaLinux, because Kerberos authentication decided to stop working on the single-node one
- like configuring my domain and enrolling all of my VMs and bare-metal hosts in it to get in-transit encryption for my NFSv4 file shares
- and setting up LDAP so that all of my users (me and my service account, because why not) have the same password across all of my self-hosted services, only to realize that Proxmox Backup Server, which I relied on for backups, USES A DIFFERENT LDAP IMPLEMENTATION THAN THE PROXMOX HYPERVISOR AND REFUSES TO WORK WITH FREEIPA.
I have decided to ditch this Perl-script-glued piece of shit in favor of more sophisticated tools like Kubernetes.
After hours of research I found my Kubernetes distribution: Talos. One caveat, though: I wanted full disk encryption. Well, at least for the root filesystem. If I am ever robbed, I will have enough on my mind. I don’t want to also worry about someone having access to my data, maybe even my passwords.
Fortunately, Talos offers disk encryption. However, its options are extremely limited. It’s either:
- a static passphrase, which gets saved in plaintext on one of the unencrypted system partitions… I am not even going to comment on that
- TPM2, which requires Secure Boot, which I don’t have; besides, I don’t want to lose access to all of my data just because my motherboard dies
- KMS, which uses a proprietary protocol. Proprietary as in custom-made for Talos, even though it is FOSS. This would require me to run something like OpenBao or HashiCorp Vault, which has huge resource requirements and would create a nice chicken-and-egg problem when bootstrapping my infrastructure. I thought Talos was all about being THE edge solution. Clevis/Tang uses next to no resources…
- a random string derived from the node UUID, which is not much different from the static passphrase
None of the above works for me. They all want the Talos cluster to decrypt itself automagically without any human interaction. I get that. But not with these kinds of trade-offs. I want to input the passphrase myself. Well, I don’t, because my servers are headless. I want to use NBDE with Clevis and Tang. I don’t want to reinvent the wheel. Why does this need to be so complicated?
So it was either:
- develop a custom solution and maybe try to get it upstreamed into Talos, adding Clevis as a new way to decrypt system disks, or
- virtualize the node in Proxmox, which I hate because it will cause problems with passing the GPU through to the VM. I have also come to hate Proxmox
I am gonna work on the first option once I have nothing else to do in my free time, but for the time being I am gonna set up Debian Trixie, install Proxmox on top of it, and then set up NBDE.
Standard operation - overview
The whole decryption workflow should look like this:
- my always-on server reboots for whatever reason and needs some kind of human intervention to unlock its disk
- I open up my laptop (disk encrypted, duh) on the local network and bring up Tang
- Clevis running in the initramfs sends requests to the Tang server and hopefully decrypts the system disk
- profit
- if I ever lose access to the Tang keys, I can connect a display and input an emergency passphrase. LUKS2 offers 32 keyslots, so there is more than enough room for the Clevis binding, the emergency passphrase and anything else
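Why Clevis/Tang fits this workflow so well: Tang never learns the disk key. At bind time Clevis derives a key from Tang's advertised public key (served at GET /adv) and stores only its own public value; at boot it blinds its request before sending it to Tang (POST /rec/...), so Tang only performs one group operation with its private key. The toy sketch below shows this McCallum-Relyea exchange in a multiplicative group modulo a prime. It is an illustration only: real Tang uses elliptic curves (ECMR) wrapped in JOSE, and the prime, base and function names here are assumptions, not the clevis or tang code.

```python
import secrets

# Toy parameters (assumption): the Mersenne prime 2**127 - 1 and base 3.
# Far too weak for real use; real Tang uses ECMR on elliptic curves.
P = 2**127 - 1
G = 3


def rand_exp() -> int:
    """Random exponent in [1, P - 2]."""
    return secrets.randbelow(P - 2) + 1


# Tang server: long-term key pair. Only the public value S is advertised.
s = rand_exp()
S = pow(G, s, P)


def clevis_bind(server_pub: int) -> tuple[int, int]:
    """Bind time: derive key K from S and keep only the client public value C."""
    c = rand_exp()
    C = pow(G, c, P)
    K = pow(server_pub, c, P)  # K = g^(s*c); used to encrypt the LUKS passphrase
    return C, K  # C goes into the LUKS2 header; c and K are then discarded


def tang_recover(x: int) -> int:
    """Tang side: raise whatever arrives to the private exponent s."""
    return pow(x, s, P)


def clevis_unlock(server_pub: int, client_pub: int) -> int:
    """Boot time: blind C with an ephemeral key so Tang never sees C or K."""
    e = rand_exp()
    E = pow(G, e, P)
    X = (client_pub * E) % P        # X = C * E, the only value sent to Tang
    Y = tang_recover(X)             # Y = X^s = K * S^e
    Z = pow(server_pub, e, P)       # Z = S^e, computed locally
    return (Y * pow(Z, -1, P)) % P  # K = Y / Z


if __name__ == "__main__":
    client_pub, key = clevis_bind(S)
    print(clevis_unlock(S, client_pub) == key)  # True
```

Because each unlock uses a fresh ephemeral exponent, an observer on the network (or Tang itself) only ever sees a random-looking X, never the value that protects the disk.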
How to set this thing up - overview
- create a Tang server on my laptop
- create a preseeded Debian .iso so that I only need to press Enter during installation. I hate installing stuff manually and would rather automate it, hence the preseed
- install Debian: a one-click install, maybe a few more clicks if there are multiple disks, but more on that later
- install Proxmox on top of it
- configure the host with Ansible
- bind Clevis to the Tang server
- add emergency passphrase
- delete temporary bootstrap passphrase
- profit
How to set this thing up - the long version
Debian preseed
I found a shell script online but lost the link. It takes a Debian .iso, appends the preseed.cfg file to the installer's initrd, and then builds a custom .iso from the result. Neat.
#!/bin/sh
set -e
ISO="$1"
NEW_ISO="$2"
WD="$(mktemp -d)"
OLD_PWD="$(pwd)"
echo "$ISO"
echo "$WD"
echo "$OLD_PWD"
# Extract the original ISO into a temporary working directory
7z x -o"$WD" "$ISO"
cd "$WD"
# Files extracted from an ISO may be read-only
chmod -R +w install.amd
# Append preseed.cfg to the installer's initrd
gunzip install.amd/initrd.gz
cp "$OLD_PWD/preseed.cfg" .
echo preseed.cfg | cpio -o -H newc -A -F install.amd/initrd
rm preseed.cfg
gzip install.amd/initrd
# Regenerate checksums, skipping md5sum.txt itself
chmod +w md5sum.txt
find -follow -type f ! -name md5sum.txt -print0 | xargs --null md5sum >md5sum.txt
# Extract MBR template file to disk
dd if="$OLD_PWD/$ISO" bs=1 count=432 of="isohdpfx.bin"
xorriso -as mkisofs -o "$NEW_ISO" -isohybrid-mbr "isohdpfx.bin" \
  -c isolinux/boot.cat -b isolinux/isolinux.bin -no-emul-boot -boot-load-size 4 \
  -boot-info-table -eltorito-alt-boot -e boot/grub/efi.img -no-emul-boot \
  -isohybrid-gpt-basdat "$WD"
cd "$OLD_PWD"
cp -f "$WD/$NEW_ISO" .
rm -r "$WD"
One caveat: I want separate prod and dev environments with different configs. So I created a Python script that takes the preseed.cfg.j2 file, renders it with Jinja2 using the current environment variables, and writes out the preseed.cfg file for my .iso:
#!/usr/bin/env python
import os
import sys
import argparse
from jinja2 import Environment, FileSystemLoader, exceptions


def render_template(template_file, output_file, context):
    """
    Renders a Jinja2 template with a given context and writes it to an output file.
    """
    try:
        # Get the directory containing the template file
        # This allows Jinja2 to find the template correctly
        template_dir = os.path.dirname(os.path.abspath(template_file))
        template_name = os.path.basename(template_file)
        # Set up the Jinja2 environment
        # The FileSystemLoader loads templates from the file system
        env = Environment(loader=FileSystemLoader(template_dir))
        # Load the template
        template = env.get_template(template_name)
        # Render the template with the provided context
        rendered_content = template.render(context)
        # Write the rendered content to the output file
        with open(output_file, "w") as f:
            f.write(rendered_content)
        print(f"Successfully rendered template '{template_file}' to '{output_file}'")
    except exceptions.TemplateNotFound:
        print(f"Error: Template not found at '{template_file}'", file=sys.stderr)
        sys.exit(1)
    except Exception as e:
        print(f"An unexpected error occurred: {e}", file=sys.stderr)
        sys.exit(1)


def main():
    """
    Main function to parse arguments and initiate template rendering.
    """
    parser = argparse.ArgumentParser(
        description="Render a Jinja2 template using environment variables."
    )
    # Define command-line arguments
    parser.add_argument("template_file", help="Path to the input Jinja2 template file.")
    parser.add_argument("output_file", help="Path for the output rendered file.")
    args = parser.parse_args()
    # Use all environment variables as the context
    # We pass `dict(os.environ)` to create a simple dictionary
    # that Jinja2 can easily work with.
    context = dict(os.environ)
    render_template(args.template_file, args.output_file, context)


if __name__ == "__main__":
    main()
The above is vibe-coded because I didn’t feel like spending much time on a simple Jinja2 templating script.
One last thing: I don’t want to run two scripts every time I build the .iso. So here’s a super simple wrapper that runs render.py and mkpreseediso.sh in one go (the original Debian .iso still has to be downloaded manually, but that only needs to happen once):
#!/bin/bash
set -e
OLD_ISO="$1"
NEW_ISO="$2"
python render.py preseed.cfg.j2 preseed.cfg
./mkpreseediso.sh "$OLD_ISO" "$NEW_ISO"
All of the environment variables used in the template file need to be written to an env file that gets sourced before the script runs. set -a marks every variable defined while sourcing for export, which is what makes them visible to render.py through os.environ. Then the script can be run: bash -c 'set -a; source fearless-local.env.gitkeep; set +a; ./main.sh debian-13.1.0-amd64-netinst.iso fearless-local.iso'
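One gotcha: with Jinja2's default settings, a variable that is missing from the env file renders as an empty string without any error, which in a preseed can silently mean an empty hostname or password. A minimal pre-flight check is sketched below, assuming the env file only contains simple KEY=VALUE lines (optionally prefixed with export) and the template only uses plain {{ NAME }} placeholders. parse_env_file, missing_variables and the NODE_* variable names are hypothetical, and unlike bash's source this does no variable expansion.

```python
import re
import shlex

# Plain {{ NAME }} placeholders only; filters and expressions are not matched.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_]\w*)\s*\}\}")


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, roughly like `source` minus expansion."""
    env: dict[str, str] = {}
    for line in text.splitlines():
        tokens = shlex.split(line, comments=True)  # handles quotes and # comments
        if tokens and tokens[0] == "export":
            tokens = tokens[1:]
        for token in tokens:
            key, sep, value = token.partition("=")
            if sep and key.isidentifier():
                env[key] = value
    return env


def missing_variables(template_text: str, env: dict[str, str]) -> set[str]:
    """Return placeholder names used by the template but not defined in env."""
    used = set(PLACEHOLDER.findall(template_text))
    return {name for name in used if name not in env}


if __name__ == "__main__":
    # Hypothetical env file and template fragment
    env_text = "# dev environment\nexport NODE_HOSTNAME=fearless\n"
    template = (
        "d-i netcfg/get_hostname string {{ NODE_HOSTNAME }}\n"
        "d-i netcfg/get_domain string {{ NODE_DOMAIN }}\n"
    )
    print(missing_variables(template, parse_env_file(env_text)))  # {'NODE_DOMAIN'}
```

An alternative would be to construct the Jinja2 Environment with undefined=StrictUndefined, which makes rendering itself fail on undefined names; the standalone check above just reports every missing name at once.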
I can’t guarantee that the above files won’t change. Up-to-date, best-effort docs will be available in my git repository.
Now Debian can be installed. If the machine has only a single disk, selecting the standard ‘Install’ entry should take the installer through the whole process without any further clicks. Neat.
After the reboot, the temporary bootstrap passphrase has to be entered. I don’t mind this one: if I make a typo, I can just try again. If I had to type a passphrase during installation instead, a typo would mean going through the whole installation process again. I guess that makes sense.
Anyway, Debian should be installed now and the host should be up and running. We can now proceed with setting up the Tang server.
Tang server setup
Tang was originally created by some clever folks at Red Hat, and since there were no ‘official’ Docker images for me to use, I decided to use the RHEL 10 Tang container image from Red Hat’s ‘subscription’ registry. This means that I am at the mercy of the big red fedora, but I don’t care. I have the Tang keys backed up, and if the need arises, I will be able to switch to a custom container image. Maybe one running AlmaLinux or something.
I already had the following in the Tang README in my repo, so I figured I’d just paste it here since it’s self-explanatory:
- Log in to the Red Hat registry
  podman login registry.redhat.io
- Run the Tang container
  prod:
  podman run -d -p 7500:8080 -v tang-keys:/var/db/tang --name tang registry.redhat.io/rhel10/tang
- Generate the tang.yml file
  podman kube generate tang -f tang.yml
  podman kube generate tang-dev -f tang-dev.yml
- Kill and remove the container
  podman kill tang && podman container rm tang
  podman kill tang-dev && podman container rm tang-dev
- Run
  podman kube play tang.yml
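Before booting the host, it can be worth confirming that Tang is actually reachable. Tang serves a signed advertisement at GET /adv: a JWS in JSON form whose payload is a base64url-encoded JWK set. The sketch below assumes the 7500 port mapping from above and a hypothetical host name; it only decodes the payload and lists the keys, and it does not verify the JWS signatures the way Clevis does at bind time.

```python
import base64
import json
import urllib.request


def b64url_decode(data: str) -> bytes:
    """Decode unpadded base64url, as used throughout JOSE."""
    return base64.urlsafe_b64decode(data + "=" * (-len(data) % 4))


def advertised_keys(adv_json: str) -> list[tuple[str, list[str]]]:
    """Return (alg, key_ops) for every JWK in a Tang advertisement."""
    jws = json.loads(adv_json)
    jwk_set = json.loads(b64url_decode(jws["payload"]))
    return [(k.get("alg", "?"), k.get("key_ops", [])) for k in jwk_set["keys"]]


def fetch_advertisement(base_url: str, timeout: float = 5.0) -> str:
    """GET <base_url>/adv from a Tang server and return the raw JSON body."""
    url = base_url.rstrip("/") + "/adv"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Usage would be something like advertised_keys(fetch_advertisement("http://laptop.lan:7500")), where laptop.lan stands in for whatever address the laptop has on the LAN. A healthy server should list a signing key and an exchange key.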
More on day-to-day operation with Tang in the later sections.
Host configuration, Clevis-Tang binding and emergency passphrase
The next step is to run an Ansible playbook. I am too lazy to copy-paste it here, so if you want to look at its contents, check the git repository linked at the top of the article.
Once it finishes, Proxmox should be installed, Clevis should be bound to the Tang server in the last keyslot, the emergency passphrase should occupy the first keyslot, and the temporary bootstrap passphrase should be deleted.
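The playbook itself lives in the repo, but the keyslot layout described above boils down to a handful of standard clevis and cryptsetup invocations. The hypothetical sketch below only builds the argument lists without running them. It assumes the passphrases are stored in key files (cryptsetup reads key files verbatim, so write them without a trailing newline) and that the bootstrap passphrase sits in a slot other than 0; the device path, URL and slot numbers in the example are placeholders, not values taken from the actual playbook.

```python
import json
import shlex


def keyslot_plan(device: str, tang_url: str, bootstrap_key: str,
                 emergency_key: str, bootstrap_slot: int,
                 emergency_slot: int = 0, clevis_slot: int = 31) -> list[list[str]]:
    """Build (but do not run) the commands for the final LUKS2 keyslot layout.

    bootstrap_key and emergency_key are paths to files holding the passphrases.
    """
    slots = (bootstrap_slot, emergency_slot, clevis_slot)
    if len(set(slots)) != 3:
        raise ValueError("bootstrap, emergency and Clevis slots must all differ")
    if not all(0 <= slot <= 31 for slot in slots):
        raise ValueError("LUKS2 keyslots are numbered 0-31")
    tang_cfg = json.dumps({"url": tang_url})
    return [
        # 1. Bind Clevis to Tang in the last keyslot, authorised by the bootstrap key
        ["clevis", "luks", "bind", "-y", "-d", device, "-s", str(clevis_slot),
         "-k", bootstrap_key, "tang", tang_cfg],
        # 2. Add the emergency passphrase to the first keyslot
        ["cryptsetup", "luksAddKey", "--key-file", bootstrap_key,
         "--key-slot", str(emergency_slot), device, emergency_key],
        # 3. Remove the bootstrap passphrase, authorised by the emergency one
        ["cryptsetup", "luksKillSlot", "--key-file", emergency_key,
         device, str(bootstrap_slot)],
        # 4. Inspect the resulting header
        ["cryptsetup", "luksDump", device],
    ]


if __name__ == "__main__":
    # Placeholder values for illustration only
    for cmd in keyslot_plan("/dev/sda3", "http://laptop.lan:7500",
                            "/root/bootstrap.key", "/root/emergency.key",
                            bootstrap_slot=1):
        print(shlex.join(cmd))
```

The order matters: the bootstrap passphrase is only removed after both the Clevis binding and the emergency passphrase are in place, so there is always at least one working way to unlock the disk.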
Now Tang can be stopped: podman kube down tang.yml
Standard operation
When I’m at home and need to reboot my server, I unlock my laptop, start the Tang server, boot up the server, and then stop Tang again:
- Run Tang
  podman kube play tang.yml
- Boot up the host running Clevis
- Profit
- Stop Tang
  podman kube down tang.yml
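The play/boot/stop routine can be scripted so that Tang only runs for as long as it is needed. A rough sketch, assuming the server becomes reachable over SSH (port 22) once Clevis has unlocked the disk and the system has finished booting; the host name is hypothetical and the podman commands are the ones from the list above.

```python
import socket
import subprocess
import time


def wait_for_port(host: str, port: int, timeout: float, interval: float = 2.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


def unlock_session(host: str, kube_file: str = "tang.yml", timeout: float = 600.0) -> bool:
    """Start Tang, wait for the host's SSH port, then always stop Tang again."""
    subprocess.run(["podman", "kube", "play", kube_file], check=True)
    try:
        return wait_for_port(host, 22, timeout)
    finally:
        subprocess.run(["podman", "kube", "down", kube_file], check=True)
```

Something like unlock_session("pve.lan") would then be run right before powering on the server. The finally clause makes sure Tang goes down even if the host never comes up.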
The volumes are persistent, so this setup should keep working for quite some time. The volume holding the keys should still be backed up. Well, exported, then encrypted, and then backed up. I have not set this up yet, though; I want to focus on getting my services up and running on Talos Linux.
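For the still-missing backup, the export, encrypt and back up idea could look roughly like the hypothetical sketch below: podman volume export writes the tang-keys volume to a tarball, gpg --symmetric encrypts it with a passphrase, and the plaintext tarball is deleted afterwards. The file names are assumptions, and where the encrypted file ends up is left open.

```python
import subprocess
from pathlib import Path


def backup_commands(volume: str, out_dir: Path) -> tuple[list[str], list[str], Path, Path]:
    """Build the export and encrypt commands plus the two file paths involved."""
    tarball = out_dir / f"{volume}.tar"
    encrypted = out_dir / f"{volume}.tar.gpg"
    export = ["podman", "volume", "export", volume, "--output", str(tarball)]
    encrypt = ["gpg", "--symmetric", "--cipher-algo", "AES256",
               "--output", str(encrypted), str(tarball)]
    return export, encrypt, tarball, encrypted


def backup_tang_keys(volume: str = "tang-keys", out_dir: Path = Path(".")) -> Path:
    """Export the Tang key volume and encrypt it; returns the encrypted file."""
    export, encrypt, tarball, encrypted = backup_commands(volume, out_dir)
    subprocess.run(export, check=True)
    try:
        subprocess.run(encrypt, check=True)  # gpg prompts for a passphrase
    finally:
        tarball.unlink(missing_ok=True)  # never leave the plaintext keys lying around
    return encrypted
```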
Emergency operation
If my laptop dies before I set up a backup solution for the keys, the emergency passphrase can be used for, as the name implies, emergencies.
Simply connect a display and a keyboard to the server and boot it up, then enter the emergency passphrase at the initramfs prompt.
Conclusion
Thanks to some questionable protocol decisions at Sidero Labs, I have to go back to using Proxmox because of reasons. I hate this approach and I will try to set this up natively in Talos, but today is not the day. Next, I’ll set up Talos as a VM, pass the HBA through directly to it, and build a one-box-for-everything kind of setup.