I haven’t written much about my homelab hobby on here, but it takes up a good portion of my free time. I think of it much like gardening - pulling weeds, planting, feeding, grooming. It’s relaxing and rewarding in its own way.

The cluster itself is 4 x86 nodes and one Raspberry Pi 4, with a NAS providing media storage. It is powered by Kubernetes and FluxCD and hosts a myriad of services and tools that I rely on every day. Notably, it hosts my instances of GoToSocial, NextCloud, Home Assistant, Forgejo, and Atuin, along with a few dozen other things.

This year I’ve made quite a lot of changes to the homelab. Hardware-wise, I upgraded one of the nodes and reorganized all the servers and network hardware.

[Image: the server rack before reorganizing]

[Image: the server rack after reorganizing]

Deployments

| Application | Description | Notes |
| --- | --- | --- |
| Actual Budget | Envelope-based Budgeting | Really good, and I still use it regularly |
| LinkWarden | Bookmark Manager / Archiver | It’s great, but I preferred KaraKeep |
| KaraKeep | Bookmark Manager / Archiver | Really, really good, with good browser plugins and mobile apps |
| Endurain | Fitness Tracker | Imagine Strava or Garmin Connect, but local - it’s brilliant |
| Ghostfolio | Investment Manager | Really good, though with some sharp edges |
| ntfy-alertmanager | Alertmanager to Ntfy Bridge | This works perfectly and was simple to set up |
| Friendica | Social Network | Wasn’t for me |
| Iceshrimp.NET | Social Network | This is really promising, but ultimately wasn’t what I was looking for |
| Monica | Personal CRM | This is a really cool concept - for someone else |
| Multus | Complex Networking | Allows attaching multiple network interfaces to pods - for IoT |
| Shlink | URL Shortener | Really slick and useful |
| Calibre-Web | Book Management | Great, but I ultimately migrated to BookLore |
| BookLore | Book Management | Slick, modern, and featureful |
| Continuwuity | Matrix Server | Migrated from Synapse and haven’t looked back |
| Gatus | Monitoring | Great and Kubernetes-native - migrated from Uptime Kuma |
| Maddy | SMTP Relay | It can do way more, but works great for simplifying mail for the cluster |
| Atuin Server | Atuin Sync | Atuin is great, and syncing across machines is even better |
| Echo Server | Network Diagnostics | This has been really helpful for diagnosing the network stack |
| External Secrets | Secret Management | Much more powerful and flexible than using SOPS for everything |
| Music Assistant | Music for Home Assistant | Shows so much promise, but I haven’t gotten it working well yet |
| Govee2MQTT | Govee Bridge | Manage Govee lights from Home Assistant |
| Double Take | Facial Recognition | Not accurate enough and too slow |
| DeepStack | Facial Recognition | Provides the training for Double Take - using Frigate+ now |
| Dragonfly | Redis Alternative | A great clustered alternative to Redis - should’ve switched way back |

DNS - Because it’s always DNS

I started using ExternalDNS with the Cloudflare provider for external apps and the Mikrotik provider for internal apps. There are definitely some mixed feelings about using Cloudflare, but I’m already in bed with them for other things, so it’s a bit moot. Before the Mikrotik provider I was using the DNS server on my Synology NAS, and that kept causing issues. Before that I used AdGuard with my own home-rolled “External DNS” implementation. It worked OK, but AdGuard kept going down.
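
For reference, here’s a minimal sketch of the Cloudflare side. The zone, secret name, and image tag are placeholders rather than my actual config:

```yaml
# Minimal ExternalDNS deployment using the Cloudflare provider.
# The zone, secret name, and image tag below are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: external-dns
  namespace: external-dns
spec:
  selector:
    matchLabels:
      app: external-dns
  template:
    metadata:
      labels:
        app: external-dns
    spec:
      serviceAccountName: external-dns   # needs RBAC to read Ingresses/Services
      containers:
        - name: external-dns
          image: registry.k8s.io/external-dns/external-dns:v0.15.0
          args:
            - --source=ingress             # derive records from Ingress hosts
            - --provider=cloudflare
            - --domain-filter=example.com  # only touch this zone
            - --txt-owner-id=homelab       # mark records as owned by this cluster
          env:
            - name: CF_API_TOKEN           # scoped Cloudflare API token
              valueFrom:
                secretKeyRef:
                  name: cloudflare-api-token
                  key: token
```

The Mikrotik side is similar in spirit, though as I understand it that provider is wired up through ExternalDNS’s webhook mechanism rather than being built in.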

I had also been using a wildcard DNS forward for my homelab, but it caused lots of intermittent issues and probably only increased “security” marginally, since the only thing it hid was the individual hostnames. Now the external services can be discovered more easily, but all the other issues went away.

My old single instance of AdGuard was deployed on Kubernetes and served DNS for every client on my network, but one instance wasn’t robust enough. I deployed a second instance plus the adguard-sync project to keep them in sync, and that has been pretty rock solid since then.
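
The sync piece is small enough to show. A hedged sketch of the config, assuming the origin/replica layout from the adguardhome-sync docs - the URLs and credentials here are made up:

```yaml
# Hypothetical sync config: one origin, one replica.
# Check the project's docs for the exact schema.
origin:
  url: http://adguard-primary.dns.svc.cluster.local:3000
  username: admin
  password: changeme
replicas:
  - url: http://adguard-secondary.dns.svc.cluster.local:3000
    username: admin
    password: changeme
cron: "*/10 * * * *"   # re-sync every ten minutes
```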

Dev

There is a cool project that provides nginx with a preconfigured S3 backend, so I deployed it and pointed it at a cluster-deployed instance of Minio. Now I can upload a site to S3 and it is automatically available. This works pretty well, and I used it for a couple of pretty basic sites. Then I found out about grebdoc.dev and git-pages, which would probably work much better. I’ll have to circle back on that next year!
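
The setup is mostly environment variables. A sketch assuming the nginx-s3-gateway image and its documented env names - the bucket, Minio service address, and credentials secret are placeholders:

```yaml
# Hypothetical static-site gateway fronting an in-cluster Minio.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: static-sites
spec:
  selector:
    matchLabels:
      app: static-sites
  template:
    metadata:
      labels:
        app: static-sites
    spec:
      containers:
        - name: gateway
          image: ghcr.io/nginxinc/nginx-s3-gateway/nginx-oss-s3-gateway:latest
          envFrom:
            - secretRef:
                name: minio-credentials   # AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY
          env:
            - name: S3_BUCKET_NAME
              value: www
            - name: S3_SERVER
              value: minio.minio.svc.cluster.local
            - name: S3_SERVER_PORT
              value: "9000"
            - name: S3_SERVER_PROTO
              value: http
            - name: S3_REGION
              value: us-east-1
            - name: S3_STYLE
              value: path                 # Minio uses path-style bucket addressing
            - name: AWS_SIGS_VERSION
              value: "4"
            - name: ALLOW_DIRECTORY_LIST
              value: "false"
```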

GitOps

After all the Bitnami drama I finally migrated away from all their crap. That forced me to get rid of several things, notably Redis. I tried Valkey, but ultimately went with Dragonfly. It met every need I had, provides simple clustering, and has an operator.
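
The operator makes the actual deployment almost boring. A minimal sketch using the dragonfly-operator CRD - the name, namespace, replica count, and resources are illustrative:

```yaml
# Hypothetical Dragonfly instance managed by dragonfly-operator.
apiVersion: dragonflydb.io/v1alpha1
kind: Dragonfly
metadata:
  name: dragonfly
  namespace: databases
spec:
  replicas: 3          # one master plus standby replicas
  resources:
    requests:
      cpu: 100m
      memory: 256Mi
```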

The repo itself got a huge overhaul too. I moved all the Flux sources to sit adjacent to their HelmReleases, moved all the Flux Kustomizations out of the flux-system namespace and into their own namespaces, and created a Flux component to inject my cluster-wide secrets and ConfigMaps into each namespace - something like the sketch below.
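
To make that concrete, here’s roughly what one per-app Kustomization looks like now. The names and paths are illustrative, not my actual repo:

```yaml
# Hypothetical per-app Flux Kustomization: lives in the app's own
# namespace, points at a directory holding both the HelmRelease and
# its source, and pulls in a shared component for cluster secrets.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: example-app
  namespace: example-app        # no longer parked in flux-system
spec:
  interval: 30m
  path: ./apps/example-app      # HelmRelease and its source sit side by side
  components:
    - ../../components/cluster-secrets  # injects cluster-wide Secrets/ConfigMaps
  sourceRef:
    kind: GitRepository
    name: flux-system
    namespace: flux-system
  prune: true
```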

Most of my deployments rely on BJW-S’s app-template, but I hadn’t kept up with the version changes. Version 4 was out and I still had some deployments on all three prior versions! This upgrade took quite a while, but I got it done. Then I migrated to using the app-template OCI repo.
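
The OCI migration is a small but satisfying change: instead of a HelmRepository, the chart comes straight from a registry. A sketch, assuming Flux’s chartRef support and the chart’s published OCI location (double-check the URL against bjw-s’s docs):

```yaml
# Hypothetical OCIRepository + HelmRelease pairing for app-template.
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: OCIRepository
metadata:
  name: app-template
  namespace: flux-system
spec:
  interval: 1h
  url: oci://ghcr.io/bjw-s-labs/helm/app-template
  ref:
    semver: 4.x
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: example-app
spec:
  interval: 30m
  chartRef:                      # reference the chart by OCI source
    kind: OCIRepository
    name: app-template
    namespace: flux-system
  values: {}
```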

The last big thing was finally migrating to the Flux Operator. That definitely made things much smoother. And while I was doing that I found the configuration error that had kept my github receiver from working!
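
For the curious, the operator boils the whole bootstrap down to one resource. A hedged sketch of a FluxInstance - the version pin and component list are illustrative:

```yaml
# Hypothetical FluxInstance managed by the Flux Operator.
apiVersion: fluxcd.controlplane.io/v1
kind: FluxInstance
metadata:
  name: flux
  namespace: flux-system
spec:
  distribution:
    version: "2.x"               # track the latest 2.x release
    registry: ghcr.io/fluxcd
  components:
    - source-controller
    - kustomize-controller
    - helm-controller
    - notification-controller
```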

The ultimate goal of this repository is to be public facing, but I’ve never been satisfied enough with the state of things to publish it yet. Hopefully soon. As part of these preparations I moved some of my deployments to a separate “private-apps” repo. That repo is now also self-hosted on my Forgejo instance!

Infra Changes

One really big change was the move from Flannel to Cilium. It took most of a week to figure out what I needed to do and to get it all working correctly. Part of that work required adding BGP configuration to my router and servers. That wasn’t too hard, but I wasn’t very knowledgeable about BGP before.
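
The BGP piece ended up being less config than I feared. A sketch using Cilium’s v2alpha1 BGP API - the ASNs and router address are made up, and newer Cilium releases have since introduced a replacement API:

```yaml
# Hypothetical BGP peering between the nodes and the router.
apiVersion: cilium.io/v2alpha1
kind: CiliumBGPPeeringPolicy
metadata:
  name: homelab-bgp
spec:
  nodeSelector:
    matchLabels:
      kubernetes.io/os: linux    # peer from every Linux node
  virtualRouters:
    - localASN: 64512
      exportPodCIDR: true        # advertise each node's pod CIDR
      neighbors:
        - peerAddress: 192.168.1.1/32   # the router
          peerASN: 64512
```

With something like this in place, the router learns routes to the pod networks directly from each node.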

My nodes were regularly having issues with pod evictions due to exhausted ephemeral storage. Unfortunately I couldn’t find good documentation or out-of-the-box monitoring for this. Eventually I found the k8s-ephemeral-storage-metrics project, which helped me monitor the situation a bit better. Then I added some default limits to all namespaces, which meant that instead of nodes constantly evicting pods, the offending pods would just get killed. That was a nice improvement, since I couldn’t easily monitor each pod’s ephemeral usage. After that it was just the tedious process of adjusting the temp storage limits and mounts for each deployment. I rarely see these issues any more.
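
The namespace defaults are just a LimitRange. The namespace name and sizes here are illustrative:

```yaml
# Default ephemeral-storage request/limit for any container in the
# namespace that doesn't set its own.
apiVersion: v1
kind: LimitRange
metadata:
  name: ephemeral-storage-defaults
  namespace: example-app
spec:
  limits:
    - type: Container
      defaultRequest:
        ephemeral-storage: 256Mi   # applied when no request is set
      default:
        ephemeral-storage: 2Gi     # applied when no limit is set
```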