This is the only time I hate having decided to self-host.

I figured most of you could relate to this.

I was updating my Proxmox servers from 7.4 to 8. The first one went without problems. That second one though… Yeah, not so much… I THINK it’s GRUB, but I’m not sure yet.

Now my Nextcloud, NAS, main reverse proxy and half my DNS went down. And no time to fix it before work. Lovely 🤕 Well I now know what I’ll be doing when I get home.

Out of morbid curiosity, what are some of y’all’s self-hosting horror stories?

BlueSialia, 11 months ago

I have a beefy Unraid server for Dockers and VMs. The idea was to have it replace all my computers. At home the VMs output the image to a monitor so that’s my desktop. And remotely I connect my phone to my home VPN and connect my phone to a lapdock and use it as a thin client to connect to my VMs. Nomachine for Linux/work, Moonlight for Windows/gaming.

Well, it’s been over a year of not being able to have my server reach an uptime higher than 15 days and I have no fucking idea why. There are no traces of any error anywhere.

Nagairius, 11 months ago

That’s too bad. I am at the beginning of exactly that path. I have my Unraid running my containers and just started building VMs for myself, but I’ve had much better success in uptime.

soldersmoker, 11 months ago (edited 11 months ago)

I’m using 3 Ubiquiti APs and running my own management instance on my server in a docker container.

I still haven’t been able to figure out why, except for maybe crappy Ubiquiti firmware, but if that container goes down or loses connectivity then the APs flood my router with traffic and my whole network goes down.

Even wired connections don’t work since the router is locked up, and when my server comes back up it won’t be able to reestablish connection because the router is still dead.

The only way I’ve found to fix it is to power cycle the APs which is obviously a huge pain.

Can’t get any support from Ubiquiti on it since I’m not using one of their controllers even though it’s obviously a firmware issue. Definitely do not recommend.

greybeard, 11 months ago

That’s an odd one. I’ve dealt with Unifi at a lot of scales and never heard of them acting up when the controller goes down. Do you perhaps use a guest network with an intercept page? That’s the only thing I can imagine possibly causing any issue.

soldersmoker, 11 months ago

No guest network. I have a really simple setup at home in general: the 3 Ubiquiti APs are the only ones broadcasting, firmware is up to date and everything.

bezerker03, 11 months ago

Had my entire home setup (all my arr services, Nextcloud, Home Assistant, monitoring, etc.) running in my k3s setup on like 5 VMs at home. Had Velero backups of it, etc.

Fast forward to “I have no idea what happened” and my masters just died. Nothing would sync anymore, etc. Nobody in the k3s community had an idea either. So I lost my entire cluster, and the backups weren’t too useful since the cluster itself was dead.

Rebuilt with Talos. But man that sucked.

RonnyZittledong, 11 months ago

It is times like these that my love for my PiKVM is renewed, stronger than ever.

LuckyCharmsNSoyMilk, 11 months ago

I really gotta get on building one (or two or three) of those. My employer has free colocation and I’m tech support for my parents’ server. Sigh.

Molecular0079, 11 months ago

Oh man, I empathize with you. Sometimes your self-hosted services go down at really bad times and you just don’t have time to fix it in the moment. Then the fact that it’s broken starts nagging at you throughout the rest of the day. Hope you get your stuff back up without too much fuss.

My current horror story is that my QNAP TS-453 Pro NAS that was hosting my Jellyfin and Nextcloud shut off on its own several weeks back and then refused to boot up. Turns out there’s a known manufacturing defect in the Intel J1900 chip the NAS uses that causes clock drift, and every TS-451 and TS-453 NAS that was ever sold is basically a ticking time bomb. It was my time to get bit. QNAP never issued a recall even though they knew about the issue, and is refusing to help customers affected by it. Now I am hoping that I can use the resistor fix in that forum post to briefly revive my NAS so that I can then back up all the data onto a DIY NAS that I am still ordering parts for. Picked up some good deals, but man, DIY is still expensive. Hopefully it’s worth it, as I never want to use turnkey solutions again after this experience.

ech0, 11 months ago

The fact that QNAP knew about this and didn’t warn their customers would cause me to boycott them for life. This isn’t just like a gaming PC. This is a NAS. Some people’s entire lives are on there.

There are lots of reasons to avoid QNAP but that’s rough.

So glad I went DIY with Ryzen and Unraid

Molecular0079, 11 months ago

That’s why I am doing a DIY NAS now. I don’t think I’ll ever buy another QNAP ever again after this experience. Is your DIY a mini-ITX by the way? I’ve been having a hell of a time figuring out whether I can get PCI-E bifurcation for my nvme SSDs while using a 5600G CPU.

What are your thoughts about Unraid btw? I’ve been looking into TrueNAS Scale.

LuckyCharmsNSoyMilk, 11 months ago

Not OP but Unraid is fantastic. I know ZFS expansion is coming at some point but being able to slap in another drive and add it to the pool and have parity “just work” is worth the money. Plus it makes Docker containers much easier to manage (Not like Portainer is that hard, but it’s nice to have configs already set to go).

Molecular0079, 11 months ago

Nice, I’ll definitely take a look at it. Both my Jellyfin and Nextcloud are set up using docker-compose, so having easier Docker management is definitely a plus for me.

jjakc, 11 months ago

The ZFS update is live now

loug, 11 months ago

There were a loooot of PCs affected by that one. Synology was also hit, for example.

RotaryKeyboard, 11 months ago

Ugh, this happened to me during a minor release. For whatever reason I had to lug the PC into my office, connect keyboard and mouse, boot it up, and press a key. Then it would boot normally again. I get jealous of those of you with servers that have those remote KVM capabilities.

VexCatalyst, 11 months ago

My issue wasn’t quite that easy but it wasn’t as headache inducing as I had thought. Turns out, last time I had rejiggered my services I had failed to delete a now unused fstab entry. One pound sign, save file and a reboot later and everything was back up and running correctly. I lucked out! Now time to move my Nextcloud backups off that machine!
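
For anyone wanting to check for the same thing before a big upgrade, here is a minimal sketch. It assumes the failure mode was a boot that stalls on an fstab entry whose device no longer exists; the comment above doesn't spell that out. It parses fstab text and lists entries whose device is missing and that are not marked `nofail`. The helper names (`parse_fstab`, `device_path`, `risky_entries`) are made up for illustration, and only `/dev/...` and `UUID=` specs are handled.

```python
import os


def parse_fstab(text):
    """Parse fstab text into dicts, skipping blank lines and comments."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue  # not a usable entry
        options = fields[3].split(",") if len(fields) > 3 else ["defaults"]
        entries.append({
            "spec": fields[0],
            "mountpoint": fields[1],
            "fstype": fields[2],
            "options": options,
        })
    return entries


def device_path(spec):
    """Map an fstab spec to a path we can test, or None if we can't tell."""
    if spec.startswith("UUID="):
        # Assumes udev populates /dev/disk/by-uuid, as on typical Linux installs.
        return "/dev/disk/by-uuid/" + spec[len("UUID="):].strip('"')
    if spec.startswith("/dev/"):
        return spec
    return None  # tmpfs, network shares, LABEL=, etc. are not checked here


def risky_entries(entries, exists=os.path.exists):
    """Entries whose device is missing and that lack the nofail option."""
    risky = []
    for entry in entries:
        path = device_path(entry["spec"])
        if path is None or "nofail" in entry["options"]:
            continue
        if not exists(path):
            risky.append(entry)
    return risky


if __name__ == "__main__":
    with open("/etc/fstab") as f:
        for entry in risky_entries(parse_fstab(f.read())):
            print("missing device:", entry["spec"], "->", entry["mountpoint"])
```

Anything it prints is a candidate for the pound-sign treatment, or for a `nofail` option if the mount is genuinely optional.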

chiisana, 11 months ago

I’ve been carrying an OMV VM since Proxmox 5. Between one of the major version updates, usrmerge made a mess and forced me to reinstall the boot disk and re-hook everything up. While not ideal, it works. Updated again recently, and my disks started to fall into read-only mode. Tried the usual: rebooting into single-user mode, running fsck on the volume, remounting, etc., and “hey look, it came back online!” only for it to go back into read-only mode again. Since it was a virtual disk on a RAID6 array, and nothing else was breaking, it was really boggling my mind. It kept doing that despite still having a couple TB of free space available… or at least so I thought.

Turns out:

I had the virtual disk allocated to 19TB of my 24TB available space to work with. The qcow file is lazily allocated, so despite it showing 19TB on disk in ls, it only used as much as the VM actually used. Usage grew to 16TB and the qcow file tried to write more data, but 16TB is the ext4 file size limit on my system. Oops.

I ended up ordering 3 more drives, expanding to 8x8TB on RAID6 w/ 48TB-ish workable space, copying the data out into separate volumes with none of them exceeding 15TB in size, and finally deleting the old “19TB” volume. Now I have over 25TB of space to grow, and a newfound appreciation for the 16TB limit :)
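
For anyone who wants an early warning on this kind of thing, here is a rough sketch, assuming the limit is the 16TB (16 TiB) per-file figure from the story above. It reports a file's apparent size (what `ls -l` shows), the space actually allocated on disk, and how far the apparent size is from the limit. The function names and the constant are made up for illustration, and the demo runs on a throwaway sparse file rather than a real disk image.

```python
import os
import tempfile

# Assumption taken from the post: 16 TiB per-file limit on that ext4 volume.
EXT4_MAX_FILE_BYTES = 16 * 1024**4


def file_sizes(path):
    """Return (apparent_bytes, allocated_bytes) for the file at path."""
    st = os.stat(path)
    # st_size is what `ls -l` shows; st_blocks is counted in 512-byte units.
    return st.st_size, st.st_blocks * 512


def headroom(apparent_bytes, limit=EXT4_MAX_FILE_BYTES):
    """Bytes the file can still grow before reaching the per-file limit."""
    return limit - apparent_bytes


if __name__ == "__main__":
    # Demo on a throwaway sparse file; point file_sizes() at a real image instead.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.truncate(1 << 20)  # 1 MiB apparent size, little or nothing allocated
        demo_path = f.name
    apparent, allocated = file_sizes(demo_path)
    print(f"apparent={apparent} allocated={allocated} headroom={headroom(apparent)}")
    os.remove(demo_path)
```

Run against a growing image from cron or a monitoring check, a shrinking headroom number is the thing to alert on, well before the guest flips to read-only.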
