Cory's Blog

October 15
The Perils of Late Night Troubleshooting

Last night as I was waiting to fall asleep, I noticed the Internet had stopped working. It had been a big long day and I was in pain and it was cold but I got up because I knew my roommate was likely still doing things.

I had just taken some NyQuil and had done plasma exchange earlier, so I was in a kind of compromised state, as usual. In such a state of undress, I wandered to the shelf the server is on. The window was open, so I had to fix that. A disk had fallen off the RAID, so I pulled it out and put it back in. Disk activity was crazy, so I misdiagnosed the problem and decided the server was in the process of falling over at the behest of its storage.

Turns out, that really wasn't what was happening. My first hint should have been that my desktop didn't respond when I got up, because the last time that happened, gracefully restarting the server had no effect, and things only returned to normal when I finally rebooted the desktop.

By the time I finally noticed that, I had already held the power button down to hard reboot the server, which is of course Bad™ for all of the virtual machines on it. Then I had to wait for it to come up, and even then it still hadn't occurred to me to cut the network connection to the machine in my bedroom and turn it off. Once I did that and, just in case, power cycled the modem and the other wireless device, it all came back.

Of course, it didn't all come back, and the price I'm now paying is that one of my virtual machines appears unrecoverable. In fact, it is the lynchpin virtual machine. I spared no saltiness in telling my roommate that I made some wrong decisions because I was trying to troubleshoot on NyQuil after having to get back up. When I left the house earlier today, the backup software was estimating it would take about ten hours to restore the file.

It's fortunate for me that I didn't have to restore cronk (or worse, maron), but it is still annoying. Calvin and I will lose a few days of email unless Outlook is exceptionally smart about cached messages. (I could go save them, but I don't think I got anything important.) I'll keep a copy of the "bad" VHD files in case something comes up, but I don't think it will.

The main thing this does is make me think about how important it is to start splitting functions out into different virtual machines, and perhaps switch to a more traditional Windows infrastructure setup. I hadn't told my roommate, but I've also been considering switching to a better router that I can trust with DNS and DHCP again, which would mean my client computers would be less impacted by events like this. In fact, being able to do more routine maintenance, like rebooting the individual virtual machines and the hypervisor machine more often (without impacting my roommate's access), would probably be a benefit.

Such a change won't prevent my desktop from causing problems, as I believe it did last night, but it might prevent those problems from impacting everything else on the network. I have more thoughts to share on the "network improvement" front but will hold off for another time.
