How Are Your Backups? – Zach Bundun

 

Saturday – 9AM

It was Saturday morning, and I was waking up at a friend’s place after a night of laughs and a few drinks. What woke me was a phone call from the on-call tech, asking when the backup drives had last been swapped at a customer of ours.

“It has been a while since our last QTR, why?”

“You must not have seen the news. They burned down last night.”

*Heart drops into stomach*

“Oh, well, I’ll have to check the drive in our safe to be certain.” We hang up and I’m suddenly not feeling too great. It is possible that our quarterly backup for this customer is as old as 89 days. Hopefully they have been performing their daily drive swap!

I read through e-mails from our GM to the customer offering condolences about the fire, along with e-mails to me asking about the backups.

Well, I’d better get to the office. The news and the pictures of what was once their office look grim. So I get up, explain that it’s time to go, and start my journey over to the office.

10:00 AM

I am on the west side of the city, heading north on the Anthony Henday, and there is still some smoke rising in the east, in the direction of 50th Street. That must have been some fire.

11:00 AM

This is where things get interesting. When I check the backup files and timestamps from our quarterly swap, it dawns on me that we are sitting on fairly outdated data. Not good for our situation. What could possibly be left from the fire?
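The kind of check described above, finding the newest file on a backup drive and seeing how old it is, can be sketched in a few lines of Python. This is only an illustration and not the tooling used in this story: the function names `newest_backup` and `is_stale`, the example path, and the one-day threshold are all assumptions.

```python
from datetime import datetime, timedelta
from pathlib import Path
from typing import Optional, Tuple


def newest_backup(root: Path) -> Optional[Tuple[Path, datetime]]:
    """Return the most recently modified file under root and its timestamp.

    Returns None when the folder contains no files at all.
    """
    files = [p for p in root.rglob("*") if p.is_file()]
    if not files:
        return None
    latest = max(files, key=lambda p: p.stat().st_mtime)
    return latest, datetime.fromtimestamp(latest.stat().st_mtime)


def is_stale(last_backup: datetime, now: datetime, max_age: timedelta) -> bool:
    """True when the newest backup is older than the allowed window."""
    return now - last_backup > max_age


# Hypothetical usage; the path and the one-day limit are examples only:
# found = newest_backup(Path("/mnt/backup_drive"))
# if found is None or is_stale(found[1], datetime.now(), timedelta(days=1)):
#     print("No recent backup found, time to swap that drive!")
```

A modification time only shows when a file was last written, not that it can be restored, so a check like this complements a real test restore rather than replacing one.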

After speaking with our customer contact, I assured him there ARE backups… at least. The really unfortunate part is that they have not been performing their own daily swaps, which means that either they have even older data somewhere safe or, worse, both drives were onsite and are possibly toast.

Things get a little better, though: it sounds like the Fire Marshal has found the server room and a large portion of the equipment still intact. Is it still usable, though? That is my only thought.

He can’t pull any of the items out yet because of the investigation and some hot spots, which leads to a fairly sleepless Saturday night, as all of the customer’s recent backups and data sit in a now-open room exposed to the elements.

Thank goodness for clear weather that evening.

Sunday – 8AM

The next morning I am up early, anticipating the call from our contact to meet onsite and collect the server and backup drive. I make it very clear which equipment the Fire Marshal needs to grab, as I am not allowed to go in there myself.

9:30AM

I arrive onsite to see an unrecognizable pile of rubble where their main office once stood. It is devastating to see. Luckily, for some reason, the front area of the office, where the server room was, remained largely intact, which is a relief.

The contact comes walking out with the server in hand, and we place it in the back of the van. That is hopefully the stinkiest server I will ever have to work with, all caked in black soot. There are no obvious burn marks or melted pieces, which is a good sign; perhaps our rack provided some insulation.

The more important piece for me, the NetGear Backup appliance, is brought out to me by the Fire Marshal, and it is in much better shape than the server. Still stinky and dirty, but much more salvageable. The drives on the inside are looking great.

One of the Western Digital daily swap drives suffered some melting, though; the melted material collected in a small puddle and cooled into an odd shape. Toast.

11:30AM

I hurry back to the office to begin one of the longest Sundays I think I have ever had. First I have to get that data off the drives. I pick the cleaner of the two drives inside the NAS and throw it into a standby NAS to see if it will load up.

1:00PM

Success! And to my further relief, there is backup data dated that very Friday night, mere hours before the fire. Copy the data now.

The joy and nervousness of watching a simple file copy were something else, as this drive could fail at any moment. I had to leave it in peace as I couldn’t bear to watch, so to take my mind off it I decided to do a thorough cleaning of the server, just in case I had to boot it up.

4:00PM

This was about 3 hours of meticulous cleaning on its own, as all of the internal fans looked to have sucked in extremely thick black smoke and proceeded to spread it on EVERYTHING. It took lots of rubbing alcohol, as even the canned air duster had zero effect on the soot.

I managed to clean every single fan, every stick of RAM, and the board as best I could. The drives were well protected and were not dirty at all. I wonder how that copy is going.

“Copy Complete.” OK, things are really turning out better than expected at this point. Now I just need a place for these servers to live again. I make a quick call to the GM (very happy) and our Datacenter Techs to spin up the environment in our Datacenter.

6:00PM

In the afternoon our GM and the contact visited me at the office to discuss our plan to restore into the Cloud, and the contact was very happy to proceed.

A very long restoration lies ahead as I start the DC restore directly into our VMware environment.

7:00PM

The server is looking pretty clean at this point. Satisfied with what I see, I decide to plug it in and see what happens. I power it up: standard boot cycle with loud fans, BIOS screen coming up, and no major errors. Not bad.

The fans spool down and, amazingly, the thing is purring quieter than ever. Did I just restore a physical server from a fire? The VMware OS loads up and it is ready to continue on like nothing happened.

For fun, I power on the VMs to see what is going on, and it seems they shut down unexpectedly for some odd reason. Go figure.

Well, now we have an additional safety net should anything go wrong: the server is alive once again, and the backups worked well and are replicated to several disks from their NAS. Things are looking great.

10:00PM

In the evening I finish the restore of the DC into our Cloud environment and, just like everything else that has lined up so well today, it lives and breathes with no issue in the Cloud. I make some minor configuration changes and begin to restore the TS. Now I need some rest.

The restoration is slated to finish around 3AM Monday morning, so I set my alarm for 3:30 to finish setting it up. Commence a few hours’ sleep, which has never felt better.

Monday – 3:30AM

I wake up at 3:30 raring to see where things sit. Do we have a fully functioning customer environment living in our datacenter after a brush with death? You know it. Everything is up and running in the Cloud, the DNS changes are made to point to us, and these servers are ready to rock with data as of 9PM Friday. We lost a couple of hours between the backup and the fire, but hey, this ain’t too bad.

I go back to sleep and wake up very excited to go and help our friends get back in business.

9:00AM

They ended up renting conference halls in a couple of different hotels over the next few weeks and got connected back to their server using spare computer equipment we scrambled together for them. Some of the users were back and working like nothing had happened no later than 10AM Monday. A fine accomplishment, if you ask me.

Today

The customer has since leased a large new piece of land just outside of the city and is fully restored, with brand-new equipment and a very clean environment. They signed a full Service Plus deal, having been on Service-Only before, and they recognize our commitment to ensuring the success of our clients.

How are your backups?

 

Zach Bundun
IT Consultant