Uh oh.
Miktar
18-10-2007, 02:06 PM
As of 2:00pm, our server RAID lost too many drives *at the same time*. The statistical odds are like lightning striking your car outside.
Our server contains a *lot* of data, and we're busy reconciling our various backups now. I realized, however, that if these drives do not come back online, I will lose all my rAge 2007 footage, which was so huge it could only be stored on the server.
NAG and SACM are fine, from the looks of it, and we're still examining the server to see what we can recover. NAG and SACM won't be late, don't worry - but this is going to really put a damper on the office atmosphere.
Gallagher
18-10-2007, 02:14 PM
NOOOOOOOOOOOOOOOOOOOO! :'( Sad times indeed... Prob why gallery.tidemedia.co.za/ is offline?
-Bouncer-
18-10-2007, 02:17 PM
That really sucks.... :( I hope that all isn't lost. Is there any specific reason why it happened?
Miktar
18-10-2007, 02:18 PM
Um, no. Our hosting is in Germany, FrancoisWiid. All our websites are on one server: the forums, gallery, sacm, nag, oldskool - so if you can see the forums, you *should* see the gallery.
But I didn't know the gallery was down, I'll look at it now.
The server that failed, is the one in our office.
Miktar
18-10-2007, 02:18 PM
That really sucks.... :( I hope that all isn't lost. Is there any specific reason why it happened?
A lot of theories, no solid evidence yet.
Miktar
18-10-2007, 02:23 PM
We're sending it off to a data recovery lab now - we can't spin the drives up ourselves, this is going to require proper drive recovery equipment, which we don't have.
CaptainCrunch
18-10-2007, 02:27 PM
Eish, not good.
Also experienced that at my previous job, while we were having backup issues to boot.
Faulty RAID controller killed 2 drives simultaneously.
Coupla thousand Rands (quote was somewhere in the region of R8000) worth of expert drive recovery... gave us most but not all of the data back.
Azimuth
18-10-2007, 02:29 PM
Hey, didn't NAG run an article a few years back about data recovery? According to you, total irrecoverable data loss is virtually impossible. So I'm cautiously optimistic. Which is really quite out of character for me.
Frozenfireside
18-10-2007, 02:29 PM
What RAID level do you run? 5? 10?
If you are running RAID5 (which is very popular) then the odds of losing your data are one in billions because it works so well.
You need a lot of HDDs to fail to lose data.
brazed
18-10-2007, 09:20 PM
Holding thumbs, I'm sure we all are. Best of luck.
James Donaldson
18-10-2007, 09:34 PM
I just had the image of a worried looking Dragotaur pacing in front of a Server Operating Theatre.
Hope it pulls through. We'll send balloons and get well cards
Miktar
19-10-2007, 10:27 AM
We'll know what the data recovery guys say today. I realized this morning, all my Steam games are backed up to there, as well as ALL my MP3s I've been buying over the last two years. ;..;
Takiro
19-10-2007, 07:22 PM
Ouch dude! You have my sympathy
woah bummer man, best of luck.....if you find the person responsible point me at him, I'll use my chuck norris like powers against him
then again Disleckia's got ninja powers so you could use him instead
Ruandre
20-10-2007, 11:17 AM
Damn, that sucks. Here's hoping you recover most (if not all) of the files.
Miktar
22-10-2007, 10:02 AM
We're still waiting. It's going to cost over R30,000 to recover the data (more than it cost to build the server).
NAG server = Abyss
Supermicro H8DCE-O
Opteron 265
4GB (2x2GB) ECC DDR-400
Radeon x600 (you don't use system memory for video on a server kids, that's just retarded overhead to your memory bus)
Areca ARC-1230 RAID controller
8x Western Digital RE2 500GB in RAID6
Thermaltake Eureka
Aopen AO-700 PSU
The server falls under my domain because I was the one who built this particular machine, and we don't have a dedicated IT staff so you can imagine what my life is like right now. We already had one drive (#6) down from a previous failure, and I was waiting for a replacement, which came in that day. Found out another drive (#8) had failed in the meantime, killing the redundancy. The third drive (#7) failed while I was standing there trying to shut it down so I could install the replacement and get some redundancy back.
Two drives (#7 and #8) just clicked on power-up, they wouldn't even spin up properly. Plus I don't know if #8 had just failed or failed a while ago, so no idea what state its data is in. The only hope we have is to recover #7, which is the one that failed last and brought the array down.
So far evidence points to a faulty set of drives. 4 of the drives (#1-4) came new when we built the server. The other 4 (#5-8) were recycled from our old standalone RAID5 NAS server that used to hold a lot of our data. All 3 drives that failed were from that set, and had been showing random minor problems beforehand. I suspect they ran too hot in the NAS for the time we used them there and when the server room heated up they just crapped out.
Frozenfireside
22-10-2007, 11:31 AM
Thank you for choosing AMD.
RAID 6?? *scuttles away to lecturer and asks about it*
Moral of the story? Information is more valuable than hardware.
Thank you for choosing AMD.
RAID 6?? *scuttles away to lecturer and asks about it*
Moral of the story? Information is more valuable than hardware.
Mmm yes I never doubted that, a machine is only worth what you use it for. :)
I'm wondering where the charred remains of our previous dual Xeon server are. I'm thinking of possibly using that so we can have two machines up, if we could get enough parts to fix it.
As for RAID6, dual parity stripes. Same as a RAID5 except there's two parity calcs per stripe set, which rotate the same way. Why bother? Because a two drive failure can still kill a RAID10 or 0+1 if you're unlucky and take down a complete mirror. But it takes three drives to kill a RAID6. What are the odds you hit the two exact drives to kill the RAID10 vs a third drive? Math time!
Assuming eight drives, each drive has an equivalent Mean Time Between Failure (MTBF):
RAID10: you need to kill two drives in a matched pair. Just to kill two drives you need a probability of 1 / MTBF^2 (though there's some complexity in what unit of time you use for this function, but let's just use hours and we can say it's fairly improbable).
Then we have to hit a mirror. So what are the odds that the second drive failure is the mirrored partner of the first? 7 remaining functional drives, so 1/7.
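The 1/7 figure can be checked by brute force. This is only a rough sketch: it assumes the eight drives are numbered 0 to 7, that a RAID10 would be built from the mirror pairs (0,1), (2,3), (4,5), (6,7), and that every combination of failed drives is equally likely. The function names are made up for illustration.

```python
from itertools import combinations

N_DRIVES = 8  # eight drives, as in the post

# Assumption: RAID10 made of mirrored pairs (0,1), (2,3), (4,5), (6,7).
MIRRORS = [frozenset((i, i + 1)) for i in range(0, N_DRIVES, 2)]


def raid10_survives(failed):
    """RAID10 survives as long as no mirror pair has lost both members."""
    return not any(pair <= failed for pair in MIRRORS)


def raid6_survives(failed):
    """RAID6 (dual parity) survives the loss of any one or two drives."""
    return len(failed) <= 2


def fatal_fraction(survives, n_failed):
    """Return (fatal, total) over all n_failed-drive failure combinations."""
    combos = [frozenset(c) for c in combinations(range(N_DRIVES), n_failed)]
    fatal = sum(1 for c in combos if not survives(c))
    return fatal, len(combos)


print(fatal_fraction(raid10_survives, 2))  # (4, 28): 4/28 = 1/7
print(fatal_fraction(raid6_survives, 2))   # (0, 28): no two-drive loss is fatal
print(fatal_fraction(raid6_survives, 3))   # (56, 56): any three-drive loss is fatal
```

Four of the 28 possible two-drive failures take out a complete mirror, which is the same 1/7 as above.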
So RAID10's failure probability is 1 / (MTBF^2 * (n-1)) where n is the number of drives in the array.
What about RAID6? You need to kill any 3 drives at once. That's MTBF cubed.
RAID6 = 1 / (MTBF^3)
So, for MTBF > n-1, RAID6 is more secure. Most drives have an MTBF of around a million hours. Let's say it takes a pessimistic 10 hours to rebuild the RAID. So that's a 100,000:1 chance of a fail during rebuild. I doubt anyone's running a 100,002 drive RAID10 anywhere.
Oh and if you want the lightning strike number on what Abyss' failure was, the WD RE2 has a 1.2 million hour MTBF, and I know it only takes our array 3 hours to rebuild. So 1.2M/3 cubed = 64,000,000,000,000,000 : 1.
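For anyone who wants to replay the arithmetic, here is a small sketch. It only reproduces the back-of-envelope model used in this post (failures treated as independent, MTBF divided by the rebuild window), so it is not a proper reliability calculation. The function names are hypothetical.

```python
from fractions import Fraction


def window_ratio(mtbf_hours, window_hours):
    """MTBF expressed in units of the rebuild window (e.g. 1.2M h / 3 h)."""
    return Fraction(mtbf_hours, window_hours)


def raid6_odds(mtbf_hours, window_hours):
    """'N : 1' odds against losing three drives at once: ratio cubed."""
    return window_ratio(mtbf_hours, window_hours) ** 3


def raid10_odds(mtbf_hours, window_hours, n_drives):
    """'N : 1' odds against losing both halves of one mirror:
    ratio squared, times (n - 1) for the chance of hitting the partner."""
    return window_ratio(mtbf_hours, window_hours) ** 2 * (n_drives - 1)


# Generic case from the post: 1,000,000 h MTBF, 10 h rebuild, 8 drives.
print(f"RAID10: {int(raid10_odds(1_000_000, 10, 8)):,} : 1")
print(f"RAID6:  {int(raid6_odds(1_000_000, 10)):,} : 1")

# Abyss: WD RE2 at 1.2 million hours MTBF, 3 hour rebuild.
print(f"Abyss:  {int(raid6_odds(1_200_000, 3)):,} : 1")
# Abyss:  64,000,000,000,000,000 : 1
```

With a ratio of 100,000 the RAID10 figure only catches up with RAID6 once n - 1 reaches 100,000, which is where the 100,002 drive remark comes from.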
Icenflame
22-10-2007, 01:02 PM
what are the chances... sheesh that's never good news... as it's mentioned, the data on drives is far more valuable than the actual hardware. I mean all the memories and saved games, the tunes one listened to while banging out articles, it's all memories...
Well best of luck in retrieving what is most important and hope it works out for you guys...
Frozenfireside
22-10-2007, 01:18 PM
I reckon you start again GLDM.
Get your data back and replace any old hardware.
It's old hardware that caused this crap and you want to put old hardware back into the system.
It's not cheap but compare it to data recovery and it makes a lot of sense.
A beer says it fails again if you put old hardware back in.
I reckon you start again GLDM.
Get your data back and replace any old hardware.
It's old hardware that caused this crap and you want to put old hardware back into the system.
It's not cheap but compare it to data recovery and it makes a lot of sense.
A beer says it fails again if you put old hardware back in.
We'll be dumping the 4 older drives as soon as it's back. I'm thinking of using the 2nd machine as just another backup layer. We're also expecting to get another array of drives sometime next year.
Frozenfireside
22-10-2007, 02:53 PM
hmmm.
Up to you, but unless your backup to the second server is efficient, you might not benefit as much as you want from the 2nd layer. Considering your current situation, though, you might have to take what you can get and be happy.
Next year? You might need it now.
Azimuth
22-10-2007, 02:57 PM
You say this as though it weren't already glaringly apparent.
hmmm.
Up to you, but unless your backup to the second server is efficient, you might not benefit as much as you want from the 2nd layer. Considering your current situation, though, you might have to take what you can get and be happy.
Next year? You might need it now.
Yes well if you can materialize a free rackmount drive cage and about 4TB of redundant disk space out of thin air right now, go ahead. Otherwise we'll have to wait until we can afford to get all that together.
We did have a backup system that was supposedly in place, only we found out it wasn't being done as often as prescribed. Someone got tired of waiting for backups to write to the portable units for offsite storage every night before they could go home, and so we were missing a bit more data than I would have liked. Also we've kinda added a few more things that weren't budgeted in the offsite storage space, so I'll have to take another look at how we're going to handle that. We may have to switch to an on/off rotation if we've got so much data that backup is taking hours instead of minutes now.
I'm probably going to argue for adding an additional drive for hotspare when we fit the replacement drives in the server. In a perfect world we'd have a cluster running DFS but then we'd also have an IT staff and not just have our writers wearing extra hats. However that's pretty unlikely to happen so we're kinda stuck with whatever we can keep running. If we hadn't been running RAID6 though, we would probably have lost everything 5 or 6 times over back when we were having the power issues in that server room.
Frozenfireside
22-10-2007, 06:07 PM
I would make the 'lazy sod' pay for the whole cost of this little incident.
Funny the 'free beer for a year on return of stolen notebook with data on it' popping up at the same time as this.
As for the additional storage: yeah, that would cost a bit. Didn't know the extent of your hardware requirements. Miktar must have a lot of MP3s! :-P
Still, I hope to start some form of data protection company in the future and I would make sure your data was backed up :-)
What's the current storage capacity of the system?
Miktar
23-10-2007, 10:38 AM
As for the additional storage: yeah, that would cost a bit. Didn't know the extent of your hardware requirements. Miktar must have a lot of MP3s! :-P
Under 1GB.
But each NAG issue uses up far more gigage than you think - and remember, we're not just one magazine.
Frozenfireside
23-10-2007, 01:30 PM
Yeah, you're like 4, are you not?
How much space is one issue? (eg november)
Miktar
23-10-2007, 02:41 PM
Yeah, you're like 4, are you not?
How much space is one issue? (eg november)
Dunno. I can't check. >.< The server is *gone*, after all.
ShadowMaster
23-10-2007, 02:48 PM
Under 1GB.
But each NAG issue uses up far more gigage than you think - and remember, we're not just one magazine.
Would I be right in guessing about 10gigs(DVD + Mag)?
Miktar
23-10-2007, 04:04 PM
The DVD alone is 9GB. I just found out the issues were, on average, 11GB per issue.
Frozenfireside
23-10-2007, 06:06 PM
11 Gigs!
Lots of pretty pictures do take up space but wow.
Uuuh, off topic, but what were the reasons behind choosing a double-sided DVD over dual layer?
I prefer dual layer to use. I'm sure it's cost.
Miktar
24-10-2007, 09:57 AM
What? It's not Double-Sided - do you have to flip the DVD over when you want to see what's on the other side?
We use double-layer, which is dual-layer.
Frozenfireside
24-10-2007, 11:24 AM
Sorry yes, I am thinking of the PC format DVD.
I realized after I shut down that you use the dual layer.
Shows you how much I care for the DVD when I've got uncapped.
My question still stands tho: what's the price difference?
Miktar
24-10-2007, 11:48 AM
Dunno.