Anatomy of a Solid State Disk Failure
Solid state drives can fail just like anything else. I’ve been told that they fail gracefully and that I’ll have enough time to rescue my files. Is that true? Read on.
For a couple of weeks, my home PC’s SSD (a 60GB OCZ Agility 1 I bought on sale two years ago) has been reporting errors when I run a disk check. It was just one or two until this week, when hundreds of files and other artifacts were lost. I have no idea which ones, but Windows dutifully counted them out. Recently Windows also started flaking out on me: for example, Event Viewer won’t load, although I could read the event files from another program. After running SFC (System File Checker), I confirmed that some system files are corrupted, including some used by Event Viewer. Apparently, when the drive wrote data, it didn’t verify the write, fail it and use another sector as I had believed (maybe only enterprise-class drives do that). It simply wrote the data and carried on, just like a consumer-class magnetic drive would.

Fortunately, I bought my wife a shiny new 120GB SSD for Christmas, and it had sat on a table for over three months, waiting for me to prioritize it. Last night, it became mine. I created an image backup of the failing 60GB boot disk using Acronis True Image Home 2012, because it had been recommended to me a few times.
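The behavior I had expected (write, read back, and move the data to a spare sector on a mismatch) can be sketched as a toy model. This is purely illustrative: `ToyFlash` and its methods are hypothetical names, and real SSD controllers rely on error-correcting codes and firmware-specific bad-block management that this ignores. Calling `write(..., verify=False)` mimics what my drive appeared to do.

```python
from __future__ import annotations


class ToyFlash:
    """Hypothetical toy model of a flash device with a pool of spare sectors."""

    def __init__(self, sectors: int, spares: int, bad: set[int]):
        self.cells: dict[int, bytes] = {}  # physical sector -> stored bytes
        self.remap: dict[int, int] = {}    # logical sector -> spare physical sector
        # Spare sectors are numbered right after the normal user area.
        self.spares = list(range(sectors, sectors + spares))
        self.bad = set(bad)                # physical sectors that silently corrupt data

    def _physical(self, logical: int) -> int:
        return self.remap.get(logical, logical)

    def _program(self, physical: int, data: bytes) -> None:
        # A bad sector "accepts" the write but stores inverted bits.
        if physical in self.bad:
            data = bytes(b ^ 0xFF for b in data)
        self.cells[physical] = data

    def write(self, logical: int, data: bytes, verify: bool = True) -> None:
        physical = self._physical(logical)
        self._program(physical, data)
        if not verify:
            return  # no read-back: corruption goes unnoticed until the next read
        # Read back; on a mismatch, move the sector to a spare and try again.
        while self.cells[physical] != data:
            if not self.spares:
                raise OSError(f"no spare sectors left for logical sector {logical}")
            physical = self.spares.pop(0)
            self.remap[logical] = physical
            self._program(physical, data)

    def read(self, logical: int) -> bytes:
        return self.cells[self._physical(logical)]
```

With `verify=False`, a write to a bad sector "succeeds" and the damage only shows up later, which matches the symptoms below; with verification, the same write quietly lands on a spare sector until the spares run out.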
I’m impressed: Acronis finished the backup in about 15 minutes, limited mostly by the external drive’s USB 2.0 interface, which was maxed out. It even compressed the image by 50%. To see how fast it CAN get, I copied the file to my super-fast RAID drive, made a boot CD, installed the new SSD (an OCZ Agility3 120GB, bought on NewEgg for about $110) and booted directly into Acronis. It let me choose the backup file from either the external or an internal drive, so I chose internal for speed. Formatting, validation and restore completed in 8 minutes, sustaining writes at about 200MB/sec, which barely challenged the new drive. A later file copy test showed the same speed, and the new drive was only about 10% ‘busy’ where the old one would have been at 100%. So I’m pretty happy with both the new drive and the backup tool.
So, how did I know the disk was failing, and what does it look like when a 2010-era SSD fails?
Symptom 1: Changes don’t seem to save. Programs (including parts of Windows) crash randomly and report that a file is corrupt. If you see that, scan your disk, but be aware: if you tell it to fix errors, it will amputate to save the patient. Windows 7 recovers files, but I have no idea where my recovered files went, or which ones were lost. If you tell it to scan for bad sectors, watch the screen constantly: “33 bad clusters found/marked” flashed by and the machine rebooted. I can’t find any report, possibly because the event log files are corrupted. As soon as you see bad clusters, run, don’t walk, and get a replacement disk. Every disk write operation could be corrupting files. Windows 7 is much more resilient than XP, but nothing’s immune.
Symptom 2: If you run SFC /VERIFYONLY or SFC /SCANNOW at an admin command prompt, it may tell you that system files are corrupted. This seems odd, because system files don’t change and SSDs are supposed to fail only on writes. But remember updates? Windows is constantly updating OS files, recompiling the .NET Framework and updating Windows Search indexes, and on an SSD those writes go to new sectors, some of which may be bad. So in theory I could install the old drive as read-only and it wouldn’t develop more bad sectors. Anyone want to place bets?
I’ve encountered spinning drive failures before, and in almost all cases it was a controller or armature failure. For the low, low price of $3,000 I could get the data restored by a rescue service, but I prefer prevention, thanks. I’ve done file backups for years, but if I ever lost my boot disk, I’d lose hundreds of hours reinstalling apps. I’ll be keeping OS image backups from now on, and I think Acronis is doing a good job. I’ll have to test the incremental backup, but seriously, a sector-by-sector image in under an hour, without a reboot? That’s worth money. I have no vested interest in Acronis, so if you know of a similar product that does well, please mention it in the comments.
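Acronis’s image format is proprietary, so what follows is only a toy sketch of the general idea behind a compressed, block-by-block image: read the source in fixed-size blocks, compress them as one stream, and keep a checksum of the raw data so the image can be validated later. It assumes Python’s standard `zlib` and `hashlib` modules and treats the source as a regular file (reading a raw disk device needs administrator rights and an OS-specific path). The names `image_device`, `verify_image` and `BLOCK_SIZE` are hypothetical.

```python
import hashlib
import zlib
from pathlib import Path

BLOCK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary choice for this sketch


def image_device(source: Path, image: Path, block_size: int = BLOCK_SIZE) -> str:
    """Copy `source` block by block into a zlib-compressed `image` file.

    Returns the SHA-256 hex digest of the uncompressed data so the image
    can be validated later.
    """
    digest = hashlib.sha256()
    compressor = zlib.compressobj(level=6)
    with open(source, "rb") as src, open(image, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:  # reached the end of the source
                break
            digest.update(block)  # checksum the raw data, not the compressed bytes
            dst.write(compressor.compress(block))
        dst.write(compressor.flush())  # emit whatever the compressor still buffers
    return digest.hexdigest()


def verify_image(image: Path, expected_digest: str, block_size: int = BLOCK_SIZE) -> bool:
    """Stream-decompress `image` and compare its SHA-256 with `expected_digest`."""
    digest = hashlib.sha256()
    decompressor = zlib.decompressobj()
    with open(image, "rb") as src:
        while True:
            chunk = src.read(block_size)
            if not chunk:
                break
            digest.update(decompressor.decompress(chunk))
    digest.update(decompressor.flush())
    return digest.hexdigest() == expected_digest
```

Storing the returned digest next to the image makes a validation pass cheap: decompress, hash, compare. Mostly empty or repetitive disk space is what lets a compressor shrink an image substantially.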
So, would I rely on an SSD for my important stuff? Yes, with backups, because everyone should have backups anyway. I think most SSD media failures would show warning signs early, so a regular full CHKDSK scan is a good idea for home users. The speed boost is so incredible compared to a spinning disk that the ‘new’ technology (really, it’s not that new) is worth the unknowns. I absolutely would use enterprise-class SSDs in a production environment, subject to cost/benefit analysis, as long as they were set up as RAID10 or RAID5 (as appropriate) with a few spare drives on hand (which is what enterprise IT should do anyway, right?). And your backup tool should be easy enough to use that you can do a restore test on a whim or on a schedule. That goes for any type of backup.
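One simple way to check the result of a file-level restore test is to hash every file in the original and in the restored copy and compare the two lists. This is a minimal sketch using only Python’s standard library; `file_digest`, `manifest` and `compare_trees` are hypothetical helper names, not part of Acronis or any other backup product, and the comparison ignores metadata such as timestamps and permissions.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large files don't fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest(root: Path) -> dict[str, str]:
    """Map each file's path relative to `root` to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in root.rglob("*")
        if p.is_file()
    }


def compare_trees(original: Path, restored: Path) -> dict[str, list[str]]:
    """List files missing from, extra in, or different in the restored copy."""
    before, after = manifest(original), manifest(restored)
    return {
        "missing": sorted(before.keys() - after.keys()),
        "extra": sorted(after.keys() - before.keys()),
        "changed": sorted(k for k in before.keys() & after.keys() if before[k] != after[k]),
    }
```

Three empty lists mean the restored files match the originals byte for byte; anything under "changed" or "missing" is exactly the kind of silent damage a failing disk leaves behind.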
SQL Saturday 132 – Pensacola, FL
Have you ever wanted free training on using, maintaining or developing for Microsoft SQL Server, along with the opportunity to network with your peers and other SQL experts? The Professional Association for SQL Server (PASS) created SQL Saturday for exactly that reason:
PASS SQLSaturday’s are free 1-day training events for SQL Server professionals that focus on local speakers, providing a variety of high-quality technical sessions, and making it all happen through the efforts of volunteers.
-SQL Saturday home page
SQL Saturdays are amazing regional events – hundreds of DBAs and developers shuffling between sessions, trying to decide between two great presentations during the same hour, hyped up on coffee and chattering about the session they just left. This year’s event in Pensacola is the biggest for me because I’m actually organizing it… with a lot of help. I never imagined the scale of logistics for securing venue, speakers, sponsors, attendees, coffee, breakfast, coffee, lunch, more coffee, parties, etc. I’m fortunate to have the support of prior experts along with a national team from PASS to make this event better than ever.
If you’re planning to be in the neighborhood on June 9, register at the link below and come see what “community” means in the SQL Server world. We also have an amazing pre-con planned for Friday, June 8: two speakers will be presenting “Demystifying Database Administration Best Practices”, a full 8-hour day of training from two Microsoft Certified Masters, for just $99. Yes, I’m serious. That’s like getting the Hope Diamond in a box of Cracker Jacks.
If you’d like to join us, register for SQL Saturday 132 in Pensacola, FL today.
First Post!
Happy birthday to SQLJuJu.com!
Once I get layout and appearance decided, I will blog about my adventures in the DBA world of SQL Server and related tools.
This is actually a project I’d started after PASS Summit 2011, but external influences and a million excuses kept pushing it to the back burner. There are always excuses for not blogging:
- “I don’t have anything to contribute.”
- “I don’t have time.”
- “Nobody’s going to read it anyway.”
In reality, those are only true because of my own actions. It’s time for that to change. Stay tuned.