Today is March 31, World Backup Day. http://www.worldbackupday.com/
This year, World Backup Day is also Easter. What do these days have in common? Resurrection. Think about it.
You’ve probably heard someone tell you to back up your home PC or your servers. If you’re like most people, you’ve ignored that advice – you’ll get to it one day. My intent with this post is to convince you to take action. I’ll write this for the home user, because honestly if you’re an IT person in charge of a company’s data, you’d better have this memorized already.
There are a few components absolutely essential to any backup plan. No, strike that – disaster recovery plan. Don’t just think of it as running backups. You are planning for disaster: imagine a fire or explosion at your house completely burns it down with no injuries. What do you do about your data? If it’s important, you need a plan:
- Decide what data you absolutely must have immediately. This data is going to your local backup first. Are you in the middle of doing your taxes, negotiating a contract, or working on someone else’s business needs? If your PC dies, will you know where your very busy family needs to be in the next few days?
- Decide how long you can be without the other data. Hours? Days? This will determine your remote strategy. You must have a remote backup of some kind.
- Decide how much data you can afford to lose. Can you lose the last few minutes, hours or days? This will determine your backup frequency and type.
In business we have a term, “point-in-time recovery”: in order to keep international systems running, we spend a LOT of money and time to make sure we don’t lose even minutes. You probably don’t need a million-dollar solution but you do need a plan.
- This is the toughest decision of all: what service or technology to use. For home users, you have hundreds of options, from the native OS tools to elaborate third-party solutions. There are several very good options, and some are better suited to your particular needs. To save time, I’ll recommend a short list:
- Acronis True Image Home (I use this for local full-image backups)
- CrashPlan (I use this one)
- Mozy (I have used this for years. I’m in the process of moving everything to CrashPlan)
- Dropbox (not 100% secure; Dropbox employees can read your data)
- USB drive – Copy only a few files to a USB drive (flash disk or solid state, not a spindle), in folders named with the current date, and – trust me on this – drive or mail it to another state. (Also not secure unless military-grade encrypted with a key not tied to your soon-to-be-dead PC.)
Options I do not recommend:
- Windows native backup tools. Sure, they work, but the amount of effort to get you working again is too great.
- Anything that requires you to remember to take action; an automated process is the best. You will forget or say, “it can wait.” Or you’ll need a version of a file two backups ago and you’ve overwritten it already. Or worse, you had a virus, and the backup is corrupted. Been there.
- Implement your solution. Each of the above has a dozen page manual on how to implement it. What’s second-most important is that you do. Because this part is rather involved, I’ll detail it below.
- Test your backups. This is the most important step of all. On a regular basis, pretend you’ve had a disaster: human error, fire, flood, zombie apocalypse, whatever. Restore a few files to be sure you can. This also will let you know if something – like a virus – is corrupting your backups.
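The restore test in the last step can be partly automated. The following is a minimal sketch, not a feature of any product listed above: it assumes you have restored a sample of files into a scratch folder and still have the originals to compare against. The function names `file_digest` and `verify_restore` are made up for this illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(original_dir: Path, restored_dir: Path) -> list[str]:
    """Compare every file under original_dir with its restored copy.

    Returns a list of problems; an empty list means every file matched.
    """
    problems = []
    for original in sorted(original_dir.rglob("*")):
        if not original.is_file():
            continue
        relative = original.relative_to(original_dir)
        restored = restored_dir / relative
        if not restored.is_file():
            problems.append(f"missing: {relative.as_posix()}")
        elif file_digest(original) != file_digest(restored):
            problems.append(f"differs: {relative.as_posix()}")
    return problems
```

A file you edited after the backup ran will also show up as "differs", so this check is most meaningful on files that have not changed since the backup was taken.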
Implementing your solution
As you choose the technical portions of your disaster recovery solution, you will need:
- Local backup – a hard disk with a few terabytes free space, or a few times larger than the data you want to back up. You’ll want room to keep multiple versions for a history.
Ideally, every day or week, take a full image of your primary hard disk through Acronis or something else that does such things. I use a monthly full image with daily updates. So, for a 200GB boot drive I’ll have a 200GB file and 29-30 files of a couple GB each.
This way, if my primary drive dies I can run to Best Buy, get a new disk, get back online in about an hour. I use my home PC for database consulting; I don’t have the luxury of being offline for a day. The average home user can wait longer, but if you’re calling your family geek to “come fix my computer,” trust me, they want that chore to end as efficiently as possible. Restoring mom’s PC is not play time, it’s more like rebuilding a porch.
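To size the local backup disk, the arithmetic behind the monthly-full-plus-daily-updates scheme can be written down. This is a rough sketch that assumes each daily update is 2 GB (the text only says "a couple GB"), ignores compression, and uses function names invented for the example.

```python
def monthly_backup_gb(full_gb, incremental_gb, incrementals_per_month):
    """Space used by one monthly cycle: one full image plus its daily updates."""
    return full_gb + incremental_gb * incrementals_per_month


def backup_disk_gb(full_gb, incremental_gb, incrementals_per_month, months_kept):
    """Space needed to keep several complete monthly cycles on the backup disk."""
    one_cycle = monthly_backup_gb(full_gb, incremental_gb, incrementals_per_month)
    return one_cycle * months_kept


if __name__ == "__main__":
    # 200 GB boot drive, 30 daily updates, each ASSUMED to be 2 GB.
    print(monthly_backup_gb(200, 2, 30))               # 200 + 2 * 30
    print(backup_disk_gb(200, 2, 30, months_kept=3))   # three cycles of history
```

Under those assumptions one cycle is 260 GB, and three months of history is 780 GB, which is why a backup disk a few times larger than the source data is a sensible starting point.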
- Remote backup – an online service or other method to get your data to another state, just in case yours is hit by apocalyptic weather. I’m serious, go out of state; major storms in Alabama can strafe several major cities, so it’s possible for a tornado to hit my house in Mobile and another one from the same storm wipe out a relative’s house in Birmingham.
I have the CrashPlan (“cloud based”) service read my most important data files and transmit them to their servers, using a long encryption key I chose. Nobody, not even the NSA, can read my tax returns. I can go to my work PC or a friend’s house and restore a few files over the web. If I want the whole thing immediately, I can pay $100 or so and get a USB drive overnighted to me. Acronis seems to now have a similar cloud service, but I’ve already paid for a year of CrashPlan so I have time to check that one out later.
How quickly do you need your data back? If you need to be back online really fast, you need to make more backups, which takes more space – usually daily full backups, hourly log/updates, to a local disk with a regular transfer offsite. The amount of time spent running the backup is usually inversely proportional to the amount of restore time. So the fast monthly full backup with an hourly change log will take a long time to restore today’s data.
- Retention policy
How much data do you want to keep, measured in number of changes? If you changed a document a month ago and now want to see what it looked like then, you must keep more than a month of history. Keep in mind, if you changed it ten times between two backups, you can only see the two snapshots taken at the exact time of the backups, not the ten changes. That loops back to the strategy above.
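The retention trade-off can be illustrated with a small sketch. It assumes snapshots are identified only by the time they were taken, and `prune` and `snapshot_as_of` are hypothetical helpers, not part of any backup tool.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def prune(snapshots, now, keep_days):
    """Keep only the snapshots taken within the last keep_days days."""
    cutoff = now - timedelta(days=keep_days)
    return [taken for taken in sorted(snapshots) if taken >= cutoff]


def snapshot_as_of(snapshots, wanted):
    """Return the newest snapshot taken at or before `wanted`, or None."""
    ordered = sorted(snapshots)
    position = bisect_right(ordered, wanted)
    return ordered[position - 1] if position else None


if __name__ == "__main__":
    now = datetime(2013, 3, 31)
    daily = [now - timedelta(days=age) for age in range(60)]
    kept = prune(daily, now, keep_days=30)
    # A version from 35 days ago is gone once retention is 30 days.
    print(snapshot_as_of(kept, now - timedelta(days=35)))
    # Any change made between two snapshots maps to the earlier snapshot.
    print(snapshot_as_of(kept, now - timedelta(days=10) + timedelta(hours=5)))
```

The second lookup shows the limit described above: whatever happened in the hours between two snapshots is invisible, and only the state captured at backup time can be restored.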
When do you want to run your backups? I leave my home PC online 24×7, and run the more intensive backups at night. What’s important here is that backups don’t disrupt your work.
- A healthy dose of paranoia
What happens when the restore fails? Do you want or need a second backup disk, a second service? What happens if someone breaks in to your house and steals your PC? Thieves will take your backup disk too – more reason for a remote backup.
Why is this so important? In my 25+ years as a developer and DBA, I have seen and experienced disaster. It is always painful to hear a company rep say, “It looks like our backups stopped working three years ago.” I’ve seen entire companies just shut down because of it. I’ve lost gigabytes of my own data because I just didn’t make a DR plan a priority. Lesson learned: Ask the what-if questions.
In my younger days, I ran the “computer department” for my father’s company – two PCs on a network, hosting an automated payroll system that I designed. I encrypted the database because I could, but I forgot one detail. I had backups from PC 1 going to PC 2, because we didn’t have the budget for a tape drive. When PC 1’s hard disk failed on a Wednesday, I went to PC 2 to restore the payroll database. Wouldn’t you know it, I forgot to back up the master encryption key. The backup was completely useless, and obviously untested. Payday was Friday, and in about 38 hours 200 hard-working native Guatemalan factory workers really wouldn’t care about encryption or whatever tech mumbo jumbo excuse I could throw their way, so the head accountant manually retyped an entire 13 days of data – over a 37 hour stretch, finishing at 4pm Friday. None of them knew how close we came to having a riot instead of payday. Lesson learned: Test your backups … by restoring them.
Also test the restore time; in a business you may not be able to handle a multiple-hour delay. If your backup strategy is a one-time full backup with four years of change log, it’s going to take days to restore recent data – the restore application has to go through every change to determine what’s the newest. In a 24×7 business with 99.999% uptime requirement, that’s apocalyptic. That’s why businesses that hold years of data like health or financial information spend six or seven figures on disaster recovery plans – in most cases, data can be restored before most customers even notice. And if you’re a regular home user, you’d rather be doing something else with your scarce time, wouldn’t you?
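Why a long change log slows a restore can be seen by counting the files the restore has to read. The sketch below assumes a simple chained scheme in which every incremental depends on the one before it; the `Backup` record and `restore_chain` function are invented for illustration and do not describe how any particular product stores its data.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Backup:
    taken_at: datetime
    kind: str  # "full" or "incremental"


def restore_chain(backups, target):
    """Return the backups to apply, oldest first, to restore the state as of target.

    The chain starts at the newest full backup taken at or before target and
    includes every incremental taken after it, up to target.
    """
    candidates = sorted(
        (backup for backup in backups if backup.taken_at <= target),
        key=lambda backup: backup.taken_at,
    )
    last_full = None
    for index, backup in enumerate(candidates):
        if backup.kind == "full":
            last_full = index
    if last_full is None:
        raise ValueError("no full backup exists at or before the target time")
    return candidates[last_full:]


if __name__ == "__main__":
    start = datetime(2013, 3, 1)
    backups = [Backup(start, "full")]
    backups += [Backup(start + timedelta(days=day), "incremental") for day in range(1, 30)]
    chain = restore_chain(backups, start + timedelta(days=29, hours=12))
    print(len(chain), "files must be read to restore the latest state")
```

Taking a fresh full backup more often shortens the chain, at the cost of more backup time and more disk space.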
Now, the majority of the above is worded for a typical person with one or two home PCs, but the concepts apply to business as well. If you’re in charge of databases or systems for a business, your entire stream of income depends on the ability to recover from disaster. Instead of worrying about how much a proper backup system will cost, ask yourself: What will it cost if we lose a week or a month of data? What if the CEO loses his address book? Or if someone loses the source code to this 10-year-old application that controls half of the income? What if we lose the entire billing history (including unpaid receivables) for our 20,000 customers? These are actual scenarios I’ve seen where people did not have backups. In case 1, a $3,000 emergency disk repair recovered 65% of the data, and some more from painstaking manual reconstruction. In case 2, it cost $200,000 in developer salaries to rewrite the application. In case 3, I never heard from the company again. They were completely dependent on the computing solution, so it had to be rough.
What is a backup disaster recovery plan worth to you now? Any more or less than before you read this post? Let me know.
I have to admit, I felt a bit like I did during my senior year of high school when I read the words, “You have been ACCEPTED to speak at SQL Saturday #150 Baton Rouge!” This is my first time submitting a session to a “real” event… previously I’ve spoken in front of coworkers at the office, and my own local Pensacola SQL Server Users Group. But a rare opportunity and gumption crossed streams, and my brain decided this was the right time to take the next bold step.
My session, Developer vs DBA: Can They Coexist? is scheduled just after lunch, and a fed audience is less likely to fall asleep or run out. I don’t expect a kumbaya-singing campfire, but if I do my job well, a few in the audience will leave a step closer to understanding why DBAs restrict developer activity in production, and why developers really do want to just get into production and fix stuff – immediately, and how with planning and communication the two sides can coexist without a policy of Mutually Assured Destruction. I will also discuss some big mistakes I’ve made, and pitfalls that I’ve run into, because the road from Wild West to Seamless Integration is never complete, and never, ever a straight line.
The full schedule is available to view here, and I recommend planning well ahead because there are ELEVEN TRACKS! I have enough trouble deciding what to order on a prix fixe menu. It’ll take me a week to plan out my route on August 4. I also really, really hope we’ll be using the Guidebook app and have live tweets of any schedule changes. I’ve been to conferences with 8-10 tracks… speakers cancel, schedule shifts happen, and the attendees only know how confusing or organized an event was, meaning the organizers have to work a lot harder to keep each change seamless. I may offer to volunteer in some capacity (e.g. room monitor/proctor) but I’ve recently volunteered at Birmingham and organized Pensacola, so I intend to be a little more selfish for this one.
If you have questions about my session, or recommended content, leave a comment here. I’m open to including your influence in my first presentation.
Thanks to William Assaf and the Baton Rouge SQL Server User Group for selecting me!
This past Saturday was the first time I’ve organized a SQL Saturday, and it spawned a number of other firsts. Each of them presented its own challenges and benefits, and while I was exhausted for days afterwards, I would not hesitate to repeat the experience.
Organizing the event involved coordinating venue, sponsors, food and drink supplies, volunteers, speakers, sessions, dinner parties, marketing and a plethora of other things. The weeks leading up to the event were stressful enough, but then came Game Day – where failure is not an option. This is where being a DBA comes in handy – a great DBA learns to be prepared for not just one, but multiple failures. And boy did we have a challenging day.
The pre-con went smoothly – so well that I actually got overconfident for Saturday. Karma had plans to humble me. Note that just a few days prior, the weather reports predicted a completely sunny day, and 20% chance of rain on Saturday. Friday night, halfway through our speaker dinner, rain and wind came out of nowhere and forced everyone inside. I’m told the food was great, though I was late due to setting up tables after the pre-con – just in case something happened on Saturday, I didn’t want the sponsors to be waiting on something as simple as tables.
Saturday morning, Dunkin Donuts didn’t even know about our coffee order. Or donuts. For 200 people. Have you ever seen 200 geeks without coffee… and donuts? That’s a red alert if I’ve ever heard of one. So after I made it clear on the phone that someone on their side dropped the ball, I drove over and they only had half the coffee we ordered… because they ran out of coffee totes. And it started raining – not so great for donuts. Victor Rojas, on the user group board and a great volunteer, helped me load up the truck, and we were soaked before even leaving. We brought back our partial order just 20 minutes late, and within minutes had Panera able to fill in the gaps just in time for the 10am official coffee break – we were about 10 minutes late, but I was told by an authoritative source that almost nobody noticed – and I was paid a huge compliment for handling it smoothly.
Around 11am we noticed that not only had the rain not let up, it was pouring off the roof like Niagara Falls… [ Video - Rain in the courtyard (video by Jeannie Holmes) ] and we had started to see leaks inside. The courtyard drain wasn’t doing much, and the parking lot had water up to the floorboards of some cars. We also heard that a manhole cover blew up at 9th and Airport, sending debris everywhere – and the intersection itself was 2-3 feet deep in water. The Cordova Mall parking lot had some cars submerged halfway to completely. Right around then we get a call that the auditorium is flooding – right where the speaker is standing near electrical equipment. So, we evac the auditorium and send everyone to another room. It ended up being a good thing that we reserved 8 rooms for 6 tracks, because the speakers needed one and we lost one due to flooding. With a new room came an equipment challenge – video cable is too short. Fortunately, Sven Aelterman had an extension he was kind enough to loan to the room for the day. Again, redundancy pays off.
But life would throw us a new challenge: it was now lunch time, and we’re surrounded by a 3 foot moat. Assuming Lenny’s Subs’ driver could actually drive to the school, could we get the food in? We didn’t have the option to fail. We called up every available volunteer, and the driver eventually found her way to a spot where we could load up. 200 boxes of subs came in on a human conveyor belt, about 30 minutes late, but we managed to feed everyone and have leftovers. That driver got a very good tip, and that Lenny’s Subs is officially on my High Trust List – the manager even called later to make sure everything was fine.
At this point, if I were asked to organize a SQL Saturday, my response would have been, “Hell No – not even if you paid me.” And then, just as I was about to Google plans for an ark, it mostly stopped raining. Now I *know* nature is screwing with me.
The rest of the day went swimmingly (pun intended) by comparison, except for one item. We called the after party venue to confirm, and they said while they were fine, they were an island in the middle of downtown Pensacola – roads around them were under several feet of water, and most of downtown was without power. I can’t in good conscience send 200 people into that, so we decided to cancel the after party. Attendees understood, but were understandably disappointed. Hopefully the swag and door prizes eased their pain a bit. There’s not much redundancy you can plan in a party for 100-200 people when the entire venue is unreachable.
After most everyone left, our volunteers worked hard to clean up and the only evidence of our presence was a large pile of trash bags. That’s important if we want to be invited back next year – Pensacola State College has been very generous to host our monthly meetings and SQL Saturdays, and we want that to continue.
I have seen “SQL Family” in action – organizers and volunteers worked together to make sure the attendees, sponsors and speakers had a good show despite the apocalypse outside. Somehow, it all worked – I received only compliments, no complaints – and that blew me away. Even better, I was told that the leftover lunches we donated to homeless at the Waterfront Mission were especially appreciated, because their building was under 5 feet of water – and Victor went out of his way to find their new location. Victor, you rock.
All said, if I were asked to host another SQL Saturday, my answer is “When and where?”
I don’t know how you consultants do it. I’ve traveled for a week at a time, twice in two months – and have one more trip planned. First to my company’s home office in Omaha at the end of March for Oracle OBIEE end user training. Second, to our sister company in Columbus Ohio for OBIEE Admin training. I’m writing this on the plane back from Columbus, at 10:40pm Central time – and I’ve been up and active since 6:00am. In mid June I go to Atlanta for a week of Oracle ODI training; I may just drive that since it’s closer. That’s a year’s worth of travel in just three months – nothing compared to those who travel to SQL Saturdays and other conferences every week, but way outside my comfort zone.
I once wondered if I was cut out for the consulting career path, and have been actively working to overcome every barrier and objection within me. I’m not accustomed to pulling five 14 hour days in a row – but that’s what I did this week, and the trainers did the same or more. I know, we’re all awake those hours anyway, but my mind isn’t accustomed to so much deep thought and shop talk – 8+ hours of training and then discussing architecture to, during and after dinner. I’m definitely not complaining – it was nice to discuss technical design and lessons learned with a brilliant team. I’d compare this to running a 5K after being out of practice for ten years. I need to grow the brain muscles again and get out of my comfort zone (aka daily grind). The past two trips have definitely done that.
I’ve been invited to be a part of a team and project that, if successful, can transform a multi-billion-dollar business. Wait, what’s with the “if?” This team has some of the best minds I’ve ever worked with, and we’re backed by world-class trainers who have trained teams on even larger companies. This project will succeed, we’ll all grow with it, and the company’s quest for world domination will accelerate.
Had I been asked to lead this, I would have chosen different tools, based on my own experience and prejudices. I forced myself to unassume that Oracle was too big and expensive, and I can see how it’s a better fit for our purposes. I forget where I read this, but I’ve tried to apply the idea that “the teacher must become the student.” I can lead in what I know, but we all have to keep learning in order to grow. I’m sure the trainers this week learned a thing or two as well.
Exit your comfort zone. Go learn something you’ve told yourself you’d hate, or that seems too snobby, but this time keep an open mind. Your new experience doesn’t have to be technical or a long trip – it can merely be trying a new food. Make sure you always challenge your own status quo, and you will grow without even realizing it.
Solid state drives can fail just like anything can. I’ve been told that they fail gracefully and I’ll have enough time to rescue my files. Is that true? Read on.
For a couple weeks my home PC’s SSD (a 60GB OCZ Agility 1 I bought two years ago on sale) has been reporting errors when I do a disk check – just one or two until this week, when hundreds of files and other artifacts have been lost – no idea what, but Windows dutifully counted them out. And recently Windows started showing signs of flaking out on me – for example, Event Viewer won’t load, but I could read the event files from another program. After running SFC (System File Checker), I confirmed that some system files are corrupted… including some used by Event Viewer. Apparently, when the drive wrote data, it didn’t verify, fail the write and use another sector like I had believed – maybe only enterprise class drives do that; this one wrote the data and continued on just like a consumer class magnetic drive would. Fortunately, I bought my wife a shiny new 120GB SSD for Christmas, and it sat on a table for over three months, waiting for me to prioritize it. Last night, it became mine. I created an image backup of my 60GB SSD that I bought a couple years back for a boot disk. I used Acronis True Image Home 2012, because it’s been recommended to me a few times.
I’m impressed – Acronis finished the backup in about 15 minutes, mostly due to the external drive’s USB 2.0 interface speed being maxed out. It even compressed by 50%. To see how fast it CAN get, I copied the file to my super-fast RAID drive, made a boot CD, installed the new SSD (an OCZ Agility3 120GB, bought on NewEgg for about $110) and booted directly into Acronis. It let me choose the backup file from the external drive or internal, so I chose internal for speed. Formatting, validation and restore completed in 8 minutes, sustaining writes at about 200MB/sec which barely challenged the new drive. A later file copy test showed the same speed and the drive was only about 10% ‘busy’ when the other one would be 100%. So I’m pretty happy with the new drive and the backup tool.
So, how did I know the disk was failing, and what does it look like when a 2010-era SSD fails?
Symptom 1: Changes don’t seem to save. Programs (including parts of Windows) crash randomly, and say that a file is corrupt. If you see that, scan your disk – but be aware, if you tell it to fix, it will amputate to save the patient. Windows 7 recovers files, but I have no idea where my recovered files went, or which ones were lost. If you tell it to scan for bad sectors, watch the screen constantly - “33 bad clusters found/marked” flashed by and the machine rebooted. I can’t see any report, possibly because the event log files are corrupted. As soon as you see bad clusters, run, don’t walk, and get a replacement disk. Every disk write operation could be corrupting files. Windows 7 is much more resilient than XP, but nothing’s immune.
Symptom 2: If you run SFC /VERIFYONLY or SFC /SCANNOW at an admin command prompt, it may tell you system files are corrupted. This seems odd because system files don’t change, and SSDs are supposed to fail only on writes. But remember the updates? Yes, Windows is constantly updating OS files, recompiling the .Net framework and updating Windows Search indexes – which on an SSD uses new sectors, some of which may be bad. So in theory I could install the old drive as read-only and it won’t give more bad sectors. Anyone want to place bets?
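One way to find out which of your own files were damaged, which the disk check does not clearly report, is to keep a manifest of checksums and compare it later. This is only a sketch under stated assumptions: it treats a file whose content changed while its modification time stayed the same as suspect, it reads each file fully into memory, and `build_manifest` and `suspect_files` are names invented for the example.

```python
import hashlib
from pathlib import Path


def build_manifest(root: Path) -> dict:
    """Map each file's relative path to (SHA-256 digest, modification time in ns)."""
    manifest = {}
    for path in root.rglob("*"):
        if path.is_file():
            # read_bytes loads the whole file; fine for documents, slow for huge files
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            name = path.relative_to(root).as_posix()
            manifest[name] = (digest, path.stat().st_mtime_ns)
    return manifest


def suspect_files(earlier: dict, later: dict) -> list:
    """Return files whose content changed although their modification time did not."""
    suspects = []
    for name, (old_digest, old_mtime) in earlier.items():
        if name not in later:
            continue  # deleted or renamed; this check does not cover that case
        new_digest, new_mtime = later[name]
        if new_digest != old_digest and new_mtime == old_mtime:
            suspects.append(name)
    return sorted(suspects)
```

For this to help, the earlier manifest would need to be saved somewhere other than the suspect drive, for example alongside the backups.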
I’ve encountered spinning drive failures before, and in almost all cases it was a controller or armature type failure. For the low, low price of $3,000 I could get the data restored by a rescue service – I prefer prevention, thanks. I’ve done file backups for years, but if I ever lost my boot disk, I’d lose hundreds of hours to reinstalling apps. I think I’ll be keeping OS image backups from now on, and I think Acronis is doing a good job – I’ll have to test the incremental backup, but seriously, a sector-by-sector image in under an hour, without a reboot? That’s worth money. I have no vested interest in Acronis, so if you know of a similar product that does well, please mention it in the comments.
So, would I rely on an SSD for my important stuff? Yes, with backups. Because, everyone should have backups anyway. I think most SSD media failures would show warning signs early, so a regular full ‘checkdisk’ scan is a good idea for home users. The speed boost is so incredible compared to a spinning disk, that the ‘new’ technology (really, it’s not that new) is worth the unknowns. I absolutely would use enterprise-class SSDs in a production environment, subject to cost/benefit analysis, as long as I had them set up as RAID10 or RAID5 (as appropriate) and a few spare drives on hand (which is what enterprise IT should do anyway, right?). And, your backup tool should be easy enough to use that you can do a restore test on a whim or on a schedule. That goes for any type of backup.
Have you ever wanted free training on using, maintaining or developing for Microsoft SQL Server, along with the opportunity to network with your peers and other SQL experts? The Professional Association for SQL Server (PASS) created SQL Saturday for exactly that reason:
PASS SQLSaturday’s are free 1-day training events for SQL Server professionals that focus on local speakers, providing a variety of high-quality technical sessions, and making it all happen through the efforts of volunteers.
-SQL Saturday home page
SQL Saturdays are amazing regional events – hundreds of DBAs and developers shuffling between sessions, trying to decide between two great presentations during the same hour, hyped up on coffee and chattering about the session they just left. This year’s event in Pensacola is the biggest for me because I’m actually organizing it… with a lot of help. I never imagined the scale of logistics for securing venue, speakers, sponsors, attendees, coffee, breakfast, coffee, lunch, more coffee, parties, etc. I’m fortunate to have the support of prior experts along with a national team from PASS to make this event better than ever.
If you’re planning to be in the neighborhood on June 9, register at the link below and come see what “community” means in the SQL Server world. We also have an amazing pre-con planned for Friday, June 8 – two speakers will be presenting “Demystifying Database Administration Best Practices” – a full 8 hour day of training from two Microsoft Certified Masters, for just $99. Yes, I’m serious. That’s like getting the Hope Diamond in a box of Cracker Jacks.
If you’d like to join us, register for SQL Saturday 132 in Pensacola, FL today.
Happy birthday to SQLJuJu.com!
Once I get layout and appearance decided, I will blog about my adventures in the DBA world of SQL Server and related tools.
This is actually a project I’d started after PASS Summit 2011, but external influences and a million excuses kept pushing it to the back burner. There are always excuses for not blogging -
- “I don’t have anything to contribute.”
- “I don’t have time.”
- “Nobody’s going to read it anyway.”
In reality, those are only true because of my own actions. It’s time for that to change. Stay tuned.