- A study by the Gartner Group concluded that 70% of all companies experiencing a serious data loss will be out of business within a few months as a direct consequence of the data disaster.
- According to the National Archives and Records Administration, 93% of companies that lost their data for 10 days or more filed for bankruptcy within 1 year.
- According to the same study 76% of businesses in America do not have an effective database backup system.
I have to admit that I have never confirmed these statements with the primary sources, but I have read them often enough in reputable publications that I feel safe in repeating them.
These often-quoted statistics have been brought home to me recently by an unusually high number of calls from people who have lost their data. As a result of these calls I set aside the article I had been writing for this month’s issue and decided instead to describe the elements of a simple but effective backup plan. It is a plan that you can implement today, using only the tools that you have already installed with your sql server.
If you already have a plan as you should, this is a good opportunity to confirm that it will do what it is supposed to do if a data disaster occurs. Many data losses occur in companies that thought they had a good plan. You must routinely test your recovery capabilities.
The important components of a good backup plan are the decisions you make, not the tools you use. In fact I am going to talk only about concepts and decisions. The SQL Server Maintenance Plan wizard is simple enough that you should be able to use it to implement the plan without much trouble, even if you are not a dba.
Start with a question
Creating the proper disaster recovery plan begins by answering a critical business question. “How much data can your business afford to lose?” This question should be answered by upper management, not the person who sets up the plan. Many data disasters result from delegating business-critical decisions to technical staff.
Zero risk of data loss is not possible. We can all imagine a series of events that would overcome the best of disaster recovery plans. However, the probability of those events occurring is small and the cost of preparing for them is large. The plan we are discussing is low cost but protects you against all but the most unlikely disaster scenarios.
All the data disasters I have seen could have been prevented if a simple plan like this had been implemented working properly.
An overview for the non-technical or semi-technical reader
In case you are not technical (I hope a lot of CEO’s are reading this) here is a brief overview and definition of terms that will help you understand what follows.
A full backup is a copy of your data taken at a moment in time. With it, your data can be restored to the time when the backup file was created but no further.
Log backups capture all the changes that have been made to the data since the last full backup. In conjunction with the last full backup, they can be used to recover your data to any point in time up to the moment failure occurred.
Since full backups contain all of your data, they take a comparatively long time to run, they put a significant load on the server and the backup file can be large. For these reasons it is often not practical to run full backups frequently enough for them to be your only data recovery option.
Since log backups only contain recent changes to the database they execute quickly and the files are a tiny fraction of the size of the full backup. We can safely run log backups as often as necessary.
What I am proposing is a disaster recovery plan that will recover data from a full backup and then bring it up to the current moment by replaying all changes from the subsequent log backup files into the database.
Lastly, don’t confuse the terms ‘log’, ‘log file’ and ‘log backup’. The log or log file is an active part of the database where records of data changes are kept. A log backup is a file containing a record of data changes extracted from the log itself.
Coming up with a Plan
There are a number of ways to design a good backup plan, but exploring all the variations is beyond the scope of this article. Below, we suggest a workable solution that is a compromise between cost, complexity and vulnerability to data loss. It may not fit all your needs but many of our clients have found it offers an acceptable basic level of security.
What to back up
System databases control the operation of the SQL Server and should have a regular weekly backup. The master database should also be backed up each time a new database is added or an existing database is removed from the server.
Log backups are not possible on the master or tempdb system databases and usually not necessary on the other system databases.
Your business databases usually need more frequent full backups in order to minimize recovery time in case of a data failure. The schedule can vary based on business need but the most common schedule we find in enterprise environments includes a daily full backup taken at a time of little database activity and transaction log backups taken at appropriate intervals during the business day. The frequency of log backups is discussed below.
Log Backup Schedule
The frequency of log backups is your answer to our original question, “How much data can you afford to lose?” The interval between transaction log backups determines the maximum amount of data that you could lose. That is why the frequency of log backups is a business decision, not a technical decision. In our experience, viable intervals between log backups range from 15 minutes to 4 hours depending on many factors specific to the business.
Here are the trade-offs. Frequent transaction log backups mean less vulnerability to loss at the cost of running the backup job more frequently and having more files to restore in case of disaster.
Less frequent backups increase your vulnerability but occur less often and create fewer files to restore. The total size of all log backup files is about the same in either case. The performance impact of frequent log backups is minimal.
About Recovery Modes
To make transaction log backups, the database must be running in FULL or BULK-LOGGED recovery mode which cause the log to retain transaction records until they are removed by the log backup. BULK-LOGGED recovery mode has some additional complexities we won’t deal with in a simple plan so we assume your business databases are running in FULL recovery mode.
Putting a database in either of these modes without scheduling regular log backups will cause the log file to grow continually from the accumulation of transaction records. This can cause a database to stop if the log disk runs out of space. Do not put the database in one of these recovery modes without also scheduling transaction log backups.
Vulnerability to Data Loss
In the best case, using this plan will allow you to recover the database up to the point of failure with zero data loss. However recovery to the point of failure depends on the transaction log file itself still being accessible after a failure of the data file. If the log file is not accessible after a disaster, then recovery is limited to the time of the most recent transaction log backup.
To improve the chances of having a usable log file, keep your log file on a separate physical disk from the data file so the failure of a single disk won’t destroy both files. As an added benefit, separating the two files often results in a performance boost.
Don’t go to all this trouble then lose the backup files
Backup files should be stored on a different physical drive than the database data files. This protects against a single disk failure causing the loss of both the database and the database backups. This seems obvious but you might be surprised at how often I see data and backup files on the same disk.
Backup files must also be archived to media or to other reliable storage. Offsite storage of archive media is highly recommended for greater safety. On-line backup services are also worth considering.
The Recovery Process
Understanding the recovery process can help you make decisions about backup file storage and retention.
Recovering a database from this kind of backup plan involves restoring the most recent full database backup and ALL subsequent transaction log backups in chronological order. Normally, SQL Server will know the folder where you backed up the files and restore them in the proper order without you having to sequence them individually.
Log backup files are linked together like a chain. If a log backup file is missing from the chain, recovery stops at that point. Log backups subsequent to the missing file cannot be restored.
Normally during recovery you will want to restore all of the lost data. However, there is a scenario where you would not want to do that. Imagine that a large amount of important data was accidentally or maliciously deleted from your database at 10:00. You discover the problem at 11:00. A complete restore up to 11:00 would include the deletion of the data.
If you are restoring from transaction logs you have the option of restoring to a specific point in time. In this case you would probably want to restore the database up to 9:59, just before the deletion. this is an important benefit of using log backups.
If rapid recovery is an issue, we recommend keeping a copy of the most recent full backup and all subsequent transaction log backups on a local disk or array as well as archiving copies to one or more locations. It can take considerable time to retrieve files from an archive so having local copies will get your business up and running more quickly in scenarios where the local backup files remain available.
Don’t Drop the Ball
While many data disasters occur through lack of a plan, others occur by failing to monitor the execution of the plan. You must have a feedback loop of some sort that lets you know when a component of the plan fails. In sql server you can monitor the error log or set up alerts to be sent when a failure occurs.
It is also important to assign responsibility for backups to a specific person or group. When I am doing a post mortem on a data disaster I often find that a perfectly good backup plan had been failing unnoticed for months because of some simple issue like a password change. Consequently there was no current backup when the failure occurred. I very frequently hear “I thought he was taking care of it.” or “The person who set it up left the company. We don’t know anything about it.” Don’t let that happen in your organization.
Database Consistency Check
A consistency check should be scheduled no less frequently than weekly for each line-of-business database. Setting this up is simply a matter of setting up a schedule for it in the Maintenance Plan wizard. In addition, the DBA should examine the results of this process in the SQL Error log or schedule an alert to be sent if data corruption is encountered. Often data corruption can be easily repaired if detected early.
This step is very important because data corruption sometimes goes unnoticed until it gets to a critical point and the database fails. Sometimes the only option then is to restore from the “last known good” database backup. If corruption has been accumulating for a long time, the last known good backup could be days, weeks or months old.
Rapid Failover and High Availability
We have covered only the topic of data recoverability via backups in this short article. However, a long service outage can be nearly as devastating to a business as data loss. In the September 2009 issue we discussed several SQL Server technologies that provide assurance of high availability and rapid failover. You can read that article here.
High Availability and Redundancy Solutions
Remember however that the technologies dicussed in the article give added assurance of availability but they do not replace the need for a backup plan such as we have described.
Conclusion
Your data is your company’s most valuable asset. Keeping it safe is a business priority of the highest order. It is too easy to put off creating a bullet-proof plan because the need is never urgent until the database fails. And then it is too late.