The number one responsibility that a DBA has is to keep the data safe. Many things can harm your data but the one we are going to talk about now is data corruption.
Since this article is written for beginner’s and intermediate level DBA’s it will not be a deep dive. It will cover the most common mistakes that are made when a database must be recovered from corruption.
Almost every DBA will eventually have to recover a corrupt database. It can be a harrowing experience if it is your first time. The pressure is on and everyone is looking over your shoulder waiting for you to save the day.
Experienced DBA’s put in place SQL backup maintenance jobs as well as a data consistency check job. Some believe these jobs will keep them safe from failure. They are certainly necessary but when the problem is caused by a hardware failure resulting in data corruption these jobs sometimes cannot solve the problem alone.
Check Your Setting
Your database page_ verify option should be set to CHECKSUM. There is no doubt this is the best setting for minimizing severe page corruption.
We sometimes see Page Verify Option set to the archaic TORN_PAGE_DETECTION setting when database instances have been upgraded from earlier SQL versions.
Upgrading an instance does not automatically change the Page Verify Option, you have to do it yourself. here is the query to see the current state. I
select name, page_verify_option, page_verify_option_desc from sys.databases
Recovery in Simple Mode
Your first option is to restore from a recent backup. If your DB is in SIMPLE recovery mode you have to restore from the last-known-good backup. That could cause an unknown amount of data loss but is sometimes your only option.
Recovery in Full Recovery Mode
Full recovery mode gives you a method to restore without data loss as long as you are running transaction log backups, and the back up files are not not damaged by the hardware failure.
If you are in FULL recovery mode you have to run periodic tran log backups. If not, you will be faced with severe transaction log growth. I find that many inexperienced DBA’s do not know this and put the database in SIMPLE mode just to avoid transaction log growth.
They don’t realize that they are limiting their options to recover the database with little or no data loss.
There are better options to control log file growth.
Restoring from a recent backup is much easier if you have practiced recovery before hand. You need to practice recovery until you are comfortable with doing it and you have found the inevitable glitches that appear in the process and fixed them.
DBCC CheckDB
You should be running this built in maintenance job that finds corruption. Many of my clients run it weekly but if you have a maintenance window large enough you can run it more frequently. The sooner you detect corruption the better your chances are to fix it.
If you run CHECKDB and it finds corruption you may receive a message about running CHECKDB with the Allow_Data_Loss option. If SO, STOP RIGHT HERE. Here is a page that you must read before proceeding with the Allow Data_Loss option.
The page is from Paul Randal, former member of the SQL Server team at Microsoft. He wrote the code for the DBCC commands. He likens the Allow Data_Loss option to “taking a chainsaw to your database“. It could could fix things or it could destroy the database. You never know. In any case it is the very last option to try.
I hope that this newsletter will remind you to think about recovery from corruption and how you can prepare before it happens. If you have any questions, contact me.
This being an intermediate level subject I don’t mention 3rd party database backup solutions but they are out there if you have the budget.