Skill 2.3 – Manage database integrity

Databases can get corrupted by hardware problems or virtualization issues
Microsoft recommends restoring instead of repairing.

Implement database consistency checks

  • DBCC CHECKALLOC – checks disk space allocation
  • DBCC CHECKCATALOG – check consistency of system tables in a database
  • DBCC CHECKDB – checks logical and physical integrity of all objects in the DB
  • DBCC CHECKFILEGROUP – checks the allocation and physical integrity of all tables and indexed views in a filegroup
  • DBCC CHECKTABLE – checks allocation and structural integrity of a table
    • Does not check in-memory tables
  • DBCC TABLOCK – will lock tables instead of creating a snapshot
    • also happens if disk is formatted with FAT or ReFS

Identify database corruption

The database engine logs corruption in  few places.  Check them regularly

  • ERRORLOG captures any DBCC CHEC
  • Windows event log (application) captures DBCC errors
  • [msdb] system database had a [suspect_tables] table
  • dump file in the LOG directory SQLDUMPnnnn.txt
  • Error 823 – Cyclic redundancy check – mean disk troubles.  Set the SQL agent to alert you on these
  • Error 824 – Logical consistency check – storage trouble of some sort
  • Error 825 -Read Retry error – means the DB had to re-try a read or write.  Fails after 4 tries
  • Error 832 – corrupt memory page – might mean bad memory

Recover from database corruption

The DBCC will suggest a course of action.  If the corruption is in a nonclustered index, just drop and rebuild the index.  Otherwise just restore the DB

DBCC Error States:

  • 0 – Error 8930 was raised.  corrup metadata
  • 1 – error 8967 was raised. internal DBCC error
  • 2 – failure during emergency mode database repair
  • 3 – Corrupt metadata
  • 4 – assert or acess violation
  • 5 – unknown error

REPAIR_ALLOW_DATA_LOSS – absolute last resort.  Have resume up to date.

 

 

Skill 2.2 – Restore Databases

Design a Restore Strategy

In case of disaster:

  1. Do not panic
  2. Think about people, processes, and recovery assets
  3. Stop access to the DB
  4. See if there is a tail-log
  5. backup the tail-log (if you can)
  6. access damage / corruption
  7. Assess whether a restore strategy is required
  8. Assess whether point-in-time recovery is required
  9. Assess what backups exist
  10. Verify backups are current and valid
  11. Plan restore strategy
  12. Initiate restore process

RESTORE command takes some arguments:

  1. FILELISTONLY – list of database and log files contained in the backup
  2. HEADERONLY – returns backup header info for all backup sets
  3. LABELONLY – lists media information
  4. VERIFYONLY – verifies the backup, but doesn’t restore it
  5. MOVE – relocates the database’s files
  6. NORECOVERY -will not roll back uncommitted transactions
  7. RECOVERY – rolls back uncommitted transactions
  8. REPLACE – bypass internal safety check. Overwrites without mercy
  9. STANDBY – creates a second transaction log.  allows database to be read-only so you can inspect between log restores

 

Restore a Database

Restore your fulls and differentials with NORECOVERY, then restore the last part with RECOVERY to bring it up to the last transaction.

Revert a database snapshot

Reverting a snapshot overwrites the files, the logs, everything. Do a log backup first.

USE master;
GO
RESTORE DATABASE [WHATEVERDATABASE]
FROM DATABASE_SNAPSHOT = 'WHATEVERDATABASE_2018_12_05'

Perform Piecemeal restores

  • Lets you restore the database in stages.
  • Only works with multiple filegroups
  • You can gets parts of the DB online while restoring less important parts

Perform Page Recovery

The database engine keeps track of corrupt pages in the msdb suspect_pages table.  Assuming the DB page verification is set to CHECKSUM or TORN_PAGE_VERIFICATION.

If you only have a few corrupt pages, you can just restore those instead of the whole database.  If the torn pages are in an index, just rebuild the index.

USE [msdb];
GO
-- find corrupted pages
SELECT database_id' file_id' page_id' event_type' error_count' last_update_date
FROM dbo.suspect_pages
WHERE database_id = DB_ID('WorldWideImporters');
GO
-- restore 4 corrupt pages
USE [master];
GO
RESTORE DATABASE [WorldWideImporters] PAGE='1:300984' 1:300811' 1:280113' 1:170916'
     FROM WideWorldImporters_Backup_Device
     WITH NORECOVERY;
GO

Perform point-in-time recovery

Any transactions after the point in time will be lost.  This is handy in case a user accidentally deleted a table or something.  Not commonly used in real life.  Do a point in time restore to a non-prod database

Three ways to do a PiT restore:

  1. Date/Time – a specific date and time
  2. Log Sequence Number (LSN) – an internal numbering sequence for every operation in the transaction log
  3. Marked Transaction – an explicitly named transaction
  • STOPAT –
  • STOPATMARK- rolls the log forward to the mark, and includes the marked transaction
  • STOPBEFOREMARK – rolls the log forward to the mark, excludes the marked transaction
RESTORE LOG WorldWideImporters
     FROM WorkdWideImporters_Log_Backup_Device
     WITH FILE=6' RECOVERY'
     STOPAT = 'Sep 17' 2016 2:30 AM';

Restore a filegroup

  1. Backup the tail-log
  2. Restore the file group
  3. Restore and different backups
  4. restore the log-chain to bring the filegroup in sync with the rest of the files in the DB
  5. Stop at a point in time if necessary
  6. Recover the filegroup and bring it online

Can be done online in Enterprise Edition, must be done offline in others.

Develop a plan to automate and test restores

You need to know how long your restores take, so you know if you can meet your SLAs

 

Skill 2.1 – Develop a Backup Strategy

Design a backup strategy

Gather requirements. What level of loss is acceptable? How quickly do we need to recover? Which databases are more important than others?

A backup strategy consists of a set of backup operations and a schedule or frequency of when the backup operations will be performed.

Stakeholders need to define the three main requirements:

  • Recovery Time Objective (RTO) – maximum downtime after an incident
  • Recovery Point Objective (RPO) – maximum acceptable data loss
  • Recovery Level Objective (RLO) – the granularity of the data that needs to be restored. (Server / Database / tables, etc.)

These numbers will make up part of your SLA. You need to test restores to know how long they take and if you can really meet your RTO. You may need to use different backup types to get the restores you need.

Things to consider

  • Size of the database – bigger takes longer
  • DB file structure – multiple files gives flexibility
  • Speed & Throughput of network / storage /processor subsystems
  • Volume and size of data modifications in the database – need to keeps tranaction log growth under control
  • Type of data modifications – Updates v Inserts – Updates don’t grow the logs the same as inserts?
  • Data Compressibility – compression uses CPU. If data is not compressible, why waste CPU on compression
  • Is point-in-time recovery required? If so, you have to do log backups.
  • Recovery Model – simple recovery model doesn’t allow for log backups
  • Is the data important? – Maybe don’t backup test data?

Back up databases

Backup Types

  • Full – The entire database and any changes made during the backup
  • Differential – Everything between the last full and when the differential was started.
  • Log – Everything since the lat log backup
  • File – a file or file group that maes up the database
  • Partial – Like a Full, except it excludes read-only filegroups. It always includes the primary filegroup
  • Tail-Log – all transaction log records not backed up since the last log backup. This is usually the first step in a DR process

Backup Destinations

  • Backup device – disk or tape
  • Media Family – single non-mirrored device, or a set of mirrored devices in a media set
  • Backup set – a successful backup’s content
  • Media Set – a set of backup media that contains one or more backup sets
  • Log sequence number (LSN) – numbering sequence inside the transaction log
  • Sequence Number – order of the pyysical media in a media family. Media families are also numbered within a media set

There is a header on each backup device listing media name, number of media families, sequence numbers, etc.

Full backups are important. If your full gets corrupted, all the differentials in the world won’t save you. A full backup consists of:

  • Checkpoint – flushes buffers
  • Record backup start LSN – records the LSN where the backup started
  • Backup Data – backs up all the extents (8 8k pages) from all the data files in the db
  • Record backup end LSN – records the LSN at the end of the backup
  • Calculate minimum LSN – determines the minimum LSN required for the log backup
  • Backup log –

Differential backups – get bigger the longer you go without a full backup. Restore times increase as well. Does not support point-in-time recovery.

Log Backups (incremental) – frees up space in the transaction log and support point-in-time recovery

File Backups – used when a multi-file db is really large and you want to backup different files at different times.

Partial Backups – by default only backs up read-write file groups and the default file group

Backup Options

  • CHECKSUM | NO_CHECKSUM – makes sure each page does not have a checksum or torn page error. also generates a checksum for the entire backup
  • CONTINUE_AFTER_ERROR | STOP_ON_ERROR – tells the DB engine what to do if there is a checksmum error
  • COPY_ONLY – does not change the backup sequence number
  • COMPRESSION | NO_COMPRESSION – compress the backup
  • DESCRIPTION – describe the backup
  • ENCRYPTION | NO_ENCRYPTION – encrypt the backup? Default is no encryption
  • EXPIREDATE | RETAIN_DAYS – when can this backup be overwritten. RETAINDAYS takes priority
  • FORMAT | NO FORMAT – creates a new media set. destructive
  • INIT | NOINIT – INIT means all backup sets should be overwritten
  • SKIP | NOSKIP – ignore the safety check that checks the backup set’s expiration before overwriting it
  • STATS – controls the percentage intervals at which you get feedback on the progress of the backup. Default is 10
  • MIRROR TO – mirror the backup to another device

Perform database snapshots

Snapshots are read-only point in time files separate from the database file. Work at the page level. Whenever a page is modified, the non modified version is written to the snapshot. They depend on the underlying database. If you loose the parent DB, they are worthless.

Backup databases to Azure

SQL Server Managed Backup to Microsoft Azure – Uses Azure storage blobs. Automatically backs things up through stored procedures. Does fulls every week or every 1GB of log growth. does log backups every tweo hours or every 5MB of log growth.

Backup a VLDB (Very Large Database)

A VLDB is so big that you can’t meet SLA’s even if you throw hardware at it.

Tricks to shrink backup times on non-VLDBs

  • Compress tables to make them smaller
  • columnstore indexes can decrease the size of tables
  • move old data out to an archive database
  • use backup compression to reduce the size of the backup
  • backup to multiple devices – can run 64 concurrent threads, backup to 64 devices

Tricks to shrink backup times on VLDBs

  • Use Filegroups, multiple data files
  • Backup different files on different nights
  • Put files on separate disks
  • Don’t put anything inside the primary data file for the DB
  • partial availability – DB can be up even when some files are not

Manage transaction log backups

Recovery Models

Full – every transaction fully logged. Log backups have no dependency on DB files

  • can recover up to a point in time
  • can recover to a marked transaction
  • can restore individual pages
  • needed for availability groups, database mirroring, and log shipping

Bulk-Logged – records only the extents modified. Need files. Not for production use

  • Minimally logs bulk operations
  • No point-in-time recovery
  • Otherwise like full
  • Use when you want to do a big operation, but don’t have log space for it

Simple – no transaction log backups are possible. Not for production use

  • No log management – you can loose everything since the last full or differential

You can switch recovery models whenever you want, but it breaks the log backup chain.

You should backup the transaction logs. Here are options you should never use, but will be on the test.

  • NO_TRUNCATE – don’t truncate the log at the end of the backup
  • NORECOVERY – backs up the tail log and leaves the DB in the RESTORING state
  • STANDBY – backs up the tail of the log and leaves the database in a read-only and STANDBY state

You can use NVDIMM (storage class memory) since 2016 SP1 to hold the transaction log cache

Configure backup automation

  • You can schedule stuff from the “SQL Server Agent” node in Management Studio
  • You can also do a maintenance plan
    • Backups
    • Check Database Integrity
    • Clean up History
    • Execute SQL Server Agent Job
    • Execute T-SQL Statement Task
    • Maintenance Cleanup Task
    • Notify Operator Task
    • Rebuild Index
    • Shrink Database – not for prod.  fragments tables and logs
    • Update Statistics