70-764 – SysadminDork

Skill 2.3 – Manage database integrity

Databases can get corrupted by hardware problems or virtualization issues
Microsoft recommends restoring instead of repairing.

Implement database consistency checks

DBCC CHECKALLOC – checks disk space allocation
DBCC CHECKCATALOG – check consistency of system tables in a database
DBCC CHECKDB – checks logical and physical integrity of all objects in the DB
DBCC CHECKFILEGROUP – checks the allocation and physical integrity of all tables and indexed views in a filegroup
DBCC CHECKTABLE – checks allocation and structural integrity of a table
- Does not check in-memory tables
DBCC TABLOCK – will lock tables instead of creating a snapshot
- also happens if disk is formatted with FAT or ReFS

Identify database corruption

The database engine logs corruption in few places. Check them regularly

ERRORLOG captures any DBCC CHEC
Windows event log (application) captures DBCC errors
[msdb] system database had a [suspect_tables] table
dump file in the LOG directory SQLDUMPnnnn.txt
Error 823 – Cyclic redundancy check – mean disk troubles. Set the SQL agent to alert you on these
Error 824 – Logical consistency check – storage trouble of some sort
Error 825 -Read Retry error – means the DB had to re-try a read or write. Fails after 4 tries
Error 832 – corrupt memory page – might mean bad memory

Recover from database corruption

The DBCC will suggest a course of action. If the corruption is in a nonclustered index, just drop and rebuild the index. Otherwise just restore the DB

DBCC Error States:

0 – Error 8930 was raised. corrup metadata
1 – error 8967 was raised. internal DBCC error
2 – failure during emergency mode database repair
3 – Corrupt metadata
4 – assert or acess violation
5 – unknown error

REPAIR_ALLOW_DATA_LOSS – absolute last resort. Have resume up to date.

Skill 2.2 – Restore Databases

Design a Restore Strategy

In case of disaster:

Do not panic
Think about people, processes, and recovery assets
Stop access to the DB
See if there is a tail-log
backup the tail-log (if you can)
access damage / corruption
Assess whether a restore strategy is required
Assess whether point-in-time recovery is required
Assess what backups exist
Verify backups are current and valid
Plan restore strategy
Initiate restore process

RESTORE command takes some arguments:

FILELISTONLY – list of database and log files contained in the backup
HEADERONLY – returns backup header info for all backup sets
LABELONLY – lists media information
VERIFYONLY – verifies the backup, but doesn’t restore it
MOVE – relocates the database’s files
NORECOVERY -will not roll back uncommitted transactions
RECOVERY – rolls back uncommitted transactions
REPLACE – bypass internal safety check. Overwrites without mercy
STANDBY – creates a second transaction log. allows database to be read-only so you can inspect between log restores

Restore a Database

Restore your fulls and differentials with NORECOVERY, then restore the last part with RECOVERY to bring it up to the last transaction.

Revert a database snapshot

Reverting a snapshot overwrites the files, the logs, everything. Do a log backup first.

USE master;
GO
RESTORE DATABASE [WHATEVERDATABASE]
FROM DATABASE_SNAPSHOT = 'WHATEVERDATABASE_2018_12_05'

Perform Piecemeal restores

Lets you restore the database in stages.
Only works with multiple filegroups
You can gets parts of the DB online while restoring less important parts

Perform Page Recovery

The database engine keeps track of corrupt pages in the msdb suspect_pages table. Assuming the DB page verification is set to CHECKSUM or TORN_PAGE_VERIFICATION.

If you only have a few corrupt pages, you can just restore those instead of the whole database. If the torn pages are in an index, just rebuild the index.

USE [msdb];
GO
-- find corrupted pages
SELECT database_id' file_id' page_id' event_type' error_count' last_update_date
FROM dbo.suspect_pages
WHERE database_id = DB_ID('WorldWideImporters');
GO
-- restore 4 corrupt pages
USE [master];
GO
RESTORE DATABASE [WorldWideImporters] PAGE='1:300984' 1:300811' 1:280113' 1:170916'
     FROM WideWorldImporters_Backup_Device
     WITH NORECOVERY;
GO

Perform point-in-time recovery

Any transactions after the point in time will be lost. This is handy in case a user accidentally deleted a table or something. Not commonly used in real life. Do a point in time restore to a non-prod database

Three ways to do a PiT restore:

Date/Time – a specific date and time
Log Sequence Number (LSN) – an internal numbering sequence for every operation in the transaction log
Marked Transaction – an explicitly named transaction

STOPAT –
STOPATMARK- rolls the log forward to the mark, and includes the marked transaction
STOPBEFOREMARK – rolls the log forward to the mark, excludes the marked transaction

RESTORE LOG WorldWideImporters
     FROM WorkdWideImporters_Log_Backup_Device
     WITH FILE=6' RECOVERY'
     STOPAT = 'Sep 17' 2016 2:30 AM';

Restore a filegroup

Backup the tail-log
Restore the file group
Restore and different backups
restore the log-chain to bring the filegroup in sync with the rest of the files in the DB
Stop at a point in time if necessary
Recover the filegroup and bring it online

Can be done online in Enterprise Edition, must be done offline in others.

Develop a plan to automate and test restores

You need to know how long your restores take, so you know if you can meet your SLAs

Skill 2.1 – Develop a Backup Strategy

Design a backup strategy

Gather requirements. What level of loss is acceptable? How quickly do we need to recover? Which databases are more important than others?

A backup strategy consists of a set of backup operations and a schedule or frequency of when the backup operations will be performed.

Stakeholders need to define the three main requirements:

Recovery Time Objective (RTO) – maximum downtime after an incident
Recovery Point Objective (RPO) – maximum acceptable data loss
Recovery Level Objective (RLO) – the granularity of the data that needs to be restored. (Server / Database / tables, etc.)

These numbers will make up part of your SLA. You need to test restores to know how long they take and if you can really meet your RTO. You may need to use different backup types to get the restores you need.

Things to consider

Size of the database – bigger takes longer
DB file structure – multiple files gives flexibility
Speed & Throughput of network / storage /processor subsystems
Volume and size of data modifications in the database – need to keeps tranaction log growth under control
Type of data modifications – Updates v Inserts – Updates don’t grow the logs the same as inserts?
Data Compressibility – compression uses CPU. If data is not compressible, why waste CPU on compression
Is point-in-time recovery required? If so, you have to do log backups.
Recovery Model – simple recovery model doesn’t allow for log backups
Is the data important? – Maybe don’t backup test data?

Back up databases

Backup Types

Full – The entire database and any changes made during the backup
Differential – Everything between the last full and when the differential was started.
Log – Everything since the lat log backup
File – a file or file group that maes up the database
Partial – Like a Full, except it excludes read-only filegroups. It always includes the primary filegroup
Tail-Log – all transaction log records not backed up since the last log backup. This is usually the first step in a DR process

Backup Destinations

Backup device – disk or tape
Media Family – single non-mirrored device, or a set of mirrored devices in a media set
Backup set – a successful backup’s content
Media Set – a set of backup media that contains one or more backup sets
Log sequence number (LSN) – numbering sequence inside the transaction log
Sequence Number – order of the pyysical media in a media family. Media families are also numbered within a media set

There is a header on each backup device listing media name, number of media families, sequence numbers, etc.

Full backups are important. If your full gets corrupted, all the differentials in the world won’t save you. A full backup consists of:

Checkpoint – flushes buffers
Record backup start LSN – records the LSN where the backup started
Backup Data – backs up all the extents (8 8k pages) from all the data files in the db
Record backup end LSN – records the LSN at the end of the backup
Calculate minimum LSN – determines the minimum LSN required for the log backup
Backup log –

Differential backups – get bigger the longer you go without a full backup. Restore times increase as well. Does not support point-in-time recovery.

Log Backups (incremental) – frees up space in the transaction log and support point-in-time recovery

File Backups – used when a multi-file db is really large and you want to backup different files at different times.

Partial Backups – by default only backs up read-write file groups and the default file group

Backup Options

CHECKSUM | NO_CHECKSUM – makes sure each page does not have a checksum or torn page error. also generates a checksum for the entire backup
CONTINUE_AFTER_ERROR | STOP_ON_ERROR – tells the DB engine what to do if there is a checksmum error
COPY_ONLY – does not change the backup sequence number
COMPRESSION | NO_COMPRESSION – compress the backup
DESCRIPTION – describe the backup
ENCRYPTION | NO_ENCRYPTION – encrypt the backup? Default is no encryption
EXPIREDATE | RETAIN_DAYS – when can this backup be overwritten. RETAINDAYS takes priority
FORMAT | NO FORMAT – creates a new media set. destructive
INIT | NOINIT – INIT means all backup sets should be overwritten
SKIP | NOSKIP – ignore the safety check that checks the backup set’s expiration before overwriting it
STATS – controls the percentage intervals at which you get feedback on the progress of the backup. Default is 10
MIRROR TO – mirror the backup to another device

Perform database snapshots

Snapshots are read-only point in time files separate from the database file. Work at the page level. Whenever a page is modified, the non modified version is written to the snapshot. They depend on the underlying database. If you loose the parent DB, they are worthless.

Backup databases to Azure

SQL Server Managed Backup to Microsoft Azure – Uses Azure storage blobs. Automatically backs things up through stored procedures. Does fulls every week or every 1GB of log growth. does log backups every tweo hours or every 5MB of log growth.

Backup a VLDB (Very Large Database)

A VLDB is so big that you can’t meet SLA’s even if you throw hardware at it.

Tricks to shrink backup times on non-VLDBs

Compress tables to make them smaller
columnstore indexes can decrease the size of tables
move old data out to an archive database
use backup compression to reduce the size of the backup
backup to multiple devices – can run 64 concurrent threads, backup to 64 devices

Tricks to shrink backup times on VLDBs

Use Filegroups, multiple data files
Backup different files on different nights
Put files on separate disks
Don’t put anything inside the primary data file for the DB
partial availability – DB can be up even when some files are not

Manage transaction log backups

Recovery Models

Full – every transaction fully logged. Log backups have no dependency on DB files

can recover up to a point in time
can recover to a marked transaction
can restore individual pages
needed for availability groups, database mirroring, and log shipping

Bulk-Logged – records only the extents modified. Need files. Not for production use

Minimally logs bulk operations
No point-in-time recovery
Otherwise like full
Use when you want to do a big operation, but don’t have log space for it

Simple – no transaction log backups are possible. Not for production use

No log management – you can loose everything since the last full or differential

You can switch recovery models whenever you want, but it breaks the log backup chain.

You should backup the transaction logs. Here are options you should never use, but will be on the test.

NO_TRUNCATE – don’t truncate the log at the end of the backup
NORECOVERY – backs up the tail log and leaves the DB in the RESTORING state
STANDBY – backs up the tail of the log and leaves the database in a read-only and STANDBY state

You can use NVDIMM (storage class memory) since 2016 SP1 to hold the transaction log cache

Configure backup automation

You can schedule stuff from the “SQL Server Agent” node in Management Studio
You can also do a maintenance plan
- Backups
- Check Database Integrity
- Clean up History
- Execute SQL Server Agent Job
- Execute T-SQL Statement Task
- Maintenance Cleanup Task
- Notify Operator Task
- Rebuild Index
- Shrink Database – not for prod. fragments tables and logs
- Update Statistics