Data Recovery & Consistency with CHECKDB with SQL Server. Vinod Kumar, Technology Evangelist - Microsoft. @vinodk_sql, www.ExtremeExperts.com, http://blogs.sqlxml.org/vinodkumar
Why Is This Session Important? Corruption does happen, mostly caused by IO subsystem People don’t realize they have corruption until too late People don’t know what to do when they do have corruption, leading to: More data loss and downtime than necessary Monetary and even job losses
What Can Happen to an Unprepared DBA Confronted by Corruption?
Session Takeaways. From this session you will take away: the significance of CHECKDB; guidance and options after corruption; how to get a database back online; how to distinguish repair vs. restore. DON’T TRY this on your production environments.
Agenda Discovering corruption Interpreting CHECKDB output Choosing between restore and repair Recovering from a ‘last resort’ With demos of common scenarios
I/O Errors. Three types: 823 (a hard I/O error), 824 (a soft I/O error), 825 (a read-retry error). Nice error messages in 2005+: Msg 824, Level 24, State 2, Line 1 SQL Server detected a logical consistency-based I/O error: incorrect checksum (expected: 0x7232c940; actual: 0x720e4940). It occurred during a read of page (1:143) in database ID 8 at offset 0x0000000011e000 in file 'c:\broken.mdf'. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online. Affected pages are logged in msdb..suspect_pages, which can be used as input into single-page restore operations.
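As an illustration of the last point, the pages SQL Server has flagged can be listed straight from msdb. This is a minimal sketch; the column list and event_type meanings are as documented for msdb.dbo.suspect_pages, and the ordering is just one reasonable choice.

```sql
-- List pages that have hit 823/824 errors, or that have since been restored or repaired.
SELECT DB_NAME(database_id) AS database_name,
       file_id,
       page_id,
       event_type,        -- 1 = 823 or other 824, 2 = bad checksum, 3 = torn page,
                          -- 4 = restored, 5 = repaired by DBCC, 7 = deallocated by DBCC
       error_count,
       last_update_date
FROM msdb.dbo.suspect_pages
ORDER BY last_update_date DESC;
```

The file_id and page_id pair (for example 1:143 in the message above) is the form a page restore expects.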
Page Protection Options. SQL Server allows pages to be ‘protected’ on disk against corruption, which allows fast detection of corruptions. Set using ALTER DATABASE <<yourdb>> SET PAGE_VERIFY <option>. Three options: NONE, TORN_PAGE_DETECTION, CHECKSUM.
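A short sketch of setting and confirming the option, assuming a database called mydb (a placeholder name):

```sql
-- Turn on page checksums for the database.
ALTER DATABASE mydb SET PAGE_VERIFY CHECKSUM;

-- Confirm the current setting.
SELECT name, page_verify_option_desc
FROM sys.databases
WHERE name = N'mydb';
```

Note that a checksum is only written the next time a page is written to disk, so existing pages are not protected until they have been modified or rebuilt.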
DBCC CHECKDB The only way to read all allocated pages in the database Use to force page checksums to be checked Choose between full checks and WITH PHYSICAL_ONLY Many algorithms to minimize runtime and run ONLINE since SQL Server 2000 Blog post series: http://www.sqlskills.com/blogs/paul/category/CHECKDB-From-Every-Angle.aspx
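The two styles of check mentioned above might look like this, again assuming a placeholder database named mydb:

```sql
-- Full logical and physical checks.
DBCC CHECKDB (N'mydb') WITH NO_INFOMSGS, ALL_ERRORMSGS;

-- Lighter-weight check: reads every allocated page and verifies page checksums,
-- but skips most of the logical checks.
DBCC CHECKDB (N'mydb') WITH PHYSICAL_ONLY, NO_INFOMSGS;
```

PHYSICAL_ONLY is usually the quicker of the two, which can make it a reasonable choice for frequent runs on large databases, with a full check run less often.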
First Hints That Something Is Wrong… Application/user connections get broken Users report 823 or 824 errors ‘Hard’ and ‘Soft’ IO errors Backup jobs start failing Error 3043 – backup detected checksum errors Agent alerts start firing Should have alerts on all errors with severity >= 19 Should have an alert on error 825 Informational (!) message that there are transient IO problems Maintenance jobs start failing
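The alerts recommended above could be created with the SQL Server Agent procedure msdb.dbo.sp_add_alert, roughly as follows. The alert names are made up for this sketch, and no operator notification is attached here; that would need sp_add_notification and an existing operator.

```sql
USE msdb;
GO

DECLARE @sev int, @name sysname;
SET @sev = 19;

-- One alert per severity level from 19 to 25.
WHILE @sev <= 25
BEGIN
    SET @name = N'Severity ' + CAST(@sev AS nvarchar(2)) + N' error';  -- hypothetical alert name
    EXEC dbo.sp_add_alert
         @name = @name,
         @message_id = 0,
         @severity = @sev,
         @include_event_description_in = 1;  -- include the error text in e-mail notifications
    SET @sev = @sev + 1;
END;

-- Error 825 is only informational (severity 10), so it needs its own alert by message ID.
EXEC dbo.sp_add_alert
     @name = N'Error 825 - read-retry required',  -- hypothetical alert name
     @message_id = 825,
     @severity = 0,
     @include_event_description_in = 1;
```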
As Soon As Corruption Is Suspected… No need to panic! Determine the extent of the corruption: run DBCC CHECKDB; look in the SQL Server error log; check maintenance job history. Check what backups are available. Wait for CHECKDB to finish before doing anything else. You may not NEED to do anything intrusive/destructive.
How To Run DBCC CHECKDB. By default, CHECKDB will: only return the first 200 errors; return lots of info that’s distracting in a corruption situation. Use the following command with only these options: DBCC CHECKDB (<<yourdb>>) WITH ALL_ERRORMSGS, NO_INFOMSGS. If it’s taking longer than usual, that often means it has found some corruption. Check the error log for message 5268 (from SQL Server 2005 SP2 onwards) to see if it’s rescanning some data. Most importantly, wait for it to complete!
Interpreting CHECKDB Output (1). So, CHECKDB completes and you have a bunch of cryptic error messages. Now what? There are over 150 errors that CHECKDB can output, some with over 200 states. Figuring out what one error means isn’t too bad; MSDN has most of them published for reference. There are some tips and tricks you can use…
Interpreting CHECKDB Output (2) Did CHECKDB fail? If it stops before completing successfully, something bad has happened that is preventing CHECKDB from running This means there is no choice but to restore from a backup as CHECKDB cannot be forced to run (and hence repair) Examples of fatal (to CHECKDB) errors 7984 – 7988: corruption in critical system tables 8967: invalid states within CHECKDB itself 8930: corrupt metadata in the database such that CHECKDB could not run See ‘Understanding DBCC Error Messages’ in the BOL for DBCC CHECKDB for more details
Interpreting CHECKDB Output Example fatal errors to CHECKDB demo
Interpreting CHECKDB Output (3) Are the corruptions only in non-clustered indexes? If recommended repair level is REPAIR_REBUILD, then YES! Otherwise, check all the index IDs in the errors – if they’re all greater than 1, then YES! If YES, you *could* just rebuild the corrupt indexes Depends on the error, and the size of the index But, what caused the corruption? If you just rebuild the indexes, the corruption will probably happen again (especially if caused by the IO subsystem) Make sure you do root-cause analysis and take preventative measures
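If the errors do point only at non-clustered indexes, a manual rebuild might be sketched as below. The object ID, index ID, table name and index name are all hypothetical; take the real IDs from the CHECKDB error messages. Disabling the index first is intended to make the rebuild read from the base table rather than from the damaged index.

```sql
-- Map the object ID and index ID reported by CHECKDB to names (IDs here are made up).
DECLARE @object_id int, @index_id int;
SET @object_id = 2073058421;
SET @index_id = 2;

SELECT OBJECT_NAME(object_id) AS table_name,
       name AS index_name,
       type_desc
FROM sys.indexes
WHERE object_id = @object_id
  AND index_id = @index_id;

-- Disable, then rebuild, the damaged non-clustered index (names are placeholders).
ALTER INDEX IX_Hypothetical ON dbo.HypotheticalTable DISABLE;
ALTER INDEX IX_Hypothetical ON dbo.HypotheticalTable REBUILD;
```

Be careful with indexes that enforce a primary key or unique constraint: disabling one of those also disables any foreign keys that reference it, so a different approach may be needed there.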
Interpreting CHECKDB Output Non-clustered index corruption only demo
Interpreting CHECKDB Output (4) Was there an un-repairable error found? 8909, 8938, 8939 (page header corruption) errors where the type is ‘PFS’ 8970 error: invalid data for the column type 8992 error: CHECKCATALOG (metadata mismatch) error Plus a few more obscure ones E.g. an 8904 error (extent is allocated to two objects). This is usually repairable except in the case where the extent is marked as mixed and dedicated, and has pages allocated to multiple objects. The repair is too complicated and/or destructive so is not attempted. None of these can be automatically repaired But if you don’t have a backup without these corruptions, you may be able to fix the 8970 and 8992 errors…
Interpreting CHECKDB Output Manually repairing an invalid data value (2570) in SQL Server 2005+ demo
Interpreting CHECKDB Output Manually repairing a metadata corruption (8992) in SQL Server 2005+ demo
Recovering Using Backups Best way to avoid data loss Not necessarily the best way to avoid downtime Depends what kind of backups are available Although backup compression in SQL Server 2008 helps… Plethora of options available Full database backup is a good starting point Series of transaction log backups as well is much better Beyond the scope of this session… Remember: Backups have to exist to be useful Backups have to be valid to avoid data loss
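One way to improve the odds that a backup is valid is to take it with checksums and verify it afterwards. A minimal sketch, assuming a database mydb and a made-up backup path:

```sql
-- Verify page checksums while backing up, and checksum the backup itself.
BACKUP DATABASE mydb
TO DISK = N'X:\backups\mydb_full.bak'   -- hypothetical path
WITH CHECKSUM, INIT;

-- Re-read the backup and re-check the checksums without restoring it.
RESTORE VERIFYONLY
FROM DISK = N'X:\backups\mydb_full.bak'
WITH CHECKSUM;
```

VERIFYONLY is a useful sanity check, but only a real test restore followed by CHECKDB shows that the backup is usable and free of corruption.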
Choosing Between Restore and Repair (1) Multiple decision points that could short-circuit the decision process Do you still have a database? No – you must restore from a backup Do you have working backups? No – you must use repair, or restore a damaged backup with CONTINUE_AFTER_ERROR, or extract data to a new database Is the log damaged? Yes – you must restore, or run emergency mode repair, or extract to a new database
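Restoring a damaged backup with CONTINUE_AFTER_ERROR could look roughly like this. The new database name, the paths and the logical file names are assumptions for the sketch; RESTORE FILELISTONLY shows the real logical names in a backup.

```sql
-- Restore a damaged backup as a separate database, carrying on past errors.
RESTORE DATABASE mydb_salvage                         -- hypothetical name
FROM DISK = N'X:\backups\mydb_full.bak'               -- hypothetical path
WITH CONTINUE_AFTER_ERROR,
     MOVE N'mydb'     TO N'X:\data\mydb_salvage.mdf',     -- logical names are assumptions
     MOVE N'mydb_log' TO N'X:\data\mydb_salvage_log.ldf';

-- The restored copy may itself be corrupt, so check it before relying on it.
DBCC CHECKDB (N'mydb_salvage') WITH NO_INFOMSGS, ALL_ERRORMSGS;
```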
Choosing Between Restore and Repair (2) Did CHECKDB fail? Yes – you must restore or extract Is it just non-clustered indexes that are damaged? Yes – maybe rebuild them manually Are there any un-repairable errors? Yes – you must restore or extract If you’re still able to make a repair/restore choice: Consider your down-time and data-loss Service Level Agreements Use whichever option you can which allows you to limit down-time and data-loss while still staying within the SLAs
Repair vs. Restore Manually repairing a single page corruption with and without backups demo
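When backups are available, a single-page restore generally follows the sequence sketched below. It assumes the FULL (or BULK_LOGGED) recovery model, an unbroken chain of log backups, and made-up file paths; page 1:143 is the page from the 824 message shown earlier.

```sql
-- Restore just the damaged page from the last full backup.
RESTORE DATABASE mydb PAGE = '1:143'
FROM DISK = N'X:\backups\mydb_full.bak'
WITH NORECOVERY;

-- Apply the existing log backups in order (one shown here).
RESTORE LOG mydb FROM DISK = N'X:\backups\mydb_log1.trn' WITH NORECOVERY;

-- Back up the current log, then apply it so the page is brought fully up to date.
BACKUP LOG mydb TO DISK = N'X:\backups\mydb_tail.trn';
RESTORE LOG mydb FROM DISK = N'X:\backups\mydb_tail.trn' WITH NORECOVERY;

-- Bring the restored page online.
RESTORE DATABASE mydb WITH RECOVERY;
```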
Beware of REPAIR_ALLOW_DATA_LOSS Repair fixes structural inconsistencies by de-allocating (Not REPAIR_REBUILD, but indexes should be fixed manually) This is the fastest and most provably correct way Repair doesn’t take into account: Foreign-key constraints Inherent business logic and data relationships Replication (see BOL for DBCC CHECKDB) Before running repair, protect yourself Take a backup and quiesce replication topologies involved After running repair, check the data Consider running DBCC CHECKCONSTRAINTS Fix up any replication topologies involved
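Putting the before and after steps together, a repair run might be sketched like this (mydb and the backup path are placeholders; quiescing and fixing up replication are not shown):

```sql
-- Protect yourself first: back up the database in its current, corrupt state.
BACKUP DATABASE mydb
TO DISK = N'X:\backups\mydb_before_repair.bak'   -- hypothetical path
WITH CONTINUE_AFTER_ERROR;

-- Repair requires single-user mode.
ALTER DATABASE mydb SET SINGLE_USER WITH ROLLBACK IMMEDIATE;

DBCC CHECKDB (N'mydb', REPAIR_ALLOW_DATA_LOSS) WITH NO_INFOMSGS, ALL_ERRORMSGS;

ALTER DATABASE mydb SET MULTI_USER;
GO

-- Repair ignores constraints, so check them afterwards.
USE mydb;
DBCC CHECKCONSTRAINTS WITH ALL_CONSTRAINTS;
```

Running CHECKDB again after the repair is a sensible way to confirm that no errors remain.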
What If the Log Is Damaged? Without a backup, two realistic choices: Use EMERGENCY mode to access the data in the corrupt state E.g. to extract to another database ALTER DATABASE mydb SET EMERGENCY; Use EMERGENCY mode repair New feature of SQL Server 2005 Rebuilds the log and runs REPAIR_ALLOW_DATA_LOSS as an atomic operation Database must be in EMERGENCY *and* SINGLE_USER This is the 3rd worst state to be in
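The EMERGENCY mode repair described above comes down to the following sequence, shown here as a sketch against the slide's example database mydb. It is a last resort: the log is rebuilt, so transactional consistency is not guaranteed afterwards.

```sql
-- Make the database accessible despite the damaged log.
ALTER DATABASE mydb SET EMERGENCY;
ALTER DATABASE mydb SET SINGLE_USER;

-- In EMERGENCY mode this rebuilds the log and runs the repair as one operation.
DBCC CHECKDB (N'mydb', REPAIR_ALLOW_DATA_LOSS) WITH NO_INFOMSGS, ALL_ERRORMSGS;

-- If the repair succeeds, allow other users back in.
ALTER DATABASE mydb SET MULTI_USER;
```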
Things That People Often Try *First* Restart SQL Server Just wastes time and delays getting back online Immediately jump to a last resort and cause data loss without working through options Running repair Rebuilding the transaction log Detach a suspect database It will fail to attach again – now the situation is even worse! This is the 2nd worst state to be in However, there’s a trick you can use…
Repairing a Suspect Database How to hack a detached suspect database back into the system and repair it demo
What If You Don't Have a Database At All *OR* Any Kind of Backup to Restore From? Total data loss - *this* is the worst state to be in. You might have no choice apart from manual re-entry, or URLC (Update Resume, Leave City).
Summary: Pulling It All Together Know the signs of corruption When corruption occurs, be methodical: Figure out the extent of the corruption Figure out your options to limit downtime, data loss, or both If you’re going to run repair, take a backup first Fix the corruption Finish with root-cause analysis Test all of this before you have to do it for real Good luck!
Resources (Paul's Blog) Example corrupt databases to play with http://www.sqlskills.com/blogs/paul/post/Example-20002005-corrupt-databases-and-some-more-info-on-backup-restore-page-checksums-and-IO-errors.aspx Everything you ever wanted to know about CHECKDB http://www.sqlskills.com/blogs/paul/category/CHECKDB-From-Every-Angle.aspx Tips and tricks for interpreting CHECKDB output http://www.sqlskills.com/blogs/paul/post/CHECKDB-From-Every-Angle-Tips-and-tricks-for-interpreting-CHECKDB-output.aspx Log rebuilding and repair http://www.sqlskills.com/blogs/paul/post/Corruption-Last-resorts-that-people-try-first.aspx Page checksums and SQLIOSim http://www.sqlskills.com/blogs/paul/post/How-to-tell-if-the-IO-subsystem-is-causing-corruptions.aspx EMERGENCY mode repair http://www.sqlskills.com/blogs/paul/post/CHECKDB-From-Every-Angle-EMERGENCY-mode-repair-the-very-very-last-resort.aspx
આભાર ধন্যবাদ நன்றி धन्यवाद ಧನ್ಯವಾದಗಳು ధన్యవాదాలు ଧନ୍ୟବାଦ ਧੰਨਵਾਦ നിങ്ങള്‍‌ക്ക് നന്ദി
© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation.  Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation.  MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Data recovery consistency with check db

  • 1.
  • 2. Data Recovery & Consistency with CHECKDBwith SQL Server Vinod Kumar Technology Evangelist - Microsoft @vinodk_sql www.ExtremeExperts.com http://blogs.sqlxml.org/vinodkumar
  • 3. Why Is This Session Important? Corruption does happen, mostly caused by IO subsystem People don’t realize they have corruption until too late People don’t know what to do when they do have corruption, leading to: More data loss and downtime than necessary Monetary and even job losses
  • 4. What Can Happen to an Unprepared DBA Confronted by Corruption?
  • 5. Session Takeaways From this session you will CHECKDB Significance Guidance and options after corruption Getting database online Distinguish Repair VS Restore DON’T TRY this on your Production Environments
  • 6. Agenda Discovering corruption Interpreting CHECKDB output Choosing between restore and repair Recovering from a ‘last resort’ With demos of common scenarios
  • 7. I/O Errors Three types 823 (a hard I/O error) 824 (a soft I/O error) 825 (a read-retry error) Nice error messages in 2005+ Msg 824, Level 24, State 2, Line 1 SQL Server detected a logical consistency-based I/O error: incorrect checksum (expected: 0x7232c940; actual: 0x720e4940). It occurred during a read of page (1:143) in database ID 8 at offset 0x0000000011e000 in file 'c:roken.mdf'. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online. Logged in msdb..suspect_pages Input into single-page restore operations
  • 8. Page Protection Options SQL Server allows pages to be ‘protected’ on disk from corruptions Allows fast detection of corruptions Set using ALTER DATABASE SET PAGE_VERIFY <option> Three options: NONE TORN_PAGE_DETECTION CHECKSUM
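A short sketch of checking and changing the page protection option follows. The database name `SalesDb` is hypothetical.

```sql
-- Find databases that are not using page checksums.
SELECT name, page_verify_option_desc
FROM sys.databases
WHERE page_verify_option_desc <> N'CHECKSUM';

-- Switch a database (hypothetical name) to CHECKSUM.
ALTER DATABASE [SalesDb] SET PAGE_VERIFY CHECKSUM;
```

Changing the option does not stamp existing pages: a page receives a checksum only the next time it is written to disk.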
  • 9. DBCC CHECKDB The only way to read all allocated pages in the database Use to force page checksums to be checked Choose between full checks and WITH PHYSICAL_ONLY Many algorithms to minimize runtime and run ONLINE since SQL Server 2000 Blog post series: http://www.sqlskills.com/blogs/paul/category/CHECKDB-From-Every-Angle.aspx
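For illustration, the lighter-weight check mentioned above might be run as shown here, again assuming a hypothetical database named `SalesDb`.

```sql
-- Physical-structure checks only: reads every allocated page and tests
-- page checksums, but skips the deeper logical checks of a full run.
DBCC CHECKDB (N'SalesDb') WITH PHYSICAL_ONLY, NO_INFOMSGS;
```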
  • 10. First Hints That Something Is Wrong… Application/user connections get broken Users report 823 or 824 errors ‘Hard’ and ‘Soft’ IO errors Backup jobs start failing Error 3043 – backup detected checksum errors Agent alerts start firing Should have alerts on all errors with severity >= 19 Should have an alert on error 825 Informational (!) message that there are transient IO problems Maintenance jobs start failing
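The alerting advice above can be sketched with SQL Server Agent's `sp_add_alert`. The alert names and the 60-second response delay are assumptions chosen for illustration; notifications to an operator would still need to be attached separately.

```sql
-- One alert per severity level from 19 to 25.
DECLARE @sev int, @alert_name sysname;
SET @sev = 19;
WHILE @sev <= 25
BEGIN
    SET @alert_name = N'Severity ' + CAST(@sev AS nvarchar(2)) + N' error';
    EXEC msdb.dbo.sp_add_alert
         @name = @alert_name,
         @severity = @sev,
         @delay_between_responses = 60,
         @include_event_description_in = 1;  -- include error text in e-mail
    SET @sev = @sev + 1;
END;

-- Error 825 is only severity 10, so it needs its own alert by message ID.
EXEC msdb.dbo.sp_add_alert
     @name = N'Error 825 (read-retry)',
     @message_id = 825,
     @severity = 0,
     @delay_between_responses = 60,
     @include_event_description_in = 1;
```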
  • 11. As Soon As Corruption Is Suspected… No need to panic! Determine the extent of the corruption Run DBCC CHECKDB Look in the SQL Server error log Check maintenance job history Check what backups are available Wait for CHECKDB to finish before doing anything else You may not NEED to do anything intrusive/destructive
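One way to check what backups are available is to query the backup history in `msdb`. This is a sketch only: the database name is hypothetical, and history rows do not prove the backup files still exist or are readable.

```sql
-- Most recent backups recorded for one database.
-- type: D = full, I = differential, L = transaction log
SELECT TOP (20)
       database_name,
       type,
       backup_start_date,
       backup_finish_date,
       has_backup_checksums,
       is_damaged
FROM msdb.dbo.backupset
WHERE database_name = N'SalesDb'
ORDER BY backup_finish_date DESC;
```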
  • 12. How To Run DBCC CHECKDB By default, CHECKDB will: Only return the first 200 errors Return lots of info that’s distracting in a corruption situation Use the following command with only these options: DBCC CHECKDB (<<yourdb>>) WITH ALL_ERRORMSGS, NO_INFOMSGS If it’s taking longer than usual, that may mean that it found some corruption Check the error log for message 5268 from SQL Server 2005 SP2 onwards to see if it’s rescanning some data Most importantly, wait for it to complete!
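Written out as a runnable statement, the command from this slide looks like the following, with the hypothetical `SalesDb` standing in for `<<yourdb>>`.

```sql
-- Report every error found and suppress informational messages.
DBCC CHECKDB (N'SalesDb') WITH ALL_ERRORMSGS, NO_INFOMSGS;
```

Saving the full output to a file before taking any action is a sensible habit, since later steps depend on the exact error numbers, object IDs, and index IDs it reports.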
  • 13. Interpreting CHECKDB Output (1) So, CHECKDB completes and you have a bunch of cryptic error messages. Now what? There are over 150 errors that CHECKDB can output, some with over 200 states Figuring out what one error means isn’t too bad MSDN has most of them published for reference There are some tips and tricks you can use…
  • 14. Interpreting CHECKDB Output (2) Did CHECKDB fail? If it stops before completing successfully, something bad has happened that is preventing CHECKDB from running This means there is no choice but to restore from a backup as CHECKDB cannot be forced to run (and hence repair) Examples of fatal (to CHECKDB) errors 7984 – 7988: corruption in critical system tables 8967: invalid states within CHECKDB itself 8930: corrupt metadata in the database such that CHECKDB could not run See ‘Understanding DBCC Error Messages’ in the BOL for DBCC CHECKDB for more details
  • 15. Interpreting CHECKDB Output Example fatal errors to CHECKDB demo
  • 16. Interpreting CHECKDB Output (3) Are the corruptions only in non-clustered indexes? If recommended repair level is REPAIR_REBUILD, then YES! Otherwise, check all the index IDs in the errors – if they’re all greater than 1, then YES! If YES, you *could* just rebuild the corrupt indexes Depends on the error, and the size of the index But, what caused the corruption? If you just rebuild the indexes, the corruption will probably happen again (especially if caused by the IO subsystem) Make sure you do root-cause analysis and take preventative measures
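A hedged sketch of the manual index rebuild follows. The object ID, table, and index names are hypothetical placeholders for values read from the CHECKDB errors. Disabling the index first is one way to make sure the rebuild reads from the base table rather than the damaged index; this assumes the index does not enforce a constraint that other objects depend on.

```sql
USE [SalesDb];

-- Map the object ID and index IDs from the CHECKDB errors to names.
SELECT OBJECT_NAME(object_id) AS table_name,
       name AS index_name,
       index_id,
       type_desc
FROM sys.indexes
WHERE object_id = 2105058535   -- hypothetical object ID from the error text
  AND index_id > 1;            -- non-clustered indexes only

-- Rebuild one damaged non-clustered index from the base table.
ALTER INDEX [IX_Orders_CustomerId] ON dbo.Orders DISABLE;
ALTER INDEX [IX_Orders_CustomerId] ON dbo.Orders REBUILD;
```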
  • 17. Interpreting CHECKDB Output Non-clustered index corruption only demo
  • 18. Interpreting CHECKDB Output (4) Was there an un-repairable error found? 8909, 8938, 8939 (page header corruption) errors where the type is ‘PFS’ 8970 error: invalid data for the column type 8992 error: CHECKCATALOG (metadata mismatch) error Plus a few more obscure ones E.g. an 8904 error (extent is allocated to two objects). This is usually repairable except in the case where the extent is marked as mixed and dedicated, and has pages allocated to multiple objects. The repair is too complicated and/or destructive so is not attempted. None of these can be automatically repaired But if you don’t have a backup without these corruptions, you may be able to fix the 8970 and 8992 errors…
  • 19. Interpreting CHECKDB Output Manually repairing an invalid data value (2570) in SQL Server 2005+ demo
  • 20. Interpreting CHECKDB Output Manually repairing a metadata corruption (8992) in SQL Server 2005+ demo
  • 21. Recovering Using Backups Best way to avoid data loss Not necessarily the best way to avoid downtime Depends what kind of backups are available Although backup compression in SQL Server 2008 helps… Plethora of options available Full database backup is a good starting point Series of transaction log backups as well is much better Beyond the scope of this session… Remember: Backups have to exist to be useful Backups have to be valid to avoid data loss
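To illustrate the point that backups must be valid, the sketch below takes a backup with checksums and then verifies it. The database name and file path are hypothetical.

```sql
-- Verify existing page checksums while backing up, and checksum the backup itself.
BACKUP DATABASE [SalesDb]
TO DISK = N'D:\Backups\SalesDb_full.bak'
WITH CHECKSUM, INIT;

-- Re-read the backup and re-check the checksums without restoring it.
RESTORE VERIFYONLY
FROM DISK = N'D:\Backups\SalesDb_full.bak'
WITH CHECKSUM;
```

`RESTORE VERIFYONLY` is a useful check but not a substitute for periodically restoring the backup on another server and running CHECKDB against the restored copy.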
  • 22. Choosing Between Restore and Repair (1) Multiple decision points that could short-circuit the decision process Do you still have a database? No – you must restore from a backup Do you have working backups? No – you must use repair, or restore a damaged backup with CONTINUE_AFTER_ERROR, or extract data to a new database Is the log damaged? Yes – you must restore, or run emergency mode repair, or extract to a new database
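The option of restoring a damaged backup might look like the sketch below. The database name, logical file names, and paths are all hypothetical, and the restore goes to a separate salvage database so nothing existing is overwritten.

```sql
-- Restore as much as possible from a backup that contains errors.
RESTORE DATABASE [SalesDb_Salvage]
FROM DISK = N'D:\Backups\SalesDb_full.bak'
WITH MOVE N'SalesDb'     TO N'D:\Data\SalesDb_Salvage.mdf',
     MOVE N'SalesDb_log' TO N'D:\Data\SalesDb_Salvage.ldf',
     CONTINUE_AFTER_ERROR,
     RECOVERY;

-- The restored copy is likely to be corrupt, so check it right away.
DBCC CHECKDB (N'SalesDb_Salvage') WITH ALL_ERRORMSGS, NO_INFOMSGS;
```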
  • 23. Choosing Between Restore and Repair (2) Did CHECKDB fail? Yes – you must restore or extract Is it just non-clustered indexes that are damaged? Yes – maybe rebuild them manually Are there any un-repairable errors? Yes – you must restore or extract If you’re still able to make a repair/restore choice: Consider your down-time and data-loss Service Level Agreements Use whichever option you can which allows you to limit down-time and data-loss while still staying within the SLAs
  • 24. Repair vs. Restore Manually repairing a single page corruption with and without backups demo
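When backups are available, a single-page restore generally follows the sequence sketched here. It uses page 1:143 from the example 824 message earlier in the deck; the database name and backup paths are hypothetical, and the database is assumed to use the full recovery model with an unbroken log backup chain.

```sql
-- 1. Restore just the damaged page from the last good full backup.
RESTORE DATABASE [SalesDb]
PAGE = '1:143'
FROM DISK = N'D:\Backups\SalesDb_full.bak'
WITH NORECOVERY;

-- 2. Apply every log backup taken since that full backup.
RESTORE LOG [SalesDb]
FROM DISK = N'D:\Backups\SalesDb_log1.trn'
WITH NORECOVERY;

-- 3. Back up the current tail of the log, then restore it to bring
--    the page up to date with the rest of the database.
BACKUP LOG [SalesDb]
TO DISK = N'D:\Backups\SalesDb_tail.trn';

RESTORE LOG [SalesDb]
FROM DISK = N'D:\Backups\SalesDb_tail.trn'
WITH RECOVERY;
```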
  • 25. Beware of REPAIR_ALLOW_DATA_LOSS Repair fixes structural inconsistencies by de-allocating (Not REPAIR_REBUILD, but indexes should be fixed manually) This is the fastest and most provably correct way Repair doesn’t take into account: Foreign-key constraints Inherent business logic and data relationships Replication (see BOL for DBCC CHECKDB) Before running repair, protect yourself Take a backup and quiesce replication topologies involved After running repair, check the data Consider running DBCC CHECKCONSTRAINTS Fix up any replication topologies involved
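The precautions on this slide can be sketched as one sequence. The database name and backup path are hypothetical, and quiescing replication is left out because it depends on the topology.

```sql
-- Protect yourself first: back up the database in its current state.
BACKUP DATABASE [SalesDb]
TO DISK = N'D:\Backups\SalesDb_before_repair.bak'
WITH INIT;

-- Repair requires single-user mode.
ALTER DATABASE [SalesDb] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;

DBCC CHECKDB (N'SalesDb', REPAIR_ALLOW_DATA_LOSS)
WITH ALL_ERRORMSGS, NO_INFOMSGS;

ALTER DATABASE [SalesDb] SET MULTI_USER;

-- Repair ignores constraints, so check them afterwards.
USE [SalesDb];
DBCC CHECKCONSTRAINTS WITH ALL_CONSTRAINTS;
```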
  • 26. What If the Log Is Damaged? Without a backup, two realistic choices: Use EMERGENCY mode to access the data in the corrupt state E.g. to extract to another database ALTER DATABASE mydb SET EMERGENCY; Use EMERGENCY mode repair New feature of SQL Server 2005 Rebuilds the log and runs REPAIR_ALLOW_DATA_LOSS as an atomic operation Database must be in EMERGENCY *and* SINGLE_USER This is the 3rd worst state to be in
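A minimal sketch of EMERGENCY mode repair follows, using the `mydb` name from the slide. It is a one-way operation, so copying the database files beforehand is a sensible assumption to add.

```sql
-- Make the database accessible in its corrupt state.
ALTER DATABASE mydb SET EMERGENCY;

-- Emergency mode repair requires single-user mode as well.
ALTER DATABASE mydb SET SINGLE_USER;

-- In EMERGENCY mode this rebuilds the log and runs the repair together.
DBCC CHECKDB (N'mydb', REPAIR_ALLOW_DATA_LOSS)
WITH ALL_ERRORMSGS, NO_INFOMSGS;

ALTER DATABASE mydb SET MULTI_USER;
```

Because the log is rebuilt, transactions that were in flight are not rolled back cleanly, so the data should be treated as transactionally inconsistent until it has been validated.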
  • 27. Things That People Often Try *First* Restart SQL Server Just wastes time and delays getting back online Immediately jump to a last resort and cause data loss without working through options Running repair Rebuilding the transaction log Detach a suspect database It will fail to attach again – now the situation is even worse! This is the 2nd worst state to be in However, there’s a trick you can use…
  • 28. Repairing a Suspect Database How to hack a detached suspect database back into the system and repair it demo
  • 29. What If You Don't Have a Database At All *OR* Any Kind of Backup to Restore From? Total data loss - *this* is the worst state to be in You might have no choice apart from manual re-entry, or URLC: Update Résumé, Leave City
  • 30. Summary: Pulling It All Together Know the signs of corruption When corruption occurs, be methodical: Figure out the extent of the corruption Figure out your options to limit downtime, data loss, or both If you’re going to run repair, take a backup first Fix the corruption Finish with root-cause analysis Test all of this before you have to do it for real Good luck!
  • 31. Resources (Paul's Blog) Example corrupt databases to play with http://www.sqlskills.com/blogs/paul/post/Example-20002005-corrupt-databases-and-some-more-info-on-backup-restore-page-checksums-and-IO-errors.aspx Everything you ever wanted to know about CHECKDB http://www.sqlskills.com/blogs/paul/category/CHECKDB-From-Every-Angle.aspx Tips and tricks for interpreting CHECKDB output http://www.sqlskills.com/blogs/paul/post/CHECKDB-From-Every-Angle-Tips-and-tricks-for-interpreting-CHECKDB-output.aspx Log rebuilding and repair http://www.sqlskills.com/blogs/paul/post/Corruption-Last-resorts-that-people-try-first.aspx Page checksums and SQLIOSim http://www.sqlskills.com/blogs/paul/post/How-to-tell-if-the-IO-subsystem-is-causing-corruptions.aspx EMERGENCY mode repair http://www.sqlskills.com/blogs/paul/post/CHECKDB-From-Every-Angle-EMERGENCY-mode-repair-the-very-very-last-resort.aspx
  • 32. Thank you: આભાર ধন্যবাদ நன்றி धन्यवाद ಧನ್ಯವಾದಗಳು ధన్యవాదాలు ଧନ୍ୟବାଦ ਧੰਨਵਾਦ നിങ്ങള്‍‌ക്ക് നന്ദി
  • 33. © 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.