100字范文,内容丰富有趣,生活中的好帮手!
100字范文 > begin backup导致的故障恢复全过程

begin backup导致的故障恢复全过程

时间:2019-04-22 06:03:45

相关推荐

begin backup导致的故障恢复全过程

墨墨导读:一套19C CDB数据库,存储更换HBA卡宕,本文详述这起begin backup导致的故障恢复全过程。

半夜接到客户反馈,一套19C CDB数据库,存储更换HBA卡宕,起不来了,OPEN时提示需要介质恢复,这里截了一段ALERT LOG。

-07-28T23:40:53.328908+08:00Errors in file /u02/app/oracle/diag/rdbms/racdb3/racdb32/trace/racdb32_ora_306493.trc:ORA-10873: file 1 needs to be either taken out of backup mode or media recoveredORA-01110: data file 1: '+DATA/RACDB3/DATAFILE/system.278.1037610503'-07-28T23:40:53.387627+08:00Errors in file /u02/app/oracle/diag/rdbms/racdb3/racdb32/trace/racdb32_ora_306493.trc:ORA-10873: file 1 needs to be either taken out of backup mode or media recoveredORA-01110: data file 1: '+DATA/RACDB3/DATAFILE/system.278.1037610503'ORA-10873 signalled during: ALTER DATABASE OPEN /* db agent *//* {0:17:3557} */...-07-28T23:40:55.357177+08:00License high water mark = 2-07-28T23:40:55.357492+08:00USER(prelim) (ospid: 310054): terminating the instance-07-28T23:40:56.369307+08:00Instance terminated by USER(prelim), pid = 310054ORA-10873: file 1 needs to be either taken out of backup mode or media recovered

这里报ORA-10873是由于数据库或表空间BEGIN BACKUP导致,正确的处理方法只需要end backup即可。

alter database end backup;alter tablespace [tablespace_name] end backup;alter database open;

[oracle@test ~]$ oerr ora 10873

perl: warning: Setting locale failed.

perl: warning: Please check that your locale settings:

LANGUAGE = (unset),

LC_ALL = (unset),

LANG = “en_us.utf8”

are supported and installed on your system.

perl: warning: Falling back to the standard locale (“C”).

10873, 00000, “file %s needs to be either taken out of backup mode or media recovered”

// *Cause: An attempt was made to open a database after an instance failure or

// SHUTDOWN ABORT interrupted an online backup.

// *Action: If the indicated file is not a restored backup, then issue the

// ALTER DATABASE END BACKUP command and open the database. If the

// file is a restored online backup, then apply media recovery to

// it and open the database.

当时RECOVER DATABASE 提示找不到归档(需要6-18号的归档)

由于有存储相关操作,误以为其它原因导致的问题,没有关注该报错,查询vdatafile,vdatafile,vdatafile_header发现检查点为上个月的6-18号。

---当前日期为-07-28T23:40:56SQL> col name for a10SQL> select a.con_id,a.name,b.file#,b.rfile#,b.checkpoint_change#,b.checkpoint_time,b.status from v$containers a,v$datafile b where a.con_id=b.con_id order by checkpoint_change#;CON_ID NAME FILE#RFILE# CHECKPOINT_CHANGE# CHECKPOINT_TIMESTATUS-------------------- ---------- -------------------- -------------------- -------------------- ------------------- -------2 PDB$SEED 64 2336937 -04-13 09:26:28 ONLINE2 PDB$SEED 51 2336937 -04-13 09:26:28 SYSTEM2 PDB$SEED 89 2336937 -04-13 09:26:28 ONLINE1 CDB$ROOT178 178 8041849750453 -06-18 19:28:19 ONLINE1 CDB$ROOT 77 8041849750453 -06-18 19:28:19 ONLINE1 CDB$ROOT 99 8041849750453 -06-18 19:28:19 ONLINE1 CDB$ROOT 44 8041849750453 -06-18 19:28:19 ONLINE1 CDB$ROOT 33 8041849750453 -06-18 19:28:19 ONLINE1 CDB$ROOT 11 8041849750453 -06-18 19:28:19 SYSTEM

接着检查日志及询问客户也没有做restore的操作,误判断为出现了异常,未找到解决办法,因为有最近的全备,做了restore CDB$ROOT的操作,结果悲剧了,还原出来的数据文件,RECOVER仍然需要从6-18的归档开始,查询文件头检查点还是6-18号,接着查询备份中信息,文件的检查点也是6-18号,没遇到过这种情况,以为是ORACLE BUG。

RMAN> list backup of datafile 1;List of Backup Sets===================BS Key Type LV Size Device Type Elapsed Time Completion Time------- ---- -- ---------- ----------- ------------ -------------------2508 Incr 0 23.19GSBT_TAPE 00:04:26**-07-26 08:32:28**BP Key: 2508 Status: AVAILABLE Compressed: NO Tag: HOT_DB_BK_LEVEL0Handle: bk_3490_1_1046766482 Media: @aaaa6List of Datafiles in backup set 2508File LV Type Ckp SCN Ckp Time Abs Fuz SCN Sparse Name---- -- ---- ---------- ------------------- ----------- ------ ----1 0 Incr 8041849750453 **-06-18 19:28:19** 8050242429418 NO +DATA/RACDB3/DATAFILE/system.278.1037610503RMAN>

这里有一个大问题,就是由的数据库较大,restore前没有对当前环境做备份。切记,任何危险的变更操作都需要备份。做到可回退!!!

咨询公司专家后,确定为某此表空间做了begin backup导致。begin backup后文件头上的checkpoint不再更新。但经了解实际没有人为发起过backup backup,alert log中也没有找到begin backup的操作记录,需要再分析。

这时由于之前做了restore cdbroot的操作,控制文件,cdbroot的文件已从备份中还原,导致不能再end backup操作,1个月前的归档已清理,也没办法从6-18开始应用归档。没有办法通过正常的途径恢复数据库,悲剧!!!!

通过v$backup确定文件处于begin backup热备模式。SQL> select * from v$backup;FILE# STATUSCHANGE# TIMECON_ID-------------------- ------------------ -------------------- ------------------- --------------------3 NOT ACTIVE8041849750453 -06-18 19:28:1914 NOT ACTIVE8041849750453 -06-18 19:28:1915 NOT ACTIVE 026 NOT ACTIVE 027 ACTIVE8041849750453 -06-18 19:28:1918 NOT ACTIVE 02.......19 ACTIVE8041849750453 -06-18 19:28:19420 ACTIVE8041849750453 -06-18 19:28:19421 ACTIVE8041849750453 -06-18 19:28:194SQL> select * from v$backup;FILE# STATUSCHANGE# TIMECON_ID-------------------- ------------------ -------------------- ------------------- --------------------3 NOT ACTIVE8041849750453 -06-18 19:28:1914 NOT ACTIVE8041849750453 -06-18 19:28:1915 NOT ACTIVE 026 NOT ACTIVE 027 ACTIVE8041849750453 -06-18 19:28:1918 NOT ACTIVE 029 NOT ACTIVE8041849750453 -06-18 19:28:19115 ACTIVE8041849750453 -06-18 19:28:19416 ACTIVE8041849750453 -06-18 19:28:19417 ACTIVE8041849750453 -06-18 19:28:19418 ACTIVE8041849750453 -06-18 19:28:194FILE# STATUSCHANGE# TIMECON_ID-------------------- ------------------ -------------------- ------------------- --------------------19 ACTIVE8041849750453 -06-18 19:28:19420 ACTIVE8041849750453 -06-18 19:28:19421 ACTIVE8041849750453 -06-18 19:28:19422 ACTIVE8041849750453 -06-18 19:28:19423 ACTIVE8041849750453 -06-18 19:28:19424 ACTIVE8041849750453 -06-18 19:28:19425 ACTIVE8041849750453 -06-18 19:28:19426 ACTIVE8041849750453 -06-18 19:28:19427 ACTIVE8041849750453 -06-18 19:28:19428 ACTIVE8041849750453 -06-18 19:28:19429 ACTIVE8041849750453 -06-18 19:28:194

如果不是CDB还可以重建控制文件,然后用以下方法解决,但如果是CDB,需要切到pdb中执行以下操作,重建控制文件后,PDB是ORACLE内部的名字,没办法切换PDB,所以以下方法行不通。

alter database datafile 1 offline;

alter database datafile 1 end backup;

alter database datafile 1 online;

最终的解决办法是,bbed修改文件头上的检查点信息,再应用近几天的归档,应用到最新状态,open resetlogs,最终0数据丢失恢复。

这里由于文件比较多,不好全copy到本地文件系统,用到了ASM未公开的内部包,只

读取ASM中的数据头数据块到本地,bbed修改完,再copy回去。

--copy datafile head from asmSet pagesize 300Set linesize 300set numw 20alter session set nls_date_format='yyyy-mm-dd hh24:mi:ss';col name for a100select '@tofs '||b.name||' '||regexp_replace(b.name,'^.*DATAFILE/','/u01/work/') from v$datafile b where b.checkpoint_change#<805077812 and con_id!=2 order by checkpoint_change#;@tofs +DATA/RACDB/DATAFILE/system.278.1037610503 /u01/work/system.278.1037610503@tofs +DATA/RACDB/DATAFILE/sysaux.261.1037610537 /u01/work/sysaux.261.1037610537@tofs +DATA/RACDB/DATAFILE/undotbs1.288.1037610553 /u01/work/undotbs1.288.1037610553@tofs +DATA/RACDB/DATAFILE/users.282.1037610553 /u01/work/users.282.1037610553@tofs +DATA/RACDB/DATAFILE/undotbs2.263.1037611185 /u01/work/undotbs2.263.1037611185@tofs +DATA/RACDB/A3D75790AD24522EE053C756D80A788E/DATAFILE/system.302.1038386279 /u01/work/system.302.1038386279@tofs +DATA/RACDB/A3D75790AD24522EE053C756D80A788E/DATAFILE/sysaux.292.1038386277 /u01/work/sysaux.292.1038386277@tofs +DATA/RACDB/A3D75790AD24522EE053C756D80A788E/DATAFILE/undotbs1.299.1038386279......@tofs +DATA/RACDB/A3EFF303A38BE5E8E053C756D80A801E/DATAFILE/sysaux.272.1038492071 /u01/work/sysaux.272.1038492071@tofs +DATA/RACDB/A3EFF303A38BE5E8E053C756D80A801E/DATAFILE/undotbs1.281.1038492071 /u01/work/undotbs1.281.1038492071@tofs +DATA/RACDB/A3EFF303A38BE5E8E053C756D80A801E/DATAFILE/undo_2.283.1038492071 /u01/work/undo_2.283.1038492071@tofs +DATA/RACDB/A3EFF303A38BE5E8E053C756D80A801E/DATAFILE/users.279.1038492071 /u01/work/users.279.1038492071@tofs +DATA/RACDB/DATAFILE/test11.425.1041690739 /u01/work/test11.425.1041690739--生成bbed listfileSet pagesize 300Set linesize 300set numw 20alter session set nls_date_format='yyyy-mm-dd hh24:mi:ss';col name for a100select file#||' '||regexp_replace(b.name,'^.*DATAFILE/','/u01/work/') from v$datafile b where b.checkpoint_change#<805077812 and con_id!=2 order by checkpoint_change#;BBED> infoFile# NameSize(blks)----- --------------1 /u01/work/system.278.1037610503 03 /u01/work/sysaux.261.1037610537 04 /u01/work/undotbs1.288.103761055307 /u01/work/users.282.1037610553 09 /u01/work/undotbs2.263.1037611185015 /u01/work/system.302.1038386279 016 /u01/work/sysaux.292.1038386277 017 /u01/work/undotbs1.299.1038386279018 /u01/work/undotbs1.303.1038386279019 /u01/work/undotbs2.294.1038386277020 /u01/work/undotbs2.301.10383862790......178 /u01/work/test11.425.1041690739 0---bbed 从kcvfhbcp恢复检查点assign file 1 block 1 kcvfhckp = file 1 block 1 kcvfhbcpassign file 3 block 1 kcvfhckp = file 3 block 1 kcvfhbcpassign file 4 block 1 kcvfhckp = file 4 block 1 kcvfhbcpassign file 7 block 1 kcvfhckp = file 7 block 1 kcvfhbcpassign file 9 block 1 kcvfhckp = file 9 block 1 kcvfhbcpassign file 15 block 1 kcvfhckp = file 15 block 1 kcvfhbcpassign file 16 block 1 kcvfhckp = file 16 block 1 kcvfhbcpassign file 17 block 1 kcvfhckp = file 17 block 1 kcvfhbcpassign file 18 block 1 kcvfhckp = file 18 block 1 kcvfhbcpassign file 19 block 1 kcvfhckp = file 19 block 1 kcvfhbcp......assign file 119 block 1 kcvfhckp = file 119 block 1 kcvfhbcpassign file 178 block 1 kcvfhckp = file 178 block 1 kcvfhbcpassign file 181 block 1 kcvfhckp = file 181 block 1 kcvfhbcpsum apply file 1 block 1sum apply file 3 block 1sum apply file 4 block 1sum apply file 7 block 1sum apply file 9 block 1sum apply file 15 block 1sum apply file 16 block 1sum apply file 17 block 1sum apply file 18 block 1sum apply file 19 block 1......sum apply file 118 block 1sum apply file 119 block 1sum apply file 178 block 1sum apply file 181 block 1kcvfhckp 检查点,恢复起始点,begin backup后再不更新kcvfhbcp begin backup后检查点(begin backup后检查点更新在该位置,end backup以该检查点更新kcvfhckp)---copy to asmSet pagesize 300Set linesize 300set numw 20alter session set nls_date_format='yyyy-mm-dd hh24:mi:ss';col name for a100select '@toasm '||regexp_replace(b.name,'^.*DATAFILE/','/u01/work/')||' '||b.name from v$datafile b where b.checkpoint_change#<805077812 and con_id!=2 order by checkpoint_change#;@toasm /u01/work/system.278.1037610503 +DATA/RACDB/DATAFILE/system.278.1037610503@toasm /u01/work/sysaux.261.1037610537 +DATA/RACDB/DATAFILE/sysaux.261.1037610537@toasm /u01/work/undotbs1.288.1037610553 +DATA/RACDB/DATAFILE/undotbs1.288.1037610553@toasm /u01/work/users.282.1037610553 +DATA/RACDB/DATAFILE/users.282.1037610553@toasm /u01/work/undotbs2.263.1037611185 +DATA/RACDB/DATAFILE/undotbs2.263.1037611185@toasm /u01/work/system.302.1038386279 +DATA/RACDB/A3D75790AD24522EE053C756D80A788E/DATAFILE/system.302.1038386279@toasm /u01/work/sysaux.292.1038386277 +DATA/RACDB/A3D75790AD24522EE053C756D80A788E/DATAFILE/sysaux.292.1038386277@toasm /u01/work/undotbs1.299.1038386279 +DATA/RACDB/A3D75790AD24522EE053C756D80A788E/DATAFILE/undotbs1.299.1038386279.......@toasm /u01/work/test11.425.1041690739 +DATA/RACDB/DATAFILE/test11.425.1041690739

查询v$datafile_header 确认checkpoint_change#已更新。

recover databaserun {allocate channel ch00 device type sbt;allocate channel ch01 device type sbt;allocate channel ch02 device type sbt;allocate channel ch03 device type sbt;SEND ‘NB_ORA_SERV=bak-svr,NB_ORA_CLIENT=racdb6’;recover database ;release channel ch00;release channel ch01;release channel ch02;release channel ch03;}

注册最近两天的归档,继续recover database,直到最新

catalog archivelog ‘+ARCH/RACDB/archivelog/_07_28/thread_2_seq_4935.819.1046934651’;catalog archivelog ‘+ARCH/RACDB/archivelog/_07_28/thread_2_seq_4936.524.1046935119’;…catalog archivelog ‘+ARCH/RACDB/archivelog/_07_28/thread_2_seq_4942.554.1046945869’;catalog archivelog ‘+ARCH/RACDB/archivelog/_07_28/thread_2_seq_4943.750.1046946151’;最后recover database using backup controlfile;cancel;alter database open resetlogs;

到些数据库启动成功。

下面测试重现了该问题,及正确的处理方法。不过19C中并没有人为发起begin backup,需要继续排查什么原因导致。

SQL> alter tablespace ts2 begin backup;Tablespace altered.SQL> select * from v$backup;FILE# STATUSCHANGE# TIME---------- ------------------ ---------- -----------------1 NOT ACTIVE 02 NOT ACTIVE 03 NOT ACTIVE 04 NOT ACTIVE 05 NOT ACTIVE 06 NOT ACTIVE 07 NOT ACTIVE 08 ACTIVE2237473 2030 17:30:449 NOT ACTIVE 09 rows selected.SQL> alter system switch logfile;System altered.SQL> alter system switch logfile;System altered.SQL> alter system switch logfile;System altered.SQL> alter system switch logfile;System altered.SQL> alter system switch logfile;System altered.SQL> alter system switch logfile;System altered.SQL> alter system switch logfile;System altered.SQL> alter system switch logfile;System altered.SQL> shutdown abort;ORACLE instance shut down.SQL>SQL> startupORA-3: obsolete or deprecated parameter(s) specified for RDBMS instanceORACLE instance started.Total System Global Area 626327552 bytesFixed Size 2255832 bytesVariable Size 419431464 bytesDatabase Buffers197132288 bytesRedo Buffers7507968 bytesDatabase mounted.ORA-10873: file 8 needs to be either taken out of backup mode or media recoveredORA-01110: data file 8: '/oracle/app/oracle/oradata/TESTA/datafile/o1_mf_ts2_hcw05wo4_.dbf'SQL> alter tablespace ts2 end backup;Tablespace altered.SQL> alter database open;Database altered.

墨天轮原文链接:https://www.modb.pro/db/28494(复制到浏览器中打开或者点击“阅读原文”)

推荐阅读:144页!分享珍藏已久的数据库技术年刊

视频号,新的分享时代,关注我们,看看有什么新发现?

数据和云

ID:OraNews

如有收获,请划至底部,点击“在看”,谢谢!

点击下图查看更多 ↓

云和恩墨大讲堂 |一个分享交流的地方

长按,识别二维码,加入万人交流社群

请备注:云和恩墨大讲堂

点个“在看”

你的喜欢会被看到❤

本内容不代表本网观点和政治立场,如有侵犯你的权益请联系我们处理。
网友评论
网友评论仅供其表达个人看法,并不表明网站立场。