昨天一备份数据库,在open read only的过程中出现故障,从而导致在standby上备份失败。alert log 中信息如下: Errors in file /opt/Oracle/admin/crmhz/udump/crmhz_ora_13914.trc:ORA-00604: error occurred at recursive SQL level 1 ORA-16000: database open for read-only accessError 604 happened during db open, shutting down databaseUSER: terminating instance due to error 604Instance terminated by USER, pid = 13914ORA-1092 signalled during: alter database open read only...Tue Aug 10 09:31:51 2004 再查看trace file信息 [root@crmsb bdump]# more?? /opt/oracle/admin/crmhz/udump/crmhz_ora_13914.trc/opt/oracle/admin/crmhz/udump/crmhz_ora_13914.trcOracle9i EnterPRise Edition Release 9.2.0.1.0 - ProdUCtionWith the Partitioning, OLAP and Oracle Data Mining optionsJServer Release 9.2.0.1.0 - ProductionORACLE_HOME = /opt/oracle/product/9.2.0System name:??? linuxNode name:????? crmsbRelease:??????? 2.4.9-e.3smpVersion:??????? #1 SMP Fri May 3 16:48:54 EDT 2002Machine:??????? i686Instance name: crmhzRedo thread mounted by this instance: 1Oracle process number: 11Unix process pid: 13914, image: oracle@crmsb (TNS V1-V3) *** session ID:(10.3) 2004-08-10 09:30:37.408*** 2004-08-10 09:30:37.408Managed Recovery: Cancel posted.ORA-00604: error occurred at recursive SQL level 1ORA-16000: database open for read-only access 都没有提供什么有价值的信息,很希奇会出现604错误 于是决定trace? open过程 startup nomount alter database mount standby? database; select sid from v$mystat where rownum = 1; select sid,serial# from v$session where sid = ?; exec dbms_system.set_ev(sid,serial#,10046,12,''); alter database open read only; 这是数据库crash ,找到trace file,发现末尾部分为 select grantee#,privilege#,nvl(col#,0),max(mod(nvl(option$,0),2))from objauth$ where obj#=:1 group by grantee#,privilege#,nvl(col#,0) order by grantee#END OF STMTPARSE #13:c=0,e=485,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=0,tim=1066522815818430BINDS #13:?bind 0: dty=2 mxl=22(22) mal=00 scl=00 pre=00 oacflg=08 oacfl2=1 size=24 offset=0
?? bfp=40647620 bln=22 avl=02 flg=05?? value=72EXEC #13:c=0,e=518,p=0,cr=0,cu=0,mis=0,r=0,dep=2,og=4,tim=1066522815819173FETCH #13:c=0,e=58,p=0,cr=2,cu=0,mis=0,r=0,dep=2,og=4,tim=1066522815819296=====================PARSING IN CURSOR #12 len=34 dep=1 uid=0 oct=7 lid=0 tim=1066522815819480 hv=3008185193 ad='56fcbaac'delete from idl_ub1$ where obj#=:1END OF STMTPARSE #12:c=10000,e=14804,p=6,cr=79,cu=0,mis=1,r=0,dep=1,og=0,tim=1066522815819472BINDS #12:?bind 0: dty=2 mxl=22(22) mal=00 scl=00 pre=00 oacflg=08 oacfl2=1 size=24 offset=0?? bfp=40645a48 bln=22 avl=06 flg=05?? value=4294951150WAIT #12: nam='db file sequential read' ela= 41 p1=1 p2=22839 p3=1WAIT #12: nam='db file sequential read' ela= 27 p1=1 p2=9 p3=1EXEC #12:c=0,e=884,p=2,cr=4,cu=2,mis=0,r=0,dep=1,og=4,tim=1066522815820884ERROR #12:err=16000 tim=1837753940ORA-00604: error occurred at recursive SQL level 1ORA-16000: database open for read-only accessEXEC #1:c=370000,e=375493,p=304,cr=5505,cu=4,mis=0,r=0,dep=0,og=4,tim=1066522815826303ERROR #1:err=1092 tim=1837753940 很希奇,正常standby数据库open read only的时候是没有delete的,在这里居然出现dml。于是一开始我尝试去研究idl_ub1$是个什么表,结果发现是存储 pl/sql代码的,很希奇,这个数据库为什么会出现个状况,继续在 $ORACLE_HOME/rdbms/admin 下 grep -ir idl_ub1$ * ,结果找到和一个x$ 表挂上关系做连接了,未果,又通过 delete from idl_ub1$ where obj#=:1 中绑定变量value=4294951150 去结合搜索出来的view查找,也没什么进展。实际上,根据 WAIT #12: nam='db file sequential read' ela= 27 p1=1 p2=9 p3=1 可以知道这个delete是访问了file# =1 block#=9 的数据块,很显然,这是系统回滚段的块头,根据这里我们可以知道,这是一个回滚操作,也就是说,在主数据库中由于DDL 的失败,使得系统回滚段中记录了该操作的回退信息,但是数据库又还没有回滚完成,到了standby中就出现这个状况了。 为了尝试解决这个问题,首先继续应用几个日志,再open read only,结果失败,也就是说该失败的ddl在主数据库一直没有回滚,而数据库中已经查找不到任何异常进程或者事务、锁,只能认为是系统出现异常。于是,今天在征得相关负责人的同意的情况下,我们决定在下班后将主数据库重新启动一次看看,这样强迫系统回滚失败的ddl。下班重新启动数据库,切换日志并归档,在standby上应用日志到重起数据库之后产生的归档日志,再open read only 成功,问题得到解决。