Last week we had an incident were one of the storage went down. Just to give you clear idea, we have 2 different storage for Redo logs and controlfiles. This way we have multiplexing on database and so at Storage level too. My second shift started with following issue where one of the production database went down with error – ORA-00221: error on write to control file | ORA-00206: error in writing (block xx, # blocks xx) of control file.
If you interested check out How to multiplex controlfile in oracle 19c
Just for your information. Actual database name is censored with website name.
Around 3:03 PM I got an alert from OEM saying Production database is in unknown state. (I thought it would be one of that day when OEM wants to celebrate 1st April – April fool 😀 ). That was not the case – Immediately I took the session of the server, check if I can see the SMON is up or not. And it was not
Time to check the Alert log –
Mon Aug 01 15:02:00 2022 Archived Log entry 40355 added for thread 1 sequence 48832 ID 0xffffffff8a1c3d29 dest 1: Mon Aug 01 15:13:35 2022 Errors in file /oradump/oracledbworld/diag/diag/rdbms/oracledbworld/oracledbworld/trace/oracledbworld_ckpt_25249.trc: ORA-00206: error in writing (block 3, # blocks 1) of control file ORA-00202: control file: '/redo2/oracledbworld/CONTROL/oracledbworld_control02.ctl' ORA-27063: number of bytes read/written is incorrect SVR4 Error: 6: No such device or address Additional information: 4294967295 Additional information: 16384 Mon Aug 01 15:13:35 2022 Errors in file /oradump/oracledbworld/diag/diag/rdbms/oracledbworld/oracledbworld/trace/oracledbworld_ckpt_25249.trc: ORA-00221: error on write to control file ORA-00206: error in writing (block 3, # blocks 1) of control file ORA-00202: control file:'/redo2/oracledbworld/CONTROL/oracledbworld_control02.ctl' ORA-27063: number of bytes read/written is incorrect SVR4 Error: 6: No such device or address Additional information: 4294967295 Additional information: 16384 Mon Aug 01 15:13:35 2022 Errors in file /oradump/oracledbworld/diag/diag/rdbms/oracledbworld/oracledbworld/trace/oracledbworld_lgwr_25245.trc: ORA-00206: error in writing (block 14, # blocks 1) of control file ORA-00202: control file:'/redo2/oracledbworld/CONTROL/oracledbworld_control02.ctl' ORA-27063: number of bytes read/written is incorrect SVR4 Error: 6: No such device or address Additional information: 4294967295 Additional information: 16384 Mon Aug 01 15:13:35 2022 USER (ospid: 25249): terminating the instance due to error 221 Mon Aug 01 15:13:35 2022 System state dump requested by (instance=1, osid=4294992545 (CKPT)), summary=[abnormal instance termination]. System State dumped to trace file /oradump/oracledbworld/diag/diag/rdbms/oracledbworld/oracledbworld/trace/oracledbworld_diag_25233_20220801151335.trc Mon Aug 01 15:13:36 2022 Dumping diagnostic data in directory=[cdmp_20220801151335], requested by (instance=1, osid=4294992545 (CKPT)), summary=[abnormal instance termination]. Mon Aug 01 15:13:37 2022 Instance terminated by USER, pid = 25249
Since error OS related, I tried to do cd to the filesystem and it gave me i/o error. Informed system admin and Storage team, Both confirmed storage is not responding and since they have to visit DC to check what’s the issue, they requested us to move out of that storage.
Table of Contents
Fun Question –
Today you may get answer for following question ?
- What happens if one of controlfile is not accessible.
- What happens if one of redo member from each group is not accessible.
- Can we start database with one member of each group of redo log is not accessible.
De-multiplexing of Controlfile
If you are planning to drop a diskgroup / planning to remove the dependency of the mount point which has controlfile. We will require downtime for de-multiplexing of controlfile.
- Create pfile from spfile ( we can connect to any instance and create a pfile from spfile provided you have given full path of spfile and always make sure you have give pfile location too.)
SQL> create pfile='/tmp/pfile_01082022.ora' from spfile='/u01/app/oracle/product/dbs/spfile<>.ora';
2. Remove one entry of control file from control_files parameter in pfile. ( In my case I am removing problematic control file.)
Original –
oracle :~/orabase/oracle/dbs$ cat /tmp/pfile_01082022.ora oracledbworld.__db_cache_size=1308622848 oracledbworld.__java_pool_size=16777216 oracledbworld.__large_pool_size=100663296 oracledbworld.__oracle_base='/oracle/orabase'#ORACLE_BASE set from environment oracledbworld.__pga_aggregate_target=1073741824 oracledbworld.__sga_target=3221225472 oracledbworld.__shared_io_pool_size=0 oracledbworld.__shared_pool_size=1744830464 oracledbworld.__streams_pool_size=16777216 *._smu_debug_mode=33554432 *.archive_lag_target=900 *.Compatible='11.2.0.4' *.control_files='/redo1/oracledbworld/cntrl/oracledbworld_control01.ctl','/redo2/oracledbworld/CONTROL/oracledbworld_control02.ctl' *.db_block_size=8192 *.db_files=1500 *.db_name='oracledbworld' *.db_unique_name='oracledbworld' *.diagnostic_dest='/oradump/oracledbworld/diag' *.log_archive_dest_1='location=/arch/oracledbworld/arch' *.log_archive_format='oracledbworld_%r_%t_%s.arc' *.log_buffer=16072000 *.nls_date_format='DD-MM-YYYY' *.open_cursors=3500 *.optimizer_adaptive_features=FALSE *.pga_aggregate_target=1G *.Processes=1500 *.remote_login_passwordfile='EXCLUSIVE' *.sga_max_size=3G *.sga_target=3G *.standby_file_management='auto' *.statistics_level='TYPICAL' *.timed_statistics=TRUE *.undo_management='AUTO' *.utl_file_dir='*' oracle:~/orabase/oracle/dbs$
Removed one entry of controlfile.
oracle:~/orabase/oracle/dbs$ cp /tmp/pfile_01082022.ora /tmp/pfile_01082022.ora_safe oracle:~/orabase/oracle/dbs$ vi /tmp/pfile_01082022.ora oracle :~/orabase/oracle/dbs$ cat /tmp/pfile_01082022.ora oracledbworld.__db_cache_size=1308622848 oracledbworld.__java_pool_size=16777216 oracledbworld.__large_pool_size=100663296 oracledbworld.__oracle_base='/oracle/orabase'#ORACLE_BASE set from environment oracledbworld.__pga_aggregate_target=1073741824 oracledbworld.__sga_target=3221225472 oracledbworld.__shared_io_pool_size=0 oracledbworld.__shared_pool_size=1744830464 oracledbworld.__streams_pool_size=16777216 *._smu_debug_mode=33554432 *.archive_lag_target=900 *.Compatible='11.2.0.4' *.control_files='/redo1/oracledbworld/cntrl/oracledbworld_control01.ctl' *.db_block_size=8192 *.db_files=1500 *.db_name='oracledbworld' *.db_unique_name='oracledbworld' *.diagnostic_dest='/oradump/oracledbworld/diag' *.log_archive_dest_1='location=/arch/oracledbworld/arch' *.log_archive_format='oracledbworld_%r_%t_%s.arc' *.log_buffer=16072000 *.nls_date_format='DD-MM-YYYY' *.open_cursors=3500 *.optimizer_adaptive_features=FALSE *.pga_aggregate_target=1G *.Processes=1500 *.remote_login_passwordfile='EXCLUSIVE' *.sga_max_size=3G *.sga_target=3G *.standby_file_management='auto' *.statistics_level='TYPICAL' *.timed_statistics=TRUE *.undo_management='AUTO' *.utl_file_dir='*' :~/orabase/oracle/dbs$
3. Now recreated the spfile from pfile and started the database
:~/orabase/oracle/dbs$ sqlplus / as sysdba SQL*Plus: Release 12.1.0.2.0 Production on Mon Aug 1 15:13:09 2022 Copyright (c) 1982, 2014, Oracle. All rights reserved. Connected to an idle instance. SQL> create spfile='/oracle/orabase/oracle/dbs/spfileoracledbworld.ora' from pfile='/tmp/pfile_01082022.ora'; File created. SQL> :~$ sqlplus / as sysdba SQL*Plus: Release 12.1.0.2.0 Production on Mon Aug 1 15:13:16 2022 Copyright (c) 1982, 2014, Oracle. All rights reserved. Connected to an idle instance. SQL> deff DEFINE _DATE = "01-AUG-22" (CHAR) DEFINE _CONNECT_IDENTIFIER = "oracledbworld" (CHAR) DEFINE _USER = "SYS" (CHAR) DEFINE _PRIVILEGE = "AS SYSDBA" (CHAR) DEFINE _SQLPLUS_RELEASE = "1201000200" (CHAR) DEFINE _EDITOR = "ed" (CHAR) DEFINE _O_VERSION = "" (CHAR) DEFINE _O_RELEASE = "" (CHAR) SQL> startup ORACLE instance started. Total System Global Area 3221225472 bytes Fixed Size 2919184 bytes Variable Size 1879053552 bytes Database Buffers 1308622848 bytes Redo Buffers 30629888 bytes Database mounted. Database opened. SQL>
Redo log De-multiplexing
Now it’s time to remove redo members from the problematic mount point. Since we had 3 separate mount point, gave us no problem while removal.
SQL> select * from v$logfile; GROUP# STATUS TYPE MEMBER IS_ CON_ID ---------- ------- ------- ------------------------------------------------------------ --- ---------- 1 ONLINE /redo1/oracledbworld/redo1/redo1a.log NO 0 1 ONLINE /redo3/oracledbworld/redo1/redo1b.log NO 0 2 ONLINE /redo3/oracledbworld/redo2/redo2a.log NO 0 2 ONLINE /redo1/oracledbworld/redo2/redo2b.log NO 0 3 ONLINE /redo3/oracledbworld/redo3/redo3a.log NO 0 3 ONLINE /redo1/oracledbworld/redo3/redo3b.log NO 0 1 INVALID ONLINE /redo2/oracledbworld/REDO/oracledbworld_redo1c.log NO 0 2 INVALID ONLINE /redo2/oracledbworld/REDO/oracledbworld_redo2c.log NO 0 3 INVALID ONLINE /redo2/oracledbworld/REDO/oracledbworld_redo3c.log NO 0 9 rows selected. SQL> select * from v$log; GROUP# THREAD# SEQUENCE# BYTES BLOCKSIZE MEMBERS ARC STATUS FIRST_CHANGE# FIRST_TIME NEXT_CHANGE# NEXT_TIME CON_ID ---------- ---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- ---------- ------------ ---------- ---------- 1 1 48832 104857600 512 3 YES INACTIVE 1.6982E+13 01-08-2022 1.6983E+13 01-08-2022 0 2 1 48833 104857600 512 3 YES INACTIVE 1.6983E+13 01-08-2022 1.6983E+13 01-08-2022 0 3 1 48834 104857600 512 3 NO CURRENT 1.6983E+13 01-08-2022 2.8147E+14 0
Follow the basic steps of switching the logfile and dropping the redo member from the group as follows.
SQL> alter database drop logfile member '/redo2/oracledbworld/REDO/oracledbworld_redo1c.log'; Database altered. SQL> alter database drop logfile member '/redo2/oracledbworld/REDO/oracledbworld_redo2c.log'; Database altered. SQL> select * from v$log; GROUP# THREAD# SEQUENCE# BYTES BLOCKSIZE MEMBERS ARC STATUS FIRST_CHANGE# FIRST_TIME NEXT_CHANGE# NEXT_TIME CON_ID ---------- ---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- ---------- ------------ ---------- ---------- 1 1 48832 104857600 512 2 YES INACTIVE 1.6982E+13 01-08-2022 1.6983E+13 01-08-2022 0 2 1 48833 104857600 512 2 YES INACTIVE 1.6983E+13 01-08-2022 1.6983E+13 01-08-2022 0 3 1 48834 104857600 512 3 NO CURRENT 1.6983E+13 01-08-2022 2.8147E+14 0 SQL> alter system switch logfile; System altered. SQL> select * from v$log; GROUP# THREAD# SEQUENCE# BYTES BLOCKSIZE MEMBERS ARC STATUS FIRST_CHANGE# FIRST_TIME NEXT_CHANGE# NEXT_TIME CON_ID ---------- ---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- ---------- ------------ ---------- ---------- 1 1 48835 104857600 512 2 NO CURRENT 1.6983E+13 01-08-2022 2.8147E+14 0 2 1 48833 104857600 512 2 YES INACTIVE 1.6983E+13 01-08-2022 1.6983E+13 01-08-2022 0 3 1 48834 104857600 512 3 YES ACTIVE 1.6983E+13 01-08-2022 1.6983E+13 01-08-2022 0 SQL> alter system checkpoint; System altered. SQL> select * from v$log; GROUP# THREAD# SEQUENCE# BYTES BLOCKSIZE MEMBERS ARC STATUS FIRST_CHANGE# FIRST_TIME NEXT_CHANGE# NEXT_TIME CON_ID ---------- ---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- ---------- ------------ ---------- ---------- 1 1 48835 104857600 512 2 NO CURRENT 1.6983E+13 01-08-2022 2.8147E+14 0 2 1 48833 104857600 512 2 YES INACTIVE 1.6983E+13 01-08-2022 1.6983E+13 01-08-2022 0 3 1 48834 104857600 512 3 YES INACTIVE 1.6983E+13 01-08-2022 1.6983E+13 01-08-2022 0 SQL> alter database drop logfile member '/redo2/oracledbworld/REDO/oracledbworld_redo3c.log'; Database altered. SQL> select * from v$log; GROUP# THREAD# SEQUENCE# BYTES BLOCKSIZE MEMBERS ARC STATUS FIRST_CHANGE# FIRST_TIME NEXT_CHANGE# NEXT_TIME CON_ID ---------- ---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- ---------- ------------ ---------- ---------- 1 1 48835 104857600 512 2 NO CURRENT 1.6983E+13 01-08-2022 2.8147E+14 0 2 1 48833 104857600 512 2 YES INACTIVE 1.6983E+13 01-08-2022 1.6983E+13 01-08-2022 0 3 1 48834 104857600 512 2 YES INACTIVE 1.6983E+13 01-08-2022 1.6983E+13 01-08-2022 0 SQL> select * from v$logfile; GROUP# STATUS TYPE MEMBER IS_ CON_ID ---------- ------- ------- ------------------------------------------------------------ --- ---------- 1 ONLINE /redo1/oracledbworld/redo1/redo1a.log NO 0 1 ONLINE /redo3/oracledbworld/redo1/redo1b.log NO 0 2 ONLINE /redo3/oracledbworld/redo2/redo2a.log NO 0 2 ONLINE /redo1/oracledbworld/redo2/redo2b.log NO 0 3 ONLINE /redo3/oracledbworld/redo3/redo3a.log NO 0 3 ONLINE /redo1/oracledbworld/redo3/redo3b.log NO 0 6 rows selected. SQL>
And we released the system within 30 mins. 🙂