For quite some time ASM has treated us nicely when it comes to transient disk failures. Since 11.1 (if memory serves me right), a transient failure no longer causes an ASM disk to be dropped immediately. This is good, because it can potentially save a lot of time! When a disk is dropped from an ASM disk group, a rebalance operation is unavoidable, and another one is likely to follow, because ultimately you want to add the disk back to the disk group. To be fair, this only applies to normal and high redundancy disk groups.
The feature I just described very briefly is referred to as ASM Fast Mirror Resync; it is documented in the ASM Administrator's Guide, in the section "Administering Oracle ASM Disk Groups".
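For reference, the grace period during which ASM waits before dropping the offline disk is controlled by the disk group attribute disk_repair_time, which defaults to 3.6 hours; judging by the 600 second countdown further down in this post, my lab disk group uses a much shorter setting. Checking and changing it is a one-liner each; the value below is only an example:

SQL> select name, value from v$asm_attribute
  2  where group_number = 1 and name = 'disk_repair_time';

SQL> alter diskgroup data set attribute 'disk_repair_time' = '10m';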
By the way, there is another new feature you might want to look at, called FAILGROUP_REPAIR_TIME, but I’ll keep that for another post.
In this post I’d like to demonstrate how easy it is to simulate a transient failure of an ASM disk using libvirt and KVM on a host running Oracle Linux 7.2. I also wanted to have the output from the O/S and ASM alert.log as personal reference.
As usual, this procedure is strictly for YOUR HOME LAB USE ONLY and for educational purposes.
The setup
When creating the VM, named "asmtest", I created a boot volume and a 50 GB volume for the Oracle binaries. To keep it manageable I went for an Oracle Restart 12.1 configuration, patched to the July PSU. There are 9 ASM disks, presented as SCSI devices rather than virtio; I'll explain why in a bit. The guest O/S is Oracle Linux 7.2, the same as my lab server. All virtual disks on the VM are partitioned, although strictly speaking that is not necessary.
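Partitioning them is a one-liner per disk. This is only a sketch of what I would use, assuming /dev/sdb is one of the ASM disks and an msdos label is acceptable:

[root@asmtest ~]# parted -s /dev/sdb mklabel msdos mkpart primary 1MiB 100%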
Using the lsscsi tool (available as its own package on Oracle Linux 7), I created udev rules following the excellent description I found on oracle-base.com.
[oracle@asmtest ~]$ cat /etc/udev/rules.d/61-asm.rules
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi1-0-0-0", SYMLINK+="asm-disk01", OWNER="oracle", GROUP="asmdba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi1-0-0-1", SYMLINK+="asm-disk02", OWNER="oracle", GROUP="asmdba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi1-0-0-2", SYMLINK+="asm-disk03", OWNER="oracle", GROUP="asmdba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi1-0-0-3", SYMLINK+="asm-disk04", OWNER="oracle", GROUP="asmdba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi2-0-0-0", SYMLINK+="asm-disk05", OWNER="oracle", GROUP="asmdba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi2-0-0-1", SYMLINK+="asm-disk06", OWNER="oracle", GROUP="asmdba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi2-0-0-2", SYMLINK+="asm-disk07", OWNER="oracle", GROUP="asmdba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi2-0-0-3", SYMLINK+="asm-disk08", OWNER="oracle", GROUP="asmdba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi3-0-0-0", SYMLINK+="asm-disk09", OWNER="oracle", GROUP="asmdba", MODE="0660"
This is slightly more complicated than my usual setup, but it was needed for device name persistence. Unfortunately libvirt does not report the same block device names as my guest VM: whilst the host "sees" sd[h-v], my guest reported these devices as sd[a-i]. Persistent device names to the rescue! The only issue I had initially was that disks presented to the VM via virtio don't report the SCSI IDs I need for my udev rules; in other words, lsscsi returned no data. Presenting the block devices via the SCSI interface did the trick.
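To see the IDs that go into the RESULT clauses of the rules, you can call scsi_id directly inside the guest. For the first disk in my setup this looks roughly as follows; device name and output are taken from the mapping shown further down:

[root@asmtest ~]# /usr/lib/udev/scsi_id -g -u -d /dev/sda
0QEMU_QEMU_HARDDISK_drive-scsi1-0-0-0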
The hypothesis
This snippet from the documentation (referenced above) is exactly what I wanted to test:
After you repair the disk, run the SQL statement ALTER DISKGROUP ONLINE DISK. This statement brings a repaired disk group back online to enable writes so that no new writes are missed. This statement also starts a procedure to copy all of the extents that are marked as stale on their redundant copies.
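For reference, the ONLINE statement comes in a per-disk and a disk group-wide variant; I am going to use the latter towards the end of this post:

SQL> alter diskgroup data online disk DATA_0001;
SQL> alter diskgroup data online all;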
Here is the current configuration I am using for my DATA disk group, as seen by ASM:
SQL> select name,state,type,offline_disks from v$asm_diskgroup
  2  where name = 'DATA';

NAME                           STATE       TYPE   OFFLINE_DISKS
------------------------------ ----------- ------ -------------
DATA                           MOUNTED     NORMAL             0

SQL> select name,disk_number,path,mount_status,header_status,mode_status,state,failgroup,repair_timer
  2  from v$asm_disk where group_number = 1 order by disk_number;

NAME            DISK_NUMBER PATH            MOUNT_S HEADER_STATU MODE_ST STATE    FAILGROUP        REPAIR_TIMER
--------------- ----------- --------------- ------- ------------ ------- -------- ---------------- ------------
DATA_0000                 0 /dev/asm-disk02 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                      0
DATA_0001                 1 /dev/asm-disk01 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                      0
DATA_0002                 2 /dev/asm-disk03 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                      0
DATA_0003                 3 /dev/asm-disk06 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                      0
DATA_0004                 4 /dev/asm-disk07 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                      0
DATA_0005                 5 /dev/asm-disk08 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                      0
DATA_0006                 6 /dev/asm-disk04 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                      0
DATA_0007                 7 /dev/asm-disk05 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                      0
You can see 2 failure groups, batch1 and batch2, and all disks are available/online. The operating system mapping works perfectly thanks to my udev rules:
[oracle@asmtest ~]$ ls -l /dev/asm-disk0*
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk01 -> sda1
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk02 -> sdd1
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk03 -> sdc1
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk04 -> sdb1
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk05 -> sde1
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk06 -> sdh1
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk07 -> sdg1
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk08 -> sdf1
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk09 -> sdi1
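Note that the symlinks themselves belong to root; the OWNER, GROUP and MODE settings from the udev rules are applied to the underlying block devices, which is what ASM actually opens. A quick check against the first device, with output assumed to match my rules:

[oracle@asmtest ~]$ ls -l /dev/sda1
brw-rw----. 1 oracle asmdba 8, 1 Sep 29 13:38 /dev/sda1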
Let’s detach a disk and cause a transient failure
Libvirt is really cool: it allows me to modify my VM configuration at runtime. One example is removing disks, using the detach-disk command:
[root@host ~]# virsh help detach-disk
NAME
detach-disk - detach disk device
SYNOPSIS
detach-disk <domain> <target> [--persistent] [--config] [--live] [--current]
DESCRIPTION
Detach disk device.
OPTIONS
[--domain] <string> domain name, id or uuid
[--target] <string> target of disk device
--persistent make live change persistent
--config affect next boot
--live affect running domain
--current affect current domain
But which disk to remove? The disks attached to a VM can be listed using the domblklist command:
[root@host ~]# virsh domblklist asmtest
Target     Source
------------------------------------------------
vda        /path/to/slow/disk/asmtest
vdb        /path/to/slow/disk/asmtest-1
sdh        /path/to/fast/disk/asmtest-asm01.img
sdi        /path/to/fast/disk/asmtest-asm02.img
sdj        /path/to/fast/disk/asmtest-asm03.img
sdk        /path/to/fast/disk/asmtest-asm04.img
sdo        /path/to/fast/disk/asmtest-asm05.img
sdp        /path/to/fast/disk/asmtest-asm06.img
sdq        /path/to/fast/disk/asmtest-asm07.img
sdr        /path/to/fast/disk/asmtest-asm08.img
sdv        /path/to/fast/disk/asmtest-asm09.img
I mentioned previously that the information provided by the host does not necessarily match that of the guest. Since I have persistent device names in my VM, though, there shouldn't be an issue. Let's take a disk away, but only temporarily! Since my O/S and Oracle binaries reside on disks presented to the guest as virtio devices, I know they are named vd[a-z] and better left alone. That leaves sdh as the first disk to remove.
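If you want to double-check which guest block device hides behind a persistent name before pulling anything on the host, readlink resolves it; the output below matches the mapping shown earlier:

[oracle@asmtest ~]$ readlink -f /dev/asm-disk01
/dev/sda1

With that confirmed, time to pull the disk on the host.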
[root@host ~]# virsh detach-disk asmtest sdh --live
Disk detached successfully
And the disk is gone.
And what is happening?
The disk removal has immediate effect on the ASM instance. Switching over to my VM I can see the following information recorded in the ASM alert.log:
2016-09-29 13:45:13.936000 +01:00
WARNING: Write Failed. group:1 disk:1 AU:1 offset:1044480 size:4096
path:/dev/asm-disk01
incarnation:0xe969ccd4 asynchronous result:'I/O error'
subsys:System krq:0x7fda4f0b8210 bufp:0x7fda4f20b000 osderr1:0x69b5 osderr2:0x0
IO elapsed time: 0 usec Time waited on I/O: 0 usec
WARNING: Hbeat write to PST disk 1.3916025044 in group 1 failed. [2]
NOTE: process _b000_+asm (4346) initiating offline of disk 1.3916025044 (DATA_0001) with mask 0x7e in group 1 (DATA) without client assisting
NOTE: checking PST: grp = 1
GMON checking disk modes for group 1 at 13 for pid 22, osid 4346
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: checking PST for grp 1 done.
NOTE: sending set offline flag message (844741558) to 1 disk(s) in group 1
WARNING: Disk 1 (DATA_0001) in group 1 mode 0x7f is now being offlined
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0xe969ccd4, mask = 0x6a, op = clear
GMON updating disk modes for group 1 at 14 for pid 22, osid 4346
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: PST update grp = 1 completed successfully
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0xe969ccd4, mask = 0x7e, op = clear
GMON updating disk modes for group 1 at 15 for pid 22, osid 4346
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: cache closing disk 1 of grp 1: DATA_0001
NOTE: PST update grp = 1 completed successfully
2016-09-29 13:46:53.300000 +01:00
WARNING: Started Drop Disk Timeout for Disk 1 (DATA_0001) in group 1 with a value 600
WARNING: Disk 1 (DATA_0001) in group 1 will be dropped in: (600) secs on ASM inst 1
I also had a journalctl -f running in a different session. Unsurprisingly the O/S also noticed the disk disappearing:
Sep 29 13:45:13 asmtest kernel: sd 2:0:0:0: [sda] Synchronizing SCSI cache
Sep 29 13:45:13 asmtest kernel: sd 2:0:0:0: [sda]
Sep 29 13:45:13 asmtest kernel: Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Sep 29 13:45:13 asmtest kernel: sd 2:0:0:0: [sda]
Sep 29 13:45:13 asmtest kernel: Sense Key : Illegal Request [current]
Sep 29 13:45:13 asmtest kernel: sd 2:0:0:0: [sda]
Sep 29 13:45:13 asmtest kernel: Add. Sense: Logical unit not supported
A transient failure
In yet another session I can see that the disk is about to go: the repair timer has started ticking and is decremented as time passes. This did not happen straight away; there was a little delay of maybe a couple of minutes after the disk was removed.
SQL> r
  1  select name,disk_number,path,mount_status,header_status,mode_status,state,failgroup,repair_timer
  2* from v$asm_disk where group_number = 1 order by disk_number

NAME            DISK_NUMBER PATH            MOUNT_S HEADER_STATU MODE_ST STATE    FAILGROUP       REPAIR_TIMER
--------------- ----------- --------------- ------- ------------ ------- -------- --------------- ------------
DATA_0000                 0 /dev/asm-disk02 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0001                 1                 MISSING UNKNOWN      OFFLINE NORMAL   BATCH1                   600
DATA_0002                 2 /dev/asm-disk03 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0003                 3 /dev/asm-disk06 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0004                 4 /dev/asm-disk07 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0005                 5 /dev/asm-disk08 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0006                 6 /dev/asm-disk04 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0007                 7 /dev/asm-disk05 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0

8 rows selected.

SQL> r
  1  select name,disk_number,path,mount_status,header_status,mode_status,state,failgroup,repair_timer
  2* from v$asm_disk where group_number = 1 order by disk_number

NAME            DISK_NUMBER PATH            MOUNT_S HEADER_STATU MODE_ST STATE    FAILGROUP       REPAIR_TIMER
--------------- ----------- --------------- ------- ------------ ------- -------- --------------- ------------
DATA_0000                 0 /dev/asm-disk02 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0001                 1                 MISSING UNKNOWN      OFFLINE NORMAL   BATCH1                   416
DATA_0002                 2 /dev/asm-disk03 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0003                 3 /dev/asm-disk06 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0004                 4 /dev/asm-disk07 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0005                 5 /dev/asm-disk08 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0006                 6 /dev/asm-disk04 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0007                 7 /dev/asm-disk05 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0

8 rows selected.
Another day I would like to run an additional test to see what happens when the repair_timer is actually down to 0, but for now let’s add the disk back in. On the host, I use libvirt again for this:
[root@host ~]# virsh attach-disk asmtest /path/to/fast/disk/asmtest-asm01.img sdh --live
Disk attached successfully
As soon as the prompt returns, I can see the guest O/S picking the disk up.
Sep 29 13:54:09 asmtest kernel: scsi 2:0:0:0: Direct-Access     QEMU     QEMU HARDDISK    1.5. PQ: 0 ANSI: 5
Sep 29 13:54:09 asmtest kernel: sd 2:0:0:0: Attached scsi generic sg0 type 0
Sep 29 13:54:09 asmtest kernel: sd 2:0:0:0: [sda] 10485760 512-byte logical blocks: (5.36 GB/5.00 GiB)
Sep 29 13:54:09 asmtest kernel: sd 2:0:0:0: [sda] Write Protect is off
Sep 29 13:54:09 asmtest kernel: sd 2:0:0:0: [sda] Mode Sense: 63 00 00 08
Sep 29 13:54:09 asmtest kernel: sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Sep 29 13:54:09 asmtest kernel: sda: sda1
Sep 29 13:54:09 asmtest kernel: sd 2:0:0:0: [sda] Attached SCSI disk
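Before going back to ASM, a quick sanity check shows that udev has recreated the persistent name for the returning disk (assuming it comes back as sda, as in the kernel messages above):

[oracle@asmtest ~]$ ls -l /dev/asm-disk01
lrwxrwxrwx. 1 root root 4 Sep 29 13:54 /dev/asm-disk01 -> sda1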
The documentation states that I should now be able to online the disk again, which I tried. The result is shown below:
SQL> alter diskgroup data online all;

Diskgroup altered.
I am showing you the complete ASM alert.log output here; feel free to skip to the bottom:
2016-09-29 13:54:44.774000 +01:00
SQL> alter diskgroup data online all
NOTE: cache closing disk 1 of grp 1: (not open) DATA_0001
NOTE: initiating resync of disk group 1 disks DATA_0001 (1)
NOTE: process _user3750_+asm (3750) initiating offline of disk 1.3916025044 (DATA_0001) with mask 0x7e in group 1 (DATA) without client assisting
NOTE: sending set offline flag message (4172832432) to 1 disk(s) in group 1
WARNING: Disk 1 (DATA_0001) in group 1 mode 0x1 is now being offlined
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0xe969ccd4, mask = 0x6a, op = clear
GMON updating disk modes for group 1 at 17 for pid 20, osid 3750
NOTE: cache closing disk 1 of grp 1: (not open) DATA_0001
NOTE: PST update grp = 1 completed successfully
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0xe969ccd4, mask = 0x7e, op = clear
GMON updating disk modes for group 1 at 18 for pid 20, osid 3750
NOTE: cache closing disk 1 of grp 1: (not open) DATA_0001
NOTE: PST update grp = 1 completed successfully
NOTE: requesting all-instance membership refresh for group=1
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0x0, mask = 0x11, op = assign
GMON updating disk modes for group 1 at 19 for pid 20, osid 3750
NOTE: cache closing disk 1 of grp 1: (not open) DATA_0001
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: PST update grp = 1 completed successfully
NOTE: requesting all-instance disk validation for group=1
NOTE: disk validation pending for 1 disk in group 1/0xceb93c35 (DATA)
NOTE: Found /dev/asm-disk01 for disk DATA_0001
NOTE: completed disk validation for 1/0xceb93c35 (DATA)
NOTE: discarding redo for group 1 disk 1
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0x0, mask = 0x19, op = assign
GMON updating disk modes for group 1 at 20 for pid 20, osid 3750
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: PST update grp = 1 completed successfully
NOTE: membership refresh pending for group 1/0xceb93c35 (DATA)
GMON querying group 1 at 21 for pid 16, osid 2535
NOTE: cache opening disk 1 of grp 1: DATA_0001 path:/dev/asm-disk01
SUCCESS: refreshed membership for 1/0xceb93c35 (DATA)
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0x0, mask = 0x5d, op = assign
GMON updating disk modes for group 1 at 22 for pid 20, osid 3750
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: PST update grp = 1 completed successfully
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0x0, mask = 0x7d, op = assign
GMON updating disk modes for group 1 at 23 for pid 20, osid 3750
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: PST update grp = 1 completed successfully
SUCCESS: alter diskgroup data online all
NOTE: Attempting voting file refresh on diskgroup DATA
NOTE: starting rebalance of group 1/0xceb93c35 (DATA) at power 1
Starting background process ARB0
ARB0 started with pid=23, OS id=4986
NOTE: assigning ARB0 to group 1/0xceb93c35 (DATA) with 1 parallel I/O
2016-09-29 13:54:48.265000 +01:00
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0x0, mask = 0x7f, op = assign
GMON updating disk modes for group 1 at 24 for pid 23, osid 4986
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: PST update grp = 1 completed successfully
NOTE: reset timers for disk: 1
NOTE: completed online of disk group 1 disks DATA_0001 (1)
2016-09-29 13:54:49.580000 +01:00
NOTE: stopping process ARB0
NOTE: requesting all-instance membership refresh for group=1
SUCCESS: rebalance completed for group 1/0xceb93c35 (DATA)
NOTE: membership refresh pending for group 1/0xceb93c35 (DATA)
GMON querying group 1 at 25 for pid 16, osid 2535
SUCCESS: refreshed membership for 1/0xceb93c35 (DATA)
2016-09-29 13:54:51.286000 +01:00
NOTE: Attempting voting file refresh on diskgroup DATA
This looks promising, but what about V$ASM_DISK?
SQL> select name,disk_number,path,mount_status,header_status,mode_status,state,failgroup,repair_timer
  2  from v$asm_disk where group_number = 1 order by disk_number
  3  /

NAME            DISK_NUMBER PATH            MOUNT_S HEADER_STATU MODE_ST STATE    FAILGROUP       REPAIR_TIMER
--------------- ----------- --------------- ------- ------------ ------- -------- --------------- ------------
DATA_0000                 0 /dev/asm-disk02 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0001                 1 /dev/asm-disk01 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0002                 2 /dev/asm-disk03 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0003                 3 /dev/asm-disk06 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0004                 4 /dev/asm-disk07 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0005                 5 /dev/asm-disk08 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0006                 6 /dev/asm-disk04 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0007                 7 /dev/asm-disk05 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0

8 rows selected.

SQL>
Hurray, I have my disk back! And at no point did I have to sit through a full rebalance operation; ASM only resynchronised the extents marked stale while the disk was offline. Thanks, ASM!
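Had I wanted to watch the resync while it was in flight, v$asm_operation is the view to query; a minimal example, bearing in mind the row only exists while an operation is actually running:

SQL> select group_number, operation, pass, state, power, est_minutes
  2  from v$asm_operation;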