Hello everyone, I have a cluster with 4 oVirt Node hosts and a self-hosted engine, with Fibre Channel
storage both for the storage domains and for boot-from-SAN of the oVirt Node hosts themselves (the
nodes have no physical disks and boot from the SAN). We performed a software update of the SAN storage
and one of the multipath links went down for a few seconds before coming back up. There were no issues
with the VMs, but oVirt Node marked a couple of mount points as read-only:
/dev/mapper/onn-var on /var type ext4
(ro,relatime,seclabel,discard,stripe=16,data=ordered)
/dev/mapper/onn-var_log on /var/log type ext4
(ro,relatime,seclabel,discard,stripe=16,data=ordered)
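In case it is relevant, this is roughly how the ext4 error policy and any recorded
filesystem errors on those volumes can be checked (just a sketch, using our device names):

    # show the configured error behaviour and any recorded fs errors
    tune2fs -l /dev/mapper/onn-var | grep -i error
    tune2fs -l /dev/mapper/onn-var_log | grep -i error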
Every other mount point stayed read/write, but not /var... We had to reboot every single node to
restore the read/write mounts. These are the last log entries written before the filesystems went
read-only:
Mar 4 11:15:59 ovirt-node1 kernel: sd 1:0:0:4: [sds] FAILED Result: hostbyte=DID_ERROR
driverbyte=DRIVER_OK
Mar 4 11:15:59 ovirt-node1 kernel: sd 1:0:0:4: [sds] CDB: Read(16) 88 00 00 00 00 01 64
5f 9a 18 00 00 00 20 00 00
Mar 4 11:15:59 ovirt-node1 kernel: blk_update_request: I/O error, dev sds, sector
5978954264
Mar 4 11:15:59 ovirt-node1 kernel: device-mapper: multipath: Failing path 65:32.
Mar 4 11:15:59 ovirt-node1 multipathd: sds: mark as failed
Mar 4 11:15:59 ovirt-node1 multipathd: 36001738c7c8069bb00000000000135eb: remaining
active paths: 3
Mar 4 11:16:11 ovirt-node1 sanlock[10535]: 2021-03-04 11:16:11 28402637 [13668]: s4
delta_renew read timeout 10 sec offset 0 /dev/58b54b5f-993b-4710-b107-7744018d22b
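As an aside: once the failed path was back, should an in-place recovery like the one below
have worked, or does an aborted ext4 journal always force a reboot? (A sketch only, not
something we have verified.)

    # try to put the filesystems back to read/write without a reboot
    mount -o remount,rw /var
    mount -o remount,rw /var/log
    # if the journal was aborted these will likely be refused, and an
    # e2fsck would be needed, which is not practical on a mounted /var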
And this is the multipath -ll output:
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX dm-2 IBM ,XXXXXX
size=237G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
|- 0:0:0:2 sdc 8:32 active ready running
|- 1:0:0:2 sdg 8:96 active ready running
|- 1:0:1:2 sdn 8:208 active ready running
`- 0:0:1:2 sdr 65:16 active ready running
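To prevent this we are considering a per-LUN multipath override for the boot/OS LUN, so that
I/O keeps queueing instead of erroring out while paths flap. Is something like the following
the right direction? This is only a sketch: the wwid is a placeholder for our boot LUN, and
since we understand VDSM manages /etc/multipath.conf itself, the override would go in a
drop-in under /etc/multipath/conf.d/:

    # /etc/multipath/conf.d/bootlun.conf (sketch, placeholder wwid)
    multipaths {
        multipath {
            wwid <WWID_OF_BOOT_LUN>
            no_path_retry queue
        }
    }
    # then reload the multipath configuration
    multipathd reconfigure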
Any idea on what happened and how to prevent it? Thank you in advance for your help.