On Thu, Oct 10, 2019 at 1:10 PM Francesco Romani <fromani@redhat.com> wrote:
On 10/10/19 10:44 AM, Gianluca Cecchi wrote:
On Thu, Oct 10, 2019 at 9:56 AM Francesco Romani <fromani@redhat.com> wrote:

The only way Vdsm will not pause the VM is if libvirt+qemu never reports any ioerror, which is something I'm not sure is possible and that I'd never recommend anyway.

Vdsm always tries hard to be super-careful with respect to possible data corruption.


OK.
Storage that is not accessible for a few seconds is more a matter of blocked I/O than of data corruption.


True, but we can only know ex post that the storage was just temporarily unavailable, can't we?


Yes, but I would like to have an option to say: don't do anything for X seconds, both at host level and at guest level.
X could be 5 seconds, or 10 seconds, or 20 seconds... depending on the need.
 

If no other host powers on the VM I think there is no risk of data corruption itself, or at least no more than when you have a physical server and for some reason the I/O operations to its physical disks (local or on a SAN) are blocked for some tens of seconds.


IMO, storage that is unresponsive for tens of seconds should be uncommon and very alarming in every circumstance, especially for physical servers.

What I'm trying to say is that yes, there probably are ways to sidestep this behaviour, but I think this is the wrong direction and adds fragility rather than convenience to the system.

In general I agree with you on this.

 


So I think that if I want to modify the behavior at all, I have to change the options so that I get "report" for both write and read errors on virtual disks.


Yep. I don't remember what Engine allows. Worst case you can use a hook, but once again this makes things a bit more fragile.


I'm only experimenting to see the different options for managing "temporary" problems at storage level, which often resolve without manual action within tens of seconds and are sometimes due to incorrect operations at levels managed by other teams (network, storage, etc.).


I think the best option is to improve the current behaviour: learn why Vdsm fails to unpause the VM and fix that.



Yes, I'm just experimenting with the possible options and their pros and cons.

I see that in my 4.3.6 environment with plain CentOS 7.7 hosts the qemu-kvm process is spawned with "werror=stop,rerror=stop" for all virtual disks.
I didn't find any related option in the VM edit page.
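
For anyone who wants to repeat this check, here is a minimal Python sketch that pulls the werror/rerror values out of a qemu command line. It assumes the policies appear as comma-separated key=value pairs inside a single argument; the function name and the sample -drive argument are made up for illustration and are not taken from the hosts in this thread.

```python
def error_policies(args):
    """Return one (werror, rerror) tuple per argument that sets either option."""
    found = []
    for arg in args:
        # Split "k1=v1,k2=v2,..." into a dict, skipping parts without "=".
        opts = dict(part.split("=", 1) for part in arg.split(",") if "=" in part)
        if "werror" in opts or "rerror" in opts:
            found.append((opts.get("werror"), opts.get("rerror")))
    return found


# Hypothetical sample; for a live process the argument list could be obtained
# by splitting /proc/<pid>/cmdline on NUL bytes instead.
sample = ["-drive", "file=/dev/vg0/disk1,format=raw,if=none,werror=stop,rerror=stop"]
print(error_policies(sample))
```

An argument that sets only one of the two options yields None for the other, which makes a disk running with qemu's default for that option easy to spot.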

On my Fedora 30 system, when I start a VM (with virt-manager or "virsh start"), I see that the options are not present on the command line, and according to the qemu-kvm manual page:
"
The default setting is werror=enospc and rerror=report
"
 
In the meantime I created a wrapper script for qemu-kvm that changes the command line:

1) 
from werror=stop to werror=report
and
from rerror=stop to rerror=report
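
The wrapper script itself is not shown in this message, so the following is only a sketch of what such a rewrite might look like in Python. It assumes the real binary has been moved aside to a hypothetical path (REAL_QEMU below) and that the policies appear literally as "werror=stop" and "rerror=stop" inside comma-separated arguments; all names are illustrative.

```python
import os

# Hypothetical location of the real binary after it was renamed to make room
# for the wrapper; adjust to the actual setup.
REAL_QEMU = "/usr/libexec/qemu-kvm.real"


def rewrite_policy(args, policy):
    """Return args with werror=stop / rerror=stop replaced by the given policy."""
    rewritten = []
    for arg in args:
        parts = [
            part.replace("=stop", "=" + policy)
            if part in ("werror=stop", "rerror=stop")
            else part
            for part in arg.split(",")
        ]
        rewritten.append(",".join(parts))
    return rewritten


def exec_real_qemu(args, policy="report"):
    """Replace the current process with the real qemu-kvm, using rewritten args."""
    os.execv(REAL_QEMU, [REAL_QEMU] + rewrite_policy(args, policy))


# Demonstration on a made-up argument list (nothing is executed here).
demo = ["-drive", "file=/dev/vg0/disk1,if=none,werror=stop,rerror=stop,cache=none"]
print(rewrite_policy(demo, "report"))
```

A real wrapper would end by calling exec_real_qemu(sys.argv[1:]) after importing sys. Passing policy="ignore" instead gives the second variant tried later in this message.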

This seems worse: as expected, the VM is not paused at all, but it behaves strangely inside.
From the host's point of view:
[root@ov300 ~]# virsh -r list
 Id    Name                           State
----------------------------------------------------
 7     mydbsrv                        running

In the VM's /var/log/messages I suddenly get something like:

Oct 10 12:42:55 mydbsrv kernel: sd 2:0:0:1: [sdc] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 10 12:42:55 mydbsrv kernel: sd 2:0:0:1: [sdc] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 10 12:42:55 mydbsrv kernel: sd 2:0:0:1: [sdc] Sense Key : Aborted Command [current]
Oct 10 12:42:55 mydbsrv kernel: sd 2:0:0:1: [sdc] Add. Sense: I/O process terminated
Oct 10 12:42:55 mydbsrv kernel: sd 2:0:0:1: [sdc] CDB: Write(10) 2a 00 03 07 a8 78 00 00 08 00
Oct 10 12:42:55 mydbsrv kernel: blk_update_request: I/O error, dev sdc, sector 50833528
Oct 10 12:42:55 mydbsrv kernel: EXT4-fs warning (device dm-3): ext4_end_bio:322: I/O error -5 writing to inode 1573304 (offset 0 size 0 starting block 6353935)
Oct 10 12:42:55 mydbsrv kernel: Buffer I/O error on device dm-3, logical block 6353935
Oct 10 12:42:55 mydbsrv kernel: sd 2:0:0:1: [sdc] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 10 12:42:55 mydbsrv kernel: sd 2:0:0:1: [sdc] Sense Key : Aborted Command [current]
Oct 10 12:42:55 mydbsrv kernel: sd 2:0:0:1: [sdc] Add. Sense: I/O process terminated
Oct 10 12:42:55 mydbsrv kernel: sd 2:0:0:1: [sdc] CDB: Write(10) 2a 00 03 07 a8 98 00 00 08 00
Oct 10 12:42:55 mydbsrv kernel: blk_update_request: I/O error, dev sdc, sector 50833560
Oct 10 12:42:55 mydbsrv kernel: EXT4-fs warning (device dm-3): ext4_end_bio:322: I/O error -5 writing to inode 1573308 (offset 0 size 0 starting block 6353939)
...

and apparently only shell builtin commands work inside the VM, so a power off (from Engine) and power on is needed anyway:

[root@mydbsrv ~]# uptime
-bash: uptime: command not found
[root@mydbsrv ~]# df -h
-bash: df: command not found
[root@mydbsrv ~]# id
-bash: id: command not found
[root@mydbsrv ~]# ll
-bash: ls: command not found
[root@mydbsrv ~]#
[root@mydbsrv ~]# jobs
[root@mydbsrv ~]# ps
-bash: ps: command not found
[root@mydbsrv ~]# sync
-bash: sync: command not found
[root@mydbsrv ~]# pwd
/root
[root@mydbsrv ~]# ls
-bash: ls: command not found
[root@mydbsrv ~]# /bin/ls
-bash: /bin/ls: Input/output error
[root@mydbsrv ~]# type mount
-bash: type: mount: not found
[root@mydbsrv ~]# /bin/mount -o remount,rw /myfs
-bash: /bin/mount: Input/output error
[root@mydbsrv ~]# tail /var/log/messages
-bash: tail: command not found
[root@mydbsrv ~]# cat /var/log/messages
-bash: cat: command not found
[root@mydbsrv ~]# echo some_word
some_word
[root@mydbsrv ~]# 

Even after the storage is accessible again, the wrong behavior continues.

2) 
from werror=stop to werror=ignore
and
from rerror=stop to rerror=ignore

In my opinion this is the least intrusive approach with respect to the VM, letting it discover by itself that there is a problem.
In this case I get the following in the VM's /var/log/messages:

Oct 10 12:54:00 mydbsrv chronyd[4133]: Selected source 131.175.12.3
Oct 10 12:54:00 mydbsrv chronyd[4133]: System clock wrong by 1.065994 seconds, adjustment started
Oct 10 12:54:00 mydbsrv systemd: Time has been changed
Oct 10 12:54:00 mydbsrv chronyd[4133]: System clock was stepped by 1.065994 seconds
Oct 10 12:54:23 mydbsrv kernel: EXT4-fs error (device dm-3): ext4_mb_generate_buddy:757: group 289, block bitmap and bg descriptor inconsistent: 30748 vs 28302 free clusters
Oct 10 12:54:23 mydbsrv kernel: EXT4-fs error (device dm-3): ext4_mb_generate_buddy:757: group 290, block bitmap and bg descriptor inconsistent: 31999 vs 32768 free clusters
Oct 10 12:54:23 mydbsrv kernel: EXT4-fs error (device dm-3): ext4_mb_generate_buddy:757: group 291, block bitmap and bg descriptor inconsistent: 28880 vs 32768 free clusters
Oct 10 12:54:23 mydbsrv kernel: EXT4-fs error (device dm-3): ext4_mb_generate_buddy:757: group 292, block bitmap and bg descriptor inconsistent: 32478 vs 32768 free clusters
Oct 10 12:54:23 mydbsrv kernel: EXT4-fs error (device dm-3): ext4_mb_generate_buddy:757: group 293, block bitmap and bg descriptor inconsistent: 32698 vs 32768 free clusters
Oct 10 12:54:23 mydbsrv kernel: EXT4-fs error (device dm-3): ext4_mb_generate_buddy:757: group 294, block bitmap and bg descriptor inconsistent: 32151 vs 32768 free clusters
Oct 10 12:54:23 mydbsrv kernel: EXT4-fs error (device dm-3): ext4_mb_generate_buddy:757: group 295, block bitmap and bg descriptor inconsistent: 31925 vs 32768 free clusters
Oct 10 12:54:23 mydbsrv kernel: EXT4-fs error (device dm-3): ext4_mb_generate_buddy:757: group 296, block bitmap and bg descriptor inconsistent: 32468 vs 32768 free clusters
Oct 10 12:54:23 mydbsrv kernel: EXT4-fs error (device dm-3): ext4_mb_generate_buddy:757: group 0, block bitmap and bg descriptor inconsistent: 32768 vs 1496 free clusters
Oct 10 12:54:23 mydbsrv kernel: EXT4-fs error (device dm-3): ext4_mb_generate_buddy:757: group 1, block bitmap and bg descriptor inconsistent: 32768 vs 279 free clusters
Oct 10 12:54:23 mydbsrv kernel: EXT4-fs error (device dm-2): ext4_mb_generate_buddy:757: group 112, block bitmap and bg descriptor inconsistent: 32767 vs 23733 free clusters
Oct 10 12:54:23 mydbsrv kernel: EXT4-fs error (device dm-2): ext4_find_dest_de:1829: inode #928680: block 3679007: comm ora_m000_test: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
Oct 10 12:54:23 mydbsrv kernel: EXT4-fs error (device dm-2): ext4_find_dest_de:1829: inode #928680: block 3679007: comm ora_m002_test: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
Oct 10 12:54:23 mydbsrv kernel: EXT4-fs error (device dm-2): ext4_find_dest_de:1829: inode #928680: block 3679007: comm ora_m005_test: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
Oct 10 12:54:27 mydbsrv kernel: JBD2: Spotted dirty metadata buffer (dev = dm-3, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
Oct 10 12:54:28 mydbsrv kernel: EXT4-fs error (device dm-2): ext4_find_dest_de:1829: inode #928680: block 3679007: comm ora_cjq0_test: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
Oct 10 12:54:31 mydbsrv kernel: EXT4-fs error (device dm-2): ext4_find_dest_de:1829: inode #928680: block 3679007: comm ora_pmon_test: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
Oct 10 12:54:49 mydbsrv kernel: EXT4-fs error (device dm-2): ext4_find_dest_de:1829: inode #928680: block 3679007: comm ora_reco_test: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
Oct 10 12:54:49 mydbsrv kernel: EXT4-fs error (device dm-2): ext4_find_dest_de:1829: inode #928680: block 3679007: comm ora_tmon_test: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
Oct 10 12:54:56 mydbsrv kernel: EXT4-fs error (device dm-2): ext4_find_dest_de:1829: inode #928680: block 3679007: comm ora_dia0_test: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
Oct 10 12:54:58 mydbsrv kernel: EXT4-fs error: 49 callbacks suppressed
Oct 10 12:54:58 mydbsrv kernel: EXT4-fs error (device dm-3): ext4_mb_generate_buddy:757: group 192, block bitmap and bg descriptor inconsistent: 32379 vs 24527 free clusters
Oct 10 12:54:58 mydbsrv kernel: EXT4-fs error (device dm-2): ext4_find_dest_de:1829: inode #928680: block 3679007: comm ora_tt00_test: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
Oct 10 12:55:07 mydbsrv chronyd[4133]: Selected source 131.175.12.6
Oct 10 12:55:21 mydbsrv kernel: EXT4-fs error (device dm-2): ext4_find_dest_de:1829: inode #928680: block 3679007: comm ora_gen1_test: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
Oct 10 12:55:22 mydbsrv kernel: EXT4-fs error (device dm-2): ext4_find_dest_de:1829: inode #928680: block 3679007: comm ora_mman_test: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
Oct 10 12:55:22 mydbsrv kernel: EXT4-fs error (device dm-2): ext4_find_dest_de:1829: inode #928680: block 3679007: comm ora_psp0_test: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
Oct 10 12:55:22 mydbsrv kernel: EXT4-fs error (device dm-2): ext4_find_dest_de:1829: inode #928680: block 3679007: comm ora_dbrm_test: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
Oct 10 12:55:22 mydbsrv kernel: EXT4-fs error (device dm-2): ext4_find_dest_de:1829: inode #928680: block 3679007: comm ora_pxmn_test: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
Oct 10 12:55:22 mydbsrv kernel: EXT4-fs error (device dm-2): ext4_find_dest_de:1829: inode #928680: block 3679007: comm ora_smco_test: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
Oct 10 12:55:22 mydbsrv kernel: EXT4-fs error (device dm-2): ext4_find_dest_de:1829: inode #928680: block 3679007: comm ora_mmnl_test: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
Oct 10 12:55:26 mydbsrv kernel: EXT4-fs error (device dm-2): ext4_find_dest_de:1829: inode #928680: block 3679007: comm ora_lgwr_test: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
Oct 10 12:55:26 mydbsrv kernel: EXT4-fs error (device dm-2): ext4_find_dest_de:1829: inode #928680: block 3679007: comm ora_gen0_test: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
Oct 10 12:55:26 mydbsrv kernel: EXT4-fs error (device dm-2): ext4_find_dest_de:1829: inode #928680: block 3679007: comm ora_mmon_test: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0

and then, when the 100 seconds of artificially blocked storage access end, the VM becomes accessible again.

[root@mydbsrv ~]# id
uid=0(root) gid=0(root) groups=0(root)
[root@mydbsrv ~]# uptime
 12:56:36 up 3 min,  1 user,  load average: 0.28, 0.38, 0.17
[root@mydbsrv ~]# 

[root@mydbsrv log]# time dd if=/dev/zero bs=1024k count=10240 of=/myfs/testfile
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 60.785 s, 177 MB/s

real 1m1.771s
user 0m0.016s
sys 0m10.459s
[root@mydbsrv log]# 

Obviously, as the messages themselves mention, this behavior could potentially lead to fs/journal corruption...
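
To see which filesystems would need a check after such a test, the "EXT4-fs error" lines can be tallied per device. A minimal Python sketch; the function name is hypothetical, and the sample lines are copied from the log above.

```python
import re
from collections import Counter

# Matches e.g. "EXT4-fs error (device dm-3): ..." and captures the device name.
EXT4_ERR = re.compile(r"EXT4-fs error \(device ([^)]+)\)")


def ext4_errors_by_device(lines):
    """Count EXT4-fs error lines per device; lines without a device are skipped."""
    counts = Counter()
    for line in lines:
        match = EXT4_ERR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    "Oct 10 12:54:23 mydbsrv kernel: EXT4-fs error (device dm-3): ext4_mb_generate_buddy:757: group 289, block bitmap and bg descriptor inconsistent: 30748 vs 28302 free clusters",
    "Oct 10 12:54:23 mydbsrv kernel: EXT4-fs error (device dm-2): ext4_find_dest_de:1829: inode #928680: block 3679007: comm ora_m000_test: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0",
    "Oct 10 12:54:58 mydbsrv kernel: EXT4-fs error: 49 callbacks suppressed",
]
print(dict(ext4_errors_by_device(sample)))
```

Note that the kernel rate-limits these messages ("49 callbacks suppressed"), so the counts are a lower bound and only indicate which devices were affected.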
Gianluca