Hello,
in the environment in the subject, I downloaded from the Glance repository the
CentOS 8 image
"CentOS 8 Generic Cloud Image v20200113.3 for x86_64 (5e35c84)"
and imported it as a template.
I created a VM based on it (I got the message: "In order to create a VM
from a template with a different chipset, device configuration will be
changed. This may affect functionality of the guest software. Are you sure
you want to proceed?")
When running "dnf update", during I/O of packages updates the VM went into
pause.
VM c8desktop started on Host
novirt2.example.net 5/28/20 1:29:04 PM
VM c8desktop has been paused. 5/28/20 1:41:52 PM
VM c8desktop has been paused due to unknown storage error. 5/28/20 1:41:52 PM
VM c8desktop has recovered from paused back to up. 5/28/20 1:43:50 PM
In the messages log of the (nested) host I see:
May 28 13:28:06 novirt2 systemd-machined[1497]: New machine
qemu-7-c8desktop.
May 28 13:28:06 novirt2 systemd[1]: Started Virtual Machine
qemu-7-c8desktop.
May 28 13:28:06 novirt2 kvm[57798]: 2 guests now active
May 28 13:28:07 novirt2 journal[13368]: Guest agent is not responding: QEMU
guest agent is not connected
May 28 13:28:12 novirt2 journal[13368]: Guest agent is not responding: QEMU
guest agent is not connected
May 28 13:28:17 novirt2 journal[13368]: Guest agent is not responding: QEMU
guest agent is not connected
May 28 13:28:22 novirt2 journal[13368]: Guest agent is not responding: QEMU
guest agent is not connected
May 28 13:28:27 novirt2 journal[13368]: Guest agent is not responding: QEMU
guest agent is not connected
May 28 13:28:32 novirt2 journal[13368]: Guest agent is not responding: QEMU
guest agent is not connected
May 28 13:28:37 novirt2 journal[13368]: Domain id=7 name='c8desktop'
uuid=63e27cb5-087d-435e-bf61-3fe25e3319d6 is tainted: custom-ga-command
May 28 13:28:37 novirt2 journal[26984]: Cannot open log file:
'/var/log/libvirt/qemu/c8desktop.log': Device or resource busy
May 28 13:28:37 novirt2 journal[13368]: Cannot open log file:
'/var/log/libvirt/qemu/c8desktop.log': Device or resource busy
May 28 13:28:37 novirt2 journal[13368]: Unable to open domainlog
May 28 13:30:00 novirt2 systemd[1]: Starting system activity accounting
tool...
May 28 13:30:00 novirt2 systemd[1]: Started system activity accounting tool.
May 28 13:37:21 novirt2 python3[62512]: detected unhandled Python exception
in '/usr/lib/python3.6/site-packages/vdsm/gluster/gfapi.py'
May 28 13:37:21 novirt2 abrt-server[62514]: Deleting problem directory
Python3-2020-05-28-13:37:21-62512 (dup of Python3-2020-05-28-10:32:57-29697)
May 28 13:37:21 novirt2 dbus-daemon[1502]: [system] Activating service
name='org.freedesktop.problems' requested by ':1.3111' (uid=0 pid=62522
comm="/usr/libexec/platform-python /usr/bin/abrt-action-"
label="system_u:system_r:abrt_t:s0-s0:c0.c1023") (using servicehelper)
May 28 13:37:21 novirt2 dbus-daemon[1502]: [system] Successfully activated
service 'org.freedesktop.problems'
May 28 13:37:21 novirt2 abrt-server[62514]: /bin/sh:
reporter-systemd-journal: command not found
May 28 13:37:21 novirt2 python3[62550]: detected unhandled Python exception
in '/usr/lib/python3.6/site-packages/vdsm/gluster/gfapi.py'
May 28 13:37:21 novirt2 abrt-server[62552]: Not saving repeating crash in
'/usr/lib/python3.6/site-packages/vdsm/gluster/gfapi.py'
May 28 13:37:22 novirt2 python3[62578]: detected unhandled Python exception
in '/usr/lib/python3.6/site-packages/vdsm/gluster/gfapi.py'
May 28 13:37:22 novirt2 abrt-server[62584]: Not saving repeating crash in
'/usr/lib/python3.6/site-packages/vdsm/gluster/gfapi.py'
May 28 13:40:00 novirt2 systemd[1]: Starting system activity accounting
tool...
May 28 13:40:00 novirt2 systemd[1]: Started system activity accounting tool.
In the log of the related Gluster volume where the VM disk resides I have:
[2020-05-28 11:41:33.892074] W [MSGID: 114031]
[client-rpc-fops_v2.c:679:client4_0_writev_cbk] 0-vmstore-client-0: remote
operation failed [Invalid argument]
[2020-05-28 11:41:33.892140] W [fuse-bridge.c:2925:fuse_writev_cbk]
0-glusterfs-fuse: 348168: WRITE => -1
gfid=35ae86e8-0ccd-48b8-9ef2-6ca9a108ccf9 fd=0x7fd1d800cf38 (Invalid
argument)
[2020-05-28 11:41:33.902984] I [MSGID: 133022]
[shard.c:3693:shard_delete_shards] 0-vmstore-shard: Deleted shards of
gfid=35ae86e8-0ccd-48b8-9ef2-6ca9a108ccf9 from backend
[2020-05-28 11:41:52.434362] E [MSGID: 133010]
[shard.c:2339:shard_common_lookup_shards_cbk] 0-vmstore-shard: Lookup on
shard 6 failed. Base file gfid = 3e12e7fe-6a77-41b8-932a-d4f50c41ac00 [No
such file or directory]
[2020-05-28 11:41:52.434423] W [fuse-bridge.c:2925:fuse_writev_cbk]
0-glusterfs-fuse: 353565: WRITE => -1
gfid=3e12e7fe-6a77-41b8-932a-d4f50c41ac00 fd=0x7fd208093fb8 (No such file
or directory)
[2020-05-28 11:46:34.095697] W [MSGID: 114031]
[client-rpc-fops_v2.c:679:client4_0_writev_cbk] 0-vmstore-client-0: remote
operation failed [Invalid argument]
[2020-05-28 11:46:34.095758] W [fuse-bridge.c:2925:fuse_writev_cbk]
0-glusterfs-fuse: 384006: WRITE => -1
gfid=7b804a1a-1734-4bec-b8f4-9ba33ffefe8b fd=0x7fd1d0005fd8 (Invalid
argument)
[2020-05-28 11:46:34.104494] I [MSGID: 133022]
[shard.c:3693:shard_delete_shards] 0-vmstore-shard: Deleted shards of
gfid=7b804a1a-1734-4bec-b8f4-9ba33ffefe8b from backend
These are very similar to the sharding messages I got in 4.3.9 on a single
host with Gluster during "heavy"/sudden I/O operations on thin provisioned
disks...
I see that in 4.4 Gluster is glusterfs-7.5-1.el8.x86_64.
Could this be a problem that only appears on a single host, since there is
indeed no data travelling over the network to sync nodes and the sharding
feature, for some reason, is not able to keep pace when the local disk is
very fast?
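For completeness, the shard settings on the volume ("vmstore", as seen in the
logs above) can be verified with something like:

# show whether sharding is enabled and with which block size
gluster volume get vmstore features.shard
gluster volume get vmstore features.shard-block-size
# full view of the volume layout and the options set on it
gluster volume info vmstore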
In 4.3.9 on a single host with Gluster 6.8-1 the only way I found to solve it
was to disable sharding, and that finally gave me stability; see here:
https://lists.ovirt.org/archives/list/users@ovirt.org/thread/OIN4R63I6ITO...
I am still waiting for comments from the Gluster devs on the logs provided at
that time.
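For the record, disabling sharding is a one-liner of this form (I am using
"vmstore" as the volume name only as an example; note that the Gluster
documentation advises against turning sharding off on a volume that already
holds sharded data):

gluster volume set vmstore features.shard off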
As I already wrote, in my opinion the single-host wizard should automatically
turn sharding off, because in that environment it can make thin provisioned
disks unusable.
In case nodes are added later, the setup could run a check and tell the user
that he/she should re-enable sharding; a rough sketch of such a check is
below.
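Something along these lines (purely hypothetical, just to illustrate the
idea; a real wizard would use its own inventory instead of parsing CLI
output):

# hypothetical sketch: warn when sharding looks mismatched with the brick count
BRICKS=$(gluster volume info vmstore | grep -c '^Brick[0-9]')
SHARD=$(gluster volume get vmstore features.shard | awk '/^features.shard /{print $2}')
if [ "$BRICKS" -le 1 ] && [ "$SHARD" = "on" ]; then
    echo "single-host volume with sharding on: consider features.shard off"
elif [ "$BRICKS" -gt 1 ] && [ "$SHARD" = "off" ]; then
    echo "multi-node volume with sharding off: consider re-enabling it"
fi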
Just my 0.2 eurocent
Gianluca