On Thu, Apr 9, 2020 at 1:11 PM Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:

This ^^, right here is the reason the VM paused. Are you using a plain distribute volume here?
Can you share some of the log messages that occur right above these errors?
Also, can you check if the file $VMSTORE_BRICKPATH/.glusterfs/d2/25/d22530cf-2e50-4059-8924-0aafe38497b1 exists on the brick?

-Krutika



Thanks for answering, Krutika.
To verify that sharding was in some way involved in the problem, I re-deployed the 9 OpenShift OCP servers on a volume without sharding, and indeed I received no errors.
With sharding enabled, I received at least 3-4 errors on every deployment run.
In particular, I deleted the disks of the previous VMs in order to put the new ones on a volume without sharding.
So right now the directory is empty:

[root@ovirt ~]# ll -a /gluster_bricks/vmstore/vmstore/.glusterfs/d2/25/
total 8
drwx------.   2 root root    6 Apr  8 16:59 .
drwx------. 105 root root 8192 Apr  9 00:50 ..
[root@ovirt ~]#
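
For reference, the path checked above follows the usual GlusterFS brick layout, where the entry for a gfid sits under .glusterfs/<first two hex characters>/<next two hex characters>/<full gfid>. Below is a minimal sketch of that mapping, assuming the brick root shown in the listing; the helper name gfid_backend_path is hypothetical and only meant for illustration.

```python
from pathlib import PurePosixPath


def gfid_backend_path(brick_root: str, gfid: str) -> str:
    """Build the .glusterfs entry path for a gfid on a brick.

    Hypothetical helper: it only assembles the path string and
    does not touch the filesystem.
    """
    gfid = gfid.strip().lower()
    # The first two and the next two hex characters name the two directory levels.
    return str(PurePosixPath(brick_root) / ".glusterfs" / gfid[0:2] / gfid[2:4] / gfid)


if __name__ == "__main__":
    # Brick root taken from the listing above; gfid taken from Krutika's question.
    print(gfid_backend_path(
        "/gluster_bricks/vmstore/vmstore",
        "d22530cf-2e50-4059-8924-0aafe38497b1",
    ))
```

Running it prints the full path to pass to ls or stat on the brick host.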

Here you can find the entire log of the vmstore volume (in gzip format), from [2020-04-05 01:20:02.978429] to [2020-04-09 10:45:36.734079]

You will find the same error at least at the timestamps below, which correspond to engine webadmin events "unknown storage error". Note that the time inside the log file is UTC, so you have to shift back by 2 hours (03:27:28 PM in the engine webadmin event corresponds to 13:27:28 in the log file):

Apr 7, 2020, 3:27:28 PM

Apr 7, 2020, 4:38:55 PM

Apr 7, 2020, 5:31:02 PM

Apr 8, 2020, 8:52:49 AM

Apr 8, 2020, 12:05:17 PM

Apr 8, 2020, 3:11:10 PM

Apr 8, 2020, 3:20:30 PM

Apr 8, 2020, 3:26:54 PM
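
To make searching the log easier, the two-hour shift can be applied to the event times mechanically. This is only a sketch: it assumes a fixed +2 hour offset between the engine webadmin time and UTC (as stated above) and an English month abbreviation; the function name engine_to_log_prefix is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: engine webadmin times are 2 hours ahead of UTC, as stated above.
ENGINE_TZ = timezone(timedelta(hours=2))
ENGINE_FORMAT = "%b %d, %Y, %I:%M:%S %p"  # e.g. "Apr 7, 2020, 3:27:28 PM"


def engine_to_log_prefix(event_time: str) -> str:
    """Convert an engine webadmin event time to the UTC timestamp prefix used in the log."""
    local = datetime.strptime(event_time, ENGINE_FORMAT).replace(tzinfo=ENGINE_TZ)
    return local.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


EVENTS = [
    "Apr 7, 2020, 3:27:28 PM",
    "Apr 7, 2020, 4:38:55 PM",
    "Apr 7, 2020, 5:31:02 PM",
    "Apr 8, 2020, 8:52:49 AM",
    "Apr 8, 2020, 12:05:17 PM",
    "Apr 8, 2020, 3:11:10 PM",
    "Apr 8, 2020, 3:20:30 PM",
    "Apr 8, 2020, 3:26:54 PM",
]

if __name__ == "__main__":
    for event in EVENTS:
        # Each printed prefix can be used as a search string in the log file.
        print(event, "->", engine_to_log_prefix(event))
```

The log entries may not match to the exact second, so searching on the minute (dropping the seconds) is probably more practical.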

Thanks again. I'm available to retry on a sharding-enabled volume after modifying anything, if needed.
Gianluca

Hi Krutika,
did you have the opportunity to check the log content I sent and better understand the reason for the sharding errors and a possible solution?

Thanks,
Gianluca