On Sat, Mar 28, 2020 at 8:26 PM Nir Soffer <nsoffer(a)redhat.com> wrote:
[snip]
Hey Nir,
> You are right ... This is just a theory based on my knowledge and it
might not be valid.
> We need the libvirt logs to confirm or reject the theory, but I'm convinced that is the reason.
>
> Yet, it's quite possible.
> Qemu tries to write to the qcow disk on gluster.
> Gluster is creating shards based on the offset, as it was not done initially (a preallocated disk takes the full size on gluster and all shards are created immediately). This takes time and needs to be done on all bricks.
> As the shard size is too small (default 64MB), gluster has to create the next shard almost immediately, but if it can't do it as fast as qemu is filling its qcow2 disk
Gluster can block the I/O until it can write the data to a new shard.
There is no reason
to return an error unless a real error happened.
Also the VMs mentioned here are using raw disks, not qcow2:
[snip]
<target bus="scsi" dev="sda"/>
<source file="/rhev/data-center/mnt/glusterSD/ovirtst.mydomain.storage:_vmstore/81b97244-4b69-4d49-84c4-c822387adc6a/images/0a91c346-23a5-4432-8af7-ae0a28f9c208/2741af0b-27fe-4f7b-a8bc-8b34b9e31cb6">
  <seclabel model="dac" relabel="no" type="none"/>
</source>
<driver cache="none" error_policy="stop" io="threads" name="qemu" type="raw"/>
[snip]
Note type="raw"
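(For reference, the on-disk format can be double-checked with qemu-img, using the image path from the XML above; -U avoids the image locking error if the VM is still running:
   qemu-img info -U /rhev/data-center/mnt/glusterSD/ovirtst.mydomain.storage:_vmstore/81b97244-4b69-4d49-84c4-c822387adc6a/images/0a91c346-23a5-4432-8af7-ae0a28f9c208/2741af0b-27fe-4f7b-a8bc-8b34b9e31cb6
The output should report "file format: raw", and also shows virtual size vs. disk size.)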
> - qemu will get an I/O error and we know what happens there.
> Later gluster manages to create the shard(s), and the VM is unpaused.
>
> That's why the oVirt team made all gluster-based disks fully preallocated.
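(Side note: the shard size actually configured on the volume can be read with the gluster CLI; the volume name "vmstore" below is only inferred from the mount path above and may differ:
   gluster volume get vmstore features.shard-block-size
The gluster default is 64MB, matching the value assumed in the theory above.)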
Yes, in my disk definition I used the proposed defaults.
Possibly I only chose virtio-scsi (see the sda name): I don't remember whether, in 4.3.9 with Red Hat CoreOS as the OS type, virtio would be the default one or not...
Gluster disks are thin (raw-sparse) by default, just like on any other file-based storage.
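(A quick, rough way to see the sparse allocation is to compare the apparent size of the image with the blocks actually allocated on the gluster mount; the path below is the one from the XML above:
   IMG=/rhev/data-center/mnt/glusterSD/ovirtst.mydomain.storage:_vmstore/81b97244-4b69-4d49-84c4-c822387adc6a/images/0a91c346-23a5-4432-8af7-ae0a28f9c208/2741af0b-27fe-4f7b-a8bc-8b34b9e31cb6
   ls -lh "$IMG"   # apparent size = the virtual disk size
   du -h "$IMG"    # space actually allocated so far
For a thin disk the du figure starts well below the ls figure and grows as the guest writes.)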
If this theory were correct, this would fail consistently on gluster:
1. Create a raw sparse image:
   truncate -s 100g /rhev/data-center/mnt/glusterSD/server:_path/test
2. Fill the image quickly with data:
   dd if=/dev/zero bs=1M | tr "\0" "U" | dd of=/rhev/data-center/mnt/glusterSD/server:_path/test bs=1M count=12800 iflag=fullblock oflag=direct conv=notrunc
According to your theory, gluster will fail to allocate shards fast enough and fail the I/O.
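(While the dd above runs, the shard creation can also be watched on one of the bricks; the brick path below is just an example, the real one is listed by "gluster volume info vmstore":
   watch -n1 'ls /gluster_bricks/vmstore/vmstore/.shard | wc -l'
Each 64MB chunk written past the first one should show up as a new file under the hidden .shard directory on the brick.)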
Nir
I can also try the commands above, just to see the behavior, and report here as soon as I can connect to the system.
Gianluca