On Sat, Mar 28, 2020 at 8:26 PM Nir Soffer <nsoffer@redhat.com> wrote:

[snip]

> Hey Nir,
> You are right ... This is just a theory based on my knowledge and it might not be valid.
> We need the libvirt logs to confirm or reject the theory, but I'm convinced that is the reason.
>
> Yet, it's quite possible.
> Qemu tries to write to the qcow disk on gluster.
> Gluster is creating shards based on the offset, as it was not done initially (a preallocated disk takes its full size on gluster and all shards are created immediately). This takes time and has to be done on all bricks.
> As the shard size is too small (default 64MB), gluster has to create the next shard almost immediately, but if it can't do it as fast as qemu is filling its qcow2 disk

Gluster can block the I/O until it can write the data to a new shard.
There is no reason
to return an error unless a real error happened.

Also the VMs mentioned here are using raw disks, not qcow2:

[snip]
            <target bus="scsi" dev="sda"/>
            <source file="/rhev/data-center/mnt/glusterSD/ovirtst.mydomain.storage:_vmstore/81b97244-4b69-4d49-84c4-c822387adc6a/images/0a91c346-23a5-4432-8af7-ae0a28f9c208/2741af0b-27fe-4f7b-a8bc-8b34b9e31cb6">
                <seclabel model="dac" relabel="no" type="none"/>
            </source>
            <driver cache="none" error_policy="stop" io="threads" name="qemu" type="raw"/>

[snip]

Note type="raw"
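If useful, the format qemu actually sees can be double-checked directly from the host with qemu-img, using the same path as in the XML above (just a sanity check; the -U flag only avoids the image-lock conflict while the VM is running):

    # Confirm the image format straight from the gluster mount; -U (force share)
    # avoids the qemu image-lock error if the VM is still running.
    qemu-img info -U /rhev/data-center/mnt/glusterSD/ovirtst.mydomain.storage:_vmstore/81b97244-4b69-4d49-84c4-c822387adc6a/images/0a91c346-23a5-4432-8af7-ae0a28f9c208/2741af0b-27fe-4f7b-a8bc-8b34b9e31cb6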

> - qemu will get an I/O error and we know what happens there.
> Later gluster manages to create the shard(s), and the VM is unpaused.
>
> That's why the oVirt team made all gluster-based disks fully preallocated.
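As a side note, whether sharding is enabled on the volume, and with which shard size, can be checked from any gluster node. This is only a sketch: "vmstore" is assumed from the mount path in the XML above, and 64MB is the upstream default value.

    # Check the sharding settings on the volume backing the storage domain.
    # "vmstore" is assumed from the mount path above - adjust to your volume name.
    gluster volume get vmstore features.shard
    gluster volume get vmstore features.shard-block-size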

Yes, in my disk definition I used the proposed defaults.
Possibly I only chose virtio-scsi (see the sda device name): I don't remember whether, in 4.3.9 with Red Hat CoreOS as the OS type, virtio would have been the default or not...


Gluster disks are thin (raw-sparse) by default, just like on any other
file-based storage.
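The thin allocation is easy to see on the mount itself, if you want to confirm it for this disk (same path as in the XML above; for a thin disk that has not been fully written yet, the allocated size in the first column stays well below the file size):

    # Compare allocated blocks (first column, from -s) with the apparent file size.
    ls -lhs /rhev/data-center/mnt/glusterSD/ovirtst.mydomain.storage:_vmstore/81b97244-4b69-4d49-84c4-c822387adc6a/images/0a91c346-23a5-4432-8af7-ae0a28f9c208/2741af0b-27fe-4f7b-a8bc-8b34b9e31cb6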

If this theory were correct, the following would fail consistently on gluster:

1. Create a raw sparse image

    truncate -s 100g /rhev/data-center/mnt/glusterSD/server:_path/test

2. Fill the image quickly with data

    dd if=/dev/zero bs=1M | tr "\0" "U" | \
        dd of=/rhev/data-center/mnt/glusterSD/server:_path/test \
           bs=1M count=12800 iflag=fullblock oflag=direct conv=notrunc

According to your theory, gluster will fail to allocate shards fast
enough and fail the I/O.
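For convenience, the same test as a single script. This is only a sketch: the placeholder mount path from above is kept and must be adjusted before running, and the test file is removed at the end.

    #!/bin/bash
    # Fast-fill test for a raw sparse image on a gluster mount:
    # create a 100G sparse file, stream ~12.5 GiB of non-zero data into it
    # with direct I/O, and report whether the write hit an I/O error.
    set -u

    IMG=/rhev/data-center/mnt/glusterSD/server:_path/test

    truncate -s 100g "$IMG"

    # The leading dd ends with SIGPIPE once the final dd stops after "count"
    # blocks; the pipeline status we check is that of the final dd.
    if dd if=/dev/zero bs=1M | tr "\0" "U" \
        | dd of="$IMG" bs=1M count=12800 iflag=fullblock oflag=direct conv=notrunc; then
        echo "write completed without I/O errors"
    else
        echo "write failed - check the gluster mount and brick logs"
    fi

    rm -f "$IMG"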

Nir

I can also try the commands above, just to see the behavior, and report back here as soon as I can connect to the system.

Gianluca