On Sat, Mar 28, 2020 at 9:47 PM Strahil Nikolov <hunter86_bg(a)yahoo.com> wrote:
On March 28, 2020 7:26:33 PM GMT+02:00, Nir Soffer <nsoffer(a)redhat.com> wrote:
>On Sat, Mar 28, 2020 at 1:59 PM Strahil Nikolov <hunter86_bg(a)yahoo.com> wrote:
>>
>> On March 28, 2020 11:03:54 AM GMT+02:00, Gianluca Cecchi <gianluca.cecchi(a)gmail.com> wrote:
>> >On Sat, Mar 28, 2020 at 8:39 AM Strahil Nikolov <hunter86_bg(a)yahoo.com> wrote:
>> >
>> >> On March 28, 2020 3:21:45 AM GMT+02:00, Gianluca Cecchi <gianluca.cecchi(a)gmail.com> wrote:
>> >>
>> >>
>> >[snip]
>> >
>> >> >Actually it only happened with an empty disk (thin provisioned) and sudden
>> >> >high I/O during the initial phase of the OS install; it didn't happen
>> >> >later during normal operation (even with 600 MB/s of throughput).
>> >>
>> >
>> >[snip]
>> >
>> >
>> >> Hi Gianluca,
>> >>
>> >> Is it happening to machines with preallocated disks or on machines with
>> >> thin disks?
>> >>
>> >> Best Regards,
>> >> Strahil Nikolov
>> >>
>> >
>> >thin provisioned. But as I have to create many VMs with 120 GB of disk size,
>> >of which probably only a part will be allocated over time, it would be
>> >unfeasible to make them all preallocated. I learned that thin is not good
>> >for block based storage domains and heavy I/O, but I would hope that it is
>> >not the same with file based storage domains...
>> >Thanks,
>> >Gianluca
>>
>> This is normal - gluster cannot allocate the needed shards fast enough (due to
>> high I/O), so qemu pauses the VM until storage is available again.
>
>I don't know glusterfs internals, but I think this is very unlikely.
>
>For block storage thin provisioning in vdsm, vdsm is responsible for allocating
>more space, but vdsm is not in the datapath; it monitors the allocation and
>allocates more space when free space reaches a limit. It has no way to block I/O
>before more space is available. Gluster is in the datapath and can block I/O until
>it can process it.
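
As a rough illustration (not the actual vdsm code), you can watch the values vdsm
monitors from the host while a VM writes to a thin block-based disk; the VM name
and disk target here are just examples:

    # Capacity   = the disk's virtual size
    # Allocation = highest offset written by qemu
    # Physical   = current size of the LV, which vdsm extends in chunks when
    #              Physical - Allocation drops below its watermark
    watch -n 2 virsh -r domblkinfo my-vm sda
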
>
>Can you explain what is the source for this theory?
>
>> You can think of VDO (with deduplication) as a PV for the thin LVM, and this way
>> you can preallocate your VMs while saving space (deduplication, zero-block
>> elimination and even compression).
>> Of course, VDO will reduce performance (unless you have a battery-backed write
>> cache and compression is disabled), but the benefits will be a lot greater.
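
For reference, the kind of stack Strahil describes would look roughly like this
(device and names are only examples, using the legacy 'vdo' management tool;
check the flags against your version before using them):

    # create a VDO volume on top of a raw device, then use it as an LVM PV
    vdo create --name=vdo0 --device=/dev/sdb --vdoLogicalSize=10T
    pvcreate /dev/mapper/vdo0
    vgcreate vg_vmstore /dev/mapper/vdo0
    lvcreate -L 2T -n lv_vmstore vg_vmstore
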
>>
>> Another approach is to increase the shard size - so gluster will create fewer
>> shards, but allocation on disk will be higher.
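
For completeness, the shard size is a regular volume option, e.g. (the volume name
is an example; as far as I know the new size only applies to newly created files):

    gluster volume get vmstore features.shard-block-size       # default is 64MB
    gluster volume set vmstore features.shard-block-size 256MB
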
>>
>> Best Regards,
>> Strahil Nikolov

> Hey Nir,
> You are right ... This is just a theory based on my knowledge and it might not be valid.
> We need the libvirt logs to confirm or reject the theory, but I'm convinced that is the reason.
> Yet, it's quite possible.
> Qemu tries to write to the qcow disk on gluster.
> Gluster is creating shards based on the offset, as it was not done initially (preallocated
> disks take the full size on gluster and all shards are created immediately). This
> takes time and has to be done on all bricks.
> As the shard size is too small (default 64MB), gluster has to create the next shard
> almost immediately, but if it can't do it as fast as qemu is filling its qcow2
> disk

Gluster can block the I/O until it can write the data to a new shard. There is no
reason to return an error unless a real error happened.

Also the VMs mentioned here are using raw disks, not qcow2:

    <disk device="disk" snapshot="no" type="file">
        <target bus="scsi" dev="sda"/>
        <source file="/rhev/data-center/mnt/glusterSD/ovirtst.mydomain.storage:_vmstore/81b97244-4b69-4d49-84c4-c822387adc6a/images/0a91c346-23a5-4432-8af7-ae0a28f9c208/2741af0b-27fe-4f7b-a8bc-8b34b9e31cb6">
            <seclabel model="dac" relabel="no" type="none"/>
        </source>
        <driver cache="none" error_policy="stop" io="threads" name="qemu" type="raw"/>
        <alias name="ua-0a91c346-23a5-4432-8af7-ae0a28f9c208"/>
        <address bus="0" controller="0" target="0" type="drive" unit="0"/>
        <boot order="1"/>
        <serial>0a91c346-23a5-4432-8af7-ae0a28f9c208</serial>
    </disk>

Note type="raw".
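
(With error_policy="stop", an I/O error pauses the VM rather than being reported to
the guest; the pause reason is visible from libvirt on the host, for example:)

    # the VM name is an example; a VM paused on an I/O error reports
    # something like "paused (ioerror)"
    virsh -r domstate my-vm --reason
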
> - qemu will get an I/O error and we know what happens there.
> Later gluster manages to create the shard(s), and the VM is unpaused.
> That's why the oVirt team made all gluster-based disks to be fully preallocated.

Gluster disks are thin (raw-sparse) by default just like any other
file based storage.
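
You can check this directly on the mount; for a sparse raw disk the allocated size
stays much smaller than the virtual size until the guest writes data (the paths
here are placeholders):

    ls -lsh /rhev/data-center/mnt/glusterSD/server:_path/<sd-uuid>/images/<img-uuid>/<vol-uuid>
    qemu-img info /rhev/data-center/mnt/glusterSD/server:_path/<sd-uuid>/images/<img-uuid>/<vol-uuid>
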
If this theory were correct, this would fail consistently on gluster:

1. Create a raw sparse image:

    truncate -s 100g /rhev/data-center/mnt/glusterSD/server:_path/test

2. Fill the image quickly with data:

    dd if=/dev/zero bs=1M | tr "\0" "U" | dd \
        of=/rhev/data-center/mnt/glusterSD/server:_path/test \
        bs=1M count=12800 iflag=fullblock oflag=direct conv=notrunc

According to your theory gluster would fail to allocate shards fast enough and
fail the I/O.
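
To see what gluster actually does with the shards during such a test, you can look
at one of the bricks (the brick path is an example, assuming the default 64MB shard
size; needs root on the brick):

    # the base file holds the first 64MB; the remaining shards are created under
    # the hidden .shard directory, named <gfid-of-the-base-file>.<index>
    getfattr -n trusted.gfid -e hex /gluster_bricks/vmstore/vmstore/test
    ls /gluster_bricks/vmstore/vmstore/.shard/
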
Nir