On Sat, Mar 28, 2020 at 9:47 PM Strahil Nikolov <hunter86_bg(a)yahoo.com> wrote:
On March 28, 2020 7:26:33 PM GMT+02:00, Nir Soffer <nsoffer(a)redhat.com> wrote:
>On Sat, Mar 28, 2020 at 1:59 PM Strahil Nikolov <hunter86_bg(a)yahoo.com> wrote:
>>
>> On March 28, 2020 11:03:54 AM GMT+02:00, Gianluca Cecchi <gianluca.cecchi(a)gmail.com> wrote:
>> >On Sat, Mar 28, 2020 at 8:39 AM Strahil Nikolov <hunter86_bg(a)yahoo.com> wrote:
>> >
>> >> On March 28, 2020 3:21:45 AM GMT+02:00, Gianluca Cecchi <gianluca.cecchi(a)gmail.com> wrote:
>> >>
>> >>
>> >[snip]
>> >
>> >> >Actually it only happened with an empty disk (thin provisioned) and sudden
>> >> >high I/O during the initial phase of the OS install; it didn't happen
>> >> >later during normal operation (even with 600 MB/s of throughput).
>> >>
>> >
>> >[snip]
>> >
>> >
>> >> Hi Gianluca,
>> >>
>> >> Is it happening to machines with preallocated disks or on machines with
>> >> thin disks?
>> >>
>> >> Best Regards,
>> >> Strahil Nikolov
>> >>
>> >
>> >thin provisioned. But as I have to create many VMs with 120 GB of disk size,
>> >of which probably only a part will be allocated over time, it would be
>> >unfeasible to make them all preallocated. I learned that thin is not good
>> >for block based storage domains and heavy I/O, but I would hope that it is
>> >not the same with file based storage domains...
>> >Thanks,
>> >Gianluca
>>
>> This is normal - gluster cannot allocate the needed shards fast enough (due to
>> high I/O), so qemu pauses the VM until storage is available again.
>
>I don't know glusterfs internals, but I think this is very unlikely.
>
>For block storage thin provisioning in vdsm, vdsm is responsible for allocating
>more space, but vdsm is not in the datapath; it monitors the allocation and
>allocates more space when free space reaches a limit. It has no way to block I/O
>before more space is available. Gluster is in the datapath and can block I/O until
>it can process it.
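
As a rough illustration (not the actual vdsm code), you can watch the values vdsm
monitors from the host while a VM writes to a thin block-based disk; the VM name
and disk target here are just examples:

    # Capacity   = the disk's virtual size
    # Allocation = highest offset written by qemu
    # Physical   = current size of the LV, which vdsm extends in chunks when
    #              Physical - Allocation drops below its watermark
    watch -n 2 virsh -r domblkinfo my-vm sda
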
>
>Can you explain what is the source for this theory?
>
>> You can think of VDO (with deduplication) as a PV for the thin LVM, and this way
>> you can preallocate your VMs while saving space (deduplication, zero-block
>> elimination and even compression).
>> Of course, VDO will reduce performance (unless you have a battery-backed write
>> cache and compression is disabled), but the benefits will be a lot greater.
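
For reference, the kind of stack Strahil describes would look roughly like this
(device and names are only examples, using the legacy 'vdo' management tool;
check the flags against your version before using them):

    # create a VDO volume on top of a raw device, then use it as an LVM PV
    vdo create --name=vdo0 --device=/dev/sdb --vdoLogicalSize=10T
    pvcreate /dev/mapper/vdo0
    vgcreate vg_vmstore /dev/mapper/vdo0
    lvcreate -L 2T -n lv_vmstore vg_vmstore
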
>>
>> Another approach is to increase the shard size - so gluster will create fewer
>> shards, but allocation on disk will be higher.
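
For completeness, the shard size is a regular volume option, e.g. (the volume name
is an example; as far as I know the new size only applies to newly created files):

    gluster volume get vmstore features.shard-block-size       # default is 64MB
    gluster volume set vmstore features.shard-block-size 256MB
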
>>
>> Best Regards,
>> Strahil Nikolov

> Hey Nir,
> You are right ... This is just a theory based on my knowledge and it might not be valid.
> We need the libvirt logs to confirm or reject the theory, but I'm convinced that is the reason.
> Yet, it's quite possible.
> Qemu tries to write to the qcow disk on gluster.
> Gluster is creating shards based on the offset, as it was not done initially (preallocated
> disks take the full size on gluster and all shards are created immediately). This
> takes time and has to be done on all bricks.
> As the shard size is too small (default 64MB), gluster has to create the next shard
> almost immediately, but if it can't do it as fast as qemu is filling its qcow2
> disk

Gluster can block the I/O until it can write the data to a new shard. There is no
reason to return an error unless a real error happened.

Also the VMs mentioned here are using raw disks, not qcow2:

    <disk device="disk" snapshot="no" type="file">
        <target bus="scsi" dev="sda"/>
        <source file="/rhev/data-center/mnt/glusterSD/ovirtst.mydomain.storage:_vmstore/81b97244-4b69-4d49-84c4-c822387adc6a/images/0a91c346-23a5-4432-8af7-ae0a28f9c208/2741af0b-27fe-4f7b-a8bc-8b34b9e31cb6">
            <seclabel model="dac" relabel="no" type="none"/>
        </source>
        <driver cache="none" error_policy="stop" io="threads" name="qemu" type="raw"/>
        <alias name="ua-0a91c346-23a5-4432-8af7-ae0a28f9c208"/>
        <address bus="0" controller="0" target="0" type="drive" unit="0"/>
        <boot order="1"/>
        <serial>0a91c346-23a5-4432-8af7-ae0a28f9c208</serial>
    </disk>

Note type="raw".
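
(With error_policy="stop", an I/O error pauses the VM rather than being reported to
the guest; the pause reason is visible from libvirt on the host, for example:)

    # the VM name is an example; a VM paused on an I/O error reports
    # something like "paused (ioerror)"
    virsh -r domstate my-vm --reason
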
> - qemu will get an I/O error and we know what happens there.
> Later gluster manages to create the shard(s), and the VM is unpaused.
> That's why the oVirt team made all gluster-based disks to be fully preallocated.

Gluster disks are thin (raw-sparse) by default just like any other
file based storage.
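
You can check this directly on the mount; for a sparse raw disk the allocated size
stays much smaller than the virtual size until the guest writes data (the paths
here are placeholders):

    ls -lsh /rhev/data-center/mnt/glusterSD/server:_path/<sd-uuid>/images/<img-uuid>/<vol-uuid>
    qemu-img info /rhev/data-center/mnt/glusterSD/server:_path/<sd-uuid>/images/<img-uuid>/<vol-uuid>
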
If this theory were correct, this would fail consistently on gluster:

1. Create a raw sparse image:

    truncate -s 100g /rhev/data-center/mnt/glusterSD/server:_path/test

2. Fill the image quickly with data:

    dd if=/dev/zero bs=1M | tr "\0" "U" | dd \
        of=/rhev/data-center/mnt/glusterSD/server:_path/test \
        bs=1M count=12800 iflag=fullblock oflag=direct conv=notrunc

According to your theory gluster would fail to allocate shards fast enough and
fail the I/O.
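
To see what gluster actually does with the shards during such a test, you can look
at one of the bricks (the brick path is an example, assuming the default 64MB shard
size; needs root on the brick):

    # the base file holds the first 64MB; the remaining shards are created under
    # the hidden .shard directory, named <gfid-of-the-base-file>.<index>
    getfattr -n trusted.gfid -e hex /gluster_bricks/vmstore/vmstore/test
    ls /gluster_bricks/vmstore/vmstore/.shard/
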
Nir