On Sat, Mar 28, 2020 at 8:26 PM Nir Soffer <nsoffer(a)redhat.com> wrote:
[snip]
Hey Nir,
> You are right ... This is just a theory based on my knowledge and it
might not be valid.
> We need the libvirt logs to confirm or reject the theory, but I'm convinced that is the reason.
>
> Yet, it's quite possible.
> Qemu tries to write to the qcow disk on gluster.
> Gluster is creating shards based on the offset, as it was not done initially (a preallocated disk takes the full size on gluster and all shards are created immediately). This takes time and needs to be done on all bricks.
> As the shard size is too small (default 64MB), gluster has to create the next shard almost immediately, but if it can't do it as fast as qemu is filling its qcow2 disk
Gluster can block the I/O until it can write the data to a new shard.
There is no reason
to return an error unless a real error happened.
Also the VMs mentioned here are using raw disks, not qcow2:
[snip]
<target bus="scsi" dev="sda"/>
<source file="/rhev/data-center/mnt/glusterSD/ovirtst.mydomain.storage:_vmstore/81b97244-4b69-4d49-84c4-c822387adc6a/images/0a91c346-23a5-4432-8af7-ae0a28f9c208/2741af0b-27fe-4f7b-a8bc-8b34b9e31cb6">
  <seclabel model="dac" relabel="no" type="none"/>
</source>
<driver cache="none" error_policy="stop" io="threads" name="qemu" type="raw"/>
[snip]
Note type="raw"
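(For reference, the on-disk format can be double-checked with qemu-img, using the image path from the XML above; -U avoids the image locking error if the VM is still running:
   qemu-img info -U /rhev/data-center/mnt/glusterSD/ovirtst.mydomain.storage:_vmstore/81b97244-4b69-4d49-84c4-c822387adc6a/images/0a91c346-23a5-4432-8af7-ae0a28f9c208/2741af0b-27fe-4f7b-a8bc-8b34b9e31cb6
The output should report "file format: raw", and also shows virtual size vs. disk size.)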
> - qemu will get an I/O error and we know what happens there.
> Later gluster manages to create the shard(s), and the VM is unpaused.
>
> That's why the oVirt team made all gluster-based disks fully preallocated.
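(Side note: the shard size actually configured on the volume can be read with the gluster CLI; the volume name "vmstore" below is only inferred from the mount path above and may differ:
   gluster volume get vmstore features.shard-block-size
The gluster default is 64MB, matching the value assumed in the theory above.)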
Yes, in my disk definition I used the proposed defaults.
Possibly I only chose virtio-scsi (see the sda name): I don't remember whether, in 4.3.9 with Red Hat CoreOS as the OS type, virtio would be the default one or not...
Gluster disks are thin (raw-sparse) by default, just like on any other file-based storage.
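(A quick, rough way to see the sparse allocation is to compare the apparent size of the image with the blocks actually allocated on the gluster mount; the path below is the one from the XML above:
   IMG=/rhev/data-center/mnt/glusterSD/ovirtst.mydomain.storage:_vmstore/81b97244-4b69-4d49-84c4-c822387adc6a/images/0a91c346-23a5-4432-8af7-ae0a28f9c208/2741af0b-27fe-4f7b-a8bc-8b34b9e31cb6
   ls -lh "$IMG"   # apparent size = the virtual disk size
   du -h "$IMG"    # space actually allocated so far
For a thin disk the du figure starts well below the ls figure and grows as the guest writes.)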
If this theory were correct, this would fail consistently on gluster:
1. Create a raw sparse image:
   truncate -s 100g /rhev/data-center/mnt/glusterSD/server:_path/test
2. Fill the image quickly with data:
   dd if=/dev/zero bs=1M | tr "\0" "U" | dd of=/rhev/data-center/mnt/glusterSD/server:_path/test bs=1M count=12800 iflag=fullblock oflag=direct conv=notrunc
According to your theory, gluster will fail to allocate shards fast enough and fail the I/O.
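(While the dd above runs, the shard creation can also be watched on one of the bricks; the brick path below is just an example, the real one is listed by "gluster volume info vmstore":
   watch -n1 'ls /gluster_bricks/vmstore/vmstore/.shard | wc -l'
Each 64MB chunk written past the first one should show up as a new file under the hidden .shard directory on the brick.)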
Nir
I can also try the commands above, just to see the behavior, and report here as soon as I can connect to the system.
Gianluca