OK, I think I figured out what is happening...
I am currently running some redundancy tests on oVirt + replica 2 + arbiter GlusterFS.
This is happening under a small-file 4k fio random write test, like this:
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test
--filename=test --bs=4k --iodepth=32 --size=5G --readwrite=randwrite
I have RAID 6 on all 3 Gluster servers with capacitor-backed RAID cache. But it seems that
write-back cache is not a good option for VMs: when a Gluster node went down, the VM
became non-responsive and never recovered. At some point the whole virtual disk became a
mess and had to be deleted.
Changing the RAID controller cache to write-through and setting the volume's
network.ping-timeout to 10 seconds seems to improve things.
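For reference, the ping timeout tuning above can be applied with the gluster CLI roughly like this (a sketch; "data" is a placeholder for the actual volume name, and the RAID cache policy itself is changed in the controller's own tool, not through Gluster):

```shell
# Shorten how long clients wait before declaring a brick dead
# (default is 42 seconds; 10s makes failover of a downed node faster)
gluster volume set data network.ping-timeout 10

# Verify the option took effect
gluster volume get data network.ping-timeout
```

Note that a very low ping timeout can cause spurious disconnects on a flaky network, so 10 seconds is a trade-off rather than a universal recommendation.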
Still, with large shard block sizes (128MB and above) the VM sometimes ends up in a
paused-like state, with errors like this:
kernel:NMI watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [fio:3232]
Small shard block sizes, while slower, handle the situation a lot better: almost no
pausing of the VM and consistent performance.
After quite a lot of testing, it seems the only "safe" option here is a shard block
size of 8MB with all disks on one volume. Larger shard block sizes, and multiple disks
spread across various volumes, leave the VM non-responsive and the disks trashed.
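In case it helps anyone reproduce this, the sharding setup described above can be configured roughly like this (a sketch; "data" is again a placeholder volume name, and the block size must be set before any VM images are written, since existing shards are not resized):

```shell
# Enable sharding on the volume and use an 8MB shard block size
gluster volume set data features.shard enable
gluster volume set data features.shard-block-size 8MB

# Confirm the settings
gluster volume get data features.shard-block-size
```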
This setup seems to work for me, giving the best reliability I can get with libgfapi.