[ovirt-users] Re: VM hanging at sustained high throughput

23 Sep 2021

      I replaced the SSD intent log drive with an NVME drive, and the system is
much more stable now.

*David Johnson*
*Director of Development, Maxis Technology*
844.696.2947 ext 702 (o) | 479.531.3590 (c)
<https://www.linkedin.com/in/pojoguy/>
<https://maxistechnology.com/wp-content/uploads/vcards/vcard-David_Johnson.vcf>
<https://maxistechnology.com/>

*Follow us:*  <https://www.linkedin.com/company/maxis-tech-inc/>

On Thu, May 27, 2021 at 5:57 PM David Johnson <djohnson@maxistechnology.com>
wrote:
...
Hi ovirt gurus,
This is an interesting issue, one I never expected to have.
When I push high volumes of writes to my NAS, I will cause VM's to go into
a paused state. I'm looking at this from a number of angles, including
upgrades on the NAS appliance.
I can reproduce this problem at will running a centos 7.9 VM on Ovirt 4.5.
*Questions:*
1. Is my analysis of the failure (below) reasonable/correct?
2. What am I looking for to validate this?
3. Is there a configuration that I can set to make it a little more robust
while I acquire the hardware to improve the NAS?
*Reproduction:*
Standard test of file write speed:
[root@cen-79-pgsql-01 ~]# dd if=/dev/zero of=./test bs=512k count=4096
oflag=direct
4096+0 records in
4096+0 records out
2147483648 bytes (2.1 GB) copied, 1.68431 s, 1.3 GB/s
Give it more data
[root@cen-79-pgsql-01 ~]# dd if=/dev/zero of=./test bs=512k count=12228
oflag=direct
12228+0 records in
12228+0 records out
6410993664 bytes (6.4 GB) copied, 7.22078 s, 888 MB/s
The odds are about 50/50 that 6 GB will kill the VM, but 100% when I hit 8
GB.
*Analysis:*
What I think appears to be happening is that the intent cache on the NAS
is on an SSD, and my VM's are pushing data about three times as fast as the
SSD can handle. When the SSD gets queued up beyond a certain point, the NAS
(which places reliability over speed) says "Whoah Nellie!", and the VM
chokes.
*David Johnson*