Hello,
I'm facing a strange issue on my oVirt dev pool.
When I put a VM under high disk load (a kickstart installation or an iozone test, for
example), the VM is paused due to a storage I/O error.
The problem is 100% reproducible, and occurs only over NFS (v3 and v4) on my EMC VNXe3200
NAS boxes (I have a 10TB and a 20TB NAS).
I ran a test (a simple iozone -a) with a VM with 1 vCPU / 2GB RAM and 2 disks (1*20GB +
1*10GB). Both VM disks were placed on the same SAN / NAS for each test. Results are:
- EMC VNXe3200 (10TB) NFSv3 => VM stopped 10-30s after iozone launch
- EMC VNXe3200 (20TB) NFSv3 => VM stopped 10-30s after iozone launch
- EMC VNXe3200 (10TB) iSCSI => no problem, the iozone test finishes and performance is
"standard" given the load on the VNXe (60MB/s sequential write, for info)
- EMC VNXe3200 (20TB) iSCSI => no problem, the iozone test finishes and performance is
"standard" given the load on the VNXe (40-60MB/s sequential write, for info)
- NetApp FAS2240 NFSv3 => no problem, the iozone test finishes and performance is good
(100MB/s sequential write, for info)
- FreeBSD 10 NAS NFSv3 => no problem, the iozone test finishes and performance is good
given the NAS config (80MB/s sequential write, for info)
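If it can help, I can capture the NFS traffic on the hypervisor during a failing run to
check whether the VNXe returns an NFS-level error right before the pause. A quick sketch
(port 2049 is the standard NFS port; the capture file name is just an example):

import subprocess, time

# Start capturing NFS traffic on the host, then launch iozone in the guest.
cap = subprocess.Popen(["tcpdump", "-i", "any", "port", "2049",
                        "-w", "nfs-during-iozone.pcap"])
time.sleep(60)     # run the iozone test in the guest until the VM pauses
cap.terminate()    # stop the capture; inspect the pcap with wireshark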
I can't explain why I have an issue over NFS but none over iSCSI (on the same
EMC VNXe3200...).
Default NFS parameters were kept when the storage was added to the datacenter:
(rw,relatime,vers=4.0,rsize=131072,wsize=131072,namlen=255,soft,nosharecache,proto=tcp,port=0,timeo=600,retrans=6,sec=sys,clientaddr=XXXXXXX,local_lock=none,addr=XXXXXX)
(rw,relatime,vers=3,rsize=131072,wsize=131072,namlen=255,soft,nolock,nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,mountaddr=XXXXXX,mountvers=3,mountport=1234,mountproto=udp,local_lock=all,addr=XXXXXX)
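One detail I notice: both mounts use "soft", so once the client gives up retrying, the
I/O fails with EIO and (as far as I understand) qemu pauses the VM. But with timeo=600
and retrans=6 that should take far longer than the 10-30s I observe. A back-of-envelope
check, assuming the linear backoff that nfs(5) describes for TCP mounts (this is my
reading of the man page, not a measurement):

# When would a soft NFS-over-TCP mount give up and return EIO?
# Assumption (from my reading of nfs(5)): the first attempt waits timeo
# deciseconds, and each retransmission adds another timeo, capped at 600s.
timeo_s = 600 / 10   # timeo=600 deciseconds -> 60s for the first attempt
retrans = 6          # retrans=6 -> 6 retransmissions before giving up
waits = [min(timeo_s * (i + 1), 600) for i in range(retrans + 1)]
print(waits)         # [60.0, 120.0, 180.0, 240.0, 300.0, 360.0, 420.0]
print(sum(waits))    # 1680.0 -> ~28 minutes before a soft-mount EIO

If that is right, a failure after 10-30s suggests the VNXe is actively returning an
NFS-level error rather than the client timing out.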
Debug logs on the host don't help me much:
2018-08-22 15:36:13,883+0200 INFO (periodic/22) [vdsm.api] START multipath_health()
from=internal, task_id=53da2eca-eb66-400c-8367-ab62cedc5dc1 (api:46)
2018-08-22 15:36:13,883+0200 INFO (periodic/22) [vdsm.api] FINISH multipath_health
return={} from=internal, task_id=53da2eca-eb66-400c-8367-ab62cedc5dc1 (api:52)
2018-08-22 15:36:15,161+0200 INFO (libvirt/events) [virt.vm] (vmId='b139a9b9-16bc-40ee-ba84-d1d59e5ce17a') abnormal vm stop device ua-179375b0-0a18-4fcb-a884-4aeb1c8fed97 error eother (vm:5116)
2018-08-22 15:36:15,161+0200 INFO (libvirt/events) [virt.vm]
(vmId='b139a9b9-16bc-40ee-ba84-d1d59e5ce17a') CPU stopped: onIOError (vm:6157)
2018-08-22 15:36:15,162+0200 DEBUG (libvirt/events) [virt.metadata.Descriptor] values: {'minGuaranteedMemoryMb': 1024, 'clusterVersion': '4.2', 'resumeBehavior': 'auto_resume', 'memGuaranteedSize': 1024, 'launchPaused': 'false', 'startTime': 1534944832.058459, 'destroy_on_reboot': False, 'pauseTime': 4999289.49} (metadata:596)
2018-08-22 15:36:15,162+0200 DEBUG (libvirt/events) [virt.metadata.Descriptor] values updated: {'minGuaranteedMemoryMb': 1024, 'clusterVersion': '4.2', 'resumeBehavior': 'auto_resume', 'memGuaranteedSize': 1024, 'launchPaused': 'false', 'startTime': 1534944832.058459, 'destroy_on_reboot': False, 'pauseTime': 4999289.49} (metadata:601)
2018-08-22 15:36:15,168+0200 DEBUG (libvirt/events) [virt.metadata.Descriptor] dumped metadata for b139a9b9-16bc-40ee-ba84-d1d59e5ce17a: <?xml version='1.0' encoding metadata blablablabla............................>
2018-08-22 15:36:15,169+0200 DEBUG (libvirt/events) [virt.vm]
(vmId='b139a9b9-16bc-40ee-ba84-d1d59e5ce17a') event Suspended detail 2 opaque None
(vm:5520)
2018-08-22 15:36:15,169+0200 INFO (libvirt/events) [virt.vm]
(vmId='b139a9b9-16bc-40ee-ba84-d1d59e5ce17a') CPU stopped: onSuspend (vm:6157)
2018-08-22 15:36:15,174+0200 WARN (libvirt/events) [virt.vm]
(vmId='b139a9b9-16bc-40ee-ba84-d1d59e5ce17a') device sda reported I/O error
(vm:4065)
2018-08-22 15:36:15,340+0200 DEBUG (vmchannels) [virt.vm] (vmId='46d496af-e2d0-4caa-9a13-10c624f265d8') Guest's message heartbeat: {u'memory-stat': {u'swap_out': 0, u'majflt': 0, u'swap_usage': 0, u'mem_cached': 119020, u'mem_free': 3693900, u'mem_buffers': 2108, u'swap_in': 0, u'swap_total': 8257532, u'pageflt': 141, u'mem_total': 3878980, u'mem_unused': 3572772}, u'free-ram': u'3607', u'apiVersion': 3} (guestagent:337)
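Since vdsm only reports "eother", my next step would be to listen for the raw libvirt
I/O error events on the host while iozone runs, to see the exact device, source path and
reason string. A minimal sketch, assuming libvirt-python is available on the host:

import libvirt

def io_error_cb(conn, dom, srcpath, devalias, action, reason, opaque):
    # action: what libvirt did (e.g. pause); reason: 'enospc', 'eother', ...
    print("IO error on %s: dev=%s path=%s action=%d reason=%s"
          % (dom.name(), devalias, srcpath, action, reason))

libvirt.virEventRegisterDefaultImpl()    # must be set up before opening
conn = libvirt.open('qemu:///system')
conn.domainEventRegisterAny(
    None,                                         # None = all domains
    libvirt.VIR_DOMAIN_EVENT_ID_IO_ERROR_REASON,  # I/O errors with reason
    io_error_cb, None)

while True:
    libvirt.virEventRunDefaultImpl()              # dispatch pending events

(virsh domstate --reason on the paused VM also shows "paused (I/O error)", but the event
callback additionally gives the source path of the failing disk.)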
Do you have any ideas?