On Mon, Jul 3, 2017 at 11:00 AM, M Mahboubian <m_mahboubian@yahoo.com> wrote:
Hi Yaniv,

Thank you for your reply. 

| Interesting - what interface are they using?
| Is that raw or raw sparse? How did you perform the conversion? (or no conversion - just copied the disks over?)

The VM disks are on the SAN storage. To use them with oVirt, we simply pointed the oVirt VMs at them. This is precisely how we did it:
First, we created the VMs in oVirt with disks of the same size as the existing disks. Then we deleted the disks that oVirt had generated and renamed our existing disks to match the names of the deleted ones. Finally, we started the oVirt VMs; they ran, and these VMs have always been fine, without any issue.

The new VMs that have the problem were created from scratch (no template). One thing, though: all of these new VMs were installed from a CentOS 7 ISO. We have not tried any other flavor of Linux.

Kernel 4.1 is actually from the Oracle Linux repository, since we needed OCFS2 support. After installing oVirt, we updated the kernel to the Oracle Linux 4.1 kernel because it supports OCFS2.

Would it be possible to test with the regular CentOS kernel? Just to ensure it's not the kernel causing this?
Y.
 

| We might need to get libvirt debug logs (and perhaps journal output of the host).

I'll get this information and post here.

Regards






On Monday, July 3, 2017 3:01 PM, Yaniv Kaul <ykaul@redhat.com> wrote:




On Mon, Jul 3, 2017 at 6:49 AM, M Mahboubian <m_mahboubian@yahoo.com> wrote:
Hi Yaniv,

Thanks for your reply. Apologies for my late reply; we had a long holiday here.

To answer you:

Yes, the guest VM becomes completely frozen and unresponsive as soon as its disk has any activity, for example when we shut down or run a yum update.


Versions of all the components involved - guest OS, host OS (qemu-kvm version), how do you run the VM (vdsm log would be helpful here), exact storage specification (1Gb or 10Gb link? What is the NFS version? What is it hosted on? etc.)
 Y.

Some facts about our environment:

1) Previously, this environment ran Xen with raw disks, and we changed it to oVirt (oVirt was able to read the VMs' disks without any conversion).

Interesting - what interface are they using?
Is that raw or raw sparse? How did you perform the conversion? (or no conversion - just copied the disks over?)
 
2) The issue we are facing is not happening for any of the existing VMs. 
3) This issue only happens for new VMs.

New VMs from blank, or from a template (as a snapshot over the previous VMs) ?
 
4) Guest (kernel v3.10) and host (kernel v4.1) OSes are both CentOS 7 minimal installations.

Kernel 4.1? From where?
 
5) NFS version 4, and we are using oVirt 4.1.
6) The network speed is 1 Gb/s.

That might be very slow (but should not cause such an issue, unless severely overloaded).

7) The output for rpm -qa | grep qemu-kvm shows:
     qemu-kvm-common-ev-2.6.0-28.el7_3.6.1.x86_64
     qemu-kvm-tools-ev-2.6.0-28.el7_3.6.1.x86_64
     qemu-kvm-ev-2.6.0-28.el7_3.6.1.x86_64

That's good - that's almost the latest-greatest.
 
8) The storage is a SAN device connected to the NFS server over Fibre Channel.

So, for example, during shutdown the VM also froze, and the events section showed something like this:

VM ILMU_WEB has been paused due to storage I/O problem.

We might need to get libvirt debug logs (and perhaps journal output of the host).
Y.
 


More information:

VDSM log at the time of this issue (The issue happened at Jul 3, 2017 9:50:43 AM):

2017-07-03 09:50:37,113+0800 INFO (jsonrpc/0) [jsonrpc.JsonRpcServer] RPC call Host.getAllVmStats succeeded in 0.00 seconds (__init__:515)
2017-07-03 09:50:37,897+0800 INFO (jsonrpc/1) [jsonrpc.JsonRpcServer] RPC call Host.getAllVmIoTunePolicies succeeded in 0.02 seconds (__init__:515)
2017-07-03 09:50:42,510+0800 INFO (jsonrpc/2) [jsonrpc.JsonRpcServer] RPC call Host.getAllVmStats succeeded in 0.00 seconds (__init__:515)
2017-07-03 09:50:43,548+0800 INFO (jsonrpc/3) [dispatcher] Run and protect: repoStats(options=None) (logUtils:51)
2017-07-03 09:50:43,548+0800 INFO (jsonrpc/3) [dispatcher] Run and protect: repoStats, Return response: {u'e01186c1-7e44-4808-b551-4722f0f8e84b': {'code': 0, 'actual': True, 'version': 4, 'acquired': True, 'delay': '0.000144822', 'lastCheck': '8.9', 'valid': True}, u'721b5233-b0ba-4722-8a7d-ba2a372190a0': {'code': 0, 'actual': True, 'version': 4, 'acquired': True, 'delay': '0.000327909', 'lastCheck': '8.9', 'valid': True}, u'94775bd3-3244-45b4-8a06-37eff8856afa': {'code': 0, 'actual': True, 'version': 4, 'acquired': True, 'delay': '0.000256425', 'lastCheck': '8.9', 'valid': True}, u'731bb771-5b73-4b5c-ac46-56499df97721': {'code': 0, 'actual': True, 'version': 0, 'acquired': True, 'delay': '0.000238159', 'lastCheck': '8.9', 'valid': True}, u'f620781f-93d4-4410-8697-eb41045cacd6': {'code': 0, 'actual': True, 'version': 4, 'acquired': True, 'delay': '0.00022004', 'lastCheck': '8.9', 'valid': True}, u'a1a7d0a4-e3b6-4bd5-862b-96e70dae3f29': {'code': 0, 'actual': True, 'version': 0, 'acquired': True, 'delay': '0.000298581', 'lastCheck': '8.8', 'valid': True}} (logUtils:54)
2017-07-03 09:50:43,563+0800 INFO (jsonrpc/3) [jsonrpc.JsonRpcServer] RPC call Host.getStats succeeded in 0.01 seconds (__init__:515)
2017-07-03 09:50:46,737+0800 INFO (periodic/3) [dispatcher] Run and protect: getVolumeSize(sdUUID=u'721b5233-b0ba-4722-8a7d-ba2a372190a0', spUUID=u'b04ca6e4-2660-4eaa-acdb-c1dae4e21f2d', imgUUID=u'3c26476e-1dae-44d7-9208-531b91ae5ae1', volUUID=u'a7e789fb-6646-4d0a-9b51-f5ab8242c8d5', options=None) (logUtils:51)
2017-07-03 09:50:46,738+0800 INFO (periodic/0) [dispatcher] Run and protect: getVolumeSize(sdUUID=u'f620781f-93d4-4410-8697-eb41045cacd6', spUUID=u'b04ca6e4-2660-4eaa-acdb-c1dae4e21f2d', imgUUID=u'2158fdae-54e1-413d-a844-73da5d1bb4ca', volUUID=u'6ee0b0eb-0bba-4e18-9c00-c1539b632e8a', options=None) (logUtils:51)
2017-07-03 09:50:46,740+0800 INFO (periodic/2) [dispatcher] Run and protect: getVolumeSize(sdUUID=u'f620781f-93d4-4410-8697-eb41045cacd6', spUUID=u'b04ca6e4-2660-4eaa-acdb-c1dae4e21f2d', imgUUID=u'a967016d-a56b-41e8-b7a2-57903cbd2825', volUUID=u'784514cb-2b33-431c-b193-045f23c596d8', options=None) (logUtils:51)
2017-07-03 09:50:46,741+0800 INFO (periodic/1) [dispatcher] Run and protect: getVolumeSize(sdUUID=u'721b5233-b0ba-4722-8a7d-ba2a372190a0', spUUID=u'b04ca6e4-2660-4eaa-acdb-c1dae4e21f2d', imgUUID=u'bb35c163-f068-4f08-a1c2-28c4cb1b76d9', volUUID=u'fce7e0a0-7411-4d8c-b72c-2f46c4b4db1e', options=None) (logUtils:51)
2017-07-03 09:50:46,743+0800 INFO (periodic/0) [dispatcher] Run and protect: getVolumeSize, Return response: {'truesize': '6361276416', 'apparentsize': '107374182400'} (logUtils:54)
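
As a side note on the "raw or raw sparse?" question: the getVolumeSize response in the log excerpt can be read as a rough sparseness check. The sketch below assumes that 'truesize' is the number of bytes actually allocated on storage and 'apparentsize' is the logical size of the volume, both in bytes; that interpretation is an assumption, and the helper name allocation_summary is made up for illustration.

```python
# Values copied from the getVolumeSize response in the log excerpt (assumed bytes).
response = {'truesize': '6361276416', 'apparentsize': '107374182400'}

GIB = 1024 ** 3


def allocation_summary(resp):
    """Return (apparent GiB, allocated GiB, allocated fraction) for one volume.

    Hypothetical helper: assumes 'truesize' is allocated bytes and
    'apparentsize' is the logical size in bytes.
    """
    true_size = int(resp['truesize'])
    apparent = int(resp['apparentsize'])
    return apparent / GIB, true_size / GIB, true_size / apparent


apparent_gib, allocated_gib, fraction = allocation_summary(response)
print(f"apparent: {apparent_gib:.1f} GiB, allocated: {allocated_gib:.2f} GiB "
      f"({fraction:.1%} of apparent size)")
```

Under that assumption, an allocated size far below the apparent size would suggest a sparse (thinly allocated) volume. The log does not say which VM this volume belongs to, so it should not be read as evidence about the failing guest.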

......
......

2017-07-03 09:52:16,941+0800 INFO  (libvirt/events) [virt.vm] (vmId='c84f519e-398d-40a3-85b2-b7e53f3d7f67') abnormal vm stop device scsi0-0-0-0 error eio (vm:4112)
2017-07-03 09:52:16,941+0800 INFO  (libvirt/events) [virt.vm] (vmId='c84f519e-398d-40a3-85b2-b7e53f3d7f67') CPU stopped: onIOError (vm:4997)
2017-07-03 09:52:16,942+0800 INFO  (libvirt/events) [virt.vm] (vmId='c84f519e-398d-40a3-85b2-b7e53f3d7f67') CPU stopped: onSuspend (vm:4997)
2017-07-03 09:52:16,942+0800 INFO  (libvirt/events) [virt.vm] (vmId='c84f519e-398d-40a3-85b2-b7e53f3d7f67') abnormal vm stop device scsi0-0-0-0 error eio (vm:4112)
2017-07-03 09:52:16,943+0800 INFO  (libvirt/events) [virt.vm] (vmId='c84f519e-398d-40a3-85b2-b7e53f3d7f67') CPU stopped: onIOError (vm:4997)
2017-07-03 09:52:16,943+0800 INFO  (libvirt/events) [virt.vm] (vmId='c84f519e-398d-40a3-85b2-b7e53f3d7f67') abnormal vm stop device scsi0-0-0-0 error eio (vm:4112)
2017-07-03 09:52:16,944+0800 INFO  (libvirt/events) [virt.vm] (vmId='c84f519e-398d-40a3-85b2-b7e53f3d7f67') CPU stopped: onIOError
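
For anyone who wants to pull the same events out of a longer VDSM log, here is a minimal sketch that scans for the "abnormal vm stop" lines shown in the excerpt and counts them per VM, device and error. The regular expression is modelled only on the lines quoted in this message, so other VDSM versions may need a different pattern; the function name find_abnormal_stops is made up for illustration.

```python
import re
from collections import Counter

# Pattern modelled on the "abnormal vm stop" lines quoted in this thread.
# Other VDSM versions may format these messages differently.
ABNORMAL_STOP = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}[+-]\d{4})\s+\w+\s+"
    r"\(libvirt/events\) \[virt\.vm\] \(vmId='(?P<vm>[^']+)'\) "
    r"abnormal vm stop device (?P<device>\S+) error (?P<error>\S+)"
)


def find_abnormal_stops(lines):
    """Yield (timestamp, vm_id, device, error) for each abnormal-stop line."""
    for line in lines:
        match = ABNORMAL_STOP.match(line)
        if match:
            # Drop stray spaces that mail archives sometimes insert into UUIDs.
            vm_id = match.group('vm').replace(' ', '')
            yield (match.group('ts'), vm_id,
                   match.group('device'), match.group('error'))


# Two lines taken from the excerpt: one abnormal stop, one unrelated event.
sample = [
    "2017-07-03 09:52:16,941+0800 INFO  (libvirt/events) [virt.vm] "
    "(vmId='c84f519e-398d-40a3-85b2-b7e53f3d7f67') "
    "abnormal vm stop device scsi0-0-0-0 error eio (vm:4112)",
    "2017-07-03 09:52:16,941+0800 INFO  (libvirt/events) [virt.vm] "
    "(vmId='c84f519e-398d-40a3-85b2-b7e53f3d7f67') "
    "CPU stopped: onIOError (vm:4997)",
]

stops = list(find_abnormal_stops(sample))
counts = Counter((vm, device, error) for _, vm, device, error in stops)
for (vm, device, error), n in counts.items():
    print(f"{vm} {device} {error}: {n} abnormal stop(s)")
```

To run it against a real host, the sample list could be replaced with an open file handle on the VDSM log (assumed here to be /var/log/vdsm/vdsm.log). This only locates the EIO events; it does not explain why the storage returned an I/O error.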





On Thursday, June 22, 2017, 2:48 PM, Yaniv Kaul <ykaul@redhat.com> wrote:


On Thu, Jun 22, 2017 at 5:07 AM, M Mahboubian <m_mahboubian@yahoo.com> wrote:
Dear all,
I would appreciate it if anybody could help with the issue I am facing.

In our environment we have 2 hosts, 1 NFS server and 1 oVirt engine server. The NFS server provides storage to the VMs on the hosts.

I can create new VMs and install the OS, but once I do something like a yum update the VM freezes. I can reproduce this every single time I run yum update.

Is it paused, or completely frozen?
 

What information/log files should I provide to help you troubleshoot this?

Versions of all the components involved - guest OS, host OS (qemu-kvm version), how do you run the VM (vdsm log would be helpful here), exact storage specification (1Gb or 10Gb link? What is the NFS version? What is it hosted on? etc.)
 Y.


 Regards

_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users