VDSM Command failed: Heartbeat Exceeded

Hi guys,

Please could someone assist me: my DC seems to be trying to re-negotiate SPM and apparently it's failing. I tried to delete an old auto-generated snapshot and shortly after that the issue seemed to start; however, after about an hour the snapshot was reported as successfully deleted, and SPM was negotiated again, albeit for a short period, before it started trying to re-negotiate again.

Last week I upgraded from oVirt 3.5 to 3.6. I also upgraded one of my 4 hosts using the 3.6 repo to the latest available from that repo, and did a yum update too.

I have 4 nodes and my oVirt engine is a KVM guest on another physical machine on the network. I'm using an FC SAN with ATTO HBAs, and recently we've started seeing some degraded IO. The SAN appears to be alright and the disks all seem to check out, but we are having rather slow IOPS at the moment, which we are trying to track down.

ovirt engine (CentOS release 6.9 (Final)): ebay-cors-filter-1.0.1-0.1.ovirt.el6.noarch ovirt-engine-3.6.7.5-1.el6.noarch ovirt-engine-backend-3.6.7.5-1.el6.noarch ovirt-engine-cli-3.6.2.0-1.el6.noarch ovirt-engine-dbscripts-3.6.7.5-1.el6.noarch ovirt-engine-extension-aaa-jdbc-1.0.7-1.el6.noarch ovirt-engine-extensions-api-impl-3.6.7.5-1.el6.noarch ovirt-engine-jboss-as-7.1.1-1.el6.x86_64 ovirt-engine-lib-3.6.7.5-1.el6.noarch ovirt-engine-restapi-3.6.7.5-1.el6.noarch ovirt-engine-sdk-python-3.6.7.0-1.el6.noarch ovirt-engine-setup-3.6.7.5-1.el6.noarch ovirt-engine-setup-base-3.6.7.5-1.el6.noarch ovirt-engine-setup-plugin-ovirt-engine-3.6.7.5-1.el6.noarch ovirt-engine-setup-plugin-ovirt-engine-common-3.6.7.5-1.el6.noarch ovirt-engine-setup-plugin-vmconsole-proxy-helper-3.6.7.5-1.el6.noarch ovirt-engine-setup-plugin-websocket-proxy-3.6.7.5-1.el6.noarch ovirt-engine-tools-3.6.7.5-1.el6.noarch ovirt-engine-tools-backup-3.6.7.5-1.el6.noarch ovirt-engine-userportal-3.6.7.5-1.el6.noarch ovirt-engine-vmconsole-proxy-helper-3.6.7.5-1.el6.noarch ovirt-engine-webadmin-portal-3.6.7.5-1.el6.noarch ovirt-engine-websocket-proxy-3.6.7.5-1.el6.noarch ovirt-engine-wildfly-8.2.1-1.el6.x86_64 ovirt-engine-wildfly-overlay-8.0.5-1.el6.noarch ovirt-host-deploy-1.4.1-1.el6.noarch ovirt-host-deploy-java-1.4.1-1.el6.noarch ovirt-image-uploader-3.6.0-1.el6.noarch ovirt-iso-uploader-3.6.0-1.el6.noarch ovirt-release34-1.0.3-1.noarch ovirt-release35-006-1.noarch ovirt-release36-3.6.7-1.noarch ovirt-setup-lib-1.0.1-1.el6.noarch ovirt-vmconsole-1.0.2-1.el6.noarch ovirt-vmconsole-proxy-1.0.2-1.el6.noarch

node01 (CentOS 6.9): vdsm-4.16.30-0.el6.x86_64 vdsm-cli-4.16.30-0.el6.noarch vdsm-jsonrpc-4.16.30-0.el6.noarch vdsm-python-4.16.30-0.el6.noarch vdsm-python-zombiereaper-4.16.30-0.el6.noarch vdsm-xmlrpc-4.16.30-0.el6.noarch vdsm-yajsonrpc-4.16.30-0.el6.noarch gpxe-roms-qemu-0.9.7-6.16.el6.noarch qemu-img-rhev-0.12.1.2-2.479.el6_7.2.x86_64 qemu-kvm-rhev-0.12.1.2-2.479.el6_7.2.x86_64 qemu-kvm-rhev-tools-0.12.1.2-2.479.el6_7.2.x86_64 libvirt-0.10.2-62.el6.x86_64 libvirt-client-0.10.2-62.el6.x86_64 libvirt-lock-sanlock-0.10.2-62.el6.x86_64 libvirt-python-0.10.2-62.el6.x86_64

node01 was upgraded out of desperation after I tried changing my DC and cluster version to 3.6, but then found that none of my hosts could be activated out of maintenance due to an incompatibility with 3.6 (I'm still not sure why, as searching seemed to indicate CentOS 6.x was compatible). I then had to remove all 4 hosts, change the cluster version back to 3.5, and re-add them.

When I tried changing the cluster version to 3.6 I did get a complaint about using the "legacy protocol", so on each host, under Advanced, I changed them to use the JSON protocol and this seemed to resolve it; however, once the DC/cluster was changed back to 3.5, the option to change the protocol back to Legacy is no longer shown.

node02 (CentOS 6.7): vdsm-4.16.30-0.el6.x86_64 vdsm-cli-4.16.30-0.el6.noarch vdsm-jsonrpc-4.16.30-0.el6.noarch vdsm-python-4.16.30-0.el6.noarch vdsm-python-zombiereaper-4.16.30-0.el6.noarch vdsm-xmlrpc-4.16.30-0.el6.noarch vdsm-yajsonrpc-4.16.30-0.el6.noarch gpxe-roms-qemu-0.9.7-6.14.el6.noarch qemu-img-rhev-0.12.1.2-2.479.el6_7.2.x86_64 qemu-kvm-rhev-0.12.1.2-2.479.el6_7.2.x86_64 qemu-kvm-rhev-tools-0.12.1.2-2.479.el6_7.2.x86_64 libvirt-0.10.2-54.el6_7.6.x86_64 libvirt-client-0.10.2-54.el6_7.6.x86_64 libvirt-lock-sanlock-0.10.2-54.el6_7.6.x86_64 libvirt-python-0.10.2-54.el6_7.6.x86_64

node03 (CentOS 6.7): vdsm-4.16.30-0.el6.x86_64 vdsm-cli-4.16.30-0.el6.noarch vdsm-jsonrpc-4.16.30-0.el6.noarch vdsm-python-4.16.30-0.el6.noarch vdsm-python-zombiereaper-4.16.30-0.el6.noarch vdsm-xmlrpc-4.16.30-0.el6.noarch vdsm-yajsonrpc-4.16.30-0.el6.noarch gpxe-roms-qemu-0.9.7-6.14.el6.noarch qemu-img-rhev-0.12.1.2-2.479.el6_7.2.x86_64 qemu-kvm-rhev-0.12.1.2-2.479.el6_7.2.x86_64 qemu-kvm-rhev-tools-0.12.1.2-2.479.el6_7.2.x86_64 libvirt-0.10.2-54.el6_7.6.x86_64 libvirt-client-0.10.2-54.el6_7.6.x86_64 libvirt-lock-sanlock-0.10.2-54.el6_7.6.x86_64 libvirt-python-0.10.2-54.el6_7.6.x86_64

node04 (CentOS 6.7): vdsm-4.16.20-1.git3a90f62.el6.x86_64 vdsm-cli-4.16.20-1.git3a90f62.el6.noarch vdsm-jsonrpc-4.16.20-1.git3a90f62.el6.noarch vdsm-python-4.16.20-1.git3a90f62.el6.noarch vdsm-python-zombiereaper-4.16.20-1.git3a90f62.el6.noarch vdsm-xmlrpc-4.16.20-1.git3a90f62.el6.noarch vdsm-yajsonrpc-4.16.20-1.git3a90f62.el6.noarch gpxe-roms-qemu-0.9.7-6.15.el6.noarch qemu-img-0.12.1.2-2.491.el6_8.1.x86_64 qemu-kvm-0.12.1.2-2.491.el6_8.1.x86_64 qemu-kvm-tools-0.12.1.2-2.503.el6_9.3.x86_64 libvirt-0.10.2-60.el6.x86_64 libvirt-client-0.10.2-60.el6.x86_64 libvirt-lock-sanlock-0.10.2-60.el6.x86_64 libvirt-python-0.10.2-60.el6.x86_64

I'm seeing a rather confusing error in /var/log/messages on all 4 hosts, as follows:

Jul 31 16:41:36 node01 multipathd: 36001b4d80001c80d0000000000000000: sdb - directio checker reports path is down
Jul 31 16:41:41 node01 kernel: sd 7:0:0:0: [sdb] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Jul 31 16:41:41 node01 kernel: sd 7:0:0:0: [sdb] CDB: Read(10): 28 00 00 00 00 00 00 00 01 00
Jul 31 16:41:41 node01 kernel: end_request: I/O error, dev sdb, sector 0

I say confusing, because I don't have a 3000GB LUN:

[root@node01 ~]# fdisk -l | grep 3000
Disk /dev/sdb: 3000.0 GB, 2999999528960 bytes

I did have one on Friday last week, but I trashed it and changed it to a 1500GB LUN instead, so I'm not sure if this error is from something still trying to connect to the old LUN?

My LUNs are as follows...

Disk /dev/sdb: 3000.0 GB, 2999999528960 bytes (this one doesn't actually exist anymore)
Disk /dev/sdc: 1000.0 GB, 999999668224 bytes
Disk /dev/sdd: 1000.0 GB, 999999668224 bytes
Disk /dev/sde: 1000.0 GB, 999999668224 bytes
Disk /dev/sdf: 1000.0 GB, 999999668224 bytes
Disk /dev/sdg: 1000.0 GB, 999999668224 bytes
Disk /dev/sdh: 1000.0 GB, 999999668224 bytes
Disk /dev/sdi: 1000.0 GB, 999999668224 bytes
Disk /dev/sdj: 1000.0 GB, 999999668224 bytes
Disk /dev/sdk: 1000.0 GB, 999999668224 bytes
Disk /dev/sdm: 1000.0 GB, 999999668224 bytes
Disk /dev/sdl: 1000.0 GB, 999999668224 bytes
Disk /dev/sdn: 1000.0 GB, 999999668224 bytes
Disk /dev/sdo: 1000.0 GB, 999999668224 bytes
Disk /dev/sdp: 1000.0 GB, 999999668224 bytes
Disk /dev/sdq: 1000.0 GB, 999999668224 bytes
Disk /dev/sdr: 1000.0 GB, 999988133888 bytes
Disk /dev/sds: 1500.0 GB, 1499999764480 bytes
Disk /dev/sdt: 1500.0 GB, 1499999502336 bytes

I'm quite low on SAN disk space currently, so I'm a little hesitant to migrate VMs around for fear of the migrations creating too many snapshots and filling up my SAN. We are in the process of expanding the SAN array too, but we are trying to get to the bottom of the bad IOPS at the moment before adding on additional overhead.

Ping tests between the hosts and the engine all look alright, so I don't suspect network issues.

I know this is very vague; everything is currently operational, but as you can see in the attached logs, I'm getting lots of ERROR messages.

Any help or guidance is greatly appreciated.

Thanks.
Regards.
Neil Wilson.
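As a quick check while the renegotiation loop is happening, the current SPM owner and status can be queried directly from any host with vdsm-cli; a minimal sketch, where -s 0 assumes vdsm is listening with SSL on the local host and the storage pool UUID is a placeholder to be copied from the first command's output:

  # list the storage pool(s) this host is connected to
  vdsClient -s 0 getConnectedStoragePoolsList
  # ask for the SPM status of that pool (the UUID below is a placeholder)
  vdsClient -s 0 getSpmStatus <storage-pool-uuid>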

Hi guys,

Sorry to repost, but I'm rather desperate here.

Thanks.
Regards.
Neil Wilson.

On Mon, Jul 31, 2017 at 5:54 PM Neil <nwilson123@gmail.com> wrote:
Hi guys,
Please could someone assist me: my DC seems to be trying to re-negotiate SPM and apparently it's failing. I tried to delete an old auto-generated snapshot and shortly after that the issue seemed to start; however, after about an hour the snapshot was reported as successfully deleted, and SPM was negotiated again, albeit for a short period, before it started trying to re-negotiate again.
This is a known issue in old versions and should be improved in 4.1. Oved, do you know if there is something we can tune on the engine side to avoid too-eager disconnections when the vdsm heartbeat is delayed?
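For reference, the grace period the engine allows before reporting "Heartbeat Exceeded" is exposed through engine-config; a minimal sketch, assuming the vdsHeartbeatInSeconds key is present in this 3.6 build (verify with the -l listing first) and using 30 seconds purely as an example value:

  # on the engine: list heartbeat-related keys and show the current value
  engine-config -l | grep -i heartbeat
  engine-config -g vdsHeartbeatInSeconds
  # raise it (30 is an example value only), then restart the engine to apply
  engine-config -s vdsHeartbeatInSeconds=30
  service ovirt-engine restart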
Last week I upgraded from oVirt 3.5 to 3.6. I also upgraded one of my 4 hosts using the 3.6 repo to the latest available from that repo, and did a yum update too.
I have 4 nodes and my oVirt engine is a KVM guest on another physical machine on the network. I'm using an FC SAN with ATTO HBAs, and recently we've started seeing some degraded IO. The SAN appears to be alright and the disks all seem to check out, but we are having rather slow IOPS at the moment, which we are trying to track down.
ovirt engine CentOS release 6.9 (Final)
...
ovirt-engine-3.6.7.5-1.el6.noarch
vdsm-4.16.30-0.el6.x86_64
...
vdsm-cli-4.16.30-0.el6.noarch
Using ovirt 3.6 at this time is not a good idea. You want to use 4.1.
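For what it's worth, the 4.1 repository is delivered as a release RPM; a rough sketch of enabling it on an EL7 engine host follows, with the assumption that a 3.6 engine on EL6 first has to be moved to EL7 and stepped through 4.0, so treat this as the final step only rather than a one-shot upgrade:

  # on an EL7 engine host, after the 3.6 -> 4.0 step has been completed
  yum install http://resources.ovirt.org/pub/yum-repo/ovirt-release41.rpm
  yum update ovirt-engine-setup
  engine-setup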
I'm seeing a rather confusing error in /var/log/messages on all 4 hosts, as follows:
Jul 31 16:41:36 node01 multipathd: 36001b4d80001c80d0000000000000000: sdb - directio checker reports path is down
Jul 31 16:41:41 node01 kernel: sd 7:0:0:0: [sdb] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Jul 31 16:41:41 node01 kernel: sd 7:0:0:0: [sdb] CDB: Read(10): 28 00 00 00 00 00 00 00 01 00
Jul 31 16:41:41 node01 kernel: end_request: I/O error, dev sdb, sector 0
I say confusing, because I don't have a 3000GB LUN
[root@node01 ~]# fdisk -l | grep 3000
Disk /dev/sdb: 3000.0 GB, 2999999528960 bytes
I did have one on Friday last week, but I trashed it and changed it to a 1500GB LUN instead, so I'm not sure if this error is from something still trying to connect to the old LUN?
Right. oVirt consumes the devices you map to the host; we don't manage adding or removing devices. After you unmap a device on the storage, the host still sees the old device as a faulty path; you have to remove it manually:
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/htm...
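In practice that means flushing the stale multipath map and deleting its SCSI path devices on each host; a sketch only, using the WWID from the multipathd message above (confirm with multipath -ll which /dev/sdX paths belong to it before deleting anything):

  # show the faulty map and its underlying paths (sdb in the log above)
  multipath -ll 36001b4d80001c80d0000000000000000
  # flush the stale map, then remove each path device behind it
  multipath -f 36001b4d80001c80d0000000000000000
  blockdev --flushbufs /dev/sdb
  echo 1 > /sys/block/sdb/device/delete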
My LUNs are as follows...

Disk /dev/sdb: 3000.0 GB, 2999999528960 bytes (this one doesn't actually exist anymore)
Disk /dev/sdc: 1000.0 GB, 999999668224 bytes
Disk /dev/sdd: 1000.0 GB, 999999668224 bytes
Disk /dev/sde: 1000.0 GB, 999999668224 bytes
Disk /dev/sdf: 1000.0 GB, 999999668224 bytes
Disk /dev/sdg: 1000.0 GB, 999999668224 bytes
Disk /dev/sdh: 1000.0 GB, 999999668224 bytes
Disk /dev/sdi: 1000.0 GB, 999999668224 bytes
Disk /dev/sdj: 1000.0 GB, 999999668224 bytes
Disk /dev/sdk: 1000.0 GB, 999999668224 bytes
Disk /dev/sdm: 1000.0 GB, 999999668224 bytes
Disk /dev/sdl: 1000.0 GB, 999999668224 bytes
Disk /dev/sdn: 1000.0 GB, 999999668224 bytes
Disk /dev/sdo: 1000.0 GB, 999999668224 bytes
Disk /dev/sdp: 1000.0 GB, 999999668224 bytes
Disk /dev/sdq: 1000.0 GB, 999999668224 bytes
Disk /dev/sdr: 1000.0 GB, 999988133888 bytes
Disk /dev/sds: 1500.0 GB, 1499999764480 bytes
Disk /dev/sdt: 1500.0 GB, 1499999502336 bytes
I'm quite low on SAN disk space currently, so I'm a little hesitant to migrate VMs around for fear of the migrations creating too many snapshots and filling up my SAN.
Migrating a disk of a live VM creates one 1G snapshot per disk. After the migration is done, the old disk is deleted on the source storage domain. You need to remove the snapshot manually in 3.6 (it is removed automatically in 4.1). Migrating disks when the VMs are not running does not require additional space on the source storage domain.

Nir
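To gauge how much headroom those temporary snapshots have, note that on FC (block) storage each oVirt storage domain is an LVM volume group named after the storage domain UUID, so free space can be checked on a host with something like:

  # VFree is the room available for the ~1G per-disk snapshots created during live migration
  vgs -o vg_name,vg_size,vg_free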
We are in the process of expanding the SAN array too, but we are trying to get to the bottom of the bad IOPS at the moment before adding on additional overhead.
Ping tests between hosts and engine all look alright, so I don't suspect network issues.
I know this is very vague; everything is currently operational, but as you can see in the attached logs, I'm getting lots of ERROR messages.
Any help or guidance is greatly appreciated.
Thanks.
Regards.
Neil Wilson.