Hi there. We recently had a network failure in our oVirt infrastructure,
which caused the oVirt engine to become unstable. It restarts after
10-20 minutes, the load average is high, and any command issued hangs.
The host logs show endless locking errors (see /var/log/sanlock.log
below).
I tried to re-initialize the lockspace by stopping the HE HA agent/broker
on all hosts and issuing the following commands on one of the hosts:
# su - vdsm -s /bin/bash
$ sanlock direct init -s hosted-engine:0:/rhev/data-center/mnt/192.168.10.10\\:_engine/a184f8ac-b779-4bf8-81c3-751115e15436/ha_agent/hosted-engine.lockspace:0
and then restarted both the agent and the broker on the same host.
However, I am still getting the same problem.
Any advice on this matter?
Installation info:
----------------------
ovirt 3.5.3
vdsm-xmlrpc-4.16.24-0.el6.noarch
vdsm-python-zombiereaper-4.16.24-0.el6.noarch
vdsm-python-4.16.24-0.el6.noarch
vdsm-jsonrpc-4.16.24-0.el6.noarch
vdsm-4.16.24-0.el6.x86_64
vdsm-cli-4.16.24-0.el6.noarch
vdsm-yajsonrpc-4.16.24-0.el6.noarch
ovirt-hosted-engine-ha-1.2.6-2.el6.noarch
ovirt-hosted-engine-setup-1.2.6-0.0.master.20150812080635.git5295df1.el6.noarch
----- end installation info
------ /var/log/sanlock.log
2015-08-18 04:20:02+0800 1704 [9385]: s2 delta_renew read rv -202 offset 0 /rhev/data-center/mnt/192.168.10.10:_engine/a184f8ac-b779-4bf8-81c3-751115e15436/dom_md/ids
2015-08-18 04:20:02+0800 1704 [9385]: s2 renewal error -202 delta_length 11 last_success 1662
2015-08-18 04:20:11+0800 1713 [9385]: a184f8ac aio collect 0 0x7fc5040008c0:0x7fc5040008d0:0x7fc50b9f7000 result 1048576:0 other free
2015-08-18 04:20:11+0800 1713 [9833]: hosted-e aio collect 0 0x7fc4f80008c0:0x7fc4f80008d0:0x7fc50baf9000 result 1048576:0 other free
2015-08-18 04:20:11+0800 1713 [9385]: a184f8ac aio collect 0 0x7fc504000910:0x7fc504000920:0x7fc50bbfb000 result 1048576:0 other free
2015-08-18 04:20:11+0800 1713 [9833]: hosted-e aio collect 0 0x7fc4f8000910:0x7fc4f8000920:0x7fc50beff000 result 1048576:0 other free
2015-08-18 04:21:43+0800 1805 [9385]: a184f8ac aio timeout 0 0x7fc5040008c0:0x7fc5040008d0:0x7fc50adf2000 ioto 10 to_count 18
2015-08-18 04:21:43+0800 1805 [9385]: s2 delta_renew read rv -202 offset 0 /rhev/data-center/mnt/192.168.10.10:_engine/a184f8ac-b779-4bf8-81c3-751115e15436/dom_md/ids
2015-08-18 04:21:43+0800 1805 [9385]: s2 renewal error -202 delta_length 10 last_success 1774
2015-08-18 04:21:43+0800 1805 [9833]: hosted-e aio timeout 0 0x7fc4f80008c0:0x7fc4f80008d0:0x7fc50aef4000 ioto 10 to_count 14
2015-08-18 04:21:43+0800 1805 [9833]: s3 delta_renew read rv -202 offset 0 /rhev/data-center/mnt/192.168.10.10:_engine/a184f8ac-b779-4bf8-81c3-751115e15436/images/190f4d2a-77f4-4403-af0d-62853560c653/2be7db4d-f30e-4873-b4ef-cff9e757341c
2015-08-18 04:21:43+0800 1805 [9833]: s3 renewal error -202 delta_length 10 last_success 1774
2015-08-18 04:21:52+0800 1814 [9385]: a184f8ac aio collect 0 0x7fc5040008c0:0x7fc5040008d0:0x7fc50adf2000 result 1048576:0 other free
2015-08-18 04:21:52+0800 1814 [9833]: hosted-e aio collect 0 0x7fc4f80008c0:0x7fc4f80008d0:0x7fc50aef4000 result 1048576:0 other free
2015-08-18 04:23:04+0800 1885 [9833]: hosted-e aio timeout 0 0x7fc4f80008c0:0x7fc4f80008d0:0x7fc50bbfb000 ioto 10 to_count 15
2015-08-18 04:23:04+0800 1885 [9833]: s3 delta_renew read rv -202 offset 0 /rhev/data-center/mnt/192.168.10.10:_engine/a184f8ac-b779-4bf8-81c3-751115e15436/images/190f4d2a-77f4-4403-af0d-62853560c653/2be7db4d-f30e-4873-b4ef-cff9e757341c
2015-08-18 04:23:04+0800 1885 [9833]: s3 renewal error -202 delta_length 10 last_success 1855
2015-08-18 04:23:04+0800 1886 [9385]: a184f8ac aio timeout 0 0x7fc5040008c0:0x7fc5040008d0:0x7fc50beff000 ioto 10 to_count 19
2015-08-18 04:23:04+0800 1886 [9385]: s2 delta_renew read rv -202 offset 0 /rhev/data-center/mnt/192.168.10.10:_engine/a184f8ac-b779-4bf8-81c3-751115e15436/dom_md/ids
2015-08-18 04:23:04+0800 1886 [9385]: s2 renewal error -202 delta_length 10 last_success 1855
2015-08-18 04:23:15+0800 1896 [9833]: hosted-e aio timeout 0 0x7fc4f8000910:0x7fc4f8000920:0x7fc50baf9000 ioto 10 to_count 16
2015-08-18 04:23:15+0800 1896 [9833]: s3 delta_renew read rv -202 offset 0 /rhev/data-center/mnt/192.168.10.10:_engine/a184f8ac-b779-4bf8-81c3-751115e15436/images/190f4d2a-77f4-4403-af0d-62853560c653/2be7db4d-f30e-4873-b4ef-cff9e757341c
2015-08-18 04:23:15+0800 1896 [9833]: s3 renewal error -202 delta_length 10 last_success 1855
2015-08-18 04:23:15+0800 1897 [9385]: a184f8ac aio timeout 0 0x7fc504000910:0x7fc504000920:0x7fc50b9f7000 ioto 10 to_count 20
2015-08-18 04:23:15+0800 1897 [9385]: s2 delta_renew read rv -202 offset 0 /rhev/data-center/mnt/192.168.10.10:_engine/a184f8ac-b779-4bf8-81c3-751115e15436/dom_md/ids
2015-08-18 04:23:15+0800 1897 [9385]: s2 renewal error -202 delta_length 11 last_success 1855
2015-08-18 04:23:26+0800 1907 [9833]: hosted-e aio timeout 0 0x7fc4f8000960:0x7fc4f8000970:0x7fc50aef4000 ioto 10 to_count 17
2015-08-18 04:23:26+0800 1907 [9833]: s3 delta_renew read rv -202 offset 0 /rhev/data-center/mnt/192.168.10.10:_engine/a184f8ac-b779-4bf8-81c3-751115e15436/images/190f4d2a-77f4-4403-af0d-62853560c653/2be7db4d-f30e-4873-b4ef-cff9e757341c
2015-08-18 04:23:26+0800 1907 [9833]: s3 renewal error -202 delta_length 10 last_success 1855
2015-08-18 04:23:26+0800 1908 [9385]: a184f8ac aio timeout 0 0x7fc504000960:0x7fc504000970:0x7fc50adf2000 ioto 10 to_count 21
2015-08-18 04:23:26+0800 1908 [9385]: s2 delta_renew read rv -202 offset 0 /rhev/data-center/mnt/192.168.10.10:_engine/a184f8ac-b779-4bf8-81c3-751115e15436/dom_md/ids
2015-08-18 04:23:26+0800 1908 [9385]: s2 renewal error -202 delta_length 11 last_success 1855
----- end /var/log/sanlock.log
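
In case it helps anyone reading the excerpt above, here is a rough Python sketch for summarizing the "renewal error" entries per lockspace (s2, s3). It is only an illustration, not a sanlock tool. The function name `summarize_renewal_errors` is made up for this post, and it assumes that the number right after the wall-clock timestamp and the `last_success` value are seconds on the same clock, so that their difference approximates how long the lease has gone without a successful renewal. The pattern tolerates entries that the mail client wrapped onto two lines.

```python
import re

# Matches entries such as:
#   2015-08-18 04:20:02+0800 1704 [9385]: s2 renewal error -202 delta_length 11
#   last_success 1662
# \s+ also matches a newline, so mail-wrapped entries are handled.
RENEWAL_RE = re.compile(
    r"(?P<now>\d+) \[\d+\]: (?P<space>s\d+) renewal error (?P<rv>-?\d+)"
    r"\s+delta_length (?P<delta>\d+)\s+last_success (?P<last>\d+)"
)


def summarize_renewal_errors(log_text):
    """Return {lockspace: (error_count, worst_seconds_since_last_success)}.

    Assumption: 'now' and 'last_success' are on the same seconds clock.
    """
    summary = {}
    for match in RENEWAL_RE.finditer(log_text):
        space = match.group("space")
        age = int(match.group("now")) - int(match.group("last"))
        count, worst = summary.get(space, (0, 0))
        summary[space] = (count + 1, max(worst, age))
    return summary


if __name__ == "__main__":
    # The path is the one quoted in this post; adjust as needed.
    with open("/var/log/sanlock.log") as handle:
        for space, (count, worst) in sorted(summarize_renewal_errors(handle.read()).items()):
            print(f"{space}: {count} renewal errors, up to {worst}s since last success")
```

If the assumption about the clock holds, a steadily growing gap for a lockspace would suggest that reads from the storage are still timing out, rather than the lockspace contents being damaged.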