

----- Original Message -----
From: "Sandro Bonazzola" <sbonazzo@redhat.com> To: "Yeela Kaplan" <ykaplan@redhat.com>, "Alon Bar-Lev" <alonbl@redhat.com>, "Allon Mureinik" <amureini@redhat.com>, "Yaniv Dary" <ydary@redhat.com>, "Federico Simoncelli" <fsimonce@redhat.com>, "Sahina Bose" <sabose@redhat.com>, "Yedidyah Bar David" <didi@redhat.com>, "Simone Tiraboschi" <stirabos@redhat.com>, devel@ovirt.org, "Michal Skrivanek" <mskrivan@redhat.com> Sent: Thursday, February 12, 2015 2:55:57 PM Subject: [ovirt-devel] [HC] Weird issue while deploying hosted-engine
While deploying Hosted Engine on Hyper Converged Gluster Storage I've hit the following issue.
ovirt-hosted-engine-setup:
[ INFO ] Engine replied: DB Up!Welcome to Health Status!
         Enter the name of the cluster to which you want to add the host (Default) [Default]:
[ INFO ] Waiting for the host to become operational in the engine. This may take several minutes...
[ INFO ] Still waiting for VDSM host to become operational...
I accessed the engine; the VM and the engine are working correctly, installed using the 3.5 snapshot repositories. host-deploy started on the host (see attached logs) and got stuck at:
2015-02-12 13:08:33 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:215 DIALOG:SEND ### Response is VALUE TIME=type:value or ABORT TIME
until several minutes later when:

2015-02-12 13:24:31 DEBUG otopi.context context._executeMethod:152 method exception
Traceback (most recent call last):
  File "/tmp/ovirt-ePMfjwrlpe/pythonlib/otopi/context.py", line 142, in _executeMethod
    method['method']()
  File "/tmp/ovirt-ePMfjwrlpe/otopi-plugins/otopi/system/clock.py", line 111, in _set_clock
    format="YYYYmmddHHMMSS+0000"
  File "/tmp/ovirt-ePMfjwrlpe/otopi-plugins/otopi/dialog/machine.py", line 235, in queryValue
    opcode, variable = self._readline().split(' ', 1)
  File "/tmp/ovirt-ePMfjwrlpe/pythonlib/otopi/dialog.py", line 259, in _readline
    raise IOError(_('End of file'))
IOError: End of file
at which point the process resumed.
Around the time host-deploy got stuck, I see in the sanlock logs:
2015-02-12 13:08:25+0100 7254 [683]: open error -1 /rhev/data-center/mnt/glusterSD/minidell.home:_hosted__engine__glusterfs/a7dba6e3-09ac-430a-8d6e-eea17cf18b8f/images/51156c24-5ea1-42df-bf33-daebf2c4780c/3719b006-68a2-4742-964c-413a3f77c0b4.lease
2015-02-12 13:08:25+0100 7254 [683]: r4 release_token open error -2
2015-02-12 13:08:31+0100 7260 [16910]: a7dba6e3 aio collect 0 0x7f1d640008c0:0x7f1d640008d0:0x7f1d7ec52000 result -107:0 match res
2015-02-12 13:08:31+0100 7260 [16910]: s3 delta_renew read rv -107 offset 0 /rhev/data-center/mnt/glusterSD/minidell.home:_hosted__engine__glusterfs/a7dba6e3-09ac-430a-8d6e-eea17cf18b8f/dom_md/ids
2015-02-12 13:08:31+0100 7260 [16910]: s3 renewal error -107 delta_length 0 last_success 7239
2015-02-12 13:08:31+0100 7260 [16910]: a7dba6e3 aio collect 0 0x7f1d640008c0:0x7f1d640008d0:0x7f1d7ec52000 result -107:0 match res
2015-02-12 13:08:31+0100 7260 [16910]: s3 delta_renew read rv -107 offset 0 /rhev/data-center/mnt/glusterSD/minidell.home:_hosted__engine__glusterfs/a7dba6e3-09ac-430a-8d6e-eea17cf18b8f/dom_md/ids
2015-02-12 13:08:31+0100 7260 [16910]: s3 renewal error -107 delta_length 0 last_success 7239
2015-02-12 13:08:32+0100 7261 [16910]: a7dba6e3 aio collect 0 0x7f1d640008c0:0x7f1d640008d0:0x7f1d7ec52000 result -107:0 match res
2015-02-12 13:08:32+0100 7261 [16910]: s3 delta_renew read rv -107 offset 0 /rhev/data-center/mnt/glusterSD/minidell.home:_hosted__engine__glusterfs/a7dba6e3-09ac-430a-8d6e-eea17cf18b8f/dom_md/ids
2015-02-12 13:08:32+0100 7261 [16910]: s3 renewal error -107 delta_length 0 last_success 7239
Sanlock cannot access your storage. The rest of the errors are less interesting, as nothing will work without storage.

While sanlock is failing, can you access that storage from another host or from the host running sanlock?

Adding David to help debug the sanlock errors.
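For reference, the "read rv -107" above is -ENOTCONN ("Transport endpoint is not connected"), i.e. the fuse client appears to have lost its connection to the brick rather than hitting a problem with the file itself. The errno can be decoded with:

# python -c 'import os; print(os.strerror(107))'
Transport endpoint is not connected

Also, sanlock opens the lease files with direct I/O, so a plain ls or buffered read can succeed while sanlock still fails. Something like this (a sketch, using the ids path from the log above) is closer to the I/O path sanlock exercises:

# dd if=/rhev/data-center/mnt/glusterSD/minidell.home:_hosted__engine__glusterfs/a7dba6e3-09ac-430a-8d6e-eea17cf18b8f/dom_md/ids of=/dev/null bs=512 count=1 iflag=direct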
And in VDSM logs:
MainThread::DEBUG::2015-02-12 13:08:25,529::vdsm::58::vds::(sigtermHandler) Received signal 15
MainThread::DEBUG::2015-02-12 13:08:25,529::protocoldetector::144::vds.MultiProtocolAcceptor::(stop) Stopping Acceptor
MainThread::INFO::2015-02-12 13:08:25,530::__init__::565::jsonrpc.JsonRpcServer::(stop) Stopping JsonRPC Server
Detector thread::DEBUG::2015-02-12 13:08:25,530::protocoldetector::115::vds.MultiProtocolAcceptor::(_cleanup) Cleaning Acceptor
MainThread::INFO::2015-02-12 13:08:25,531::vmchannels::188::vds::(stop) VM channels listener was stopped.
MainThread::INFO::2015-02-12 13:08:25,535::momIF::91::MOM::(stop) Shutting down MOM
MainThread::DEBUG::2015-02-12 13:08:25,535::task::595::Storage.TaskManager.Task::(_updateState) Task=`c52bd13a-24f0-4d8a-97d8-f6c53ca8bddc`::moving from state init -> state preparing
MainThread::INFO::2015-02-12 13:08:25,535::logUtils::44::dispatcher::(wrapper) Run and protect: prepareForShutdown(options=None)
Thread-11::DEBUG::2015-02-12 13:08:25,535::storageServer::706::Storage.ConnectionMonitor::(_monitorConnections) Monitoring stopped
ioprocess communication (16795)::ERROR::2015-02-12 13:08:25,530::__init__::152::IOProcessClient::(_communicate) IOProcess failure
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 107, in _communicate
    raise Exception("FD closed")
Exception: FD closed
ioprocess communication (16795)::DEBUG::2015-02-12 13:08:25,537::__init__::298::IOProcessClient::(_run) Starting IOProcess...
Followed by:
MainThread::INFO::2015-02-12 13:08:25,548::domainMonitor::137::Storage.DomainMonitor::(close) Stopping domain monitors
MainThread::INFO::2015-02-12 13:08:25,549::domainMonitor::114::Storage.DomainMonitor::(stopMonitoring) Stop monitoring a7dba6e3-09ac-430a-8d6e-eea17cf18b8f
Thread-94::DEBUG::2015-02-12 13:08:25,549::domainMonitor::192::Storage.DomainMonitorThread::(_monitorLoop) Stopping domain monitor for a7dba6e3-09ac-430a-8d6e-eea17cf18b8f
Thread-94::INFO::2015-02-12 13:08:25,549::clusterlock::245::Storage.SANLock::(releaseHostId) Releasing host id for domain a7dba6e3-09ac-430a-8d6e-eea17cf18b8f (id: 1)
Thread-94::DEBUG::2015-02-12 13:08:25,549::domainMonitor::201::Storage.DomainMonitorThread::(_monitorLoop) Unable to release the host id 1 for domain a7dba6e3-09ac-430a-8d6e-eea17cf18b8f
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 198, in _monitorLoop
    self.domain.releaseHostId(self.hostId, unused=True)
  File "/usr/share/vdsm/storage/sd.py", line 480, in releaseHostId
    self._clusterLock.releaseHostId(hostId, async, unused)
  File "/usr/share/vdsm/storage/clusterlock.py", line 252, in releaseHostId
    raise se.ReleaseHostIdFailure(self._sdUUID, e)
ReleaseHostIdFailure: Cannot release host id: (u'a7dba6e3-09ac-430a-8d6e-eea17cf18b8f', SanlockException(16, 'Sanlock lockspace remove failure', 'Device or resource busy'))
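If this reproduces, the EBUSY from the lockspace remove presumably means a resource was still held (or a renewal was still in flight) when the host id was being released; the state at that moment can be dumped with the standard sanlock CLI:

# sanlock client status
# sanlock client log_dump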
The supervdsm logs don't show anything interesting, but:
# service supervdsmd status
Redirecting to /bin/systemctl status supervdsmd.service
supervdsmd.service - "Auxiliary vdsm service for running helper functions as root"
   Loaded: loaded (/usr/lib/systemd/system/supervdsmd.service; static)
   Active: inactive (dead) since gio 2015-02-12 13:08:30 CET; 27min ago
 Main PID: 15859 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/supervdsmd.service
feb 12 13:08:30 minidell.home systemd[1]: Stopping "Auxiliary vdsm service for running helper functions as root"...
feb 12 13:08:30 minidell.home daemonAdapter[15859]: Traceback (most recent call last):
feb 12 13:08:30 minidell.home daemonAdapter[15859]:   File "/usr/lib64/python2.7/multiprocessing/util.py", line 268, in _run_finalizers
feb 12 13:08:30 minidell.home daemonAdapter[15859]:     finalizer()
feb 12 13:08:30 minidell.home daemonAdapter[15859]:   File "/usr/lib64/python2.7/multiprocessing/util.py", line 201, in __call__
feb 12 13:08:30 minidell.home daemonAdapter[15859]:     res = self._callback(*self._args, **self._kwargs)
feb 12 13:08:30 minidell.home daemonAdapter[15859]: OSError: [Errno 2] No such file or directory: '/var/run/vdsm/svdsm.sock'
feb 12 13:08:30 minidell.home python[15859]: DIGEST-MD5 client mech dispose
feb 12 13:08:30 minidell.home python[15859]: DIGEST-MD5 common mech dispose
feb 12 13:08:30 minidell.home systemd[1]: Stopped "Auxiliary vdsm service for running helper functions as root".
After a while the VM became unreachable as ovirt-hosted-engine-setup shows:
2015-02-12 13:10:33 DEBUG otopi.plugins.ovirt_hosted_engine_setup.engine.add_host add_host._wait_host_ready:183 Error fetching host state: [ERROR]::oVirt API connection failure, [Errno 110] Connection timed out
2015-02-12 13:10:33 DEBUG otopi.plugins.ovirt_hosted_engine_setup.engine.add_host add_host._wait_host_ready:189 VDSM host in state
Gluster logs don't show anything special.
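The fuse client keeps its own log under /var/log/glusterfs/, named after the mount point; assuming the usual naming it should be something like:

# less /var/log/glusterfs/rhev-data-center-mnt-glusterSD-minidell.home:_hosted__engine__glusterfs.log

A disconnect message there around 13:08 would line up with the -107 renewal errors above.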
I'm guessing that the 15 minutes host-deploy spent stuck caused the issue here, but I don't see any relevant change that may have triggered it.
Attaching the tar.gz with the code used for ovirt-hosted-engine-setup; the rest is 3.5-snapshot over RHEL 7.1.
Any hint is welcome.
--
Sandro Bonazzola
Better technology. Faster innovation. Powered by community collaboration.
See how it works at redhat.com

On 12/02/2015 14:03, Nir Soffer wrote:
Sanlock cannot access your storage. The rest of the errors are less interesting, as nothing will work without storage.
While sanlock is failing, can you access that storage from another host or from the host running sanlock?
Adding David to help debug the sanlock errors.
If you mean something like:

# mount -t glusterfs minidell.home:/hosted_engine_glusterfs /tmp/test
# ll /tmp/test/
totale 4
drwxr-xr-x. 6 vdsm kvm 4096 12 feb 11.49 a7dba6e3-09ac-430a-8d6e-eea17cf18b8f
-rwxr-xr-x. 1 vdsm kvm    0 12 feb 13.04 __DIRECT_IO_TEST__

yes, I can still access the gluster volume.

# gluster volume info hosted_engine_glusterfs

Volume Name: hosted_engine_glusterfs
Type: Distribute
Volume ID: 8d78571d-69ef-41fb-bc0a-34dc7d903df5
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: minidell.home:/path1
Options Reconfigured:
nfs.disable: on
storage.owner-gid: 36
storage.owner-uid: 36
network.ping-timeout: 10
auth.allow: *
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
server.allow-insecure: on
user.cifs: disable

# gluster volume status hosted_engine_glusterfs
Status of volume: hosted_engine_glusterfs
Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick minidell.home:/path1                      49217   Y       16660

Task Status of Volume hosted_engine_glusterfs
------------------------------------------------------------------------------
There are no active volume tasks
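Note that a fresh mount on /tmp/test is a separate fuse client, so this shows the volume is up, but not necessarily that the mount vdsm was using is still connected; the same check through the vdsm mount point would be:

# ls -l /rhev/data-center/mnt/glusterSD/minidell.home:_hosted__engine__glusterfs/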
--
Sandro Bonazzola
Better technology. Faster innovation. Powered by community collaboration.
See how it works at redhat.com

----- Original Message -----
From: "Sandro Bonazzola" <sbonazzo@redhat.com> To: "Nir Soffer" <nsoffer@redhat.com> Cc: "Yeela Kaplan" <ykaplan@redhat.com>, "Alon Bar-Lev" <alonbl@redhat.com>, "Allon Mureinik" <amureini@redhat.com>, "Yaniv Dary" <ydary@redhat.com>, "Federico Simoncelli" <fsimonce@redhat.com>, "Sahina Bose" <sabose@redhat.com>, "Yedidyah Bar David" <didi@redhat.com>, "Simone Tiraboschi" <stirabos@redhat.com>, devel@ovirt.org, "Michal Skrivanek" <mskrivan@redhat.com>, "David Teigland" <teigland@redhat.com> Sent: Thursday, February 12, 2015 4:24:55 PM Subject: Re: [ovirt-devel] [HC] Weird issue while deploying hosted-engine
Yes, I can still access the gluster volume.
While sanlock was failing to update the lease, or now?

Try also to write to this storage.

On 12/02/2015 15:31, Nir Soffer wrote:
While sanlock was failing to update the lease, or now?
Try also to write to this storage.
dd if=/dev/zero of=/tmp/test/my_write_test bs=4k count=1k
1024+0 records in
1024+0 records out
4194304 bytes (4.2 MB) copied, 0.0289396 s, 145 MB/s

ll /tmp/test/
total 4100
-rwxr-xr-x. 1 vdsm kvm        0 Feb 12 13:04 __DIRECT_IO_TEST__
drwxr-xr-x. 6 vdsm kvm     4096 Feb 12 11:49 a7dba6e3-09ac-430a-8d6e-eea17cf18b8f
-rw-r--r--. 1 root root 4194304 Feb 12 15:33 my_write_test

looks like it works
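Note that this dd goes through the page cache, so it can succeed even when direct I/O is broken; a variant closer to what sanlock does (file name is just an example) would be:

# dd if=/dev/zero of=/tmp/test/direct_write_test bs=4096 count=1 oflag=direct

If that one failed with "Transport endpoint is not connected" while the buffered write worked, it would point at the direct I/O path on the fuse mount.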
--
Sandro Bonazzola
Better technology. Faster innovation. Powered by community collaboration.
See how it works at redhat.com

----- Original Message -----
From: "Sandro Bonazzola" <sbonazzo@redhat.com> To: "Nir Soffer" <nsoffer@redhat.com> Cc: "Yeela Kaplan" <ykaplan@redhat.com>, "Alon Bar-Lev" <alonbl@redhat.com>, "Allon Mureinik" <amureini@redhat.com>, "Yaniv Dary" <ydary@redhat.com>, "Federico Simoncelli" <fsimonce@redhat.com>, "Sahina Bose" <sabose@redhat.com>, "Yedidyah Bar David" <didi@redhat.com>, "Simone Tiraboschi" <stirabos@redhat.com>, devel@ovirt.org, "Michal Skrivanek" <mskrivan@redhat.com>, "David Teigland" <teigland@redhat.com> Sent: Thursday, February 12, 2015 4:34:48 PM Subject: Re: [ovirt-devel] [HC] Weird issue while deploying hosted-engine
On 12/02/2015 15:31, Nir Soffer wrote:
----- Original Message -----
From: "Sandro Bonazzola" <sbonazzo@redhat.com> To: "Nir Soffer" <nsoffer@redhat.com> Cc: "Yeela Kaplan" <ykaplan@redhat.com>, "Alon Bar-Lev" <alonbl@redhat.com>, "Allon Mureinik" <amureini@redhat.com>, "Yaniv Dary" <ydary@redhat.com>, "Federico Simoncelli" <fsimonce@redhat.com>, "Sahina Bose" <sabose@redhat.com>, "Yedidyah Bar David" <didi@redhat.com>, "Simone Tiraboschi" <stirabos@redhat.com>, devel@ovirt.org, "Michal Skrivanek" <mskrivan@redhat.com>, "David Teigland" <teigland@redhat.com> Sent: Thursday, February 12, 2015 4:24:55 PM Subject: Re: [ovirt-devel] [HC] Weird issue while deploying hosted-engine
Il 12/02/2015 14:03, Nir Soffer ha scritto:
----- Original Message -----
From: "Sandro Bonazzola" <sbonazzo@redhat.com> To: "Yeela Kaplan" <ykaplan@redhat.com>, "Alon Bar-Lev" <alonbl@redhat.com>, "Allon Mureinik" <amureini@redhat.com>, "Yaniv Dary" <ydary@redhat.com>, "Federico Simoncelli" <fsimonce@redhat.com>, "Sahina Bose" <sabose@redhat.com>, "Yedidyah Bar David" <didi@redhat.com>, "Simone Tiraboschi" <stirabos@redhat.com>, devel@ovirt.org, "Michal Skrivanek" <mskrivan@redhat.com> Sent: Thursday, February 12, 2015 2:55:57 PM Subject: [ovirt-devel] [HC] Weird issue while deploying hosted-engine
While deploying Hosted Engine on Hyper Converged Gluster Storage I've hit the following issue.
ovirt-hosted-engine-setup:
[ INFO ] Engine replied: DB Up!Welcome to Health Status! Enter the name of the cluster to which you want to add the host (Default) [Default]: [ INFO ] Waiting for the host to become operational in the engine. This may take several minutes... [ INFO ] Still waiting for VDSM host to become operational...
Accessed to the engine, VM and engine working correctly, installed using 3.5 snapshot repositories. host-deploy started on the host (see attached logs) and got stuck at:
2015-02-12 13:08:33 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:215 DIALOG:SEND ### Response is VALUE TIME=type:value or ABORT TIME
until several minutes later when: 2015-02-12 13:24:31 DEBUG otopi.context context._executeMethod:152 method exception Traceback (most recent call last): File "/tmp/ovirt-ePMfjwrlpe/pythonlib/otopi/context.py", line 142, in _executeMethod method['method']() File "/tmp/ovirt-ePMfjwrlpe/otopi-plugins/otopi/system/clock.py", line 111, in _set_clock format="YYYYmmddHHMMSS+0000" File "/tmp/ovirt-ePMfjwrlpe/otopi-plugins/otopi/dialog/machine.py", line 235, in queryValue opcode, variable = self._readline().split(' ', 1) File "/tmp/ovirt-ePMfjwrlpe/pythonlib/otopi/dialog.py", line 259, in _readline raise IOError(_('End of file')) IOError: End of file
resumed the process.
Around the time when host-deploy got stucked I see in sanlock logs:
2015-02-12 13:08:25+0100 7254 [683]: open error -1 /rhev/data-center/mnt/glusterSD/minidell.home:_hosted__engine__glusterfs/a7dba6e3-09ac-430a-8d6e-eea17cf18b8f/images/51156c24-5ea1-42df-bf33-daebf2c4780c/3719b0 06-68a2-4742-964c-413a3f77c0b4.lease 2015-02-12 13:08:25+0100 7254 [683]: r4 release_token open error -2 2015-02-12 13:08:31+0100 7260 [16910]: a7dba6e3 aio collect 0 0x7f1d640008c0:0x7f1d640008d0:0x7f1d7ec52000 result -107:0 match res 2015-02-12 13:08:31+0100 7260 [16910]: s3 delta_renew read rv -107 offset 0 /rhev/data-center/mnt/glusterSD/minidell.home:_hosted__engine__glusterfs/a7dba6e3-09ac-430a-8d6e-eea17cf18b8f/dom_md/ids 2015-02-12 13:08:31+0100 7260 [16910]: s3 renewal error -107 delta_length 0 last_success 7239 2015-02-12 13:08:31+0100 7260 [16910]: a7dba6e3 aio collect 0 0x7f1d640008c0:0x7f1d640008d0:0x7f1d7ec52000 result -107:0 match res 2015-02-12 13:08:31+0100 7260 [16910]: s3 delta_renew read rv -107 offset 0 /rhev/data-center/mnt/glusterSD/minidell.home:_hosted__engine__glusterfs/a7dba6e3-09ac-430a-8d6e-eea17cf18b8f/dom_md/ids 2015-02-12 13:08:31+0100 7260 [16910]: s3 renewal error -107 delta_length 0 last_success 7239 2015-02-12 13:08:32+0100 7261 [16910]: a7dba6e3 aio collect 0 0x7f1d640008c0:0x7f1d640008d0:0x7f1d7ec52000 result -107:0 match res 2015-02-12 13:08:32+0100 7261 [16910]: s3 delta_renew read rv -107 offset 0 /rhev/data-center/mnt/glusterSD/minidell.home:_hosted__engine__glusterfs/a7dba6e3-09ac-430a-8d6e-eea17cf18b8f/dom_md/ids 2015-02-12 13:08:32+0100 7261 [16910]: s3 renewal error -107 delta_length 0 last_success 7239
Sanlock cannot access your storage. The rest of the errors are less interesting, as nothing will work without storage.
While sanlock is failing, can you access that storage from another host or from the host running sanlock?
Adding David for helping with debugging the sanlock errors.
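For reference, sanlock itself can report its view of the lockspaces while the renewal errors are happening; a minimal check on the affected host (a sketch, assuming the stock sanlock CLI):

# sanlock client status
# sanlock client log_dump

The first command shows lockspaces and held resources as the daemon sees them; the second dumps sanlock's internal debug log.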
If you mean something like:

# mount -t glusterfs minidell.home:/hosted_engine_glusterfs /tmp/test
# ll /tmp/test/
total 4
drwxr-xr-x. 6 vdsm kvm 4096 Feb 12 11:49 a7dba6e3-09ac-430a-8d6e-eea17cf18b8f
-rwxr-xr-x. 1 vdsm kvm    0 Feb 12 13:04 __DIRECT_IO_TEST__
yes, I can still access the gluster volume.
While sanlock fails to update the lease, or now?
?
Try also to write to this storage.
# dd if=/dev/zero of=/tmp/test/my_write_test bs=4k count=1k
1024+0 records in
1024+0 records out
4194304 bytes (4.2 MB) copied, 0.0289396 s, 145 MB/s

# ll /tmp/test/
total 4100
-rwxr-xr-x. 1 vdsm kvm       0 Feb 12 13:04 __DIRECT_IO_TEST__
drwxr-xr-x. 6 vdsm kvm    4096 Feb 12 11:49 a7dba6e3-09ac-430a-8d6e-eea17cf18b8f
-rw-r--r--. 1 root root 4194304 Feb 12 15:33 my_write_test

Looks like it works.
# gluster volume info hosted_engine_glusterfs
Volume Name: hosted_engine_glusterfs
Type: Distribute
Volume ID: 8d78571d-69ef-41fb-bc0a-34dc7d903df5
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: minidell.home:/path1
Options Reconfigured:
nfs.disable: on
storage.owner-gid: 36
storage.owner-uid: 36
network.ping-timeout: 10
auth.allow: *
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
server.allow-insecure: on
user.cifs: disable
# gluster volume status hosted_engine_glusterfs
Status of volume: hosted_engine_glusterfs
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick minidell.home:/path1                              49217   Y       16660

Task Status of Volume hosted_engine_glusterfs
------------------------------------------------------------------------------
There are no active volume tasks
supervdsm logs don't show anything interesting, but:
# service supervdsmd status
Redirecting to /bin/systemctl status supervdsmd.service
supervdsmd.service - "Auxiliary vdsm service for running helper functions as root"
   Loaded: loaded (/usr/lib/systemd/system/supervdsmd.service; static)
   Active: inactive (dead) since Thu 2015-02-12 13:08:30 CET; 27min ago
 Main PID: 15859 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/supervdsmd.service

Feb 12 13:08:30 minidell.home systemd[1]: Stopping "Auxiliary vdsm service for running helper functions as root"...
Feb 12 13:08:30 minidell.home daemonAdapter[15859]: Traceback (most recent call last):
Feb 12 13:08:30 minidell.home daemonAdapter[15859]: File "/usr/lib64/python2.7/multiprocessing/util.py", line 268, in _run_finalizers
Feb 12 13:08:30 minidell.home daemonAdapter[15859]: finalizer()
Feb 12 13:08:30 minidell.home daemonAdapter[15859]: File "/usr/lib64/python2.7/multiprocessing/util.py", line 201, in __call__
Feb 12 13:08:30 minidell.home daemonAdapter[15859]: res = self._callback(*self._args, **self._kwargs)
Feb 12 13:08:30 minidell.home daemonAdapter[15859]: OSError: [Errno 2] No such file or directory: '/var/run/vdsm/svdsm.sock'
Feb 12 13:08:30 minidell.home python[15859]: DIGEST-MD5 client mech dispose
Feb 12 13:08:30 minidell.home python[15859]: DIGEST-MD5 common mech dispose
Feb 12 13:08:30 minidell.home systemd[1]: Stopped "Auxiliary vdsm service for running helper functions as root".
After a while the VM became unreachable as ovirt-hosted-engine-setup shows:
2015-02-12 13:10:33 DEBUG otopi.plugins.ovirt_hosted_engine_setup.engine.add_host add_host._wait_host_ready:183 Error fetching host state: [ERROR]::oVirt API connection failure, [Errno 110] Connection timed out
2015-02-12 13:10:33 DEBUG otopi.plugins.ovirt_hosted_engine_setup.engine.add_host add_host._wait_host_ready:189 VDSM host in state
Gluster logs don't show anything special.
I'm guessing that the 15 minutes of host-deploy being stuck caused the issue here, but I don't see any relevant change that may have caused it.
Attaching the tar.gz with the code used for ovirt-hosted-engine-setup; the rest is 3.5-snapshot over RHEL 7.1
Any hint is welcome.
--
Sandro Bonazzola
Better technology. Faster innovation. Powered by community collaboration.
See how it works at redhat.com

On 12/02/2015 15:41, Nir Soffer wrote:
While sanlock fails to update the lease, or now?
?
Monitoring the storage while trying to reproduce on a different host. Attached the messages log requested on IRC.

2015-02-12 13:08:25+0100 7254 [683]: open error -1 /rhev/data-center/mnt/glusterSD/minidell.home:_hosted__engine__glusterfs/a7dba6e3-09ac-430a-8d6e-eea17cf18b8f/images/51156c24-5ea1-42df-bf33-daebf2c4780c/3719b006-68a2-4742-964c-413a3f77c0b4.lease
open(2) error was EPERM
2015-02-12 13:08:25+0100 7254 [683]: r4 release_token open error -2
open(2) error was ENOENT
2015-02-12 13:08:31+0100 7260 [16910]: a7dba6e3 aio collect 0 0x7f1d640008c0:0x7f1d640008d0:0x7f1d7ec52000 result -107:0 match res
aio result on open file was ENOTCONN
Maybe gluster logged some errors about what happened here?
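These sanlock numbers are plain errno values with the sign flipped; a quick way to decode them (a sketch, assuming Python is available, as it is on any VDSM host):

# python -c 'import errno, os; print errno.errorcode[107], os.strerror(107)'
ENOTCONN Transport endpoint is not connected
# python -c 'import errno, os; print errno.errorcode[1], os.strerror(1)'
EPERM Operation not permitted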

From: "David Teigland" <teigland@redhat.com>, Sent: Thursday, February 12, 2015 5:13:49 PM
2015-02-12 13:08:25+0100 7254 [683]: open error -1 /rhev/data-center/mnt/glusterSD/minidell.home:_hosted__engine__glusterfs/a7dba6e3-09ac-430a-8d6e-eea17cf18b8f/images/51156c24-5ea1-42df-bf33-daebf2c4780c/3719b006-68a2-4742-964c-413a3f77c0b4.lease
open(2) error was EPERM
SELinux acting again? Sandro, can you also attach /var/log/audit/audit.log?
2015-02-12 13:08:25+0100 7254 [683]: r4 release_token open error -2
open(2) error was ENOENT
2015-02-12 13:08:31+0100 7260 [16910]: a7dba6e3 aio collect 0 0x7f1d640008c0:0x7f1d640008d0:0x7f1d7ec52000 result -107:0 match res
aio result on open file was ENOTCONN
Maybe gluster logged some errors about what happened here?
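For the gluster side, the relevant logs with default settings would be (a sketch; the exact file names derive from the mount point and the brick path):

# less /var/log/glusterfs/rhev-data-center-mnt-glusterSD-minidell.home:_hosted__engine__glusterfs.log
# less /var/log/glusterfs/bricks/path1.log

The first is the FUSE client log on the host mounting the volume; the second is the brick log on the gluster server.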

On 12/02/2015 16:20, Nir Soffer wrote:
SELinux acting again?
Sandro, can you also attach /var/log/audit/audit.log?
Sure, but other than what I already reported in https://bugzilla.redhat.com/show_bug.cgi?id=1191989 I don't think you'll find much more.

From: "Sandro Bonazzola" <sbonazzo@redhat.com>, Sent: Thursday, February 12, 2015 5:22:48 PM
Sure, but other than what I already reported in https://bugzilla.redhat.com/show_bug.cgi?id=1191989 I don't think you'll find much more.
Can you reproduce this in permissive mode?
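For example (a sketch; this assumes the host is currently enforcing, and note that in permissive mode denials are still logged to /var/log/audit/audit.log):

# getenforce
Enforcing
# setenforce 0
# getenforce
Permissive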

On 12/02/2015 16:26, Nir Soffer wrote:
Can you reproduce this in permissive mode?
Yes, it's fully reproducible on a host in permissive mode.

Also, writing to the storage works; this loop didn't show any issue while sanlock was failing:

while true; do echo starting dd `date`; dd if=/dev/zero of=/tmp/test/testfile bs=4k count=1k oflag=direct; echo dd finished `date`; rm -f /tmp/test/testfile; sleep 10; done

Attached the tar.gz of the ovirt-hosted-engine-setup I'm running and the full /var/log directory content. Note that both glusterd and vdsmd are running on the same host for the hyper-converged setup.

From: "Sandro Bonazzola" <sbonazzo@redhat.com>, Sent: Thursday, February 12, 2015 5:43:28 PM
Also writing to the storage works:
while true; do echo starting dd `date`; dd if=/dev/zero of=/tmp/test/testfile bs=4k count=1k oflag=direct ; echo dd finished `date`; rm -f /tmp/test/testfile ; sleep 10; done
What does "df -T /tmp" show? /tmp is typically tempfs, so your dd test never touched any storage. Also, copying zeros can be ignored by smart storage, so better copy "real" file data: dd if=/dev/urandom of=testfile bs=1M count=1 Now use testfile for writing to storage: dd if=testfile of=/path/to/glusterfs/mount/testfile bs=1M count=1 oflag=direct You should do something like: 1. mount your gluster volume - same as vdsm does 2. start the write test, writing to the gluster storage 3. start your hosted engine flow 4. when it fails, keep your write test running until for a while

On 12/02/2015 17:03, Nir Soffer wrote:
What does "df -T /tmp" show?
# df -T /tmp /tmp/test/
Filesystem                             Type           1K-blocks    Used Available Use% Mounted on
/dev/sda6                              ext4            40185112 5347916  32772812  15% /
192.168.1.107:/hosted_engine_glusterfs fuse.glusterfs  40185088 5347968  32772736  15% /tmp/test
If you prefer, I can also do a double test: one on /tmp/test and one on /rhev/data-center/mnt/glusterSD/minidell.home\:_hosted__engine__glusterfs/. I'll try to do the test tomorrow.

From: "Sandro Bonazzola" <sbonazzo@redhat.com>, Sent: Thursday, February 12, 2015 6:11:16 PM
Writing to /tmp/test seems OK. It would be nice to see a test using random data instead of zeros. I think David should continue to investigate this, to understand why sanlock cannot access the storage.

From: "Sandro Bonazzola" <sbonazzo@redhat.com>, Sent: Thursday, February 12, 2015 2:55:57 PM, Subject: [HC] Weird issue while deploying hosted-engine
2015-02-12 13:08:33 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:215 DIALOG:SEND ### Response is VALUE TIME=type:value or ABORT TIME
Not sure about the rest, but this seems to be some time-checking issue. I did not check the sources, only my own host-deploy logs. My guess:

otopi (an internal plugin?) checks the system clock. It tries ntpq, which failed for you, then chronyc (present on my system; you do not seem to have it), and then just asks the engine for the time, and perhaps the engine does not know how to reply.

Try installing chrony.

--
Didi

On 12/02/2015 14:16, Yedidyah Bar David wrote:
Not sure that chrony is really needed; I have ntpd. But I've seen that by default it's not enabled on RHEL 7.1:

# service ntpd status
Redirecting to /bin/systemctl status ntpd.service
ntpd.service - Network Time Service
   Loaded: loaded (/usr/lib/systemd/system/ntpd.service; disabled)
   Active: inactive (dead)

and host-deploy doesn't check whether it's alive before calling /sbin/ntpq -c rv.

I'm not sure what the engine should send in this case, because the VM gets paused here.
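For completeness, enabling ntpd so that the ntpq query has a daemon to talk to would be (a sketch):

# systemctl enable ntpd
# systemctl start ntpd
# /sbin/ntpq -c rv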
participants (4): David Teigland, Nir Soffer, Sandro Bonazzola, Yedidyah Bar David