<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
On 27-03-2015 7:03, Dan Kenigsberg wrote:<br>
<blockquote cite="mid:20150327100348.GB11819@redhat.com" type="cite">
<pre wrap="">On Thu, Mar 26, 2015 at 06:16:24PM -0300, Christopher Pereira wrote:
</pre>
<blockquote type="cite">
<pre wrap="">Continuing with the 3.6 nightly builds testing...
While hosted-engine-setup was adding the host to the newly created cluster,
VDSM crashed, probably because the Gluster engine storage disappeared as in
BZ 1201355 [1]
Facts:
- the engine storage (/rhev/data-center/mnt/...) was unmounted during
this process
- another mount of the same volume was still mounted after the VDSM
crash (so maybe the problem is not related to Gluster)
</pre>
</blockquote>
<pre wrap="">
What exactly happened to vdsm? Did the process die? Why? Was it stopped?
Did it segfault? Did it stop responding? Can you share vdsm.log and
/var/log/messages showing what happened during the crash?
</pre>
</blockquote>
Hi Dan,<br>
<br>
You will find the relevant logs here:<br>
<a class="moz-txt-link-freetext" href="https://bugzilla.redhat.com/show_bug.cgi?id=1201355#c4">https://bugzilla.redhat.com/show_bug.cgi?id=1201355#c4</a><br>
<br>
Summary:<br>
<br>
1) During setup, VDSM receives a SIGTERM:<br>
MainThread::DEBUG::2015-03-26
18:36:56,767::vdsm::66::vds::(sigtermHandler) Received signal 15<br>
<br>
Maybe the activation process (re)installs and/or restarts VDSM.<br>
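<br>
To locate every such event (not just this one), a rough Python sketch like the
following can scan vdsm.log for "Received signal N" entries. It is not part of
VDSM; it only assumes the line layout seen in the excerpts in this mail
(Thread::LEVEL::timestamp::module::lineno::logger::(function) message), and the
function names are made up for illustration.<br>
<pre><code class="language-python">
import re

# Layout assumed from the vdsm.log excerpts quoted in this thread:
# Thread::LEVEL::date time::module::lineno::logger::(function) message
VDSM_LINE = re.compile(
    r"^([^:]+)::([A-Z]+)::(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::"
    r"([^:]+)::(\d+)::([^:]+)::\(([^)]*)\) (.*)$"
)


def parse_vdsm_line(line):
    """Split one vdsm.log line into a dict, or return None if it does not match."""
    m = VDSM_LINE.match(line.rstrip("\n"))
    if m is None:
        return None
    thread, level, ts, module, lineno, logger, func, msg = m.groups()
    return {
        "thread": thread, "level": level, "time": ts, "module": module,
        "lineno": int(lineno), "logger": logger, "func": func, "msg": msg,
    }


def find_signals(lines):
    """Return (timestamp, signal number) for every 'Received signal N' entry."""
    hits = []
    for line in lines:
        rec = parse_vdsm_line(line)
        if rec is None:
            continue  # wrapped, truncated or differently formatted line
        sig = re.match(r"Received signal (\d+)", rec["msg"])
        if sig:
            hits.append((rec["time"], int(sig.group(1))))
    return hits
</code></pre>
For example, find_signals(open("/var/log/vdsm/vdsm.log")) should include
("2015-03-26 18:36:56,767", 15) for the run described above, which can then be
matched against what the engine log shows at the same moment.<br>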
<br>
2) Since the Gluster storage is mounted from a VDSM ChildProcess, it
disappears when VDSM stops.<br>
Thus, the VM is paused and will never resume (not even after remounting
the storage, because the paused QEMU process keeps invalid file
descriptors):<br>
<a class="moz-txt-link-freetext" href="https://bugzilla.redhat.com/show_bug.cgi?id=1058300">https://bugzilla.redhat.com/show_bug.cgi?id=1058300</a><br>
<a class="moz-txt-link-freetext" href="https://bugzilla.redhat.com/show_bug.cgi?id=1172905">https://bugzilla.redhat.com/show_bug.cgi?id=1172905</a><br>
<br>
3) Once VDSM has stopped, it cannot be restarted, because sanlock
reports an "invalid lockspace" error.<br>
This can be solved with hosted-engine --start-pool.<br>
<br>
4) You can reproduce the VDSM SIGTERM with less effort
(no need to redeploy) by opening the engine portal and
reactivating the host.<br>
You will see that VDSM gets stopped and the storage is lost.<br>
As a workaround to keep the storage from getting lost, you can mount it
manually so that it doesn't rely on the VDSM ChildProcess.<br>
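<br>
To check whether the engine storage is still mounted after VDSM stops (before
and after applying the manual-mount workaround), something like this rough
Python sketch could be used. It assumes the standard /proc/mounts format, that
GlusterFS FUSE mounts appear there with the file system type fuse.glusterfs,
and that VDSM mounts storage below /rhev/data-center/mnt/; the function name is
hypothetical.<br>
<pre><code class="language-python">
def glusterfs_mounts(mounts_text, prefix="/rhev/data-center/mnt/"):
    """Return (source, mount point) pairs for GlusterFS FUSE mounts below prefix.

    mounts_text is expected to be the content of /proc/mounts, where each
    line reads: device mountpoint fstype options dump pass.
    Octal escapes (used by /proc/mounts for spaces in paths) are not decoded.
    """
    found = []
    for line in mounts_text.splitlines():
        try:
            source, mountpoint, fstype = line.split()[:3]
        except ValueError:
            continue  # blank or truncated line
        if fstype == "fuse.glusterfs" and mountpoint.startswith(prefix):
            found.append((source, mountpoint))
    return found
</code></pre>
Usage: glusterfs_mounts(open("/proc/mounts").read()). An empty list right after
VDSM is stopped would match the "storage lost" symptom described in point 4.<br>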
<br>
Questions:<br>
<br>
1) I'm afraid that by activating the host manually after an
interrupted setup I may be skipping some special configuration steps.<br>
Is there any difference between activating the host manually from
the web manager and activating it with the setup script?<br>
How can I complete the setup manually?<br>
<br>
Status:<br>
<br>
I'm still unable to activate the host manually, because the engine is
now having problems with the JSON-RPC communication:<br>
<blockquote>2015-03-27 10:11:54,889 INFO
[org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp
Reactor) [] Connecting to h2.imatronix.com/209.126.105.36<br>
2015-03-27 10:11:54,893 ERROR
[org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp
Reactor) [] <b>Unable to process messages</b><br>
2015-03-27 10:11:54,893 ERROR
[org.ovirt.engine.core.vdsbroker.vdsbroker.ListVDSCommand]
(DefaultQuartzScheduler_Worker-96) [] Command
'ListVDSCommand(HostName = h2, HostId =
46d4659a-4efe-4427-aa68-a4536508fa08,
vds=Host[h2,46d4659a-4efe-4427-aa68-a4536508fa08])' execution
failed: VDSGenericException: VDSNetworkException: General
SSLEngine problem<br>
2015-03-27 10:11:54,894 ERROR
[org.ovirt.engine.core.utils.timer.SchedulerUtilQuartzImpl]
(DefaultQuartzScheduler_Worker-96) [] Failed to invoke scheduled
method vmsMonitoring: null<br>
<br>
2015-03-27 10:11:57,894 INFO
[org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp
Reactor) [] Connecting to h2.imatronix.com/209.126.105.36<br>
2015-03-27 10:11:57,897 ERROR
[org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp
Reactor) [] <b>Unable to process messages</b><br>
2015-03-27 10:11:57,897 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand]
(DefaultQuartzScheduler_Worker-95) [] Command
'org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand'
return value
'org.ovirt.engine.core.vdsbroker.vdsbroker.VDSInfoReturnForXmlRpc@79313585'<br>
2015-03-27 10:11:57,898 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand]
(DefaultQuartzScheduler_Worker-95) [] HostName = h2<br>
2015-03-27 10:11:57,898 ERROR
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand]
(DefaultQuartzScheduler_Worker-95) [] Command
'GetCapabilitiesVDSCommand(HostName = h2, HostId =
46d4659a-4efe-4427-aa68-a4536508fa08,
vds=Host[h2,46d4659a-4efe-4427-aa68-a4536508fa08])' execution
failed: VDSGenericException: VDSNetworkException: <b>General
SSLEngine problem</b><br>
2015-03-27 10:11:57,898 ERROR
[org.ovirt.engine.core.vdsbroker.HostMonitoring]
(DefaultQuartzScheduler_Worker-95) [] Failure to refresh Vds
runtime info: VDSGenericException: VDSNetworkException: General
SSLEngine problem<br>
2015-03-27 10:11:57,898 ERROR
[org.ovirt.engine.core.vdsbroker.HostMonitoring]
(DefaultQuartzScheduler_Worker-95) [] Exception:
org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException:
VDSGenericException: VDSNetworkException: General SSLEngine
problem<br>
at
org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.proceedProxyReturnValue(BrokerCommandBase.java:183)
[vdsbroker.jar:]<br>
at
org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand.executeVdsBrokerCommand(GetCapabilitiesVDSCommand.java:16)
[vdsbroker.jar:]<br>
at
org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:101)
[vdsbroker.jar:]<br>
at
org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:55)
[vdsbroker.jar:]<br>
at
org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:33)
[dal.jar:]<br>
at
org.ovirt.engine.core.vdsbroker.ResourceManager.runVdsCommand(ResourceManager.java:465)
[vdsbroker.jar:]<br>
at
org.ovirt.engine.core.vdsbroker.VdsManager.refreshCapabilities(VdsManager.java:587)
[vdsbroker.jar:]<br>
at
org.ovirt.engine.core.vdsbroker.HostMonitoring.refreshVdsRunTimeInfo(HostMonitoring.java:111)
[vdsbroker.jar:]<br>
at
org.ovirt.engine.core.vdsbroker.HostMonitoring.refresh(HostMonitoring.java:76)
[vdsbroker.jar:]<br>
at
org.ovirt.engine.core.vdsbroker.VdsManager.onTimer(VdsManager.java:199)
[vdsbroker.jar:]<br>
at sun.reflect.GeneratedMethodAccessor39.invoke(Unknown
Source) [:1.7.0_75]<br>
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[rt.jar:1.7.0_75]<br>
at java.lang.reflect.Method.invoke(Method.java:606)
[rt.jar:1.7.0_75]<br>
at
org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:81)
[scheduler.jar:]<br>
at
org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:52)
[scheduler.jar:]<br>
at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
[quartz.jar:]<br>
at
org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)
[quartz.jar:]<br>
<br>
2015-03-27 10:11:57,899 WARN
[org.ovirt.engine.core.vdsbroker.VdsManager]
(DefaultQuartzScheduler_Worker-95) [] Failed to refresh VDS,
network error, continuing,
vds='h2'(46d4659a-4efe-4427-aa68-a4536508fa08):
VDSGenericException: VDSNetworkException: <b>General SSLEngine
problem</b><br>
[...]<br>
<br>
</blockquote>
On the VDSM side, we have:<br>
<br>
<blockquote>clientIFinit::DEBUG::2015-03-27
10:11:53,098::task::592::Storage.TaskManager.Task::(_updateState)
Task=`87ed5b66-3abb-4edc-aec3-59f071b33276`::moving from state
init -> state preparing<br>
clientIFinit::INFO::2015-03-27
10:11:53,098::logUtils::48::dispatcher::(wrapper) Run and protect:
getConnectedStoragePoolsList(options=None)<br>
clientIFinit::INFO::2015-03-27
10:11:53,098::logUtils::51::dispatcher::(wrapper) Run and protect:
getConnectedStoragePoolsList, Return response: {'poollist': []}<br>
clientIFinit::DEBUG::2015-03-27
10:11:53,098::task::1188::Storage.TaskManager.Task::(prepare)
Task=`87ed5b66-3abb-4edc-aec3-59f071b33276`::finished:
{'poollist': []}<br>
clientIFinit::DEBUG::2015-03-27
10:11:53,098::task::592::Storage.TaskManager.Task::(_updateState)
Task=`87ed5b66-3abb-4edc-aec3-59f071b33276`::moving from state
preparing -> state finished<br>
clientIFinit::DEBUG::2015-03-27
10:11:53,098::resourceManager::940::Storage.ResourceManager.Owner::(releaseAll)
Owner.releaseAll requests {} resources {}<br>
clientIFinit::DEBUG::2015-03-27
10:11:53,098::resourceManager::977::Storage.ResourceManager.Owner::(cancelAll)
Owner.cancelAll requests {}<br>
clientIFinit::DEBUG::2015-03-27
10:11:53,098::task::990::Storage.TaskManager.Task::(_decref)
Task=`87ed5b66-3abb-4edc-aec3-59f071b33276`::ref 0 aborting False<br>
Detector thread::DEBUG::2015-03-27
10:11:53,450::protocoldetector::201::vds.MultiProtocolAcceptor::(_add_connection)
<b>Adding connection 209.239.124.8:54218</b><br>
Detector thread::DEBUG::2015-03-27
10:11:53,459::protocoldetector::225::vds.MultiProtocolAcceptor::(_process_handshake)
<b>Error during handshake: sslv3 alert certificate unknown</b><br>
Detector thread::DEBUG::2015-03-27
10:11:53,459::protocoldetector::215::vds.MultiProtocolAcceptor::(_remove_connection)
Removing connection 209.239.124.8:54218<br>
Detector thread::DEBUG::2015-03-27
10:11:55,249::protocoldetector::201::vds.MultiProtocolAcceptor::(_add_connection)
Adding connection 209.126.113.73:54119<br>
Detector thread::DEBUG::2015-03-27
10:11:55,252::protocoldetector::225::vds.MultiProtocolAcceptor::(_process_handshake)
Error during handshake: unexpected eof<br>
Detector thread::DEBUG::2015-03-27
10:11:55,252::protocoldetector::215::vds.MultiProtocolAcceptor::(_remove_connection)
Removing connection 209.126.113.73:54119<br>
Detector thread::DEBUG::2015-03-27
10:11:56,582::protocoldetector::201::vds.MultiProtocolAcceptor::(_add_connection)
Adding connection 209.239.124.8:39606<br>
Detector thread::DEBUG::2015-03-27
10:11:56,629::protocoldetector::225::vds.MultiProtocolAcceptor::(_process_handshake)
Error during handshake: sslv3 alert certificate unknown<br>
Detector thread::DEBUG::2015-03-27
10:11:56,629::protocoldetector::215::vds.MultiProtocolAcceptor::(_remove_connection)
Removing connection 209.239.124.8:39606<br>
clientIFinit::DEBUG::2015-03-27
10:11:58,104::task::592::Storage.TaskManager.Task::(_updateState)
Task=`9e6db6dc-3ce0-4e93-8ddd-2aa1d09fa687`::moving from state
init -> state preparing<br>
clientIFinit::INFO::2015-03-27
10:11:58,104::logUtils::48::dispatcher::(wrapper) Run and protect:
getConnectedStoragePoolsList(options=None)<br>
clientIFinit::INFO::2015-03-27
10:11:58,104::logUtils::51::dispatcher::(wrapper) Run and protect:
getConnectedStoragePoolsList, Return response: {'poollist': []}<br>
clientIFinit::DEBUG::2015-03-27
10:11:58,104::task::1188::Storage.TaskManager.Task::(prepare)
Task=`9e6db6dc-3ce0-4e93-8ddd-2aa1d09fa687`::finished:
{'poollist': []}<br>
clientIFinit::DEBUG::2015-03-27
10:11:58,104::task::592::Storage.TaskManager.Task::(_updateState)
Task=`9e6db6dc-3ce0-4e93-8ddd-2aa1d09fa687`::moving from state
preparing -> state finished<br>
clientIFinit::DEBUG::2015-03-27
10:11:58,104::resourceManager::940::Storage.ResourceManager.Owner::(releaseAll)
Owner.releaseAll requests {} resources {}<br>
clientIFinit::DEBUG::2015-03-27
10:11:58,104::resourceManager::977::Storage.ResourceManager.Owner::(cancelAll)
Owner.cancelAll requests {}<br>
clientIFinit::DEBUG::2015-03-27
10:11:58,104::task::990::Storage.TaskManager.Task::(_decref)
Task=`9e6db6dc-3ce0-4e93-8ddd-2aa1d09fa687`::ref 0 aborting False<br>
[...]<br>
</blockquote>
I guess this is related to an invalid certificate or a protocol
version mismatch.<br>
How can I fix it?<br>
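<br>
To see which peer is rejecting the handshake, here is a rough Python sketch
that groups the "Error during handshake" messages in vdsm.log by the client
address from the preceding "Adding connection" line. It assumes the vdsm.log
layout shown above and that handshakes are logged one at a time, as in this
excerpt; the function name is my own. As far as I understand OpenSSL's
messages, "sslv3 alert certificate unknown" means the connecting side sent
that alert, i.e. it did not accept the certificate presented by VDSM.<br>
<pre><code class="language-python">
import re
from collections import defaultdict

# Same vdsm.log layout as in the excerpts above (assumption):
# Thread::LEVEL::date time::module::lineno::logger::(function) message
VDSM_LINE = re.compile(
    r"^([^:]+)::([A-Z]+)::(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::"
    r"([^:]+)::(\d+)::([^:]+)::\(([^)]*)\) (.*)$"
)


def handshake_errors_by_peer(lines):
    """Map peer IP -> list of handshake error strings.

    Assumes each 'Error during handshake' message belongs to the most
    recent 'Adding connection' line, i.e. that the detector thread handles
    one handshake at a time, as it does in the excerpt above.
    """
    errors = defaultdict(list)
    current_peer = None
    for line in lines:
        m = VDSM_LINE.match(line.rstrip("\n"))
        if m is None:
            continue
        msg = m.group(8)  # free-text message after "(function) "
        if msg.startswith("Adding connection "):
            addr = msg[len("Adding connection "):]
            current_peer = addr.rsplit(":", 1)[0]  # drop the source port
        elif msg.startswith("Removing connection "):
            current_peer = None
        elif msg.startswith("Error during handshake: ") and current_peer:
            errors[current_peer].append(msg[len("Error during handshake: "):])
    return dict(errors)
</code></pre>
On the excerpt above it would report 209.239.124.8 with two "sslv3 alert
certificate unknown" errors and 209.126.113.73 with one "unexpected eof".<br>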
<br>
Regards,<br>
Christopher<br>
<br>
</body>
</html>