<html>
  <head>
    <meta content="text/html; charset=windows-1252"
      http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    On 27-03-2015 7:03, Dan Kenigsberg wrote:<br>
    <blockquote cite="mid:20150327100348.GB11819@redhat.com" type="cite">
      <pre wrap="">On Thu, Mar 26, 2015 at 06:16:24PM -0300, Christopher Pereira wrote:
</pre>
      <blockquote type="cite">
        <pre wrap="">Continuing with the 3.6 nightly build testing...

While hosted-engine-setup was adding the host to the newly created cluster,
VDSM crashed, probably because the gluster engine storage disappeared as in
BZ 1201355 [1]

Facts:
    - the engine storage (/rhev/data-center/mnt/...) was unmounted during
this process
    - another mount of the same volume was still mounted after the VDSM
crash (maybe the problem is not related to gluster)
</pre>
      </blockquote>
      <pre wrap="">
What exactly happened to vdsm? Did the process die? Why? Was it stopped?
did it segfault? Did it stop responding? Can you share vdsm.log and
/var/log/messages showing what happened during the crash?
</pre>
    </blockquote>
    Hi Dan,<br>
    <br>
    You will find the relevant logs here:<br>
    <a class="moz-txt-link-freetext" href="https://bugzilla.redhat.com/show_bug.cgi?id=1201355#c4">https://bugzilla.redhat.com/show_bug.cgi?id=1201355#c4</a><br>
    <br>
    Summary:<br>
    <br>
    1) During setup, VDSM receives a SIGTERM:<br>
    MainThread::DEBUG::2015-03-26
    18:36:56,767::vdsm::66::vds::(sigtermHandler) Received signal 15<br>
    <br>
    Maybe the host activation process installs and/or restarts VDSM.<br>
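    <br>
    Signal 15 is SIGTERM, the ordinary termination request that a
    service manager such as systemd sends on stop or restart, not a
    crash signal such as SIGSEGV (11). So this line suggests VDSM was
    deliberately stopped rather than crashing. A hypothetical helper
    (not part of VDSM) that decodes the signal number from such a log
    line with Python's standard signal module:<br>
    <pre wrap="">
```python
import re
import signal

# Matches the message written by VDSM's sigtermHandler, e.g. "Received signal 15".
SIGNAL_RE = re.compile(r"Received signal (\d+)")


def signal_name(log_line):
    """Return the symbolic signal name found in a log line, or None."""
    match = SIGNAL_RE.search(log_line)
    if match is None:
        return None
    # signal.Signals maps the platform's signal numbers to their names.
    return signal.Signals(int(match.group(1))).name
```
</pre>
    Applied to the line above, it returns "SIGTERM", which fits a
    restart triggered by the setup or activation flow.<br>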
    <br>
    2) Since the gluster storage is mounted from a VDSM ChildProcess, it
    disappears when VDSM stops.<br>
    As a result, the engine VM is paused and never resumes, not even
    after the storage is remounted, because the paused QEMU process
    still holds stale file descriptors:<br>
    <a class="moz-txt-link-freetext" href="https://bugzilla.redhat.com/show_bug.cgi?id=1058300">https://bugzilla.redhat.com/show_bug.cgi?id=1058300</a><br>
    <a class="moz-txt-link-freetext" href="https://bugzilla.redhat.com/show_bug.cgi?id=1172905">https://bugzilla.redhat.com/show_bug.cgi?id=1172905</a><br>
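    <br>
    To check from the host whether the engine storage is still mounted
    after VDSM stops, one can look in /proc/mounts for GlusterFS FUSE
    entries (filesystem type fuse.glusterfs) under
    /rhev/data-center/mnt. A minimal sketch, written as a hypothetical
    helper that takes the file's text so it can be tried without a live
    mount:<br>
    <pre wrap="">
```python
def gluster_mounts(mounts_text, prefix="/rhev/data-center/mnt"):
    """Return (source, mountpoint) pairs for GlusterFS FUSE mounts under prefix.

    mounts_text uses the /proc/mounts layout:
    device mountpoint fstype options dump pass
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # The third field is the filesystem type; GlusterFS FUSE mounts
        # report "fuse.glusterfs". Short or blank lines are skipped too.
        if fields[2:3] != ["fuse.glusterfs"]:
            continue
        device, mountpoint = fields[0], fields[1]
        if mountpoint.startswith(prefix):
            found.append((device, mountpoint))
    return found
```
</pre>
    On the host, calling it on the contents of /proc/mounts and getting
    an empty list right after the SIGTERM would confirm that the mount
    went away together with VDSM.<br>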
    <br>
    3) Once VDSM has stopped, it cannot be restarted because sanlock
    reports an "invalid lockspace" error.<br>
    This can be solved with hosted-engine --start-pool.<br>
    <br>
    4) You can reproduce the VDSM SIGTERM with less effort (no need to
    redeploy) by reactivating the host from the engine portal.<br>
    VDSM gets stopped and the storage is lost.<br>
    As a workaround to keep the storage from being lost, you can mount it
    manually so that it doesn't rely on the VDSM ChildProcess.<br>
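    <br>
    The manual mount uses the standard GlusterFS FUSE client syntax,
    mount -t glusterfs SERVER:/VOLUME MOUNTPOINT. A small sketch that
    builds the command and only runs it on request; the server, volume
    and mount point are hypothetical placeholders, and whether to reuse
    the exact path VDSM mounts is an assumption to verify for your
    setup:<br>
    <pre wrap="">
```python
import shlex
import subprocess


def gluster_mount_command(server, volume, mountpoint):
    """Build the argv for mounting a GlusterFS volume with the FUSE client."""
    return ["mount", "-t", "glusterfs", f"{server}:/{volume}", mountpoint]


def mount_gluster(server, volume, mountpoint, dry_run=True):
    """Print the mount command; run it only when dry_run is False (needs root)."""
    argv = gluster_mount_command(server, volume, mountpoint)
    print(shlex.join(argv))
    if not dry_run:
        # check=True raises CalledProcessError if mount exits non-zero.
        subprocess.run(argv, check=True)
    return argv
```
</pre>
    For example, mount_gluster("gluster.example.com", "engine",
    "/mnt/engine") only prints
    mount -t glusterfs gluster.example.com:/engine /mnt/engine; passing
    dry_run=False as root would perform the mount.<br>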
    <br>
    Questions:<br>
    <br>
    1) I'm afraid that by activating the host manually after an
    interrupted setup, I may be skipping some special configuration steps.<br>
    Is there any difference between activating the host manually from
    the web manager and activating it with the setup script?<br>
    How can I complete the setup manually?<br>
    <br>
    Status:<br>
    <br>
    I'm still unable to activate the host manually, because the engine is
    now having problems with the JSON-RPC communication:<br>
    <blockquote>2015-03-27 10:11:54,889 INFO 
      [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp
      Reactor) [] Connecting to h2.imatronix.com/209.126.105.36<br>
      2015-03-27 10:11:54,893 ERROR
      [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp
      Reactor) [] <b>Unable to process messages</b><br>
      2015-03-27 10:11:54,893 ERROR
      [org.ovirt.engine.core.vdsbroker.vdsbroker.ListVDSCommand]
      (DefaultQuartzScheduler_Worker-96) [] Command
      'ListVDSCommand(HostName = h2, HostId =
      46d4659a-4efe-4427-aa68-a4536508fa08,
      vds=Host[h2,46d4659a-4efe-4427-aa68-a4536508fa08])' execution
      failed: VDSGenericException: VDSNetworkException: General
      SSLEngine problem<br>
      2015-03-27 10:11:54,894 ERROR
      [org.ovirt.engine.core.utils.timer.SchedulerUtilQuartzImpl]
      (DefaultQuartzScheduler_Worker-96) [] Failed to invoke scheduled
      method vmsMonitoring: null<br>
      <br>
      2015-03-27 10:11:57,894 INFO 
      [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp
      Reactor) [] Connecting to h2.imatronix.com/209.126.105.36<br>
      2015-03-27 10:11:57,897 ERROR
      [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp
      Reactor) [] <b>Unable to process messages</b><br>
      2015-03-27 10:11:57,897 INFO 
      [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand]
      (DefaultQuartzScheduler_Worker-95) [] Command
      'org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand'
      return value
'org.ovirt.engine.core.vdsbroker.vdsbroker.VDSInfoReturnForXmlRpc@79313585'<br>
      2015-03-27 10:11:57,898 INFO 
      [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand]
      (DefaultQuartzScheduler_Worker-95) [] HostName = h2<br>
      2015-03-27 10:11:57,898 ERROR
      [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand]
      (DefaultQuartzScheduler_Worker-95) [] Command
      'GetCapabilitiesVDSCommand(HostName = h2, HostId =
      46d4659a-4efe-4427-aa68-a4536508fa08,
      vds=Host[h2,46d4659a-4efe-4427-aa68-a4536508fa08])' execution
      failed: VDSGenericException: VDSNetworkException: <b>General
        SSLEngine problem</b><br>
      2015-03-27 10:11:57,898 ERROR
      [org.ovirt.engine.core.vdsbroker.HostMonitoring]
      (DefaultQuartzScheduler_Worker-95) [] Failure to refresh Vds
      runtime info: VDSGenericException: VDSNetworkException: General
      SSLEngine problem<br>
      2015-03-27 10:11:57,898 ERROR
      [org.ovirt.engine.core.vdsbroker.HostMonitoring]
      (DefaultQuartzScheduler_Worker-95) [] Exception:
      org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException:
      VDSGenericException: VDSNetworkException: General SSLEngine
      problem<br>
              at
      org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.proceedProxyReturnValue(BrokerCommandBase.java:183)
      [vdsbroker.jar:]<br>
              at
      org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand.executeVdsBrokerCommand(GetCapabilitiesVDSCommand.java:16)
      [vdsbroker.jar:]<br>
              at
      org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:101)
      [vdsbroker.jar:]<br>
              at
      org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:55)
      [vdsbroker.jar:]<br>
              at
      org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:33)
      [dal.jar:]<br>
              at
      org.ovirt.engine.core.vdsbroker.ResourceManager.runVdsCommand(ResourceManager.java:465)
      [vdsbroker.jar:]<br>
              at
      org.ovirt.engine.core.vdsbroker.VdsManager.refreshCapabilities(VdsManager.java:587)
      [vdsbroker.jar:]<br>
              at
      org.ovirt.engine.core.vdsbroker.HostMonitoring.refreshVdsRunTimeInfo(HostMonitoring.java:111)
      [vdsbroker.jar:]<br>
              at
      org.ovirt.engine.core.vdsbroker.HostMonitoring.refresh(HostMonitoring.java:76)
      [vdsbroker.jar:]<br>
              at
      org.ovirt.engine.core.vdsbroker.VdsManager.onTimer(VdsManager.java:199)
      [vdsbroker.jar:]<br>
              at sun.reflect.GeneratedMethodAccessor39.invoke(Unknown
      Source) [:1.7.0_75]<br>
              at
      sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      [rt.jar:1.7.0_75]<br>
              at java.lang.reflect.Method.invoke(Method.java:606)
      [rt.jar:1.7.0_75]<br>
              at
      org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:81)
      [scheduler.jar:]<br>
              at
      org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:52)
      [scheduler.jar:]<br>
              at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
      [quartz.jar:]<br>
              at
      org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)
      [quartz.jar:]<br>
      <br>
      2015-03-27 10:11:57,899 WARN 
      [org.ovirt.engine.core.vdsbroker.VdsManager]
      (DefaultQuartzScheduler_Worker-95) [] Failed to refresh VDS,
      network error, continuing,
      vds='h2'(46d4659a-4efe-4427-aa68-a4536508fa08):
      VDSGenericException: VDSNetworkException: <b>General SSLEngine
        problem</b><br>
      [...]<br>
      <br>
    </blockquote>
    On the VDSM side, we have:<br>
    <br>
    <blockquote>clientIFinit::DEBUG::2015-03-27
      10:11:53,098::task::592::Storage.TaskManager.Task::(_updateState)
      Task=`87ed5b66-3abb-4edc-aec3-59f071b33276`::moving from state
      init -&gt; state preparing<br>
      clientIFinit::INFO::2015-03-27
      10:11:53,098::logUtils::48::dispatcher::(wrapper) Run and protect:
      getConnectedStoragePoolsList(options=None)<br>
      clientIFinit::INFO::2015-03-27
      10:11:53,098::logUtils::51::dispatcher::(wrapper) Run and protect:
      getConnectedStoragePoolsList, Return response: {'poollist': []}<br>
      clientIFinit::DEBUG::2015-03-27
      10:11:53,098::task::1188::Storage.TaskManager.Task::(prepare)
      Task=`87ed5b66-3abb-4edc-aec3-59f071b33276`::finished:
      {'poollist': []}<br>
      clientIFinit::DEBUG::2015-03-27
      10:11:53,098::task::592::Storage.TaskManager.Task::(_updateState)
      Task=`87ed5b66-3abb-4edc-aec3-59f071b33276`::moving from state
      preparing -&gt; state finished<br>
      clientIFinit::DEBUG::2015-03-27
      10:11:53,098::resourceManager::940::Storage.ResourceManager.Owner::(releaseAll)
      Owner.releaseAll requests {} resources {}<br>
      clientIFinit::DEBUG::2015-03-27
      10:11:53,098::resourceManager::977::Storage.ResourceManager.Owner::(cancelAll)
      Owner.cancelAll requests {}<br>
      clientIFinit::DEBUG::2015-03-27
      10:11:53,098::task::990::Storage.TaskManager.Task::(_decref)
      Task=`87ed5b66-3abb-4edc-aec3-59f071b33276`::ref 0 aborting False<br>
      Detector thread::DEBUG::2015-03-27
      10:11:53,450::protocoldetector::201::vds.MultiProtocolAcceptor::(_add_connection)
      <b>Adding connection 209.239.124.8:54218</b><br>
      Detector thread::DEBUG::2015-03-27
      10:11:53,459::protocoldetector::225::vds.MultiProtocolAcceptor::(_process_handshake)
      <b>Error during handshake: sslv3 alert certificate unknown</b><br>
      Detector thread::DEBUG::2015-03-27
      10:11:53,459::protocoldetector::215::vds.MultiProtocolAcceptor::(_remove_connection)
      Removing connection 209.239.124.8:54218<br>
      Detector thread::DEBUG::2015-03-27
      10:11:55,249::protocoldetector::201::vds.MultiProtocolAcceptor::(_add_connection)
      Adding connection 209.126.113.73:54119<br>
      Detector thread::DEBUG::2015-03-27
      10:11:55,252::protocoldetector::225::vds.MultiProtocolAcceptor::(_process_handshake)
      Error during handshake: unexpected eof<br>
      Detector thread::DEBUG::2015-03-27
      10:11:55,252::protocoldetector::215::vds.MultiProtocolAcceptor::(_remove_connection)
      Removing connection 209.126.113.73:54119<br>
      Detector thread::DEBUG::2015-03-27
      10:11:56,582::protocoldetector::201::vds.MultiProtocolAcceptor::(_add_connection)
      Adding connection 209.239.124.8:39606<br>
      Detector thread::DEBUG::2015-03-27
      10:11:56,629::protocoldetector::225::vds.MultiProtocolAcceptor::(_process_handshake)
      Error during handshake: sslv3 alert certificate unknown<br>
      Detector thread::DEBUG::2015-03-27
      10:11:56,629::protocoldetector::215::vds.MultiProtocolAcceptor::(_remove_connection)
      Removing connection 209.239.124.8:39606<br>
      clientIFinit::DEBUG::2015-03-27
      10:11:58,104::task::592::Storage.TaskManager.Task::(_updateState)
      Task=`9e6db6dc-3ce0-4e93-8ddd-2aa1d09fa687`::moving from state
      init -&gt; state preparing<br>
      clientIFinit::INFO::2015-03-27
      10:11:58,104::logUtils::48::dispatcher::(wrapper) Run and protect:
      getConnectedStoragePoolsList(options=None)<br>
      clientIFinit::INFO::2015-03-27
      10:11:58,104::logUtils::51::dispatcher::(wrapper) Run and protect:
      getConnectedStoragePoolsList, Return response: {'poollist': []}<br>
      clientIFinit::DEBUG::2015-03-27
      10:11:58,104::task::1188::Storage.TaskManager.Task::(prepare)
      Task=`9e6db6dc-3ce0-4e93-8ddd-2aa1d09fa687`::finished:
      {'poollist': []}<br>
      clientIFinit::DEBUG::2015-03-27
      10:11:58,104::task::592::Storage.TaskManager.Task::(_updateState)
      Task=`9e6db6dc-3ce0-4e93-8ddd-2aa1d09fa687`::moving from state
      preparing -&gt; state finished<br>
      clientIFinit::DEBUG::2015-03-27
      10:11:58,104::resourceManager::940::Storage.ResourceManager.Owner::(releaseAll)
      Owner.releaseAll requests {} resources {}<br>
      clientIFinit::DEBUG::2015-03-27
      10:11:58,104::resourceManager::977::Storage.ResourceManager.Owner::(cancelAll)
      Owner.cancelAll requests {}<br>
      clientIFinit::DEBUG::2015-03-27
      10:11:58,104::task::990::Storage.TaskManager.Task::(_decref)
      Task=`9e6db6dc-3ce0-4e93-8ddd-2aa1d09fa687`::ref 0 aborting False<br>
      [...]<br>
    </blockquote>
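    In this excerpt each failed handshake follows an "Adding connection"
    line for the same peer. "sslv3 alert certificate unknown" most
    likely means the connecting client sent a TLS certificate_unknown
    alert, i.e. it rejected the certificate VDSM presented, while
    "unexpected eof" means the peer closed the connection mid-handshake.
    The failures from 209.239.124.8 (10:11:53 and 10:11:56) each come
    roughly a second before the engine's "General SSLEngine problem"
    errors (10:11:54 and 10:11:57), which hints, without proving it, that
    this peer is the engine. A hypothetical helper to tally failures per
    peer, assuming one log record per line and a single detector thread,
    so each error belongs to the most recently added connection:<br>
    <pre wrap="">
```python
import re
from collections import Counter

ADD_RE = re.compile(r"Adding connection ([\d.]+):\d+")
ERR_RE = re.compile(r"Error during handshake: (.+)$")


def handshake_failures(lines):
    """Count (peer_ip, error) pairs from VDSM protocoldetector log lines."""
    failures = Counter()
    peer = None
    for line in lines:
        added = ADD_RE.search(line)
        if added:
            # Remember the peer; the next handshake error is attributed to it.
            peer = added.group(1)
            continue
        failed = ERR_RE.search(line)
        if failed and peer:
            failures[(peer, failed.group(1).strip())] += 1
    return failures
```
</pre>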
    I guess this is related to an invalid certificate or a protocol
    version mismatch.<br>
    How can I fix it?<br>
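    <br>
    One way to test the certificate theory independently of both daemons
    is to attempt the handshake yourself with Python's ssl module,
    trusting only the CA that is supposed to have signed VDSM's
    certificate. A minimal sketch, assuming VDSM listens on its default
    port 54321 and (as an assumption) expects a client certificate; the
    CA, certificate and key paths are placeholders for your setup:<br>
    <pre wrap="">
```python
import socket
import ssl


def check_tls(host, port=54321, cafile=None, certfile=None, keyfile=None,
              timeout=5.0):
    """Attempt a TLS handshake with host:port, as a client would.

    Returns (True, peer common name) if the handshake and certificate
    verification succeed, or (False, error text) otherwise.
    """
    try:
        # Verify the server against cafile only (system CAs if cafile is None).
        context = ssl.create_default_context(cafile=cafile)
        if certfile:
            # Assumption: the server expects a client certificate, so present one.
            context.load_cert_chain(certfile, keyfile)
        with socket.create_connection((host, port), timeout=timeout) as raw:
            # server_hostname enables SNI and host-name checking.
            with context.wrap_socket(raw, server_hostname=host) as tls:
                subject = dict(rdn[0] for rdn in tls.getpeercert()["subject"])
                return True, subject.get("commonName", "")
    except OSError as exc:
        # ssl.SSLError and certificate verification errors subclass OSError.
        return False, str(exc)
```
</pre>
    Run against h2.imatronix.com with the engine's CA file, a failure
    mentioning certificate verification would point at the host
    certificate (wrong CA or a name that does not match the host), while
    a successful handshake would suggest looking elsewhere, for example
    at the client certificate the engine presents.<br>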
    <br>
    Regards,<br>
    Christopher<br>
    <br>
  </body>
</html>