<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <p><br>
    </p>
    <br>
    <div class="moz-cite-prefix">On 01/11/2017 09:29 AM, Yaniv Kaul
      wrote:<br>
    </div>
    <blockquote
cite="mid:CAJgorsZPeVA_1u=Pp16MbQYsk_gUUqzw0V4rPekdJw2MFwNoqw@mail.gmail.com"
      type="cite">
      <div dir="ltr"><br>
        <div class="gmail_extra"><br>
          <div class="gmail_quote">On Wed, Jan 11, 2017 at 10:26 AM,
            Francesco Romani <span dir="ltr">&lt;<a
                moz-do-not-send="true" href="mailto:fromani@redhat.com"
                target="_blank">fromani@redhat.com</a>&gt;</span> wrote:<br>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px
              0.8ex;border-left:1px solid
              rgb(204,204,204);padding-left:1ex">
              <div bgcolor="#FFFFFF">
                <p>Hi all<br>
                </p>
                <span class="gmail-"> <br>
                  <div
                    class="gmail-m_-5785517619226444427moz-cite-prefix">On
                    01/11/2017 08:52 AM, Eyal Edri wrote:<br>
                  </div>
                  <blockquote type="cite">
                    <div dir="ltr">Adding Tomas from Virt.</div>
                    <div class="gmail_extra"><br>
                      <div class="gmail_quote">On Tue, Jan 10, 2017 at
                        10:54 AM, Piotr Kliczewski <span dir="ltr">&lt;<a
                            moz-do-not-send="true"
                            href="mailto:piotr.kliczewski@gmail.com"
                            target="_blank">piotr.kliczewski@gmail.com</a>&gt;</span>
                        wrote:<br>
                        <blockquote class="gmail_quote"
                          style="margin:0px 0px 0px
                          0.8ex;border-left:1px solid
                          rgb(204,204,204);padding-left:1ex"><span>On
                            Tue, Jan 10, 2017 at 9:29 AM, Daniel Belenky
                            &lt;<a moz-do-not-send="true"
                              href="mailto:dbelenky@redhat.com"
                              target="_blank">dbelenky@redhat.com</a>&gt;
                            wrote:<br>
                            &gt; Hi all,<br>
                            &gt;<br>
                            &gt; test-repo_ovirt_experimental_m<wbr>aster
                            (link to Jenkins) job failed on<br>
                            &gt; basic_sanity scenario.<br>
                            &gt; The job was triggered by <a
                              moz-do-not-send="true"
                              href="https://gerrit.ovirt.org/#/c/69845/"
                              rel="noreferrer" target="_blank">https://gerrit.ovirt.org/#/c/6<wbr>9845/</a><br>
                            &gt;<br>
                            &gt; From looking at the logs, it seems that
                            the reason is VDSM.<br>
                            &gt;<br>
                            &gt; In the VDSM log, i see the following
                            error:<br>
                            &gt;<br>
                            &gt; 2017-01-09 16:47:41,331 ERROR (JsonRpc
                            (StompReactor)) [vds.dispatcher] SSL<br>
                            &gt; error receiving from
                            &lt;yajsonrpc.betterAsyncore.Disp<wbr>atcher
                            connected ('::1',<br>
                            &gt; 34942, 0, 0) at 0x36b95f0&gt;:
                            unexpected eof (betterAsyncore:119)<br>
                          </span></blockquote>
                      </div>
                    </div>
                  </blockquote>
                  <br>
                </span> Daniel, could you please remind me the jenkins
                link? I see something suspicious on the Vdsm log.<br>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div>Please use my live system:</div>
            <div style="font-size:12.8px">ssh <a moz-do-not-send="true"
                href="mailto:mini@ykaul-mini.tlv.redhat.com"
                target="_blank"><span
                  class="gmail-m_1130033870917217609gmail-m_-7038708660695866355gmail-il"><span
                    class="gmail-m_1130033870917217609gmail-il">mini</span></span>@<span
class="gmail-m_1130033870917217609gmail-m_-7038708660695866355gmail-il">ykaul</span>-<span
class="gmail-m_1130033870917217609gmail-m_-7038708660695866355gmail-il"><span
                    class="gmail-m_1130033870917217609gmail-il">mini</span></span>.tlv.redhat<wbr>.com</a>
              (redacted)<br>
            </div>
            <div style="font-size:12.8px">then run a console to the VM:</div>
            <div><span style="font-size:12.8px">lagocli --prefix-path
                /dev/shm/run/current shell engine</span> </div>
            <div><br>
            </div>
            <div>(or 'host0' for the host)</div>
            <div><br>
            </div>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px
              0.8ex;border-left:1px solid
              rgb(204,204,204);padding-left:1ex">
              <div bgcolor="#FFFFFF"> Most notably, Vdsm received
                SIGTERM. Is this expected and part of the test?</div>
            </blockquote>
            <div><br>
            </div>
            <div>It's not.</div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    I fooled myself. We have two hosts here. host0 was holding the VM up
    until the suspend. Then Engine decided to resume the VM on the other
    one, host1.<br>
    While the VM was resuming, host0 experienced network issues, which
    led to soft-fencing. That explains the mess on host0, even though it
    is still<br>
    unclear why host0 had network issues and heartbeat exceeded in the
    first place.<br>
    <br>
    On host1 the waters are even darker. The VM was resumed ~02:36<br>
    <br>
    2017-01-11 02:36:04,700-05 INFO 
    [org.ovirt.engine.core.vdsbroker.CreateVmVDSCommand] (default
    task-17) [72c41f12-649b-4833-8485-44e8d20d2b49] FINISH,
    CreateVmVDSCommand, return: RestoringState, log id: 378da701<br>
    2017-01-11 02:36:04,700-05 INFO 
    [org.ovirt.engine.core.bll.RunVmCommand] (default task-17)
    [72c41f12-649b-4833-8485-44e8d20d2b49] Lock freed to object
    'EngineLock:{exclusiveLocks='[42860011-acc3-44d6-9ddf-dea64<br>
    2caf083=&lt;VM, ACTION_TYPE_FAILED_OBJECT_LOCKED&gt;]',
    sharedLocks='null'}'<br>
    2017-01-11 02:36:04,704-05 INFO 
    [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
    (default task-17) [72c41f12-649b-4833-8485-44e8d20d2b49] Correlation
    ID: 72c41f12-649b-4833-8485-44e8d20d2b<br>
    49, Job ID: a93b571e-aed1-40e7-8d71-831f646255fb, Call Stack: null,
    Custom Event ID: -1, Message: VM vm0 was started by
    admin@internal-authz (Host: lago-basic-suite-master-host1).<br>
    <br>
    While well within the timeout limit (600s), the vm was still
    restoring its state:<br>
    <br>
    2017-01-11 02:37:31,059-05 DEBUG
    [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand]
    (DefaultQuartzScheduler8) [5582058d] START,
    GetAllVmStatsVDSCommand(HostName = lago-basic-suite-master-host1,
    VdsIdVDSCommandParametersBase:{runAsync='true',
    hostId='0336661b-721f-4c55-9327-8fd2fd3a0542'}), log id: 1803c51b<br>
    2017-01-11 02:37:31,059-05 DEBUG
    [org.ovirt.vdsm.jsonrpc.client.reactors.stomp.impl.Message]
    (DefaultQuartzScheduler8) [5582058d] SEND<br>
    destination:jms.topic.vdsm_requests<br>
    reply-to:jms.topic.vdsm_responses<br>
    content-length:103<br>
    <br>
{"jsonrpc":"2.0","method":"Host.getAllVmStats","params":{},"id":"f2997c1d-49cc-4d2e-937c-e4910fbb75df"}^@<br>
    2017-01-11 02:37:31,059-05 DEBUG
    [org.ovirt.vdsm.jsonrpc.client.reactors.stomp.StompCommonClient]
    (DefaultQuartzScheduler8) [5582058d] Message sent: SEND<br>
    destination:jms.topic.vdsm_requests<br>
    content-length:103<br>
    reply-to:jms.topic.vdsm_responses<br>
    <br>
    &lt;JsonRpcRequest id: "f2997c1d-49cc-4d2e-937c-e4910fbb75df",
    method: Host.getAllVmStats, params: {}&gt;<br>
    2017-01-11 02:37:31,062-05 DEBUG
    [org.ovirt.vdsm.jsonrpc.client.reactors.stomp.impl.Message] (SSL
    Stomp Reactor) [5e453618] MESSAGE<br>
    content-length:829<br>
    destination:jms.topic.vdsm_responses<br>
    content-type:application/json<br>
    subscription:fe930de2-aa67-4bc4-a34c-be22edd1623e<br>
    <br>
    {"jsonrpc": "2.0", "id": "f2997c1d-49cc-4d2e-937c-e4910fbb75df",
    "result": [{"username": "Unknown", "displayInfo": [{"tlsPort": "-1",
    "ipAddress": "192.168.201.4", "type": "vnc", "port": "-1"},
    {"tlsPort": "-1", "ipAddress": "192.168.201.4", "type": "spice",
    "port": "-1"}], "hash": "-4942054084956770103", "acpiEnable":
    "true", "displayIp": "192.168.201.4", "guestFQDN": "", "vmId":
    "42860011-acc3-44d6-9ddf-dea642caf083", "pid": "0", "timeOffset":
    "0", "session": "Unknown", "displaySecurePort": "-1", "displayPort":
    "-1", "memUsage": "0", "guestIPs": "", "kvmEnable": "true",
    "displayType": "vnc", "cpuUser": "0.00", "monitorResponse": "0",
    "elapsedTime": "389", "vmType": "kvm", "cpuSys": "0.00", "status":
    "Restoring state", "guestCPUCount": -1, "appsList": [], "clientIp":
    "", "statusTime": "4295562220", "vmName": "vm0"}]}^@<br>
    2017-01-11 02:37:31,062-05 DEBUG
    [org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker]
    (ResponseWorker) [] Message received: {"jsonrpc": "2.0", "id":
    "f2997c1d-49cc-4d2e-937c-e4910fbb75df", "result": [{"username":
    "Unknown", "displayInfo": [{"tlsPort": "-1", "ipAddress":
    "192.168.201.4", "type": "vnc", "port": "-1"}, {"tlsPort": "-1",
    "ipAddress": "192.168.201.4", "type": "spice", "port": "-1"}],
    "hash": "-4942054084956770103", "acpiEnable": "true", "displayIp":
    "192.168.201.4", "guestFQDN": "", "vmId":
    "42860011-acc3-44d6-9ddf-dea642caf083", "pid": "0", "timeOffset":
    "0", "session": "Unknown", "displaySecurePort": "-1", "displayPort":
    "-1", "memUsage": "0", "guestIPs": "", "kvmEnable": "true",
    "displayType": "vnc", "cpuUser": "0.00", "monitorResponse": "0",
    "elapsedTime": "389", "vmType": "kvm", "cpuSys": "0.00", "status":
    "Restoring state", "guestCPUCount": -1, "appsList": [], "clientIp":
    "", "statusTime": "4295562220", "vmName": "vm0"}]}<br>
    <br>
    Then a failed migration attempt, which fails<br>
    <br>
    2017-01-11 02:46:07,826-05 INFO 
    [org.ovirt.engine.core.bll.MigrateVmToServerCommand] (default
    task-4) [27938262-f5fe-41ff-a662-5177a3a25ada] Lock Acquired to
    object
'EngineLock:{exclusiveLocks='[42860011-acc3-44d6-9ddf-dea642caf083=&lt;VM,
    ACTION_TYPE_FAILED_VM_IS_BEING_MIGRATED$VmName vm0&gt;]',
    sharedLocks='null'}'<br>
    2017-01-11 02:46:07,827-05 WARN 
    [org.ovirt.engine.core.bll.MigrateVmToServerCommand] (default
    task-4) [27938262-f5fe-41ff-a662-5177a3a25ada] Validation of action
    'MigrateVmToServer' failed for user admin@internal-authz. Reasons:
VAR__ACTION__MIGRATE,VAR__TYPE__VM,VAR__HOST_STATUS__UP,ACTION_TYPE_FAILED_VDS_STATUS_ILLEGAL<br>
    2017-01-11 02:46:07,827-05 INFO 
    [org.ovirt.engine.core.bll.MigrateVmToServerCommand] (default
    task-4) [27938262-f5fe-41ff-a662-5177a3a25ada] Lock freed to object
'EngineLock:{exclusiveLocks='[42860011-acc3-44d6-9ddf-dea642caf083=&lt;VM,
    ACTION_TYPE_FAILED_VM_IS_BEING_MIGRATED$VmName vm0&gt;]',
    sharedLocks='null'}'<br>
    <br>
    which makes sense, host0 was being reinstalled (!?) meantime.<br>
    <br>
    After that I see no more evidence of 'vm0' in the logs. I couldn't
    yet cross-correlate with the tests, perhaps the test failed
    meantime?<br>
    <br>
    TL;DR: we need a reproducer to keep investigating this, so far I've
    seen quite some suspicious activity but nothing really concrete.<br>
    <br>
    PS: last few lago failures I've seen are been triggered by
    completely unrelated patches, so far I thik they are sporadic
    failures<br>
    <br>
    <br>
    <pre class="moz-signature" cols="72">-- 
Francesco Romani
Red Hat Engineering Virtualization R &amp; D
IRC: fromani</pre>
  </body>
</html>