Checking the logs, I see that vdsm was killed:

2017-04-10 09:30:12,743-0400 INFO  (MainThread) [vds] Received signal 15, shutting down (vdsmd:68)

but there is nothing in the sanlock logs, and the engine was not the cause of it.

I see in messages log:

Apr 10 09:30:09 lago-basic-suite-master-host1 journal: vdsm MOM WARN MOM not available.
Apr 10 09:30:09 lago-basic-suite-master-host1 journal: vdsm MOM WARN MOM not available, KSM stats will be missing.
Apr 10 09:30:10 lago-basic-suite-master-host1 journal: vdsm MOM WARN MOM not available.
Apr 10 09:30:10 lago-basic-suite-master-host1 journal: vdsm MOM WARN MOM not available, KSM stats will be missing.

Apr 10 09:30:12 lago-basic-suite-master-host1 systemd: Stopping Virtual Desktop Server Manager...
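
So the SIGTERM came from systemd stopping the service, not from sanlock or the engine; "signal 15" just means an orderly, externally requested shutdown. As a minimal sketch of that pattern (not vdsm's actual code; the logger name, log format and helper names below are illustrative only):

import logging
import signal
import threading

logging.basicConfig(
    format="%(asctime)s %(levelname)s (%(threadName)s) [vds] %(message)s",
    level=logging.INFO)
log = logging.getLogger("vds")

_shutdown = threading.Event()

def _sigterm_handler(signum, frame):
    # SIGTERM == 15 on Linux; this is what "systemctl stop" delivers by default.
    log.info("Received signal %d, shutting down", signum)
    _shutdown.set()

signal.signal(signal.SIGTERM, _sigterm_handler)

if __name__ == "__main__":
    log.info("Running; stop with 'kill -15 <pid>' or 'systemctl stop <unit>'")
    # Poll the event so the main thread wakes up promptly after the handler runs.
    while not _shutdown.wait(timeout=1.0):
        pass
    log.info("Stopped")  # orderly exit, no traceback

In other words, a "Received signal 15" line points to a requested stop (here the systemd "Stopping Virtual Desktop Server Manager..." entry above), so the question is what asked systemd to stop vdsm, not why vdsm crashed.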



On Mon, Apr 10, 2017 at 4:01 PM, Piotr Kliczewski <pkliczew@redhat.com> wrote:


On Mon, Apr 10, 2017 at 3:39 PM, Martin Betak <mbetak@redhat.com> wrote:
I have now enqueued another OST run for the next VDSM patch in line, https://gerrit.ovirt.org/#/c/75299/2 (merged around the time of the first failure), to see if it was the culprit.

This patch is xmlrpc related. Can you point me to the issue it caused?
 
Not sure why we allow only one concurrent run (globally, for all developers) of the manual OST job... I've created a Jira issue for this: https://ovirt-jira.atlassian.net/browse/OST-61

Martin

On Mon, Apr 10, 2017 at 3:29 PM, Martin Betak <mbetak@redhat.com> wrote:
Actually, it appears that this failure is not engine-related (I tried reverting my patches from Saturday but still received similar errors with lago in the [add_secondary_storage_domains] test).
But when running the current engine master with VDSM cut off before the time of failure, it passed: http://jenkins.ovirt.org/job/ovirt-system-tests_manual/229/

This was the VDSM commit that *worked*:
Commit 5c8aff6177bdaa81ee11c20d417c9bb10e651fb8 by Francesco Romani

On Mon, Apr 10, 2017 at 11:08 AM, Michal Skrivanek <michal.skrivanek@redhat.com> wrote:

> On 9 Apr 2017, at 09:42, Barak Korren <bkorren@redhat.com> wrote:
>
> On 9 April 2017 at 10:39, Yaniv Kaul <ykaul@redhat.com> wrote:
>> 1. The error:
>> 2017-04-08 14:12:11,376-04 ERROR
>> [org.ovirt.vdsm.jsonrpc.client.JsonRpcClient] (ResponseWorker) [] Not able
>> to update response for "fc0cef3f-d8fa-4074-b315-e36cc2f63fa1"
>>
>> Is still seen, not sure why.
>>
>> 2. MOM is not available (unrelated, but still a bug)
>>
>> 3. New error I have not seen in the past on host1
>> (http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/6257/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host1/_var_log/vdsm/vdsm.log
>> ) :
>>
>> 2017-04-08 14:12:22,208-0400 WARN  (jsonrpc/6) [storage.LVM] lvm pvs failed:
>> 5 [] ['  Failed to find physical volume
>> "/dev/mapper/360014050eacbb3c8b21428fb6683f074".'] (lvm:325)
>>
>> 4. New warning:
>> 2017-04-08 14:13:17,327-0400 WARN  (check/loop) [storage.asyncutils] Call
>> <bound method DirectioChecker._check of <DirectioChecker
>> /dev/0eb0f05c-0507-4f02-ba13-b8d865538d7a/metadata running
>> next_check=4295183.36 at 0x2fb9190>> delayed by 1.43 seconds
>> (asyncutils:138)
>>
>>
>> Do we know when it began?
>
> Yes. I linked to the patch that seems to have started this.

This should be easy to revert.
Working on it.
Then we’ll see what exactly is the culprit.

>
>
>
> --
> Barak Korren
> bkorren@redhat.com
> RHCE, RHCi, RHV-DevOps Team
> https://ifireball.wordpress.com/
>
>

_______________________________________________
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel