<html>
  <head>
    <meta content="text/html; charset=windows-1252"
      http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000066">
    <font face="Ubuntu">It looks  that  any memory eager running and
      swap fails  afterwards <br>
      will write here who's response of this<br>
      <br>
      <br>
    </font><br>
    <div class="moz-cite-prefix">On 03/10/2017 01:56 PM,
      <a class="moz-txt-link-abbreviated" href="mailto:users-request@ovirt.org">users-request@ovirt.org</a> wrote:<br>
    </div>
    <blockquote cite="mid:mailman.84.1489150569.1826.users@ovirt.org"
      type="cite">
      <pre wrap="">Send Users mailing list submissions to
        <a class="moz-txt-link-abbreviated" href="mailto:users@ovirt.org">users@ovirt.org</a>

To subscribe or unsubscribe via the World Wide Web, visit
        <a class="moz-txt-link-freetext" href="http://lists.ovirt.org/mailman/listinfo/users">http://lists.ovirt.org/mailman/listinfo/users</a>
or, via email, send a message with subject or body 'help' to
        <a class="moz-txt-link-abbreviated" href="mailto:users-request@ovirt.org">users-request@ovirt.org</a>

You can reach the person managing the list at
        <a class="moz-txt-link-abbreviated" href="mailto:users-owner@ovirt.org">users-owner@ovirt.org</a>

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Users digest..."


Today's Topics:

   1.  Error on Node upgrade 2 (FERNANDO FREDIANI)
   2. Re:  oVirt VM backup and restore (Gianluca Cecchi)
   3. Re:  VDSM hang (Nir Soffer)


----------------------------------------------------------------------

Message: 1
Date: Fri, 10 Mar 2017 09:37:00 -0300
From: FERNANDO FREDIANI <a class="moz-txt-link-rfc2396E" href="mailto:fernando.frediani@upx.com">&lt;fernando.frediani@upx.com&gt;</a>
To: <a class="moz-txt-link-rfc2396E" href="mailto:users@ovirt.org">"users@ovirt.org"</a> <a class="moz-txt-link-rfc2396E" href="mailto:users@ovirt.org">&lt;users@ovirt.org&gt;</a>
Subject: [ovirt-users] Error on Node upgrade 2
Message-ID: <a class="moz-txt-link-rfc2396E" href="mailto:3a306136-25aa-4533-4267-f1b8be58fd87@upx.com">&lt;3a306136-25aa-4533-4267-f1b8be58fd87@upx.com&gt;</a>
Content-Type: text/plain; charset=utf-8; format=flowed

I am not sure if another email I sent went through, but has anyone had 
problems when upgrading a running oVirt-node-ng from 4.1.0 to 4.1.1?

Is the only solution a complete reinstall of the node?

Thanks

Fernando



------------------------------

Message: 2
Date: Fri, 10 Mar 2017 13:52:35 +0100
From: Gianluca Cecchi <a class="moz-txt-link-rfc2396E" href="mailto:gianluca.cecchi@gmail.com">&lt;gianluca.cecchi@gmail.com&gt;</a>
To: Juan Hernández <a class="moz-txt-link-rfc2396E" href="mailto:jhernand@redhat.com">&lt;jhernand@redhat.com&gt;</a>
Cc: <a class="moz-txt-link-rfc2396E" href="mailto:users@ovirt.org">"users@ovirt.org"</a> <a class="moz-txt-link-rfc2396E" href="mailto:users@ovirt.org">&lt;users@ovirt.org&gt;</a>
Subject: Re: [ovirt-users] oVirt VM backup and restore
Message-ID:
        <a class="moz-txt-link-rfc2396E" href="mailto:CAG2kNCwA3-jubtjcz=bzwX+V5-a5V85=HKi5pr=447Ad-tRD4w@mail.gmail.com">&lt;CAG2kNCwA3-jubtjcz=bzwX+V5-a5V85=HKi5pr=447Ad-tRD4w@mail.gmail.com&gt;</a>
Content-Type: text/plain; charset="utf-8"

On Thu, Mar 9, 2017 at 11:23 AM, Juan Hernández <a class="moz-txt-link-rfc2396E" href="mailto:jhernand@redhat.com">&lt;jhernand@redhat.com&gt;</a> wrote:

</pre>
      <blockquote type="cite">
        <pre wrap="">
</pre>
        <blockquote type="cite">
          <pre wrap="">    Very good point Gialuca, you are right, the 'persist_memorystate'
</pre>
        </blockquote>
        <pre wrap="">flag
</pre>
        <blockquote type="cite">
          <pre wrap="">    is 'true' by default, and that makes the pause longer. Would you be
</pre>
        </blockquote>
        <pre wrap="">so
</pre>
        <blockquote type="cite">
          <pre wrap="">    kind to add it to the 'vm_backup.py' example that is part of version
</pre>
        </blockquote>
        <pre wrap="">4
</pre>
        <blockquote type="cite">
          <pre wrap="">    of the SDK?


    <a class="moz-txt-link-freetext" href="https://github.com/oVirt/ovirt-engine-sdk/blob/master/">https://github.com/oVirt/ovirt-engine-sdk/blob/master/</a>
</pre>
        </blockquote>
        <pre wrap="">sdk/examples/vm_backup.py#L143-L151
</pre>
        <blockquote type="cite">
          <pre wrap="">    &lt;<a class="moz-txt-link-freetext" href="https://github.com/oVirt/ovirt-engine-sdk/blob/master/">https://github.com/oVirt/ovirt-engine-sdk/blob/master/</a>
</pre>
        </blockquote>
        <pre wrap="">sdk/examples/vm_backup.py#L143-L151&gt;
</pre>
        <blockquote type="cite">
          <pre wrap="">
    (Note that that Gibhub is just a mirror, the change needs to be
</pre>
        </blockquote>
        <pre wrap="">submited
</pre>
        <blockquote type="cite">
          <pre wrap="">    using gerrit.ovirt.org <a class="moz-txt-link-rfc2396E" href="http://gerrit.ovirt.org">&lt;http://gerrit.ovirt.org&gt;</a>).



I already verified (on a 4.1 infra) that changing vm_backup.py
downloaded yesterday from master this way (apart connection paramters):

$ diff vm_backup.py vm_backup.py.orig
 150d149
&lt;         persist_memorystate=False,

I get the backup result and snapshot is correctly without memory saved
(and no pause at VM side)

</pre>
        </blockquote>
        <pre wrap="">
[snip]
</pre>
      </blockquote>
      <pre wrap="">
</pre>
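      <pre wrap="">For the archives, a minimal sketch of taking such a snapshot directly
with SDK v4 (assuming the ovirtsdk4 package; the connection details
below are placeholders):

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

# Placeholder connection details:
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='password',
    ca_file='ca.pem',
)

# Find the VM and get its snapshots service:
vms_service = connection.system_service().vms_service()
vm = vms_service.list(search='name=myvm')[0]
snapshots_service = vms_service.vm_service(vm.id).snapshots_service()

# persist_memorystate=False skips saving the memory state, so the VM is
# not paused while the memory would be written out:
snapshots_service.add(
    types.Snapshot(
        description='Backup snapshot',
        persist_memorystate=False,
    ),
)

connection.close()
</pre>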
      <blockquote type="cite">
        <pre wrap="">
I see that you already have a gerrit.ovirt.org account, so it shouldn't be
difficult. The initial setup should be like this:

  $ git config --global user.name youruser
  $ git config --global user.email your@email
  $ git clone <a class="moz-txt-link-freetext" href="ssh://youruser@gerrit.ovirt.org:29418/ovirt-engine-sdk">ssh://youruser@gerrit.ovirt.org:29418/ovirt-engine-sdk</a>
  $ gitdir=$(git rev-parse --git-dir); scp -p -P 29418 \
      youruser@gerrit.ovirt.org:hooks/commit-msg ${gitdir}/hooks/

Then, to submit the patch:

  $ cd ovirt-engine-sdk
  # Edit the vm_backup.py file and make your changes.
  $ git commit -a -s
  $ git push origin HEAD:refs/for/master


</pre>
      </blockquote>
      <pre wrap="">Ok. I found the time to try and apparently it worked as expected.
The master (you? jenkins CI? ;-) should see my change...
Just learnt (a little...) another thing

sys admins often try to put an eye inside devs field but the reverse seldom
happens ;-)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <a class="moz-txt-link-rfc2396E" href="http://lists.ovirt.org/pipermail/users/attachments/20170310/e8e41a41/attachment-0001.html">&lt;http://lists.ovirt.org/pipermail/users/attachments/20170310/e8e41a41/attachment-0001.html&gt;</a>

------------------------------

Message: 3
Date: Fri, 10 Mar 2017 14:56:01 +0200
From: Nir Soffer <a class="moz-txt-link-rfc2396E" href="mailto:nsoffer@redhat.com">&lt;nsoffer@redhat.com&gt;</a>
To: <a class="moz-txt-link-rfc2396E" href="mailto:paf1@email.cz">"paf1@email.cz"</a> <a class="moz-txt-link-rfc2396E" href="mailto:paf1@email.cz">&lt;paf1@email.cz&gt;</a>
Cc: users <a class="moz-txt-link-rfc2396E" href="mailto:users@ovirt.org">&lt;users@ovirt.org&gt;</a>
Subject: Re: [ovirt-users] VDSM hang
Message-ID:
        <a class="moz-txt-link-rfc2396E" href="mailto:CAMRbyyu=t7g+aqG9Unoqe=Gbh2eaDYUQBpoivY0RiA_vzb8ong@mail.gmail.com">&lt;CAMRbyyu=t7g+aqG9Unoqe=Gbh2eaDYUQBpoivY0RiA_vzb8ong@mail.gmail.com&gt;</a>
Content-Type: text/plain; charset=UTF-8

On Fri, Mar 10, 2017 at 1:07 PM, <a class="moz-txt-link-abbreviated" href="mailto:paf1@email.cz">paf1@email.cz</a> <a class="moz-txt-link-rfc2396E" href="mailto:paf1@email.cz">&lt;paf1@email.cz&gt;</a> wrote:
</pre>
      <blockquote type="cite">
        <pre wrap="">Hello everybody,

for production usage i'm testing  ovirt with gluster.
All components seems to be running fine but whenever I'm testing huge
workload, then node freez. Not the main OS, but VDSM mgmt and attached
services, VMs eg.
</pre>
      </blockquote>
      <pre wrap="">
What do you mean by freeze?

</pre>
      <blockquote type="cite">
        <pre wrap="">
mgmt
oVirt - 4.1.0.4
centos 7.3-1611


nodes ( installed from ovirt image
"ovirt-node-ng-installer-ovirt-4.1-2017030804.iso"  )

OS Version: == RHEL - 7 - 3.1611.el7.centos
OS Description:== oVirt Node 4.1.0
Kernel Version:== 3.10.0 - 514.10.2.el7.x86_64
KVM Version:== 2.6.0 - 28.el7_3.3.1
LIBVIRT Version:== libvirt-2.0.0-10.el7_3.5
VDSM Version:== vdsm-4.19.4-1.el7.centos
SPICE Version:== 0.12.4 - 20.el7_3
GlusterFS Version:== glusterfs-3.8.9-1.el7  ( LVM thinprovisioning in
replica 2 - created from ovirt GUI )

concurrently running:
- a huge import from an export domain  ( network workload )
- sequential writes to a VM's local disk  ( gluster replica sequential workload )
- a huge database SELECT inside the VMs  ( random IOps )
- a huge old snapshot delete  ( random IOps )

With this configuration / workload it runs for an hour or so with no
exceptions, at 70-80% disk load, but at some point VDSM freezes all jobs on
a timeout and the VMs go into "unknown" status.
The whole system then recovers automatically within a circa 20 min time frame
( except the import and the snapshot delete (rollback) )

engine.log  - focus on the 10:39:07 timestamp  ( Failed in 'HSMGetAllTasksStatusesVDS'
method )
========

n child command id: 'a8a3a4d5-cf7d-4423-8243-022911232508'
type:'RemoveSnapshotSingleDiskLive' to complete
2017-03-10 10:39:01,727+01 INFO
[org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommandCallback]
(DefaultQuartzScheduler2) [759c8e1f] Command 'RemoveSnapshotSingleDiskLive'
(id: 'a8a3a4d5-cf7d-4423-8243-022911232508') waiting on child command id:
'33df2c1e-6ce3-44fd-a39b-d111883b4c4e' type:'DestroyImage' to complete
2017-03-10 10:39:03,929+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand]
(DefaultQuartzScheduler5) [fde51205-3e8b-4b84-a478-352dc444ccc4] START,
GlusterServersListVDSCommand(HostName = 2kvm1,
VdsIdVDSCommandParametersBase:{runAsync='true',
hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id: 446d0cd3
2017-03-10 10:39:04,343+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand]
(DefaultQuartzScheduler5) [fde51205-3e8b-4b84-a478-352dc444ccc4] FINISH,
GlusterServersListVDSCommand, return: [172.16.5.163/24:CONNECTED,
16.0.0.164:CONNECTED], log id: 446d0cd3
2017-03-10 10:39:04,353+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand]
(DefaultQuartzScheduler5) [fde51205-3e8b-4b84-a478-352dc444ccc4] START,
GlusterVolumesListVDSCommand(HostName = 2kvm1,
GlusterVolumesListVDSParameters:{runAsync='true',
hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id: 69ea1fda
2017-03-10 10:39:05,128+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand]
(DefaultQuartzScheduler5) [fde51205-3e8b-4b84-a478-352dc444ccc4] FINISH,
GlusterVolumesListVDSCommand, return:
{8ded4083-2f31-489e-a60d-a315a5eb9b22=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@7765e4ad},
log id: 69ea1fda
2017-03-10 10:39:07,163+01 ERROR
[org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand]
(DefaultQuartzScheduler2) [759c8e1f] Failed in 'HSMGetAllTasksStatusesVDS'
method
2017-03-10 10:39:07,178+01 ERROR
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(DefaultQuartzScheduler2) [759c8e1f] EVENT_ID:
VDS_BROKER_COMMAND_FAILURE(10,802), Correlation ID: null, Call Stack: null,
Custom Event ID: -1, Message: VDSM 2kvm2 command HSMGetAllTasksStatusesVDS
failed: Connection timed out
2017-03-10 10:39:07,182+01 INFO
[org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartzScheduler2)
[759c8e1f] BaseAsyncTask::onTaskEndSuccess: Task
'f594bf69-619b-4d1b-8f6d-a9826997e478' (Parent Command 'ImportVm',
Parameters Type
'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended
successfully.
2017-03-10 10:39:07,182+01 INFO
[org.ovirt.engine.core.bll.CommandMultiAsyncTasks] (DefaultQuartzScheduler2)
[759c8e1f] Task with DB Task ID 'a05c7c07-9b98-4ab2-ac7b-9e70a75ba7b7' and
VDSM Task ID '7c60369f-70a3-4a6a-80c9-4753ac9ed372' is in state Polling. End
action for command 8deb3fe3-4a83-4605-816c-ffdc63fd9ac1 will proceed when
all the entity's tasks are completed.
2017-03-10 10:39:07,182+01 INFO
[org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartzScheduler2)
[759c8e1f] SPMAsyncTask::PollTask: Polling task
'f351e8f6-6dd7-49aa-bf54-650d84fc6352' (Parent Command 'DestroyImage',
Parameters Type
'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') returned
status 'finished', result 'cleanSuccess'.
2017-03-10 10:39:07,182+01 ERROR
[org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartzScheduler2)
[759c8e1f] BaseAsyncTask::logEndTaskFailure: Task
'f351e8f6-6dd7-49aa-bf54-650d84fc6352' (Parent Command 'DestroyImage',
Parameters Type
'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended with
failure:
-- Result: 'cleanSuccess'
-- Message: 'VDSGenericException: VDSErrorException: Failed to
HSMGetAllTasksStatusesVDS, error = Connection timed out, code = 100',
-- Exception: 'VDSGenericException: VDSErrorException: Failed to
HSMGetAllTasksStatusesVDS, error = Connection timed out, code = 100'
2017-03-10 10:39:07,184+01 INFO
[org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (DefaultQuartzScheduler2)
[759c8e1f] CommandAsyncTask::endActionIfNecessary: All tasks of command
'33df2c1e-6ce3-44fd-a39b-d111883b4c4e' has ended -&gt; executing 'endAction'
2017-03-10 10:39:07,185+01 INFO
[org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (DefaultQuartzScheduler2)
[759c8e1f] CommandAsyncTask::endAction: Ending action for '1' tasks (command
ID: '33df2c1e-6ce3-44fd-a39b-d111883b4c4e'): calling endAction '.
2017-03-10 10:39:07,185+01 INFO
[org.ovirt.engine.core.bll.tasks.CommandAsyncTask]
(org.ovirt.thread.pool-6-thread-31) [759c8e1f]
CommandAsyncTask::endCommandAction [within thread] context: Attempting to
endAction 'DestroyImage',
2017-03-10 10:39:07,192+01 INFO
[org.ovirt.engine.core.bll.storage.disk.image.DestroyImageCommand]
(org.ovirt.thread.pool-6-thread-31) [759c8e1f] Command
[id=33df2c1e-6ce3-44fd-a39b-d111883b4c4e]: Updating status to 'FAILED', The
command end method logic will be executed by one of its parent commands.
2017-03-10 10:39:07,192+01 INFO
[org.ovirt.engine.core.bll.tasks.CommandAsyncTask]
(org.ovirt.thread.pool-6-thread-31) [759c8e1f]
CommandAsyncTask::HandleEndActionResult [within thread]: endAction for
action type 'DestroyImage' completed, handling the result.
2017-03-10 10:39:07,192+01 INFO
[org.ovirt.engine.core.bll.tasks.CommandAsyncTask]
(org.ovirt.thread.pool-6-thread-31) [759c8e1f]
CommandAsyncTask::HandleEndActionResult [within thread]: endAction for
action type 'DestroyImage' succeeded, clearing tasks.
2017-03-10 10:39:07,192+01 INFO
[org.ovirt.engine.core.bll.tasks.SPMAsyncTask]
(org.ovirt.thread.pool-6-thread-31) [759c8e1f] SPMAsyncTask::ClearAsyncTask:
Attempting to clear task 'f351e8f6-6dd7-49aa-bf54-650d84fc6352'
2017-03-10 10:39:07,193+01 INFO
[org.ovirt.engine.core.vdsbroker.irsbroker.SPMClearTaskVDSCommand]
(org.ovirt.thread.pool-6-thread-31) [759c8e1f] START,
SPMClearTaskVDSCommand(
SPMTaskGuidBaseVDSCommandParameters:{runAsync='true',
storagePoolId='00000001-0001-0001-0001-000000000311',
ignoreFailoverLimit='false',
taskId='f351e8f6-6dd7-49aa-bf54-650d84fc6352'}), log id: 2b7080c2
2017-03-10 10:39:07,194+01 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand]
(org.ovirt.thread.pool-6-thread-31) [759c8e1f] START,
HSMClearTaskVDSCommand(HostName = 2kvm2,
HSMTaskGuidBaseVDSCommandParameters:{runAsync='true',
hostId='905375e1-6de4-4fdf-b69c-b2d546f869c8',
taskId='f351e8f6-6dd7-49aa-bf54-650d84fc6352'}), log id: 2edff460
2017-03-10 10:39:08,208+01 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand]
(org.ovirt.thread.pool-6-thread-31) [759c8e1f] FINISH,
HSMClearTaskVDSCommand, log id: 2edff460
2017-03-10 10:39:08,208+01 INFO
[org.ovirt.engine.core.vdsbroker.irsbroker.SPMClearTaskVDSCommand]
(org.ovirt.thread.pool-6-thread-31) [759c8e1f] FINISH,
SPMClearTaskVDSCommand, log id: 2b7080c2
2017-03-10 10:39:08,213+01 INFO
[org.ovirt.engine.core.bll.tasks.SPMAsyncTask]
(org.ovirt.thread.pool-6-thread-31) [759c8e1f]
BaseAsyncTask::removeTaskFromDB: Removed task
'f351e8f6-6dd7-49aa-bf54-650d84fc6352' from DataBase
2017-03-10 10:39:08,213+01 INFO
[org.ovirt.engine.core.bll.tasks.CommandAsyncTask]
(org.ovirt.thread.pool-6-thread-31) [759c8e1f]
CommandAsyncTask::HandleEndActionResult [within thread]: Removing
CommandMultiAsyncTasks object for entity
'33df2c1e-6ce3-44fd-a39b-d111883b4c4e'
2017-03-10 10:39:10,142+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand]
(DefaultQuartzScheduler10) [a86dc7b5-52dc-40d4-a3b9-49d7eabbb93c] START,
GlusterServersListVDSCommand(HostName = 2kvm1,
VdsIdVDSCommandParametersBase:{runAsync='true',
hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id: 2e7278cb
2017-03-10 10:39:11,513+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand]
(DefaultQuartzScheduler10) [a86dc7b5-52dc-40d4-a3b9-49d7eabbb93c] FINISH,
GlusterServersListVDSCommand, return: [172.16.5.163/24:CONNECTED,
16.0.0.164:CONNECTED], log id: 2e7278cb
2017-03-10 10:39:11,523+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand]
(DefaultQuartzScheduler10) [a86dc7b5-52dc-40d4-a3b9-49d7eabbb93c] START,
GlusterVolumesListVDSCommand(HostName = 2kvm1,
GlusterVolumesListVDSParameters:{runAsync='true',
hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id: 43704ef2
2017-03-10 10:39:11,777+01 INFO
[org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback]
(DefaultQuartzScheduler9) [67e1d8ed] Command 'RemoveSnapshot' (id:
'13c2cb7c-0809-4971-aceb-37ae66105ab7') waiting on child command id:
'a8a3a4d5-cf7d-4423-8243-022911232508' type:'RemoveSnapshotSingleDiskLive'
to complete
2017-03-10 10:39:11,789+01 WARN
[org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand]
(DefaultQuartzScheduler9) [759c8e1f] Child command 'DESTROY_IMAGE' failed,
proceeding to verify
2017-03-10 10:39:11,789+01 INFO
[org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand]
(DefaultQuartzScheduler9) [759c8e1f] Executing Live Merge command step
'DESTROY_IMAGE_CHECK'
2017-03-10 10:39:11,832+01 INFO
[org.ovirt.engine.core.bll.DestroyImageCheckCommand] (pool-5-thread-7)
[4856f570] Running command: DestroyImageCheckCommand internal: true.
2017-03-10 10:39:11,833+01 INFO
[org.ovirt.engine.core.vdsbroker.irsbroker.SPMGetVolumeInfoVDSCommand]
(pool-5-thread-7) [4856f570] START, SPMGetVolumeInfoVDSCommand(
SPMGetVolumeInfoVDSCommandParameters:{expectedEngineErrors='[VolumeDoesNotExist]',
runAsync='true', storagePoolId='00000001-0001-0001-0001-000000000311',
ignoreFailoverLimit='false',
storageDomainId='1603cd90-92ef-4c03-922c-cecb282fd00e',
imageGroupId='7543338a-3ca6-4698-bb50-c14f0bd71428',
imageId='50b592f7-bfba-4398-879c-8d6a19a2c000'}), log id: 2c8031f8
2017-03-10 10:39:11,833+01 INFO
[org.ovirt.engine.core.vdsbroker.irsbroker.SPMGetVolumeInfoVDSCommand]
(pool-5-thread-7) [4856f570] Executing GetVolumeInfo using the current SPM
2017-03-10 10:39:11,834+01 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand]
(pool-5-thread-7) [4856f570] START, GetVolumeInfoVDSCommand(HostName =
2kvm2,
GetVolumeInfoVDSCommandParameters:{expectedEngineErrors='[VolumeDoesNotExist]',
runAsync='true', hostId='905375e1-6de4-4fdf-b69c-b2d546f869c8',
storagePoolId='00000001-0001-0001-0001-000000000311',
storageDomainId='1603cd90-92ef-4c03-922c-cecb282fd00e',
imageGroupId='7543338a-3ca6-4698-bb50-c14f0bd71428',
imageId='50b592f7-bfba-4398-879c-8d6a19a2c000'}), log id: 79ca86cc
2017-03-10 10:39:11,846+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand]
(DefaultQuartzScheduler10) [a86dc7b5-52dc-40d4-a3b9-49d7eabbb93c] FINISH,
GlusterVolumesListVDSCommand, return:
{8ded4083-2f31-489e-a60d-a315a5eb9b22=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@7765e4ad},
log id: 43704ef2
2017-03-10 10:39:16,858+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand]
(DefaultQuartzScheduler7) [d82701d9-9fa3-467d-b273-f5fe5a93062f] START,
GlusterServersListVDSCommand(HostName = 2kvm1,
VdsIdVDSCommandParametersBase:{runAsync='true',
hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id: 6542adcd
2017-03-10 10:39:17,394+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand]
(DefaultQuartzScheduler7) [d82701d9-9fa3-467d-b273-f5fe5a93062f] FINISH,
GlusterServersListVDSCommand, return: [172.16.5.163/24:CONNECTED,
16.0.0.164:CONNECTED], log id: 6542adcd
2017-03-10 10:39:17,406+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand]
(DefaultQuartzScheduler7) [d82701d9-9fa3-467d-b273-f5fe5a93062f] START,
GlusterVolumesListVDSCommand(HostName = 2kvm1,
GlusterVolumesListVDSParameters:{runAsync='true',
hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id: 44ec33ed
2017-03-10 10:39:18,598+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand]
(DefaultQuartzScheduler7) [d82701d9-9fa3-467d-b273-f5fe5a93062f] FINISH,
GlusterVolumesListVDSCommand, return:
{8ded4083-2f31-489e-a60d-a315a5eb9b22=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@7765e4ad},
log id: 44ec33ed
2017-03-10 10:39:21,865+01 INFO
[org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback]
(DefaultQuartzScheduler6) [67e1d8ed] Command 'RemoveSnapshot' (id:
'13c2cb7c-0809-4971-aceb-37ae66105ab7') waiting on child command id:
'a8a3a4d5-cf7d-4423-8243-022911232508' type:'RemoveSnapshotSingleDiskLive'
to complete
2017-03-10 10:39:21,881+01 INFO
[org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommandCallback]
(DefaultQuartzScheduler6) [4856f570] Command 'RemoveSnapshotSingleDiskLive'
(id: 'a8a3a4d5-cf7d-4423-8243-022911232508') waiting on child command id:
'b1d63b8e-19d3-4d64-8fa8-4eb3e2d1a8fc' type:'DestroyImageCheck' to complete
2017-03-10 10:39:23,611+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand]
(DefaultQuartzScheduler6) [4856f570] START,
GlusterServersListVDSCommand(HostName = 2kvm1,
VdsIdVDSCommandParametersBase:{runAsync='true',
hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id: 4c2fc22d
2017-03-10 10:39:24,616+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterTasksListVDSCommand]
(DefaultQuartzScheduler7) [d82701d9-9fa3-467d-b273-f5fe5a93062f] START,
GlusterTasksListVDSCommand(HostName = 2kvm1,
VdsIdVDSCommandParametersBase:{runAsync='true',
hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id: 1f169371
2017-03-10 10:39:24,618+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand]
(DefaultQuartzScheduler6) [4856f570] FINISH, GlusterServersListVDSCommand,
return: [172.16.5.163/24:CONNECTED, 16.0.0.164:CONNECTED], log id: 4c2fc22d
2017-03-10 10:39:24,629+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand]
(DefaultQuartzScheduler6) [4856f570] START,
GlusterVolumesListVDSCommand(HostName = 2kvm1,
GlusterVolumesListVDSParameters:{runAsync='true',
hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id: 2ac55735
2017-03-10 10:39:24,822+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterTasksListVDSCommand]
(DefaultQuartzScheduler7) [d82701d9-9fa3-467d-b273-f5fe5a93062f] FINISH,
GlusterTasksListVDSCommand, return: [], log id: 1f169371
2017-03-10 10:39:26,836+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand]
(DefaultQuartzScheduler6) [4856f570] FINISH, GlusterVolumesListVDSCommand,
return:
{8ded4083-2f31-489e-a60d-a315a5eb9b22=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@7765e4ad},
log id: 2ac55735
2017-03-10 10:39:31,849+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand]
(DefaultQuartzScheduler8) [fde51205-3e8b-4b84-a478-352dc444ccc4] START,
GlusterServersListVDSCommand(HostName = 2kvm1,
VdsIdVDSCommandParametersBase:{runAsync='true',
hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id: 2e8dbcd1
2017-03-10 10:39:31,932+01 INFO
[org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback]
(DefaultQuartzScheduler6) [67e1d8ed] Command 'RemoveSnapshot' (id:
'13c2cb7c-0809-4971-aceb-37ae66105ab7') waiting on child command id:
'a8a3a4d5-cf7d-4423-8243-022911232508' type:'RemoveSnapshotSingleDiskLive'
to complete
2017-03-10 10:39:31,944+01 INFO
[org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommandCallback]
(DefaultQuartzScheduler6) [4856f570] Command 'RemoveSnapshotSingleDiskLive'
(id: 'a8a3a4d5-cf7d-4423-8243-022911232508') waiting on child command id:
'b1d63b8e-19d3-4d64-8fa8-4eb3e2d1a8fc' type:'DestroyImageCheck' to complete
2017-03-10 10:39:33,213+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand]
(DefaultQuartzScheduler8) [fde51205-3e8b-4b84-a478-352dc444ccc4] FINISH,
GlusterServersListVDSCommand, return: [172.16.5.163/24:CONNECTED,
16.0.0.164:CONNECTED], log id: 2e8dbcd1
2017-03-10 10:39:33,226+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand]
(DefaultQuartzScheduler8) [fde51205-3e8b-4b84-a478-352dc444ccc4] START,
GlusterVolumesListVDSCommand(HostName = 2kvm1,
GlusterVolumesListVDSParameters:{runAsync='true',
hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id: 1fb3f9e3
2017-03-10 10:39:34,375+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand]
(DefaultQuartzScheduler8) [fde51205-3e8b-4b84-a478-352dc444ccc4] FINISH,
GlusterVolumesListVDSCommand, return:
{8ded4083-2f31-489e-a60d-a315a5eb9b22=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@7765e4ad},
log id: 1fb3f9e3
2017-03-10 10:39:39,392+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand]
(DefaultQuartzScheduler9) [12d6d15f-e054-4833-bd87-58f6a51e5fa6] START,
GlusterServersListVDSCommand(HostName = 2kvm1,
VdsIdVDSCommandParametersBase:{runAsync='true',
hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id: 1e0b8eeb
2017-03-10 10:39:40,753+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand]
(DefaultQuartzScheduler9) [12d6d15f-e054-4833-bd87-58f6a51e5fa6] FINISH,
GlusterServersListVDSCommand, return: [172.16.5.163/24:CONNECTED,
16.0.0.164:CONNECTED], log id: 1e0b8eeb
2017-03-10 10:39:40,763+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand]
(DefaultQuartzScheduler9) [12d6d15f-e054-4833-bd87-58f6a51e5fa6] START,
GlusterVolumesListVDSCommand(HostName = 2kvm1,
GlusterVolumesListVDSParameters:{runAsync='true',
hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id: 35b04b33
2017-03-10 10:39:41,952+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand]
(DefaultQuartzScheduler9) [12d6d15f-e054-4833-bd87-58f6a51e5fa6] FINISH,
GlusterVolumesListVDSCommand, return:
{8ded4083-2f31-489e-a60d-a315a5eb9b22=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@7765e4ad},
log id: 35b04b33
2017-03-10 10:39:41,991+01 INFO
[org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback]
(DefaultQuartzScheduler6) [67e1d8ed] Command 'RemoveSnapshot' (id:
'13c2cb7c-0809-4971-aceb-37ae66105ab7') waiting on child command id:
'a8a3a4d5-cf7d-4423-8243-022911232508' type:'RemoveSnapshotSingleDiskLive'
to complete


gluster  ( nothing in logs )
======


## "etc-glusterfs-glusterd.vol.log"
[2017-03-10 10:13:52.599019] I [MSGID: 106499]
[glusterd-handler.c:4349:__glusterd_handle_status_volume] 0-management:
Received status volume req for volume slow1
[2017-03-10 10:16:48.639635] I [MSGID: 106499]
[glusterd-handler.c:4349:__glusterd_handle_status_volume] 0-management:
Received status volume req for volume slow1
The message "I [MSGID: 106499]
[glusterd-handler.c:4349:__glusterd_handle_status_volume] 0-management:
Received status volume req for volume slow1" repeated 3 times between
[2017-03-10 10:16:48.639635] and [2017-03-10 10:17:55.659379]
[2017-03-10 10:18:56.875516] I [MSGID: 106499]
[glusterd-handler.c:4349:__glusterd_handle_status_volume] 0-management:
Received status volume req for volume slow1
[2017-03-10 10:19:57.204689] I [MSGID: 106499]
[glusterd-handler.c:4349:__glusterd_handle_status_volume] 0-management:
Received status volume req for volume slow1
[2017-03-10 10:21:56.576879] I [MSGID: 106499]
[glusterd-handler.c:4349:__glusterd_handle_status_volume] 0-management:
Received status volume req for volume slow1

[glusterd-handler.c:4349:__glusterd_handle_status_volume] 0-management:
Received status volume req for volume slow1
[2017-03-10 10:40:14.169864] I [MSGID: 106499]
[glusterd-handler.c:4349:__glusterd_handle_status_volume] 0-management:
Received status volume req for volume slow1


## "rhev-data-center-mnt-glusterSD-localhost:_slow1.log"
[2017-03-10 09:43:40.346785] W [MSGID: 101159] [inode.c:1214:__inode_unlink]
0-inode:
be318638-e8a0-4c6d-977d-7a937aa84806/b6f2d08d-2441-4111-ab62-e14abdfaf602.61849:
dentry not found in 43e6968f-9c2a-40d8-8074-caf1a36f60cf
[2017-03-10 09:43:40.347076] W [MSGID: 101159] [inode.c:1214:__inode_unlink]
0-inode:
be318638-e8a0-4c6d-977d-7a937aa84806/b6f2d08d-2441-4111-ab62-e14abdfaf602.61879:
dentry not fou
[2017-03-10 10:04:03.103839] W [MSGID: 101159] [inode.c:1214:__inode_unlink]
0-inode:
be318638-e8a0-4c6d-977d-7a937aa84806/6e105aa3-a3fc-4aca-be50-78b7642c4072.9787:
dentry not found in 6be71632-aa36-4975-b673-1357e0355027
[2017-03-10 10:06:02.406385] I [MSGID: 109066]
[dht-rename.c:1569:dht_rename] 0-slow1-dht: renaming
/1603cd90-92ef-4c03-922c-cecb282fd00e/images/2a9c1c6a-f045-4dce-a47b-95a2267eef72/6f264695-0669-4b49-a2f6-e6c92482f2fb.meta.new
(hash=slow1-replicate-0/cache=slow1-replicate-0) =&gt;
/1603cd90-92ef-4c03-922c-cecb282fd00e/images/2a9c1c6a-f045-4dce-a47b-95a2267eef72/6f264695-0669-4b49-a2f6-e6c92482f2fb.meta
(hash=slow1-replicate-0/cache=slow1-replicate-0)
... no other record


messages
========

occurred several times:

Mar 10 09:04:38 2kvm2 lvmetad: WARNING: Ignoring unsupported value for cmd.
Mar 10 09:04:38 2kvm2 lvmetad: WARNING: Ignoring unsupported value for cmd.
Mar 10 09:04:38 2kvm2 lvmetad: WARNING: Ignoring unsupported value for cmd.
Mar 10 09:04:38 2kvm2 lvmetad: WARNING: Ignoring unsupported value for cmd.
Mar 10 09:10:01 2kvm2 systemd: Started Session 274 of user root.
Mar 10 09:10:01 2kvm2 systemd: Starting Session 274 of user root.
Mar 10 09:20:02 2kvm2 systemd: Started Session 275 of user root.
Mar 10 09:20:02 2kvm2 systemd: Starting Session 275 of user root.
Mar 10 09:22:59 2kvm2 sanlock[1673]: 2017-03-10 09:22:59+0100 136031 [2576]:
s3 delta_renew long write time 11 sec
</pre>
      </blockquote>
      <pre wrap="">
Sanlock cannot write to storage

</pre>
      <blockquote type="cite">
        <pre wrap="">Mar 10 09:24:03 2kvm2 kernel: kswapd1: page allocation failure: order:2,
mode:0x104020
</pre>
      </blockquote>
      <pre wrap="">
Kernel cannot allocate page?
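An order-2 allocation in softirq context needs 2^2 = 4 contiguous pages,
so this usually means free memory was too fragmented for the bnx2x
driver. A quick check, and a commonly used mitigation, as a sketch (the
sysctl value is only an example for a host this size):

# Free blocks per order and zone; near-zero counts in the higher-order
# columns indicate fragmentation:
cat /proc/buddyinfo

# Keep a larger reserve of free pages (value in kB) so atomic
# allocations are less likely to fail:
sysctl -w vm.min_free_kbytes=1048576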

</pre>
      <blockquote type="cite">
        <pre wrap="">Mar 10 09:24:03 2kvm2 kernel: CPU: 42 PID: 265 Comm: kswapd1 Tainted: G
I    ------------   3.10.0-514.10.2.el7.x86_64 #1
Mar 10 09:24:03 2kvm2 kernel: Hardware name: Supermicro X10DRC/X10DRi-LN4+,
BIOS 1.0a 08/29/2014
Mar 10 09:24:03 2kvm2 kernel: 0000000000104020 00000000f7228dc9
ffff88301f4839d8 ffffffff816864ef
Mar 10 09:24:03 2kvm2 kernel: ffff88301f483a68 ffffffff81186ba0
000068fc00000000 0000000000000000
Mar 10 09:24:03 2kvm2 kernel: fffffffffffffffc 0010402000000000
ffff88301567ae80 00000000f7228dc9
Mar 10 09:24:03 2kvm2 kernel: Call Trace:
Mar 10 09:24:03 2kvm2 kernel: &lt;IRQ&gt;  [&lt;ffffffff816864ef&gt;]
dump_stack+0x19/0x1b
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff81186ba0&gt;]
warn_alloc_failed+0x110/0x180
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff81682083&gt;]
__alloc_pages_slowpath+0x6b7/0x725
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff8118b155&gt;]
__alloc_pages_nodemask+0x405/0x420
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff811cf30a&gt;]
alloc_pages_current+0xaa/0x170
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff81185a7e&gt;] __get_free_pages+0xe/0x50
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff811dabae&gt;]
kmalloc_order_trace+0x2e/0xa0
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff811dd381&gt;] __kmalloc+0x221/0x240
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffffa02f83fa&gt;]
bnx2x_frag_alloc.isra.62+0x2a/0x40 [bnx2x]
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffffa02f92f7&gt;] bnx2x_rx_int+0x227/0x17b0
[bnx2x]
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff81033669&gt;] ? sched_clock+0x9/0x10
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffffa02fc72d&gt;] bnx2x_poll+0x1dd/0x260
[bnx2x]
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff815705e0&gt;] net_rx_action+0x170/0x380
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff8108f2cf&gt;] __do_softirq+0xef/0x280
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff8169859c&gt;] call_softirq+0x1c/0x30
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff8102d365&gt;] do_softirq+0x65/0xa0
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff8108f665&gt;] irq_exit+0x115/0x120
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff81699138&gt;] do_IRQ+0x58/0xf0
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff8168e2ad&gt;]
common_interrupt+0x6d/0x6d
Mar 10 09:24:03 2kvm2 kernel: &lt;EOI&gt;  [&lt;ffffffff81189a73&gt;] ?
free_hot_cold_page+0x103/0x160
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff81189b16&gt;]
free_hot_cold_page_list+0x46/0xa0
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff81195193&gt;]
shrink_page_list+0x543/0xb00
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff81195dda&gt;]
shrink_inactive_list+0x1fa/0x630
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff81196975&gt;] shrink_lruvec+0x385/0x770
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff81196dd6&gt;] shrink_zone+0x76/0x1a0
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff8119807c&gt;] balance_pgdat+0x48c/0x5e0
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff81198343&gt;] kswapd+0x173/0x450
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff810b17d0&gt;] ?
wake_up_atomic_t+0x30/0x30
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff811981d0&gt;] ?
balance_pgdat+0x5e0/0x5e0
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff810b06ff&gt;] kthread+0xcf/0xe0
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff810b0630&gt;] ?
kthread_create_on_node+0x140/0x140
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff81696a58&gt;] ret_from_fork+0x58/0x90
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff810b0630&gt;] ?
kthread_create_on_node+0x140/0x140
Mar 10 09:24:03 2kvm2 kernel: kswapd1: page allocation failure: order:2,
mode:0x104020
Mar 10 09:24:03 2kvm2 kernel: CPU: 42 PID: 265 Comm: kswapd1 Tainted: G
I    ------------   3.10.0-514.10.2.el7.x86_64 #1
Mar 10 09:24:03 2kvm2 kernel: Hardware name: Supermicro X10DRC/X10DRi-LN4+,
BIOS 1.0a 08/29/2014
Mar 10 09:24:03 2kvm2 kernel: 0000000000104020 00000000f7228dc9
ffff88301f4839d8 ffffffff816864ef


and at the critical time again:

Mar 10 10:37:53 2kvm2 sanlock[1673]: 2017-03-10 10:37:53+0100 140524 [1673]:
s3 check_our_lease warning 73 last_success 140451
Mar 10 10:37:54 2kvm2 sanlock[1673]: 2017-03-10 10:37:54+0100 140525 [1673]:
s3 check_our_lease warning 74 last_success 140451
</pre>
      </blockquote>
      <pre wrap="">
Sanlock could not renew the lease for 74 seconds

</pre>
      <blockquote type="cite">
        <pre wrap="">Mar 10 10:37:54 2kvm2 wdmd[1732]: test warning now 140526 ping 140516 close
0 renewal 140451 expire 140531 client 1673
sanlock_1603cd90-92ef-4c03-922c-cecb282fd00e:1
Mar 10 10:37:54 2kvm2 kernel: watchdog watchdog0: watchdog did not stop!
Mar 10 10:37:54 2kvm2 wdmd[1732]: /dev/watchdog0 closed unclean
Mar 10 10:37:55 2kvm2 sanlock[1673]: 2017-03-10 10:37:55+0100 140526 [1673]:
s3 check_our_lease warning 75 last_success 140451
Mar 10 10:37:55 2kvm2 wdmd[1732]: test warning now 140527 ping 140516 close
140526 renewal 140451 expire 140531 client 1673
sanlock_1603cd90-92ef-4c03-922c-cecb282fd00e:1
Mar 10 10:37:56 2kvm2 sanlock[1673]: 2017-03-10 10:37:56+0100 140527 [1673]:
s3 check_our_lease warning 76 last_success 140451
Mar 10 10:37:56 2kvm2 wdmd[1732]: test warning now 140528 ping 140516 close
140526 renewal 140451 expire 140531 client 1673
sanlock_1603cd90-92ef-4c03-922c-cecb282fd00e:1
Mar 10 10:37:57 2kvm2 sanlock[1673]: 2017-03-10 10:37:57+0100 140528 [1673]:
s3 check_our_lease warning 77 last_success 140451
Mar 10 10:37:57 2kvm2 wdmd[1732]: test warning now 140529 ping 140516 close
140526 renewal 140451 expire 140531 client 1673
sanlock_1603cd90-92ef-4c03-922c-cecb282fd00e:1
Mar 10 10:37:58 2kvm2 sanlock[1673]: 2017-03-10 10:37:58+0100 140529 [1673]:
s3 check_our_lease warning 78 last_success 140451
Mar 10 10:37:58 2kvm2 wdmd[1732]: test warning now 140530 ping 140516 close
140526 renewal 140451 expire 140531 client 1673
sanlock_1603cd90-92ef-4c03-922c-cecb282fd00e:1
Mar 10 10:37:59 2kvm2 sanlock[1673]: 2017-03-10 10:37:59+0100 140530 [1673]:
s3 check_our_lease warning 79 last_success 140451
Mar 10 10:37:59 2kvm2 wdmd[1732]: test failed rem 55 now 140531 ping 140516
close 140526 renewal 140451 expire 140531 client 1673
sanlock_1603cd90-92ef-4c03-922c-cecb282fd00e:1
Mar 10 10:38:00 2kvm2 sanlock[1673]: 2017-03-10 10:38:00+0100 140531 [1673]:
s3 check_our_lease failed 80
</pre>
      </blockquote>
      <pre wrap="">
Sanlock failed to renew the lease after 80 seconds - game over

</pre>
      <blockquote type="cite">
        <pre wrap="">Mar 10 10:38:00 2kvm2 sanlock[1673]: 2017-03-10 10:38:00+0100 140531 [1673]:
s3 all pids clear
</pre>
      </blockquote>
      <pre wrap="">
If this host is the SPM, sanlock has just killed vdsm; this explains why
your storage operations fail.
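
You can watch this on the host while it happens; a sketch, assuming the
standard sanlock client tools:

# Lockspaces and resources sanlock currently holds, with renewal state:
sanlock client status

# Sanlock's internal log buffer, including the renewal history:
sanlock client log_dump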

</pre>
      <blockquote type="cite">
        <pre wrap="">Mar 10 10:38:01 2kvm2 wdmd[1732]: /dev/watchdog0 reopen
Mar 10 10:38:10 2kvm2 journal: Cannot start job (query, none) for domain
TEST-LBS_EBSAPP; current job is (query, none) owned by (3284
remoteDispatchConnectGetAllDomainStats, 0 &lt;null&gt;) for (62s, 0s)
Mar 10 10:38:10 2kvm2 journal: Timed out during operation: cannot acquire
state change lock (held by remoteDispatchConnectGetAllDomainStats)
Mar 10 10:38:11 2kvm2 journal: vdsm vds.dispatcher ERROR SSL error receiving
from &lt;yajsonrpc.betterAsyncore.Dispatcher connected ('::1', 40590, 0, 0) at
0x3acdd88&gt;: unexpected eof
Mar 10 10:38:40 2kvm2 journal: Cannot start job (query, none) for domain
TEST1-LBS_ATRYA; current job is (query, none) owned by (3288
remoteDispatchConnectGetAllDomainStats, 0 &lt;null&gt;) for (47s, 0s)
Mar 10 10:38:40 2kvm2 journal: Timed out during operation: cannot acquire
state change lock (held by remoteDispatchConnectGetAllDomainStats)
Mar 10 10:38:41 2kvm2 journal: vdsm vds.dispatcher ERROR SSL error receiving
from &lt;yajsonrpc.betterAsyncore.Dispatcher connected ('::1', 40592, 0, 0) at
0x3fd5b90&gt;: unexpected eof
Mar 10 10:39:10 2kvm2 journal: Cannot start job (query, none) for domain
TEST-LBS_EBSAPP; current job is (query, none) owned by (3284
remoteDispatchConnectGetAllDomainStats, 0 &lt;null&gt;) for (122s, 0s)
Mar 10 10:39:10 2kvm2 journal: Timed out during operation: cannot acquire
state change lock (held by remoteDispatchConnectGetAllDomainStats)
Mar 10 10:39:10 2kvm2 journal: Cannot start job (query, none) for domain
TEST1-LBS_ATRYA; current job is (query, none) owned by (3288
remoteDispatchConnectGetAllDomainStats, 0 &lt;null&gt;) for (77s, 0s)
Mar 10 10:39:10 2kvm2 journal: Timed out during operation: cannot acquire
state change lock (held by remoteDispatchConnectGetAllDomainStats)
Mar 10 10:39:11 2kvm2 journal: vdsm vds.dispatcher ERROR SSL error receiving
from &lt;yajsonrpc.betterAsyncore.Dispatcher connected ('::1', 40594, 0, 0) at
0x2447290&gt;: unexpected eof
Mar 10 10:39:23 2kvm2 sanlock[1673]: 2017-03-10 10:39:23+0100 140615 [2576]:
s3 delta_renew write time 140 error -202
Mar 10 10:39:23 2kvm2 sanlock[1673]: 2017-03-10 10:39:23+0100 140615 [2576]:
s3 renewal error -202 delta_length 144 last_success 140451
Mar 10 10:39:40 2kvm2 journal: Cannot start job (query, none) for domain
TEST-LBS_EBSAPP; current job is (query, none) owned by (3284
remoteDispatchConnectGetAllDomainStats, 0 &lt;null&gt;) for (152s, 0s)
Mar 10 10:39:40 2kvm2 journal: Timed out during operation: cannot acquire
state change lock (held by remoteDispatchConnectGetAllDomainStats)
Mar 10 10:39:40 2kvm2 journal: Cannot start job (query, none) for domain
TEST1-LBS_ATRYA; current job is (query, none) owned by (3288
remoteDispatchConnectGetAllDomainStats, 0 &lt;null&gt;) for (107s, 0s)
Mar 10 10:39:40 2kvm2 journal: Timed out during operation: cannot acquire
state change lock (held by remoteDispatchConnectGetAllDomainStats)
Mar 10 10:39:41 2kvm2 journal: vdsm vds.dispatcher ERROR SSL error receiving
from &lt;yajsonrpc.betterAsyncore.Dispatcher connected ('::1', 40596, 0, 0) at
0x2472ef0&gt;: unexpected eof
Mar 10 10:39:49 2kvm2 kernel: INFO: task qemu-img:42107 blocked for more
than 120 seconds.
</pre>
      </blockquote>
      <pre wrap="">
qemu-img is blocked for more than 120 seconds.

</pre>
      <blockquote type="cite">
        <pre wrap="">Mar 10 10:39:49 2kvm2 kernel: "echo 0 &gt;
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 10 10:39:49 2kvm2 kernel: qemu-img        D ffff88010dad3e30     0 42107
3631 0x00000080
Mar 10 10:39:49 2kvm2 kernel: ffff88010dad3b30 0000000000000082
ffff8814491f4e70 ffff88010dad3fd8
Mar 10 10:39:49 2kvm2 kernel: ffff88010dad3fd8 ffff88010dad3fd8
ffff8814491f4e70 ffff88301f096c40
Mar 10 10:39:49 2kvm2 kernel: 0000000000000000 7fffffffffffffff
ffff88181f186c00 ffff88010dad3e30
Mar 10 10:39:49 2kvm2 kernel: Call Trace:
Mar 10 10:39:49 2kvm2 kernel: [&lt;ffffffff8168bbb9&gt;] schedule+0x29/0x70
Mar 10 10:39:49 2kvm2 kernel: [&lt;ffffffff81689609&gt;]
schedule_timeout+0x239/0x2d0
Mar 10 10:39:49 2kvm2 kernel: [&lt;ffffffff8168b15e&gt;]
io_schedule_timeout+0xae/0x130
Mar 10 10:39:49 2kvm2 kernel: [&lt;ffffffff8168b1f8&gt;] io_schedule+0x18/0x20
Mar 10 10:39:49 2kvm2 kernel: [&lt;ffffffff8124d9e5&gt;]
wait_on_sync_kiocb+0x35/0x80
Mar 10 10:39:49 2kvm2 kernel: [&lt;ffffffffa0a36091&gt;]
fuse_direct_IO+0x231/0x380 [fuse]
Mar 10 10:39:49 2kvm2 kernel: [&lt;ffffffff812a6ddd&gt;] ?
cap_inode_need_killpriv+0x2d/0x40
Mar 10 10:39:49 2kvm2 kernel: [&lt;ffffffff812a8cb6&gt;] ?
security_inode_need_killpriv+0x16/0x20
Mar 10 10:39:49 2kvm2 kernel: [&lt;ffffffff81219e3f&gt;] ?
dentry_needs_remove_privs.part.13+0x1f/0x30
Mar 10 10:39:49 2kvm2 kernel: [&lt;ffffffff81182a2d&gt;]
generic_file_direct_write+0xcd/0x190
Mar 10 10:39:49 2kvm2 kernel: [&lt;ffffffffa0a36905&gt;]
fuse_file_aio_write+0x185/0x340 [fuse]
Mar 10 10:39:49 2kvm2 kernel: [&lt;ffffffff811fdabd&gt;] do_sync_write+0x8d/0xd0
Mar 10 10:39:49 2kvm2 kernel: [&lt;ffffffff811fe32d&gt;] vfs_write+0xbd/0x1e0
Mar 10 10:39:49 2kvm2 kernel: [&lt;ffffffff811ff002&gt;] SyS_pwrite64+0x92/0xc0
Mar 10 10:39:49 2kvm2 kernel: [&lt;ffffffff81696b09&gt;]
system_call_fastpath+0x16/0x1b
Mar 10 10:39:49 2kvm2 kernel: INFO: task qemu-img:42111 blocked for more
than 120 seconds.
Mar 10 10:39:49 2kvm2 kernel: "echo 0 &gt;
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 10 10:39:49 2kvm2 kernel: qemu-img        D ffff8818a76e7e30     0 42111
3632 0x00000080
Mar 10 10:39:49 2kvm2 kernel: ffff8818a76e7b30 0000000000000082
ffff88188aaeaf10 ffff8818a76e7fd8
Mar 10 10:39:49 2kvm2 kernel: ffff8818a76e7fd8 ffff8818a76e7fd8
ffff88188aaeaf10 ffff88301f156c40

memory
=======

# cat /proc/meminfo
MemTotal:       197983472 kB
MemFree:          834228 kB
MemAvailable:   165541204 kB
Buffers:           45548 kB
Cached:         159596272 kB
SwapCached:       119872 kB
Active:         40803264 kB
Inactive:       148022076 kB
Active(anon):   26594112 kB
Inactive(anon):  2626384 kB
Active(file):   14209152 kB
Inactive(file): 145395692 kB
Unevictable:       50488 kB
Mlocked:           50488 kB
SwapTotal:       4194300 kB
SwapFree:        3612188 kB
Dirty:               624 kB
Writeback:             0 kB
AnonPages:      29185032 kB
Mapped:            85176 kB
Shmem:             25908 kB
Slab:            6203384 kB
SReclaimable:    5857240 kB
SUnreclaim:       346144 kB
KernelStack:       19184 kB
PageTables:        86100 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    103186036 kB
Committed_AS:   52300288 kB
VmallocTotal:   34359738367 kB
VmallocUsed:     1560580 kB
VmallocChunk:   34257341440 kB
HardwareCorrupted:     0 kB
AnonHugePages:   5566464 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      431292 kB
DirectMap2M:    19382272 kB
DirectMap1G:    183500800 kB


Can anybody help me with this??
I've got a small hint about a swap problem ( in messages ), but I'm not sure...
A similar problem occurred in older gluster/ovirt versions during testing
( a freeze under huge workload - but not a fatal overload )
</pre>
      </blockquote>
      <pre wrap="">
You have a storage issue; you should understand why
your storage is failing.
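
A simple way to see what sanlock sees is to time a direct write on the
mount it renews its lease on; a sketch, with the mount path inferred
from the gluster log names above:

# Sanlock renewals fail when direct I/O like this takes tens of seconds:
dd if=/dev/zero of=/rhev/data-center/mnt/glusterSD/localhost:_slow1/test.img \
   bs=1M count=1 oflag=direct conv=fsync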

There is also a kernel failure to allocate pages; maybe this is
related to the storage failure?

Nir


------------------------------

_______________________________________________
Users mailing list
<a class="moz-txt-link-abbreviated" href="mailto:Users@ovirt.org">Users@ovirt.org</a>
<a class="moz-txt-link-freetext" href="http://lists.ovirt.org/mailman/listinfo/users">http://lists.ovirt.org/mailman/listinfo/users</a>


End of Users Digest, Vol 66, Issue 86
*************************************
</pre>
    </blockquote>
    <br>
    <div class="moz-signature">-- <br>
      <font color="#1a2a4b" size="-1">
        <b>Pavel Bauer</b>
        <br>
        <br>
        System Architect<br>
        <br>
        Mobile: +420 777 481 574
        <br>
        <b>Algotech</b> | FUTURAMA Business Park | Sokolovska 668/136 D
        | 186 00 Praha 8 | Czech Republic
        <br>
        E-mail: <a href="mailto:pbauer@algotech.cz">pbauer@algotech.cz</a> |
        Web: <a href="http://www.algotech.cz">www.algotech.cz</a>
        <br>
        <br>
        <a
href="http://www.linkedin.com/company/2242954?trk=tyah&amp;trkInfo=tas%3AALGOTECH%20C%2Cidx%3A1-1-1">LinkedIn</a>
        | <a href="https://www.facebook.com/AlgotechCzechRepublic">Facebook</a>
        | <a href="https://twitter.com/AlgotechCZ">Twitter</a>
      </font></div>
  </body>
</html>