<div dir="ltr"><div><div>Hi,<br><br></div>Could you please share your volume info output?<br><br></div>-Krutika<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Mar 10, 2017 at 6:41 PM, <a href="mailto:paf1@email.cz">paf1@email.cz</a> <span dir="ltr">&lt;<a href="mailto:paf1@email.cz" target="_blank">paf1@email.cz</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div bgcolor="#FFFFFF" text="#000066">
<font face="Ubuntu">freeze / freezing:<br>
      I/O operations are paused for some reason.<br>
      The possibilities I can see are:<br>
      1) net - some TCP framework collapse<br>
      2) gluster interconnect - the gluster daemon / a gluster process hangs ??<br>
      3) VDSM - pauses the managed services<br>
      4) XFS - read/write issues<br>
      5) swap overfull - processes get killed - but why is swap full if
      the VMs use at most 30% of memory (196 GB)? ( unmanaged
      process forking? )<br>
      ( a few quick checks for each are sketched below )<br>
      <br>
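      A minimal sketch of quick checks for the possibilities above ( "slow1" is the volume name from the logs below; the brick path is a placeholder ):<br>
      <pre>
# 1) network - NIC error / drop counters
ip -s link
# 2) gluster interconnect - brick and daemon state for the volume
gluster volume status slow1
# 3) VDSM and the lease / watchdog daemons
systemctl status vdsmd sanlock wdmd
# 4) XFS - errors in dmesg and geometry of the brick filesystem
dmesg | grep -i xfs
xfs_info /path/to/brick        # placeholder path
# 5) swap - current usage and how eagerly the kernel swaps
free -m
cat /proc/sys/vm/swappiness
</pre>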
      regs<br>
      <br>
    </font><div><div class="h5"><br>
    <div class="m_7208465215679738326moz-cite-prefix">On 03/10/2017 01:56 PM, Nir Soffer
      wrote:<br>
    </div>
    <blockquote type="cite">
      <pre>On Fri, Mar 10, 2017 at 1:07 PM, <a class="m_7208465215679738326moz-txt-link-abbreviated" href="mailto:paf1@email.cz" target="_blank">paf1@email.cz</a> <a class="m_7208465215679738326moz-txt-link-rfc2396E" href="mailto:paf1@email.cz" target="_blank">&lt;paf1@email.cz&gt;</a> wrote:
</pre>
      <blockquote type="cite">
        <pre>Hello everybody,

for production usage I'm testing oVirt with Gluster.
All components seem to be running fine, but whenever I test a huge
workload, the node freezes. Not the main OS, but the VDSM management layer
and its attached services, e.g. the VMs.
</pre>
      </blockquote>
<pre>What do you mean by "freeze"?

</pre>
      <blockquote type="cite">
        <pre>mgmt
oVirt - 4.1.0.4
centos 7.3-1611


nodes ( installed from ovirt image
"ovirt-node-ng-installer-ovirt-4.1-2017030804.iso" )

OS Version:        RHEL - 7 - 3.1611.el7.centos
OS Description:    oVirt Node 4.1.0
Kernel Version:    3.10.0 - 514.10.2.el7.x86_64
KVM Version:       2.6.0 - 28.el7_3.3.1
LIBVIRT Version:   libvirt-2.0.0-10.el7_3.5
VDSM Version:      vdsm-4.19.4-1.el7.centos
SPICE Version:     0.12.4 - 20.el7_3
GlusterFS Version: glusterfs-3.8.9-1.el7  ( LVM thin provisioning in
replica 2 - created from the oVirt GUI )

concurrently running:
- huge import from an export domain     ( network workload )
- sequential writes to a VM's local disk ( gluster replica sequential workload )
- huge database SELECTs inside VMs      ( random IOPS )
- deletion of a huge old snapshot       ( random IOPS )

With this configuration / workload it runs for about an hour with no
exceptions, at 70-80% disk load, but at some point VDSM freezes all jobs on
a timeout and the VMs go into "unknown" status.
The whole system then recovers automatically within roughly 20 minutes
( except the import and the snapshot deletion, which rolls back )

engine.log  - focus on 10:39:07  ( Failed in 'HSMGetAllTasksStatusesVDS'
method )
========

n child command id: &#39;a8a3a4d5-cf7d-4423-8243-<wbr>022911232508&#39;
type:&#39;<wbr>RemoveSnapshotSingleDiskLive&#39; to complete
2017-03-10 10:39:01,727+01 INFO
[org.ovirt.engine.core.bll.<wbr>snapshots.<wbr>RemoveSnapshotSingleDiskLiveCo<wbr>mmandCallback]
(DefaultQuartzScheduler2) [759c8e1f] Command &#39;RemoveSnapshotSingleDiskLive&#39;
(id: &#39;a8a3a4d5-cf7d-4423-8243-<wbr>022911232508&#39;) waiting on child command id:
&#39;33df2c1e-6ce3-44fd-a39b-<wbr>d111883b4c4e&#39; type:&#39;DestroyImage&#39; to complete
2017-03-10 10:39:03,929+01 INFO
[org.ovirt.engine.core.<wbr>vdsbroker.gluster.<wbr>GlusterServersListVDSCommand]
(DefaultQuartzScheduler5) [fde51205-3e8b-4b84-a478-<wbr>352dc444ccc4] START,
GlusterServersListVDSCommand(<wbr>HostName = 2kvm1,
VdsIdVDSCommandParametersBase:<wbr>{runAsync=&#39;true&#39;,
hostId=&#39;86876b79-71d8-4ae1-<wbr>883b-ba010cd270e7&#39;}), log id: 446d0cd3
2017-03-10 10:39:04,343+01 INFO
[org.ovirt.engine.core.<wbr>vdsbroker.gluster.<wbr>GlusterServersListVDSCommand]
(DefaultQuartzScheduler5) [fde51205-3e8b-4b84-a478-<wbr>352dc444ccc4] FINISH,
GlusterServersListVDSCommand, return: [<a href="http://172.16.5.163/24:CONNECTED" target="_blank">172.16.5.163/24:CONNECTED</a>,
16.0.0.164:CONNECTED], log id: 446d0cd3
2017-03-10 10:39:04,353+01 INFO
[org.ovirt.engine.core.<wbr>vdsbroker.gluster.<wbr>GlusterVolumesListVDSCommand]
(DefaultQuartzScheduler5) [fde51205-3e8b-4b84-a478-<wbr>352dc444ccc4] START,
GlusterVolumesListVDSCommand(<wbr>HostName = 2kvm1,
GlusterVolumesListVDSParameter<wbr>s:{runAsync=&#39;true&#39;,
hostId=&#39;86876b79-71d8-4ae1-<wbr>883b-ba010cd270e7&#39;}), log id: 69ea1fda
2017-03-10 10:39:05,128+01 INFO
[org.ovirt.engine.core.<wbr>vdsbroker.gluster.<wbr>GlusterVolumesListVDSCommand]
(DefaultQuartzScheduler5) [fde51205-3e8b-4b84-a478-<wbr>352dc444ccc4] FINISH,
GlusterVolumesListVDSCommand, return:
{8ded4083-2f31-489e-a60d-<wbr>a315a5eb9b22=org.ovirt.engine.<wbr>core.common.businessentities.<wbr>gluster.GlusterVolumeEntity@<wbr>7765e4ad},
log id: 69ea1fda
2017-03-10 10:39:07,163+01 ERROR
[org.ovirt.engine.core.<wbr>vdsbroker.vdsbroker.<wbr>HSMGetAllTasksStatusesVDSComma<wbr>nd]
(DefaultQuartzScheduler2) [759c8e1f] Failed in &#39;HSMGetAllTasksStatusesVDS&#39;
method
2017-03-10 10:39:07,178+01 ERROR
[org.ovirt.engine.core.dal.<wbr>dbbroker.auditloghandling.<wbr>AuditLogDirector]
(DefaultQuartzScheduler2) [759c8e1f] EVENT_ID:
VDS_BROKER_COMMAND_FAILURE(10,<wbr>802), Correlation ID: null, Call Stack: null,
Custom Event ID: -1, Message: VDSM 2kvm2 command HSMGetAllTasksStatusesVDS
failed: Connection timed out
2017-03-10 10:39:07,182+01 INFO
[org.ovirt.engine.core.bll.<wbr>tasks.SPMAsyncTask] (DefaultQuartzScheduler2)
[759c8e1f] BaseAsyncTask::<wbr>onTaskEndSuccess: Task
&#39;f594bf69-619b-4d1b-8f6d-<wbr>a9826997e478&#39; (Parent Command &#39;ImportVm&#39;,
Parameters Type
&#39;org.ovirt.engine.core.common.<wbr>asynctasks.<wbr>AsyncTaskParameters&#39;) ended
successfully.
2017-03-10 10:39:07,182+01 INFO
[org.ovirt.engine.core.bll.<wbr>CommandMultiAsyncTasks] (DefaultQuartzScheduler2)
[759c8e1f] Task with DB Task ID &#39;a05c7c07-9b98-4ab2-ac7b-<wbr>9e70a75ba7b7&#39; and
VDSM Task ID &#39;7c60369f-70a3-4a6a-80c9-<wbr>4753ac9ed372&#39; is in state Polling. End
action for command 8deb3fe3-4a83-4605-816c-<wbr>ffdc63fd9ac1 will proceed when
all the entity&#39;s tasks are completed.
2017-03-10 10:39:07,182+01 INFO
[org.ovirt.engine.core.bll.<wbr>tasks.SPMAsyncTask] (DefaultQuartzScheduler2)
[759c8e1f] SPMAsyncTask::PollTask: Polling task
&#39;f351e8f6-6dd7-49aa-bf54-<wbr>650d84fc6352&#39; (Parent Command &#39;DestroyImage&#39;,
Parameters Type
&#39;org.ovirt.engine.core.common.<wbr>asynctasks.<wbr>AsyncTaskParameters&#39;) returned
status &#39;finished&#39;, result &#39;cleanSuccess&#39;.
2017-03-10 10:39:07,182+01 ERROR
[org.ovirt.engine.core.bll.<wbr>tasks.SPMAsyncTask] (DefaultQuartzScheduler2)
[759c8e1f] BaseAsyncTask::<wbr>logEndTaskFailure: Task
&#39;f351e8f6-6dd7-49aa-bf54-<wbr>650d84fc6352&#39; (Parent Command &#39;DestroyImage&#39;,
Parameters Type
&#39;org.ovirt.engine.core.common.<wbr>asynctasks.<wbr>AsyncTaskParameters&#39;) ended with
failure:
-- Result: &#39;cleanSuccess&#39;
-- Message: &#39;VDSGenericException: VDSErrorException: Failed to
HSMGetAllTasksStatusesVDS, error = Connection timed out, code = 100&#39;,
-- Exception: &#39;VDSGenericException: VDSErrorException: Failed to
HSMGetAllTasksStatusesVDS, error = Connection timed out, code = 100&#39;
2017-03-10 10:39:07,184+01 INFO
[org.ovirt.engine.core.bll.<wbr>tasks.CommandAsyncTask] (DefaultQuartzScheduler2)
[759c8e1f] CommandAsyncTask::<wbr>endActionIfNecessary: All tasks of command
&#39;33df2c1e-6ce3-44fd-a39b-<wbr>d111883b4c4e&#39; has ended -&gt; executing &#39;endAction&#39;
2017-03-10 10:39:07,185+01 INFO
[org.ovirt.engine.core.bll.<wbr>tasks.CommandAsyncTask] (DefaultQuartzScheduler2)
[759c8e1f] CommandAsyncTask::endAction: Ending action for &#39;1&#39; tasks (command
ID: &#39;33df2c1e-6ce3-44fd-a39b-<wbr>d111883b4c4e&#39;): calling endAction &#39;.
2017-03-10 10:39:07,185+01 INFO
[org.ovirt.engine.core.bll.<wbr>tasks.CommandAsyncTask]
(org.ovirt.thread.pool-6-<wbr>thread-31) [759c8e1f]
CommandAsyncTask::<wbr>endCommandAction [within thread] context: Attempting to
endAction &#39;DestroyImage&#39;,
2017-03-10 10:39:07,192+01 INFO
[org.ovirt.engine.core.bll.<wbr>storage.disk.image.<wbr>DestroyImageCommand]
(org.ovirt.thread.pool-6-<wbr>thread-31) [759c8e1f] Command
[id=33df2c1e-6ce3-44fd-a39b-<wbr>d111883b4c4e]: Updating status to &#39;FAILED&#39;, The
command end method logic will be executed by one of its parent commands.
2017-03-10 10:39:07,192+01 INFO
[org.ovirt.engine.core.bll.<wbr>tasks.CommandAsyncTask]
(org.ovirt.thread.pool-6-<wbr>thread-31) [759c8e1f]
CommandAsyncTask::<wbr>HandleEndActionResult [within thread]: endAction for
action type &#39;DestroyImage&#39; completed, handling the result.
2017-03-10 10:39:07,192+01 INFO
[org.ovirt.engine.core.bll.<wbr>tasks.CommandAsyncTask]
(org.ovirt.thread.pool-6-<wbr>thread-31) [759c8e1f]
CommandAsyncTask::<wbr>HandleEndActionResult [within thread]: endAction for
action type &#39;DestroyImage&#39; succeeded, clearing tasks.
2017-03-10 10:39:07,192+01 INFO
[org.ovirt.engine.core.bll.<wbr>tasks.SPMAsyncTask]
(org.ovirt.thread.pool-6-<wbr>thread-31) [759c8e1f] SPMAsyncTask::ClearAsyncTask:
Attempting to clear task &#39;f351e8f6-6dd7-49aa-bf54-<wbr>650d84fc6352&#39;
2017-03-10 10:39:07,193+01 INFO
[org.ovirt.engine.core.<wbr>vdsbroker.irsbroker.<wbr>SPMClearTaskVDSCommand]
(org.ovirt.thread.pool-6-<wbr>thread-31) [759c8e1f] START,
SPMClearTaskVDSCommand(
SPMTaskGuidBaseVDSCommandParam<wbr>eters:{runAsync=&#39;true&#39;,
storagePoolId=&#39;00000001-0001-<wbr>0001-0001-000000000311&#39;,
ignoreFailoverLimit=&#39;false&#39;,
taskId=&#39;f351e8f6-6dd7-49aa-<wbr>bf54-650d84fc6352&#39;}), log id: 2b7080c2
2017-03-10 10:39:07,194+01 INFO
[org.ovirt.engine.core.<wbr>vdsbroker.vdsbroker.<wbr>HSMClearTaskVDSCommand]
(org.ovirt.thread.pool-6-<wbr>thread-31) [759c8e1f] START,
HSMClearTaskVDSCommand(<wbr>HostName = 2kvm2,
HSMTaskGuidBaseVDSCommandParam<wbr>eters:{runAsync=&#39;true&#39;,
hostId=&#39;905375e1-6de4-4fdf-<wbr>b69c-b2d546f869c8&#39;,
taskId=&#39;f351e8f6-6dd7-49aa-<wbr>bf54-650d84fc6352&#39;}), log id: 2edff460
2017-03-10 10:39:08,208+01 INFO
[org.ovirt.engine.core.<wbr>vdsbroker.vdsbroker.<wbr>HSMClearTaskVDSCommand]
(org.ovirt.thread.pool-6-<wbr>thread-31) [759c8e1f] FINISH,
HSMClearTaskVDSCommand, log id: 2edff460
2017-03-10 10:39:08,208+01 INFO
[org.ovirt.engine.core.<wbr>vdsbroker.irsbroker.<wbr>SPMClearTaskVDSCommand]
(org.ovirt.thread.pool-6-<wbr>thread-31) [759c8e1f] FINISH,
SPMClearTaskVDSCommand, log id: 2b7080c2
2017-03-10 10:39:08,213+01 INFO
[org.ovirt.engine.core.bll.<wbr>tasks.SPMAsyncTask]
(org.ovirt.thread.pool-6-<wbr>thread-31) [759c8e1f]
BaseAsyncTask::<wbr>removeTaskFromDB: Removed task
&#39;f351e8f6-6dd7-49aa-bf54-<wbr>650d84fc6352&#39; from DataBase
2017-03-10 10:39:08,213+01 INFO
[org.ovirt.engine.core.bll.<wbr>tasks.CommandAsyncTask]
(org.ovirt.thread.pool-6-<wbr>thread-31) [759c8e1f]
CommandAsyncTask::<wbr>HandleEndActionResult [within thread]: Removing
CommandMultiAsyncTasks object for entity
&#39;33df2c1e-6ce3-44fd-a39b-<wbr>d111883b4c4e&#39;
2017-03-10 10:39:10,142+01 INFO
[org.ovirt.engine.core.<wbr>vdsbroker.gluster.<wbr>GlusterServersListVDSCommand]
(DefaultQuartzScheduler10) [a86dc7b5-52dc-40d4-a3b9-<wbr>49d7eabbb93c] START,
GlusterServersListVDSCommand(<wbr>HostName = 2kvm1,
VdsIdVDSCommandParametersBase:<wbr>{runAsync=&#39;true&#39;,
hostId=&#39;86876b79-71d8-4ae1-<wbr>883b-ba010cd270e7&#39;}), log id: 2e7278cb
2017-03-10 10:39:11,513+01 INFO
[org.ovirt.engine.core.<wbr>vdsbroker.gluster.<wbr>GlusterServersListVDSCommand]
(DefaultQuartzScheduler10) [a86dc7b5-52dc-40d4-a3b9-<wbr>49d7eabbb93c] FINISH,
GlusterServersListVDSCommand, return: [<a href="http://172.16.5.163/24:CONNECTED" target="_blank">172.16.5.163/24:CONNECTED</a>,
16.0.0.164:CONNECTED], log id: 2e7278cb
2017-03-10 10:39:11,523+01 INFO
[org.ovirt.engine.core.<wbr>vdsbroker.gluster.<wbr>GlusterVolumesListVDSCommand]
(DefaultQuartzScheduler10) [a86dc7b5-52dc-40d4-a3b9-<wbr>49d7eabbb93c] START,
GlusterVolumesListVDSCommand(<wbr>HostName = 2kvm1,
GlusterVolumesListVDSParameter<wbr>s:{runAsync=&#39;true&#39;,
hostId=&#39;86876b79-71d8-4ae1-<wbr>883b-ba010cd270e7&#39;}), log id: 43704ef2
2017-03-10 10:39:11,777+01 INFO
[org.ovirt.engine.core.bll.<wbr>ConcurrentChildCommandsExecuti<wbr>onCallback]
(DefaultQuartzScheduler9) [67e1d8ed] Command &#39;RemoveSnapshot&#39; (id:
&#39;13c2cb7c-0809-4971-aceb-<wbr>37ae66105ab7&#39;) waiting on child command id:
&#39;a8a3a4d5-cf7d-4423-8243-<wbr>022911232508&#39; type:&#39;<wbr>RemoveSnapshotSingleDiskLive&#39;
to complete
2017-03-10 10:39:11,789+01 WARN
[org.ovirt.engine.core.bll.<wbr>snapshots.<wbr>RemoveSnapshotSingleDiskLiveCo<wbr>mmand]
(DefaultQuartzScheduler9) [759c8e1f] Child command &#39;DESTROY_IMAGE&#39; failed,
proceeding to verify
2017-03-10 10:39:11,789+01 INFO
[org.ovirt.engine.core.bll.<wbr>snapshots.<wbr>RemoveSnapshotSingleDiskLiveCo<wbr>mmand]
(DefaultQuartzScheduler9) [759c8e1f] Executing Live Merge command step
&#39;DESTROY_IMAGE_CHECK&#39;
2017-03-10 10:39:11,832+01 INFO
[org.ovirt.engine.core.bll.<wbr>DestroyImageCheckCommand] (pool-5-thread-7)
[4856f570] Running command: DestroyImageCheckCommand internal: true.
2017-03-10 10:39:11,833+01 INFO
[org.ovirt.engine.core.<wbr>vdsbroker.irsbroker.<wbr>SPMGetVolumeInfoVDSCommand]
(pool-5-thread-7) [4856f570] START, SPMGetVolumeInfoVDSCommand(
SPMGetVolumeInfoVDSCommandPara<wbr>meters:{expectedEngineErrors=&#39;<wbr>[VolumeDoesNotExist]&#39;,
runAsync=&#39;true&#39;, storagePoolId=&#39;00000001-0001-<wbr>0001-0001-000000000311&#39;,
ignoreFailoverLimit=&#39;false&#39;,
storageDomainId=&#39;1603cd90-<wbr>92ef-4c03-922c-cecb282fd00e&#39;,
imageGroupId=&#39;7543338a-3ca6-<wbr>4698-bb50-c14f0bd71428&#39;,
imageId=&#39;50b592f7-bfba-4398-<wbr>879c-8d6a19a2c000&#39;}), log id: 2c8031f8
2017-03-10 10:39:11,833+01 INFO
[org.ovirt.engine.core.<wbr>vdsbroker.irsbroker.<wbr>SPMGetVolumeInfoVDSCommand]
(pool-5-thread-7) [4856f570] Executing GetVolumeInfo using the current SPM
2017-03-10 10:39:11,834+01 INFO
[org.ovirt.engine.core.<wbr>vdsbroker.vdsbroker.<wbr>GetVolumeInfoVDSCommand]
(pool-5-thread-7) [4856f570] START, GetVolumeInfoVDSCommand(<wbr>HostName =
2kvm2,
GetVolumeInfoVDSCommandParamet<wbr>ers:{expectedEngineErrors=&#39;[<wbr>VolumeDoesNotExist]&#39;,
runAsync=&#39;true&#39;, hostId=&#39;905375e1-6de4-4fdf-<wbr>b69c-b2d546f869c8&#39;,
storagePoolId=&#39;00000001-0001-<wbr>0001-0001-000000000311&#39;,
storageDomainId=&#39;1603cd90-<wbr>92ef-4c03-922c-cecb282fd00e&#39;,
imageGroupId=&#39;7543338a-3ca6-<wbr>4698-bb50-c14f0bd71428&#39;,
imageId=&#39;50b592f7-bfba-4398-<wbr>879c-8d6a19a2c000&#39;}), log id: 79ca86cc
2017-03-10 10:39:11,846+01 INFO
[org.ovirt.engine.core.<wbr>vdsbroker.gluster.<wbr>GlusterVolumesListVDSCommand]
(DefaultQuartzScheduler10) [a86dc7b5-52dc-40d4-a3b9-<wbr>49d7eabbb93c] FINISH,
GlusterVolumesListVDSCommand, return:
{8ded4083-2f31-489e-a60d-<wbr>a315a5eb9b22=org.ovirt.engine.<wbr>core.common.businessentities.<wbr>gluster.GlusterVolumeEntity@<wbr>7765e4ad},
log id: 43704ef2
2017-03-10 10:39:16,858+01 INFO
[org.ovirt.engine.core.<wbr>vdsbroker.gluster.<wbr>GlusterServersListVDSCommand]
(DefaultQuartzScheduler7) [d82701d9-9fa3-467d-b273-<wbr>f5fe5a93062f] START,
GlusterServersListVDSCommand(<wbr>HostName = 2kvm1,
VdsIdVDSCommandParametersBase:<wbr>{runAsync=&#39;true&#39;,
hostId=&#39;86876b79-71d8-4ae1-<wbr>883b-ba010cd270e7&#39;}), log id: 6542adcd
2017-03-10 10:39:17,394+01 INFO
[org.ovirt.engine.core.<wbr>vdsbroker.gluster.<wbr>GlusterServersListVDSCommand]
(DefaultQuartzScheduler7) [d82701d9-9fa3-467d-b273-<wbr>f5fe5a93062f] FINISH,
GlusterServersListVDSCommand, return: [<a href="http://172.16.5.163/24:CONNECTED" target="_blank">172.16.5.163/24:CONNECTED</a>,
16.0.0.164:CONNECTED], log id: 6542adcd
2017-03-10 10:39:17,406+01 INFO
[org.ovirt.engine.core.<wbr>vdsbroker.gluster.<wbr>GlusterVolumesListVDSCommand]
(DefaultQuartzScheduler7) [d82701d9-9fa3-467d-b273-<wbr>f5fe5a93062f] START,
GlusterVolumesListVDSCommand(<wbr>HostName = 2kvm1,
GlusterVolumesListVDSParameter<wbr>s:{runAsync=&#39;true&#39;,
hostId=&#39;86876b79-71d8-4ae1-<wbr>883b-ba010cd270e7&#39;}), log id: 44ec33ed
2017-03-10 10:39:18,598+01 INFO
[org.ovirt.engine.core.<wbr>vdsbroker.gluster.<wbr>GlusterVolumesListVDSCommand]
(DefaultQuartzScheduler7) [d82701d9-9fa3-467d-b273-<wbr>f5fe5a93062f] FINISH,
GlusterVolumesListVDSCommand, return:
{8ded4083-2f31-489e-a60d-<wbr>a315a5eb9b22=org.ovirt.engine.<wbr>core.common.businessentities.<wbr>gluster.GlusterVolumeEntity@<wbr>7765e4ad},
log id: 44ec33ed
2017-03-10 10:39:21,865+01 INFO
[org.ovirt.engine.core.bll.<wbr>ConcurrentChildCommandsExecuti<wbr>onCallback]
(DefaultQuartzScheduler6) [67e1d8ed] Command &#39;RemoveSnapshot&#39; (id:
&#39;13c2cb7c-0809-4971-aceb-<wbr>37ae66105ab7&#39;) waiting on child command id:
&#39;a8a3a4d5-cf7d-4423-8243-<wbr>022911232508&#39; type:&#39;<wbr>RemoveSnapshotSingleDiskLive&#39;
to complete
2017-03-10 10:39:21,881+01 INFO
[org.ovirt.engine.core.bll.<wbr>snapshots.<wbr>RemoveSnapshotSingleDiskLiveCo<wbr>mmandCallback]
(DefaultQuartzScheduler6) [4856f570] Command &#39;RemoveSnapshotSingleDiskLive&#39;
(id: &#39;a8a3a4d5-cf7d-4423-8243-<wbr>022911232508&#39;) waiting on child command id:
&#39;b1d63b8e-19d3-4d64-8fa8-<wbr>4eb3e2d1a8fc&#39; type:&#39;DestroyImageCheck&#39; to complete
2017-03-10 10:39:23,611+01 INFO
[org.ovirt.engine.core.<wbr>vdsbroker.gluster.<wbr>GlusterServersListVDSCommand]
(DefaultQuartzScheduler6) [4856f570] START,
GlusterServersListVDSCommand(<wbr>HostName = 2kvm1,
VdsIdVDSCommandParametersBase:<wbr>{runAsync=&#39;true&#39;,
hostId=&#39;86876b79-71d8-4ae1-<wbr>883b-ba010cd270e7&#39;}), log id: 4c2fc22d
2017-03-10 10:39:24,616+01 INFO
[org.ovirt.engine.core.<wbr>vdsbroker.gluster.<wbr>GlusterTasksListVDSCommand]
(DefaultQuartzScheduler7) [d82701d9-9fa3-467d-b273-<wbr>f5fe5a93062f] START,
GlusterTasksListVDSCommand(<wbr>HostName = 2kvm1,
VdsIdVDSCommandParametersBase:<wbr>{runAsync=&#39;true&#39;,
hostId=&#39;86876b79-71d8-4ae1-<wbr>883b-ba010cd270e7&#39;}), log id: 1f169371
2017-03-10 10:39:24,618+01 INFO
[org.ovirt.engine.core.<wbr>vdsbroker.gluster.<wbr>GlusterServersListVDSCommand]
(DefaultQuartzScheduler6) [4856f570] FINISH, GlusterServersListVDSCommand,
return: [<a href="http://172.16.5.163/24:CONNECTED" target="_blank">172.16.5.163/24:CONNECTED</a>, 16.0.0.164:CONNECTED], log id: 4c2fc22d
2017-03-10 10:39:24,629+01 INFO
[org.ovirt.engine.core.<wbr>vdsbroker.gluster.<wbr>GlusterVolumesListVDSCommand]
(DefaultQuartzScheduler6) [4856f570] START,
GlusterVolumesListVDSCommand(<wbr>HostName = 2kvm1,
GlusterVolumesListVDSParameter<wbr>s:{runAsync=&#39;true&#39;,
hostId=&#39;86876b79-71d8-4ae1-<wbr>883b-ba010cd270e7&#39;}), log id: 2ac55735
2017-03-10 10:39:24,822+01 INFO
[org.ovirt.engine.core.<wbr>vdsbroker.gluster.<wbr>GlusterTasksListVDSCommand]
(DefaultQuartzScheduler7) [d82701d9-9fa3-467d-b273-<wbr>f5fe5a93062f] FINISH,
GlusterTasksListVDSCommand, return: [], log id: 1f169371
2017-03-10 10:39:26,836+01 INFO
[org.ovirt.engine.core.<wbr>vdsbroker.gluster.<wbr>GlusterVolumesListVDSCommand]
(DefaultQuartzScheduler6) [4856f570] FINISH, GlusterVolumesListVDSCommand,
return:
{8ded4083-2f31-489e-a60d-<wbr>a315a5eb9b22=org.ovirt.engine.<wbr>core.common.businessentities.<wbr>gluster.GlusterVolumeEntity@<wbr>7765e4ad},
log id: 2ac55735
2017-03-10 10:39:31,849+01 INFO
[org.ovirt.engine.core.<wbr>vdsbroker.gluster.<wbr>GlusterServersListVDSCommand]
(DefaultQuartzScheduler8) [fde51205-3e8b-4b84-a478-<wbr>352dc444ccc4] START,
GlusterServersListVDSCommand(<wbr>HostName = 2kvm1,
VdsIdVDSCommandParametersBase:<wbr>{runAsync=&#39;true&#39;,
hostId=&#39;86876b79-71d8-4ae1-<wbr>883b-ba010cd270e7&#39;}), log id: 2e8dbcd1
2017-03-10 10:39:31,932+01 INFO
[org.ovirt.engine.core.bll.<wbr>ConcurrentChildCommandsExecuti<wbr>onCallback]
(DefaultQuartzScheduler6) [67e1d8ed] Command &#39;RemoveSnapshot&#39; (id:
&#39;13c2cb7c-0809-4971-aceb-<wbr>37ae66105ab7&#39;) waiting on child command id:
&#39;a8a3a4d5-cf7d-4423-8243-<wbr>022911232508&#39; type:&#39;<wbr>RemoveSnapshotSingleDiskLive&#39;
to complete
2017-03-10 10:39:31,944+01 INFO
[org.ovirt.engine.core.bll.<wbr>snapshots.<wbr>RemoveSnapshotSingleDiskLiveCo<wbr>mmandCallback]
(DefaultQuartzScheduler6) [4856f570] Command &#39;RemoveSnapshotSingleDiskLive&#39;
(id: &#39;a8a3a4d5-cf7d-4423-8243-<wbr>022911232508&#39;) waiting on child command id:
&#39;b1d63b8e-19d3-4d64-8fa8-<wbr>4eb3e2d1a8fc&#39; type:&#39;DestroyImageCheck&#39; to complete
2017-03-10 10:39:33,213+01 INFO
[org.ovirt.engine.core.<wbr>vdsbroker.gluster.<wbr>GlusterServersListVDSCommand]
(DefaultQuartzScheduler8) [fde51205-3e8b-4b84-a478-<wbr>352dc444ccc4] FINISH,
GlusterServersListVDSCommand, return: [<a href="http://172.16.5.163/24:CONNECTED" target="_blank">172.16.5.163/24:CONNECTED</a>,
16.0.0.164:CONNECTED], log id: 2e8dbcd1
2017-03-10 10:39:33,226+01 INFO
[org.ovirt.engine.core.<wbr>vdsbroker.gluster.<wbr>GlusterVolumesListVDSCommand]
(DefaultQuartzScheduler8) [fde51205-3e8b-4b84-a478-<wbr>352dc444ccc4] START,
GlusterVolumesListVDSCommand(<wbr>HostName = 2kvm1,
GlusterVolumesListVDSParameter<wbr>s:{runAsync=&#39;true&#39;,
hostId=&#39;86876b79-71d8-4ae1-<wbr>883b-ba010cd270e7&#39;}), log id: 1fb3f9e3
2017-03-10 10:39:34,375+01 INFO
[org.ovirt.engine.core.<wbr>vdsbroker.gluster.<wbr>GlusterVolumesListVDSCommand]
(DefaultQuartzScheduler8) [fde51205-3e8b-4b84-a478-<wbr>352dc444ccc4] FINISH,
GlusterVolumesListVDSCommand, return:
{8ded4083-2f31-489e-a60d-<wbr>a315a5eb9b22=org.ovirt.engine.<wbr>core.common.businessentities.<wbr>gluster.GlusterVolumeEntity@<wbr>7765e4ad},
log id: 1fb3f9e3
2017-03-10 10:39:39,392+01 INFO
[org.ovirt.engine.core.<wbr>vdsbroker.gluster.<wbr>GlusterServersListVDSCommand]
(DefaultQuartzScheduler9) [12d6d15f-e054-4833-bd87-<wbr>58f6a51e5fa6] START,
GlusterServersListVDSCommand(<wbr>HostName = 2kvm1,
VdsIdVDSCommandParametersBase:<wbr>{runAsync=&#39;true&#39;,
hostId=&#39;86876b79-71d8-4ae1-<wbr>883b-ba010cd270e7&#39;}), log id: 1e0b8eeb
2017-03-10 10:39:40,753+01 INFO
[org.ovirt.engine.core.<wbr>vdsbroker.gluster.<wbr>GlusterServersListVDSCommand]
(DefaultQuartzScheduler9) [12d6d15f-e054-4833-bd87-<wbr>58f6a51e5fa6] FINISH,
GlusterServersListVDSCommand, return: [<a href="http://172.16.5.163/24:CONNECTED" target="_blank">172.16.5.163/24:CONNECTED</a>,
16.0.0.164:CONNECTED], log id: 1e0b8eeb
2017-03-10 10:39:40,763+01 INFO
[org.ovirt.engine.core.<wbr>vdsbroker.gluster.<wbr>GlusterVolumesListVDSCommand]
(DefaultQuartzScheduler9) [12d6d15f-e054-4833-bd87-<wbr>58f6a51e5fa6] START,
GlusterVolumesListVDSCommand(<wbr>HostName = 2kvm1,
GlusterVolumesListVDSParameter<wbr>s:{runAsync=&#39;true&#39;,
hostId=&#39;86876b79-71d8-4ae1-<wbr>883b-ba010cd270e7&#39;}), log id: 35b04b33
2017-03-10 10:39:41,952+01 INFO
[org.ovirt.engine.core.<wbr>vdsbroker.gluster.<wbr>GlusterVolumesListVDSCommand]
(DefaultQuartzScheduler9) [12d6d15f-e054-4833-bd87-<wbr>58f6a51e5fa6] FINISH,
GlusterVolumesListVDSCommand, return:
{8ded4083-2f31-489e-a60d-<wbr>a315a5eb9b22=org.ovirt.engine.<wbr>core.common.businessentities.<wbr>gluster.GlusterVolumeEntity@<wbr>7765e4ad},
log id: 35b04b33
2017-03-10 10:39:41,991+01 INFO
[org.ovirt.engine.core.bll.<wbr>ConcurrentChildCommandsExecuti<wbr>onCallback]
(DefaultQuartzScheduler6) [67e1d8ed] Command &#39;RemoveSnapshot&#39; (id:
&#39;13c2cb7c-0809-4971-aceb-<wbr>37ae66105ab7&#39;) waiting on child command id:
&#39;a8a3a4d5-cf7d-4423-8243-<wbr>022911232508&#39; type:&#39;<wbr>RemoveSnapshotSingleDiskLive&#39;
to complete


gluster  ( nothing notable in the logs )
======


## "etc-glusterfs-glusterd.vol.log"
[2017-03-10 10:13:52.599019] I [MSGID: 106499]
[glusterd-handler.c:4349:__<wbr>glusterd_handle_status_volume] 0-management:
Received status volume req for volume slow1
[2017-03-10 10:16:48.639635] I [MSGID: 106499]
[glusterd-handler.c:4349:__<wbr>glusterd_handle_status_volume] 0-management:
Received status volume req for volume slow1
The message &quot;I [MSGID: 106499]
[glusterd-handler.c:4349:__<wbr>glusterd_handle_status_volume] 0-management:
Received status volume req for volume slow1&quot; repeated 3 times between
[2017-03-10 10:16:48.639635] and [2017-03-10 10:17:55.659379]
[2017-03-10 10:18:56.875516] I [MSGID: 106499]
[glusterd-handler.c:4349:__<wbr>glusterd_handle_status_volume] 0-management:
Received status volume req for volume slow1
[2017-03-10 10:19:57.204689] I [MSGID: 106499]
[glusterd-handler.c:4349:__<wbr>glusterd_handle_status_volume] 0-management:
Received status volume req for volume slow1
[2017-03-10 10:21:56.576879] I [MSGID: 106499]
[glusterd-handler.c:4349:__<wbr>glusterd_handle_status_volume] 0-management:
Received status volume req for volume slow1
[2017-03-10 10:21:57.772857] I [MSGID: 106499]
[glusterd-handler.c:4349:__<wbr>glusterd_handle_status_volume] 0-management:
Received status volume req for volume slow1
[2017-03-10 10:24:00.617931] I [MSGID: 106499]
[glusterd-handler.c:4349:__<wbr>glusterd_handle_status_volume] 0-management:
Received status volume req for volume slow1
[2017-03-10 10:30:04.918080] I [MSGID: 106499]
[glusterd-handler.c:4349:__<wbr>glusterd_handle_status_volume] 0-management:
Received status volume req for volume slow1
[2017-03-10 10:31:06.128638] I [MSGID: 106499]
[glusterd-handler.c:4349:__<wbr>glusterd_handle_status_volume] 0-management:
Received status volume req for volume slow1
[2017-03-10 10:32:07.325672] I [MSGID: 106499]
[glusterd-handler.c:4349:__<wbr>glusterd_handle_status_volume] 0-management:
Received status volume req for volume slow1
[2017-03-10 10:32:12.433586] I [MSGID: 106499]
[glusterd-handler.c:4349:__<wbr>glusterd_handle_status_volume] 0-management:
Received status volume req for volume slow1
[2017-03-10 10:32:13.544909] I [MSGID: 106499]
[glusterd-handler.c:4349:__<wbr>glusterd_handle_status_volume] 0-management:
Received status volume req for volume slow1
[2017-03-10 10:35:10.039213] I [MSGID: 106499]
[glusterd-handler.c:4349:__<wbr>glusterd_handle_status_volume] 0-management:
Received status volume req for volume slow1
[2017-03-10 10:37:19.905314] I [MSGID: 106499]
[glusterd-handler.c:4349:__<wbr>glusterd_handle_status_volume] 0-management:
Received status volume req for volume slow1
[2017-03-10 10:37:20.174209] I [MSGID: 106499]
[glusterd-handler.c:4349:__<wbr>glusterd_handle_status_volume] 0-management:
Received status volume req for volume slow1
[2017-03-10 10:38:12.635460] I [MSGID: 106499]
[glusterd-handler.c:4349:__<wbr>glusterd_handle_status_volume] 0-management:
Received status volume req for volume slow1
[2017-03-10 10:40:14.169864] I [MSGID: 106499]
[glusterd-handler.c:4349:__<wbr>glusterd_handle_status_volume] 0-management:
Received status volume req for volume slow1


## "rhev-data-center-mnt-glusterSD-localhost:_slow1.log"
[2017-03-10 09:43:40.346785] W [MSGID: 101159] [inode.c:1214:__inode_unlink]
0-inode:
be318638-e8a0-4c6d-977d-<wbr>7a937aa84806/b6f2d08d-2441-<wbr>4111-ab62-e14abdfaf602.61849:
dentry not found in 43e6968f-9c2a-40d8-8074-<wbr>caf1a36f60cf
[2017-03-10 09:43:40.347076] W [MSGID: 101159] [inode.c:1214:__inode_unlink]
0-inode:
be318638-e8a0-4c6d-977d-<wbr>7a937aa84806/b6f2d08d-2441-<wbr>4111-ab62-e14abdfaf602.61879:
dentry not found in 902a6e3d-b7aa-439f-8262-<wbr>cdc1b7f9f022
[2017-03-10 09:43:40.347145] W [MSGID: 101159] [inode.c:1214:__inode_unlink]
0-inode:
be318638-e8a0-4c6d-977d-<wbr>7a937aa84806/b6f2d08d-2441-<wbr>4111-ab62-e14abdfaf602.61935:
dentry not found in 846bbcfc-f2b3-4ab6-af44-<wbr>aeaa10b39318
[2017-03-10 09:43:40.347211] W [MSGID: 101159] [inode.c:1214:__inode_unlink]
0-inode:
be318638-e8a0-4c6d-977d-<wbr>7a937aa84806/b6f2d08d-2441-<wbr>4111-ab62-e14abdfaf602.61922:
dentry not found in 66ad3bc5-26c7-4360-b33b-<wbr>a084e3305cf8
[2017-03-10 09:43:40.351571] W [MSGID: 101159] [inode.c:1214:__inode_unlink]
0-inode:
be318638-e8a0-4c6d-977d-<wbr>7a937aa84806/b6f2d08d-2441-<wbr>4111-ab62-e14abdfaf602.61834:
dentry not found in 3b8278e1-40e5-4363-b21e-<wbr>7bffcd024c62
[2017-03-10 09:43:40.352449] W [MSGID: 101159] [inode.c:1214:__inode_unlink]
0-inode:
be318638-e8a0-4c6d-977d-<wbr>7a937aa84806/b6f2d08d-2441-<wbr>4111-ab62-e14abdfaf602.61870:
dentry not found in 282f4c05-e09a-48e0-96a3-<wbr>52e079ff2f73
[2017-03-10 09:50:38.829325] I [MSGID: 109066]
[dht-rename.c:1569:dht_rename] 0-slow1-dht: renaming
/1603cd90-92ef-4c03-922c-<wbr>cecb282fd00e/images/014ca3aa-<wbr>d5f5-4b88-8f84-be8d4c5dfc1e/<wbr>f147532a-89fa-49e0-8225-<wbr>f82343fca8be.meta.new
(hash=slow1-replicate-0/cache=<wbr>slow1-replicate-0) =&gt;
/1603cd90-92ef-4c03-922c-<wbr>cecb282fd00e/images/014ca3aa-<wbr>d5f5-4b88-8f84-be8d4c5dfc1e/<wbr>f147532a-89fa-49e0-8225-<wbr>f82343fca8be.meta
(hash=slow1-replicate-0/cache=<wbr>slow1-replicate-0)
[2017-03-10 09:50:42.221775] I [MSGID: 109066]
[dht-rename.c:1569:dht_rename] 0-slow1-dht: renaming
/1603cd90-92ef-4c03-922c-<wbr>cecb282fd00e/images/4cf7dd90-<wbr>9dcc-428c-82bc-fbf08dbee0be/<wbr>12812d56-1606-4bf8-a391-<wbr>0a2cacbd020b.meta.new
(hash=slow1-replicate-0/cache=<wbr>slow1-replicate-0) =&gt;
/1603cd90-92ef-4c03-922c-<wbr>cecb282fd00e/images/4cf7dd90-<wbr>9dcc-428c-82bc-fbf08dbee0be/<wbr>12812d56-1606-4bf8-a391-<wbr>0a2cacbd020b.meta
(hash=slow1-replicate-0/cache=<wbr>slow1-replicate-0)
[2017-03-10 09:50:45.956432] I [MSGID: 109066]
[dht-rename.c:1569:dht_rename] 0-slow1-dht: renaming
/1603cd90-92ef-4c03-922c-<wbr>cecb282fd00e/images/3cef54b4-<wbr>45b9-4f5b-82c2-fcc8def06a37/<wbr>85287865-38f0-45df-9e6c-<wbr>1294913cbb88.meta.new
(hash=slow1-replicate-0/cache=<wbr>slow1-replicate-0) =&gt;
/1603cd90-92ef-4c03-922c-<wbr>cecb282fd00e/images/3cef54b4-<wbr>45b9-4f5b-82c2-fcc8def06a37/<wbr>85287865-38f0-45df-9e6c-<wbr>1294913cbb88.meta
(hash=slow1-replicate-0/cache=<wbr>slow1-replicate-0)
[2017-03-10 09:50:40.349563] I [MSGID: 109066]
[dht-rename.c:1569:dht_rename] 0-slow1-dht: renaming
/1603cd90-92ef-4c03-922c-<wbr>cecb282fd00e/images/014ca3aa-<wbr>d5f5-4b88-8f84-be8d4c5dfc1e/<wbr>f147532a-89fa-49e0-8225-<wbr>f82343fca8be.meta.new
(hash=slow1-replicate-0/cache=<wbr>slow1-replicate-0) =&gt;
/1603cd90-92ef-4c03-922c-<wbr>cecb282fd00e/images/014ca3aa-<wbr>d5f5-4b88-8f84-be8d4c5dfc1e/<wbr>f147532a-89fa-49e0-8225-<wbr>f82343fca8be.meta
(hash=slow1-replicate-0/cache=<wbr>slow1-replicate-0)
[2017-03-10 09:50:44.503866] I [MSGID: 109066]
[dht-rename.c:1569:dht_rename] 0-slow1-dht: renaming
/1603cd90-92ef-4c03-922c-<wbr>cecb282fd00e/images/4cf7dd90-<wbr>9dcc-428c-82bc-fbf08dbee0be/<wbr>12812d56-1606-4bf8-a391-<wbr>0a2cacbd020b.meta.new
(hash=slow1-replicate-0/cache=<wbr>slow1-replicate-0) =&gt;
/1603cd90-92ef-4c03-922c-<wbr>cecb282fd00e/images/4cf7dd90-<wbr>9dcc-428c-82bc-fbf08dbee0be/<wbr>12812d56-1606-4bf8-a391-<wbr>0a2cacbd020b.meta
(hash=slow1-replicate-0/cache=<wbr>slow1-replicate-0)
[2017-03-10 09:59:46.860762] W [MSGID: 101159] [inode.c:1214:__inode_unlink]
0-inode:
be318638-e8a0-4c6d-977d-<wbr>7a937aa84806/6e105aa3-a3fc-<wbr>4aca-be50-78b7642c4072.6684:
dentry not found in d1e65eea-8758-4407-ac2e-<wbr>3605dc661364
[2017-03-10 10:02:22.500865] W [MSGID: 101159] [inode.c:1214:__inode_unlink]
0-inode:
be318638-e8a0-4c6d-977d-<wbr>7a937aa84806/6e105aa3-a3fc-<wbr>4aca-be50-78b7642c4072.8767:
dentry not found in e228bb28-9602-4f8e-8323-<wbr>7434d77849fc
[2017-03-10 10:04:03.103839] W [MSGID: 101159] [inode.c:1214:__inode_unlink]
0-inode:
be318638-e8a0-4c6d-977d-<wbr>7a937aa84806/6e105aa3-a3fc-<wbr>4aca-be50-78b7642c4072.9787:
dentry not found in 6be71632-aa36-4975-b673-<wbr>1357e0355027
[2017-03-10 10:06:02.406385] I [MSGID: 109066]
[dht-rename.c:1569:dht_rename] 0-slow1-dht: renaming
/1603cd90-92ef-4c03-922c-<wbr>cecb282fd00e/images/2a9c1c6a-<wbr>f045-4dce-a47b-95a2267eef72/<wbr>6f264695-0669-4b49-a2f6-<wbr>e6c92482f2fb.meta.new
(hash=slow1-replicate-0/cache=<wbr>slow1-replicate-0) =&gt;
/1603cd90-92ef-4c03-922c-<wbr>cecb282fd00e/images/2a9c1c6a-<wbr>f045-4dce-a47b-95a2267eef72/<wbr>6f264695-0669-4b49-a2f6-<wbr>e6c92482f2fb.meta
(hash=slow1-replicate-0/cache=<wbr>slow1-replicate-0)
... no other record


messages
========

the following occurred several times:

Mar 10 09:04:38 2kvm2 lvmetad: WARNING: Ignoring unsupported value for cmd.
Mar 10 09:04:38 2kvm2 lvmetad: WARNING: Ignoring unsupported value for cmd.
Mar 10 09:04:38 2kvm2 lvmetad: WARNING: Ignoring unsupported value for cmd.
Mar 10 09:04:38 2kvm2 lvmetad: WARNING: Ignoring unsupported value for cmd.
Mar 10 09:10:01 2kvm2 systemd: Started Session 274 of user root.
Mar 10 09:10:01 2kvm2 systemd: Starting Session 274 of user root.
Mar 10 09:20:02 2kvm2 systemd: Started Session 275 of user root.
Mar 10 09:20:02 2kvm2 systemd: Starting Session 275 of user root.
Mar 10 09:22:59 2kvm2 sanlock[1673]: 2017-03-10 09:22:59+0100 136031 [2576]:
s3 delta_renew long write time 11 sec
</pre>
      </blockquote>
      <pre>Sanlock cannot write to storage

</pre>
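A minimal sketch of how one might watch sanlock's own view of the storage while this is happening (the standard sanlock client verbs and default log path):
<pre>sanlock client status          # lockspaces and resources sanlock currently holds
sanlock client log_dump        # recent internal sanlock log
tail -f /var/log/sanlock.log   # delta-lease renewal timings over time
</pre>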
      <blockquote type="cite">
        <pre>Mar 10 09:24:03 2kvm2 kernel: kswapd1: page allocation failure: order:2,
mode:0x104020
</pre>
      </blockquote>
      <pre>Kernel cannot allocate page?

</pre>
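The trace below shows the bnx2x NIC driver failing an order-2 (4 contiguous pages, 16 KiB) atomic allocation during reclaim. A commonly suggested mitigation, sketched here only as something to test, is to keep a larger free-page reserve and watch fragmentation:
<pre>cat /proc/sys/vm/min_free_kbytes      # current reserve
sysctl -w vm.min_free_kbytes=1048576   # example value only - size it to your RAM
cat /proc/buddyinfo                    # free pages per order; low high-order counts = fragmentation
</pre>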
      <blockquote type="cite">
        <pre>Mar 10 09:24:03 2kvm2 kernel: CPU: 42 PID: 265 Comm: kswapd1 Tainted: G
I    ------------   3.10.0-514.10.2.el7.x86_64 #1
Mar 10 09:24:03 2kvm2 kernel: Hardware name: Supermicro X10DRC/X10DRi-LN4+,
BIOS 1.0a 08/29/2014
Mar 10 09:24:03 2kvm2 kernel: 0000000000104020 00000000f7228dc9
ffff88301f4839d8 ffffffff816864ef
Mar 10 09:24:03 2kvm2 kernel: ffff88301f483a68 ffffffff81186ba0
000068fc00000000 0000000000000000
Mar 10 09:24:03 2kvm2 kernel: fffffffffffffffc 0010402000000000
ffff88301567ae80 00000000f7228dc9
Mar 10 09:24:03 2kvm2 kernel: Call Trace:
Mar 10 09:24:03 2kvm2 kernel: &lt;IRQ&gt;  [&lt;ffffffff816864ef&gt;]
dump_stack+0x19/0x1b
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff81186ba0&gt;]
warn_alloc_failed+0x110/0x180
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff81682083&gt;]
__alloc_pages_slowpath+0x6b7/<wbr>0x725
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff8118b155&gt;]
__alloc_pages_nodemask+0x405/<wbr>0x420
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff811cf30a&gt;]
alloc_pages_current+0xaa/0x170
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff81185a7e&gt;] __get_free_pages+0xe/0x50
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff811dabae&gt;]
kmalloc_order_trace+0x2e/0xa0
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff811dd381&gt;] __kmalloc+0x221/0x240
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffffa02f83fa&gt;]
bnx2x_frag_alloc.isra.62+0x2a/<wbr>0x40 [bnx2x]
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffffa02f92f7&gt;] bnx2x_rx_int+0x227/0x17b0
[bnx2x]
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff81033669&gt;] ? sched_clock+0x9/0x10
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffffa02fc72d&gt;] bnx2x_poll+0x1dd/0x260
[bnx2x]
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff815705e0&gt;] net_rx_action+0x170/0x380
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff8108f2cf&gt;] __do_softirq+0xef/0x280
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff8169859c&gt;] call_softirq+0x1c/0x30
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff8102d365&gt;] do_softirq+0x65/0xa0
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff8108f665&gt;] irq_exit+0x115/0x120
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff81699138&gt;] do_IRQ+0x58/0xf0
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff8168e2ad&gt;]
common_interrupt+0x6d/0x6d
Mar 10 09:24:03 2kvm2 kernel: &lt;EOI&gt;  [&lt;ffffffff81189a73&gt;] ?
free_hot_cold_page+0x103/0x160
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff81189b16&gt;]
free_hot_cold_page_list+0x46/<wbr>0xa0
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff81195193&gt;]
shrink_page_list+0x543/0xb00
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff81195dda&gt;]
shrink_inactive_list+0x1fa/<wbr>0x630
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff81196975&gt;] shrink_lruvec+0x385/0x770
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff81196dd6&gt;] shrink_zone+0x76/0x1a0
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff8119807c&gt;] balance_pgdat+0x48c/0x5e0
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff81198343&gt;] kswapd+0x173/0x450
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff810b17d0&gt;] ?
wake_up_atomic_t+0x30/0x30
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff811981d0&gt;] ?
balance_pgdat+0x5e0/0x5e0
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff810b06ff&gt;] kthread+0xcf/0xe0
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff810b0630&gt;] ?
kthread_create_on_node+0x140/<wbr>0x140
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff81696a58&gt;] ret_from_fork+0x58/0x90
Mar 10 09:24:03 2kvm2 kernel: [&lt;ffffffff810b0630&gt;] ?
kthread_create_on_node+0x140/<wbr>0x140
Mar 10 09:24:03 2kvm2 kernel: kswapd1: page allocation failure: order:2,
mode:0x104020
Mar 10 09:24:03 2kvm2 kernel: CPU: 42 PID: 265 Comm: kswapd1 Tainted: G
I    ------------   3.10.0-514.10.2.el7.x86_64 #1
Mar 10 09:24:03 2kvm2 kernel: Hardware name: Supermicro X10DRC/X10DRi-LN4+,
BIOS 1.0a 08/29/2014
Mar 10 09:24:03 2kvm2 kernel: 0000000000104020 00000000f7228dc9
ffff88301f4839d8 ffffffff816864ef


and again at the critical time

Mar 10 10:37:53 2kvm2 sanlock[1673]: 2017-03-10 10:37:53+0100 140524 [1673]:
s3 check_our_lease warning 73 last_success 140451
Mar 10 10:37:54 2kvm2 sanlock[1673]: 2017-03-10 10:37:54+0100 140525 [1673]:
s3 check_our_lease warning 74 last_success 140451
</pre>
      </blockquote>
      <pre>Sanlock could not renew the lease for 74 seconds

</pre>
      <blockquote type="cite">
        <pre>Mar 10 10:37:54 2kvm2 wdmd[1732]: test warning now 140526 ping 140516 close
0 renewal 140451 expire 140531 client 1673
sanlock_1603cd90-92ef-4c03-<wbr>922c-cecb282fd00e:1
Mar 10 10:37:54 2kvm2 kernel: watchdog watchdog0: watchdog did not stop!
Mar 10 10:37:54 2kvm2 wdmd[1732]: /dev/watchdog0 closed unclean
Mar 10 10:37:55 2kvm2 sanlock[1673]: 2017-03-10 10:37:55+0100 140526 [1673]:
s3 check_our_lease warning 75 last_success 140451
Mar 10 10:37:55 2kvm2 wdmd[1732]: test warning now 140527 ping 140516 close
140526 renewal 140451 expire 140531 client 1673
sanlock_1603cd90-92ef-4c03-<wbr>922c-cecb282fd00e:1
Mar 10 10:37:56 2kvm2 sanlock[1673]: 2017-03-10 10:37:56+0100 140527 [1673]:
s3 check_our_lease warning 76 last_success 140451
Mar 10 10:37:56 2kvm2 wdmd[1732]: test warning now 140528 ping 140516 close
140526 renewal 140451 expire 140531 client 1673
sanlock_1603cd90-92ef-4c03-<wbr>922c-cecb282fd00e:1
Mar 10 10:37:57 2kvm2 sanlock[1673]: 2017-03-10 10:37:57+0100 140528 [1673]:
s3 check_our_lease warning 77 last_success 140451
Mar 10 10:37:57 2kvm2 wdmd[1732]: test warning now 140529 ping 140516 close
140526 renewal 140451 expire 140531 client 1673
sanlock_1603cd90-92ef-4c03-<wbr>922c-cecb282fd00e:1
Mar 10 10:37:58 2kvm2 sanlock[1673]: 2017-03-10 10:37:58+0100 140529 [1673]:
s3 check_our_lease warning 78 last_success 140451
Mar 10 10:37:58 2kvm2 wdmd[1732]: test warning now 140530 ping 140516 close
140526 renewal 140451 expire 140531 client 1673
sanlock_1603cd90-92ef-4c03-<wbr>922c-cecb282fd00e:1
Mar 10 10:37:59 2kvm2 sanlock[1673]: 2017-03-10 10:37:59+0100 140530 [1673]:
s3 check_our_lease warning 79 last_success 140451
Mar 10 10:37:59 2kvm2 wdmd[1732]: test failed rem 55 now 140531 ping 140516
close 140526 renewal 140451 expire 140531 client 1673
sanlock_1603cd90-92ef-4c03-<wbr>922c-cecb282fd00e:1
Mar 10 10:38:00 2kvm2 sanlock[1673]: 2017-03-10 10:38:00+0100 140531 [1673]:
s3 check_our_lease failed 80
</pre>
      </blockquote>
<pre>Sanlock failed to renew the lease for 80 seconds - game over
(with sanlock's default 10-second I/O timeout, the lease expires after
8 failed renewal intervals, i.e. 80 seconds)

</pre>
      <blockquote type="cite">
        <pre>Mar 10 10:38:00 2kvm2 sanlock[1673]: 2017-03-10 10:38:00+0100 140531 [1673]:
s3 all pids clear
</pre>
      </blockquote>
<pre>If this host is the SPM, sanlock just killed vdsm; this explains why
your storage operations fail.

</pre>
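One way to confirm what happened on the host (a sketch using the usual service names and log paths):
<pre>systemctl status vdsmd                           # was vdsm killed/restarted around the event?
journalctl -u vdsmd --since "2017-03-10 10:37"   # vdsm unit log around the lease loss
grep -i spm /var/log/vdsm/vdsm.log | tail        # SPM start/stop messages
</pre>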
      <blockquote type="cite">
        <pre>Mar 10 10:38:01 2kvm2 wdmd[1732]: /dev/watchdog0 reopen
Mar 10 10:38:10 2kvm2 journal: Cannot start job (query, none) for domain
TEST-LBS_EBSAPP; current job is (query, none) owned by (3284
remoteDispatchConnectGetAllDom<wbr>ainStats, 0 &lt;null&gt;) for (62s, 0s)
Mar 10 10:38:10 2kvm2 journal: Timed out during operation: cannot acquire
state change lock (held by remoteDispatchConnectGetAllDom<wbr>ainStats)
Mar 10 10:38:11 2kvm2 journal: vdsm vds.dispatcher ERROR SSL error receiving
from &lt;yajsonrpc.betterAsyncore.<wbr>Dispatcher connected (&#39;::1&#39;, 40590, 0, 0) at
0x3acdd88&gt;: unexpected eof
Mar 10 10:38:40 2kvm2 journal: Cannot start job (query, none) for domain
TEST1-LBS_ATRYA; current job is (query, none) owned by (3288
remoteDispatchConnectGetAllDom<wbr>ainStats, 0 &lt;null&gt;) for (47s, 0s)
Mar 10 10:38:40 2kvm2 journal: Timed out during operation: cannot acquire
state change lock (held by remoteDispatchConnectGetAllDom<wbr>ainStats)
Mar 10 10:38:41 2kvm2 journal: vdsm vds.dispatcher ERROR SSL error receiving
from &lt;yajsonrpc.betterAsyncore.<wbr>Dispatcher connected (&#39;::1&#39;, 40592, 0, 0) at
0x3fd5b90&gt;: unexpected eof
Mar 10 10:39:10 2kvm2 journal: Cannot start job (query, none) for domain
TEST-LBS_EBSAPP; current job is (query, none) owned by (3284
remoteDispatchConnectGetAllDom<wbr>ainStats, 0 &lt;null&gt;) for (122s, 0s)
Mar 10 10:39:10 2kvm2 journal: Timed out during operation: cannot acquire
state change lock (held by remoteDispatchConnectGetAllDom<wbr>ainStats)
Mar 10 10:39:10 2kvm2 journal: Cannot start job (query, none) for domain
TEST1-LBS_ATRYA; current job is (query, none) owned by (3288
remoteDispatchConnectGetAllDom<wbr>ainStats, 0 &lt;null&gt;) for (77s, 0s)
Mar 10 10:39:10 2kvm2 journal: Timed out during operation: cannot acquire
state change lock (held by remoteDispatchConnectGetAllDom<wbr>ainStats)
Mar 10 10:39:11 2kvm2 journal: vdsm vds.dispatcher ERROR SSL error receiving
from &lt;yajsonrpc.betterAsyncore.<wbr>Dispatcher connected (&#39;::1&#39;, 40594, 0, 0) at
0x2447290&gt;: unexpected eof
Mar 10 10:39:23 2kvm2 sanlock[1673]: 2017-03-10 10:39:23+0100 140615 [2576]:
s3 delta_renew write time 140 error -202
Mar 10 10:39:23 2kvm2 sanlock[1673]: 2017-03-10 10:39:23+0100 140615 [2576]:
s3 renewal error -202 delta_length 144 last_success 140451
Mar 10 10:39:40 2kvm2 journal: Cannot start job (query, none) for domain
TEST-LBS_EBSAPP; current job is (query, none) owned by (3284
remoteDispatchConnectGetAllDom<wbr>ainStats, 0 &lt;null&gt;) for (152s, 0s)
Mar 10 10:39:40 2kvm2 journal: Timed out during operation: cannot acquire
state change lock (held by remoteDispatchConnectGetAllDom<wbr>ainStats)
Mar 10 10:39:40 2kvm2 journal: Cannot start job (query, none) for domain
TEST1-LBS_ATRYA; current job is (query, none) owned by (3288
remoteDispatchConnectGetAllDom<wbr>ainStats, 0 &lt;null&gt;) for (107s, 0s)
Mar 10 10:39:40 2kvm2 journal: Timed out during operation: cannot acquire
state change lock (held by remoteDispatchConnectGetAllDom<wbr>ainStats)
Mar 10 10:39:41 2kvm2 journal: vdsm vds.dispatcher ERROR SSL error receiving
from &lt;yajsonrpc.betterAsyncore.<wbr>Dispatcher connected (&#39;::1&#39;, 40596, 0, 0) at
0x2472ef0&gt;: unexpected eof
Mar 10 10:39:49 2kvm2 kernel: INFO: task qemu-img:42107 blocked for more
than 120 seconds.
</pre>
      </blockquote>
      <pre>qemu-img is blocked for more than 120 seconds.

</pre>
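A sketch of how to see which tasks are stuck in uninterruptible I/O wait and capture their kernel stacks (the sysrq write assumes kernel.sysrq is enabled):
<pre>ps -eo state,pid,wchan:32,cmd | awk '$1=="D"'   # tasks in D state and where they sleep
echo w > /proc/sysrq-trigger                    # dump blocked-task stacks to the kernel log
dmesg | tail -n 100
</pre>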
      <blockquote type="cite">
        <pre>Mar 10 10:39:49 2kvm2 kernel: &quot;echo 0 &gt;
/proc/sys/kernel/hung_task_<wbr>timeout_secs&quot; disables this message.
Mar 10 10:39:49 2kvm2 kernel: qemu-img        D ffff88010dad3e30     0 42107
3631 0x00000080
Mar 10 10:39:49 2kvm2 kernel: ffff88010dad3b30 0000000000000082
ffff8814491f4e70 ffff88010dad3fd8
Mar 10 10:39:49 2kvm2 kernel: ffff88010dad3fd8 ffff88010dad3fd8
ffff8814491f4e70 ffff88301f096c40
Mar 10 10:39:49 2kvm2 kernel: 0000000000000000 7fffffffffffffff
ffff88181f186c00 ffff88010dad3e30
Mar 10 10:39:49 2kvm2 kernel: Call Trace:
Mar 10 10:39:49 2kvm2 kernel: [&lt;ffffffff8168bbb9&gt;] schedule+0x29/0x70
Mar 10 10:39:49 2kvm2 kernel: [&lt;ffffffff81689609&gt;]
schedule_timeout+0x239/0x2d0
Mar 10 10:39:49 2kvm2 kernel: [&lt;ffffffff8168b15e&gt;]
io_schedule_timeout+0xae/0x130
Mar 10 10:39:49 2kvm2 kernel: [&lt;ffffffff8168b1f8&gt;] io_schedule+0x18/0x20
Mar 10 10:39:49 2kvm2 kernel: [&lt;ffffffff8124d9e5&gt;]
wait_on_sync_kiocb+0x35/0x80
Mar 10 10:39:49 2kvm2 kernel: [&lt;ffffffffa0a36091&gt;]
fuse_direct_IO+0x231/0x380 [fuse]
Mar 10 10:39:49 2kvm2 kernel: [&lt;ffffffff812a6ddd&gt;] ?
cap_inode_need_killpriv+0x2d/<wbr>0x40
Mar 10 10:39:49 2kvm2 kernel: [&lt;ffffffff812a8cb6&gt;] ?
security_inode_need_killpriv+<wbr>0x16/0x20
Mar 10 10:39:49 2kvm2 kernel: [&lt;ffffffff81219e3f&gt;] ?
dentry_needs_remove_privs.<wbr>part.13+0x1f/0x30
Mar 10 10:39:49 2kvm2 kernel: [&lt;ffffffff81182a2d&gt;]
generic_file_direct_write+<wbr>0xcd/0x190
Mar 10 10:39:49 2kvm2 kernel: [&lt;ffffffffa0a36905&gt;]
fuse_file_aio_write+0x185/<wbr>0x340 [fuse]
Mar 10 10:39:49 2kvm2 kernel: [&lt;ffffffff811fdabd&gt;] do_sync_write+0x8d/0xd0
Mar 10 10:39:49 2kvm2 kernel: [&lt;ffffffff811fe32d&gt;] vfs_write+0xbd/0x1e0
Mar 10 10:39:49 2kvm2 kernel: [&lt;ffffffff811ff002&gt;] SyS_pwrite64+0x92/0xc0
Mar 10 10:39:49 2kvm2 kernel: [&lt;ffffffff81696b09&gt;]
system_call_fastpath+0x16/0x1b
Mar 10 10:39:49 2kvm2 kernel: INFO: task qemu-img:42111 blocked for more
than 120 seconds.
Mar 10 10:39:49 2kvm2 kernel: &quot;echo 0 &gt;
/proc/sys/kernel/hung_task_<wbr>timeout_secs&quot; disables this message.
Mar 10 10:39:49 2kvm2 kernel: qemu-img        D ffff8818a76e7e30     0 42111
3632 0x00000080
Mar 10 10:39:49 2kvm2 kernel: ffff8818a76e7b30 0000000000000082
ffff88188aaeaf10 ffff8818a76e7fd8
Mar 10 10:39:49 2kvm2 kernel: ffff8818a76e7fd8 ffff8818a76e7fd8
ffff88188aaeaf10 ffff88301f156c40

memory
=======

# cat /proc/meminfo
MemTotal:       197983472 kB
MemFree:          834228 kB
MemAvailable:   165541204 kB
Buffers:           45548 kB
Cached:         159596272 kB
SwapCached:       119872 kB
Active:         40803264 kB
Inactive:       148022076 kB
Active(anon):   26594112 kB
Inactive(anon):  2626384 kB
Active(file):   14209152 kB
Inactive(file): 145395692 kB
Unevictable:       50488 kB
Mlocked:           50488 kB
SwapTotal:       4194300 kB
SwapFree:        3612188 kB
Dirty:               624 kB
Writeback:             0 kB
AnonPages:      29185032 kB
Mapped:            85176 kB
Shmem:             25908 kB
Slab:            6203384 kB
SReclaimable:    5857240 kB
SUnreclaim:       346144 kB
KernelStack:       19184 kB
PageTables:        86100 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    103186036 kB
Committed_AS:   52300288 kB
VmallocTotal:   34359738367 kB
VmallocUsed:     1560580 kB
VmallocChunk:   34257341440 kB
HardwareCorrupted:     0 kB
AnonHugePages:   5566464 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      431292 kB
DirectMap2M:    19382272 kB
DirectMap1G:    183500800 kB


Can anybody help me with this??
I have a small hint about a swap problem ( in messages ), but I'm not sure .....
A similar problem occurred in older gluster/ovirt versions during testing ( the
node froze under huge workload - but it was not a fatal overload )
</pre>
      </blockquote>
<pre>You have a storage issue; you should understand why
your storage is failing.

There is also a kernel failure to allocate pages; maybe this is
related to the storage failure?
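
A few gluster-side checks that usually help narrow this down (a sketch,
assuming the volume is the 'slow1' seen in your logs):

  gluster volume info slow1            # volume layout and options
  gluster volume status slow1 detail   # brick processes, ports, disk usage
  gluster volume heal slow1 info       # pending self-heals on the replica
  gluster volume profile slow1 start   # then 'gluster volume profile slow1 info'
                                       # after the load, to see per-op latencies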

Nir
</pre>
    </blockquote>
    <br>
  </div></div></div>

<br></blockquote></div><br></div>