<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body bgcolor="#FFFFFF" text="#000066">
<font face="Ubuntu">Hello everybody,<br>
<br>
for production usage i'm testing ovirt with gluster.<br>
All components seems to be running fine but whenever I'm testing
huge workload, then node freez. Not the main OS, but VDSM mgmt and
attached services, VMs eg.<br>
<br>
<b>mgmt </b><br>
oVirt - 4.1.0.4<br>
centos 7.3-1611<br>
<br>
<br>
<b>nodes</b> ( installed from the oVirt image <i>"ovirt-node-ng-installer-ovirt-4.1-2017030804.iso"</i> )<br>
</font><br>
<div class="row">
<div class="col-md-12">
<div class="row">
<div class="col-md-2">
<div class="col-md-10">OS Version:<span
id="SubTabHostGeneralSoftwareView_formPanel_col0_row0_value"
class="GOJECEMBACD"> == RHEL - 7 - 3.1611.el7.centos</span></div>
</div>
</div>
</div>
</div>
<div class="row">
<div class="col-md-12">
<div class="row">
<div class="col-md-2">
<div class="col-md-10">OS Description:<span
id="SubTabHostGeneralSoftwareView_formPanel_col0_row1_value"
class="GOJECEMBACD">== oVirt Node 4.1.0</span></div>
</div>
</div>
</div>
</div>
<div class="row">
<div class="col-md-12">
<div class="row">
<div class="col-md-2">
<div class="col-md-10">Kernel Version:<span
id="SubTabHostGeneralSoftwareView_formPanel_col0_row2_value"
class="GOJECEMBACD">== 3.10.0 - 514.10.2.el7.x86_64</span></div>
</div>
</div>
</div>
</div>
<div class="row">
<div class="col-md-12">
<div class="row">
<div class="col-md-2">
<div class="col-md-10">KVM Version:<span
id="SubTabHostGeneralSoftwareView_formPanel_col0_row3_value"
class="GOJECEMBACD">== 2.6.0 - 28.el7_3.3.1</span></div>
</div>
</div>
</div>
</div>
<div class="row">
<div class="col-md-12">
<div class="row">
<div class="col-md-2">
<div class="col-md-10">LIBVIRT Version:<span
id="SubTabHostGeneralSoftwareView_formPanel_col0_row4_value"
class="GOJECEMBACD">== libvirt-2.0.0-10.el7_3.5</span></div>
</div>
</div>
</div>
</div>
<div class="row">
<div class="col-md-12">
<div class="row">
<div class="col-md-2">
<div class="col-md-10">VDSM Version:<span
id="SubTabHostGeneralSoftwareView_formPanel_col0_row5_value"
class="GOJECEMBACD">== vdsm-4.19.4-1.el7.centos</span></div>
</div>
</div>
</div>
</div>
<div class="row">
<div class="col-md-12">
<div class="row">
<div class="col-md-2">
<div class="col-md-10">SPICE Version:<span
id="SubTabHostGeneralSoftwareView_formPanel_col0_row6_value"
class="GOJECEMBACD">== 0.12.4 - 20.el7_3</span></div>
</div>
</div>
</div>
</div>
<div class="row">
<div class="col-md-12">
<div class="row">
<div class="col-md-2">
<div class="col-md-10">GlusterFS Version:<span
id="SubTabHostGeneralSoftwareView_formPanel_col0_row7_value"
class="GOJECEMBACD">== glusterfs-3.8.9-1.el7 ( LVM
thinprovisioning in replica 2 - created from ovirt GUI )<br>
</span></div>
</div>
</div>
</div>
</div>
<div class="row">
<div class="col-md-12">
<div class="row">
<div class="col-md-2">
<div class="GOJECEMBPBD"
id="SubTabHostGeneralSoftwareView_formPanel_col0_row8_label"><br>
</div>
</div>
</div>
</div>
</div>
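<font face="Ubuntu">For completeness, this is roughly how that volume and its thin LVs can be
inspected from a node shell ( command sketch only; "slow1" is the volume name
from the gluster logs below, the rest is standard gluster/LVM CLI ):<br>
<font size="-1"># gluster volume info slow1<br>
# gluster volume status slow1 detail<br>
# lvs -o +data_percent,metadata_percent &nbsp;&nbsp;# usage of the thin pool / thin LVs backing the bricks<br>
</font></font><br>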
<font face="Ubuntu">concurently running<br>
- huge import from export domain ( net workload )<br>
- sequential write to VMs local disk ( gluster replica sequential
workload )<br>
- VMs database huge select ( random IOps )<br>
- huge old snapshot delete ( random IOps )<br>
<br>
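For a rough idea of the load inside the VMs, it is approximately what fio jobs
like the following would generate ( just a sketch, not my exact test; the file
name, sizes and queue depths are placeholders ):<br>
<font size="-1"># fio --name=seqwrite --filename=/data/testfile --rw=write --bs=1M --size=20G --direct=1 --ioengine=libaio --iodepth=4<br>
# fio --name=randread --filename=/data/testfile --rw=randread --bs=8k --size=20G --direct=1 --ioengine=libaio --iodepth=16 --time_based --runtime=3600<br>
</font><br>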
With this configuration / workload everything runs for about an hour
with no exceptions and 70-80% disk load, but at some point VDSM
freezes, all jobs run into a timeout and the VMs end up in "unknown"
status.<br>
The whole system then recovers automatically within roughly 20
minutes ( except the import and the snapshot deletion, which rolls back ).<br>
<br>
engine.log - focus on 10:39:07 ( Failed in 'HSMGetAllTasksStatusesVDS' method )<br>
========<br>
<br>
<font size="-1">n child command id:
'a8a3a4d5-cf7d-4423-8243-022911232508'
type:'RemoveSnapshotSingleDiskLive' to complete<br>
2017-03-10 10:39:01,727+01 INFO
[org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommandCallback]
(DefaultQuartzScheduler2) [759c8e1f] Command
'RemoveSnapshotSingleDiskLive' (id:
'a8a3a4d5-cf7d-4423-8243-022911232508') waiting on child command
id: '33df2c1e-6ce3-44fd-a39b-d111883b4c4e' type:'DestroyImage'
to complete<br>
2017-03-10 10:39:03,929+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand]
(DefaultQuartzScheduler5) [fde51205-3e8b-4b84-a478-352dc444ccc4]
START, GlusterServersListVDSCommand(HostName = 2kvm1,
VdsIdVDSCommandParametersBase:{runAsync='true',
hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id:
446d0cd3<br>
2017-03-10 10:39:04,343+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand]
(DefaultQuartzScheduler5) [fde51205-3e8b-4b84-a478-352dc444ccc4]
FINISH, GlusterServersListVDSCommand, return:
[172.16.5.163/24:CONNECTED, 16.0.0.164:CONNECTED], log id:
446d0cd3<br>
2017-03-10 10:39:04,353+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand]
(DefaultQuartzScheduler5) [fde51205-3e8b-4b84-a478-352dc444ccc4]
START, GlusterVolumesListVDSCommand(HostName = 2kvm1,
GlusterVolumesListVDSParameters:{runAsync='true',
hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id:
69ea1fda<br>
2017-03-10 10:39:05,128+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand]
(DefaultQuartzScheduler5) [fde51205-3e8b-4b84-a478-352dc444ccc4]
FINISH, GlusterVolumesListVDSCommand, return:
{8ded4083-2f31-489e-a60d-a315a5eb9b22=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@7765e4ad},
log id: 69ea1fda<br>
2017-03-10 10:39:07,163+01 ERROR
[org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand]
(DefaultQuartzScheduler2) [759c8e1f] Failed in
'HSMGetAllTasksStatusesVDS' method<br>
2017-03-10 10:39:07,178+01 ERROR
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(DefaultQuartzScheduler2) [759c8e1f] EVENT_ID:
VDS_BROKER_COMMAND_FAILURE(10,802), Correlation ID: null, Call
Stack: null, Custom Event ID: -1, Message: VDSM 2kvm2 command
HSMGetAllTasksStatusesVDS failed: Connection timed out<br>
2017-03-10 10:39:07,182+01 INFO
[org.ovirt.engine.core.bll.tasks.SPMAsyncTask]
(DefaultQuartzScheduler2) [759c8e1f]
BaseAsyncTask::onTaskEndSuccess: Task
'f594bf69-619b-4d1b-8f6d-a9826997e478' (Parent Command
'ImportVm', Parameters Type
'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters')
ended successfully.<br>
2017-03-10 10:39:07,182+01 INFO
[org.ovirt.engine.core.bll.CommandMultiAsyncTasks]
(DefaultQuartzScheduler2) [759c8e1f] Task with DB Task ID
'a05c7c07-9b98-4ab2-ac7b-9e70a75ba7b7' and VDSM Task ID
'7c60369f-70a3-4a6a-80c9-4753ac9ed372' is in state Polling. End
action for command 8deb3fe3-4a83-4605-816c-ffdc63fd9ac1 will
proceed when all the entity's tasks are completed.<br>
2017-03-10 10:39:07,182+01 INFO
[org.ovirt.engine.core.bll.tasks.SPMAsyncTask]
(DefaultQuartzScheduler2) [759c8e1f] SPMAsyncTask::PollTask:
Polling task 'f351e8f6-6dd7-49aa-bf54-650d84fc6352' (Parent
Command 'DestroyImage', Parameters Type
'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters')
returned status 'finished', result 'cleanSuccess'.<br>
2017-03-10 10:39:07,182+01 ERROR
[org.ovirt.engine.core.bll.tasks.SPMAsyncTask]
(DefaultQuartzScheduler2) [759c8e1f]
BaseAsyncTask::logEndTaskFailure: Task
'f351e8f6-6dd7-49aa-bf54-650d84fc6352' (Parent Command
'DestroyImage', Parameters Type
'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters')
ended with failure:<br>
-- Result: 'cleanSuccess'<br>
-- Message: 'VDSGenericException: VDSErrorException: Failed to
HSMGetAllTasksStatusesVDS, error = Connection timed out, code =
100',<br>
-- Exception: 'VDSGenericException: VDSErrorException: Failed to
HSMGetAllTasksStatusesVDS, error = Connection timed out, code =
100'<br>
2017-03-10 10:39:07,184+01 INFO
[org.ovirt.engine.core.bll.tasks.CommandAsyncTask]
(DefaultQuartzScheduler2) [759c8e1f]
CommandAsyncTask::endActionIfNecessary: All tasks of command
'33df2c1e-6ce3-44fd-a39b-d111883b4c4e' has ended -> executing
'endAction'<br>
2017-03-10 10:39:07,185+01 INFO
[org.ovirt.engine.core.bll.tasks.CommandAsyncTask]
(DefaultQuartzScheduler2) [759c8e1f]
CommandAsyncTask::endAction: Ending action for '1' tasks
(command ID: '33df2c1e-6ce3-44fd-a39b-d111883b4c4e'): calling
endAction '.<br>
2017-03-10 10:39:07,185+01 INFO
[org.ovirt.engine.core.bll.tasks.CommandAsyncTask]
(org.ovirt.thread.pool-6-thread-31) [759c8e1f]
CommandAsyncTask::endCommandAction [within thread] context:
Attempting to endAction 'DestroyImage',<br>
2017-03-10 10:39:07,192+01 INFO
[org.ovirt.engine.core.bll.storage.disk.image.DestroyImageCommand]
(org.ovirt.thread.pool-6-thread-31) [759c8e1f] Command
[id=33df2c1e-6ce3-44fd-a39b-d111883b4c4e]: Updating status to
'FAILED', The command end method logic will be executed by one
of its parent commands.<br>
2017-03-10 10:39:07,192+01 INFO
[org.ovirt.engine.core.bll.tasks.CommandAsyncTask]
(org.ovirt.thread.pool-6-thread-31) [759c8e1f]
CommandAsyncTask::HandleEndActionResult [within thread]:
endAction for action type 'DestroyImage' completed, handling the
result.<br>
2017-03-10 10:39:07,192+01 INFO
[org.ovirt.engine.core.bll.tasks.CommandAsyncTask]
(org.ovirt.thread.pool-6-thread-31) [759c8e1f]
CommandAsyncTask::HandleEndActionResult [within thread]:
endAction for action type 'DestroyImage' succeeded, clearing
tasks.<br>
2017-03-10 10:39:07,192+01 INFO
[org.ovirt.engine.core.bll.tasks.SPMAsyncTask]
(org.ovirt.thread.pool-6-thread-31) [759c8e1f]
SPMAsyncTask::ClearAsyncTask: Attempting to clear task
'f351e8f6-6dd7-49aa-bf54-650d84fc6352'<br>
2017-03-10 10:39:07,193+01 INFO
[org.ovirt.engine.core.vdsbroker.irsbroker.SPMClearTaskVDSCommand]
(org.ovirt.thread.pool-6-thread-31) [759c8e1f] START,
SPMClearTaskVDSCommand(
SPMTaskGuidBaseVDSCommandParameters:{runAsync='true',
storagePoolId='00000001-0001-0001-0001-000000000311',
ignoreFailoverLimit='false',
taskId='f351e8f6-6dd7-49aa-bf54-650d84fc6352'}), log id:
2b7080c2<br>
2017-03-10 10:39:07,194+01 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand]
(org.ovirt.thread.pool-6-thread-31) [759c8e1f] START,
HSMClearTaskVDSCommand(HostName = 2kvm2,
HSMTaskGuidBaseVDSCommandParameters:{runAsync='true',
hostId='905375e1-6de4-4fdf-b69c-b2d546f869c8',
taskId='f351e8f6-6dd7-49aa-bf54-650d84fc6352'}), log id:
2edff460<br>
2017-03-10 10:39:08,208+01 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand]
(org.ovirt.thread.pool-6-thread-31) [759c8e1f] FINISH,
HSMClearTaskVDSCommand, log id: 2edff460<br>
2017-03-10 10:39:08,208+01 INFO
[org.ovirt.engine.core.vdsbroker.irsbroker.SPMClearTaskVDSCommand]
(org.ovirt.thread.pool-6-thread-31) [759c8e1f] FINISH,
SPMClearTaskVDSCommand, log id: 2b7080c2<br>
2017-03-10 10:39:08,213+01 INFO
[org.ovirt.engine.core.bll.tasks.SPMAsyncTask]
(org.ovirt.thread.pool-6-thread-31) [759c8e1f]
BaseAsyncTask::removeTaskFromDB: Removed task
'f351e8f6-6dd7-49aa-bf54-650d84fc6352' from DataBase<br>
2017-03-10 10:39:08,213+01 INFO
[org.ovirt.engine.core.bll.tasks.CommandAsyncTask]
(org.ovirt.thread.pool-6-thread-31) [759c8e1f]
CommandAsyncTask::HandleEndActionResult [within thread]:
Removing CommandMultiAsyncTasks object for entity
'33df2c1e-6ce3-44fd-a39b-d111883b4c4e'<br>
2017-03-10 10:39:10,142+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand]
(DefaultQuartzScheduler10)
[a86dc7b5-52dc-40d4-a3b9-49d7eabbb93c] START,
GlusterServersListVDSCommand(HostName = 2kvm1,
VdsIdVDSCommandParametersBase:{runAsync='true',
hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id:
2e7278cb<br>
2017-03-10 10:39:11,513+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand]
(DefaultQuartzScheduler10)
[a86dc7b5-52dc-40d4-a3b9-49d7eabbb93c] FINISH,
GlusterServersListVDSCommand, return:
[172.16.5.163/24:CONNECTED, 16.0.0.164:CONNECTED], log id:
2e7278cb<br>
2017-03-10 10:39:11,523+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand]
(DefaultQuartzScheduler10)
[a86dc7b5-52dc-40d4-a3b9-49d7eabbb93c] START,
GlusterVolumesListVDSCommand(HostName = 2kvm1,
GlusterVolumesListVDSParameters:{runAsync='true',
hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id:
43704ef2<br>
2017-03-10 10:39:11,777+01 INFO
[org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback]
(DefaultQuartzScheduler9) [67e1d8ed] Command 'RemoveSnapshot'
(id: '13c2cb7c-0809-4971-aceb-37ae66105ab7') waiting on child
command id: 'a8a3a4d5-cf7d-4423-8243-022911232508'
type:'RemoveSnapshotSingleDiskLive' to complete<br>
2017-03-10 10:39:11,789+01 WARN
[org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand]
(DefaultQuartzScheduler9) [759c8e1f] Child command
'DESTROY_IMAGE' failed, proceeding to verify<br>
2017-03-10 10:39:11,789+01 INFO
[org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand]
(DefaultQuartzScheduler9) [759c8e1f] Executing Live Merge
command step 'DESTROY_IMAGE_CHECK'<br>
2017-03-10 10:39:11,832+01 INFO
[org.ovirt.engine.core.bll.DestroyImageCheckCommand]
(pool-5-thread-7) [4856f570] Running command:
DestroyImageCheckCommand internal: true.<br>
2017-03-10 10:39:11,833+01 INFO
[org.ovirt.engine.core.vdsbroker.irsbroker.SPMGetVolumeInfoVDSCommand]
(pool-5-thread-7) [4856f570] START, SPMGetVolumeInfoVDSCommand(
SPMGetVolumeInfoVDSCommandParameters:{expectedEngineErrors='[VolumeDoesNotExist]',
runAsync='true',
storagePoolId='00000001-0001-0001-0001-000000000311',
ignoreFailoverLimit='false',
storageDomainId='1603cd90-92ef-4c03-922c-cecb282fd00e',
imageGroupId='7543338a-3ca6-4698-bb50-c14f0bd71428',
imageId='50b592f7-bfba-4398-879c-8d6a19a2c000'}), log id:
2c8031f8<br>
2017-03-10 10:39:11,833+01 INFO
[org.ovirt.engine.core.vdsbroker.irsbroker.SPMGetVolumeInfoVDSCommand]
(pool-5-thread-7) [4856f570] Executing GetVolumeInfo using the
current SPM<br>
2017-03-10 10:39:11,834+01 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand]
(pool-5-thread-7) [4856f570] START,
GetVolumeInfoVDSCommand(HostName = 2kvm2,
GetVolumeInfoVDSCommandParameters:{expectedEngineErrors='[VolumeDoesNotExist]',
runAsync='true', hostId='905375e1-6de4-4fdf-b69c-b2d546f869c8',
storagePoolId='00000001-0001-0001-0001-000000000311',
storageDomainId='1603cd90-92ef-4c03-922c-cecb282fd00e',
imageGroupId='7543338a-3ca6-4698-bb50-c14f0bd71428',
imageId='50b592f7-bfba-4398-879c-8d6a19a2c000'}), log id:
79ca86cc<br>
2017-03-10 10:39:11,846+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand]
(DefaultQuartzScheduler10)
[a86dc7b5-52dc-40d4-a3b9-49d7eabbb93c] FINISH,
GlusterVolumesListVDSCommand, return:
{8ded4083-2f31-489e-a60d-a315a5eb9b22=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@7765e4ad},
log id: 43704ef2<br>
2017-03-10 10:39:16,858+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand]
(DefaultQuartzScheduler7) [d82701d9-9fa3-467d-b273-f5fe5a93062f]
START, GlusterServersListVDSCommand(HostName = 2kvm1,
VdsIdVDSCommandParametersBase:{runAsync='true',
hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id:
6542adcd<br>
2017-03-10 10:39:17,394+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand]
(DefaultQuartzScheduler7) [d82701d9-9fa3-467d-b273-f5fe5a93062f]
FINISH, GlusterServersListVDSCommand, return:
[172.16.5.163/24:CONNECTED, 16.0.0.164:CONNECTED], log id:
6542adcd<br>
2017-03-10 10:39:17,406+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand]
(DefaultQuartzScheduler7) [d82701d9-9fa3-467d-b273-f5fe5a93062f]
START, GlusterVolumesListVDSCommand(HostName = 2kvm1,
GlusterVolumesListVDSParameters:{runAsync='true',
hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id:
44ec33ed<br>
2017-03-10 10:39:18,598+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand]
(DefaultQuartzScheduler7) [d82701d9-9fa3-467d-b273-f5fe5a93062f]
FINISH, GlusterVolumesListVDSCommand, return:
{8ded4083-2f31-489e-a60d-a315a5eb9b22=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@7765e4ad},
log id: 44ec33ed<br>
2017-03-10 10:39:21,865+01 INFO
[org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback]
(DefaultQuartzScheduler6) [67e1d8ed] Command 'RemoveSnapshot'
(id: '13c2cb7c-0809-4971-aceb-37ae66105ab7') waiting on child
command id: 'a8a3a4d5-cf7d-4423-8243-022911232508'
type:'RemoveSnapshotSingleDiskLive' to complete<br>
2017-03-10 10:39:21,881+01 INFO
[org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommandCallback]
(DefaultQuartzScheduler6) [4856f570] Command
'RemoveSnapshotSingleDiskLive' (id:
'a8a3a4d5-cf7d-4423-8243-022911232508') waiting on child command
id: 'b1d63b8e-19d3-4d64-8fa8-4eb3e2d1a8fc'
type:'DestroyImageCheck' to complete<br>
2017-03-10 10:39:23,611+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand]
(DefaultQuartzScheduler6) [4856f570] START,
GlusterServersListVDSCommand(HostName = 2kvm1,
VdsIdVDSCommandParametersBase:{runAsync='true',
hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id:
4c2fc22d<br>
2017-03-10 10:39:24,616+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterTasksListVDSCommand]
(DefaultQuartzScheduler7) [d82701d9-9fa3-467d-b273-f5fe5a93062f]
START, GlusterTasksListVDSCommand(HostName = 2kvm1,
VdsIdVDSCommandParametersBase:{runAsync='true',
hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id:
1f169371<br>
2017-03-10 10:39:24,618+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand]
(DefaultQuartzScheduler6) [4856f570] FINISH,
GlusterServersListVDSCommand, return:
[172.16.5.163/24:CONNECTED, 16.0.0.164:CONNECTED], log id:
4c2fc22d<br>
2017-03-10 10:39:24,629+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand]
(DefaultQuartzScheduler6) [4856f570] START,
GlusterVolumesListVDSCommand(HostName = 2kvm1,
GlusterVolumesListVDSParameters:{runAsync='true',
hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id:
2ac55735<br>
2017-03-10 10:39:24,822+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterTasksListVDSCommand]
(DefaultQuartzScheduler7) [d82701d9-9fa3-467d-b273-f5fe5a93062f]
FINISH, GlusterTasksListVDSCommand, return: [], log id: 1f169371<br>
2017-03-10 10:39:26,836+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand]
(DefaultQuartzScheduler6) [4856f570] FINISH,
GlusterVolumesListVDSCommand, return:
{8ded4083-2f31-489e-a60d-a315a5eb9b22=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@7765e4ad},
log id: 2ac55735<br>
2017-03-10 10:39:31,849+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand]
(DefaultQuartzScheduler8) [fde51205-3e8b-4b84-a478-352dc444ccc4]
START, GlusterServersListVDSCommand(HostName = 2kvm1,
VdsIdVDSCommandParametersBase:{runAsync='true',
hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id:
2e8dbcd1<br>
2017-03-10 10:39:31,932+01 INFO
[org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback]
(DefaultQuartzScheduler6) [67e1d8ed] Command 'RemoveSnapshot'
(id: '13c2cb7c-0809-4971-aceb-37ae66105ab7') waiting on child
command id: 'a8a3a4d5-cf7d-4423-8243-022911232508'
type:'RemoveSnapshotSingleDiskLive' to complete<br>
2017-03-10 10:39:31,944+01 INFO
[org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommandCallback]
(DefaultQuartzScheduler6) [4856f570] Command
'RemoveSnapshotSingleDiskLive' (id:
'a8a3a4d5-cf7d-4423-8243-022911232508') waiting on child command
id: 'b1d63b8e-19d3-4d64-8fa8-4eb3e2d1a8fc'
type:'DestroyImageCheck' to complete<br>
2017-03-10 10:39:33,213+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand]
(DefaultQuartzScheduler8) [fde51205-3e8b-4b84-a478-352dc444ccc4]
FINISH, GlusterServersListVDSCommand, return:
[172.16.5.163/24:CONNECTED, 16.0.0.164:CONNECTED], log id:
2e8dbcd1<br>
2017-03-10 10:39:33,226+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand]
(DefaultQuartzScheduler8) [fde51205-3e8b-4b84-a478-352dc444ccc4]
START, GlusterVolumesListVDSCommand(HostName = 2kvm1,
GlusterVolumesListVDSParameters:{runAsync='true',
hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id:
1fb3f9e3<br>
2017-03-10 10:39:34,375+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand]
(DefaultQuartzScheduler8) [fde51205-3e8b-4b84-a478-352dc444ccc4]
FINISH, GlusterVolumesListVDSCommand, return:
{8ded4083-2f31-489e-a60d-a315a5eb9b22=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@7765e4ad},
log id: 1fb3f9e3<br>
2017-03-10 10:39:39,392+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand]
(DefaultQuartzScheduler9) [12d6d15f-e054-4833-bd87-58f6a51e5fa6]
START, GlusterServersListVDSCommand(HostName = 2kvm1,
VdsIdVDSCommandParametersBase:{runAsync='true',
hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id:
1e0b8eeb<br>
2017-03-10 10:39:40,753+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand]
(DefaultQuartzScheduler9) [12d6d15f-e054-4833-bd87-58f6a51e5fa6]
FINISH, GlusterServersListVDSCommand, return:
[172.16.5.163/24:CONNECTED, 16.0.0.164:CONNECTED], log id:
1e0b8eeb<br>
2017-03-10 10:39:40,763+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand]
(DefaultQuartzScheduler9) [12d6d15f-e054-4833-bd87-58f6a51e5fa6]
START, GlusterVolumesListVDSCommand(HostName = 2kvm1,
GlusterVolumesListVDSParameters:{runAsync='true',
hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id:
35b04b33<br>
2017-03-10 10:39:41,952+01 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand]
(DefaultQuartzScheduler9) [12d6d15f-e054-4833-bd87-58f6a51e5fa6]
FINISH, GlusterVolumesListVDSCommand, return:
{8ded4083-2f31-489e-a60d-a315a5eb9b22=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@7765e4ad},
log id: 35b04b33<br>
2017-03-10 10:39:41,991+01 INFO
[org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback]
(DefaultQuartzScheduler6) [67e1d8ed] Command 'RemoveSnapshot'
(id: '13c2cb7c-0809-4971-aceb-37ae66105ab7') waiting on child
command id: 'a8a3a4d5-cf7d-4423-8243-022911232508'
type:'RemoveSnapshotSingleDiskLive' to complete<br>
<br>
</font><br>
gluster ( nothing suspicious in the logs )<br>
======<br>
<br>
<br>
</font><font face="Ubuntu">## "etc-glusterfs-glusterd.vol.log"</font><font
face="Ubuntu"><font size="-1"><br>
[2017-03-10 10:13:52.599019] I [MSGID: 106499]
[glusterd-handler.c:4349:__glusterd_handle_status_volume]
0-management: Received status volume req for volume slow1<br>
[2017-03-10 10:16:48.639635] I [MSGID: 106499]
[glusterd-handler.c:4349:__glusterd_handle_status_volume]
0-management: Received status volume req for volume slow1<br>
The message "I [MSGID: 106499]
[glusterd-handler.c:4349:__glusterd_handle_status_volume]
0-management: Received status volume req for volume slow1"
repeated 3 times between [2017-03-10 10:16:48.639635] and
[2017-03-10 10:17:55.659379]<br>
[2017-03-10 10:18:56.875516] I [MSGID: 106499]
[glusterd-handler.c:4349:__glusterd_handle_status_volume]
0-management: Received status volume req for volume slow1<br>
[2017-03-10 10:19:57.204689] I [MSGID: 106499]
[glusterd-handler.c:4349:__glusterd_handle_status_volume]
0-management: Received status volume req for volume slow1<br>
[2017-03-10 10:21:56.576879] I [MSGID: 106499]
[glusterd-handler.c:4349:__glusterd_handle_status_volume]
0-management: Received status volume req for volume slow1<br>
[2017-03-10 10:21:57.772857] I [MSGID: 106499]
[glusterd-handler.c:4349:__glusterd_handle_status_volume]
0-management: Received status volume req for volume slow1<br>
[2017-03-10 10:24:00.617931] I [MSGID: 106499]
[glusterd-handler.c:4349:__glusterd_handle_status_volume]
0-management: Received status volume req for volume slow1<br>
[2017-03-10 10:30:04.918080] I [MSGID: 106499]
[glusterd-handler.c:4349:__glusterd_handle_status_volume]
0-management: Received status volume req for volume slow1<br>
[2017-03-10 10:31:06.128638] I [MSGID: 106499]
[glusterd-handler.c:4349:__glusterd_handle_status_volume]
0-management: Received status volume req for volume slow1<br>
[2017-03-10 10:32:07.325672] I [MSGID: 106499]
[glusterd-handler.c:4349:__glusterd_handle_status_volume]
0-management: Received status volume req for volume slow1<br>
[2017-03-10 10:32:12.433586] I [MSGID: 106499]
[glusterd-handler.c:4349:__glusterd_handle_status_volume]
0-management: Received status volume req for volume slow1<br>
[2017-03-10 10:32:13.544909] I [MSGID: 106499]
[glusterd-handler.c:4349:__glusterd_handle_status_volume]
0-management: Received status volume req for volume slow1<br>
[2017-03-10 10:35:10.039213] I [MSGID: 106499]
[glusterd-handler.c:4349:__glusterd_handle_status_volume]
0-management: Received status volume req for volume slow1<br>
[2017-03-10 10:37:19.905314] I [MSGID: 106499]
[glusterd-handler.c:4349:__glusterd_handle_status_volume]
0-management: Received status volume req for volume slow1<br>
[2017-03-10 10:37:20.174209] I [MSGID: 106499]
[glusterd-handler.c:4349:__glusterd_handle_status_volume]
0-management: Received status volume req for volume slow1<br>
[2017-03-10 10:38:12.635460] I [MSGID: 106499]
[glusterd-handler.c:4349:__glusterd_handle_status_volume]
0-management: Received status volume req for volume slow1<br>
[2017-03-10 10:40:14.169864] I [MSGID: 106499]
[glusterd-handler.c:4349:__glusterd_handle_status_volume]
0-management: Received status volume req for volume slow1</font><br>
<br>
<br>
## "rhev-data-center-mnt-glusterSD-localhost:_slow1.log"<br>
<font size="-1">[2017-03-10 09:43:40.346785] W [MSGID: 101159]
[inode.c:1214:__inode_unlink] 0-inode:
be318638-e8a0-4c6d-977d-7a937aa84806/b6f2d08d-2441-4111-ab62-e14abdfaf602.61849:
dentry not found in 43e6968f-9c2a-40d8-8074-caf1a36f60cf<br>
[2017-03-10 09:43:40.347076] W [MSGID: 101159]
[inode.c:1214:__inode_unlink] 0-inode:
be318638-e8a0-4c6d-977d-7a937aa84806/b6f2d08d-2441-4111-ab62-e14abdfaf602.61879:
dentry not found in 902a6e3d-b7aa-439f-8262-cdc1b7f9f022<br>
[2017-03-10 09:43:40.347145] W [MSGID: 101159]
[inode.c:1214:__inode_unlink] 0-inode:
be318638-e8a0-4c6d-977d-7a937aa84806/b6f2d08d-2441-4111-ab62-e14abdfaf602.61935:
dentry not found in 846bbcfc-f2b3-4ab6-af44-aeaa10b39318<br>
[2017-03-10 09:43:40.347211] W [MSGID: 101159]
[inode.c:1214:__inode_unlink] 0-inode:
be318638-e8a0-4c6d-977d-7a937aa84806/b6f2d08d-2441-4111-ab62-e14abdfaf602.61922:
dentry not found in 66ad3bc5-26c7-4360-b33b-a084e3305cf8<br>
[2017-03-10 09:43:40.351571] W [MSGID: 101159]
[inode.c:1214:__inode_unlink] 0-inode:
be318638-e8a0-4c6d-977d-7a937aa84806/b6f2d08d-2441-4111-ab62-e14abdfaf602.61834:
dentry not found in 3b8278e1-40e5-4363-b21e-7bffcd024c62<br>
[2017-03-10 09:43:40.352449] W [MSGID: 101159]
[inode.c:1214:__inode_unlink] 0-inode:
be318638-e8a0-4c6d-977d-7a937aa84806/b6f2d08d-2441-4111-ab62-e14abdfaf602.61870:
dentry not found in 282f4c05-e09a-48e0-96a3-52e079ff2f73<br>
[2017-03-10 09:50:38.829325] I [MSGID: 109066]
[dht-rename.c:1569:dht_rename] 0-slow1-dht: renaming
/1603cd90-92ef-4c03-922c-cecb282fd00e/images/014ca3aa-d5f5-4b88-8f84-be8d4c5dfc1e/f147532a-89fa-49e0-8225-f82343fca8be.meta.new
(hash=slow1-replicate-0/cache=slow1-replicate-0) =>
/1603cd90-92ef-4c03-922c-cecb282fd00e/images/014ca3aa-d5f5-4b88-8f84-be8d4c5dfc1e/f147532a-89fa-49e0-8225-f82343fca8be.meta
(hash=slow1-replicate-0/cache=slow1-replicate-0)<br>
[2017-03-10 09:50:42.221775] I [MSGID: 109066]
[dht-rename.c:1569:dht_rename] 0-slow1-dht: renaming
/1603cd90-92ef-4c03-922c-cecb282fd00e/images/4cf7dd90-9dcc-428c-82bc-fbf08dbee0be/12812d56-1606-4bf8-a391-0a2cacbd020b.meta.new
(hash=slow1-replicate-0/cache=slow1-replicate-0) =>
/1603cd90-92ef-4c03-922c-cecb282fd00e/images/4cf7dd90-9dcc-428c-82bc-fbf08dbee0be/12812d56-1606-4bf8-a391-0a2cacbd020b.meta
(hash=slow1-replicate-0/cache=slow1-replicate-0)<br>
[2017-03-10 09:50:45.956432] I [MSGID: 109066]
[dht-rename.c:1569:dht_rename] 0-slow1-dht: renaming
/1603cd90-92ef-4c03-922c-cecb282fd00e/images/3cef54b4-45b9-4f5b-82c2-fcc8def06a37/85287865-38f0-45df-9e6c-1294913cbb88.meta.new
(hash=slow1-replicate-0/cache=slow1-replicate-0) =>
/1603cd90-92ef-4c03-922c-cecb282fd00e/images/3cef54b4-45b9-4f5b-82c2-fcc8def06a37/85287865-38f0-45df-9e6c-1294913cbb88.meta
(hash=slow1-replicate-0/cache=slow1-replicate-0)<br>
[2017-03-10 09:50:40.349563] I [MSGID: 109066]
[dht-rename.c:1569:dht_rename] 0-slow1-dht: renaming
/1603cd90-92ef-4c03-922c-cecb282fd00e/images/014ca3aa-d5f5-4b88-8f84-be8d4c5dfc1e/f147532a-89fa-49e0-8225-f82343fca8be.meta.new
(hash=slow1-replicate-0/cache=slow1-replicate-0) =>
/1603cd90-92ef-4c03-922c-cecb282fd00e/images/014ca3aa-d5f5-4b88-8f84-be8d4c5dfc1e/f147532a-89fa-49e0-8225-f82343fca8be.meta
(hash=slow1-replicate-0/cache=slow1-replicate-0)<br>
[2017-03-10 09:50:44.503866] I [MSGID: 109066]
[dht-rename.c:1569:dht_rename] 0-slow1-dht: renaming
/1603cd90-92ef-4c03-922c-cecb282fd00e/images/4cf7dd90-9dcc-428c-82bc-fbf08dbee0be/12812d56-1606-4bf8-a391-0a2cacbd020b.meta.new
(hash=slow1-replicate-0/cache=slow1-replicate-0) =>
/1603cd90-92ef-4c03-922c-cecb282fd00e/images/4cf7dd90-9dcc-428c-82bc-fbf08dbee0be/12812d56-1606-4bf8-a391-0a2cacbd020b.meta
(hash=slow1-replicate-0/cache=slow1-replicate-0)<br>
[2017-03-10 09:59:46.860762] W [MSGID: 101159]
[inode.c:1214:__inode_unlink] 0-inode:
be318638-e8a0-4c6d-977d-7a937aa84806/6e105aa3-a3fc-4aca-be50-78b7642c4072.6684:
dentry not found in d1e65eea-8758-4407-ac2e-3605dc661364<br>
[2017-03-10 10:02:22.500865] W [MSGID: 101159]
[inode.c:1214:__inode_unlink] 0-inode:
be318638-e8a0-4c6d-977d-7a937aa84806/6e105aa3-a3fc-4aca-be50-78b7642c4072.8767:
dentry not found in e228bb28-9602-4f8e-8323-7434d77849fc<br>
[2017-03-10 10:04:03.103839] W [MSGID: 101159]
[inode.c:1214:__inode_unlink] 0-inode:
be318638-e8a0-4c6d-977d-7a937aa84806/6e105aa3-a3fc-4aca-be50-78b7642c4072.9787:
dentry not found in 6be71632-aa36-4975-b673-1357e0355027<br>
[2017-03-10 10:06:02.406385] I [MSGID: 109066]
[dht-rename.c:1569:dht_rename] 0-slow1-dht: renaming
/1603cd90-92ef-4c03-922c-cecb282fd00e/images/2a9c1c6a-f045-4dce-a47b-95a2267eef72/6f264695-0669-4b49-a2f6-e6c92482f2fb.meta.new
(hash=slow1-replicate-0/cache=slow1-replicate-0) =>
/1603cd90-92ef-4c03-922c-cecb282fd00e/images/2a9c1c6a-f045-4dce-a47b-95a2267eef72/6f264695-0669-4b49-a2f6-e6c92482f2fb.meta
(hash=slow1-replicate-0/cache=slow1-replicate-0)</font><br>
<font size="-1">... no other record</font><br>
<br>
<br>
messages<br>
========<br>
<br>
the following occurred several times:<br>
<br>
<font size="-1">Mar 10 09:04:38 2kvm2 lvmetad: WARNING: Ignoring
unsupported value for cmd.<br>
Mar 10 09:04:38 2kvm2 lvmetad: WARNING: Ignoring unsupported
value for cmd.<br>
Mar 10 09:04:38 2kvm2 lvmetad: WARNING: Ignoring unsupported
value for cmd.<br>
Mar 10 09:04:38 2kvm2 lvmetad: WARNING: Ignoring unsupported
value for cmd.<br>
Mar 10 09:10:01 2kvm2 systemd: Started Session 274 of user root.<br>
Mar 10 09:10:01 2kvm2 systemd: Starting Session 274 of user
root.<br>
Mar 10 09:20:02 2kvm2 systemd: Started Session 275 of user root.<br>
Mar 10 09:20:02 2kvm2 systemd: Starting Session 275 of user
root.<br>
Mar 10 09:22:59 2kvm2 sanlock[1673]: 2017-03-10 09:22:59+0100
136031 [2576]: s3 delta_renew long write time 11 sec<br>
Mar 10 09:24:03 2kvm2 kernel: kswapd1: page allocation failure:
order:2, mode:0x104020<br>
Mar 10 09:24:03 2kvm2 kernel: CPU: 42 PID: 265 Comm: kswapd1
Tainted: G I ------------
3.10.0-514.10.2.el7.x86_64 #1<br>
Mar 10 09:24:03 2kvm2 kernel: Hardware name: Supermicro
X10DRC/X10DRi-LN4+, BIOS 1.0a 08/29/2014<br>
Mar 10 09:24:03 2kvm2 kernel: 0000000000104020 00000000f7228dc9
ffff88301f4839d8 ffffffff816864ef<br>
Mar 10 09:24:03 2kvm2 kernel: ffff88301f483a68 ffffffff81186ba0
000068fc00000000 0000000000000000<br>
Mar 10 09:24:03 2kvm2 kernel: fffffffffffffffc 0010402000000000
ffff88301567ae80 00000000f7228dc9<br>
Mar 10 09:24:03 2kvm2 kernel: Call Trace:<br>
Mar 10 09:24:03 2kvm2 kernel: <IRQ>
[<ffffffff816864ef>] dump_stack+0x19/0x1b<br>
Mar 10 09:24:03 2kvm2 kernel: [<ffffffff81186ba0>]
warn_alloc_failed+0x110/0x180<br>
Mar 10 09:24:03 2kvm2 kernel: [<ffffffff81682083>]
__alloc_pages_slowpath+0x6b7/0x725<br>
Mar 10 09:24:03 2kvm2 kernel: [<ffffffff8118b155>]
__alloc_pages_nodemask+0x405/0x420<br>
Mar 10 09:24:03 2kvm2 kernel: [<ffffffff811cf30a>]
alloc_pages_current+0xaa/0x170<br>
Mar 10 09:24:03 2kvm2 kernel: [<ffffffff81185a7e>]
__get_free_pages+0xe/0x50<br>
Mar 10 09:24:03 2kvm2 kernel: [<ffffffff811dabae>]
kmalloc_order_trace+0x2e/0xa0<br>
Mar 10 09:24:03 2kvm2 kernel: [<ffffffff811dd381>]
__kmalloc+0x221/0x240<br>
Mar 10 09:24:03 2kvm2 kernel: [<ffffffffa02f83fa>]
bnx2x_frag_alloc.isra.62+0x2a/0x40 [bnx2x]<br>
Mar 10 09:24:03 2kvm2 kernel: [<ffffffffa02f92f7>]
bnx2x_rx_int+0x227/0x17b0 [bnx2x]<br>
Mar 10 09:24:03 2kvm2 kernel: [<ffffffff81033669>] ?
sched_clock+0x9/0x10<br>
Mar 10 09:24:03 2kvm2 kernel: [<ffffffffa02fc72d>]
bnx2x_poll+0x1dd/0x260 [bnx2x]<br>
Mar 10 09:24:03 2kvm2 kernel: [<ffffffff815705e0>]
net_rx_action+0x170/0x380<br>
Mar 10 09:24:03 2kvm2 kernel: [<ffffffff8108f2cf>]
__do_softirq+0xef/0x280<br>
Mar 10 09:24:03 2kvm2 kernel: [<ffffffff8169859c>]
call_softirq+0x1c/0x30<br>
Mar 10 09:24:03 2kvm2 kernel: [<ffffffff8102d365>]
do_softirq+0x65/0xa0<br>
Mar 10 09:24:03 2kvm2 kernel: [<ffffffff8108f665>]
irq_exit+0x115/0x120<br>
Mar 10 09:24:03 2kvm2 kernel: [<ffffffff81699138>]
do_IRQ+0x58/0xf0<br>
Mar 10 09:24:03 2kvm2 kernel: [<ffffffff8168e2ad>]
common_interrupt+0x6d/0x6d<br>
Mar 10 09:24:03 2kvm2 kernel: <EOI>
[<ffffffff81189a73>] ? free_hot_cold_page+0x103/0x160<br>
Mar 10 09:24:03 2kvm2 kernel: [<ffffffff81189b16>]
free_hot_cold_page_list+0x46/0xa0<br>
Mar 10 09:24:03 2kvm2 kernel: [<ffffffff81195193>]
shrink_page_list+0x543/0xb00<br>
Mar 10 09:24:03 2kvm2 kernel: [<ffffffff81195dda>]
shrink_inactive_list+0x1fa/0x630<br>
Mar 10 09:24:03 2kvm2 kernel: [<ffffffff81196975>]
shrink_lruvec+0x385/0x770<br>
Mar 10 09:24:03 2kvm2 kernel: [<ffffffff81196dd6>]
shrink_zone+0x76/0x1a0<br>
Mar 10 09:24:03 2kvm2 kernel: [<ffffffff8119807c>]
balance_pgdat+0x48c/0x5e0<br>
Mar 10 09:24:03 2kvm2 kernel: [<ffffffff81198343>]
kswapd+0x173/0x450<br>
Mar 10 09:24:03 2kvm2 kernel: [<ffffffff810b17d0>] ?
wake_up_atomic_t+0x30/0x30<br>
Mar 10 09:24:03 2kvm2 kernel: [<ffffffff811981d0>] ?
balance_pgdat+0x5e0/0x5e0<br>
Mar 10 09:24:03 2kvm2 kernel: [<ffffffff810b06ff>]
kthread+0xcf/0xe0<br>
Mar 10 09:24:03 2kvm2 kernel: [<ffffffff810b0630>] ?
kthread_create_on_node+0x140/0x140<br>
Mar 10 09:24:03 2kvm2 kernel: [<ffffffff81696a58>]
ret_from_fork+0x58/0x90<br>
Mar 10 09:24:03 2kvm2 kernel: [<ffffffff810b0630>] ?
kthread_create_on_node+0x140/0x140<br>
Mar 10 09:24:03 2kvm2 kernel: kswapd1: page allocation failure:
order:2, mode:0x104020<br>
Mar 10 09:24:03 2kvm2 kernel: CPU: 42 PID: 265 Comm: kswapd1
Tainted: G I ------------
3.10.0-514.10.2.el7.x86_64 #1<br>
Mar 10 09:24:03 2kvm2 kernel: Hardware name: Supermicro
X10DRC/X10DRi-LN4+, BIOS 1.0a 08/29/2014<br>
Mar 10 09:24:03 2kvm2 kernel: 0000000000104020 00000000f7228dc9
ffff88301f4839d8 ffffffff816864ef<br>
</font><br>
<br>
and again around the critical time:<br>
<br>
<font size="-1">Mar 10 10:37:53 2kvm2 sanlock[1673]: 2017-03-10
10:37:53+0100 140524 [1673]: s3 check_our_lease warning 73
last_success 140451<br>
Mar 10 10:37:54 2kvm2 sanlock[1673]: 2017-03-10 10:37:54+0100
140525 [1673]: s3 check_our_lease warning 74 last_success 140451<br>
Mar 10 10:37:54 2kvm2 wdmd[1732]: test warning now 140526 ping
140516 close 0 renewal 140451 expire 140531 client 1673
sanlock_1603cd90-92ef-4c03-922c-cecb282fd00e:1<br>
Mar 10 10:37:54 2kvm2 kernel: watchdog watchdog0: watchdog did
not stop!<br>
Mar 10 10:37:54 2kvm2 wdmd[1732]: /dev/watchdog0 closed unclean<br>
Mar 10 10:37:55 2kvm2 sanlock[1673]: 2017-03-10 10:37:55+0100
140526 [1673]: s3 check_our_lease warning 75 last_success 140451<br>
Mar 10 10:37:55 2kvm2 wdmd[1732]: test warning now 140527 ping
140516 close 140526 renewal 140451 expire 140531 client 1673
sanlock_1603cd90-92ef-4c03-922c-cecb282fd00e:1<br>
Mar 10 10:37:56 2kvm2 sanlock[1673]: 2017-03-10 10:37:56+0100
140527 [1673]: s3 check_our_lease warning 76 last_success 140451<br>
Mar 10 10:37:56 2kvm2 wdmd[1732]: test warning now 140528 ping
140516 close 140526 renewal 140451 expire 140531 client 1673
sanlock_1603cd90-92ef-4c03-922c-cecb282fd00e:1<br>
Mar 10 10:37:57 2kvm2 sanlock[1673]: 2017-03-10 10:37:57+0100
140528 [1673]: s3 check_our_lease warning 77 last_success 140451<br>
Mar 10 10:37:57 2kvm2 wdmd[1732]: test warning now 140529 ping
140516 close 140526 renewal 140451 expire 140531 client 1673
sanlock_1603cd90-92ef-4c03-922c-cecb282fd00e:1<br>
Mar 10 10:37:58 2kvm2 sanlock[1673]: 2017-03-10 10:37:58+0100
140529 [1673]: s3 check_our_lease warning 78 last_success 140451<br>
Mar 10 10:37:58 2kvm2 wdmd[1732]: test warning now 140530 ping
140516 close 140526 renewal 140451 expire 140531 client 1673
sanlock_1603cd90-92ef-4c03-922c-cecb282fd00e:1<br>
Mar 10 10:37:59 2kvm2 sanlock[1673]: 2017-03-10 10:37:59+0100
140530 [1673]: s3 check_our_lease warning 79 last_success 140451<br>
Mar 10 10:37:59 2kvm2 wdmd[1732]: test failed rem 55 now 140531
ping 140516 close 140526 renewal 140451 expire 140531 client
1673 sanlock_1603cd90-92ef-4c03-922c-cecb282fd00e:1<br>
Mar 10 10:38:00 2kvm2 sanlock[1673]: 2017-03-10 10:38:00+0100
140531 [1673]: s3 check_our_lease failed 80<br>
Mar 10 10:38:00 2kvm2 sanlock[1673]: 2017-03-10 10:38:00+0100
140531 [1673]: s3 all pids clear<br>
Mar 10 10:38:01 2kvm2 wdmd[1732]: /dev/watchdog0 reopen<br>
Mar 10 10:38:10 2kvm2 journal: Cannot start job (query, none)
for domain TEST-LBS_EBSAPP; current job is (query, none) owned
by (3284 remoteDispatchConnectGetAllDomainStats, 0 <null>)
for (62s, 0s)<br>
Mar 10 10:38:10 2kvm2 journal: Timed out during operation:
cannot acquire state change lock (held by
remoteDispatchConnectGetAllDomainStats)<br>
Mar 10 10:38:11 2kvm2 journal: vdsm vds.dispatcher ERROR SSL
error receiving from <yajsonrpc.betterAsyncore.Dispatcher
connected ('::1', 40590, 0, 0) at 0x3acdd88>: unexpected eof<br>
Mar 10 10:38:40 2kvm2 journal: Cannot start job (query, none)
for domain TEST1-LBS_ATRYA; current job is (query, none) owned
by (3288 remoteDispatchConnectGetAllDomainStats, 0 <null>)
for (47s, 0s)<br>
Mar 10 10:38:40 2kvm2 journal: Timed out during operation:
cannot acquire state change lock (held by
remoteDispatchConnectGetAllDomainStats)<br>
Mar 10 10:38:41 2kvm2 journal: vdsm vds.dispatcher ERROR SSL
error receiving from <yajsonrpc.betterAsyncore.Dispatcher
connected ('::1', 40592, 0, 0) at 0x3fd5b90>: unexpected eof<br>
Mar 10 10:39:10 2kvm2 journal: Cannot start job (query, none)
for domain TEST-LBS_EBSAPP; current job is (query, none) owned
by (3284 remoteDispatchConnectGetAllDomainStats, 0 <null>)
for (122s, 0s)<br>
Mar 10 10:39:10 2kvm2 journal: Timed out during operation:
cannot acquire state change lock (held by
remoteDispatchConnectGetAllDomainStats)<br>
Mar 10 10:39:10 2kvm2 journal: Cannot start job (query, none)
for domain TEST1-LBS_ATRYA; current job is (query, none) owned
by (3288 remoteDispatchConnectGetAllDomainStats, 0 <null>)
for (77s, 0s)<br>
Mar 10 10:39:10 2kvm2 journal: Timed out during operation:
cannot acquire state change lock (held by
remoteDispatchConnectGetAllDomainStats)<br>
Mar 10 10:39:11 2kvm2 journal: vdsm vds.dispatcher ERROR SSL
error receiving from <yajsonrpc.betterAsyncore.Dispatcher
connected ('::1', 40594, 0, 0) at 0x2447290>: unexpected eof<br>
Mar 10 10:39:23 2kvm2 sanlock[1673]: 2017-03-10 10:39:23+0100
140615 [2576]: s3 delta_renew write time 140 error -202<br>
Mar 10 10:39:23 2kvm2 sanlock[1673]: 2017-03-10 10:39:23+0100
140615 [2576]: s3 renewal error -202 delta_length 144
last_success 140451<br>
Mar 10 10:39:40 2kvm2 journal: Cannot start job (query, none)
for domain TEST-LBS_EBSAPP; current job is (query, none) owned
by (3284 remoteDispatchConnectGetAllDomainStats, 0 <null>)
for (152s, 0s)<br>
Mar 10 10:39:40 2kvm2 journal: Timed out during operation:
cannot acquire state change lock (held by
remoteDispatchConnectGetAllDomainStats)<br>
Mar 10 10:39:40 2kvm2 journal: Cannot start job (query, none)
for domain TEST1-LBS_ATRYA; current job is (query, none) owned
by (3288 remoteDispatchConnectGetAllDomainStats, 0 <null>)
for (107s, 0s)<br>
Mar 10 10:39:40 2kvm2 journal: Timed out during operation:
cannot acquire state change lock (held by
remoteDispatchConnectGetAllDomainStats)<br>
Mar 10 10:39:41 2kvm2 journal: vdsm vds.dispatcher ERROR SSL
error receiving from <yajsonrpc.betterAsyncore.Dispatcher
connected ('::1', 40596, 0, 0) at 0x2472ef0>: unexpected eof<br>
Mar 10 10:39:49 2kvm2 kernel: INFO: task qemu-img:42107 blocked
for more than 120 seconds.<br>
Mar 10 10:39:49 2kvm2 kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.<br>
Mar 10 10:39:49 2kvm2 kernel: qemu-img D
ffff88010dad3e30 0 42107 3631 0x00000080<br>
Mar 10 10:39:49 2kvm2 kernel: ffff88010dad3b30 0000000000000082
ffff8814491f4e70 ffff88010dad3fd8<br>
Mar 10 10:39:49 2kvm2 kernel: ffff88010dad3fd8 ffff88010dad3fd8
ffff8814491f4e70 ffff88301f096c40<br>
Mar 10 10:39:49 2kvm2 kernel: 0000000000000000 7fffffffffffffff
ffff88181f186c00 ffff88010dad3e30<br>
Mar 10 10:39:49 2kvm2 kernel: Call Trace:<br>
Mar 10 10:39:49 2kvm2 kernel: [<ffffffff8168bbb9>]
schedule+0x29/0x70<br>
Mar 10 10:39:49 2kvm2 kernel: [<ffffffff81689609>]
schedule_timeout+0x239/0x2d0<br>
Mar 10 10:39:49 2kvm2 kernel: [<ffffffff8168b15e>]
io_schedule_timeout+0xae/0x130<br>
Mar 10 10:39:49 2kvm2 kernel: [<ffffffff8168b1f8>]
io_schedule+0x18/0x20<br>
Mar 10 10:39:49 2kvm2 kernel: [<ffffffff8124d9e5>]
wait_on_sync_kiocb+0x35/0x80<br>
Mar 10 10:39:49 2kvm2 kernel: [<ffffffffa0a36091>]
fuse_direct_IO+0x231/0x380 [fuse]<br>
Mar 10 10:39:49 2kvm2 kernel: [<ffffffff812a6ddd>] ?
cap_inode_need_killpriv+0x2d/0x40<br>
Mar 10 10:39:49 2kvm2 kernel: [<ffffffff812a8cb6>] ?
security_inode_need_killpriv+0x16/0x20<br>
Mar 10 10:39:49 2kvm2 kernel: [<ffffffff81219e3f>] ?
dentry_needs_remove_privs.part.13+0x1f/0x30<br>
Mar 10 10:39:49 2kvm2 kernel: [<ffffffff81182a2d>]
generic_file_direct_write+0xcd/0x190<br>
Mar 10 10:39:49 2kvm2 kernel: [<ffffffffa0a36905>]
fuse_file_aio_write+0x185/0x340 [fuse]<br>
Mar 10 10:39:49 2kvm2 kernel: [<ffffffff811fdabd>]
do_sync_write+0x8d/0xd0<br>
Mar 10 10:39:49 2kvm2 kernel: [<ffffffff811fe32d>]
vfs_write+0xbd/0x1e0<br>
Mar 10 10:39:49 2kvm2 kernel: [<ffffffff811ff002>]
SyS_pwrite64+0x92/0xc0<br>
Mar 10 10:39:49 2kvm2 kernel: [<ffffffff81696b09>]
system_call_fastpath+0x16/0x1b<br>
Mar 10 10:39:49 2kvm2 kernel: INFO: task qemu-img:42111 blocked
for more than 120 seconds.<br>
Mar 10 10:39:49 2kvm2 kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.<br>
Mar 10 10:39:49 2kvm2 kernel: qemu-img D
ffff8818a76e7e30 0 42111 3632 0x00000080<br>
Mar 10 10:39:49 2kvm2 kernel: ffff8818a76e7b30 0000000000000082
ffff88188aaeaf10 ffff8818a76e7fd8<br>
Mar 10 10:39:49 2kvm2 kernel: ffff8818a76e7fd8 ffff8818a76e7fd8
ffff88188aaeaf10 ffff88301f156c40<br>
</font><br>
memory<br>
=======<br>
<br>
# cat /proc/meminfo <br>
<font size="-1">MemTotal: 197983472 kB<br>
MemFree: 834228 kB<br>
MemAvailable: 165541204 kB<br>
Buffers: 45548 kB<br>
Cached: 159596272 kB<br>
SwapCached: 119872 kB<br>
Active: 40803264 kB<br>
Inactive: 148022076 kB<br>
Active(anon): 26594112 kB<br>
Inactive(anon): 2626384 kB<br>
Active(file): 14209152 kB<br>
Inactive(file): 145395692 kB<br>
Unevictable: 50488 kB<br>
Mlocked: 50488 kB<br>
SwapTotal: 4194300 kB<br>
SwapFree: 3612188 kB<br>
Dirty: 624 kB<br>
Writeback: 0 kB<br>
AnonPages: 29185032 kB<br>
Mapped: 85176 kB<br>
Shmem: 25908 kB<br>
Slab: 6203384 kB<br>
SReclaimable: 5857240 kB<br>
SUnreclaim: 346144 kB<br>
KernelStack: 19184 kB<br>
PageTables: 86100 kB<br>
NFS_Unstable: 0 kB<br>
Bounce: 0 kB<br>
WritebackTmp: 0 kB<br>
CommitLimit: 103186036 kB<br>
Committed_AS: 52300288 kB<br>
VmallocTotal: 34359738367 kB<br>
VmallocUsed: 1560580 kB<br>
VmallocChunk: 34257341440 kB<br>
HardwareCorrupted: 0 kB<br>
AnonHugePages: 5566464 kB<br>
HugePages_Total: 0<br>
HugePages_Free: 0<br>
HugePages_Rsvd: 0<br>
HugePages_Surp: 0<br>
Hugepagesize: 2048 kB<br>
DirectMap4k: 431292 kB<br>
DirectMap2M: 19382272 kB<br>
DirectMap1G: 183500800 kB</font><br>
<br>
<br>
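In case it helps, next time this happens I can also capture the low-memory /
fragmentation state that the "order:2" page allocation failures above seem to
point at ( standard procfs / sysctl commands only, nothing oVirt-specific ):<br>
<font size="-1"># cat /proc/buddyinfo &nbsp;&nbsp;# free pages per order ( fragmentation )<br>
# sysctl vm.min_free_kbytes vm.swappiness vm.dirty_ratio vm.dirty_background_ratio<br>
# grep -E 'MemFree|Dirty|Writeback' /proc/meminfo<br>
</font><br>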
Can anybody help me with this?<br>
I have a small hint that it could be a swap / memory problem ( see the
messages log above ), but I'm not sure.<br>
A similar problem occurred in older gluster/oVirt versions during
testing ( a freeze under huge workload, but not a fatal overload ).<br>
<br>
regards<br>
Paf1<br>
<br>
<br>
</font>
</body>
</html>