[ovirt-users] VDSM hang

paf1 at email.cz paf1 at email.cz
Fri Mar 10 13:11:52 UTC 2017


freeze / freezing:
IO operations are paused for some reason.
The possible causes are
1) net - some TCP framework collapse
2) gluster interconnect - the gluster daemon / a process hang ??
3) VDSM - managed services paused
4) XFS - read/write issues
5) swap overfilled - processes get killed - but why is swap full if the
VMs use at most 30% of memory ( 196 GB ) ? ( uncontrolled process forking ) - see the quick checks below
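
A quick way to see what is actually holding the swap and how aggressively the kernel swaps - a sketch, assuming a stock EL7 /proc layout:

    # per-process swap usage, largest consumers first
    for f in /proc/[0-9]*/status; do
        awk '/^Name|^VmSwap/ {printf "%s ", $2} END {print ""}' "$f" 2>/dev/null
    done | sort -k2 -nr | head
    # current swappiness setting
    cat /proc/sys/vm/swappiness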

regs


On 03/10/2017 01:56 PM, Nir Soffer wrote:
> On Fri, Mar 10, 2017 at 1:07 PM, paf1 at email.cz <paf1 at email.cz> wrote:
>> Hello everybody,
>>
>> for production use I'm testing oVirt with Gluster.
>> All components seem to run fine, but whenever I test a heavy
>> workload the node freezes. Not the host OS itself, but VDSM management and
>> the attached services, e.g. the VMs.
> What do you mean by freeze?
>
>> mgmt
>> oVirt - 4.1.0.4
>> centos 7.3-1611
>>
>>
>> nodes ( installed from ovirt image
>> "ovirt-node-ng-installer-ovirt-4.1-2017030804.iso"  )
>>
>> OS Version: == RHEL - 7 - 3.1611.el7.centos
>> OS Description:== oVirt Node 4.1.0
>> Kernel Version:== 3.10.0 - 514.10.2.el7.x86_64
>> KVM Version:== 2.6.0 - 28.el7_3.3.1
>> LIBVIRT Version:== libvirt-2.0.0-10.el7_3.5
>> VDSM Version:== vdsm-4.19.4-1.el7.centos
>> SPICE Version:== 0.12.4 - 20.el7_3
>> GlusterFS Version:== glusterfs-3.8.9-1.el7  ( LVM thinprovisioning in
>> replica 2 - created from ovirt GUI )
>>
>> concurrently running
>> - a huge import from the export domain    ( network workload )
>> - sequential writes to a VM's local disk ( gluster replica sequential workload )
>> - a huge database select inside the VMs  (  random IOPS )
>> - deletion of a huge old snapshot  ( random IOPS )
>>
>> With this configuration / workload everything runs for an hour or so with no
>> exceptions, at 70-80% disk load, but at some point VDSM freezes all jobs on a
>> timeout and the VMs go into "unknown" status.
>> The whole system then recovers automatically within roughly 20 minutes (
>> except the import and the snapshot deletion (rollback) )
>>
>> engine.log  - focus on the 10:39:07 timestamp  ( Failed in 'HSMGetAllTasksStatusesVDS'
>> method )
>> ========
>>
>> n child command id: 'a8a3a4d5-cf7d-4423-8243-022911232508'
>> type:'RemoveSnapshotSingleDiskLive' to complete
>> 2017-03-10 10:39:01,727+01 INFO
>> [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommandCallback]
>> (DefaultQuartzScheduler2) [759c8e1f] Command 'RemoveSnapshotSingleDiskLive'
>> (id: 'a8a3a4d5-cf7d-4423-8243-022911232508') waiting on child command id:
>> '33df2c1e-6ce3-44fd-a39b-d111883b4c4e' type:'DestroyImage' to complete
>> 2017-03-10 10:39:03,929+01 INFO
>> [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand]
>> (DefaultQuartzScheduler5) [fde51205-3e8b-4b84-a478-352dc444ccc4] START,
>> GlusterServersListVDSCommand(HostName = 2kvm1,
>> VdsIdVDSCommandParametersBase:{runAsync='true',
>> hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id: 446d0cd3
>> 2017-03-10 10:39:04,343+01 INFO
>> [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand]
>> (DefaultQuartzScheduler5) [fde51205-3e8b-4b84-a478-352dc444ccc4] FINISH,
>> GlusterServersListVDSCommand, return: [172.16.5.163/24:CONNECTED,
>> 16.0.0.164:CONNECTED], log id: 446d0cd3
>> 2017-03-10 10:39:04,353+01 INFO
>> [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand]
>> (DefaultQuartzScheduler5) [fde51205-3e8b-4b84-a478-352dc444ccc4] START,
>> GlusterVolumesListVDSCommand(HostName = 2kvm1,
>> GlusterVolumesListVDSParameters:{runAsync='true',
>> hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id: 69ea1fda
>> 2017-03-10 10:39:05,128+01 INFO
>> [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand]
>> (DefaultQuartzScheduler5) [fde51205-3e8b-4b84-a478-352dc444ccc4] FINISH,
>> GlusterVolumesListVDSCommand, return:
>> {8ded4083-2f31-489e-a60d-a315a5eb9b22=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@7765e4ad},
>> log id: 69ea1fda
>> 2017-03-10 10:39:07,163+01 ERROR
>> [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand]
>> (DefaultQuartzScheduler2) [759c8e1f] Failed in 'HSMGetAllTasksStatusesVDS'
>> method
>> 2017-03-10 10:39:07,178+01 ERROR
>> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
>> (DefaultQuartzScheduler2) [759c8e1f] EVENT_ID:
>> VDS_BROKER_COMMAND_FAILURE(10,802), Correlation ID: null, Call Stack: null,
>> Custom Event ID: -1, Message: VDSM 2kvm2 command HSMGetAllTasksStatusesVDS
>> failed: Connection timed out
>> 2017-03-10 10:39:07,182+01 INFO
>> [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartzScheduler2)
>> [759c8e1f] BaseAsyncTask::onTaskEndSuccess: Task
>> 'f594bf69-619b-4d1b-8f6d-a9826997e478' (Parent Command 'ImportVm',
>> Parameters Type
>> 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended
>> successfully.
>> 2017-03-10 10:39:07,182+01 INFO
>> [org.ovirt.engine.core.bll.CommandMultiAsyncTasks] (DefaultQuartzScheduler2)
>> [759c8e1f] Task with DB Task ID 'a05c7c07-9b98-4ab2-ac7b-9e70a75ba7b7' and
>> VDSM Task ID '7c60369f-70a3-4a6a-80c9-4753ac9ed372' is in state Polling. End
>> action for command 8deb3fe3-4a83-4605-816c-ffdc63fd9ac1 will proceed when
>> all the entity's tasks are completed.
>> 2017-03-10 10:39:07,182+01 INFO
>> [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartzScheduler2)
>> [759c8e1f] SPMAsyncTask::PollTask: Polling task
>> 'f351e8f6-6dd7-49aa-bf54-650d84fc6352' (Parent Command 'DestroyImage',
>> Parameters Type
>> 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') returned
>> status 'finished', result 'cleanSuccess'.
>> 2017-03-10 10:39:07,182+01 ERROR
>> [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartzScheduler2)
>> [759c8e1f] BaseAsyncTask::logEndTaskFailure: Task
>> 'f351e8f6-6dd7-49aa-bf54-650d84fc6352' (Parent Command 'DestroyImage',
>> Parameters Type
>> 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended with
>> failure:
>> -- Result: 'cleanSuccess'
>> -- Message: 'VDSGenericException: VDSErrorException: Failed to
>> HSMGetAllTasksStatusesVDS, error = Connection timed out, code = 100',
>> -- Exception: 'VDSGenericException: VDSErrorException: Failed to
>> HSMGetAllTasksStatusesVDS, error = Connection timed out, code = 100'
>> 2017-03-10 10:39:07,184+01 INFO
>> [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (DefaultQuartzScheduler2)
>> [759c8e1f] CommandAsyncTask::endActionIfNecessary: All tasks of command
>> '33df2c1e-6ce3-44fd-a39b-d111883b4c4e' has ended -> executing 'endAction'
>> 2017-03-10 10:39:07,185+01 INFO
>> [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (DefaultQuartzScheduler2)
>> [759c8e1f] CommandAsyncTask::endAction: Ending action for '1' tasks (command
>> ID: '33df2c1e-6ce3-44fd-a39b-d111883b4c4e'): calling endAction '.
>> 2017-03-10 10:39:07,185+01 INFO
>> [org.ovirt.engine.core.bll.tasks.CommandAsyncTask]
>> (org.ovirt.thread.pool-6-thread-31) [759c8e1f]
>> CommandAsyncTask::endCommandAction [within thread] context: Attempting to
>> endAction 'DestroyImage',
>> 2017-03-10 10:39:07,192+01 INFO
>> [org.ovirt.engine.core.bll.storage.disk.image.DestroyImageCommand]
>> (org.ovirt.thread.pool-6-thread-31) [759c8e1f] Command
>> [id=33df2c1e-6ce3-44fd-a39b-d111883b4c4e]: Updating status to 'FAILED', The
>> command end method logic will be executed by one of its parent commands.
>> 2017-03-10 10:39:07,192+01 INFO
>> [org.ovirt.engine.core.bll.tasks.CommandAsyncTask]
>> (org.ovirt.thread.pool-6-thread-31) [759c8e1f]
>> CommandAsyncTask::HandleEndActionResult [within thread]: endAction for
>> action type 'DestroyImage' completed, handling the result.
>> 2017-03-10 10:39:07,192+01 INFO
>> [org.ovirt.engine.core.bll.tasks.CommandAsyncTask]
>> (org.ovirt.thread.pool-6-thread-31) [759c8e1f]
>> CommandAsyncTask::HandleEndActionResult [within thread]: endAction for
>> action type 'DestroyImage' succeeded, clearing tasks.
>> 2017-03-10 10:39:07,192+01 INFO
>> [org.ovirt.engine.core.bll.tasks.SPMAsyncTask]
>> (org.ovirt.thread.pool-6-thread-31) [759c8e1f] SPMAsyncTask::ClearAsyncTask:
>> Attempting to clear task 'f351e8f6-6dd7-49aa-bf54-650d84fc6352'
>> 2017-03-10 10:39:07,193+01 INFO
>> [org.ovirt.engine.core.vdsbroker.irsbroker.SPMClearTaskVDSCommand]
>> (org.ovirt.thread.pool-6-thread-31) [759c8e1f] START,
>> SPMClearTaskVDSCommand(
>> SPMTaskGuidBaseVDSCommandParameters:{runAsync='true',
>> storagePoolId='00000001-0001-0001-0001-000000000311',
>> ignoreFailoverLimit='false',
>> taskId='f351e8f6-6dd7-49aa-bf54-650d84fc6352'}), log id: 2b7080c2
>> 2017-03-10 10:39:07,194+01 INFO
>> [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand]
>> (org.ovirt.thread.pool-6-thread-31) [759c8e1f] START,
>> HSMClearTaskVDSCommand(HostName = 2kvm2,
>> HSMTaskGuidBaseVDSCommandParameters:{runAsync='true',
>> hostId='905375e1-6de4-4fdf-b69c-b2d546f869c8',
>> taskId='f351e8f6-6dd7-49aa-bf54-650d84fc6352'}), log id: 2edff460
>> 2017-03-10 10:39:08,208+01 INFO
>> [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand]
>> (org.ovirt.thread.pool-6-thread-31) [759c8e1f] FINISH,
>> HSMClearTaskVDSCommand, log id: 2edff460
>> 2017-03-10 10:39:08,208+01 INFO
>> [org.ovirt.engine.core.vdsbroker.irsbroker.SPMClearTaskVDSCommand]
>> (org.ovirt.thread.pool-6-thread-31) [759c8e1f] FINISH,
>> SPMClearTaskVDSCommand, log id: 2b7080c2
>> 2017-03-10 10:39:08,213+01 INFO
>> [org.ovirt.engine.core.bll.tasks.SPMAsyncTask]
>> (org.ovirt.thread.pool-6-thread-31) [759c8e1f]
>> BaseAsyncTask::removeTaskFromDB: Removed task
>> 'f351e8f6-6dd7-49aa-bf54-650d84fc6352' from DataBase
>> 2017-03-10 10:39:08,213+01 INFO
>> [org.ovirt.engine.core.bll.tasks.CommandAsyncTask]
>> (org.ovirt.thread.pool-6-thread-31) [759c8e1f]
>> CommandAsyncTask::HandleEndActionResult [within thread]: Removing
>> CommandMultiAsyncTasks object for entity
>> '33df2c1e-6ce3-44fd-a39b-d111883b4c4e'
>> 2017-03-10 10:39:10,142+01 INFO
>> [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand]
>> (DefaultQuartzScheduler10) [a86dc7b5-52dc-40d4-a3b9-49d7eabbb93c] START,
>> GlusterServersListVDSCommand(HostName = 2kvm1,
>> VdsIdVDSCommandParametersBase:{runAsync='true',
>> hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id: 2e7278cb
>> 2017-03-10 10:39:11,513+01 INFO
>> [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand]
>> (DefaultQuartzScheduler10) [a86dc7b5-52dc-40d4-a3b9-49d7eabbb93c] FINISH,
>> GlusterServersListVDSCommand, return: [172.16.5.163/24:CONNECTED,
>> 16.0.0.164:CONNECTED], log id: 2e7278cb
>> 2017-03-10 10:39:11,523+01 INFO
>> [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand]
>> (DefaultQuartzScheduler10) [a86dc7b5-52dc-40d4-a3b9-49d7eabbb93c] START,
>> GlusterVolumesListVDSCommand(HostName = 2kvm1,
>> GlusterVolumesListVDSParameters:{runAsync='true',
>> hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id: 43704ef2
>> 2017-03-10 10:39:11,777+01 INFO
>> [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback]
>> (DefaultQuartzScheduler9) [67e1d8ed] Command 'RemoveSnapshot' (id:
>> '13c2cb7c-0809-4971-aceb-37ae66105ab7') waiting on child command id:
>> 'a8a3a4d5-cf7d-4423-8243-022911232508' type:'RemoveSnapshotSingleDiskLive'
>> to complete
>> 2017-03-10 10:39:11,789+01 WARN
>> [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand]
>> (DefaultQuartzScheduler9) [759c8e1f] Child command 'DESTROY_IMAGE' failed,
>> proceeding to verify
>> 2017-03-10 10:39:11,789+01 INFO
>> [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand]
>> (DefaultQuartzScheduler9) [759c8e1f] Executing Live Merge command step
>> 'DESTROY_IMAGE_CHECK'
>> 2017-03-10 10:39:11,832+01 INFO
>> [org.ovirt.engine.core.bll.DestroyImageCheckCommand] (pool-5-thread-7)
>> [4856f570] Running command: DestroyImageCheckCommand internal: true.
>> 2017-03-10 10:39:11,833+01 INFO
>> [org.ovirt.engine.core.vdsbroker.irsbroker.SPMGetVolumeInfoVDSCommand]
>> (pool-5-thread-7) [4856f570] START, SPMGetVolumeInfoVDSCommand(
>> SPMGetVolumeInfoVDSCommandParameters:{expectedEngineErrors='[VolumeDoesNotExist]',
>> runAsync='true', storagePoolId='00000001-0001-0001-0001-000000000311',
>> ignoreFailoverLimit='false',
>> storageDomainId='1603cd90-92ef-4c03-922c-cecb282fd00e',
>> imageGroupId='7543338a-3ca6-4698-bb50-c14f0bd71428',
>> imageId='50b592f7-bfba-4398-879c-8d6a19a2c000'}), log id: 2c8031f8
>> 2017-03-10 10:39:11,833+01 INFO
>> [org.ovirt.engine.core.vdsbroker.irsbroker.SPMGetVolumeInfoVDSCommand]
>> (pool-5-thread-7) [4856f570] Executing GetVolumeInfo using the current SPM
>> 2017-03-10 10:39:11,834+01 INFO
>> [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand]
>> (pool-5-thread-7) [4856f570] START, GetVolumeInfoVDSCommand(HostName =
>> 2kvm2,
>> GetVolumeInfoVDSCommandParameters:{expectedEngineErrors='[VolumeDoesNotExist]',
>> runAsync='true', hostId='905375e1-6de4-4fdf-b69c-b2d546f869c8',
>> storagePoolId='00000001-0001-0001-0001-000000000311',
>> storageDomainId='1603cd90-92ef-4c03-922c-cecb282fd00e',
>> imageGroupId='7543338a-3ca6-4698-bb50-c14f0bd71428',
>> imageId='50b592f7-bfba-4398-879c-8d6a19a2c000'}), log id: 79ca86cc
>> 2017-03-10 10:39:11,846+01 INFO
>> [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand]
>> (DefaultQuartzScheduler10) [a86dc7b5-52dc-40d4-a3b9-49d7eabbb93c] FINISH,
>> GlusterVolumesListVDSCommand, return:
>> {8ded4083-2f31-489e-a60d-a315a5eb9b22=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@7765e4ad},
>> log id: 43704ef2
>> 2017-03-10 10:39:16,858+01 INFO
>> [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand]
>> (DefaultQuartzScheduler7) [d82701d9-9fa3-467d-b273-f5fe5a93062f] START,
>> GlusterServersListVDSCommand(HostName = 2kvm1,
>> VdsIdVDSCommandParametersBase:{runAsync='true',
>> hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id: 6542adcd
>> 2017-03-10 10:39:17,394+01 INFO
>> [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand]
>> (DefaultQuartzScheduler7) [d82701d9-9fa3-467d-b273-f5fe5a93062f] FINISH,
>> GlusterServersListVDSCommand, return: [172.16.5.163/24:CONNECTED,
>> 16.0.0.164:CONNECTED], log id: 6542adcd
>> 2017-03-10 10:39:17,406+01 INFO
>> [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand]
>> (DefaultQuartzScheduler7) [d82701d9-9fa3-467d-b273-f5fe5a93062f] START,
>> GlusterVolumesListVDSCommand(HostName = 2kvm1,
>> GlusterVolumesListVDSParameters:{runAsync='true',
>> hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id: 44ec33ed
>> 2017-03-10 10:39:18,598+01 INFO
>> [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand]
>> (DefaultQuartzScheduler7) [d82701d9-9fa3-467d-b273-f5fe5a93062f] FINISH,
>> GlusterVolumesListVDSCommand, return:
>> {8ded4083-2f31-489e-a60d-a315a5eb9b22=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@7765e4ad},
>> log id: 44ec33ed
>> 2017-03-10 10:39:21,865+01 INFO
>> [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback]
>> (DefaultQuartzScheduler6) [67e1d8ed] Command 'RemoveSnapshot' (id:
>> '13c2cb7c-0809-4971-aceb-37ae66105ab7') waiting on child command id:
>> 'a8a3a4d5-cf7d-4423-8243-022911232508' type:'RemoveSnapshotSingleDiskLive'
>> to complete
>> 2017-03-10 10:39:21,881+01 INFO
>> [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommandCallback]
>> (DefaultQuartzScheduler6) [4856f570] Command 'RemoveSnapshotSingleDiskLive'
>> (id: 'a8a3a4d5-cf7d-4423-8243-022911232508') waiting on child command id:
>> 'b1d63b8e-19d3-4d64-8fa8-4eb3e2d1a8fc' type:'DestroyImageCheck' to complete
>> 2017-03-10 10:39:23,611+01 INFO
>> [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand]
>> (DefaultQuartzScheduler6) [4856f570] START,
>> GlusterServersListVDSCommand(HostName = 2kvm1,
>> VdsIdVDSCommandParametersBase:{runAsync='true',
>> hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id: 4c2fc22d
>> 2017-03-10 10:39:24,616+01 INFO
>> [org.ovirt.engine.core.vdsbroker.gluster.GlusterTasksListVDSCommand]
>> (DefaultQuartzScheduler7) [d82701d9-9fa3-467d-b273-f5fe5a93062f] START,
>> GlusterTasksListVDSCommand(HostName = 2kvm1,
>> VdsIdVDSCommandParametersBase:{runAsync='true',
>> hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id: 1f169371
>> 2017-03-10 10:39:24,618+01 INFO
>> [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand]
>> (DefaultQuartzScheduler6) [4856f570] FINISH, GlusterServersListVDSCommand,
>> return: [172.16.5.163/24:CONNECTED, 16.0.0.164:CONNECTED], log id: 4c2fc22d
>> 2017-03-10 10:39:24,629+01 INFO
>> [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand]
>> (DefaultQuartzScheduler6) [4856f570] START,
>> GlusterVolumesListVDSCommand(HostName = 2kvm1,
>> GlusterVolumesListVDSParameters:{runAsync='true',
>> hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id: 2ac55735
>> 2017-03-10 10:39:24,822+01 INFO
>> [org.ovirt.engine.core.vdsbroker.gluster.GlusterTasksListVDSCommand]
>> (DefaultQuartzScheduler7) [d82701d9-9fa3-467d-b273-f5fe5a93062f] FINISH,
>> GlusterTasksListVDSCommand, return: [], log id: 1f169371
>> 2017-03-10 10:39:26,836+01 INFO
>> [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand]
>> (DefaultQuartzScheduler6) [4856f570] FINISH, GlusterVolumesListVDSCommand,
>> return:
>> {8ded4083-2f31-489e-a60d-a315a5eb9b22=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@7765e4ad},
>> log id: 2ac55735
>> 2017-03-10 10:39:31,849+01 INFO
>> [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand]
>> (DefaultQuartzScheduler8) [fde51205-3e8b-4b84-a478-352dc444ccc4] START,
>> GlusterServersListVDSCommand(HostName = 2kvm1,
>> VdsIdVDSCommandParametersBase:{runAsync='true',
>> hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id: 2e8dbcd1
>> 2017-03-10 10:39:31,932+01 INFO
>> [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback]
>> (DefaultQuartzScheduler6) [67e1d8ed] Command 'RemoveSnapshot' (id:
>> '13c2cb7c-0809-4971-aceb-37ae66105ab7') waiting on child command id:
>> 'a8a3a4d5-cf7d-4423-8243-022911232508' type:'RemoveSnapshotSingleDiskLive'
>> to complete
>> 2017-03-10 10:39:31,944+01 INFO
>> [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommandCallback]
>> (DefaultQuartzScheduler6) [4856f570] Command 'RemoveSnapshotSingleDiskLive'
>> (id: 'a8a3a4d5-cf7d-4423-8243-022911232508') waiting on child command id:
>> 'b1d63b8e-19d3-4d64-8fa8-4eb3e2d1a8fc' type:'DestroyImageCheck' to complete
>> 2017-03-10 10:39:33,213+01 INFO
>> [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand]
>> (DefaultQuartzScheduler8) [fde51205-3e8b-4b84-a478-352dc444ccc4] FINISH,
>> GlusterServersListVDSCommand, return: [172.16.5.163/24:CONNECTED,
>> 16.0.0.164:CONNECTED], log id: 2e8dbcd1
>> 2017-03-10 10:39:33,226+01 INFO
>> [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand]
>> (DefaultQuartzScheduler8) [fde51205-3e8b-4b84-a478-352dc444ccc4] START,
>> GlusterVolumesListVDSCommand(HostName = 2kvm1,
>> GlusterVolumesListVDSParameters:{runAsync='true',
>> hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id: 1fb3f9e3
>> 2017-03-10 10:39:34,375+01 INFO
>> [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand]
>> (DefaultQuartzScheduler8) [fde51205-3e8b-4b84-a478-352dc444ccc4] FINISH,
>> GlusterVolumesListVDSCommand, return:
>> {8ded4083-2f31-489e-a60d-a315a5eb9b22=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@7765e4ad},
>> log id: 1fb3f9e3
>> 2017-03-10 10:39:39,392+01 INFO
>> [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand]
>> (DefaultQuartzScheduler9) [12d6d15f-e054-4833-bd87-58f6a51e5fa6] START,
>> GlusterServersListVDSCommand(HostName = 2kvm1,
>> VdsIdVDSCommandParametersBase:{runAsync='true',
>> hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id: 1e0b8eeb
>> 2017-03-10 10:39:40,753+01 INFO
>> [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand]
>> (DefaultQuartzScheduler9) [12d6d15f-e054-4833-bd87-58f6a51e5fa6] FINISH,
>> GlusterServersListVDSCommand, return: [172.16.5.163/24:CONNECTED,
>> 16.0.0.164:CONNECTED], log id: 1e0b8eeb
>> 2017-03-10 10:39:40,763+01 INFO
>> [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand]
>> (DefaultQuartzScheduler9) [12d6d15f-e054-4833-bd87-58f6a51e5fa6] START,
>> GlusterVolumesListVDSCommand(HostName = 2kvm1,
>> GlusterVolumesListVDSParameters:{runAsync='true',
>> hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id: 35b04b33
>> 2017-03-10 10:39:41,952+01 INFO
>> [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand]
>> (DefaultQuartzScheduler9) [12d6d15f-e054-4833-bd87-58f6a51e5fa6] FINISH,
>> GlusterVolumesListVDSCommand, return:
>> {8ded4083-2f31-489e-a60d-a315a5eb9b22=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@7765e4ad},
>> log id: 35b04b33
>> 2017-03-10 10:39:41,991+01 INFO
>> [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback]
>> (DefaultQuartzScheduler6) [67e1d8ed] Command 'RemoveSnapshot' (id:
>> '13c2cb7c-0809-4971-aceb-37ae66105ab7') waiting on child command id:
>> 'a8a3a4d5-cf7d-4423-8243-022911232508' type:'RemoveSnapshotSingleDiskLive'
>> to complete
>>
>>
>> gluster  ( nothing in logs )
>> ======
>>
>>
>> ## "etc-glusterfs-glusterd.vol.log"
>> [2017-03-10 10:13:52.599019] I [MSGID: 106499]
>> [glusterd-handler.c:4349:__glusterd_handle_status_volume] 0-management:
>> Received status volume req for volume slow1
>> [2017-03-10 10:16:48.639635] I [MSGID: 106499]
>> [glusterd-handler.c:4349:__glusterd_handle_status_volume] 0-management:
>> Received status volume req for volume slow1
>> The message "I [MSGID: 106499]
>> [glusterd-handler.c:4349:__glusterd_handle_status_volume] 0-management:
>> Received status volume req for volume slow1" repeated 3 times between
>> [2017-03-10 10:16:48.639635] and [2017-03-10 10:17:55.659379]
>> [2017-03-10 10:18:56.875516] I [MSGID: 106499]
>> [glusterd-handler.c:4349:__glusterd_handle_status_volume] 0-management:
>> Received status volume req for volume slow1
>> [2017-03-10 10:19:57.204689] I [MSGID: 106499]
>> [glusterd-handler.c:4349:__glusterd_handle_status_volume] 0-management:
>> Received status volume req for volume slow1
>> [2017-03-10 10:21:56.576879] I [MSGID: 106499]
>> [glusterd-handler.c:4349:__glusterd_handle_status_volume] 0-management:
>> Received status volume req for volume slow1
>> [2017-03-10 10:21:57.772857] I [MSGID: 106499]
>> [glusterd-handler.c:4349:__glusterd_handle_status_volume] 0-management:
>> Received status volume req for volume slow1
>> [2017-03-10 10:24:00.617931] I [MSGID: 106499]
>> [glusterd-handler.c:4349:__glusterd_handle_status_volume] 0-management:
>> Received status volume req for volume slow1
>> [2017-03-10 10:30:04.918080] I [MSGID: 106499]
>> [glusterd-handler.c:4349:__glusterd_handle_status_volume] 0-management:
>> Received status volume req for volume slow1
>> [2017-03-10 10:31:06.128638] I [MSGID: 106499]
>> [glusterd-handler.c:4349:__glusterd_handle_status_volume] 0-management:
>> Received status volume req for volume slow1
>> [2017-03-10 10:32:07.325672] I [MSGID: 106499]
>> [glusterd-handler.c:4349:__glusterd_handle_status_volume] 0-management:
>> Received status volume req for volume slow1
>> [2017-03-10 10:32:12.433586] I [MSGID: 106499]
>> [glusterd-handler.c:4349:__glusterd_handle_status_volume] 0-management:
>> Received status volume req for volume slow1
>> [2017-03-10 10:32:13.544909] I [MSGID: 106499]
>> [glusterd-handler.c:4349:__glusterd_handle_status_volume] 0-management:
>> Received status volume req for volume slow1
>> [2017-03-10 10:35:10.039213] I [MSGID: 106499]
>> [glusterd-handler.c:4349:__glusterd_handle_status_volume] 0-management:
>> Received status volume req for volume slow1
>> [2017-03-10 10:37:19.905314] I [MSGID: 106499]
>> [glusterd-handler.c:4349:__glusterd_handle_status_volume] 0-management:
>> Received status volume req for volume slow1
>> [2017-03-10 10:37:20.174209] I [MSGID: 106499]
>> [glusterd-handler.c:4349:__glusterd_handle_status_volume] 0-management:
>> Received status volume req for volume slow1
>> [2017-03-10 10:38:12.635460] I [MSGID: 106499]
>> [glusterd-handler.c:4349:__glusterd_handle_status_volume] 0-management:
>> Received status volume req for volume slow1
>> [2017-03-10 10:40:14.169864] I [MSGID: 106499]
>> [glusterd-handler.c:4349:__glusterd_handle_status_volume] 0-management:
>> Received status volume req for volume slow1
>>
>>
>> ## "rhev-data-center-mnt-glusterSD-localhost:_slow1.log"
>> [2017-03-10 09:43:40.346785] W [MSGID: 101159] [inode.c:1214:__inode_unlink]
>> 0-inode:
>> be318638-e8a0-4c6d-977d-7a937aa84806/b6f2d08d-2441-4111-ab62-e14abdfaf602.61849:
>> dentry not found in 43e6968f-9c2a-40d8-8074-caf1a36f60cf
>> [2017-03-10 09:43:40.347076] W [MSGID: 101159] [inode.c:1214:__inode_unlink]
>> 0-inode:
>> be318638-e8a0-4c6d-977d-7a937aa84806/b6f2d08d-2441-4111-ab62-e14abdfaf602.61879:
>> dentry not found in 902a6e3d-b7aa-439f-8262-cdc1b7f9f022
>> [2017-03-10 09:43:40.347145] W [MSGID: 101159] [inode.c:1214:__inode_unlink]
>> 0-inode:
>> be318638-e8a0-4c6d-977d-7a937aa84806/b6f2d08d-2441-4111-ab62-e14abdfaf602.61935:
>> dentry not found in 846bbcfc-f2b3-4ab6-af44-aeaa10b39318
>> [2017-03-10 09:43:40.347211] W [MSGID: 101159] [inode.c:1214:__inode_unlink]
>> 0-inode:
>> be318638-e8a0-4c6d-977d-7a937aa84806/b6f2d08d-2441-4111-ab62-e14abdfaf602.61922:
>> dentry not found in 66ad3bc5-26c7-4360-b33b-a084e3305cf8
>> [2017-03-10 09:43:40.351571] W [MSGID: 101159] [inode.c:1214:__inode_unlink]
>> 0-inode:
>> be318638-e8a0-4c6d-977d-7a937aa84806/b6f2d08d-2441-4111-ab62-e14abdfaf602.61834:
>> dentry not found in 3b8278e1-40e5-4363-b21e-7bffcd024c62
>> [2017-03-10 09:43:40.352449] W [MSGID: 101159] [inode.c:1214:__inode_unlink]
>> 0-inode:
>> be318638-e8a0-4c6d-977d-7a937aa84806/b6f2d08d-2441-4111-ab62-e14abdfaf602.61870:
>> dentry not found in 282f4c05-e09a-48e0-96a3-52e079ff2f73
>> [2017-03-10 09:50:38.829325] I [MSGID: 109066]
>> [dht-rename.c:1569:dht_rename] 0-slow1-dht: renaming
>> /1603cd90-92ef-4c03-922c-cecb282fd00e/images/014ca3aa-d5f5-4b88-8f84-be8d4c5dfc1e/f147532a-89fa-49e0-8225-f82343fca8be.meta.new
>> (hash=slow1-replicate-0/cache=slow1-replicate-0) =>
>> /1603cd90-92ef-4c03-922c-cecb282fd00e/images/014ca3aa-d5f5-4b88-8f84-be8d4c5dfc1e/f147532a-89fa-49e0-8225-f82343fca8be.meta
>> (hash=slow1-replicate-0/cache=slow1-replicate-0)
>> [2017-03-10 09:50:42.221775] I [MSGID: 109066]
>> [dht-rename.c:1569:dht_rename] 0-slow1-dht: renaming
>> /1603cd90-92ef-4c03-922c-cecb282fd00e/images/4cf7dd90-9dcc-428c-82bc-fbf08dbee0be/12812d56-1606-4bf8-a391-0a2cacbd020b.meta.new
>> (hash=slow1-replicate-0/cache=slow1-replicate-0) =>
>> /1603cd90-92ef-4c03-922c-cecb282fd00e/images/4cf7dd90-9dcc-428c-82bc-fbf08dbee0be/12812d56-1606-4bf8-a391-0a2cacbd020b.meta
>> (hash=slow1-replicate-0/cache=slow1-replicate-0)
>> [2017-03-10 09:50:45.956432] I [MSGID: 109066]
>> [dht-rename.c:1569:dht_rename] 0-slow1-dht: renaming
>> /1603cd90-92ef-4c03-922c-cecb282fd00e/images/3cef54b4-45b9-4f5b-82c2-fcc8def06a37/85287865-38f0-45df-9e6c-1294913cbb88.meta.new
>> (hash=slow1-replicate-0/cache=slow1-replicate-0) =>
>> /1603cd90-92ef-4c03-922c-cecb282fd00e/images/3cef54b4-45b9-4f5b-82c2-fcc8def06a37/85287865-38f0-45df-9e6c-1294913cbb88.meta
>> (hash=slow1-replicate-0/cache=slow1-replicate-0)
>> [2017-03-10 09:50:40.349563] I [MSGID: 109066]
>> [dht-rename.c:1569:dht_rename] 0-slow1-dht: renaming
>> /1603cd90-92ef-4c03-922c-cecb282fd00e/images/014ca3aa-d5f5-4b88-8f84-be8d4c5dfc1e/f147532a-89fa-49e0-8225-f82343fca8be.meta.new
>> (hash=slow1-replicate-0/cache=slow1-replicate-0) =>
>> /1603cd90-92ef-4c03-922c-cecb282fd00e/images/014ca3aa-d5f5-4b88-8f84-be8d4c5dfc1e/f147532a-89fa-49e0-8225-f82343fca8be.meta
>> (hash=slow1-replicate-0/cache=slow1-replicate-0)
>> [2017-03-10 09:50:44.503866] I [MSGID: 109066]
>> [dht-rename.c:1569:dht_rename] 0-slow1-dht: renaming
>> /1603cd90-92ef-4c03-922c-cecb282fd00e/images/4cf7dd90-9dcc-428c-82bc-fbf08dbee0be/12812d56-1606-4bf8-a391-0a2cacbd020b.meta.new
>> (hash=slow1-replicate-0/cache=slow1-replicate-0) =>
>> /1603cd90-92ef-4c03-922c-cecb282fd00e/images/4cf7dd90-9dcc-428c-82bc-fbf08dbee0be/12812d56-1606-4bf8-a391-0a2cacbd020b.meta
>> (hash=slow1-replicate-0/cache=slow1-replicate-0)
>> [2017-03-10 09:59:46.860762] W [MSGID: 101159] [inode.c:1214:__inode_unlink]
>> 0-inode:
>> be318638-e8a0-4c6d-977d-7a937aa84806/6e105aa3-a3fc-4aca-be50-78b7642c4072.6684:
>> dentry not found in d1e65eea-8758-4407-ac2e-3605dc661364
>> [2017-03-10 10:02:22.500865] W [MSGID: 101159] [inode.c:1214:__inode_unlink]
>> 0-inode:
>> be318638-e8a0-4c6d-977d-7a937aa84806/6e105aa3-a3fc-4aca-be50-78b7642c4072.8767:
>> dentry not found in e228bb28-9602-4f8e-8323-7434d77849fc
>> [2017-03-10 10:04:03.103839] W [MSGID: 101159] [inode.c:1214:__inode_unlink]
>> 0-inode:
>> be318638-e8a0-4c6d-977d-7a937aa84806/6e105aa3-a3fc-4aca-be50-78b7642c4072.9787:
>> dentry not found in 6be71632-aa36-4975-b673-1357e0355027
>> [2017-03-10 10:06:02.406385] I [MSGID: 109066]
>> [dht-rename.c:1569:dht_rename] 0-slow1-dht: renaming
>> /1603cd90-92ef-4c03-922c-cecb282fd00e/images/2a9c1c6a-f045-4dce-a47b-95a2267eef72/6f264695-0669-4b49-a2f6-e6c92482f2fb.meta.new
>> (hash=slow1-replicate-0/cache=slow1-replicate-0) =>
>> /1603cd90-92ef-4c03-922c-cecb282fd00e/images/2a9c1c6a-f045-4dce-a47b-95a2267eef72/6f264695-0669-4b49-a2f6-e6c92482f2fb.meta
>> (hash=slow1-replicate-0/cache=slow1-replicate-0)
>> ... no other record
>>
>>
>> messages
>> ========
>>
>> the following occurred several times:
>>
>> Mar 10 09:04:38 2kvm2 lvmetad: WARNING: Ignoring unsupported value for cmd.
>> Mar 10 09:04:38 2kvm2 lvmetad: WARNING: Ignoring unsupported value for cmd.
>> Mar 10 09:04:38 2kvm2 lvmetad: WARNING: Ignoring unsupported value for cmd.
>> Mar 10 09:04:38 2kvm2 lvmetad: WARNING: Ignoring unsupported value for cmd.
>> Mar 10 09:10:01 2kvm2 systemd: Started Session 274 of user root.
>> Mar 10 09:10:01 2kvm2 systemd: Starting Session 274 of user root.
>> Mar 10 09:20:02 2kvm2 systemd: Started Session 275 of user root.
>> Mar 10 09:20:02 2kvm2 systemd: Starting Session 275 of user root.
>> Mar 10 09:22:59 2kvm2 sanlock[1673]: 2017-03-10 09:22:59+0100 136031 [2576]:
>> s3 delta_renew long write time 11 sec
> Sanlock cannot write to storage
>
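
To see how long sanlock's lease renewals actually take on this host, the renewal history and a direct read of the domain's ids file can be checked - a sketch; the mount path below is assumed from the gluster log file name quoted further down:

    # recent renewal attempts and their write latency
    grep -E 'delta_renew|renewal error' /var/log/sanlock.log | tail -n 20
    # lockspaces and resources sanlock currently holds
    sanlock client status
    # time a direct read of the storage domain's ids file on the gluster mount
    time dd if=/rhev/data-center/mnt/glusterSD/localhost:_slow1/1603cd90-92ef-4c03-922c-cecb282fd00e/dom_md/ids of=/dev/null iflag=direct bs=1M
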
>> Mar 10 09:24:03 2kvm2 kernel: kswapd1: page allocation failure: order:2,
>> mode:0x104020
> Kernel cannot allocate page?
>
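
An order:2 failure in the bnx2x receive path usually means free memory is fragmented rather than exhausted (the meminfo further down shows plenty of reclaimable cache). A few things worth checking - a sketch, the value below is only an example:

    # free pages per order and zone - on order:2 failures the higher-order columns are (nearly) empty
    cat /proc/buddyinfo
    # how much memory the kernel keeps in reserve for atomic allocations
    cat /proc/sys/vm/min_free_kbytes
    # a common mitigation on a ~200 GB host is to enlarge that reserve, e.g.:
    # sysctl -w vm.min_free_kbytes=1048576
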
>> Mar 10 09:24:03 2kvm2 kernel: CPU: 42 PID: 265 Comm: kswapd1 Tainted: G
>> I    ------------   3.10.0-514.10.2.el7.x86_64 #1
>> Mar 10 09:24:03 2kvm2 kernel: Hardware name: Supermicro X10DRC/X10DRi-LN4+,
>> BIOS 1.0a 08/29/2014
>> Mar 10 09:24:03 2kvm2 kernel: 0000000000104020 00000000f7228dc9
>> ffff88301f4839d8 ffffffff816864ef
>> Mar 10 09:24:03 2kvm2 kernel: ffff88301f483a68 ffffffff81186ba0
>> 000068fc00000000 0000000000000000
>> Mar 10 09:24:03 2kvm2 kernel: fffffffffffffffc 0010402000000000
>> ffff88301567ae80 00000000f7228dc9
>> Mar 10 09:24:03 2kvm2 kernel: Call Trace:
>> Mar 10 09:24:03 2kvm2 kernel: <IRQ>  [<ffffffff816864ef>]
>> dump_stack+0x19/0x1b
>> Mar 10 09:24:03 2kvm2 kernel: [<ffffffff81186ba0>]
>> warn_alloc_failed+0x110/0x180
>> Mar 10 09:24:03 2kvm2 kernel: [<ffffffff81682083>]
>> __alloc_pages_slowpath+0x6b7/0x725
>> Mar 10 09:24:03 2kvm2 kernel: [<ffffffff8118b155>]
>> __alloc_pages_nodemask+0x405/0x420
>> Mar 10 09:24:03 2kvm2 kernel: [<ffffffff811cf30a>]
>> alloc_pages_current+0xaa/0x170
>> Mar 10 09:24:03 2kvm2 kernel: [<ffffffff81185a7e>] __get_free_pages+0xe/0x50
>> Mar 10 09:24:03 2kvm2 kernel: [<ffffffff811dabae>]
>> kmalloc_order_trace+0x2e/0xa0
>> Mar 10 09:24:03 2kvm2 kernel: [<ffffffff811dd381>] __kmalloc+0x221/0x240
>> Mar 10 09:24:03 2kvm2 kernel: [<ffffffffa02f83fa>]
>> bnx2x_frag_alloc.isra.62+0x2a/0x40 [bnx2x]
>> Mar 10 09:24:03 2kvm2 kernel: [<ffffffffa02f92f7>] bnx2x_rx_int+0x227/0x17b0
>> [bnx2x]
>> Mar 10 09:24:03 2kvm2 kernel: [<ffffffff81033669>] ? sched_clock+0x9/0x10
>> Mar 10 09:24:03 2kvm2 kernel: [<ffffffffa02fc72d>] bnx2x_poll+0x1dd/0x260
>> [bnx2x]
>> Mar 10 09:24:03 2kvm2 kernel: [<ffffffff815705e0>] net_rx_action+0x170/0x380
>> Mar 10 09:24:03 2kvm2 kernel: [<ffffffff8108f2cf>] __do_softirq+0xef/0x280
>> Mar 10 09:24:03 2kvm2 kernel: [<ffffffff8169859c>] call_softirq+0x1c/0x30
>> Mar 10 09:24:03 2kvm2 kernel: [<ffffffff8102d365>] do_softirq+0x65/0xa0
>> Mar 10 09:24:03 2kvm2 kernel: [<ffffffff8108f665>] irq_exit+0x115/0x120
>> Mar 10 09:24:03 2kvm2 kernel: [<ffffffff81699138>] do_IRQ+0x58/0xf0
>> Mar 10 09:24:03 2kvm2 kernel: [<ffffffff8168e2ad>]
>> common_interrupt+0x6d/0x6d
>> Mar 10 09:24:03 2kvm2 kernel: <EOI>  [<ffffffff81189a73>] ?
>> free_hot_cold_page+0x103/0x160
>> Mar 10 09:24:03 2kvm2 kernel: [<ffffffff81189b16>]
>> free_hot_cold_page_list+0x46/0xa0
>> Mar 10 09:24:03 2kvm2 kernel: [<ffffffff81195193>]
>> shrink_page_list+0x543/0xb00
>> Mar 10 09:24:03 2kvm2 kernel: [<ffffffff81195dda>]
>> shrink_inactive_list+0x1fa/0x630
>> Mar 10 09:24:03 2kvm2 kernel: [<ffffffff81196975>] shrink_lruvec+0x385/0x770
>> Mar 10 09:24:03 2kvm2 kernel: [<ffffffff81196dd6>] shrink_zone+0x76/0x1a0
>> Mar 10 09:24:03 2kvm2 kernel: [<ffffffff8119807c>] balance_pgdat+0x48c/0x5e0
>> Mar 10 09:24:03 2kvm2 kernel: [<ffffffff81198343>] kswapd+0x173/0x450
>> Mar 10 09:24:03 2kvm2 kernel: [<ffffffff810b17d0>] ?
>> wake_up_atomic_t+0x30/0x30
>> Mar 10 09:24:03 2kvm2 kernel: [<ffffffff811981d0>] ?
>> balance_pgdat+0x5e0/0x5e0
>> Mar 10 09:24:03 2kvm2 kernel: [<ffffffff810b06ff>] kthread+0xcf/0xe0
>> Mar 10 09:24:03 2kvm2 kernel: [<ffffffff810b0630>] ?
>> kthread_create_on_node+0x140/0x140
>> Mar 10 09:24:03 2kvm2 kernel: [<ffffffff81696a58>] ret_from_fork+0x58/0x90
>> Mar 10 09:24:03 2kvm2 kernel: [<ffffffff810b0630>] ?
>> kthread_create_on_node+0x140/0x140
>> Mar 10 09:24:03 2kvm2 kernel: kswapd1: page allocation failure: order:2,
>> mode:0x104020
>> Mar 10 09:24:03 2kvm2 kernel: CPU: 42 PID: 265 Comm: kswapd1 Tainted: G
>> I    ------------   3.10.0-514.10.2.el7.x86_64 #1
>> Mar 10 09:24:03 2kvm2 kernel: Hardware name: Supermicro X10DRC/X10DRi-LN4+,
>> BIOS 1.0a 08/29/2014
>> Mar 10 09:24:03 2kvm2 kernel: 0000000000104020 00000000f7228dc9
>> ffff88301f4839d8 ffffffff816864ef
>>
>>
>> and again at the critical time:
>>
>> Mar 10 10:37:53 2kvm2 sanlock[1673]: 2017-03-10 10:37:53+0100 140524 [1673]:
>> s3 check_our_lease warning 73 last_success 140451
>> Mar 10 10:37:54 2kvm2 sanlock[1673]: 2017-03-10 10:37:54+0100 140525 [1673]:
>> s3 check_our_lease warning 74 last_success 140451
> Sanlock could not renew the lease for 74 seconds
>
>> Mar 10 10:37:54 2kvm2 wdmd[1732]: test warning now 140526 ping 140516 close
>> 0 renewal 140451 expire 140531 client 1673
>> sanlock_1603cd90-92ef-4c03-922c-cecb282fd00e:1
>> Mar 10 10:37:54 2kvm2 kernel: watchdog watchdog0: watchdog did not stop!
>> Mar 10 10:37:54 2kvm2 wdmd[1732]: /dev/watchdog0 closed unclean
>> Mar 10 10:37:55 2kvm2 sanlock[1673]: 2017-03-10 10:37:55+0100 140526 [1673]:
>> s3 check_our_lease warning 75 last_success 140451
>> Mar 10 10:37:55 2kvm2 wdmd[1732]: test warning now 140527 ping 140516 close
>> 140526 renewal 140451 expire 140531 client 1673
>> sanlock_1603cd90-92ef-4c03-922c-cecb282fd00e:1
>> Mar 10 10:37:56 2kvm2 sanlock[1673]: 2017-03-10 10:37:56+0100 140527 [1673]:
>> s3 check_our_lease warning 76 last_success 140451
>> Mar 10 10:37:56 2kvm2 wdmd[1732]: test warning now 140528 ping 140516 close
>> 140526 renewal 140451 expire 140531 client 1673
>> sanlock_1603cd90-92ef-4c03-922c-cecb282fd00e:1
>> Mar 10 10:37:57 2kvm2 sanlock[1673]: 2017-03-10 10:37:57+0100 140528 [1673]:
>> s3 check_our_lease warning 77 last_success 140451
>> Mar 10 10:37:57 2kvm2 wdmd[1732]: test warning now 140529 ping 140516 close
>> 140526 renewal 140451 expire 140531 client 1673
>> sanlock_1603cd90-92ef-4c03-922c-cecb282fd00e:1
>> Mar 10 10:37:58 2kvm2 sanlock[1673]: 2017-03-10 10:37:58+0100 140529 [1673]:
>> s3 check_our_lease warning 78 last_success 140451
>> Mar 10 10:37:58 2kvm2 wdmd[1732]: test warning now 140530 ping 140516 close
>> 140526 renewal 140451 expire 140531 client 1673
>> sanlock_1603cd90-92ef-4c03-922c-cecb282fd00e:1
>> Mar 10 10:37:59 2kvm2 sanlock[1673]: 2017-03-10 10:37:59+0100 140530 [1673]:
>> s3 check_our_lease warning 79 last_success 140451
>> Mar 10 10:37:59 2kvm2 wdmd[1732]: test failed rem 55 now 140531 ping 140516
>> close 140526 renewal 140451 expire 140531 client 1673
>> sanlock_1603cd90-92ef-4c03-922c-cecb282fd00e:1
>> Mar 10 10:38:00 2kvm2 sanlock[1673]: 2017-03-10 10:38:00+0100 140531 [1673]:
>> s3 check_our_lease failed 80
> Sanlock failed to renew the lease after 80 seconds - game over
>
>> Mar 10 10:38:00 2kvm2 sanlock[1673]: 2017-03-10 10:38:00+0100 140531 [1673]:
>> s3 all pids clear
> If this host is the SPM, sanlock has just killed vdsm; this explains why
> your storage operations fail.
>
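
Whether that is what happened can be confirmed on the host itself - a sketch; the engine UI also shows which host currently holds SPM:

    # did sanlock escalate from lease warnings to killing the lease holder (vdsm)?
    grep -E 'check_our_lease|kill' /var/log/sanlock.log | tail -n 40
    # was vdsmd restarted around 10:38?
    systemctl status vdsmd
    journalctl -u vdsmd --since 10:30 --until 11:00
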
>> Mar 10 10:38:01 2kvm2 wdmd[1732]: /dev/watchdog0 reopen
>> Mar 10 10:38:10 2kvm2 journal: Cannot start job (query, none) for domain
>> TEST-LBS_EBSAPP; current job is (query, none) owned by (3284
>> remoteDispatchConnectGetAllDomainStats, 0 <null>) for (62s, 0s)
>> Mar 10 10:38:10 2kvm2 journal: Timed out during operation: cannot acquire
>> state change lock (held by remoteDispatchConnectGetAllDomainStats)
>> Mar 10 10:38:11 2kvm2 journal: vdsm vds.dispatcher ERROR SSL error receiving
>> from <yajsonrpc.betterAsyncore.Dispatcher connected ('::1', 40590, 0, 0) at
>> 0x3acdd88>: unexpected eof
>> Mar 10 10:38:40 2kvm2 journal: Cannot start job (query, none) for domain
>> TEST1-LBS_ATRYA; current job is (query, none) owned by (3288
>> remoteDispatchConnectGetAllDomainStats, 0 <null>) for (47s, 0s)
>> Mar 10 10:38:40 2kvm2 journal: Timed out during operation: cannot acquire
>> state change lock (held by remoteDispatchConnectGetAllDomainStats)
>> Mar 10 10:38:41 2kvm2 journal: vdsm vds.dispatcher ERROR SSL error receiving
>> from <yajsonrpc.betterAsyncore.Dispatcher connected ('::1', 40592, 0, 0) at
>> 0x3fd5b90>: unexpected eof
>> Mar 10 10:39:10 2kvm2 journal: Cannot start job (query, none) for domain
>> TEST-LBS_EBSAPP; current job is (query, none) owned by (3284
>> remoteDispatchConnectGetAllDomainStats, 0 <null>) for (122s, 0s)
>> Mar 10 10:39:10 2kvm2 journal: Timed out during operation: cannot acquire
>> state change lock (held by remoteDispatchConnectGetAllDomainStats)
>> Mar 10 10:39:10 2kvm2 journal: Cannot start job (query, none) for domain
>> TEST1-LBS_ATRYA; current job is (query, none) owned by (3288
>> remoteDispatchConnectGetAllDomainStats, 0 <null>) for (77s, 0s)
>> Mar 10 10:39:10 2kvm2 journal: Timed out during operation: cannot acquire
>> state change lock (held by remoteDispatchConnectGetAllDomainStats)
>> Mar 10 10:39:11 2kvm2 journal: vdsm vds.dispatcher ERROR SSL error receiving
>> from <yajsonrpc.betterAsyncore.Dispatcher connected ('::1', 40594, 0, 0) at
>> 0x2447290>: unexpected eof
>> Mar 10 10:39:23 2kvm2 sanlock[1673]: 2017-03-10 10:39:23+0100 140615 [2576]:
>> s3 delta_renew write time 140 error -202
>> Mar 10 10:39:23 2kvm2 sanlock[1673]: 2017-03-10 10:39:23+0100 140615 [2576]:
>> s3 renewal error -202 delta_length 144 last_success 140451
>> Mar 10 10:39:40 2kvm2 journal: Cannot start job (query, none) for domain
>> TEST-LBS_EBSAPP; current job is (query, none) owned by (3284
>> remoteDispatchConnectGetAllDomainStats, 0 <null>) for (152s, 0s)
>> Mar 10 10:39:40 2kvm2 journal: Timed out during operation: cannot acquire
>> state change lock (held by remoteDispatchConnectGetAllDomainStats)
>> Mar 10 10:39:40 2kvm2 journal: Cannot start job (query, none) for domain
>> TEST1-LBS_ATRYA; current job is (query, none) owned by (3288
>> remoteDispatchConnectGetAllDomainStats, 0 <null>) for (107s, 0s)
>> Mar 10 10:39:40 2kvm2 journal: Timed out during operation: cannot acquire
>> state change lock (held by remoteDispatchConnectGetAllDomainStats)
>> Mar 10 10:39:41 2kvm2 journal: vdsm vds.dispatcher ERROR SSL error receiving
>> from <yajsonrpc.betterAsyncore.Dispatcher connected ('::1', 40596, 0, 0) at
>> 0x2472ef0>: unexpected eof
>> Mar 10 10:39:49 2kvm2 kernel: INFO: task qemu-img:42107 blocked for more
>> than 120 seconds.
> qemu-img is blocked for more than 120 seconds.
>
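
When this happens again, the stack of the stuck writer and the responsiveness of the FUSE mount can be captured directly - a sketch; the mount path is assumed from the gluster log file name above, and sysrq must be enabled:

    # dump kernel stacks of all blocked (D-state) tasks into the kernel log
    echo w > /proc/sysrq-trigger
    dmesg | tail -n 100
    # kernel stack of the qemu-img PID reported above
    cat /proc/42107/stack
    # is the gluster FUSE mount still answering at all?
    time ls /rhev/data-center/mnt/glusterSD/localhost:_slow1
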
>> Mar 10 10:39:49 2kvm2 kernel: "echo 0 >
>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> Mar 10 10:39:49 2kvm2 kernel: qemu-img        D ffff88010dad3e30     0 42107
>> 3631 0x00000080
>> Mar 10 10:39:49 2kvm2 kernel: ffff88010dad3b30 0000000000000082
>> ffff8814491f4e70 ffff88010dad3fd8
>> Mar 10 10:39:49 2kvm2 kernel: ffff88010dad3fd8 ffff88010dad3fd8
>> ffff8814491f4e70 ffff88301f096c40
>> Mar 10 10:39:49 2kvm2 kernel: 0000000000000000 7fffffffffffffff
>> ffff88181f186c00 ffff88010dad3e30
>> Mar 10 10:39:49 2kvm2 kernel: Call Trace:
>> Mar 10 10:39:49 2kvm2 kernel: [<ffffffff8168bbb9>] schedule+0x29/0x70
>> Mar 10 10:39:49 2kvm2 kernel: [<ffffffff81689609>]
>> schedule_timeout+0x239/0x2d0
>> Mar 10 10:39:49 2kvm2 kernel: [<ffffffff8168b15e>]
>> io_schedule_timeout+0xae/0x130
>> Mar 10 10:39:49 2kvm2 kernel: [<ffffffff8168b1f8>] io_schedule+0x18/0x20
>> Mar 10 10:39:49 2kvm2 kernel: [<ffffffff8124d9e5>]
>> wait_on_sync_kiocb+0x35/0x80
>> Mar 10 10:39:49 2kvm2 kernel: [<ffffffffa0a36091>]
>> fuse_direct_IO+0x231/0x380 [fuse]
>> Mar 10 10:39:49 2kvm2 kernel: [<ffffffff812a6ddd>] ?
>> cap_inode_need_killpriv+0x2d/0x40
>> Mar 10 10:39:49 2kvm2 kernel: [<ffffffff812a8cb6>] ?
>> security_inode_need_killpriv+0x16/0x20
>> Mar 10 10:39:49 2kvm2 kernel: [<ffffffff81219e3f>] ?
>> dentry_needs_remove_privs.part.13+0x1f/0x30
>> Mar 10 10:39:49 2kvm2 kernel: [<ffffffff81182a2d>]
>> generic_file_direct_write+0xcd/0x190
>> Mar 10 10:39:49 2kvm2 kernel: [<ffffffffa0a36905>]
>> fuse_file_aio_write+0x185/0x340 [fuse]
>> Mar 10 10:39:49 2kvm2 kernel: [<ffffffff811fdabd>] do_sync_write+0x8d/0xd0
>> Mar 10 10:39:49 2kvm2 kernel: [<ffffffff811fe32d>] vfs_write+0xbd/0x1e0
>> Mar 10 10:39:49 2kvm2 kernel: [<ffffffff811ff002>] SyS_pwrite64+0x92/0xc0
>> Mar 10 10:39:49 2kvm2 kernel: [<ffffffff81696b09>]
>> system_call_fastpath+0x16/0x1b
>> Mar 10 10:39:49 2kvm2 kernel: INFO: task qemu-img:42111 blocked for more
>> than 120 seconds.
>> Mar 10 10:39:49 2kvm2 kernel: "echo 0 >
>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> Mar 10 10:39:49 2kvm2 kernel: qemu-img        D ffff8818a76e7e30     0 42111
>> 3632 0x00000080
>> Mar 10 10:39:49 2kvm2 kernel: ffff8818a76e7b30 0000000000000082
>> ffff88188aaeaf10 ffff8818a76e7fd8
>> Mar 10 10:39:49 2kvm2 kernel: ffff8818a76e7fd8 ffff8818a76e7fd8
>> ffff88188aaeaf10 ffff88301f156c40
>>
>> memory
>> =======
>>
>> # cat /proc/meminfo
>> MemTotal:       197983472 kB
>> MemFree:          834228 kB
>> MemAvailable:   165541204 kB
>> Buffers:           45548 kB
>> Cached:         159596272 kB
>> SwapCached:       119872 kB
>> Active:         40803264 kB
>> Inactive:       148022076 kB
>> Active(anon):   26594112 kB
>> Inactive(anon):  2626384 kB
>> Active(file):   14209152 kB
>> Inactive(file): 145395692 kB
>> Unevictable:       50488 kB
>> Mlocked:           50488 kB
>> SwapTotal:       4194300 kB
>> SwapFree:        3612188 kB
>> Dirty:               624 kB
>> Writeback:             0 kB
>> AnonPages:      29185032 kB
>> Mapped:            85176 kB
>> Shmem:             25908 kB
>> Slab:            6203384 kB
>> SReclaimable:    5857240 kB
>> SUnreclaim:       346144 kB
>> KernelStack:       19184 kB
>> PageTables:        86100 kB
>> NFS_Unstable:          0 kB
>> Bounce:                0 kB
>> WritebackTmp:          0 kB
>> CommitLimit:    103186036 kB
>> Committed_AS:   52300288 kB
>> VmallocTotal:   34359738367 kB
>> VmallocUsed:     1560580 kB
>> VmallocChunk:   34257341440 kB
>> HardwareCorrupted:     0 kB
>> AnonHugePages:   5566464 kB
>> HugePages_Total:       0
>> HugePages_Free:        0
>> HugePages_Rsvd:        0
>> HugePages_Surp:        0
>> Hugepagesize:       2048 kB
>> DirectMap4k:      431292 kB
>> DirectMap2M:    19382272 kB
>> DirectMap1G:    183500800 kB
>>
>>
>> Can anybody help me with this ??
>> I've got a small hint about a swap problem ( in the messages log ), but I'm not sure .....
>> A similar problem occurred in older gluster/ovirt versions during testing ( a
>> freeze under heavy workload - but not a fatal overload )
> You have a storage issue; you should understand why
> your storage is failing.
>
> There is also a kernel failure to allocate a page - maybe this is
> related to the storage failure?
>
> Nir
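
For what it is worth, the meminfo above does not show real memory pressure: roughly 159 GB of the 196 GB is page cache, MemAvailable is about 165 GB, and only ~580 MB of the 4 GB swap is in use. Together with the order:2 allocation failures this points more towards memory fragmentation and writeback stalls under the heavy sequential IO than towards the VMs eating the memory. The relevant knobs to review - a sketch that only reads current values; the commented line is an example, not a recommendation to apply blindly:

    # reclaim / writeback tunables worth reviewing on a 196 GB host
    sysctl vm.min_free_kbytes vm.swappiness vm.dirty_background_ratio vm.dirty_ratio
    # on big-memory hosts byte-based dirty limits are often easier to reason about, e.g.:
    # sysctl -w vm.dirty_background_bytes=268435456 vm.dirty_bytes=1073741824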
