error: "cannot set lock, no free lockspace" (localized)

Hi all. I have an HCI setup: GlusterFS 3.12, oVirt 4.2.7, 4 nodes. Yesterday the engine detected 3 VMs as "not responding" (they are marked as HA VMs; all of them were located on the ovirtnode1 server). Two of them were restarted by the engine on other nodes successfully, but one was not. All of them got a LOCALIZED message like 'cannot set lock: no free space on device'. What is this? Why does the engine get these errors, and why can some VMs restart automatically while others cannot (but are successfully restarted by the user via the web UI after some pause)? Who knows, what do you think? Full engine logs can be uploaded.

From engine.log, the start of the event:
---------------
2019-02-26 17:04:05,308+04 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedThreadFactory-engineScheduled-Thread-59) [] VM 'd546add1-126a-4490-bc83-469bab659854'(openfire.miac) moved from 'Up' --> 'NotResponding'
2019-02-26 17:04:05,865+04 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-59) [] EVENT_ID: VM_NOT_RESPONDING(126), VM openfire.miac is not responding.
2019-02-26 17:04:05,865+04 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedThreadFactory-engineScheduled-Thread-59) [] VM '7a3af2e7-8296-4fe0-ac55-c52a4b1de93f'(e-l-k.miac) moved from 'Up' --> 'NotResponding'
2019-02-26 17:04:05,894+04 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-59) [] EVENT_ID: VM_NOT_RESPONDING(126), VM e-l-k.miac is not responding.
2019-02-26 17:04:05,895+04 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedThreadFactory-engineScheduled-Thread-59) [] VM 'de76aa6c-a211-41de-8d85-7d2821c3980d'(tsgr-mon) moved from 'Up' --> 'NotResponding'
2019-02-26 17:04:05,926+04 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-59) [] EVENT_ID: VM_NOT_RESPONDING(126), VM tsgr-mon is not responding.
---------------------------
2019-02-26 17:04:22,237+04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-9) [] EVENT_ID: VM_DOWN_ERROR(119), VM openfire.miac is down with error. Exit message: VM has been terminated on the host.
....
2019-02-26 17:04:22,374+04 INFO [org.ovirt.engine.core.bll.VdsEventListener] (ForkJoinPool-1-worker-9) [] Highly Available VM went down. Attempting to restart. VM Name 'openfire.miac', VM Id 'd546add1-126a-4490-bc83-469bab659854'
...
2019-02-26 17:04:27,737+04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-5) [] EVENT_ID: VM_DOWN_ERROR(119), VM openfire.miac is down with error. Exit message: resource busy: Failed to acquire lock: Lease is held by another host.
...
2019-02-26 17:04:28,350+04 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-2886073) [] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM openfire.miac on Host ovirtnode6.miac
2019-02-26 17:04:31,841+04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-2) [] EVENT_ID: VM_DOWN_ERROR(119), VM openfire.miac is down with error. Exit message: resource busy: Failed to acquire lock: Lease is held by another host.
2019-02-26 17:04:31,877+04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-2886082) [] EVENT_ID: VDS_INITIATED_RUN_VM_FAILED(507), Failed to restart VM openfire.miac on Host ovirtnode6.miac
...
2019-02-26 17:04:31,994+04 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-2886082) [] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM openfire.miac on Host ovirtnode1.miac
.....
2019-02-26 17:04:36,054+04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-9) [] EVENT_ID: VM_DOWN_ERROR(119), VM openfire.miac is down with error. Exit message: Не удалось установить блокировку: На устройстве не осталось свободного места.
2019-02-26 17:04:36,054+04 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-9) [] add VM 'd546add1-126a-4490-bc83-469bab659854'(openfire.miac) to rerun treatment
2019-02-26 17:04:36,091+04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-2886083) [] EVENT_ID: VDS_INITIATED_RUN_VM_FAILED(507), Failed to restart VM openfire.miac on Host ovirtnode1.miac
-------------------
No more attempts were made for this VM (its state is now 'Down').

The engine tried to restart this VM on some other nodes and got 'Lease is held by another host' (is that normal, because the timeout for the lock had not expired yet?), and then got the LOCALIZED message 'cannot set lock: no free space on device'. Why is that one localized while all the others are not?

Which 'device' does it mean, and how can I see how much 'free space' it has? The gluster volumes are almost empty (one way to check the lock side of this is sketched right after this message):
ovirtstor1.miac:/engine 150G 7,5G 143G 5% /rhev/data-center/mnt/glusterSD/ovirtstor1.miac:_engine
ovirtnode1.miac:/data 5,9T 42G 5,9T 1% /rhev/data-center/mnt/glusterSD/ovirtnode1.miac:_data
ovirtnode1.miac:/vmstore 4,4T 194G 4,2T 5% /rhev/data-center/mnt/glusterSD/ovirtnode1.miac:_vmstore
ovirtnode1.miac:/iso_storage 600G 1,3G 599G 1% /rhev/data-center/mnt/glusterSD/ovirtnode1.miac:_iso__storage
ovirtnode1.miac:/export 5,1T 14G 5,1T 1% /rhev/data-center/mnt/glusterSD/ovirtnode1.miac:_export

Other VMs also got this error about free space:
-------------
2019-02-26 17:05:05,142+04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-11) [] EVENT_ID: VM_DOWN_ERROR(119), VM tsgr-mon is down with error. Exit message: Не удалось установить блокировку: На устройстве не осталось свободного места.
2019-02-26 17:05:43,526+04 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-11) [] VM 'de76aa6c-a211-41de-8d85-7d2821c3980d'(tsgr-mon) moved from 'PoweringUp' --> 'Up'
2019-02-26 17:05:43,537+04 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-11) [] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM tsgr-mon on Host ovirtnode6.miac
----------------
And that is the last message about this VM; its state is now 'Up'.

After that, all VMs were migrated off this node (ovirtnode1), and later in the log there are records about its unresponsiveness:
2019-02-26 17:07:43,584+04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler9) [35e05907] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM ovirtnode1.miac command GetGlusterVolumeHealInfoVDS failed: Message timeout which can be caused by communication issues
2019-02-26 17:07:43,592+04 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-2886194) [35e05907] EVENT_ID: VDS_HOST_NOT_RESPONDING_CONNECTING(9,008), Host ovirtnode1.miac is not responding. It will stay in Connecting state for a grace period of 60 seconds and after that an attempt to fence the host will be issued.
2019-02-26 17:07:49,684+04 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-99) [4511cab1] EVENT_ID: FENCE_OPERATION_USING_AGENT_AND_PROXY_STARTED(9,020), Executing power management status on Host ovirtnode1.miac using Proxy Host ovirtnode5.miac and Fence Agent ipmilan:172.16.16.1.
2019-02-26 17:07:49,685+04 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.FenceVdsVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-99) [4511cab1] START, FenceVdsVDSCommand(HostName = ovirtnode5.miac, ... action='STATUS', agent='FenceAgent:{id='338cbdd9-aab6-484c-a906-21e747a96a84', ... type='ipmilan', ip='172.16.16.1', port='null', user='admin', password='***', encryptOptions='false', options=''}', policy='null'}), log id: 281e6c
2019-02-26 17:07:49,942+04 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.FenceVdsVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-99) [4511cab1] FINISH, FenceVdsVDSCommand, return: FenceOperationResult:{status='SUCCESS', powerStatus='ON', message=''}, log id: 281e6c
2019-02-26 17:07:56,131+04 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-99) [317316e4] EVENT_ID: SYNC_STORAGE_DEVICES_IN_HOST(4,126), Manually synced the storage devices from host ovirtnode1.miac
2019-02-26 17:07:56,550+04 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-99) [3a88be4f] EVENT_ID: VDS_DETECTED(13), Status of host ovirtnode1.miac was set to Up.
2019-02-26 17:07:56,554+04 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-99) [3a88be4f] EVENT_ID: VDS_FENCE_STATUS(496), Host ovirtnode1.miac power management was verified successfully.
-------------
After all this, today I tried to start the VM that was not restarted automatically:
-------------
2019-02-27 11:23:30,139+04 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-2921212) [8e405cb9-3a3a-42ab-ae05-5aaf40194b30] EVENT_ID: USER_STARTED_VM(153), VM openfire.miac was started by admin@internal-authz (Host: ovirtnode1.miac).
.....
2019-02-27 11:24:12,489+04 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-1) [] VM 'd546add1-126a-4490-bc83-469bab659854'(openfire.miac) moved from 'PoweringUp' --> 'Up'
2019-02-27 11:24:12,502+04 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-1) [] EVENT_ID: USER_RUN_VM(32), VM openfire.miac started on Host ovirtnode1.miac
---------------
It started without problems on the same host, ovirtnode1. Strange.

-- Mike
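A minimal checking sketch for the 'free space' question above, assuming the message comes from libvirt's sanlock lock manager while it acquires the VM lease (an assumption, not something confirmed in this thread): besides df on the gluster mount, one can look at the sanlock lockspaces and the storage domain's lease files. Paths follow the df output above; <domain-uuid> is a placeholder for the storage domain UUID.

# free space on the data volume itself
df -h /rhev/data-center/mnt/glusterSD/ovirtnode1.miac:_vmstore

# lockspaces (s lines) and resource leases (r lines) sanlock currently holds on this host
sanlock client status

# read-only dump of the domain's on-disk lease files
# (host delta leases in dom_md/ids, VM leases in dom_md/xleases)
sanlock direct dump /rhev/data-center/mnt/glusterSD/ovirtnode1.miac:_vmstore/<domain-uuid>/dom_md/ids
sanlock direct dump /rhev/data-center/mnt/glusterSD/ovirtnode1.miac:_vmstore/<domain-uuid>/dom_md/xleases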

Any errors in vdsm.log or the gluster mount log for this volume?
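For reference, a sketch of where these logs usually live on an oVirt node; the paths are the common defaults and the gluster mount-log file name is assumed to mirror the mount point, so adjust to the actual layout.

# vdsm warnings/errors around the incident window (local time)
grep -E ' (WARN|ERROR) ' /var/log/vdsm/vdsm.log | grep '2019-02-26 17:0'

# gluster FUSE mount logs; the file name normally mirrors the mount point
ls /var/log/glusterfs/
grep -E '\] (E|W) \[' /var/log/glusterfs/rhev-data-center-mnt-glusterSD-ovirtnode1.miac:_vmstore.log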

01.03.2019 9:51, Sahina Bose wrote:
Any errors in vdsm.log or the gluster mount log for this volume?
I cannot find any. Here are the full logs from one node for that period: https://yadi.sk/d/BzLBb8VGNEwidw (file name: ovirtnode1-logs-260219.tar.gz; gluster and vdsm logs for all volumes).

Here is the current sanlock client status output (can it contain any useful info for the "cannot set lock" error?).

Node without any VMs:
[root@ovirtnode5 ~]# sanlock client status
daemon 165297fa-c9e7-47ec-8949-80f39f52304c.ovirtnode5
p -1 helper
p -1 listener
p -1 status
s 01f6fd06-9ad1-4957-bcda-df24dc4cc4f5:2:/rhev/data-center/mnt/glusterSD/ovirtnode1.miac\:_vmstore/01f6fd06-9ad1-4957-bcda-df24dc4cc4f5/dom_md/ids:0
s 64f18bf1-4eb6-4b3e-a216-9681091a3bc7:2:/rhev/data-center/mnt/glusterSD/ovirtnode1.miac\:_data/64f18bf1-4eb6-4b3e-a216-9681091a3bc7/dom_md/ids:0
s hosted-engine:2:/var/run/vdsm/storage/0571ac7b-a28e-4e20-9cd8-4803e40ec602/1c7d4c4d-4ae4-4743-a61c-1437459dcc14/699eec1d-c713-4e66-8587-27792d9a2b32:0
s 0571ac7b-a28e-4e20-9cd8-4803e40ec602:2:/rhev/data-center/mnt/glusterSD/ovirtstor1.miac\:_engine/0571ac7b-a28e-4e20-9cd8-4803e40ec602/dom_md/ids:0

Node with VMs:
[root@ovirtnode1 /]# sanlock client status
daemon 71784659-0fac-4802-8c0d-0efe3ab977d9.ovirtnode1
p -1 helper
p -1 listener
p 36456 miac_serv2
p 48024 miac_gitlab_runner
p 10151
p 50624 e-l-k.miac
p 455336 openfire.miac
p 456445 miac_serv3
p 458384 debian9_2
p -1 status
s 01f6fd06-9ad1-4957-bcda-df24dc4cc4f5:1:/rhev/data-center/mnt/glusterSD/ovirtnode1.miac\:_vmstore/01f6fd06-9ad1-4957-bcda-df24dc4cc4f5/dom_md/ids:0
s 64f18bf1-4eb6-4b3e-a216-9681091a3bc7:1:/rhev/data-center/mnt/glusterSD/ovirtnode1.miac\:_data/64f18bf1-4eb6-4b3e-a216-9681091a3bc7/dom_md/ids:0
s hosted-engine:1:/var/run/vdsm/storage/0571ac7b-a28e-4e20-9cd8-4803e40ec602/1c7d4c4d-4ae4-4743-a61c-1437459dcc14/699eec1d-c713-4e66-8587-27792d9a2b32:0
s 0571ac7b-a28e-4e20-9cd8-4803e40ec602:1:/rhev/data-center/mnt/glusterSD/ovirtstor1.miac\:_engine/0571ac7b-a28e-4e20-9cd8-4803e40ec602/dom_md/ids:0
r 01f6fd06-9ad1-4957-bcda-df24dc4cc4f5:b19996be-1548-41ad-afe3-1726ee38d368:/rhev/data-center/mnt/glusterSD/ovirtnode1.miac\:_vmstore/01f6fd06-9ad1-4957-bcda-df24dc4cc4f5/dom_md/xleases:13631488:7 p 458384
r 01f6fd06-9ad1-4957-bcda-df24dc4cc4f5:4507a184-e158-484e-932a-2f1266b80223:/rhev/data-center/mnt/glusterSD/ovirtnode1.miac\:_vmstore/01f6fd06-9ad1-4957-bcda-df24dc4cc4f5/dom_md/xleases:7340032:7 p 456445
r 01f6fd06-9ad1-4957-bcda-df24dc4cc4f5:d546add1-126a-4490-bc83-469bab659854:/rhev/data-center/mnt/glusterSD/ovirtnode1.miac\:_vmstore/01f6fd06-9ad1-4957-bcda-df24dc4cc4f5/dom_md/xleases:19922944:6 p 455336
r 0571ac7b-a28e-4e20-9cd8-4803e40ec602:SDM:/rhev/data-center/mnt/glusterSD/ovirtstor1.miac\:_engine/0571ac7b-a28e-4e20-9cd8-4803e40ec602/dom_md/leases:1048576:10 p 10151
r 01f6fd06-9ad1-4957-bcda-df24dc4cc4f5:7a3af2e7-8296-4fe0-ac55-c52a4b1de93f:/rhev/data-center/mnt/glusterSD/ovirtnode1.miac\:_vmstore/01f6fd06-9ad1-4957-bcda-df24dc4cc4f5/dom_md/xleases:17825792:5 p 50624
r 01f6fd06-9ad1-4957-bcda-df24dc4cc4f5:4c2aaf48-a3f1-45a1-9c2b-912763643268:/rhev/data-center/mnt/glusterSD/ovirtnode1.miac\:_vmstore/01f6fd06-9ad1-4957-bcda-df24dc4cc4f5/dom_md/xleases:10485760:4 p 48024
r 01f6fd06-9ad1-4957-bcda-df24dc4cc4f5:6c380073-9650-4832-8416-3001c5a172ab:/rhev/data-center/mnt/glusterSD/ovirtnode1.miac\:_vmstore/01f6fd06-9ad1-4957-bcda-df24dc4cc4f5/dom_md/xleases:6291456:6 p 36456
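A rough reading of the output above (my interpretation, not authoritative): the 's' lines are lockspaces this host has joined (the delta-lease 'ids' file of each storage domain, with host_id 2 on ovirtnode5 and host_id 1 on ovirtnode1), and the 'r' lines are resource leases held by a local process; for example, the openfire.miac lease d546add1-126a-4490-bc83-469bab659854 is held by PID 455336 at offset 19922944 of the vmstore domain's xleases file. A small sketch for drilling into that single lease; the <path>:<offset> form of sanlock direct dump follows sanlock(8) and should be verified on your version.

# which local process holds the openfire.miac lease (PID taken from the 'r' line above)
ps -fp 455336

# read-only dump of that one on-disk lease record
# (path and offset copied from the 'r' line; syntax per sanlock(8))
sanlock direct dump /rhev/data-center/mnt/glusterSD/ovirtnode1.miac:_vmstore/01f6fd06-9ad1-4957-bcda-df24dc4cc4f5/dom_md/xleases:19922944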

Looking at the logs, in the vdsm log:

2019-02-26 17:07:09,440+0400 ERROR (check/loop) [storage.Monitor] Error checking path /rhev/data-center/mnt/glusterSD/ovirtnode1.miac:_vmstore/01f6fd06-9ad1-4957-bcda-df24dc4cc4f5/dom_md/metadata (monitor:498)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 496, in _pathChecked
    delay = result.delay()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/check.py", line 391, in delay
    raise exception.MiscFileReadException(self.path, self.rc, self.err)
MiscFileReadException: Internal file read failure: (u'/rhev/data-center/mnt/glusterSD/ovirtnode1.miac:_vmstore/01f6fd06-9ad1-4957-bcda-df24dc4cc4f5/dom_md/metadata', 1, 'Read timeout')

Does this match the below error (UTC time) from the gluster mount log?

[2019-02-28 07:01:23.040258] W [socket.c:600:__socket_rwv] 0-vmstore-client-1: readv on 172.16.100.5:49155 failed (No data available)
[2019-02-28 07:01:23.040287] I [MSGID: 114018] [client.c:2285:client_rpc_notify] 0-vmstore-client-1: disconnected from vmstore-client-1. Client process will keep trying to connect to glusterd until brick's port is available

Also, what does the below error translate to in English? I think the error has to do with storage unavailability at the time. Was there an issue with the network connection during that period?

2019-02-26 17:08:20,271+0400 WARN (qgapoller/3) [virt.periodic.VmDispatcher] could not run <function <lambda> at 0x7ff7341cd668> on ['de76aa6c-a211-41de-8d85-7d2821c3980d', '7a3af2e7-8296-4fe0-ac55-c52a4b1de93f'] (periodic:323)
2019-02-26 17:08:20,823+0400 ERROR (vm/d546add1) [virt.vm] (vmId='d546add1-126a-4490-bc83-469bab659854') The vm start process failed (vm:948)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 877, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2898, in _run
    dom.createWithFlags(flags)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 130, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1099, in createWithFlags
    if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
libvirtError: Не удалось установить блокировку: На устройстве не осталось свободного места
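The Russian libvirt message above reads, roughly, "Failed to acquire the lock: No space left on device". When comparing these files, note that vdsm logs use local time (+0400) while the gluster mount log uses UTC; a small sketch for lining the timestamps up, assuming GNU date and the usual gluster mount-log location (the log file name mirroring the mount point is an assumption):

# convert a vdsm (local, +0400) timestamp to UTC before comparing with gluster logs
date -u -d '2019-02-26 17:07:09 +0400' '+%Y-%m-%d %H:%M:%S UTC'
# -> 2019-02-26 13:07:09 UTC

# then look at the gluster mount log around that UTC time
grep '2019-02-26 13:0' /var/log/glusterfs/rhev-data-center-mnt-glusterSD-ovirtnode1.miac:_vmstore.log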