Any errors in vdsm.log or gluster mount log for this volume?
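If nothing obvious shows up there, a few more places worth checking on ovirtnode1 (a sketch; paths assume a default oVirt install, and the gluster mount-log filename is derived from the mount point, so adjust it to your volume):

  # vdsm log on the host where the VMs became unresponsive
  grep -iE 'warn|error' /var/log/vdsm/vdsm.log | tail -n 50
  # sanlock manages the VM leases, so its log usually has the real errno
  tail -n 50 /var/log/sanlock.log
  # gluster fuse mount log; the filename mirrors the mount path
  tail -n 50 '/var/log/glusterfs/rhev-data-center-mnt-glusterSD-ovirtnode1.miac:_vmstore.log'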
On Wed, Feb 27, 2019 at 1:07 PM Mike Lykov <combr(a)ya.ru> wrote:
>
>
> Hi all. I have an HCI setup: glusterfs 3.12, oVirt 4.2.7, 4 nodes.
>
> Yesterday the engine detected 3 VMs as "not responding" (they are marked as HA VMs;
> all of them were located on the ovirtnode1 server).
> Two of them were restarted by the engine on other nodes successfully, but one was not.
> All of them got a LOCALIZED message like 'cannot set lock: no free space on device' - what is
> this? Why does the engine get these errors, and why do some VMs restart automatically and some not
> (but are then restarted successfully by the user via the webui after some pause)?
> Any ideas? Full engine logs can be uploaded.
>
> from engine.log: start event
> ---------------
> 2019-02-26 17:04:05,308+04 INFO
[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer]
(EE-ManagedThreadFactory-engineScheduled-Thread-59) [] VM
'd546add1-126a-4490-bc83-469bab659854'(openfire.miac) moved from 'Up'
--> 'NotResponding'
> 2019-02-26 17:04:05,865+04 WARN
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(EE-ManagedThreadFactory-engineScheduled-Thread-59) [] EVENT_ID: VM_NOT_RESPONDING(126),
VM openfire.miac is not responding.
> 2019-02-26 17:04:05,865+04 INFO
[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer]
(EE-ManagedThreadFactory-engineScheduled-Thread-59) [] VM
'7a3af2e7-8296-4fe0-ac55-c52a4b1de93f'(e-l-k.miac) moved from 'Up' -->
'NotResponding'
> 2019-02-26 17:04:05,894+04 WARN
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(EE-ManagedThreadFactory-engineScheduled-Thread-59) [] EVENT_ID: VM_NOT_RESPONDING(126),
VM e-l-k.miac is not responding.
> 2019-02-26 17:04:05,895+04 INFO
[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer]
(EE-ManagedThreadFactory-engineScheduled-Thread-59) [] VM
'de76aa6c-a211-41de-8d85-7d2821c3980d'(tsgr-mon) moved from 'Up' -->
'NotResponding'
> 2019-02-26 17:04:05,926+04 WARN
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(EE-ManagedThreadFactory-engineScheduled-Thread-59) [] EVENT_ID: VM_NOT_RESPONDING(126),
VM tsgr-mon is not responding.
> ---------------------------
> 2019-02-26 17:04:22,237+04 ERROR
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(ForkJoinPool-1-worker-9) [] EVENT_ID: VM_DOWN_ERROR(119), VM openfire.miac is down with
error. Exit message: VM has been terminated on the host.
> ....
> 2019-02-26 17:04:22,374+04 INFO [org.ovirt.engine.core.bll.VdsEventListener]
(ForkJoinPool-1-worker-9) [] Highly Available VM went down. Attempting to restart. VM Name
'openfire.miac', VM Id 'd546add1-126a-4490-bc83-469bab659854'
> ...
> 2019-02-26 17:04:27,737+04 ERROR
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(EE-ManagedThreadFactory-engineScheduled-Thread-5) [] EVENT_ID: VM_DOWN_ERROR(119), VM
openfire.miac is down with error. Exit message: resource busy: Failed to acquire lock:
Lease is held by another host.
> ...
> 2019-02-26 17:04:28,350+04 INFO
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(EE-ManagedThreadFactory-engine-Thread-2886073) [] EVENT_ID: VDS_INITIATED_RUN_VM(506),
Trying to restart VM openfire.miac on Host ovirtnode6.miac
> 2019-02-26 17:04:31,841+04 ERROR
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(ForkJoinPool-1-worker-2) [] EVENT_ID: VM_DOWN_ERROR(119), VM openfire.miac is down with
error. Exit message: resource busy: Failed to acquire lock: Lease is held by another
host.
> 2019-02-26 17:04:31,877+04 ERROR
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(EE-ManagedThreadFactory-engine-Thread-2886082) [] EVENT_ID:
VDS_INITIATED_RUN_VM_FAILED(507), Failed to restart VM openfire.miac on Host
ovirtnode6.miac
> ...
> 2019-02-26 17:04:31,994+04 INFO
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(EE-ManagedThreadFactory-engine-Thread-2886082) [] EVENT_ID: VDS_INITIATED_RUN_VM(506),
Trying to restart VM openfire.miac on Host ovirtnode1.miac
> .....
> 2019-02-26 17:04:36,054+04 ERROR
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(ForkJoinPool-1-worker-9) [] EVENT_ID: VM_DOWN_ERROR(119), VM openfire.miac is down with
error. Exit message: Не удалось установить блокировку: На устройстве не осталось
свободного места. [Russian: "Failed to set lock: No space left on device."]
> 2019-02-26 17:04:36,054+04 INFO
[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-9) [] add
VM 'd546add1-126a-4490-bc83-469bab659854'(openfire.miac) to rerun treatment
> 2019-02-26 17:04:36,091+04 ERROR
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(EE-ManagedThreadFactory-engine-Thread-2886083) [] EVENT_ID:
VDS_INITIATED_RUN_VM_FAILED(507), Failed to restart VM openfire.miac on Host
ovirtnode1.miac
> -------------------
> No more attempts were made for this VM (its state is now 'Down').
>
> The engine tried to restart this VM on some other nodes, got 'Lease is held by another
host' (is that normal, because the lock timeout had not expired yet?) and then got
'cannot set lock: no free space on device' (a LOCALIZED MESSAGE?? Why is this one
localized while all the others are not?).
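On the localization: the exit message comes from libvirt, whose error strings are gettext-translated, so my guess is that libvirtd on ovirtnode1 runs with a Russian locale while the hosts that reported in English run with LANG=C. One way to check which locale the daemon actually inherited (a standard /proc trick, nothing oVirt-specific):

  # print the locale variables from the running libvirtd's environment
  tr '\0' '\n' < /proc/$(pidof libvirtd)/environ | grep -E '^(LANG|LC_)'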
>
> Which 'device' does it mean? How can I see how much 'free space' it has?
The gluster volumes are almost empty:
> ovirtstor1.miac:/engine       150G  7,5G  143G   5%  /rhev/data-center/mnt/glusterSD/ovirtstor1.miac:_engine
> ovirtnode1.miac:/data         5,9T   42G  5,9T   1%  /rhev/data-center/mnt/glusterSD/ovirtnode1.miac:_data
> ovirtnode1.miac:/vmstore      4,4T  194G  4,2T   5%  /rhev/data-center/mnt/glusterSD/ovirtnode1.miac:_vmstore
> ovirtnode1.miac:/iso_storage  600G  1,3G  599G   1%  /rhev/data-center/mnt/glusterSD/ovirtnode1.miac:_iso__storage
> ovirtnode1.miac:/export       5,1T   14G  5,1T   1%  /rhev/data-center/mnt/glusterSD/ovirtnode1.miac:_export
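The 'no free space' here most likely does not refer to the filesystem at all: the lock error comes from sanlock, which keeps the VM leases in a lease area on the storage domain, so ENOSPC can point at trouble with that lease area rather than a full volume. That is a guess on my part; sanlock's own status output and log should say more:

  # sanlock's view of its lockspaces and held resources
  sanlock client status
  # recent sanlock messages, including the underlying errno
  tail -n 100 /var/log/sanlock.log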
>
> Other VMs also got this error about free space:
> -------------
> 2019-02-26 17:05:05,142+04 ERROR
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(ForkJoinPool-1-worker-11) [] EVENT_ID: VM_DOWN_ERROR(119), VM tsgr-mon is down with
error. Exit message: Не удалось установить блокировку: На устройстве не осталось
свободного места. [the same localized message: "Failed to set lock: No space left on device."]
>
> 2019-02-26 17:05:43,526+04 INFO
[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-11) [] VM
'de76aa6c-a211-41de-8d85-7d2821c3980d'(tsgr-mon) moved from 'PoweringUp'
--> 'Up'
> 2019-02-26 17:05:43,537+04 INFO
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(ForkJoinPool-1-worker-11) [] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM
tsgr-mon on Host ovirtnode6.miac
> ----------------
> And that is the last message about this VM; its state is now 'Up'.
>
> After that, all VMs were migrated off the node ovirtnode1, and later in the log there are
records about its unresponsiveness:
>
> 2019-02-26 17:07:43,584+04 ERROR
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(DefaultQuartzScheduler9) [35e05907] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM
ovirtnode1.miac command GetGlusterVolumeHealInfoVDS failed: Message timeout which can be
caused by communication issues
> 2019-02-26 17:07:43,592+04 WARN
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(EE-ManagedThreadFactory-engine-Thread-2886194) [35e05907] EVENT_ID:
VDS_HOST_NOT_RESPONDING_CONNECTING(9,008), Host ovirtnode1.miac is not responding. It
will stay in Connecting state for a grace period of 60 seconds
and after that an attempt to fence the host will be issued.
> 2019-02-26 17:07:49,684+04 INFO
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(EE-ManagedThreadFactory-engineScheduled-Thread-99) [4511cab1] EVENT_ID:
FENCE_OPERATION_USING_AGENT_AND_PROXY_STARTED(9,020), Executing power management status on
Host ovirtnode1.miac using Proxy Host ovirtnode5.miac and Fence Agent
ipmilan:172.16.16.1.
> 2019-02-26 17:07:49,685+04 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.FenceVdsVDSCommand]
(EE-ManagedThreadFactory-engineScheduled-Thread-99) [4511cab1] START,
FenceVdsVDSCommand(HostName = ovirtnode5.miac, ... action='STATUS',
agent='FenceAgent:{id='338cbdd9-aab6-484c-a906-21e747a96a84', ...
type='ipmilan', ip='172.16.16.1', port='null',
user='admin', password='***', encryptOptions='false',
options=''}', policy='null'}), log id: 281e6c
> 2019-02-26 17:07:49,942+04 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.FenceVdsVDSCommand]
(EE-ManagedThreadFactory-engineScheduled-Thread-99) [4511cab1] FINISH, FenceVdsVDSCommand,
return: FenceOperationResult:{status='SUCCESS', powerStatus='ON',
message=''}, log id: 281e6c
>
> 2019-02-26 17:07:56,131+04 INFO
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(EE-ManagedThreadFactory-engineScheduled-Thread-99) [317316e4] EVENT_ID:
SYNC_STORAGE_DEVICES_IN_HOST(4,126), Manually synced the storage devices from host
ovirtnode1.miac
> 2019-02-26 17:07:56,550+04 INFO
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(EE-ManagedThreadFactory-engineScheduled-Thread-99) [3a88be4f] EVENT_ID: VDS_DETECTED(13),
Status of host ovirtnode1.miac was set to Up.
> 2019-02-26 17:07:56,554+04 INFO
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(EE-ManagedThreadFactory-engineScheduled-Thread-99) [3a88be4f] EVENT_ID:
VDS_FENCE_STATUS(496), Host ovirtnode1.miac power management was verified successfully.
> -------------
>
> After all that, I now tried to start the VM that had not been restarted automatically:
> -------------
> 2019-02-27 11:23:30,139+04 INFO
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(EE-ManagedThreadFactory-engine-Thread-2921212) [8e405cb9-3a3a-42ab-ae05-5aaf40194b30]
EVENT_ID: USER_STARTED_VM(153), VM openfire.miac was started by admin@internal-authz
(Host: ovirtnode1.miac).
> .....
> 2019-02-27 11:24:12,489+04 INFO
[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-1) [] VM
'd546add1-126a-4490-bc83-469bab659854'(openfire.miac) moved from
'PoweringUp' --> 'Up'
> 2019-02-27 11:24:12,502+04 INFO
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(ForkJoinPool-1-worker-1) [] EVENT_ID: USER_RUN_VM(32), VM openfire.miac started on Host
ovirtnode1.miac
> ---------------
>
> It started without problems on the same host, ovirtnode1. Strange.
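That would fit a sanlock lease simply timing out in between: the lease held by the old qemu process expires some tens of seconds after the host stops renewing it, so by the next morning a manual start acquires it cleanly. If it happens again, dumping the lease area at the moment of failure might confirm this (a sketch; the xleases path assumes a file-based storage domain, substitute your domain UUID):

  # show the on-disk state of the VM leases for the vmstore domain
  sanlock direct dump '/rhev/data-center/mnt/glusterSD/ovirtnode1.miac:_vmstore/<domain-uuid>/dom_md/xleases'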
>
>
> --
> Mike