[ovirt-users] Ovirt 4.2 Fencing problems

Martin Perina mperina at redhat.com
Mon Oct 9 18:21:16 UTC 2017


On Mon, Oct 9, 2017 at 6:00 PM, Maton, Brett <matonb at ltresources.co.uk>
wrote:

> Hope this is enough of the log; no idea why it's saying invalid fence
> agents:
>
> 2017-10-09 16:01:01,535+01 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.EE-ManagedThreadFactory-engine-Thread-184) [] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM ov_host.example.com command ConnectStoragePoolVDS failed: Cannot find master domain: u'spUUID=47d12516-a41c-41a7-9da4-320f08fda147, msdUUID=a28ebb26-98be-4850-8e88-0ac2c3d6037a'
> 2017-10-09 16:01:01,535+01 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (org.ovirt.thread.EE-ManagedThreadFactory-engine-Thread-184) [] HostName = ov_host.example.com
> 2017-10-09 16:01:01,535+01 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (org.ovirt.thread.EE-ManagedThreadFactory-engine-Thread-184) [] Command 'ConnectStoragePoolVDSCommand(HostName = ov_host.example.com, ConnectStoragePoolVDSCommandParameters:{hostId='9390df92-5c79-4a25-9e94-1ca2d4aedc74', vdsId='9390df92-5c79-4a25-9e94-1ca2d4aedc74', storagePoolId='47d12516-a41c-41a7-9da4-320f08fda147', masterVersion='2'})' execution failed: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Cannot find master domain: u'spUUID=47d12516-a41c-41a7-9da4-320f08fda147, msdUUID=a28ebb26-98be-4850-8e88-0ac2c3d6037a'
> 2017-10-09 16:01:01,535+01 INFO  [org.ovirt.engine.core.bll.InitVdsOnUpCommand] (org.ovirt.thread.EE-ManagedThreadFactory-engine-Thread-184) [] Could not connect host 'ov_host.example.com' to pool 'GHDC', as the master domain is in inactive/unknown status - not failing the operation
>

This looks like a severe storage issue: if your master storage domain is in
inactive/unknown status, your VMs cannot run. You will need to check the
VDSM logs to find the real cause of the issue. Are you sure there are no
storage or network issues in your setup?
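A minimal sketch of that VDSM log check, assuming the default log location /var/log/vdsm/vdsm.log on the host and reusing the master domain UUID (msdUUID) from the engine log quoted above. The function name domain_errors is hypothetical, not part of oVirt.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show ERROR/WARN lines in a VDSM log that mention a
# given storage domain UUID. Assumes the default log path on an oVirt host.
domain_errors() {
  local uuid=$1
  local log=${2:-/var/log/vdsm/vdsm.log}
  # -F matches the UUID literally; the second grep keeps only errors/warnings
  grep -F -- "$uuid" "$log" | grep -E 'ERROR|WARN'
}

# Master storage domain UUID (msdUUID) taken from the engine.log excerpt
MASTER_SD=a28ebb26-98be-4850-8e88-0ac2c3d6037a

# Usage (on the host, usually as root):
#   domain_errors "$MASTER_SD" | tail -n 50
```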



> 2017-10-09 16:01:01,565+01 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SetMOMPolicyParametersVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-1) [594458d4] START, SetMOMPolicyParametersVDSCommand(HostName = ov_host.example.com, MomPolicyVDSParameters:{hostId='9390df92-5c79-4a25-9e94-1ca2d4aedc74'}), log id: 43532a7a
> 2017-10-09 16:01:01,595+01 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (org.ovirt.thread.EE-ManagedThreadFactory-engine-Thread-210) [6eeda756] START, SpmStopVDSCommand(HostName = ov_host.example.com, SpmStopVDSCommandParameters:{hostId='9390df92-5c79-4a25-9e94-1ca2d4aedc74', storagePoolId='47d12516-a41c-41a7-9da4-320f08fda147'}), log id: 3861bb24
> 2017-10-09 16:01:01,602+01 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.EE-ManagedThreadFactory-engine-Thread-210) [6eeda756] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM ov_host.example.com command HSMGetAllTasksStatusesVDS failed: Not SPM: ()
> 2017-10-09 16:01:01,602+01 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (org.ovirt.thread.EE-ManagedThreadFactory-engine-Thread-210) [6eeda756] HostName = ov_host.example.com
> 2017-10-09 16:01:01,602+01 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (org.ovirt.thread.EE-ManagedThreadFactory-engine-Thread-210) [6eeda756] Command 'HSMGetAllTasksStatusesVDSCommand(HostName = ov_host.example.com, VdsIdVDSCommandParametersBase:{hostId='9390df92-5c79-4a25-9e94-1ca2d4aedc74'})' execution failed: IRSGenericException: IRSErrorException: IRSNonOperationalException: Not SPM: ()
> 2017-10-09 16:01:01,653+01 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-1) [2e03c9cf] EVENT_ID: VDS_DETECTED(13), Status of host ov_host.example.com was set to Up.
> 2017-10-09 16:01:01,656+01 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-1) [2e03c9cf] EVENT_ID: VDS_ALERT_FENCE_TEST_FAILED(9,001), Power Management test failed for Host ov_host.example.com.Invalid fence agents defined for host '192.168.1.12'.
> 2017-10-09 16:01:01,659+01 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-1) [2e03c9cf] EVENT_ID: VDS_FENCE_STATUS_FAILED(497), Failed to verify Host ov_host.example.com power management.
>

When the host is activated, we try to get its power management status. In
this case an error occurred while getting that status; more details should
be in the VDSM logs of the host that acted as fence proxy.
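A sketch for reproducing the power management status call by hand on the fence proxy host. It assumes the fence-agents package provides fence_ipmilan and that IPMI over LAN is enabled on the DRAC; the address 192.168.1.12 comes from the log above, while the function names, user and password are placeholders. If the manual status call also fails, the problem more likely lies between the proxy and the DRAC than in the engine.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the fence proxy host.

# Ask the fence agent for the power status directly. Assumes an IPMI-capable
# DRAC; -P enables IPMI v2.0 "lanplus", which DRACs typically need.
pm_status_manual() {
  local addr=$1 user=$2 pass=$3
  fence_ipmilan -P -a "$addr" -l "$user" -p "$pass" -o status
}

# Print fence-related lines (case-insensitive) from the proxy's VDSM log.
# The default path is assumed to be the standard VDSM log location.
fence_log_lines() {
  local log=${1:-/var/log/vdsm/vdsm.log}
  grep -i -- 'fence' "$log"
}

# Usage (on the proxy host; credentials are placeholders):
#   pm_status_manual 192.168.1.12 root 'secret'
#   fence_log_lines | tail -n 50
```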


>
>
>
>
> On 9 October 2017 at 14:41, Maton, Brett <matonb at ltresources.co.uk> wrote:
>
>> Hi Martin,
>>
>>   Sure. Is there a way to trigger the check (I cleared the alerts), or do
>> I just need to have a poke around the engine log next time it happens?
>>
>
Yes, please take a look at the engine-config option PMHealthCheckIntervalInSec:

  engine-config -l | grep PMHealthCheckIntervalInSec

After updating this value, you need to restart the ovirt-engine service.
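A minimal sketch of that procedure, assuming root shell access on the engine machine. engine-config -g reads a key and -s sets it; the input check, the function names and the example value of 600 seconds are illustrative choices, not recommended settings. The second helper prints the engine.log lines around the fence-related events seen earlier in this thread, assuming the default log path /var/log/ovirt-engine/engine.log.

```shell
#!/usr/bin/env bash
# Hypothetical helper: change how often the engine runs the PM health check.
set_pm_health_interval() {
  local secs=$1
  # Reject anything that is not a positive integer before touching the config
  if ! [[ $secs =~ ^[1-9][0-9]*$ ]]; then
    echo "usage: set_pm_health_interval <seconds>" >&2
    return 1
  fi
  engine-config -g PMHealthCheckIntervalInSec      # show the current value
  engine-config -s "PMHealthCheckIntervalInSec=$secs" || return 1
  systemctl restart ovirt-engine                   # the change needs a restart
}

# Hypothetical helper: print 20 lines before and 5 after each fence-related
# audit event in engine.log (default path assumed).
pm_event_context() {
  local log=${1:-/var/log/ovirt-engine/engine.log}
  grep -B 20 -A 5 -E 'VDS_ALERT_FENCE_TEST_FAILED|VDS_FENCE_STATUS_FAILED' "$log"
}

# Usage on the engine host:
#   set_pm_health_interval 600
#   pm_event_context | less
```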

>
>> Regards,
>> Brett
>>
>> On 9 October 2017 at 14:26, Martin Perina <mperina at redhat.com> wrote:
>>
>>>
>>>
>>> On Mon, Oct 9, 2017 at 2:40 PM, Maton, Brett <matonb at ltresources.co.uk>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>>   I recently upgraded my test lab to oVirt 4.2-pre and have been having
>>>> some issues.
>>>>
>>>>   One of them is fencing with Dell DRAC 8 remote access cards.
>>>>   The configuration I'm using works fine on my other (4.1.6) cluster...
>>>>
>>>>   Error:
>>>> Health check on Host host.example.com indicates that future attempts
>>>> to Start this host using Power-Management are expected to fail.
>>>>
>>>
>>> Hi,
>>>
>>> Could you please share the engine.log around the above message? The check
>>> works by executing a get power management status call, so if that call
>>> fails, the above error is raised.
>>>
>>> Thanks
>>>
>>> Martin
>>>
>>>>
>>>>
>>>>   Fence agent setup for the host appears to be working though:
>>>>
>>>>   [screenshot attached: oVirt_4.2-fence.png]
>>>>
>>>>   Which logs would be helpful in debugging this issue?
>>>>
>>>> Regards,
>>>> Brett
>>>>
>>>> _______________________________________________
>>>> Users mailing list
>>>> Users at ovirt.org
>>>> http://lists.ovirt.org/mailman/listinfo/users
>>>>
>>>>
>>>
>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: oVirt_4.2-fence.png
Type: image/png
Size: 11287 bytes
Desc: not available
URL: <http://lists.ovirt.org/pipermail/users/attachments/20171009/dad67923/attachment.png>

