On Tue, Dec 15, 2020 at 12:34 PM Martin Perina <mperina@redhat.com> wrote:


On Tue, Dec 15, 2020 at 11:18 AM Alex K <rightkicktech@gmail.com> wrote:


On Tue, Dec 15, 2020 at 11:59 AM Martin Perina <mperina@redhat.com> wrote:
Hi,

could you please provide engine.log? And also vdsm.log from a host which was acting as a fence proxy?

At the proxy host (kvm1) I see the following in vdsm.log:

2020-12-15 10:13:03,933+0000 INFO  (jsonrpc/0) [jsonrpc.JsonRpcServer] RPC call Host.fenceNode failed (error 1) in 0.01 seconds (__init__:312)
2020-12-15 10:13:04,376+0000 INFO  (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC call Host.fenceNode failed (error 1) in 0.01 seconds (__init__:312)

Isn't there stdout and stderr content from the fence_xvm execution a few lines above, which should reveal the exact error? If not, could you please turn on debug logging using the command below:

vdsm-client Host setLogLevel level=DEBUG

This should be executed on the host which acts as the fence proxy (if you have multiple hosts, you will need to turn on debug on all of them, because the fence proxy is selected randomly).
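Since the proxy is picked at random, a small loop over all candidate hosts saves time. The hostnames below are taken from this thread, and the loop is shown as a dry run (remove the leading `echo` to actually execute it over ssh):

```shell
# Dry run: print the command that would enable DEBUG logging on each
# candidate proxy host. Drop the 'echo' to really run it over ssh.
for h in kvm0.lab.local kvm1.lab.local; do
  echo ssh "root@$h" vdsm-client Host setLogLevel level=DEBUG
done
```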
Once we have a vdsm.log with the fence_xvm execution details, you can change the log level back to INFO by running:
I had to set engine-config -s CustomFenceAgentMapping="fence_xvm=xvm" on the engine, as it seems the host prepends fence_ to the agent name.
After that I got the following at the proxy host with DEBUG enabled:

2020-12-15 10:51:57,891+0000 DEBUG (jsonrpc/7) [jsonrpc.JsonRpcServer] Calling 'Host.fenceNode' in bridge with {u'username': u'root', u'addr': u'225.0.0.12', u'agent': u'xvm', u'options': u'port=ovirt-node0', u'action': u'status', u'password': '********', u'port': u'0'} (__init__:329)
2020-12-15 10:51:57,892+0000 DEBUG (jsonrpc/7) [root] /usr/bin/taskset --cpu-list 0-3 /usr/sbin/fence_xvm (cwd None) (commands:198)
2020-12-15 10:51:57,911+0000 INFO  (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC call Host.fenceNode failed (error 1) in 0.02 seconds (__init__:312)
2020-12-15 10:51:58,339+0000 DEBUG (jsonrpc/5) [jsonrpc.JsonRpcServer] Calling 'Host.fenceNode' in bridge with {u'username': u'root', u'addr': u'225.0.0.12', u'agent': u'xvm', u'options': u'port=ovirt-node0', u'action': u'status', u'password': '********', u'port': u'0'} (__init__:329)
2020-12-15 10:51:58,340+0000 DEBUG (jsonrpc/5) [root] /usr/bin/taskset --cpu-list 0-3 /usr/sbin/fence_xvm (cwd None) (commands:198)
2020-12-15 10:51:58,356+0000 INFO  (jsonrpc/5) [jsonrpc.JsonRpcServer] RPC call Host.fenceNode failed (error 1) in 0.01 seconds (__init__:312)

While at the engine I got:
2020-12-15 10:51:57,873Z INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-5) [a4f30921-37a9-45c1-97e5-26152f844d72] EVENT_ID: FENCE_OPERATION_USING_AGENT_AND_PROXY_STARTED(9,020), Executing power management status on Host kvm0.lab.local using Proxy Host kvm1.lab.local and Fence Agent xvm:225.0.0.12.
2020-12-15 10:51:57,888Z INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FenceVdsVDSCommand] (default task-5) [a4f30921-37a9-45c1-97e5-26152f844d72] START, FenceVdsVDSCommand(HostName = kvm1.lab.local, FenceVdsVDSCommandParameters:{hostId='91c81bbe-5933-4ed0-b9c5-2c8c277e44c7', targetVdsId='b5e8fe3d-cbea-44cb-835a-f88d6d70c163', action='STATUS', agent='FenceAgent:{id='null', hostId='null', order='1', type='xvm', ip='225.0.0.12', port='0', user='root', password='***', encryptOptions='false', options='port=ovirt-node0'}', policy='null'}), log id: e6d3e8c
2020-12-15 10:51:58,008Z WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-5) [a4f30921-37a9-45c1-97e5-26152f844d72] EVENT_ID: VDS_ALERT_FENCE_TEST_FAILED(9,001), Power Management test failed for Host kvm0.lab.local.Internal JSON-RPC error
2020-12-15 10:51:58,008Z INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FenceVdsVDSCommand] (default task-5) [a4f30921-37a9-45c1-97e5-26152f844d72] FINISH, FenceVdsVDSCommand, return: FenceOperationResult:{status='ERROR', powerStatus='UNKNOWN', message='Internal JSON-RPC error'}, log id: e6d3e8c
2020-12-15 10:51:58,133Z WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-5) [a4f30921-37a9-45c1-97e5-26152f844d72] EVENT_ID: FENCE_OPERATION_USING_AGENT_AND_PROXY_FAILED(9,021), Execution of power management status on Host kvm0.lab.local using Proxy Host kvm1.lab.local and Fence Agent xvm:225.0.0.12 failed.
2020-12-15 10:51:58,134Z WARN  [org.ovirt.engine.core.bll.pm.FenceAgentExecutor] (default task-5) [a4f30921-37a9-45c1-97e5-26152f844d72] Fence action failed using proxy host 'kvm1.lab.local', trying another proxy
2020-12-15 10:51:58,258Z ERROR [org.ovirt.engine.core.bll.pm.FenceProxyLocator] (default task-5) [a4f30921-37a9-45c1-97e5-26152f844d72] Can not run fence action on host 'kvm0.lab.local', no suitable proxy host was found.
2020-12-15 10:51:58,258Z WARN  [org.ovirt.engine.core.bll.pm.FenceAgentExecutor] (default task-5) [a4f30921-37a9-45c1-97e5-26152f844d72] Failed to find another proxy to re-run failed fence action, retrying with the same proxy 'kvm1.lab.local'
2020-12-15 10:51:58,334Z INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-5) [a4f30921-37a9-45c1-97e5-26152f844d72] EVENT_ID: FENCE_OPERATION_USING_AGENT_AND_PROXY_STARTED(9,020), Executing power management status on Host kvm0.lab.local using Proxy Host kvm1.lab.local and Fence Agent xvm:225.0.0.12.
2020-12-15 10:51:58,337Z INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FenceVdsVDSCommand] (default task-5) [a4f30921-37a9-45c1-97e5-26152f844d72] START, FenceVdsVDSCommand(HostName = kvm1.lab.local, FenceVdsVDSCommandParameters:{hostId='91c81bbe-5933-4ed0-b9c5-2c8c277e44c7', targetVdsId='b5e8fe3d-cbea-44cb-835a-f88d6d70c163', action='STATUS', agent='FenceAgent:{id='null', hostId='null', order='1', type='xvm', ip='225.0.0.12', port='0', user='root', password='***', encryptOptions='false', options='port=ovirt-node0'}', policy='null'}), log id: 557cbe7a
2020-12-15 10:51:58,426Z WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-5) [a4f30921-37a9-45c1-97e5-26152f844d72] EVENT_ID: VDS_ALERT_FENCE_TEST_FAILED(9,001), Power Management test failed for Host kvm0.lab.local.Internal JSON-RPC error
2020-12-15 10:51:58,427Z INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FenceVdsVDSCommand] (default task-5) [a4f30921-37a9-45c1-97e5-26152f844d72] FINISH, FenceVdsVDSCommand, return: FenceOperationResult:{status='ERROR', powerStatus='UNKNOWN', message='Internal JSON-RPC error'}, log id: 557cbe7a
2020-12-15 10:51:58,508Z WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-5) [a4f30921-37a9-45c1-97e5-26152f844d72] EVENT_ID: FENCE_OPERATION_USING_AGENT_AND_PROXY_FAILED(9,021), Execution of power management status on Host kvm0.lab.local using Proxy Host kvm1.lab.local and Fence Agent xvm:225.0.0.12 failed.

I see that the proxy host is given the port option twice ('port': u'0' and 'port=ovirt-node0' inside options). Could that be the reason?
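To illustrate why the duplicate might matter: if the option string were merged first-match-wins (an assumption on my part, not verified against the fence_xvm source), the GUI's port='0' would shadow the port=ovirt-node0 value from the Options field:

```shell
# Hypothetical merge of the two port values seen in the vdsm.log call above;
# the separator and precedence are assumptions for illustration only.
opts="port=0,port=ovirt-node0"
effective_port=$(echo "$opts" | tr ',' '\n' | grep '^port=' | head -n1 | cut -d= -f2)
echo "$effective_port"   # -> 0, i.e. the intended domain name would be lost
```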

vdsm-client Host setLogLevel level=INFO

Thanks,
Martin

2020-12-15 10:13:06,722+0000 INFO  (jsonrpc/4) [api.host] FINISH getStats return={'status': {'message': 'Done', 'code': 0}, 'info': {'cpuStatistics': {'1': {'cpuUser': '2.33', 'nodeIndex': 0, 'cpuSys': '1.13', 'cpuIdle': '96.54'}, '0': {'cpuUser': '1.66', 'nodeIndex': 0, 'cpuSys': '0.47', 'cpuIdle': '97.87'}, '3': {'cpuUser': '0.73', 'nodeIndex': 0, 'cpuSys': '0.60', 'cpuIdle': '98.67'}, '2': {'cpuUser': '1.20', 'nodeIndex': 0, 'cpuSys': '0.40', 'cpuIdle': '98.40'}}, 'numaNodeMemFree': {'0': {'memPercent': 14, 'memFree': '8531'}}, 'memShared': 0, 'haScore': 3400, 'thpState': 'always', 'ksmMergeAcrossNodes': True, 'vmCount': 0, 'memUsed': '8', 'storageDomains': {u'b4d25e5e-7806-464f-b2e1-4d4ab5a54dee': {'code': 0, 'actual': True, 'version': 5, 'acquired': True, 'delay': '0.0027973', 'lastCheck': '2.7', 'valid': True}, u'dc4d507b-954f-4da6-bcc3-b4f2633d0fa1': {'code': 0, 'actual': True, 'version': 5, 'acquired': True, 'delay': '0.00285824', 'lastCheck': '5.7', 'valid': True}}, 'incomingVmMigrations': 0, 'network': {'ovirtmgmt': {'rxErrors': '0', 'txErrors': '0', 'speed': '1000', 'rxDropped': '149', 'name': 'ovirtmgmt', 'tx': '2980375', 'txDropped': '0', 'duplex': 'unknown', 'sampleTime': 1608027186.703727, 'rx': '27524740', 'state': 'up'}, 'lo': {'rxErrors': '0', 'txErrors': '0', 'speed': '1000', 'rxDropped': '0', 'name': 'lo', 'tx': '1085188922', 'txDropped': '0', 'duplex': 'unknown', 'sampleTime': 1608027186.703727, 'rx': '1085188922', 'state': 'up'}, 'ovs-system': {'rxErrors': '0', 'txErrors': '0', 'speed': '1000', 'rxDropped': '0', 'name': 'ovs-system', 'tx': '0', 'txDropped': '0', 'duplex': 'unknown', 'sampleTime': 1608027186.703727, 'rx': '0', 'state': 'down'}, ';vdsmdummy;': {'rxErrors': '0', 'txErrors': '0', 'speed': '1000', 'rxDropped': '0', 'name': ';vdsmdummy;', 'tx': '0', 'txDropped': '0', 'duplex': 'unknown', 'sampleTime': 1608027186.703727, 'rx': '0', 'state': 'down'}, 'br-int': {'rxErrors': '0', 'txErrors': '0', 'speed': '1000', 'rxDropped': '0', 
'name': 'br-int', 'tx': '0', 'txDropped': '0', 'duplex': 'unknown', 'sampleTime': 1608027186.703727, 'rx': '0', 'state': 'down'}, 'eth1': {'rxErrors': '0', 'txErrors': '0', 'speed': '1000', 'rxDropped': '0', 'name': 'eth1', 'tx': '83685154', 'txDropped': '0', 'duplex': 'unknown', 'sampleTime': 1608027186.703727, 'rx': '300648288', 'state': 'up'}, 'eth0': {'rxErrors': '0', 'txErrors': '0', 'speed': '1000', 'rxDropped': '0', 'name': 'eth0', 'tx': '2980933', 'txDropped': '0', 'duplex': 'unknown', 'sampleTime': 1608027186.703727, 'rx': '28271472', 'state': 'up'}}, 'txDropped': '149', 'anonHugePages': '182', 'ksmPages': 100, 'elapsedTime': '5717.99', 'cpuLoad': '0.42', 'cpuSys': '0.63', 'diskStats': {'/var/log': {'free': '16444'}, '/var/run/vdsm/': {'free': '4909'}, '/tmp': {'free': '16444'}}, 'cpuUserVdsmd': '1.33', 'netConfigDirty': 'False', 'memCommitted': 0, 'ksmState': False, 'vmMigrating': 0, 'ksmCpu': 0, 'memAvailable': 9402, 'bootTime': '1608021428', 'haStats': {'active': True, 'configured': True, 'score': 3400, 'localMaintenance': False, 'globalMaintenance': True}, 'momStatus': 'active', 'multipathHealth': {}, 'rxDropped': '0', 'outgoingVmMigrations': 0, 'swapTotal': 6015, 'swapFree': 6015, 'hugepages': defaultdict(<type 'dict'>, {1048576: {'resv_hugepages': 0, 'free_hugepages': 0, 'nr_overcommit_hugepages': 0, 'surplus_hugepages': 0, 'vm.free_hugepages': 0, 'nr_hugepages': 0, 'nr_hugepages_mempolicy': 0}, 2048: {'resv_hugepages': 0, 'free_hugepages': 0, 'nr_overcommit_hugepages': 0, 'surplus_hugepages': 0, 'vm.free_hugepages': 0, 'nr_hugepages': 0, 'nr_hugepages_mempolicy': 0}}), 'dateTime': '2020-12-15T10:13:06 GMT', 'cpuUser': '1.50', 'memFree': 9146, 'cpuIdle': '97.87', 'vmActive': 0, 'v2vJobs': {}, 'cpuSysVdsmd': '0.60'}} from=::1,55238 (api:54)
2020-12-15 10:13:07,093+0000 INFO  (jsonrpc/1) [api] FINISH getStats error=Virtual machine does not exist: {'vmId': u'0167fedb-7445-46bb-a39d-ea4471c86bf4'} (api:129)
2020-12-15 10:13:07,094+0000 INFO  (jsonrpc/1) [jsonrpc.JsonRpcServer] RPC call VM.getStats failed (error 1) in 0.00 seconds (__init__:312)
2020-12-15 10:13:07,631+0000 INFO  (jsonrpc/3) [api.host] FINISH getStats return={'status': {'message': 'Done', 'code': 0}, 'info': {'cpuStatistics': {'1': {'cpuUser': '2.33', 'nodeIndex': 0, 'cpuSys': '1.13', 'cpuIdle': '96.54'}, '0': {'cpuUser': '1.66', 'nodeIndex': 0, 'cpuSys': '0.47', 'cpuIdle': '97.87'}, '3': {'cpuUser': '0.73', 'nodeIndex': 0, 'cpuSys': '0.60', 'cpuIdle': '98.67'}, '2': {'cpuUser': '1.20', 'nodeIndex': 0, 'cpuSys': '0.40', 'cpuIdle': '98.40'}}, 'numaNodeMemFree': {'0': {'memPercent': 14, 'memFree': '8531'}}, 'memShared': 0, 'haScore': 3400, 'thpState': 'always', 'ksmMergeAcrossNodes': True, 'vmCount': 0, 'memUsed': '8', 'storageDomains': {u'b4d25e5e-7806-464f-b2e1-4d4ab5a54dee': {'code': 0, 'actual': True, 'version': 5, 'acquired': True, 'delay': '0.0027973', 'lastCheck': '3.6', 'valid': True}, u'dc4d507b-954f-4da6-bcc3-b4f2633d0fa1': {'code': 0, 'actual': True, 'version': 5, 'acquired': True, 'delay': '0.00285824', 'lastCheck': '6.6', 'valid': True}}, 'incomingVmMigrations': 0, 'network': {'ovirtmgmt': {'rxErrors': '0', 'txErrors': '0', 'speed': '1000', 'rxDropped': '149', 'name': 'ovirtmgmt', 'tx': '2985005', 'txDropped': '0', 'duplex': 'unknown', 'sampleTime': 1608027187.616894, 'rx': '27525820', 'state': 'up'}, 'lo': {'rxErrors': '0', 'txErrors': '0', 'speed': '1000', 'rxDropped': '0', 'name': 'lo', 'tx': '1085195824', 'txDropped': '0', 'duplex': 'unknown', 'sampleTime': 1608027187.616894, 'rx': '1085195824', 'state': 'up'}, 'ovs-system': {'rxErrors': '0', 'txErrors': '0', 'speed': '1000', 'rxDropped': '0', 'name': 'ovs-system', 'tx': '0', 'txDropped': '0', 'duplex': 'unknown', 'sampleTime': 1608027187.616894, 'rx': '0', 'state': 'down'}, ';vdsmdummy;': {'rxErrors': '0', 'txErrors': '0', 'speed': '1000', 'rxDropped': '0', 'name': ';vdsmdummy;', 'tx': '0', 'txDropped': '0', 'duplex': 'unknown', 'sampleTime': 1608027187.616894, 'rx': '0', 'state': 'down'}, 'br-int': {'rxErrors': '0', 'txErrors': '0', 'speed': '1000', 'rxDropped': '0', 
'name': 'br-int', 'tx': '0', 'txDropped': '0', 'duplex': 'unknown', 'sampleTime': 1608027187.616894, 'rx': '0', 'state': 'down'}, 'eth1': {'rxErrors': '0', 'txErrors': '0', 'speed': '1000', 'rxDropped': '0', 'name': 'eth1', 'tx': '83689498', 'txDropped': '0', 'duplex': 'unknown', 'sampleTime': 1608027187.616894, 'rx': '300653876', 'state': 'up'}, 'eth0': {'rxErrors': '0', 'txErrors': '0', 'speed': '1000', 'rxDropped': '0', 'name': 'eth0', 'tx': '2985215', 'txDropped': '0', 'duplex': 'unknown', 'sampleTime': 1608027187.616894, 'rx': '28272664', 'state': 'up'}}, 'txDropped': '149', 'anonHugePages': '182', 'ksmPages': 100, 'elapsedTime': '5718.91', 'cpuLoad': '0.42', 'cpuSys': '0.63', 'diskStats': {'/var/log': {'free': '16444'}, '/var/run/vdsm/': {'free': '4909'}, '/tmp': {'free': '16444'}}, 'cpuUserVdsmd': '1.33', 'netConfigDirty': 'False', 'memCommitted': 0, 'ksmState': False, 'vmMigrating': 0, 'ksmCpu': 0, 'memAvailable': 9402, 'bootTime': '1608021428', 'haStats': {'active': True, 'configured': True, 'score': 3400, 'localMaintenance': False, 'globalMaintenance': True}, 'momStatus': 'active', 'multipathHealth': {}, 'rxDropped': '0', 'outgoingVmMigrations': 0, 'swapTotal': 6015, 'swapFree': 6015, 'hugepages': defaultdict(<type 'dict'>, {1048576: {'resv_hugepages': 0, 'free_hugepages': 0, 'nr_overcommit_hugepages': 0, 'surplus_hugepages': 0, 'vm.free_hugepages': 0, 'nr_hugepages': 0, 'nr_hugepages_mempolicy': 0}, 2048: {'resv_hugepages': 0, 'free_hugepages': 0, 'nr_overcommit_hugepages': 0, 'surplus_hugepages': 0, 'vm.free_hugepages': 0, 'nr_hugepages': 0, 'nr_hugepages_mempolicy': 0}}), 'dateTime': '2020-12-15T10:13:07 GMT', 'cpuUser': '1.50', 'memFree': 9146, 'cpuIdle': '97.87', 'vmActive': 0, 'v2vJobs': {}, 'cpuSysVdsmd': '0.60'}} from=::1,55238 (api:54)

While at engine I have:
2020-12-15 10:09:57,393Z ERROR [org.ovirt.engine.core.utils.pm.VdsFenceOptions] (default task-13) [fa61ae72-bc0c-4487-aeec-2b847877b6b5] Cannot find fence agent named 'fence_xvm' in fence option mapping
2020-12-15 10:09:57,519Z WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-13) [fa61ae72-bc0c-4487-aeec-2b847877b6b5] EVENT_ID: VDS_ALERT_FENCE_TEST_FAILED(9,001), Power Management test failed for Host kvm0.lab.local.Internal JSON-RPC error
2020-12-15 10:09:57,519Z INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FenceVdsVDSCommand] (default task-13) [fa61ae72-bc0c-4487-aeec-2b847877b6b5] FINISH, FenceVdsVDSCommand, return: FenceOperationResult:{status='ERROR', powerStatus='UNKNOWN', message='Internal JSON-RPC error'}, log id: dc98f7c
2020-12-15 10:09:57,596Z WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-13) [fa61ae72-bc0c-4487-aeec-2b847877b6b5] EVENT_ID: FENCE_OPERATION_USING_AGENT_AND_PROXY_FAILED(9,021), Execution of power management status on Host kvm0.lab.local using Proxy Host kvm1.lab.local and Fence Agent fence_xvm:225.0.0.12 failed.
2020-12-15 10:09:57,596Z WARN  [org.ovirt.engine.core.bll.pm.FenceAgentExecutor] (default task-13) [fa61ae72-bc0c-4487-aeec-2b847877b6b5] Fence action failed using proxy host 'kvm1.lab.local', trying another proxy
2020-12-15 10:09:57,694Z ERROR [org.ovirt.engine.core.bll.pm.FenceProxyLocator] (default task-13) [fa61ae72-bc0c-4487-aeec-2b847877b6b5] Can not run fence action on host 'kvm0.lab.local', no suitable proxy host was found.
2020-12-15 10:09:57,695Z WARN  [org.ovirt.engine.core.bll.pm.FenceAgentExecutor] (default task-13) [fa61ae72-bc0c-4487-aeec-2b847877b6b5] Failed to find another proxy to re-run failed fence action, retrying with the same proxy 'kvm1.lab.local'
2020-12-15 10:09:57,695Z ERROR [org.ovirt.engine.core.utils.pm.VdsFenceOptions] (default task-13) [fa61ae72-bc0c-4487-aeec-2b847877b6b5] Cannot find fence agent named 'fence_xvm' in fence option mapping
2020-12-15 10:09:57,815Z WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-13) [fa61ae72-bc0c-4487-aeec-2b847877b6b5] EVENT_ID: VDS_ALERT_FENCE_TEST_FAILED(9,001), Power Management test failed for Host kvm0.lab.local.Internal JSON-RPC error
2020-12-15 10:09:57,816Z INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FenceVdsVDSCommand] (default task-13) [fa61ae72-bc0c-4487-aeec-2b847877b6b5] FINISH, FenceVdsVDSCommand, return: FenceOperationResult:{status='ERROR', powerStatus='UNKNOWN', message='Internal JSON-RPC error'}, log id: 4b58ec5e
2020-12-15 10:09:57,895Z WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-13) [fa61ae72-bc0c-4487-aeec-2b847877b6b5] EVENT_ID: FENCE_OPERATION_USING_AGENT_AND_PROXY_FAILED(9,021), Execution of power management status on Host kvm0.lab.local using Proxy Host kvm1.lab.local and Fence Agent fence_xvm:225.0.0.12 failed.

At the engine I had set the fence agent mapping as below (and restarted the ovirt-engine service):

engine-config -g CustomFenceAgentMapping
CustomFenceAgentMapping: fence_xvm=fence_xvm version: general

Let me know if you need more logs.
I am running ovirt 4.3.10.


Thanks,
Martin


On Tue, Dec 15, 2020 at 10:23 AM Alex K <rightkicktech@gmail.com> wrote:


On Tue, Dec 15, 2020 at 11:07 AM Alex K <rightkicktech@gmail.com> wrote:


On Mon, Dec 14, 2020 at 8:59 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
Fence_xvm requires a key to be deployed on both the host and the VMs in order to succeed. What happens when you use the CLI on any of the VMs?
Also, the VMs require an open TCP port to receive the necessary output of each request.
I deployed keys on the physical host and the virtual hosts, as per https://github.com/rightkick/Notes/blob/master/Ovirt-fence_xmv.md
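For completeness, the key step from that guide boils down to generating a 512-byte random key and copying it to the same path on the physical host and all VMs. The /tmp path below is only for demonstration; the real location is /etc/cluster/fence_xvm.key:

```shell
# Demonstration only: generate a fence_xvm key. On real hosts the key
# lives in /etc/cluster/fence_xvm.key and must be identical everywhere.
keydir=/tmp/cluster-demo
mkdir -p "$keydir"
dd if=/dev/urandom of="$keydir/fence_xvm.key" bs=512 count=1 2>/dev/null
stat -c %s "$keydir/fence_xvm.key"   # prints 512
```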
I can get the VM status from the virtual hosts:

[root@kvm1 cluster]# fence_xvm -a 225.0.0.12 -k /etc/cluster/fence_xvm.key -H ovirt-node0 -o status
Status: ON
[root@kvm1 cluster]# fence_xvm -a 225.0.0.12 -k /etc/cluster/fence_xvm.key -H ovirt-node1 -o status
Status: ON

kvm0 and kvm1 are the hostnames of the virtual hosts, while ovirt-node0 and ovirt-node1 are the domain names of the same virtual hosts as defined in virsh.
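In other words, the value passed as port/domain must be the libvirt domain name, not the guest's hostname. A sketch of the mapping in this setup:

```shell
# Mapping from oVirt host name to the libvirt domain name fence_xvm expects.
declare -A libvirt_domain=( [kvm0]=ovirt-node0 [kvm1]=ovirt-node1 )
# fence_xvm -H takes the libvirt domain, so for host kvm0:
echo "${libvirt_domain[kvm0]}"   # -> ovirt-node0
```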

I am also passing the port/domain option in the GUI, but from the logs it seems to be ignored, as it is not logged by the engine.

[attached screenshot: Power Management agent settings in the GUI]
I also tried domain=ovirt-node0 with the same results.



Best Regards,
Strahil Nikolov


On Monday, December 14, 2020 at 10:57:11 GMT+2, Alex K <rightkicktech@gmail.com> wrote:

Hi friends,

I was wondering what is needed to set up fence_xvm for power management in nested virtualization environments, for testing purposes.

I have followed the following steps:
https://github.com/rightkick/Notes/blob/master/Ovirt-fence_xmv.md

I tried also engine-config -s CustomFenceAgentMapping="fence_xvm=_fence_xvm"
From the command line all seems fine and I can get the status of the host VMs, but I was not able to find what is needed to set this up in the engine UI:


For username and password I just filled in dummy values, as they should not be needed for fence_xvm.
I always get an error in the GUI, while the engine logs show:


2020-12-14 08:53:48,343Z WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-4) [07c1d540-6d8d-419c-affb-181495d75759] EVENT_ID: VDS_ALERT_FENCE_TEST_FAILED(9,001), Power Management test failed for Host kvm0.lab.local.Internal JSON-RPC error
2020-12-14 08:53:48,343Z INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FenceVdsVDSCommand] (default task-4) [07c1d540-6d8d-419c-affb-181495d75759] FINISH, FenceVdsVDSCommand, return: FenceOperationResult:{status='ERROR', powerStatus='UNKNOWN', message='Internal JSON-RPC error'}, log id: 2437b13c
2020-12-14 08:53:48,400Z WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-4) [07c1d540-6d8d-419c-affb-181495d75759] EVENT_ID: FENCE_OPERATION_USING_AGENT_AND_PROXY_FAILED(9,021), Execution of power management status on Host kvm0.lab.local using Proxy Host kvm1.lab.local and Fence Agent fence_xvm:225.0.0.12 failed.
2020-12-14 08:53:48,400Z WARN  [org.ovirt.engine.core.bll.pm.FenceAgentExecutor] (default task-4) [07c1d540-6d8d-419c-affb-181495d75759] Fence action failed using proxy host 'kvm1.lab.local', trying another proxy
2020-12-14 08:53:48,485Z ERROR [org.ovirt.engine.core.bll.pm.FenceProxyLocator] (default task-4) [07c1d540-6d8d-419c-affb-181495d75759] Can not run fence action on host 'kvm0.lab.local', no suitable proxy host was found.
2020-12-14 08:53:48,486Z WARN  [org.ovirt.engine.core.bll.pm.FenceAgentExecutor] (default task-4) [07c1d540-6d8d-419c-affb-181495d75759] Failed to find another proxy to re-run failed fence action, retrying with the same proxy 'kvm1.lab.local'
2020-12-14 08:53:48,582Z WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-4) [07c1d540-6d8d-419c-affb-181495d75759] EVENT_ID: VDS_ALERT_FENCE_TEST_FAILED(9,001), Power Management test failed for Host kvm0.lab.local.Internal JSON-RPC error
2020-12-14 08:53:48,582Z INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FenceVdsVDSCommand] (default task-4) [07c1d540-6d8d-419c-affb-181495d75759] FINISH, FenceVdsVDSCommand, return: FenceOperationResult:{status='ERROR', powerStatus='UNKNOWN', message='Internal JSON-RPC error'}, log id: 8607bc9
2020-12-14 08:53:48,637Z WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-4) [07c1d540-6d8d-419c-affb-181495d75759] EVENT_ID: FENCE_OPERATION_USING_AGENT_AND_PROXY_FAILED(9,021), Execution of power management status on Host kvm0.lab.local using Proxy Host kvm1.lab.local and Fence Agent fence_xvm:225.0.0.12 failed.


Any idea?

Thanx,
Alex


_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/B7IHC4MYY5LJFJMEJMLRRFSTMD7IK23I/


--
Martin Perina
Manager, Software Engineering
Red Hat Czech s.r.o.

