Hi,
could you please provide engine.log, and also vdsm.log from the host that was acting as the fence proxy?
On the proxy host (kvm1), vdsm.log shows the following:
2020-12-15 10:13:03,933+0000 INFO (jsonrpc/0) [jsonrpc.JsonRpcServer] RPC call Host.fenceNode failed (error 1) in 0.01 seconds (__init__:312)
2020-12-15 10:13:04,376+0000 INFO (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC call Host.fenceNode failed (error 1) in 0.01 seconds (__init__:312)
2020-12-15 10:13:06,722+0000 INFO (jsonrpc/4) [api.host] FINISH getStats return={'status': {'message': 'Done', 'code': 0}, 'info': {'cpuStatistics': {'1': {'cpuUser': '2.33', 'nodeIndex': 0, 'cpuSys': '1.13', 'cpuIdle': '96.54'}, '0': {'cpuUser': '1.66', 'nodeIndex': 0, 'cpuSys': '0.47', 'cpuIdle': '97.87'}, '3': {'cpuUser': '0.73', 'nodeIndex': 0, 'cpuSys': '0.60', 'cpuIdle': '98.67'}, '2': {'cpuUser': '1.20', 'nodeIndex': 0, 'cpuSys': '0.40', 'cpuIdle': '98.40'}}, 'numaNodeMemFree': {'0': {'memPercent': 14, 'memFree': '8531'}}, 'memShared': 0, 'haScore': 3400, 'thpState': 'always', 'ksmMergeAcrossNodes': True, 'vmCount': 0, 'memUsed': '8', 'storageDomains': {u'b4d25e5e-7806-464f-b2e1-4d4ab5a54dee': {'code': 0, 'actual': True, 'version': 5, 'acquired': True, 'delay': '0.0027973', 'lastCheck': '2.7', 'valid': True}, u'dc4d507b-954f-4da6-bcc3-b4f2633d0fa1': {'code': 0, 'actual': True, 'version': 5, 'acquired': True, 'delay': '0.00285824', 'lastCheck': '5.7', 'valid': True}}, 'incomingVmMigrations': 0, 'network': {'ovirtmgmt': {'rxErrors': '0', 'txErrors': '0', 'speed': '1000', 'rxDropped': '149', 'name': 'ovirtmgmt', 'tx': '2980375', 'txDropped': '0', 'duplex': 'unknown', 'sampleTime': 1608027186.703727, 'rx': '27524740', 'state': 'up'}, 'lo': {'rxErrors': '0', 'txErrors': '0', 'speed': '1000', 'rxDropped': '0', 'name': 'lo', 'tx': '1085188922', 'txDropped': '0', 'duplex': 'unknown', 'sampleTime': 1608027186.703727, 'rx': '1085188922', 'state': 'up'}, 'ovs-system': {'rxErrors': '0', 'txErrors': '0', 'speed': '1000', 'rxDropped': '0', 'name': 'ovs-system', 'tx': '0', 'txDropped': '0', 'duplex': 'unknown', 'sampleTime': 1608027186.703727, 'rx': '0', 'state': 'down'}, ';vdsmdummy;': {'rxErrors': '0', 'txErrors': '0', 'speed': '1000', 'rxDropped': '0', 'name': ';vdsmdummy;', 'tx': '0', 'txDropped': '0', 'duplex': 'unknown', 'sampleTime': 1608027186.703727, 'rx': '0', 'state': 'down'}, 'br-int': {'rxErrors': '0', 'txErrors': '0', 'speed': '1000', 'rxDropped': '0', 
'name': 'br-int', 'tx': '0', 'txDropped': '0', 'duplex': 'unknown', 'sampleTime': 1608027186.703727, 'rx': '0', 'state': 'down'}, 'eth1': {'rxErrors': '0', 'txErrors': '0', 'speed': '1000', 'rxDropped': '0', 'name': 'eth1', 'tx': '83685154', 'txDropped': '0', 'duplex': 'unknown', 'sampleTime': 1608027186.703727, 'rx': '300648288', 'state': 'up'}, 'eth0': {'rxErrors': '0', 'txErrors': '0', 'speed': '1000', 'rxDropped': '0', 'name': 'eth0', 'tx': '2980933', 'txDropped': '0', 'duplex': 'unknown', 'sampleTime': 1608027186.703727, 'rx': '28271472', 'state': 'up'}}, 'txDropped': '149', 'anonHugePages': '182', 'ksmPages': 100, 'elapsedTime': '5717.99', 'cpuLoad': '0.42', 'cpuSys': '0.63', 'diskStats': {'/var/log': {'free': '16444'}, '/var/run/vdsm/': {'free': '4909'}, '/tmp': {'free': '16444'}}, 'cpuUserVdsmd': '1.33', 'netConfigDirty': 'False', 'memCommitted': 0, 'ksmState': False, 'vmMigrating': 0, 'ksmCpu': 0, 'memAvailable': 9402, 'bootTime': '1608021428', 'haStats': {'active': True, 'configured': True, 'score': 3400, 'localMaintenance': False, 'globalMaintenance': True}, 'momStatus': 'active', 'multipathHealth': {}, 'rxDropped': '0', 'outgoingVmMigrations': 0, 'swapTotal': 6015, 'swapFree': 6015, 'hugepages': defaultdict(<type 'dict'>, {1048576: {'resv_hugepages': 0, 'free_hugepages': 0, 'nr_overcommit_hugepages': 0, 'surplus_hugepages': 0, 'vm.free_hugepages': 0, 'nr_hugepages': 0, 'nr_hugepages_mempolicy': 0}, 2048: {'resv_hugepages': 0, 'free_hugepages': 0, 'nr_overcommit_hugepages': 0, 'surplus_hugepages': 0, 'vm.free_hugepages': 0, 'nr_hugepages': 0, 'nr_hugepages_mempolicy': 0}}), 'dateTime': '2020-12-15T10:13:06 GMT', 'cpuUser': '1.50', 'memFree': 9146, 'cpuIdle': '97.87', 'vmActive': 0, 'v2vJobs': {}, 'cpuSysVdsmd': '0.60'}} from=::1,55238 (api:54)
2020-12-15 10:13:07,093+0000 INFO (jsonrpc/1) [api] FINISH getStats error=Virtual machine does not exist: {'vmId': u'0167fedb-7445-46bb-a39d-ea4471c86bf4'} (api:129)
2020-12-15 10:13:07,094+0000 INFO (jsonrpc/1) [jsonrpc.JsonRpcServer] RPC call VM.getStats failed (error 1) in 0.00 seconds (__init__:312)
2020-12-15 10:13:07,631+0000 INFO (jsonrpc/3) [api.host] FINISH getStats return={'status': {'message': 'Done', 'code': 0}, 'info': {'cpuStatistics': {'1': {'cpuUser': '2.33', 'nodeIndex': 0, 'cpuSys': '1.13', 'cpuIdle': '96.54'}, '0': {'cpuUser': '1.66', 'nodeIndex': 0, 'cpuSys': '0.47', 'cpuIdle': '97.87'}, '3': {'cpuUser': '0.73', 'nodeIndex': 0, 'cpuSys': '0.60', 'cpuIdle': '98.67'}, '2': {'cpuUser': '1.20', 'nodeIndex': 0, 'cpuSys': '0.40', 'cpuIdle': '98.40'}}, 'numaNodeMemFree': {'0': {'memPercent': 14, 'memFree': '8531'}}, 'memShared': 0, 'haScore': 3400, 'thpState': 'always', 'ksmMergeAcrossNodes': True, 'vmCount': 0, 'memUsed': '8', 'storageDomains': {u'b4d25e5e-7806-464f-b2e1-4d4ab5a54dee': {'code': 0, 'actual': True, 'version': 5, 'acquired': True, 'delay': '0.0027973', 'lastCheck': '3.6', 'valid': True}, u'dc4d507b-954f-4da6-bcc3-b4f2633d0fa1': {'code': 0, 'actual': True, 'version': 5, 'acquired': True, 'delay': '0.00285824', 'lastCheck': '6.6', 'valid': True}}, 'incomingVmMigrations': 0, 'network': {'ovirtmgmt': {'rxErrors': '0', 'txErrors': '0', 'speed': '1000', 'rxDropped': '149', 'name': 'ovirtmgmt', 'tx': '2985005', 'txDropped': '0', 'duplex': 'unknown', 'sampleTime': 1608027187.616894, 'rx': '27525820', 'state': 'up'}, 'lo': {'rxErrors': '0', 'txErrors': '0', 'speed': '1000', 'rxDropped': '0', 'name': 'lo', 'tx': '1085195824', 'txDropped': '0', 'duplex': 'unknown', 'sampleTime': 1608027187.616894, 'rx': '1085195824', 'state': 'up'}, 'ovs-system': {'rxErrors': '0', 'txErrors': '0', 'speed': '1000', 'rxDropped': '0', 'name': 'ovs-system', 'tx': '0', 'txDropped': '0', 'duplex': 'unknown', 'sampleTime': 1608027187.616894, 'rx': '0', 'state': 'down'}, ';vdsmdummy;': {'rxErrors': '0', 'txErrors': '0', 'speed': '1000', 'rxDropped': '0', 'name': ';vdsmdummy;', 'tx': '0', 'txDropped': '0', 'duplex': 'unknown', 'sampleTime': 1608027187.616894, 'rx': '0', 'state': 'down'}, 'br-int': {'rxErrors': '0', 'txErrors': '0', 'speed': '1000', 'rxDropped': '0', 
'name': 'br-int', 'tx': '0', 'txDropped': '0', 'duplex': 'unknown', 'sampleTime': 1608027187.616894, 'rx': '0', 'state': 'down'}, 'eth1': {'rxErrors': '0', 'txErrors': '0', 'speed': '1000', 'rxDropped': '0', 'name': 'eth1', 'tx': '83689498', 'txDropped': '0', 'duplex': 'unknown', 'sampleTime': 1608027187.616894, 'rx': '300653876', 'state': 'up'}, 'eth0': {'rxErrors': '0', 'txErrors': '0', 'speed': '1000', 'rxDropped': '0', 'name': 'eth0', 'tx': '2985215', 'txDropped': '0', 'duplex': 'unknown', 'sampleTime': 1608027187.616894, 'rx': '28272664', 'state': 'up'}}, 'txDropped': '149', 'anonHugePages': '182', 'ksmPages': 100, 'elapsedTime': '5718.91', 'cpuLoad': '0.42', 'cpuSys': '0.63', 'diskStats': {'/var/log': {'free': '16444'}, '/var/run/vdsm/': {'free': '4909'}, '/tmp': {'free': '16444'}}, 'cpuUserVdsmd': '1.33', 'netConfigDirty': 'False', 'memCommitted': 0, 'ksmState': False, 'vmMigrating': 0, 'ksmCpu': 0, 'memAvailable': 9402, 'bootTime': '1608021428', 'haStats': {'active': True, 'configured': True, 'score': 3400, 'localMaintenance': False, 'globalMaintenance': True}, 'momStatus': 'active', 'multipathHealth': {}, 'rxDropped': '0', 'outgoingVmMigrations': 0, 'swapTotal': 6015, 'swapFree': 6015, 'hugepages': defaultdict(<type 'dict'>, {1048576: {'resv_hugepages': 0, 'free_hugepages': 0, 'nr_overcommit_hugepages': 0, 'surplus_hugepages': 0, 'vm.free_hugepages': 0, 'nr_hugepages': 0, 'nr_hugepages_mempolicy': 0}, 2048: {'resv_hugepages': 0, 'free_hugepages': 0, 'nr_overcommit_hugepages': 0, 'surplus_hugepages': 0, 'vm.free_hugepages': 0, 'nr_hugepages': 0, 'nr_hugepages_mempolicy': 0}}), 'dateTime': '2020-12-15T10:13:07 GMT', 'cpuUser': '1.50', 'memFree': 9146, 'cpuIdle': '97.87', 'vmActive': 0, 'v2vJobs': {}, 'cpuSysVdsmd': '0.60'}} from=::1,55238 (api:54)
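Since the getStats output drowns out everything else in the excerpt above, here is a minimal Python sketch for pulling only the failed RPC calls out of vdsm.log. The regular expression is written against the line format shown in this excerpt only; the function name `failed_rpc_calls` is hypothetical, and other vdsm versions may log differently.

```python
import re

# Written against lines like the ones in the excerpt above, e.g.
# "... INFO (jsonrpc/0) [jsonrpc.JsonRpcServer] RPC call Host.fenceNode
#  failed (error 1) in 0.01 seconds (__init__:312)"
FAILED_RPC = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}[+-]\d{4}) "
    r".*RPC call (?P<method>[\w.]+) failed \(error (?P<code>\d+)\)"
)


def failed_rpc_calls(lines):
    """Return (timestamp, method, error_code) for every failed RPC call."""
    hits = []
    for line in lines:
        m = FAILED_RPC.match(line)
        if m:
            hits.append((m.group("ts"), m.group("method"), int(m.group("code"))))
    return hits


# Two lines copied from the excerpt above.
sample = [
    "2020-12-15 10:13:03,933+0000 INFO (jsonrpc/0) [jsonrpc.JsonRpcServer] "
    "RPC call Host.fenceNode failed (error 1) in 0.01 seconds (__init__:312)",
    "2020-12-15 10:13:07,094+0000 INFO (jsonrpc/1) [jsonrpc.JsonRpcServer] "
    "RPC call VM.getStats failed (error 1) in 0.00 seconds (__init__:312)",
]

for ts, method, code in failed_rpc_calls(sample):
    print(ts, method, code)
```

Only the Host.fenceNode entries relate to fencing; the VM.getStats failure refers to a VM id that does not exist on this host and can be filtered out by method name.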
On the engine side I have:
2020-12-15 10:09:57,393Z ERROR [org.ovirt.engine.core.utils.pm.VdsFenceOptions] (default task-13) [fa61ae72-bc0c-4487-aeec-2b847877b6b5] Cannot find fence agent named 'fence_xvm' in fence option mapping
2020-12-15 10:09:57,519Z WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-13) [fa61ae72-bc0c-4487-aeec-2b847877b6b5] EVENT_ID: VDS_ALERT_FENCE_TEST_FAILED(9,001), Power Management test failed for Host kvm0.lab.local.Internal JSON-RPC error
2020-12-15 10:09:57,519Z INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.FenceVdsVDSCommand] (default task-13) [fa61ae72-bc0c-4487-aeec-2b847877b6b5] FINISH, FenceVdsVDSCommand, return: FenceOperationResult:{status='ERROR', powerStatus='UNKNOWN', message='Internal JSON-RPC error'}, log id: dc98f7c
2020-12-15 10:09:57,596Z WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-13) [fa61ae72-bc0c-4487-aeec-2b847877b6b5] EVENT_ID: FENCE_OPERATION_USING_AGENT_AND_PROXY_FAILED(9,021), Execution of power management status on Host kvm0.lab.local using Proxy Host kvm1.lab.local and Fence Agent fence_xvm:225.0.0.12 failed.
2020-12-15 10:09:57,596Z WARN [org.ovirt.engine.core.bll.pm.FenceAgentExecutor] (default task-13) [fa61ae72-bc0c-4487-aeec-2b847877b6b5] Fence action failed using proxy host 'kvm1.lab.local', trying another proxy
2020-12-15 10:09:57,694Z ERROR [org.ovirt.engine.core.bll.pm.FenceProxyLocator] (default task-13) [fa61ae72-bc0c-4487-aeec-2b847877b6b5] Can not run fence action on host 'kvm0.lab.local', no suitable proxy host was found.
2020-12-15 10:09:57,695Z WARN [org.ovirt.engine.core.bll.pm.FenceAgentExecutor] (default task-13) [fa61ae72-bc0c-4487-aeec-2b847877b6b5] Failed to find another proxy to re-run failed fence action, retrying with the same proxy 'kvm1.lab.local'
2020-12-15 10:09:57,695Z ERROR [org.ovirt.engine.core.utils.pm.VdsFenceOptions] (default task-13) [fa61ae72-bc0c-4487-aeec-2b847877b6b5] Cannot find fence agent named 'fence_xvm' in fence option mapping
2020-12-15 10:09:57,815Z WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-13) [fa61ae72-bc0c-4487-aeec-2b847877b6b5] EVENT_ID: VDS_ALERT_FENCE_TEST_FAILED(9,001), Power Management test failed for Host kvm0.lab.local.Internal JSON-RPC error
2020-12-15 10:09:57,816Z INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.FenceVdsVDSCommand] (default task-13) [fa61ae72-bc0c-4487-aeec-2b847877b6b5] FINISH, FenceVdsVDSCommand, return: FenceOperationResult:{status='ERROR', powerStatus='UNKNOWN', message='Internal JSON-RPC error'}, log id: 4b58ec5e
2020-12-15 10:09:57,895Z WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-13) [fa61ae72-bc0c-4487-aeec-2b847877b6b5] EVENT_ID: FENCE_OPERATION_USING_AGENT_AND_PROXY_FAILED(9,021), Execution of power management status on Host kvm0.lab.local using Proxy Host kvm1.lab.local and Fence Agent fence_xvm:225.0.0.12 failed.
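To make the engine.log excerpt above easier to read, the ERROR lines can be grouped by correlation ID (the value in the second pair of square brackets). The following is a rough Python sketch under the assumption that every line follows the layout seen above (timestamp, level, logger, thread, correlation ID, message); the function name `errors_by_correlation` is made up for illustration.

```python
import re
from collections import defaultdict

# Layout assumed from the excerpt above:
# <date> <time>Z <LEVEL> [<logger>] (<thread>) [<correlation id>] <message>
ENGINE_LINE = re.compile(
    r"^(?P<ts>\S+ \S+) (?P<level>[A-Z]+)\s+\[(?P<logger>[^\]]+)\] "
    r"\((?P<thread>[^)]+)\) \[(?P<corr>[^\]]*)\] (?P<msg>.*)$"
)


def errors_by_correlation(lines):
    """Map correlation ID -> list of (short logger name, message) for ERROR lines."""
    grouped = defaultdict(list)
    for line in lines:
        m = ENGINE_LINE.match(line)
        if m and m.group("level") == "ERROR":
            short_logger = m.group("logger").rsplit(".", 1)[-1]
            grouped[m.group("corr")].append((short_logger, m.group("msg")))
    return dict(grouped)


# Three lines copied from the excerpt above (one WARN, two ERROR).
sample = [
    "2020-12-15 10:09:57,393Z ERROR [org.ovirt.engine.core.utils.pm.VdsFenceOptions] "
    "(default task-13) [fa61ae72-bc0c-4487-aeec-2b847877b6b5] "
    "Cannot find fence agent named 'fence_xvm' in fence option mapping",
    "2020-12-15 10:09:57,596Z WARN [org.ovirt.engine.core.bll.pm.FenceAgentExecutor] "
    "(default task-13) [fa61ae72-bc0c-4487-aeec-2b847877b6b5] "
    "Fence action failed using proxy host 'kvm1.lab.local', trying another proxy",
    "2020-12-15 10:09:57,694Z ERROR [org.ovirt.engine.core.bll.pm.FenceProxyLocator] "
    "(default task-13) [fa61ae72-bc0c-4487-aeec-2b847877b6b5] "
    "Can not run fence action on host 'kvm0.lab.local', no suitable proxy host was found.",
]

for corr, errors in errors_by_correlation(sample).items():
    print(corr)
    for logger, msg in errors:
        print("  ", logger, "->", msg)
```

Filtered this way, the first ERROR for the request is the VdsFenceOptions message about 'fence_xvm' missing from the fence option mapping, which is logged before the proxy is tried.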
On the engine I set the fence agent mapping as shown below (and restarted the ovirt-engine service):
engine-config -g CustomFenceAgentMapping
CustomFenceAgentMapping: fence_xvm=fence_xvm version: general
Let me know if you need more logs.
I am running ovirt 4.3.10.
fence_xvm requires that a key be deployed on both the host and the VMs in order to succeed. What happens when you use the CLI on any of the VMs?
Also, the VMs require an open TCP port to receive the output of each request.
I can get the VM status from the virtual hosts:
[root@kvm1 cluster]# fence_xvm -a 225.0.0.12 -k /etc/cluster/fence_xvm.key -H ovirt-node0 -o status
Status: ON
You have new mail in /var/spool/mail/root
[root@kvm1 cluster]# fence_xvm -a 225.0.0.12 -k /etc/cluster/fence_xvm.key -H ovirt-node1 -o status
Status: ON
kvm0 and kvm1 are the hostnames of the virtual hosts, while ovirt-node0 and ovirt-node1 are the domain names of the same virtual hosts as defined in virsh.
I am also passing the port/domain option in the GUI, but judging from the logs it seems to be ignored, as the engine does not log it.
I also tried domain=ovirt-node0, with the same results.
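For repeating the manual check against several domains, the command line used above can be wrapped in a small Python helper. This is only a sketch: it uses exactly the fence_xvm options shown in this thread (-a, -k, -H, -o), takes the multicast address and key path from the commands above as defaults, and the function names are hypothetical. The `query_status` function is defined but not called here, because it needs fence_xvm installed on the machine where it runs.

```python
import subprocess


def fence_xvm_status_cmd(domain,
                         mcast_addr="225.0.0.12",
                         key_file="/etc/cluster/fence_xvm.key"):
    """Build the same argument list as the manual status check in this thread."""
    return ["fence_xvm", "-a", mcast_addr, "-k", key_file,
            "-H", domain, "-o", "status"]


def parse_status(output):
    """Return the value of the 'Status:' line, or None if there is none."""
    for line in output.splitlines():
        if line.startswith("Status:"):
            return line.split(":", 1)[1].strip()
    return None


def query_status(domain):
    """Run fence_xvm for one libvirt domain (requires fence_xvm on this host)."""
    result = subprocess.run(fence_xvm_status_cmd(domain),
                            capture_output=True, text=True, check=False)
    return parse_status(result.stdout)


# Parsing the terminal output pasted above, including the unrelated mail notice.
pasted = "Status: ON\nYou have new mail in /var/spool/mail/root\n"
print(fence_xvm_status_cmd("ovirt-node0"))
print(parse_status(pasted))
```

Note that the -H argument is the libvirt domain name (ovirt-node0, ovirt-node1), not the hostname (kvm0, kvm1), which is why the port/domain value passed from the GUI matters.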
Best Regards,
Strahil Nikolov
On Monday, 14 December 2020, 10:57:11 GMT+2, Alex K <rightkicktech@gmail.com> wrote:
Hi friends,
I was wondering what is needed to set up fence_xvm for power management in nested virtual environments, for testing purposes.
I followed these steps:
https://github.com/rightkick/Notes/blob/master/Ovirt-fence_xmv.md
I also tried engine-config -s CustomFenceAgentMapping="fence_xvm=_fence_xvm"
From the command line all seems fine and I can get the status of the host VMs, but I was not able to find what is needed to set this up in the engine UI:
For username and password I just filled in dummy values, as they should not be needed for fence_xvm.
I always get an error in the GUI, while the engine logs give:
2020-12-14 08:53:48,343Z WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-4) [07c1d540-6d8d-419c-affb-181495d75759] EVENT_ID: VDS_ALERT_FENCE_TEST_FAILED(9,001), Power Management test failed for Host kvm0.lab.local.Internal JSON-RPC error
2020-12-14 08:53:48,343Z INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.FenceVdsVDSCommand] (default task-4) [07c1d540-6d8d-419c-affb-181495d75759] FINISH, FenceVdsVDSCommand, return: FenceOperationResult:{status='ERROR', powerStatus='UNKNOWN', message='Internal JSON-RPC error'}, log id: 2437b13c
2020-12-14 08:53:48,400Z WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-4) [07c1d540-6d8d-419c-affb-181495d75759] EVENT_ID: FENCE_OPERATION_USING_AGENT_AND_PROXY_FAILED(9,021), Execution of power management status on Host kvm0.lab.local using Proxy Host kvm1.lab.local and Fence Agent fence_xvm:225.0.0.12 failed.
2020-12-14 08:53:48,400Z WARN [org.ovirt.engine.core.bll.pm.FenceAgentExecutor] (default task-4) [07c1d540-6d8d-419c-affb-181495d75759] Fence action failed using proxy host 'kvm1.lab.local', trying another proxy
2020-12-14 08:53:48,485Z ERROR [org.ovirt.engine.core.bll.pm.FenceProxyLocator] (default task-4) [07c1d540-6d8d-419c-affb-181495d75759] Can not run fence action on host 'kvm0.lab.local', no suitable proxy host was found.
2020-12-14 08:53:48,486Z WARN [org.ovirt.engine.core.bll.pm.FenceAgentExecutor] (default task-4) [07c1d540-6d8d-419c-affb-181495d75759] Failed to find another proxy to re-run failed fence action, retrying with the same proxy 'kvm1.lab.local'
2020-12-14 08:53:48,582Z WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-4) [07c1d540-6d8d-419c-affb-181495d75759] EVENT_ID: VDS_ALERT_FENCE_TEST_FAILED(9,001), Power Management test failed for Host kvm0.lab.local.Internal JSON-RPC error
2020-12-14 08:53:48,582Z INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.FenceVdsVDSCommand] (default task-4) [07c1d540-6d8d-419c-affb-181495d75759] FINISH, FenceVdsVDSCommand, return: FenceOperationResult:{status='ERROR', powerStatus='UNKNOWN', message='Internal JSON-RPC error'}, log id: 8607bc9
2020-12-14 08:53:48,637Z WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-4) [07c1d540-6d8d-419c-affb-181495d75759] EVENT_ID: FENCE_OPERATION_USING_AGENT_AND_PROXY_FAILED(9,021), Execution of power management status on Host kvm0.lab.local using Proxy Host kvm1.lab.local and Fence Agent fence_xvm:225.0.0.12 failed.
Any idea?
Thanx,
Alex
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/B7IHC4MYY5LJFJMEJMLRRFSTMD7IK23I/
--
Martin Perina
Manager, Software Engineering
Red Hat Czech s.r.o.