oVirt 4.3 - create windows2012 failed due to ovirt-imageio-proxy
by jingjie.jiang@oracle.com
Hi,
I tried to create a Windows 2012 VM on an NFS data domain, but the disk stayed locked.
I found the following error message:
Connection to ovirt-imageio-proxy service has failed. Make sure the service is installed, configured, and ovirt-engine certificate is registered as a valid CA in the browser.
Is this a known issue?
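A couple of quick checks on the engine machine usually narrow this down. The service name is standard; the port (54323) and the CA download URL below are my assumptions for a default 4.3 setup, so adjust them to your environment:
systemctl status ovirt-imageio-proxy   # is the proxy installed and running?
ss -tlnp | grep 54323                  # assumed default ovirt-imageio-proxy port
# Then import the engine CA into the browser; the usual download URL is:
#   https://<engine-fqdn>/ovirt-engine/services/pki-resource?resource=ca-certificate&format=X509-PEM-CA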
Thanks,
Jingjie
Change cluster cpu type with hosted engine
by Fabrice SOLER
Hello,
I need to create a Windows 10 virtual machine, but I get an error.
I have a fresh oVirt installation (version 4.2.8) with a hosted engine.
During the hosted engine installation there was no question about the
cluster CPU type; it would be great if a future version asked for it.
To move a host to another cluster, the host needs to be in maintenance
mode, and the hosted engine has to be powered off.
I have created another cluster with the SandyBridge family CPU type, but
to move the hosted engine to this new cluster the engine would have to be
powered off.
Is there someone who can help?
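For what it's worth, the sequence usually suggested for this kind of change relies on global maintenance so the HA agents do not restart the engine VM while you work on it. This is only a rough sketch under that assumption; verify it against the documentation for your exact version before running it:
hosted-engine --set-maintenance --mode=global   # stop HA agents from restarting the engine VM
hosted-engine --vm-shutdown                     # power the engine VM off
# ... move the host / adjust the cluster in the Administration Portal ...
hosted-engine --vm-start                        # start the engine VM again
hosted-engine --set-maintenance --mode=none     # leave global maintenance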
Sincerely,
--
Re: Ovirt 4.3.1 problem with HA agent
by Strahil
>> >> 1.2 All bricks healed (gluster volume heal data info summary) and no split-brain
>> >
>> >
>> >
>> > gluster volume heal data info
>> >
>> > Brick node-msk-gluster203:/opt/gluster/data
>> > Status: Connected
>> > Number of entries: 0
>> >
>> > Brick node-msk-gluster205:/opt/gluster/data
>> > <gfid:18c78043-0943-48f8-a4fe-9b23e2ba3404>
>> > <gfid:b6f7d8e7-1746-471b-a49d-8d824db9fd72>
>> > <gfid:6db6a49e-2be2-4c4e-93cb-d76c32f8e422>
>> > <gfid:e39cb2a8-5698-4fd2-b49c-102e5ea0a008>
>> > <gfid:5fad58f8-4370-46ce-b976-ac22d2f680ee>
>> > <gfid:7d0b4104-6ad6-433f-9142-7843fd260c70>
>> > <gfid:706cd1d9-f4c9-4c89-aa4c-42ca91ab827e>
>> > Status: Connected
>> > Number of entries: 7
>> >
>> > Brick node-msk-gluster201:/opt/gluster/data
>> > <gfid:18c78043-0943-48f8-a4fe-9b23e2ba3404>
>> > <gfid:b6f7d8e7-1746-471b-a49d-8d824db9fd72>
>> > <gfid:6db6a49e-2be2-4c4e-93cb-d76c32f8e422>
>> > <gfid:e39cb2a8-5698-4fd2-b49c-102e5ea0a008>
>> > <gfid:5fad58f8-4370-46ce-b976-ac22d2f680ee>
>> > <gfid:7d0b4104-6ad6-433f-9142-7843fd260c70>
>> > <gfid:706cd1d9-f4c9-4c89-aa4c-42ca91ab827e>
>> > Status: Connected
>> > Number of entries: 7
>> >
>>
>> Data needs healing.
>> Run: cluster volume heal data full
>
> This does not work.
Yeah, that's because my phone autocorrects 'gluster' to 'cluster'.
Usually the gluster daemons detect the need for healing on their own, but with 'gluster volume heal data full && sleep 5 && gluster volume heal data info summary && sleep 5 && gluster volume heal data info summary' you can force the sync and see the result.
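For readability, that is the same sequence written out step by step (identical commands, nothing added):
gluster volume heal data full
sleep 5
gluster volume heal data info summary
sleep 5
gluster volume heal data info summary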
Let's see what happens with DNS.
Best Regards,
Strahil Nikolov
Hosted engine is down and cannot be restarted
by ada per
Hello everyone,
For some strange reason the hosted engine went down and I cannot restart it. I tried restarting it manually without any success; can you please advise?
For all the nodes the engine status is the same as the one below.
--== Host nodex. (id: 6) status ==--
conf_on_shared_storage : True
Status up-to-date : True
Hostname : nodex
Host ID : 6
Engine status : {"reason": "bad vm status", "health": "bad", "vm": "down_unexpected", "detail": "Down"}
Score : 3400
stopped : False
Local maintenance : False
crc32 : 323a9f45
local_conf_timestamp : 2648874
Host timestamp : 2648874
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=2648874 (Tue Mar 19 12:25:44 2019)
host-id=6
score=3400
vm_conf_refresh_time=2648874 (Tue Mar 19 12:25:44 2019)
conf_on_shared_storage=True
maintenance=False
state=GlobalMaintenance
stopped=False
When I try the commands
root@node5# hosted-engine --vm-shutdown
I get the response:
root@node5# Command VM.shutdown with args {'delay': '120', 'message': 'VM is shutting down!', 'vmID': 'a492d2eb-1dfd-470d-a141-3e55d2189275'} failed:(code=1, message=Virtual machine does not exist)
But when I run : hosted-engine --vm-start
I get the response: VM exists and is down, cleaning up and restarting
Below you can see the journalctl -u ovirt-ha-agent logs:
Mar 14 12:04:42 node7. ovirt-ha-agent[4134]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Unhandled monitoring loop exception
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 430, in start_monitoring
self._monitoring_loop()
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 449, in _monitoring_loop
for old_state, state, delay in self.fsm:
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/fsm/machine.py", line 127, in next
new_data = self.refresh(self._state.data)
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/state_machine.py", line 81, in refresh
stats.update(self.hosted_engine.collect_stats())
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 737, in collect_stats
all_stats = self._broker.get_stats_from_storage()
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 143, in get_stats_from_storage
result = self._proxy.get_stats()
File "/usr/lib64/python2.7/xmlrpclib.py", line 1233, in __call__
return self.__send(self.__name, args)
File "/usr/lib64/python2.7/xmlrpclib.py", line 1591, in __request
verbose=self.__verbose
File "/usr/lib64/python2.7/xmlrpclib.py", line 1273, in request
return self.single_request(host, handler, request_body, verbose)
File "/usr/lib64/python2.7/xmlrpclib.py", line 1301, in single_request
self.send_content(h, request_body)
File "/usr/lib64/python2.7/xmlrpclib.py", line 1448, in send_content
connection.endheaders(request_body)
File "/usr/lib64/python2.7/httplib.py", line 1037, in endheaders
self._send_output(message_body)
File "/usr/lib64/python2.7/httplib.py", line 881, in _send_output
self.send(msg)
File "/usr/lib64/python2.7/httplib.py", line 843, in send
self.connect()
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/unixrpc.py", line 52, in connect
self.sock.connect(base64.b16decode(self.host))
File "/usr/lib64/python2.7/socket.py", line 224, in meth
return getattr(self._sock,name)(*args)
error: [Errno 2] No such file or directory
Mar 14 12:04:42 node7. ovirt-ha-agent[4134]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
return action(he)
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
return he.start_monitoring()
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 437, in start_monitoring
self.publish(stopped)
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 337, in publish
self._push_to_storage(blocks)
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 708, in _push_to_storage
self._broker.put_stats_on_storage(self.host_id, blocks)
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 113, in put_stats_on_storage
self._proxy.put_stats(host_id, xmlrpclib.Binary(data))
File "/usr/lib64/python2.7/xmlrpclib.py", line 1233, in __call__
return self.__send(self.__name, args)
File "/usr/lib64/python2.7/xmlrpclib.py", line 1591, in __request
verbose=self.__verbose
File "/usr/lib64/python2.7/xmlrpclib.py", line 1273, in request
return self.single_request(host, handler, request_body, verbose)
File "/usr/lib64/python2.7/xmlrpclib.py", line 1301, in single_request
self.send_content(h, request_body)
File "/usr/lib64/python2.7/xmlrpclib.py", line 1448, in send_content
connection.endheaders(request_body)
File "/usr/lib64/python2.7/httplib.py", line 1037, in endheaders
self._send_output(message_body)
File "/usr/lib64/python2.7/httplib.py", line 881, in _send_output
self.send(msg)
File "/usr/lib64/python2.7/httplib.py", line 843, in send
self.connect()
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/unixrpc.py", line 52, in connect
self.sock.connect(base64.b16decode(self.host))
File "/usr/lib64/python2.7/socket.py", line 224, in meth
return getattr(self._sock,name)(*args)
error: [Errno 2] No such file or directory
Mar 14 12:04:42 node7. ovirt-ha-agent[4134]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Trying to restart agent
Mar 14 12:04:42 node7. systemd[1]: ovirt-ha-agent.service: main process exited, code=exited, status=157/n/a
Mar 14 12:04:42 node7. systemd[1]: Unit ovirt-ha-agent.service entered failed state.
Mar 14 12:04:42 node7. systemd[1]: ovirt-ha-agent.service failed.
Mar 14 12:04:52 node7. systemd[1]: ovirt-ha-agent.service holdoff time over, scheduling restart.
Mar 14 12:04:52 node7. systemd[1]: Stopped oVirt Hosted Engine High Availability Monitoring Agent.
Mar 14 12:04:52 node7. systemd[1]: Started oVirt Hosted Engine High Availability Monitoring Agent.
Mar 14 12:04:52 node7. ovirt-ha-agent[31765]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Failed to start necessary monitors
Mar 14 12:04:52 node7. ovirt-ha-agent[31765]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
return action(he)
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
return he.start_monitoring()
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 413, in start_monitoring
self._initialize_broker()
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 537, in _initialize_broker
m.get('options', {}))
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 86, in start_monitor
).format(t=type, o=options, e=e)
RequestError: brokerlink - failed to start monitor via ovirt-ha-broker: [Errno 2] No such file or directory, [monitor: 'ping', options: {'addr': '19
Mar 14 12:04:52 node7. ovirt-ha-agent[31765]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Trying to restart agent
Mar 14 12:04:52 node7. systemd[1]: ovirt-ha-agent.service: main process exited, code=exited, status=157/n/a
Mar 14 12:04:52 node7. systemd[1]: Unit ovirt-ha-agent.service entered failed state.
Mar 14 12:04:52 node7. systemd[1]: ovirt-ha-agent.service failed.
Mar 14 12:05:02 node7. systemd[1]: ovirt-ha-agent.service holdoff time over, scheduling restart.
Mar 14 12:05:02 node7. systemd[1]: Stopped oVirt Hosted Engine High Availability Monitoring Agent.
Mar 14 12:05:02 node7. systemd[1]: Started oVirt Hosted Engine High Availability Monitoring Agent.
Mar 14 12:06:55 node7. ovirt-ha-agent[31822]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Failed to stop engine vm with /usr/sbin/hosted-engine --vm-poweroff: Co
(code=1, message=Virtual machine does not exist: {'vmId': u'a492d2eb-1dfd-470d-a141-3e55d2189275'})
Mar 14 12:06:55 node7. ovirt-ha-agent[31822]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Failed to stop engine VM: Command VM.destroy with args {'vmID': 'a492d2
(code=1, message=Virtual machine does not exist: {'vmId': u'a492d2eb-1dfd-470d-a141-3e55d2189275'})
Mar 15 14:28:16 node7. ovirt-ha-agent[31822]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM stopped on localhost
Mar 15 14:28:36 node7. ovirt-ha-agent[31822]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM stopped on localhost
Mar 15 14:29:00 node7. ovirt-ha-agent[31822]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM stopped on localhost
Mar 15 14:29:22 node7. ovirt-ha-agent[31822]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM stopped on localhost
Mar 15 14:29:44 node7. ovirt-ha-agent[31822]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM stopped on localhost
Mar 15 14:30:06 node7. ovirt-ha-agent[31822]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM stopped on localhost
Mar 15 14:30:28 node7. ovirt-ha-agent[31822]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM stopped on localhost
Mar 15 14:30:50 node7. ovirt-ha-agent[31822]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM stopped on localhost
Mar 15 14:31:12 node7. ovirt-ha-agent[31822]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM stopped on localhost
Mar 15 14:31:33 node7. ovirt-ha-agent[31822]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM stopped on localhost
Mar 15 14:31:56 node7. ovirt-ha-agent[31822]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM stopped on localhost
Mar 15 14:32:18 node7. ovirt-ha-agent[31822]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM stopped on localhost
Mar 15 14:32:40 node7. ovirt-ha-agent[31822]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM stopped on localhost
Mar 15 14:33:02 node7. ovirt-ha-agent[31822]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM stopped on localhost
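The repeated "[Errno 2] No such file or directory" in these tracebacks comes from the agent trying to reach ovirt-ha-broker over its unix socket, so the broker (or its socket) is the first thing worth checking on that host. A rough sketch; the socket location under /var/run/ovirt-hosted-engine-ha/ is my assumption for this release, adjust if yours differs:
systemctl status ovirt-ha-broker                    # is the broker actually running?
ls -l /var/run/ovirt-hosted-engine-ha/              # assumed location of broker.socket
systemctl restart ovirt-ha-broker ovirt-ha-agent    # restart the broker first, then the agent
journalctl -u ovirt-ha-broker -u ovirt-ha-agent -f  # watch both services while they come up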
Host unresponsive after upgrade 4.2.8 -> 4.3.2 failed
by Artem Tambovskiy
Hello,
I just started upgrading my small cluster from 4.2.8 to 4.3.2 and ended up in
a situation where one of the hosts is not working after the upgrade.
For some reason vdsmd is not starting up; I have tried to restart it
manually with no luck.
Any ideas on what could be the reason?
[root@ovirt2 log]# systemctl restart vdsmd
A dependency job for vdsmd.service failed. See 'journalctl -xe' for details.
[root@ovirt2 log]# journalctl -xe
-- Unit ovirt-ha-agent.service has finished shutting down.
Mar 19 15:47:47 ovirt2.domain.org systemd[1]: Starting Virtual Desktop Server Manager...
-- Subject: Unit vdsmd.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit vdsmd.service has begun starting up.
Mar 19 15:47:47 ovirt2.domain.org vdsmd_init_common.sh[56717]: vdsm: Running mkdirs
Mar 19 15:47:47 ovirt2.domain.org vdsmd_init_common.sh[56717]: vdsm: Running configure_coredump
Mar 19 15:47:47 ovirt2.domain.org vdsmd_init_common.sh[56717]: vdsm: Running configure_vdsm_logs
Mar 19 15:47:47 ovirt2.domain.org vdsmd_init_common.sh[56717]: vdsm: Running wait_for_network
Mar 19 15:47:47 ovirt2.domain.org supervdsmd[56716]: Supervdsm failed to start: 'module' object has no attribute 'Accounting'
Mar 19 15:47:47 ovirt2.domain.org python2[56716]: detected unhandled Python exception in '/usr/share/vdsm/supervdsmd'
Mar 19 15:47:48 ovirt2.domain.org abrt-server[56745]: Duplicate: core backtrace
Mar 19 15:47:48 ovirt2.domain.org abrt-server[56745]: DUP_OF_DIR: /var/tmp/abrt/Python-2019-03-19-14:23:04-17292
Mar 19 15:47:48 ovirt2.domain.org abrt-server[56745]: Deleting problem directory Python-2019-03-19-15:47:47-56716 (dup of Python-2019-03-19-14:23:04-17292
Mar 19 15:47:49 ovirt2.domain.org daemonAdapter[56716]: Traceback (most recent call last):
Mar 19 15:47:49 ovirt2.domain.org daemonAdapter[56716]: File "/usr/share/vdsm/supervdsmd", line 26, in <module>
Mar 19 15:47:49 ovirt2.domain.org daemonAdapter[56716]: supervdsm_server.main(sys.argv[1:])
Mar 19 15:47:49 ovirt2.domain.org daemonAdapter[56716]: File "/usr/lib/python2.7/site-packages/vdsm/supervdsm_server.py", line 294, in main
Mar 19 15:47:49 ovirt2.domain.org daemonAdapter[56716]: module_name))
Mar 19 15:47:49 ovirt2.domain.org daemonAdapter[56716]: File "/usr/lib64/python2.7/importlib/__init__.py", line 37, in import_module
Mar 19 15:47:49 ovirt2.domain.org daemonAdapter[56716]: __import__(name)
Mar 19 15:47:49 ovirt2.domain.org daemonAdapter[56716]: File "/usr/lib/python2.7/site-packages/vdsm/supervdsm_api/systemd.py", line 34, in <module>
Mar 19 15:47:49 ovirt2.domain.org daemonAdapter[56716]: cmdutils.Accounting.CPU,
Mar 19 15:47:49 ovirt2.domain.org daemonAdapter[56716]: AttributeError: 'module' object has no attribute 'Accounting'
Mar 19 15:47:49 ovirt2.domain.org systemd[1]: supervdsmd.service: main process exited, code=exited, status=1/FAILURE
Mar 19 15:47:49 ovirt2.domain.org systemd[1]: Unit supervdsmd.service entered failed state.
Mar 19 15:47:49 ovirt2.domain.org systemd[1]: supervdsmd.service failed.
Mar 19 15:47:49 ovirt2.domain.org systemd[1]: supervdsmd.service holdoff time over, scheduling restart.
Mar 19 15:47:49 ovirt2.domain.org systemd[1]: Cannot add dependency job for unit lvm2-lvmetad.socket, ignoring: Unit is masked.
Mar 19 15:47:49 ovirt2.domain.org systemd[1]: Stopped Auxiliary vdsm service for running helper functions as root.
-- Subject: Unit supervdsmd.service has finished shutting down
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit supervdsmd.service has finished shutting down.
Mar 19 15:47:49 ovirt2.domain.org systemd[1]: Started Auxiliary vdsm service for running helper functions as root.
-- Subject: Unit supervdsmd.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit supervdsmd.service has finished starting up.
--
-- The start-up result is done.
Mar 19 15:47:50 ovirt2.domain.org supervdsmd[56757]: Supervdsm failed to start: 'module' object has no attribute 'Accounting'
Mar 19 15:47:50 ovirt2.domain.org python2[56757]: detected unhandled Python exception in '/usr/share/vdsm/supervdsmd'
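The AttributeError on cmdutils.Accounting typically means the vdsm sub-packages on that host ended up at mixed versions after the upgrade, so supervdsm imports an old module against newer code. A quick way to check and bring them back in sync (the exact commands are illustrative, not taken from this thread):
rpm -qa | grep -i vdsm | sort    # every vdsm* package should report the same version
yum clean all
yum reinstall 'vdsm*'            # or 'yum update vdsm*' if only some packages lagged behind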
--
Regards,
Artem
Re: Ovirt 4.3.1 problem with HA agent
by Strahil
Hi Alexei,
>> 1.2 All bricks healed (gluster volume heal data info summary) and no split-brain
>
>
>
> gluster volume heal data info
>
> Brick node-msk-gluster203:/opt/gluster/data
> Status: Connected
> Number of entries: 0
>
> Brick node-msk-gluster205:/opt/gluster/data
> <gfid:18c78043-0943-48f8-a4fe-9b23e2ba3404>
> <gfid:b6f7d8e7-1746-471b-a49d-8d824db9fd72>
> <gfid:6db6a49e-2be2-4c4e-93cb-d76c32f8e422>
> <gfid:e39cb2a8-5698-4fd2-b49c-102e5ea0a008>
> <gfid:5fad58f8-4370-46ce-b976-ac22d2f680ee>
> <gfid:7d0b4104-6ad6-433f-9142-7843fd260c70>
> <gfid:706cd1d9-f4c9-4c89-aa4c-42ca91ab827e>
> Status: Connected
> Number of entries: 7
>
> Brick node-msk-gluster201:/opt/gluster/data
> <gfid:18c78043-0943-48f8-a4fe-9b23e2ba3404>
> <gfid:b6f7d8e7-1746-471b-a49d-8d824db9fd72>
> <gfid:6db6a49e-2be2-4c4e-93cb-d76c32f8e422>
> <gfid:e39cb2a8-5698-4fd2-b49c-102e5ea0a008>
> <gfid:5fad58f8-4370-46ce-b976-ac22d2f680ee>
> <gfid:7d0b4104-6ad6-433f-9142-7843fd260c70>
> <gfid:706cd1d9-f4c9-4c89-aa4c-42ca91ab827e>
> Status: Connected
> Number of entries: 7
>
Data needs healing.
Run: cluster volume heal data full
If it still doesn't heal (check in 5 min), go to /rhev/data-center/mnt/glusterSD/msk-gluster-facility.xxxx_data
and run 'find . -exec stat {} \;' without the quotes.
As I understand it, the oVirt Hosted Engine is running and can be started on all nodes except one.
>>
>> 2. Go to the problematic host and check the mount point is there
>
>
>
> No mount point on problematic node /rhev/data-center/mnt/glusterSD/msk-gluster-facility.xxxx:_data
> If I create a mount point manually, it is deleted after the node is activated.
>
> Other nodes can mount this volume without problems. Only this node have connection problems after update.
>
> Here is a part of the log at the time of activation of the node:
>
> vdsm log
>
> 2019-03-18 16:46:00,548+0300 INFO (jsonrpc/5) [vds] Setting Hosted Engine HA local maintenance to False (API:1630)
> 2019-03-18 16:46:00,549+0300 INFO (jsonrpc/5) [jsonrpc.JsonRpcServer] RPC call Host.setHaMaintenanceMode succeeded in 0.00 seconds (__init__:573)
> 2019-03-18 16:46:00,581+0300 INFO (jsonrpc/7) [vdsm.api] START connectStorageServer(domType=7, spUUID=u'5a5cca91-01f8-01af-0297-00000000025f', conList=[{u'id': u'5799806e-7969-45da-b17d-b47a63e6a8e4', u'connection': u'msk-gluster-facility.xxxx:/data', u'iqn': u'', u'user': u'', u'tpgt': u'1', u'vfs_type': u'glusterfs', u'password': '********', u'port': u''}], options=None) from=::ffff:10.77.253.210,56630, flow_id=81524ed, task_id=5f353993-95de-480d-afea-d32dc94fd146 (api:46)
> 2019-03-18 16:46:00,621+0300 INFO (jsonrpc/7) [storage.StorageServer.MountConnection] Creating directory u'/rhev/data-center/mnt/glusterSD/msk-gluster-facility.xxxx:_data' (storageServer:167)
> 2019-03-18 16:46:00,622+0300 INFO (jsonrpc/7) [storage.fileUtils] Creating directory: /rhev/data-center/mnt/glusterSD/msk-gluster-facility.xxxx:_data mode: None (fileUtils:197)
> 2019-03-18 16:46:00,622+0300 WARN (jsonrpc/7) [storage.StorageServer.MountConnection] gluster server u'msk-gluster-facility.xxxx' is not in bricks ['node-msk-gluster203', 'node-msk-gluster205', 'node-msk-gluster201'], possibly mounting duplicate servers (storageServer:317)
This seems very strange. As you have hidden the hostname, I'm not sure which one this is.
Check that DNS can be resolved from all hosts and that this host's hostname is resolvable.
Also check if it is in the peer list.
Try to manually mount the gluster volume:
mount -t glusterfs msk-gluster-facility.xxxx:/data /mnt
Is this a second FQDN/IP of this server?
If so, gluster accepts that via 'gluster peer probe <IP2>'.
>> 2.1. Check permissions (should be vdsm:kvm) and fix with chown -R if needed
>> 2.2. Check the OVF_STORE from the logs that it exists
>
>
> How can i do this?
Go to /rhev/data-center/mnt/glusterSD/host_engine and use find inside the domain UUID directory for files that are not owned by vdsm:kvm.
I usually run 'chown -R vdsm:kvm 823xx-xxxx-yyyy-zzz' and it will fix any misconfiguration.
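A concrete way to do that check, assuming the usual vdsm:kvm ownership (the mount name and domain UUID below are placeholders for your environment):
cd /rhev/data-center/mnt/glusterSD/<engine-mount>/<storage-domain-uuid>
find . -not \( -user vdsm -group kvm \) -exec ls -ld {} \;   # list anything with wrong ownership
chown -R vdsm:kvm .                                          # fix it in place if something shows up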
Best Regards,
Strahil Nikolov
Live migration failed
by Bong Shau Fui
Hi:
I deployed 2 oVirt hosts and an oVirt engine on a nested KVM server. I have a Windows VM set up and tried to perform a live migration, but it failed. I checked the hosts and found that they meet the live migration requirements, or at least that's what I thought. I took the requirements from the document below.
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Virtuali...
The hosts, both source and destination, are quite empty, with only the hosted engine, one CentOS VM and the Windows VM in the cluster. I can live migrate the CentOS VM successfully. But when I tried to migrate the hosted-engine VM, it failed immediately with the message "No available host to migrate VMs to". When I tried to migrate the Windows VM, the dialog that lets me choose the destination host popped up, but the migration failed after a while.
I'd like to ask where I can get more information about live migration apart from /var/log/ovirt-engine/engine.log. I also checked the hosts' /var/log/vdsm/vdsm.log but found nothing pointing to the reason it failed.
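Besides engine.log, the most useful place for migration failures is usually the libvirt/qemu log on the destination host, since "VM destroyed during the startup" means qemu never came up there. A rough sketch of where to look; the file names follow the usual libvirt layout and the VM name/ID are taken from the log below:
# on the destination host (host3):
less /var/log/libvirt/qemu/Win_2016_1.log                         # qemu's own error for the failed incoming migration
journalctl -u libvirtd --since "2019-03-12 14:35"                 # libvirtd messages around the failure time
grep 5cad5c5f-5aab-46ec-a28e-d484abc0401d /var/log/vdsm/vdsm.log  # follow the VM id through vdsm on both hosts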
Below is the extract from /var/log/ovirt-engine/engine.log from when the live migration took place:
2019-03-12 14:37:58,159+08 INFO [org.ovirt.engine.core.sso.utils.AuthenticationUtils] (default task-131) [] User admin@internal successfully logged in with scopes: ovirt-app-api ovirt-ext=token-info:authz-search ovirt-ext=token-info:public-authz-search ovirt-ext=token-info:validate ovirt-ext=token:password-access
2019-03-12 14:37:58,450+08 INFO [org.ovirt.engine.core.bll.provider.network.SyncNetworkProviderCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-59) [77888830] Lock freed to object 'EngineLock:{exclusiveLocks='[d113be83-2740-4246-a1f2-b9344889c3cf=PROVIDER]', sharedLocks=''}'
2019-03-12 14:38:02,544+08 INFO [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-50) [] BaseAsyncTask::onTaskEndSuccess: Task '67631cf6-4c75-4681-88ef-fd4af56c0363' (Parent Command 'RemoveDisk', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended successfully.
2019-03-12 14:38:12,677+08 INFO [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-16) [] BaseAsyncTask::onTaskEndSuccess: Task '67631cf6-4c75-4681-88ef-fd4af56c0363' (Parent Command 'RemoveDisk', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended successfully.
2019-03-12 14:38:21,650+08 INFO [org.ovirt.engine.core.bll.aaa.SessionDataContainer] (EE-ManagedThreadFactory-engineScheduled-Thread-51) [] Not removing session 'xDiHqqa6l+g8cngM26TTCfW7NeLN3WgWChsx28wUM391vAngSxwtyCkLbQxZR1AbJ5I+2bkPZNQijMUk0jLZcA==', session has running commands for user 'admin@internal-authz'.
2019-03-12 14:38:22,782+08 INFO [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-49) [] BaseAsyncTask::onTaskEndSuccess: Task '67631cf6-4c75-4681-88ef-fd4af56c0363' (Parent Command 'RemoveDisk', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended successfully.
2019-03-12 14:38:33,018+08 INFO [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-74) [] BaseAsyncTask::onTaskEndSuccess: Task '67631cf6-4c75-4681-88ef-fd4af56c0363' (Parent Command 'RemoveDisk', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended successfully.
2019-03-12 14:38:43,261+08 INFO [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-59) [77888830] BaseAsyncTask::onTaskEndSuccess: Task '67631cf6-4c75-4681-88ef-fd4af56c0363' (Parent Command 'RemoveDisk', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended successfully.
2019-03-12 14:38:53,528+08 INFO [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-13) [] BaseAsyncTask::onTaskEndSuccess: Task '67631cf6-4c75-4681-88ef-fd4af56c0363' (Parent Command 'RemoveDisk', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended successfully.
2019-03-12 14:39:03,759+08 INFO [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-43) [] BaseAsyncTask::onTaskEndSuccess: Task '67631cf6-4c75-4681-88ef-fd4af56c0363' (Parent Command 'RemoveDisk', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended successfully.
2019-03-12 14:39:14,011+08 INFO [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-60) [] BaseAsyncTask::onTaskEndSuccess: Task '67631cf6-4c75-4681-88ef-fd4af56c0363' (Parent Command 'RemoveDisk', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended successfully.
2019-03-12 14:39:21,660+08 INFO [org.ovirt.engine.core.bll.aaa.SessionDataContainer] (EE-ManagedThreadFactory-engineScheduled-Thread-79) [] Not removing session 'xDiHqqa6l+g8cngM26TTCfW7NeLN3WgWChsx28wUM391vAngSxwtyCkLbQxZR1AbJ5I+2bkPZNQijMUk0jLZcA==', session has running commands for user 'admin@internal-authz'.
2019-03-12 14:39:24,122+08 INFO [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-85) [] BaseAsyncTask::onTaskEndSuccess: Task '67631cf6-4c75-4681-88ef-fd4af56c0363' (Parent Command 'RemoveDisk', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended successfully.
2019-03-12 14:39:29,773+08 INFO [org.ovirt.engine.core.bll.MigrateVmToServerCommand] (default task-131) [7f0cf113-55e8-4def-9c68-de3b91d6d641] Lock Acquired to object 'EngineLock:{exclusiveLocks='[5cad5c5f-5aab-46ec-a28e-d484abc0401d=VM]', sharedLocks=''}'
2019-03-12 14:39:29,887+08 INFO [org.ovirt.engine.core.bll.MigrateVmToServerCommand] (default task-131) [7f0cf113-55e8-4def-9c68-de3b91d6d641] Running command: MigrateVmToServerCommand internal: false. Entities affected : ID: 5cad5c5f-5aab-46ec-a28e-d484abc0401d Type: VMAction group MIGRATE_VM with role type USER
2019-03-12 14:39:30,019+08 INFO [org.ovirt.engine.core.vdsbroker.MigrateVDSCommand] (default task-131) [7f0cf113-55e8-4def-9c68-de3b91d6d641] START, MigrateVDSCommand( MigrateVDSCommandParameters:{hostId='f9014bc4-485c-4eb0-a9bc-42d13ed68f41', vmId='5cad5c5f-5aab-46ec-a28e-d484abc0401d', srcHost='host2.xxxx.com', dstVdsId='1bc9b9e9-1e90-4570-9930-08416d1927cc', dstHost='host3.xxxx.com:54321', migrationMethod='ONLINE', tunnelMigration='false', migrationDowntime='0', autoConverge='true', migrateCompressed='false', consoleAddress='null', maxBandwidth='null', enableGuestEvents='true', maxIncomingMigrations='2', maxOutgoingMigrations='2', convergenceSchedule='[init=[{name=setDowntime, params=[100]}], stalling=[{limit=1, action={name=setDowntime, params=[150]}}, {limit=2, action={name=setDowntime, params=[200]}}, {limit=3, action={name=setDowntime, params=[300]}}, {limit=4, action={name=setDowntime, params=[400]}}, {limit=6, action={name=setDowntime, params=[500]}}, {limit=-1, action={name=abort, params=[]}}]]', dstQemu='192.168.138.135'}), log id: 7eeb678c
2019-03-12 14:39:30,022+08 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateBrokerVDSCommand] (default task-131) [7f0cf113-55e8-4def-9c68-de3b91d6d641] START, MigrateBrokerVDSCommand(HostName = host2.xxxx.com, MigrateVDSCommandParameters:{hostId='f9014bc4-485c-4eb0-a9bc-42d13ed68f41', vmId='5cad5c5f-5aab-46ec-a28e-d484abc0401d', srcHost='host2.xxxx.com', dstVdsId='1bc9b9e9-1e90-4570-9930-08416d1927cc', dstHost='host3.xxxx.com:54321', migrationMethod='ONLINE', tunnelMigration='false', migrationDowntime='0', autoConverge='true', migrateCompressed='false', consoleAddress='null', maxBandwidth='null', enableGuestEvents='true', maxIncomingMigrations='2', maxOutgoingMigrations='2', convergenceSchedule='[init=[{name=setDowntime, params=[100]}], stalling=[{limit=1, action={name=setDowntime, params=[150]}}, {limit=2, action={name=setDowntime, params=[200]}}, {limit=3, action={name=setDowntime, params=[300]}}, {limit=4, action={name=setDowntime, params=[400]}}, {limit=6, action={name=setDowntime, params=[500]}}, {limit=-1, action={name=abort, params=[]}}]]', dstQemu='192.168.138.135'}), log id: 5cef4981
2019-03-12 14:39:30,039+08 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateBrokerVDSCommand] (default task-131) [7f0cf113-55e8-4def-9c68-de3b91d6d641] FINISH, MigrateBrokerVDSCommand, return: , log id: 5cef4981
2019-03-12 14:39:30,048+08 INFO [org.ovirt.engine.core.vdsbroker.MigrateVDSCommand] (default task-131) [7f0cf113-55e8-4def-9c68-de3b91d6d641] FINISH, MigrateVDSCommand, return: MigratingFrom, log id: 7eeb678c
2019-03-12 14:39:30,067+08 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-131) [7f0cf113-55e8-4def-9c68-de3b91d6d641] EVENT_ID: VM_MIGRATION_START(62), Migration started (VM: Win_2016_1, Source: host2.xxxx.com, Destination: host3, User: admin@internal-authz).
2019-03-12 14:39:33,901+08 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-3) [] VM '5cad5c5f-5aab-46ec-a28e-d484abc0401d' was reported as Down on VDS '1bc9b9e9-1e90-4570-9930-08416d1927cc'(host3)
2019-03-12 14:39:33,903+08 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-3) [] START, DestroyVDSCommand(HostName = host3, DestroyVmVDSCommandParameters:{hostId='1bc9b9e9-1e90-4570-9930-08416d1927cc', vmId='5cad5c5f-5aab-46ec-a28e-d484abc0401d', secondsToWait='0', gracefully='false', reason='', ignoreNoVm='true'}), log id: c853ba5
2019-03-12 14:39:34,211+08 INFO [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-73) [] BaseAsyncTask::onTaskEndSuccess: Task '67631cf6-4c75-4681-88ef-fd4af56c0363' (Parent Command 'RemoveDisk', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended successfully.
2019-03-12 14:39:34,604+08 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-3) [] Failed to destroy VM '5cad5c5f-5aab-46ec-a28e-d484abc0401d' because VM does not exist, ignoring
2019-03-12 14:39:34,605+08 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-3) [] FINISH, DestroyVDSCommand, return: , log id: c853ba5
2019-03-12 14:39:34,605+08 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-3) [] VM '5cad5c5f-5aab-46ec-a28e-d484abc0401d'(Win_2016_1) was unexpectedly detected as 'Down' on VDS '1bc9b9e9-1e90-4570-9930-08416d1927cc'(ohost3) (expected on 'f9014bc4-485c-4eb0-a9bc-42d13ed68f41')
2019-03-12 14:39:34,605+08 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-3) [] Migration of VM 'Win_2016_1' to host 'host3' failed: VM destroyed during the startup.
2019-03-12 14:39:34,615+08 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-10) [] VM '5cad5c5f-5aab-46ec-a28e-d484abc0401d'(Win_2016_1) moved from 'MigratingFrom' --> 'Up'
2019-03-12 14:39:34,615+08 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-10) [] Adding VM '5cad5c5f-5aab-46ec-a28e-d484abc0401d'(Win_2016_1) to re-run list
2019-03-12 14:39:34,621+08 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring] (ForkJoinPool-1-worker-10) [] Rerun VM '5cad5c5f-5aab-46ec-a28e-d484abc0401d'. Called from VDS 'host2.xxxx.com'
2019-03-12 14:39:34,752+08 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (EE-ManagedThreadFactory-engine-Thread-53959) [] START, MigrateStatusVDSCommand(HostName = host2.xxxx.com, MigrateStatusVDSCommandParameters:{hostId='f9014bc4-485c-4eb0-a9bc-42d13ed68f41', vmId='5cad5c5f-5aab-46ec-a28e-d484abc0401d'}), log id: 7ded4ad7
2019-03-12 14:39:34,760+08 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (EE-ManagedThreadFactory-engine-Thread-53959) [] FINISH, MigrateStatusVDSCommand, return: , log id: 7ded4ad7
2019-03-12 14:39:34,786+08 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-53959) [] EVENT_ID: VM_MIGRATION_TO_SERVER_FAILED(120), Migration failed (VM: Win_2016_1, Source: host2.xxxx.com, Destination: host3).
Any help is greatly appreciated.
regards,
Bong SF
Where to find the hooks' print
by zodaoko@gmail.com
Hi there,
I created a before_set_num_of_cpus hook:
# more /usr/libexec/vdsm/hooks/before_set_num_of_cpus/before.py
#!/usr/bin/python
import os
import sys
if os.environ.has_key('never_existed'):
    sys.stderr.write('cantsetcpu: before_cpu_set: cannot set cpu.\n')
    sys.exit(2)
else:
    sys.stdout.write('hook ok.\n')
    sys.exit(0)
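As an aside, my understanding (not confirmed in this thread) is that vdsm logs a hook's stderr to /var/log/vdsm/vdsm.log while its stdout is consumed by the hooking mechanism itself, so a quick way to see whether the hook ran at all is to run it by hand and then search vdsm.log for the hook name:
python /usr/libexec/vdsm/hooks/before_set_num_of_cpus/before.py ; echo "exit=$?"   # run the hook manually
grep -i 'before_set_num_of_cpus\|hook' /var/log/vdsm/vdsm.log | tail               # look for hook-related log lines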
But I cannot find the message "hook ok" in engine.log or vdsm.log. Where can I find it? Thank you very much.
Thank you,
-Zhen
Host affinity hard rule doesn't work
by zodaoko@gmail.com
Hi there,
Here is my setup:
oVirt engine: 4.2.8
1. Create an affinity group as below:
VM affinity rule: positive + enforcing
Host affinity rule: disabled.
VMs: 2 VMs added
Hosts: No host selected.
2. Run the 2 VMs; they end up running on the same host, say host1.
3. Change the affinity group's host affinity:
Host affinity rule: positive + enforcing
Hosts: host2 added.
I expected the 2 VMs to migrate to host2, but that never happens. Is this expected?
Snippet of engine.log:
2019-03-13 07:47:05,747Z INFO [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (EE-ManagedThreadFactory-engineScheduled-Thread-30) [] Candidate host 'dub-svrfarm24' ('76b13e75-d01b-4dec-9298-1fad72b46525') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'VmAffinityGroups' (correlation id: null)
2019-03-13 07:47:05,747Z DEBUG [org.ovirt.engine.core.bll.scheduling.arem.AffinityRulesEnforcer] (EE-ManagedThreadFactory-engineScheduled-Thread-30) [] VM 822b37b7-5da3-453c-b775-d4192c2fdcae is NOT a viable candidate for solving the affinity group violation situation.
2019-03-13 07:47:05,747Z DEBUG [org.ovirt.engine.core.bll.scheduling.arem.AffinityRulesEnforcer] (EE-ManagedThreadFactory-engineScheduled-Thread-30) [] No vm to hosts soft-affinity group violation detected
2019-03-13 07:47:05,749Z DEBUG [org.ovirt.engine.core.bll.scheduling.arem.AffinityRulesEnforcer] (EE-ManagedThreadFactory-engineScheduled-Thread-30) [] No affinity group collision detected for cluster 8fe88b8c-966c-4c21-839d-e2437cc6b73d. Standing by.
2019-03-13 07:47:05,749Z DEBUG [org.ovirt.engine.core.bll.scheduling.arem.AffinityRulesEnforcer] (EE-ManagedThreadFactory-engineScheduled-Thread-30) [] No affinity group collision detected for cluster 3beac2ea-ed04-4f40-9ce3-5a9a67cebd8c. Standing by.
2019-03-13 07:47:05,750Z DEBUG [org.ovirt.engine.core.bll.scheduling.arem.AffinityRulesEnforcer] (EE-ManagedThreadFactory-engineScheduled-Thread-30) [] No affinity group collision detected for cluster da32d154-4303-11e9-9607-00163eaab080. Standing by.
Thank you,
-Zhen