The ovirt-ha-agent is logging the following:
MainThread::INFO::2019-08-19 15:26:13,302::hosted_engine::512::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineDown (score: 3400)
MainThread::INFO::2019-08-19 15:26:13,303::hosted_engine::520::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Best remote host ovirt-sj-01.ictv.com (id: 1, score: 3400)
MainThread::INFO::2019-08-19 15:26:13,314::state_machine::169::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Global metadata: {'maintenance': False}
MainThread::INFO::2019-08-19 15:26:13,314::state_machine::174::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Host ovirt-sj-01.ictv.com (id 1): {'conf_on_shared_storage': True, 'extra': 'metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=4777 (Mon Aug 19 15:19:31 2019)\nhost-id=1\nscore=3400\nvm_conf_refresh_time=4531 (Mon Aug 19 15:15:26 2019)\nconf_on_shared_storage=True\nmaintenance=False\nstate=EngineDown\nstopped=False\n', 'hostname': 'ovirt-sj-01.ictv.com', 'alive': True, 'host-id': 1, 'engine-status': {'reason': 'bad vm status', 'health': 'bad', 'vm': 'down', 'detail': 'Down'}, 'score': 3400, 'stopped': False, 'maintenance': False, 'crc32': 'b1380c25', 'local_conf_timestamp': 4531, 'host-ts': 4777}
MainThread::INFO::2019-08-19 15:26:13,314::state_machine::177::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Local (id 2): {'engine-health': {'reason': 'vm not running on this host', 'health': 'bad', 'vm': 'down', 'detail': 'unknown'}, 'bridge': True, 'network': 1.0, 'mem-free': 191320.0, 'maintenance': False, 'cpu-load': 0.0049, 'storage-domain': True}
MainThread::INFO::2019-08-19 15:26:13,314::states::467::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine down and local host has best score (3400), attempting to start engine VM
MainThread::INFO::2019-08-19 15:26:13,370::brokerlink::68::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineDown-EngineStart) sent? sent
MainThread::ERROR::2019-08-19 15:26:13,498::config_ovf::42::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm::(_get_vm_conf_content_from_ovf_store) Failed scanning for OVF_STORE due to Command Volume.getInfo with args {'storagepoolID': '00000000-0000-0000-0000-000000000000', 'storagedomainID': 'a03fb743-8004-4d54-823b-9be470a0e87b', 'volumeID': u'3f3ee39f-f687-4586-87bd-e5188958863a', 'imageID': u'8c9279b7-0321-49c9-bdd5-4bb94d863960'} failed: (code=100, message=(13, 'Sanlock resource read failure', 'Permission denied'))
MainThread::ERROR::2019-08-19 15:26:13,498::config_ovf::84::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm::(_get_vm_conf_content_from_ovf_store) Unable to identify the OVF_STORE volume, falling back to initial vm.conf. Please ensure you already added your first data domain for regular VMs
Both domains are present:
10.210.13.64:/hosted_engine on /rhev/data-center/mnt/10.210.13.64:_hosted__engine type nfs4 (rw,relatime,vers=4.0,rsize=65536,wsize=65536,namlen=255,soft,nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,clientaddr=10.210.13.12,local_lock=none,addr=10.210.13.64)
10.210.13.64:/ovirt_production on /rhev/data-center/mnt/10.210.13.64:_ovirt__production type nfs4 (rw,relatime,vers=4.0,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.210.13.12,local_lock=none,addr=10.210.13.64)
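Since sanlock reports "Permission denied" on the volume read, I can also share the ownership and mode of that exact volume on the hosted_engine mount. A rough sketch of the checks I can run, assuming the usual <mountpoint>/<storage-domain-UUID>/images/<image-UUID>/ layout and the UUIDs from the error above (paths may need adjusting):

# everything under the storage domain should belong to vdsm:kvm (uid/gid 36)
ls -ln /rhev/data-center/mnt/10.210.13.64:_hosted__engine/a03fb743-8004-4d54-823b-9be470a0e87b/images/8c9279b7-0321-49c9-bdd5-4bb94d863960/
# show owner and mode of the volume and its .lease file, which is what sanlock reads
stat -c '%U:%G %a %n' /rhev/data-center/mnt/10.210.13.64:_hosted__engine/a03fb743-8004-4d54-823b-9be470a0e87b/images/8c9279b7-0321-49c9-bdd5-4bb94d863960/3f3ee39f-f687-4586-87bd-e5188958863a*
# check which groups the sanlock user is in (it normally reaches the leases via group permissions)
id sanlock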
Note: the IP 10.20.20.40 used in the first email was just an example.
— — —
Met vriendelijke groet / Kind regards,
Marko Vrgotic
From: "Vrgotic, Marko" <M.Vrgotic(a)activevideo.com>
Date: Monday, 19 August 2019 at 17:19
To: "users(a)ovirt.org" <users(a)ovirt.org>
Subject: Re: Issues with oVirt-Engine start - oVirt 4.3.4
Additionally, when the agent tries to boot up the engine, I am able to get the following status:
[root@ovirt-sj-02 images]# hosted-engine --vm-status
--== Host ovirt-sj-01.ictv.com (id: 1) status ==--
conf_on_shared_storage : True
Status up-to-date : True
Hostname : ovirt-sj-01.ictv.com
Host ID : 1
Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score : 3400
stopped : False
Local maintenance : False
crc32 : f4f95c83
local_conf_timestamp : 4285
Host timestamp : 4285
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=4285 (Mon Aug 19 15:11:19 2019)
host-id=1
score=3400
vm_conf_refresh_time=4285 (Mon Aug 19 15:11:20 2019)
conf_on_shared_storage=True
maintenance=False
state=EngineStarting
stopped=False
--== Host ovirt-sj-02.ictv.com (id: 2) status ==--
conf_on_shared_storage : True
Status up-to-date : True
Hostname : ovirt-sj-02.ictv.com
Host ID : 2
Engine status : {"reason": "bad vm status", "health": "bad", "vm": "down", "detail": "Down"}
Score : 3400
stopped : False
Local maintenance : False
crc32 : c2669fe8
local_conf_timestamp : 4153
Host timestamp : 4153
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=4153 (Mon Aug 19 15:09:47 2019)
host-id=2
score=3400
vm_conf_refresh_time=4153 (Mon Aug 19 15:09:47 2019)
conf_on_shared_storage=True
maintenance=False
state=EngineStart
stopped=False
From: "Vrgotic, Marko" <M.Vrgotic(a)activevideo.com>
Date: Monday, 19 August 2019 at 17:17
To: "users(a)ovirt.org" <users(a)ovirt.org>
Subject: Issues with oVirt-Engine start - oVirt 4.3.4
Dear oVirt,
While working on a procedure to get the NFS v4 mount from NetApp working with oVirt, the following steps turned out to be the way to go for setting it up for the oVirt SHE and VM guests:
* mkdir /mnt/rhevstore
* mount -t nfs 10.20.30.40:/ovirt_hosted_engine /mnt/rhevstore
* chown -R 36.36 /mnt/rhevstore
* chmod -R 755 /mnt/rhevstore
* umount /mnt/rhevstore
This works fine, and it needs to be executed on each hypervisor before it is provisioned into oVirt.
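For reference, the same preparation can be wrapped in a small per-hypervisor script; this is only a sketch of the steps above (the export address is the example one, 36:36 is the vdsm:kvm uid/gid pair, and note the caveat about the chmod step described below):

#!/bin/bash
# Prepare the NFS export for oVirt before the host is provisioned.
set -e

EXPORT="10.20.30.40:/ovirt_hosted_engine"   # example export address, adjust to your NetApp
MNT="/mnt/rhevstore"                        # temporary mount point

mkdir -p "$MNT"
mount -t nfs "$EXPORT" "$MNT"
chown -R 36:36 "$MNT"   # 36:36 = vdsm:kvm on oVirt hosts
chmod -R 755 "$MNT"     # this is the step that later caused trouble, see below
umount "$MNT"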
However, just today I discovered that the command chmod -R 755 /mnt/rhevstore, if executed on a new, to-be-added hypervisor after oVirt is already running, brings the oVirt Engine into a broken state.
The moment I executed the above on the 3rd hypervisor, before provisioning it into oVirt, the following occurred:
* The Engine threw the following error:
* 2019-08-19 13:16:31,425Z ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStatusVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-82) [] Failed in 'SpmStatusVDS' method
* The connection was lost:
* packet_write_wait: Connection to 10.210.11.10 port 22: Broken pipe
* And VDSM on the SHE-hosting hypervisor started logging errors like:
* 2019-08-19 15:00:52,340+0000 INFO (jsonrpc/0) [jsonrpc.JsonRpcServer] RPC call Host.getAllVmStats succeeded in 0.00 seconds (__init__:312)
* 2019-08-19 15:00:53,865+0000 WARN (vdsm.Scheduler) [Executor] Worker blocked: <Worker name=periodic/2 running <Task <Operation action=<vdsm.virt.sampling.HostMonitor object at 0x7f59442c3d50> at 0x7f59442c3b90> timeout=15, duration=225.00 at 0x7f592476df90> task#=578 at 0x7f59442ef910>, traceback:
* File: "/usr/lib64/python2.7/threading.py", line 785, in __bootstrap
* self.__bootstrap_inner()
* File: "/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner
* self.run()
* File: "/usr/lib64/python2.7/threading.py", line 765, in run
* self.__target(*self.__args, **self.__kwargs)
* File: "/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py", line 195, in run
* ret = func(*args, **kwargs)
* File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 301, in _run
* self._execute_task()
* File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in _execute_task
* task()
* File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in __call__
* self._callable()
* File: "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 186, in __call__
* self._func()
* File: "/usr/lib/python2.7/site-packages/vdsm/virt/sampling.py", line 481, in __call__
* stats = hostapi.get_stats(self._cif, self._samples.stats())
* File: "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 79, in get_stats
* ret['haStats'] = _getHaInfo()
* File: "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 177, in _getHaInfo
* stats = instance.get_all_stats()
* File: "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 94, in get_all_stats
* stats = broker.get_stats_from_storage()
* File: "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 143, in get_stats_from_storage
* result = self._proxy.get_stats()
* File: "/usr/lib64/python2.7/xmlrpclib.py", line 1233, in __call__
* return self.__send(self.__name, args)
* File: "/usr/lib64/python2.7/xmlrpclib.py", line 1591, in __request
* verbose=self.__verbose
* File: "/usr/lib64/python2.7/xmlrpclib.py", line 1273, in request
* return self.single_request(host, handler, request_body, verbose)
* File: "/usr/lib64/python2.7/xmlrpclib.py", line 1303, in single_request
* response = h.getresponse(buffering=True)
* File: "/usr/lib64/python2.7/httplib.py", line 1113, in getresponse
* response.begin()
* File: "/usr/lib64/python2.7/httplib.py", line 444, in begin
* version, status, reason = self._read_status()
* File: "/usr/lib64/python2.7/httplib.py", line 400, in _read_status
* line = self.fp.readline(_MAXLINE + 1)
* File: "/usr/lib64/python2.7/socket.py", line 476, in readline
* data = self._sock.recv(self._rbufsize) (executor:363)
* 2019-08-19 15:00:54,103+0000 INFO (jsonrpc/1) [jsonrpc.JsonRpcServer] RPC call Host.ping2 succeeded in 0.00 seconds (__init__:312)
I am unable to boot the Engine VM; it ends up in status ForceStop.
hosted-engine --vm-status shows:
[root@ovirt-sj-02 ~]# hosted-engine --vm-status
The hosted engine configuration has not been retrieved from shared storage. Please ensure
that ovirt-ha-agent is running and the storage server is reachable.
But the storage is mounted and reachable, and ovirt-ha-agent is running:
[root@ovirt-sj-02 ~]# systemctl status ovirt-ha-agent
● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2019-08-19 14:57:07 UTC; 23s ago
Main PID: 43388 (ovirt-ha-agent)
Tasks: 2
CGroup: /system.slice/ovirt-ha-agent.service
└─43388 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent
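If it helps, these are the additional checks I can provide output from; a sketch only, the broker being the service the agent reads the shared-storage stats through (as the traceback above also shows):

systemctl status ovirt-ha-broker
tail -n 50 /var/log/ovirt-hosted-engine-ha/agent.log
tail -n 50 /var/log/ovirt-hosted-engine-ha/broker.log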
Can somebody help me with what to do?
— — —
Met vriendelijke groet / Kind regards,
Marko Vrgotic