Hi,
cleaning metadata won't help in this case. Try transferring the
spm_ids you got from the engine to the proper hosted engine hosts so
the hosted engine ids match the spm_ids. Then restart all hosted
engine services. I would actually recommend restarting all hosts after
this change, but I have no idea how many VMs you have running.
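
For example, something along these lines (just a rough sketch; the psql query
assumes the default "engine" database and that the vds_spm_id_map table layout
matches your version, so double-check the column names first):

  # On the engine VM: see which SPM id the engine assigned to each host
  sudo -u postgres psql engine -c "SELECT vds_id, vds_spm_id FROM vds_spm_id_map;"

  # On each hosted engine host: check the id the HA agent is using
  grep host_id /etc/ovirt-hosted-engine/hosted-engine.conf
  # edit the value so it matches that host's vds_spm_id, e.g. host_id=2

  # Then restart the hosted engine services on that host
  systemctl restart ovirt-ha-broker ovirt-ha-agent
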
Martin
On Thu, Jun 29, 2017 at 8:27 PM, cmc <iucounu(a)gmail.com> wrote:
Tried running a 'hosted-engine --clean-metadata' as per
https://bugzilla.redhat.com/show_bug.cgi?id=1350539, since
ovirt-ha-agent was not running anyway, but it fails with the following
error:
ERROR:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Failed
to start monitoring domain
(sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
during domain acquisition
ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Traceback (most recent
call last):
File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
line 191, in _run_agent
return action(he)
File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
line 67, in action_clean
return he.clean(options.force_cleanup)
File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
line 345, in clean
self._initialize_domain_monitor()
File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
line 823, in _initialize_domain_monitor
raise Exception(msg)
Exception: Failed to start monitoring domain
(sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
during domain acquisition
ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Trying to restart agent
WARNING:ovirt_hosted_engine_ha.agent.agent.Agent:Restarting agent, attempt '0'
ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Too many errors
occurred, giving up. Please review the log and consider filing a bug.
INFO:ovirt_hosted_engine_ha.agent.agent.Agent:Agent shutting down
On Thu, Jun 29, 2017 at 6:10 PM, cmc <iucounu(a)gmail.com> wrote:
> Actually, it looks like sanlock problems:
>
> "SanlockInitializationError: Failed to initialize sanlock, the
> number of errors has exceeded the limit"
>
>
>
> On Thu, Jun 29, 2017 at 5:10 PM, cmc <iucounu(a)gmail.com> wrote:
>> Sorry, I am mistaken, two hosts failed for the agent with the following error:
>>
>> ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine
>> ERROR Failed to start monitoring domain
>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
>> during domain acquisition
>> ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine
>> ERROR Shutting down the agent because of 3 failures in a row!
>>
>> What could cause these timeouts? Some other service not running?
>>
>> On Thu, Jun 29, 2017 at 5:03 PM, cmc <iucounu(a)gmail.com> wrote:
>>> Both services are up on all three hosts. The broker logs just report:
>>>
>>> Thread-6549::INFO::2017-06-29 17:01:51,481::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
>>> Thread-6549::INFO::2017-06-29 17:01:51,483::listener::186::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
>>>
>>> Thanks,
>>>
>>> Cam
>>>
>>> On Thu, Jun 29, 2017 at 4:00 PM, Martin Sivak <msivak(a)redhat.com> wrote:
>>>> Hi,
>>>>
>>>> please make sure that both ovirt-ha-agent and ovirt-ha-broker services
>>>> are restarted and up. The error says the agent can't talk to the
>>>> broker. Is there anything in the broker.log?
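>>>>
>>>> For example (just a sketch, assuming the standard service names and the
>>>> broker log sitting next to agent.log):
>>>>
>>>> systemctl restart ovirt-ha-broker ovirt-ha-agent
>>>> systemctl status ovirt-ha-broker ovirt-ha-agent
>>>> tail -n 50 /var/log/ovirt-hosted-engine-ha/broker.log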
>>>>
>>>> Best regards
>>>>
>>>> Martin Sivak
>>>>
>>>> On Thu, Jun 29, 2017 at 4:42 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>> I've restarted those two services across all hosts, have taken the
>>>>> Hosted Engine host out of maintenance, and when I try to migrate the
>>>>> Hosted Engine over to another host, it reports that all three hosts
>>>>> 'did not satisfy internal filter HA because it is not a Hosted Engine
>>>>> host'.
>>>>>
>>>>> On the host that the Hosted Engine is currently on, it reports in the
>>>>> agent.log:
>>>>>
>>>>> ovirt-ha-agent ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR
>>>>> Connection closed: Connection closed
>>>>> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent
>>>>> ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Exception
>>>>> getting service path: Connection closed
>>>>> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent
>>>>> ovirt_hosted_engine_ha.agent.agent.Agent ERROR Traceback (most recent call last):
>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 191, in _run_agent
>>>>>     return action(he)
>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 64, in action_proper
>>>>>     return he.start_monitoring()
>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 411, in start_monitoring
>>>>>     self._initialize_sanlock()
>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 691, in _initialize_sanlock
>>>>>     constants.SERVICE_TYPE + constants.LOCKSPACE_EXTENSION)
>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 162, in get_service_path
>>>>>     .format(str(e)))
>>>>> RequestError: Failed to get service path: Connection closed
>>>>> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent
>>>>> ovirt_hosted_engine_ha.agent.agent.Agent ERROR Trying to restart agent
>>>>>
>>>>> On Thu, Jun 29, 2017 at 1:25 PM, Martin Sivak <msivak(a)redhat.com> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> yep, you have to restart the ovirt-ha-agent and ovirt-ha-broker services.
>>>>>>
>>>>>> The scheduling message just means that the host has score 0 or is not
>>>>>> reporting score at all.
>>>>>>
>>>>>> Martin
>>>>>>
>>>>>> On Thu, Jun 29, 2017 at 1:33 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>> Thanks Martin, do I have to restart anything? When I try to use the
>>>>>>> 'migrate' operation, it complains that the other two hosts 'did not
>>>>>>> satisfy internal filter HA because it is not a Hosted Engine host..'
>>>>>>> (even though I reinstalled both these hosts with the 'deploy hosted
>>>>>>> engine' option), which suggests that something needs restarting. Should
>>>>>>> I worry about the sanlock errors, or will that be resolved by the
>>>>>>> change in host_id?
>>>>>>>
>>>>>>> Kind regards,
>>>>>>>
>>>>>>> Cam
>>>>>>>
>>>>>>> On Thu, Jun 29, 2017 at 12:22 PM, Martin Sivak <msivak(a)redhat.com> wrote:
>>>>>>>> Change the ids so they are distinct. I need to check if there is a way
>>>>>>>> to read the SPM ids from the engine, as using the same numbers would be
>>>>>>>> the best.
>>>>>>>>
>>>>>>>> Martin
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Jun 29, 2017 at 12:46 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>>> Is there any way of recovering from this situation? I'd prefer to fix
>>>>>>>>> the issue rather than re-deploy, but if there is no recovery path, I
>>>>>>>>> could perhaps try re-deploying the hosted engine. In which case, would
>>>>>>>>> the best option be to take a backup of the Hosted Engine, and then
>>>>>>>>> shut it down, re-initialise the SAN partition (or use another
>>>>>>>>> partition) and retry the deployment? Would it be better to use the
>>>>>>>>> older backup from the bare metal engine that I originally used, or use
>>>>>>>>> a backup from the Hosted Engine? I'm not sure if any VMs have been
>>>>>>>>> added since switching to Hosted Engine.
>>>>>>>>>
>>>>>>>>> Unfortunately I have very little time left to get this working before
>>>>>>>>> I have to hand it over for eval (by end of Friday).
>>>>>>>>>
>>>>>>>>> Here are some log snippets from the cluster that are current.
>>>>>>>>>
>>>>>>>>> In /var/log/vdsm/vdsm.log on the host that has the Hosted Engine:
>>>>>>>>>
>>>>>>>>> 2017-06-29 10:50:15,071+0100 INFO (monitor/207221b) [storage.SANLock]
>>>>>>>>> Acquiring host id for domain 207221b2-959b-426b-b945-18e1adfed62f (id: 3) (clusterlock:282)
>>>>>>>>> 2017-06-29 10:50:15,072+0100 ERROR (monitor/207221b) [storage.Monitor]
>>>>>>>>> Error acquiring host id 3 for domain 207221b2-959b-426b-b945-18e1adfed62f (monitor:558)
>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>   File "/usr/share/vdsm/storage/monitor.py", line 555, in _acquireHostId
>>>>>>>>>     self.domain.acquireHostId(self.hostId, async=True)
>>>>>>>>>   File "/usr/share/vdsm/storage/sd.py", line 790, in acquireHostId
>>>>>>>>>     self._manifest.acquireHostId(hostId, async)
>>>>>>>>>   File "/usr/share/vdsm/storage/sd.py", line 449, in acquireHostId
>>>>>>>>>     self._domainLock.acquireHostId(hostId, async)
>>>>>>>>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line 297, in acquireHostId
>>>>>>>>>     raise se.AcquireHostIdFailure(self._sdUUID, e)
>>>>>>>>> AcquireHostIdFailure: Cannot acquire host id:
>>>>>>>>> ('207221b2-959b-426b-b945-18e1adfed62f', SanlockException(22, 'Sanlock
>>>>>>>>> lockspace add failure', 'Invalid argument'))
>>>>>>>>>
>>>>>>>>> From /var/log/ovirt-hosted-engine-ha/agent.log on the same host:
>>>>>>>>>
>>>>>>>>> MainThread::ERROR::2017-06-19 13:30:50,592::hosted_engine::822::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_domain_monitor)
>>>>>>>>> Failed to start monitoring domain
>>>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
>>>>>>>>> during domain acquisition
>>>>>>>>> MainThread::WARNING::2017-06-19 13:30:50,593::hosted_engine::469::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>>>>>>>>> Error while monitoring engine: Failed to start monitoring domain
>>>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
>>>>>>>>> during domain acquisition
>>>>>>>>> MainThread::WARNING::2017-06-19 13:30:50,593::hosted_engine::472::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>>>>>>>>> Unexpected error
>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 443, in start_monitoring
>>>>>>>>>     self._initialize_domain_monitor()
>>>>>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 823, in _initialize_domain_monitor
>>>>>>>>>     raise Exception(msg)
>>>>>>>>> Exception: Failed to start monitoring domain
>>>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
>>>>>>>>> during domain acquisition
>>>>>>>>> MainThread::ERROR::2017-06-19 13:30:50,593::hosted_engine::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>>>>>>>>> Shutting down the agent because of 3 failures in a row!
>>>>>>>>>
>>>>>>>>> From sanlock.log:
>>>>>>>>>
>>>>>>>>> 2017-06-29 11:17:06+0100 1194149 [2530]: add_lockspace
>>>>>>>>> 207221b2-959b-426b-b945-18e1adfed62f:3:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0
>>>>>>>>> conflicts with name of list1 s5
>>>>>>>>> 207221b2-959b-426b-b945-18e1adfed62f:1:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0
>>>>>>>>>
>>>>>>>>> From the two other hosts:
>>>>>>>>>
>>>>>>>>> host 2:
>>>>>>>>>
>>>>>>>>> vdsm.log
>>>>>>>>>
>>>>>>>>> 2017-06-29 10:53:47,755+0100 ERROR (jsonrpc/4) [jsonrpc.JsonRpcServer]
>>>>>>>>> Internal server error (__init__:570)
>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>   File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 565, in _handle_request
>>>>>>>>>     res = method(**params)
>>>>>>>>>   File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 202, in _dynamicMethod
>>>>>>>>>     result = fn(*methodArgs)
>>>>>>>>>   File "/usr/share/vdsm/API.py", line 1454, in getAllVmIoTunePolicies
>>>>>>>>>     io_tune_policies_dict = self._cif.getAllVmIoTunePolicies()
>>>>>>>>>   File "/usr/share/vdsm/clientIF.py", line 448, in getAllVmIoTunePolicies
>>>>>>>>>     'current_values': v.getIoTune()}
>>>>>>>>>   File "/usr/share/vdsm/virt/vm.py", line 2803, in getIoTune
>>>>>>>>>     result = self.getIoTuneResponse()
>>>>>>>>>   File "/usr/share/vdsm/virt/vm.py", line 2816, in getIoTuneResponse
>>>>>>>>>     res = self._dom.blockIoTune(
>>>>>>>>>   File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 47, in __getattr__
>>>>>>>>>     % self.vmid)
>>>>>>>>> NotConnectedError: VM u'a79e6b0e-fff4-4cba-a02c-4c00be151300' was not
>>>>>>>>> started yet or was shut down
>>>>>>>>>
>>>>>>>>> /var/log/ovirt-hosted-engine-ha/agent.log
>>>>>>>>>
>>>>>>>>> MainThread::INFO::2017-06-29 10:56:33,636::ovf_store::103::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan)
>>>>>>>>> Found OVF_STORE: imgUUID:222610db-7880-4f4f-8559-a3635fd73555,
>>>>>>>>> volUUID:c6e0d29b-eabf-4a09-a330-df54cfdd73f1
>>>>>>>>> MainThread::INFO::2017-06-29 10:56:33,926::ovf_store::112::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>>>>>>>>> Extracting Engine VM OVF from the OVF_STORE
>>>>>>>>> MainThread::INFO::2017-06-29 10:56:33,938::ovf_store::119::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>>>>>>>>> OVF_STORE volume path:
>>>>>>>>> /rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/images/222610db-7880-4f4f-8559-a3635fd73555/c6e0d29b-eabf-4a09-a330-df54cfdd73f1
>>>>>>>>> MainThread::INFO::2017-06-29 10:56:33,967::config::431::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>>>>>>>> Found an OVF for HE VM, trying to convert
>>>>>>>>> MainThread::INFO::2017-06-29 10:56:33,971::config::436::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>>>>>>>> Got vm.conf from OVF_STORE
>>>>>>>>> MainThread::INFO::2017-06-29 10:56:36,736::states::678::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score)
>>>>>>>>> Score is 0 due to unexpected vm shutdown at Thu Jun 29 10:53:59 2017
>>>>>>>>> MainThread::INFO::2017-06-29 10:56:36,736::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>>>>>>>>> Current state EngineUnexpectedlyDown (score: 0)
>>>>>>>>> MainThread::INFO::2017-06-29 10:56:46,772::config::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf)
>>>>>>>>> Reloading vm.conf from the shared storage domain
>>>>>>>>>
>>>>>>>>> /var/log/messages:
>>>>>>>>>
>>>>>>>>> Jun 29 10:53:46 kvm-ldn-02 kernel: dd: sending ioctl 80306d02 to a partition!
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> host 1:
>>>>>>>>>
>>>>>>>>> /var/log/messages (the same entries appear in sanlock.log):
>>>>>>>>>
>>>>>>>>> Jun 29 11:01:02 kvm-ldn-01 sanlock[2400]: 2017-06-29 11:01:02+0100
>>>>>>>>> 678325 [9132]: s4531 delta_acquire host_id 1 busy1 1 2 1193177
>>>>>>>>> 3d4ec963-8486-43a2-a7d9-afa82508f89f.kvm-ldn-03
>>>>>>>>> Jun 29 11:01:03 kvm-ldn-01 sanlock[2400]: 2017-06-29 11:01:03+0100
>>>>>>>>> 678326 [24159]: s4531 add_lockspace fail result -262
>>>>>>>>>
>>>>>>>>> /var/log/ovirt-hosted-engine-ha/agent.log:
>>>>>>>>>
>>>>>>>>> MainThread::ERROR::2017-06-27 15:21:01,143::hosted_engine::822::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_domain_monitor)
>>>>>>>>> Failed to start monitoring domain
>>>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
>>>>>>>>> during domain acquisition
>>>>>>>>> MainThread::WARNING::2017-06-27 15:21:01,144::hosted_engine::469::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>>>>>>>>> Error while monitoring engine: Failed to start monitoring domain
>>>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
>>>>>>>>> during domain acquisition
>>>>>>>>> MainThread::WARNING::2017-06-27 15:21:01,144::hosted_engine::472::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>>>>>>>>> Unexpected error
>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 443, in start_monitoring
>>>>>>>>>     self._initialize_domain_monitor()
>>>>>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 823, in _initialize_domain_monitor
>>>>>>>>>     raise Exception(msg)
>>>>>>>>> Exception: Failed to start monitoring domain
>>>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
>>>>>>>>> during domain acquisition
>>>>>>>>> MainThread::ERROR::2017-06-27 15:21:01,144::hosted_engine::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>>>>>>>>> Shutting down the agent because of 3 failures in a row!
>>>>>>>>> MainThread::INFO::2017-06-27 15:21:06,717::hosted_engine::848::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status)
>>>>>>>>> VDSM domain monitor status: PENDING
>>>>>>>>> MainThread::INFO::2017-06-27 15:21:09,335::hosted_engine::776::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_domain_monitor)
>>>>>>>>> Failed to stop monitoring domain
>>>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f): Storage domain is
>>>>>>>>> member of pool: u'domain=207221b2-959b-426b-b945-18e1adfed62f'
>>>>>>>>> MainThread::INFO::2017-06-27 15:21:09,339::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(run)
>>>>>>>>> Agent shutting down
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks for any help,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Cam
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Jun 28, 2017 at 11:25 AM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>>>> Hi Martin,
>>>>>>>>>>
>>>>>>>>>> yes, on two of the machines they have the same host_id. The other has
>>>>>>>>>> a different host_id.
>>>>>>>>>>
>>>>>>>>>> To update since yesterday: I reinstalled and deployed Hosted Engine on
>>>>>>>>>> the other host (so all three hosts in the cluster now have it
>>>>>>>>>> installed). The second one I deployed said it was able to host the
>>>>>>>>>> engine (unlike the first I reinstalled), so I tried putting the host
>>>>>>>>>> with the Hosted Engine on it into maintenance to see if it would
>>>>>>>>>> migrate over. It managed to move all the VMs but the Hosted Engine. And
>>>>>>>>>> now the host that said it was able to host the engine says
>>>>>>>>>> 'unavailable due to HA score'. The host that it was trying to move
>>>>>>>>>> from has now been in 'preparing for maintenance' for the last 12 hours.
>>>>>>>>>>
>>>>>>>>>> The summary is:
>>>>>>>>>>
>>>>>>>>>> kvm-ldn-01 - one of the original, pre-Hosted Engine hosts, reinstalled
>>>>>>>>>> with 'Deploy Hosted Engine'. No icon saying it can host the Hosted
>>>>>>>>>> Engine, host_id of '2' in /etc/ovirt-hosted-engine/hosted-engine.conf.
>>>>>>>>>> 'add_lockspace' fails in sanlock.log
>>>>>>>>>>
>>>>>>>>>> kvm-ldn-02 - the other host that was pre-existing before Hosted Engine
>>>>>>>>>> was created. Reinstalled with 'Deploy Hosted Engine'. Had an icon
>>>>>>>>>> saying that it was able to host the Hosted Engine, but after migration
>>>>>>>>>> was attempted when putting kvm-ldn-03 into maintenance, it reports:
>>>>>>>>>> 'unavailable due to HA score'. It has a host_id of '1' in
>>>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf. No errors in sanlock.log
>>>>>>>>>>
>>>>>>>>>> kvm-ldn-03 - this was the host I deployed Hosted Engine on, which was
>>>>>>>>>> not part of the original cluster. I restored the bare-metal engine
>>>>>>>>>> backup in the Hosted Engine on this host when deploying it, without
>>>>>>>>>> error. It currently has the Hosted Engine on it (as the only VM, after
>>>>>>>>>> I put that host into maintenance to test the HA of Hosted Engine).
>>>>>>>>>> Sanlock log shows conflicts
>>>>>>>>>>
>>>>>>>>>> I will look through all the logs for any other errors. Please let me
>>>>>>>>>> know if you need any logs or other clarification/information.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Campbell
>>>>>>>>>>
>>>>>>>>>> On Wed, Jun 28, 2017 at 9:25 AM, Martin Sivak <msivak(a)redhat.com> wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> can you please check the contents of
>>>>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf or
>>>>>>>>>>> /etc/ovirt-hosted-engine-ha/agent.conf (I am not sure which one it is
>>>>>>>>>>> right now) and search for host-id?
>>>>>>>>>>>
>>>>>>>>>>> Make sure the IDs are different. If they are not, then there is a bug
>>>>>>>>>>> somewhere.
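>>>>>>>>>>>
>>>>>>>>>>> For example, on each host (just a sketch, assuming the key is spelled
>>>>>>>>>>> host_id, as it is in hosted-engine.conf here):
>>>>>>>>>>>
>>>>>>>>>>> grep host_id /etc/ovirt-hosted-engine/hosted-engine.conf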
>>>>>>>>>>>
>>>>>>>>>>> Martin
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jun 27, 2017 at 6:26 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>>>>>> I see this on the host it is trying to migrate to, in /var/log/sanlock:
>>>>>>>>>>>>
>>>>>>>>>>>> 2017-06-27 17:10:40+0100 527703 [2407]: s3528 lockspace
>>>>>>>>>>>> 207221b2-959b-426b-b945-18e1adfed62f:1:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0
>>>>>>>>>>>> 2017-06-27 17:13:00+0100 527843 [27446]: s3528 delta_acquire host_id 1
>>>>>>>>>>>> busy1 1 2 1042692 3d4ec963-8486-43a2-a7d9-afa82508f89f.kvm-ldn-03
>>>>>>>>>>>> 2017-06-27 17:13:01+0100 527844 [2407]: s3528 add_lockspace fail result -262
>>>>>>>>>>>>
>>>>>>>>>>>> The sanlock service is running. Why would this occur?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> C
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jun 27, 2017 at 5:21 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>>>>>>> Hi Martin,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for the reply. I have done this, and the deployment completed
>>>>>>>>>>>>> without error. However, it still will not allow the Hosted Engine to
>>>>>>>>>>>>> migrate to another host. The
>>>>>>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf got created OK on the host
>>>>>>>>>>>>> I re-installed, but the ovirt-ha-broker.service, though it starts,
>>>>>>>>>>>>> reports:
>>>>>>>>>>>>>
>>>>>>>>>>>>> --------------------8<-------------------
>>>>>>>>>>>>>
>>>>>>>>>>>>> Jun 27 14:58:26 kvm-ldn-01 systemd[1]: Starting oVirt Hosted Engine
>>>>>>>>>>>>> High Availability Communications Broker...
>>>>>>>>>>>>> Jun 27 14:58:27 kvm-ldn-01 ovirt-ha-broker[6101]: ovirt-ha-broker
>>>>>>>>>>>>> ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker ERROR
>>>>>>>>>>>>> Failed to read metadata from
>>>>>>>>>>>>> /rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/hosted-engine.metadata
>>>>>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 129, in get_raw_stats_for_service_type
>>>>>>>>>>>>>     f = os.open(path, direct_flag | os.O_RDONLY | os.O_SYNC)
>>>>>>>>>>>>> OSError: [Errno 2] No such file or directory:
>>>>>>>>>>>>> '/rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/hosted-engine.metadata'
>>>>>>>>>>>>>
>>>>>>>>>>>>> --------------------8<-------------------
>>>>>>>>>>>>>
>>>>>>>>>>>>> I checked the path, and it exists. I can run 'less -f' on it fine. The
>>>>>>>>>>>>> perms are slightly different on the host that is running the VM vs the
>>>>>>>>>>>>> one that is reporting errors (600 vs 660), ownership is vdsm:qemu. Is
>>>>>>>>>>>>> this a sanlock issue?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for any help,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cam
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jun 27, 2017 at 1:41 PM, Martin Sivak <msivak(a)redhat.com> wrote:
>>>>>>>>>>>>>>> Should it be? It was not in the instructions for the migration from
>>>>>>>>>>>>>>> bare-metal to Hosted VM
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The hosted engine will only migrate to hosts that have the services
>>>>>>>>>>>>>> running. Please put one other host into maintenance and select Hosted
>>>>>>>>>>>>>> engine action: DEPLOY in the reinstall dialog.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best regards
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Martin Sivak
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Jun 27, 2017 at 1:23 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>>>>>>>>> I changed the 'os.other.devices.display.protocols.value.3.6 =
>>>>>>>>>>>>>>> spice/qxl,vnc/cirrus,vnc/qxl' line to have the same display protocols
>>>>>>>>>>>>>>> as 4 and the hosted engine now appears in the list of VMs. I am
>>>>>>>>>>>>>>> guessing the compatibility version was causing it to use the 3.6
>>>>>>>>>>>>>>> version. However, I am still unable to migrate the engine VM to
>>>>>>>>>>>>>>> another host. When I try putting the host it is currently on into
>>>>>>>>>>>>>>> maintenance, it reports:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Error while executing action: Cannot switch the Host(s) to Maintenance mode.
>>>>>>>>>>>>>>> There are no available hosts capable of running the engine VM.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Running 'hosted-engine --vm-status' still shows 'Engine status:
>>>>>>>>>>>>>>> unknown stale-data'.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The ovirt-ha-broker service is only running on one host. It was set to
>>>>>>>>>>>>>>> 'disabled' in systemd. It won't start as there is no
>>>>>>>>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf on the other two hosts.
>>>>>>>>>>>>>>> Should it be? It was not in the instructions for the migration from
>>>>>>>>>>>>>>> bare-metal to Hosted VM
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Cam
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Jun 22, 2017 at 1:07
PM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>>>>>>>>>> Hi Tomas,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> So in my
/usr/share/ovirt-engine/conf/osinfo-defaults.properties on my
>>>>>>>>>>>>>>>> engine VM, I have:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
os.other.devices.display.protocols.value = spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus
>>>>>>>>>>>>>>>>
os.other.devices.display.protocols.value.3.6 = spice/qxl,vnc/cirrus,vnc/qxl
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> That seems to match - I
assume since this is 4.1, the 3.6 should not apply
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Is there somewhere else I
should be looking?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Cam
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Jun 22, 2017 at
11:40 AM, Tomas Jelinek <tjelinek(a)redhat.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Jun 22, 2017
at 12:38 PM, Michal Skrivanek
>>>>>>>>>>>>>>>>>
<michal.skrivanek(a)redhat.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> > On 22 Jun
2017, at 12:31, Martin Sivak <msivak(a)redhat.com> wrote:
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > Tomas, what
fields are needed in a VM to pass the check that causes
>>>>>>>>>>>>>>>>>> > the
following error?
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>
>>>>> WARN [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
>>>>>>>>>>>>>>>>>>
>>>>> (org.ovirt.thread.pool-6-thread-23) [] Validation of action
>>>>>>>>>>>>>>>>>>
>>>>> 'ImportVm'
>>>>>>>>>>>>>>>>>>
>>>>> failed for user SYSTEM. Reasons: VAR__ACTION__IMPORT
>>>>>>>>>>>>>>>>>>
>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>
,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> to match the OS
and VM Display type;-)
>>>>>>>>>>>>>>>>>> Configuration is in osinfo… e.g. if that is an import from older releases on
>>>>>>>>>>>>>>>>>> Linux, this is typically caused by the change of cirrus to vga for non-SPICE
>>>>>>>>>>>>>>>>>> VMs
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> yep, the default
supported combinations for 4.0+ is this:
>>>>>>>>>>>>>>>>>
os.other.devices.display.protocols.value =
>>>>>>>>>>>>>>>>>
spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > Thanks.
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > On Thu, Jun
22, 2017 at 12:19 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>>>>>>>>>>>> >> Hi
Martin,
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>> >>> just
as a random comment, do you still have the database backup from
>>>>>>>>>>>>>>>>>> >>> the
bare metal -> VM attempt? It might be possible to just try again
>>>>>>>>>>>>>>>>>> >>>
using it. Or in the worst case.. update the offending value there
>>>>>>>>>>>>>>>>>> >>>
before restoring it to the new engine instance.
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >> I still
have the backup. I'd rather do the latter, as re-running the
>>>>>>>>>>>>>>>>>> >> HE
deployment is quite lengthy and involved (I have to re-initialise
>>>>>>>>>>>>>>>>>> >> the FC
storage each time). Do you know what the offending value(s)
>>>>>>>>>>>>>>>>>> >> would
be? Would it be in the Postgres DB or in a config file
>>>>>>>>>>>>>>>>>> >>
somewhere?
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >> Cheers,
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >> Cam
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >>>
Regards
>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>> >>>
Martin Sivak
>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>> >>> On
Thu, Jun 22, 2017 at 11:39 AM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>>>>>>>>>>>> >>>>
Hi Yanir,
>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>> >>>>
Thanks for the reply.
>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>
>>>>> First of all, maybe a chain reaction of :
>>>>>>>>>>>>>>>>>>
>>>>> WARN [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
>>>>>>>>>>>>>>>>>>
>>>>> (org.ovirt.thread.pool-6-thread-23) [] Validation of action
>>>>>>>>>>>>>>>>>>
>>>>> 'ImportVm'
>>>>>>>>>>>>>>>>>>
>>>>> failed for user SYSTEM. Reasons: VAR__ACTION__IMPORT
>>>>>>>>>>>>>>>>>>
>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>
,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
>>>>>>>>>>>>>>>>>>
>>>>> is causing the hosted engine vm not to be set up correctly and
>>>>>>>>>>>>>>>>>>
>>>>> further
>>>>>>>>>>>>>>>>>>
>>>>> actions were made when the hosted engine vm wasnt in a stable state.
>>>>>>>>>>>>>>>>>>
>>>>>
>>>>>>>>>>>>>>>>>>
>>>>> As for now, are you trying to revert back to a previous/initial
>>>>>>>>>>>>>>>>>>
>>>>> state ?
>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>> >>>>
I'm not trying to revert it to a previous state for now. This was a
>>>>>>>>>>>>>>>>>> >>>>
migration from a bare metal engine, and it didn't report any error
>>>>>>>>>>>>>>>>>> >>>>
during the migration. I'd had some problems on my first attempts at
>>>>>>>>>>>>>>>>>> >>>>
this migration, whereby it never completed (due to a proxy issue) but
>>>>>>>>>>>>>>>>>> >>>>
I managed to resolve this. Do you know of a way to get the Hosted
>>>>>>>>>>>>>>>>>> >>>>
Engine VM into a stable state, without rebuilding the entire cluster
>>>>>>>>>>>>>>>>>> >>>>
from scratch (since I have a lot of VMs on it)?
>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>> >>>>
Thanks for any help.
>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>> >>>>
Regards,
>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>> >>>>
Cam
>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>
>>>>> Regards,
>>>>>>>>>>>>>>>>>>
>>>>> Yanir
>>>>>>>>>>>>>>>>>>
>>>>>
>>>>>>>>>>>>>>>>>>
>>>>> On Wed, Jun 21, 2017 at 4:32 PM, cmc <iucounu(a)gmail.com>
wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>> Hi Jenny/Martin,
>>>>>>>>>>>>>>>>>>
>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>> Any idea what I can do here? The hosted engine VM has no log on
any
>>>>>>>>>>>>>>>>>>
>>>>>> host in /var/log/libvirt/qemu, and I fear that if I need to put
the
>>>>>>>>>>>>>>>>>>
>>>>>> host into maintenance, e.g., to upgrade it that I created it on
>>>>>>>>>>>>>>>>>>
>>>>>> (which
>>>>>>>>>>>>>>>>>>
>>>>>> I think is hosting it), or if it fails for any reason, it
won't get
>>>>>>>>>>>>>>>>>>
>>>>>> migrated to another host, and I will not be able to manage the
>>>>>>>>>>>>>>>>>>
>>>>>> cluster. It seems to be a very dangerous position to be in.
>>>>>>>>>>>>>>>>>>
>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>
>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>> Cam
>>>>>>>>>>>>>>>>>>
>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>> On Wed, Jun 21, 2017 at 11:48 AM, cmc <iucounu(a)gmail.com>
wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>> Thanks Martin. The hosts are all part of the same cluster.
>>>>>>>>>>>>>>>>>>
>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>> I get these errors in the engine.log on the engine:
>>>>>>>>>>>>>>>>>>
>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>> 2017-06-19 03:28:05,030Z WARN
>>>>>>>>>>>>>>>>>>
>>>>>>> [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
>>>>>>>>>>>>>>>>>>
>>>>>>> (org.ovirt.thread.pool-6-thread-23) [] Validation of action
>>>>>>>>>>>>>>>>>>
>>>>>>> 'ImportVm'
>>>>>>>>>>>>>>>>>>
>>>>>>> failed for user SYSTEM. Reasons:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>
VAR__ACTION__IMPORT,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
>>>>>>>>>>>>>>>>>>
>>>>>>> 2017-06-19 03:28:05,030Z INFO
>>>>>>>>>>>>>>>>>>
>>>>>>> [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
>>>>>>>>>>>>>>>>>>
>>>>>>> (org.ovirt.thread.pool-6-thread-23) [] Lock freed to object
>>>>>>>>>>>>>>>>>>
>>>>>>> 'EngineLock:{exclusiveLocks='[a
>>>>>>>>>>>>>>>>>>
>>>>>>> 79e6b0e-fff4-4cba-a02c-4c00be151300=<VM,
>>>>>>>>>>>>>>>>>>
>>>>>>> ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName
HostedEngine>,
>>>>>>>>>>>>>>>>>>
>>>>>>> HostedEngine=<VM_NAME,
ACTION_TYPE_FAILED_NAME_ALREADY_USED>]',
>>>>>>>>>>>>>>>>>>
>>>>>>> sharedLocks=
>>>>>>>>>>>>>>>>>>
>>>>>>> '[a79e6b0e-fff4-4cba-a02c-4c00be151300=<REMOTE_VM,
>>>>>>>>>>>>>>>>>>
>>>>>>> ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName
HostedEngine>]'}'
>>>>>>>>>>>>>>>>>>
>>>>>>> 2017-06-19 03:28:05,030Z ERROR
>>>>>>>>>>>>>>>>>>
>>>>>>> [org.ovirt.engine.core.bll.HostedEngineImporter]
>>>>>>>>>>>>>>>>>>
>>>>>>> (org.ovirt.thread.pool-6-thread-23) [] Failed importing the
Hosted
>>>>>>>>>>>>>>>>>>
>>>>>>> Engine VM
>>>>>>>>>>>>>>>>>>
>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>> The sanlock.log reports conflicts on that same host, and a
>>>>>>>>>>>>>>>>>>
>>>>>>> different
>>>>>>>>>>>>>>>>>>
>>>>>>> error on the other hosts, not sure if they are related.
>>>>>>>>>>>>>>>>>>
>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>> And this in the /var/log/ovirt-hosted-engine-ha/agent log on
the
>>>>>>>>>>>>>>>>>>
>>>>>>> host
>>>>>>>>>>>>>>>>>>
>>>>>>> which I deployed the hosted engine VM on:
>>>>>>>>>>>>>>>>>>
>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>> MainThread::ERROR::2017-06-19
>>>>>>>>>>>>>>>>>>
>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>
13:09:49,743::ovf_store::124::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>>>>>>>>>>>>>>>>>>
>>>>>>> Unable to extract HEVM OVF
>>>>>>>>>>>>>>>>>>
>>>>>>> MainThread::ERROR::2017-06-19
>>>>>>>>>>>>>>>>>>
>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>
13:09:49,743::config::445::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>>>>>>>>>>>>>>>>>
>>>>>>> Failed extracting VM OVF from the OVF_STORE volume, falling
back
>>>>>>>>>>>>>>>>>>
>>>>>>> to
>>>>>>>>>>>>>>>>>>
>>>>>>> initial vm.conf
>>>>>>>>>>>>>>>>>>
>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>> I've seen some of these issues reported in bugzilla, but
they were
>>>>>>>>>>>>>>>>>>
>>>>>>> for
>>>>>>>>>>>>>>>>>>
>>>>>>> older versions of oVirt (and appear to be resolved).
>>>>>>>>>>>>>>>>>>
>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>> I will install that package on the other two hosts, for which
I
>>>>>>>>>>>>>>>>>>
>>>>>>> will
>>>>>>>>>>>>>>>>>>
>>>>>>> put them in maintenance as vdsm is installed as an upgrade.
I
>>>>>>>>>>>>>>>>>>
>>>>>>> guess
>>>>>>>>>>>>>>>>>>
>>>>>>> restarting vdsm is a good idea after that?
>>>>>>>>>>>>>>>>>>
>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>
>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>> Campbell
>>>>>>>>>>>>>>>>>>
>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>> On Wed, Jun 21, 2017 at 10:51 AM, Martin Sivak
<msivak(a)redhat.com>
>>>>>>>>>>>>>>>>>>
>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>
>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>> you do not have to install it on all hosts. But you
should have
>>>>>>>>>>>>>>>>>>
>>>>>>>> more
>>>>>>>>>>>>>>>>>>
>>>>>>>> than one and ideally all hosted engine enabled nodes
should
>>>>>>>>>>>>>>>>>>
>>>>>>>> belong to
>>>>>>>>>>>>>>>>>>
>>>>>>>> the same engine cluster.
>>>>>>>>>>>>>>>>>>
>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>> Best regards
>>>>>>>>>>>>>>>>>>
>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>> Martin Sivak
>>>>>>>>>>>>>>>>>>
>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>> On Wed, Jun 21, 2017 at 11:29 AM, cmc
<iucounu(a)gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>> Hi Jenny,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>> Does ovirt-hosted-engine-ha need to be installed
across all
>>>>>>>>>>>>>>>>>>
>>>>>>>>> hosts?
>>>>>>>>>>>>>>>>>>
>>>>>>>>> Could that be the reason it is failing to see it
properly?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>> Cam
>>>>>>>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>> On Mon, Jun 19, 2017 at 1:27 PM, cmc
<iucounu(a)gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>> Hi Jenny,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>> Logs are attached. I can see errors in there, but
am unsure how
>>>>>>>>>>>>>>>>>>
>>>>>>>>>> they
>>>>>>>>>>>>>>>>>>
>>>>>>>>>> arose.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>> Campbell
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>> On Mon, Jun 19, 2017 at 12:29 PM, Evgenia Tokar
>>>>>>>>>>>>>>>>>>
>>>>>>>>>> <etokar(a)redhat.com>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>> From the output it looks like the agent is
down, try starting
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>> it by
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>> running:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>> systemctl start ovirt-ha-agent.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>> The engine is supposed to see the hosted
engine storage domain
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>> import it
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>> to the system, then it should import the
hosted engine vm.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>> Can you attach the agent log from the host
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>> (/var/log/ovirt-hosted-engine-ha/agent.log)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>> and the engine log from the engine vm
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>> (/var/log/ovirt-engine/engine.log)?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>> Jenny
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jun 19, 2017 at 12:41 PM, cmc
<iucounu(a)gmail.com>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> Hi Jenny,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>> What version are you running?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> 4.1.2.2-1.el7.centos
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>> For the hosted engine vm to be
imported and displayed in the
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>> engine, you
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>> must first create a master storage
domain.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> To provide a bit more detail: this was a
migration of a
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> bare-metal
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> engine in an existing cluster to a hosted
engine VM for that
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> cluster.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> As part of this migration, I built an
entirely new host and
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> ran
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> 'hosted-engine --deploy'
(followed these instructions:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
http://www.ovirt.org/documentation/self-hosted/chap-Migrating_from_Bare_M...).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> I restored the backup from the engine and
it completed
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> without any
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> errors. I didn't see any instructions
regarding a master
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> storage
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> domain in the page above. The cluster has
two existing master
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> storage
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> domains, one is fibre channel, which is
up, and one ISO
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> domain,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> which
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> is currently offline.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>> What do you mean the hosted engine
commands are failing?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>> What
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>> happens
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>> when
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>> you run hosted-engine --vm-status
now?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> Interestingly, whereas when I ran it
before, it exited with
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> no
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> output
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> and a return code of '1', it now
reports:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> --== Host 1 status ==--
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> conf_on_shared_storage :
True
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> Status up-to-date :
False
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> Hostname :
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> kvm-ldn-03.ldn.fscfc.co.uk
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> Host ID : 1
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> Engine status :
unknown stale-data
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> Score : 0
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> stopped :
True
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> Local maintenance :
False
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> crc32 :
0217f07b
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> local_conf_timestamp :
2911
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> Host timestamp :
2897
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> Extra metadata (valid at timestamp):
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> metadata_parse_version=1
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> metadata_feature_version=1
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> timestamp=2897 (Thu Jun 15
16:22:54 2017)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> host-id=1
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> score=0
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> vm_conf_refresh_time=2911 (Thu Jun
15 16:23:08 2017)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> conf_on_shared_storage=True
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> maintenance=False
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> state=AgentStopped
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> stopped=True
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> Yet I can login to the web GUI fine. I
guess it is not HA due
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> being
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> in an unknown state currently? Does the
hosted-engine-ha rpm
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> need
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> be installed across all nodes in the
cluster, btw?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for the help,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> Cam
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>> Jenny Tokar
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Jun 15, 2017 at 6:32 PM, cmc
<iucounu(a)gmail.com>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I've migrated from a
bare-metal engine to a hosted engine.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> There
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> were
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> no errors during the install,
however, the hosted engine
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> did not
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> get
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> started. I tried running:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> hosted-engine --status
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> on the host I deployed it on, and
it returns nothing (exit
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> code
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> is 1
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> however). I could not ping it
either. So I tried starting
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> it via
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 'hosted-engine
--vm-start' and it returned:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Virtual machine does not exist
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> But it then became available. I
logged into it
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> successfully. It
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> is not
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> in the list of VMs however.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Any ideas why the hosted-engine
commands fail, and why it
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> is not
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> the list of virtual machines?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for any help,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cam
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
_______________________________________________
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Users mailing list
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Users(a)ovirt.org
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
http://lists.ovirt.org/mailman/listinfo/users
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>>>
>>>>>>>>> Users mailing list
>>>>>>>>>>>>>>>>>>
>>>>>>>>> Users(a)ovirt.org
>>>>>>>>>>>>>>>>>>
>>>>>>>>>
http://lists.ovirt.org/mailman/listinfo/users
>>>>>>>>>>>>>>>>>>
>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>>>
>>>>>> Users mailing list
>>>>>>>>>>>>>>>>>>
>>>>>> Users(a)ovirt.org
>>>>>>>>>>>>>>>>>>
>>>>>>
http://lists.ovirt.org/mailman/listinfo/users
>>>>>>>>>>>>>>>>>>
>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>
>>>>>>>>>>>>>>>>>> >
_______________________________________________
>>>>>>>>>>>>>>>>>> > Users
mailing list
>>>>>>>>>>>>>>>>>> >
Users(a)ovirt.org
>>>>>>>>>>>>>>>>>> >
http://lists.ovirt.org/mailman/listinfo/users
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>