[ovirt-users] HostedEngine VM not visible, but running

cmc iucounu at gmail.com
Fri Jun 30 15:01:25 UTC 2017


I've had no other choice but to power up the old bare-metal engine to
be able to start the VMs. This is probably really bad, but I had to get
the VMs running.
I am now guessing that if a host is shut down rather than simply
rebooted, the VMs will not restart when the host powers back up. This
would not have been such a problem if the Hosted Engine had started.

So I'm not sure where to go from here...

I guess the only option is to start from scratch again?

On Fri, Jun 30, 2017 at 3:19 PM, cmc <iucounu at gmail.com> wrote:
> Help! I put the cluster into global maintenance, then powered off and
> powered back on all of the nodes. I have taken it out of global
> maintenance. No VM has started, including the hosted engine. This is
> very bad. I am going to look through the logs to see why nothing has
> started. Help greatly appreciated.
>
> Thanks,
>
> Cam
>
> On Fri, Jun 30, 2017 at 1:00 PM, cmc <iucounu at gmail.com> wrote:
>> So I can run from any node: hosted-engine --set-maintenance
>> --mode=global. By 'agents', you mean the ovirt-ha-agent, right? This
>> shouldn't affect the running of any VMs, correct? Sorry for the
>> questions, just want to do it correctly and not make assumptions :)
>>
>> Cheers,
>>
>> C
>>
>> On Fri, Jun 30, 2017 at 12:12 PM, Martin Sivak <msivak at redhat.com> wrote:
>>> Hi,
>>>
>>>> Just to clarify: you mean the host_id in
>>>> /etc/ovirt-hosted-engine/hosted-engine.conf should match the spm_id,
>>>> correct?
>>>
>>> Exactly.
>>>
>>> Put the cluster into global maintenance first, or kill all the agents
>>> (which has the same effect).
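>>>
>>> Roughly (a sketch; run as root on any hosted-engine host):
>>>
>>>   hosted-engine --set-maintenance --mode=global   # stop HA management of the engine VM
>>>   hosted-engine --vm-status                       # should now report global maintenance
>>>   systemctl stop ovirt-ha-agent                   # the "kill the agents" alternative
>>>   hosted-engine --set-maintenance --mode=none     # leave global maintenance afterwards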
>>>
>>> Martin
>>>
>>> On Fri, Jun 30, 2017 at 12:47 PM, cmc <iucounu at gmail.com> wrote:
>>>> Just to clarify: you mean the host_id in
>>>> /etc/ovirt-hosted-engine/hosted-engine.conf should match the spm_id,
>>>> correct?
>>>>
>>>> On Fri, Jun 30, 2017 at 9:47 AM, Martin Sivak <msivak at redhat.com> wrote:
>>>>> Hi,
>>>>>
>>>>> cleaning metadata won't help in this case. Try transferring the
>>>>> spm_ids you got from the engine to the proper hosted engine hosts so
>>>>> the hosted engine ids match the spm_ids. Then restart all hosted
>>>>> engine services. I would actually recommend restarting all hosts after
>>>>> this change, but I have no idea how many VMs you have running.
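>>>>>
>>>>> A rough sketch of that step on each host (host_id=2 below is just an
>>>>> example value; use the spm_id the engine reports for that host):
>>>>>
>>>>>   grep host_id /etc/ovirt-hosted-engine/hosted-engine.conf
>>>>>   sed -i 's/^host_id=.*/host_id=2/' /etc/ovirt-hosted-engine/hosted-engine.conf   # example value
>>>>>   systemctl restart ovirt-ha-broker ovirt-ha-agent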
>>>>>
>>>>> Martin
>>>>>
>>>>> On Thu, Jun 29, 2017 at 8:27 PM, cmc <iucounu at gmail.com> wrote:
>>>>>> Tried running 'hosted-engine --clean-metadata' as per
>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1350539, since
>>>>>> ovirt-ha-agent was not running anyway, but it fails with the following
>>>>>> error:
>>>>>>
>>>>>> ERROR:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Failed
>>>>>> to start monitoring domain
>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
>>>>>> during domain acquisition
>>>>>> ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Traceback (most recent
>>>>>> call last):
>>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
>>>>>> line 191, in _run_agent
>>>>>>     return action(he)
>>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
>>>>>> line 67, in action_clean
>>>>>>     return he.clean(options.force_cleanup)
>>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>>>>>> line 345, in clean
>>>>>>     self._initialize_domain_monitor()
>>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>>>>>> line 823, in _initialize_domain_monitor
>>>>>>     raise Exception(msg)
>>>>>> Exception: Failed to start monitoring domain
>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
>>>>>> during domain acquisition
>>>>>> ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Trying to restart agent
>>>>>> WARNING:ovirt_hosted_engine_ha.agent.agent.Agent:Restarting agent, attempt '0'
>>>>>> ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Too many errors
>>>>>> occurred, giving up. Please review the log and consider filing a bug.
>>>>>> INFO:ovirt_hosted_engine_ha.agent.agent.Agent:Agent shutting down
>>>>>>
>>>>>> On Thu, Jun 29, 2017 at 6:10 PM, cmc <iucounu at gmail.com> wrote:
>>>>>>> Actually, it looks like sanlock problems:
>>>>>>>
>>>>>>>    "SanlockInitializationError: Failed to initialize sanlock, the
>>>>>>> number of errors has exceeded the limit"
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jun 29, 2017 at 5:10 PM, cmc <iucounu at gmail.com> wrote:
>>>>>>>> Sorry, I am mistaken: the agent failed on two hosts with the following error:
>>>>>>>>
>>>>>>>> ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine
>>>>>>>> ERROR Failed to start monitoring domain
>>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
>>>>>>>> during domain acquisition
>>>>>>>> ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine
>>>>>>>> ERROR Shutting down the agent because of 3 failures in a row!
>>>>>>>>
>>>>>>>> What could cause these timeouts? Some other service not running?
>>>>>>>>
>>>>>>>> On Thu, Jun 29, 2017 at 5:03 PM, cmc <iucounu at gmail.com> wrote:
>>>>>>>>> Both services are up on all three hosts. The broker logs just report:
>>>>>>>>>
>>>>>>>>> Thread-6549::INFO::2017-06-29
>>>>>>>>> 17:01:51,481::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup)
>>>>>>>>> Connection established
>>>>>>>>> Thread-6549::INFO::2017-06-29
>>>>>>>>> 17:01:51,483::listener::186::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle)
>>>>>>>>> Connection closed
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Cam
>>>>>>>>>
>>>>>>>>> On Thu, Jun 29, 2017 at 4:00 PM, Martin Sivak <msivak at redhat.com> wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> please make sure that both ovirt-ha-agent and ovirt-ha-broker services
>>>>>>>>>> are restarted and up. The error says the agent can't talk to the
>>>>>>>>>> broker. Is there anything in the broker.log?
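>>>>>>>>>>
>>>>>>>>>> For example (a sketch, run on each host):
>>>>>>>>>>
>>>>>>>>>>   systemctl restart ovirt-ha-broker ovirt-ha-agent
>>>>>>>>>>   systemctl status ovirt-ha-broker ovirt-ha-agent
>>>>>>>>>>   tail -f /var/log/ovirt-hosted-engine-ha/broker.log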
>>>>>>>>>>
>>>>>>>>>> Best regards
>>>>>>>>>>
>>>>>>>>>> Martin Sivak
>>>>>>>>>>
>>>>>>>>>> On Thu, Jun 29, 2017 at 4:42 PM, cmc <iucounu at gmail.com> wrote:
>>>>>>>>>>> I've restarted those two services across all hosts, have taken the
>>>>>>>>>>> Hosted Engine host out of maintenance, and when I try to migrate the
>>>>>>>>>>> Hosted Engine over to another host, it reports that all three hosts
>>>>>>>>>>> 'did not satisfy internal filter HA because it is not a Hosted Engine
>>>>>>>>>>> host'.
>>>>>>>>>>>
>>>>>>>>>>> On the host that the Hosted Engine is currently on it reports in the agent.log:
>>>>>>>>>>>
>>>>>>>>>>> ovirt-ha-agent ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR
>>>>>>>>>>> Connection closed: Connection closed
>>>>>>>>>>> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent
>>>>>>>>>>> ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Exception
>>>>>>>>>>> getting service path: Connection closed
>>>>>>>>>>> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent
>>>>>>>>>>> ovirt_hosted_engine_ha.agent.agent.Agent ERROR Traceback (most recent
>>>>>>>>>>> call last):
>>>>>>>>>>>                                                     File
>>>>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
>>>>>>>>>>> line 191, in _run_agent
>>>>>>>>>>>                                                       return action(he)
>>>>>>>>>>>                                                     File
>>>>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
>>>>>>>>>>> line 64, in action_proper
>>>>>>>>>>>                                                       return
>>>>>>>>>>> he.start_monitoring()
>>>>>>>>>>>                                                     File
>>>>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>>>>>>>>>>> line 411, in start_monitoring
>>>>>>>>>>>                                                       self._initialize_sanlock()
>>>>>>>>>>>                                                     File
>>>>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>>>>>>>>>>> line 691, in _initialize_sanlock
>>>>>>>>>>>
>>>>>>>>>>> constants.SERVICE_TYPE + constants.LOCKSPACE_EXTENSION)
>>>>>>>>>>>                                                     File
>>>>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
>>>>>>>>>>> line 162, in get_service_path
>>>>>>>>>>>                                                       .format(str(e)))
>>>>>>>>>>>                                                   RequestError: Failed
>>>>>>>>>>> to get service path: Connection closed
>>>>>>>>>>> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent
>>>>>>>>>>> ovirt_hosted_engine_ha.agent.agent.Agent ERROR Trying to restart agent
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jun 29, 2017 at 1:25 PM, Martin Sivak <msivak at redhat.com> wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> yep, you have to restart the ovirt-ha-agent and ovirt-ha-broker services.
>>>>>>>>>>>>
>>>>>>>>>>>> The scheduling message just means that the host has score 0 or is not
>>>>>>>>>>>> reporting score at all.
>>>>>>>>>>>>
>>>>>>>>>>>> Martin
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Jun 29, 2017 at 1:33 PM, cmc <iucounu at gmail.com> wrote:
>>>>>>>>>>>>> Thanks Martin, do I have to restart anything? When I try to use the
>>>>>>>>>>>>> 'migrate' operation, it complains that the other two hosts 'did not
>>>>>>>>>>>>> satisfy internal filter HA because it is not a Hosted Engine host..'
>>>>>>>>>>>>> (even though I reinstalled both these hosts with the 'deploy hosted
>>>>>>>>>>>>> engine' option), which suggests that something needs restarting. Should
>>>>>>>>>>>>> I worry about the sanlock errors, or will that be resolved by the
>>>>>>>>>>>>> change in host_id?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cam
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Jun 29, 2017 at 12:22 PM, Martin Sivak <msivak at redhat.com> wrote:
>>>>>>>>>>>>>> Change the ids so they are distinct. I need to check if there is a way
>>>>>>>>>>>>>> to read the SPM ids from the engine, as using the same numbers would be
>>>>>>>>>>>>>> best.
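>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (If it helps, one way to read them, sketched on the assumption that
>>>>>>>>>>>>>> the engine database keeps them in the vds_spm_id_map table, is to run
>>>>>>>>>>>>>> this on the engine VM:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   # table and database name assumed; adjust if your setup differs
>>>>>>>>>>>>>>   sudo -u postgres psql engine -c "select vds_id, vds_spm_id from vds_spm_id_map;"
>>>>>>>>>>>>>> )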
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Martin
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Jun 29, 2017 at 12:46 PM, cmc <iucounu at gmail.com> wrote:
>>>>>>>>>>>>>>> Is there any way of recovering from this situation? I'd prefer to fix
>>>>>>>>>>>>>>> the issue rather than re-deploy, but if there is no recovery path, I
>>>>>>>>>>>>>>> could perhaps try re-deploying the hosted engine. In which case, would
>>>>>>>>>>>>>>> the best option be to take a backup of the Hosted Engine, and then
>>>>>>>>>>>>>>> shut it down, re-initialise the SAN partition (or use another
>>>>>>>>>>>>>>> partition) and retry the deployment? Would it be better to use the
>>>>>>>>>>>>>>> older backup from the bare metal engine that I originally used, or use
>>>>>>>>>>>>>>> a backup from the Hosted Engine? I'm not sure if any VMs have been
>>>>>>>>>>>>>>> added since switching to Hosted Engine.
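>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> (For the backup step, a sketch assuming the standard engine-backup
>>>>>>>>>>>>>>> tool, run on the engine VM; the file names are just examples:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   # file names are examples
>>>>>>>>>>>>>>>   engine-backup --mode=backup --file=engine-backup.tar.gz --log=engine-backup.log
>>>>>>>>>>>>>>> )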
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Unfortunately I have very little time left to get this working before
>>>>>>>>>>>>>>> I have to hand it over for eval (by end of Friday).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Here are some current log snippets from the cluster:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In /var/log/vdsm/vdsm.log on the host that has the Hosted Engine:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2017-06-29 10:50:15,071+0100 INFO  (monitor/207221b) [storage.SANLock]
>>>>>>>>>>>>>>> Acquiring host id for domain 207221b2-959b-426b-b945-18e1adfed62f (id:
>>>>>>>>>>>>>>> 3) (clusterlock:282)
>>>>>>>>>>>>>>> 2017-06-29 10:50:15,072+0100 ERROR (monitor/207221b) [storage.Monitor]
>>>>>>>>>>>>>>> Error acquiring host id 3 for domain
>>>>>>>>>>>>>>> 207221b2-959b-426b-b945-18e1adfed62f (monitor:558)
>>>>>>>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>>>>>>>   File "/usr/share/vdsm/storage/monitor.py", line 555, in _acquireHostId
>>>>>>>>>>>>>>>     self.domain.acquireHostId(self.hostId, async=True)
>>>>>>>>>>>>>>>   File "/usr/share/vdsm/storage/sd.py", line 790, in acquireHostId
>>>>>>>>>>>>>>>     self._manifest.acquireHostId(hostId, async)
>>>>>>>>>>>>>>>   File "/usr/share/vdsm/storage/sd.py", line 449, in acquireHostId
>>>>>>>>>>>>>>>     self._domainLock.acquireHostId(hostId, async)
>>>>>>>>>>>>>>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py",
>>>>>>>>>>>>>>> line 297, in acquireHostId
>>>>>>>>>>>>>>>     raise se.AcquireHostIdFailure(self._sdUUID, e)
>>>>>>>>>>>>>>> AcquireHostIdFailure: Cannot acquire host id:
>>>>>>>>>>>>>>> ('207221b2-959b-426b-b945-18e1adfed62f', SanlockException(22, 'Sanlock
>>>>>>>>>>>>>>> lockspace add failure', 'Invalid argument'))
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> From /var/log/ovirt-hosted-engine-ha/agent.log on the same host:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> MainThread::ERROR::2017-06-19
>>>>>>>>>>>>>>> 13:30:50,592::hosted_engine::822::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_domain_monitor)
>>>>>>>>>>>>>>> Failed to start monitoring domain
>>>>>>>>>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
>>>>>>>>>>>>>>> during domain acquisition
>>>>>>>>>>>>>>> MainThread::WARNING::2017-06-19
>>>>>>>>>>>>>>> 13:30:50,593::hosted_engine::469::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>>>>>>>>>>>>>>> Error while monitoring engine: Failed to start monitoring domain
>>>>>>>>>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
>>>>>>>>>>>>>>> during domain acquisition
>>>>>>>>>>>>>>> MainThread::WARNING::2017-06-19
>>>>>>>>>>>>>>> 13:30:50,593::hosted_engine::472::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>>>>>>>>>>>>>>> Unexpected error
>>>>>>>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>>>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>>>>>>>>>>>>>>> line 443, in start_monitoring
>>>>>>>>>>>>>>>     self._initialize_domain_monitor()
>>>>>>>>>>>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>>>>>>>>>>>>>>> line 823, in _initialize_domain_monitor
>>>>>>>>>>>>>>>     raise Exception(msg)
>>>>>>>>>>>>>>> Exception: Failed to start monitoring domain
>>>>>>>>>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
>>>>>>>>>>>>>>> during domain acquisition
>>>>>>>>>>>>>>> MainThread::ERROR::2017-06-19
>>>>>>>>>>>>>>> 13:30:50,593::hosted_engine::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>>>>>>>>>>>>>>> Shutting down the agent because of 3 failures in a row!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> From sanlock.log:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2017-06-29 11:17:06+0100 1194149 [2530]: add_lockspace
>>>>>>>>>>>>>>> 207221b2-959b-426b-b945-18e1adfed62f:3:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0
>>>>>>>>>>>>>>> conflicts with name of list1 s5
>>>>>>>>>>>>>>> 207221b2-959b-426b-b945-18e1adfed62f:1:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> From the two other hosts:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> host 2:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> vdsm.log
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2017-06-29 10:53:47,755+0100 ERROR (jsonrpc/4) [jsonrpc.JsonRpcServer]
>>>>>>>>>>>>>>> Internal server error (__init__:570)
>>>>>>>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>>>>>>>   File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line
>>>>>>>>>>>>>>> 565, in _handle_request
>>>>>>>>>>>>>>>     res = method(**params)
>>>>>>>>>>>>>>>   File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line
>>>>>>>>>>>>>>> 202, in _dynamicMethod
>>>>>>>>>>>>>>>     result = fn(*methodArgs)
>>>>>>>>>>>>>>>   File "/usr/share/vdsm/API.py", line 1454, in getAllVmIoTunePolicies
>>>>>>>>>>>>>>>     io_tune_policies_dict = self._cif.getAllVmIoTunePolicies()
>>>>>>>>>>>>>>>   File "/usr/share/vdsm/clientIF.py", line 448, in getAllVmIoTunePolicies
>>>>>>>>>>>>>>>     'current_values': v.getIoTune()}
>>>>>>>>>>>>>>>   File "/usr/share/vdsm/virt/vm.py", line 2803, in getIoTune
>>>>>>>>>>>>>>>     result = self.getIoTuneResponse()
>>>>>>>>>>>>>>>   File "/usr/share/vdsm/virt/vm.py", line 2816, in getIoTuneResponse
>>>>>>>>>>>>>>>     res = self._dom.blockIoTune(
>>>>>>>>>>>>>>>   File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line
>>>>>>>>>>>>>>> 47, in __getattr__
>>>>>>>>>>>>>>>     % self.vmid)
>>>>>>>>>>>>>>> NotConnectedError: VM u'a79e6b0e-fff4-4cba-a02c-4c00be151300' was not
>>>>>>>>>>>>>>> started yet or was shut down
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> /var/log/ovirt-hosted-engine-ha/agent.log
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> MainThread::INFO::2017-06-29
>>>>>>>>>>>>>>> 10:56:33,636::ovf_store::103::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan)
>>>>>>>>>>>>>>> Found OVF_STORE: imgUUID:222610db-7880-4f4f-8559-a3635fd73555,
>>>>>>>>>>>>>>> volUUID:c6e0d29b-eabf-4a09-a330-df54cfdd73f1
>>>>>>>>>>>>>>> MainThread::INFO::2017-06-29
>>>>>>>>>>>>>>> 10:56:33,926::ovf_store::112::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>>>>>>>>>>>>>>> Extracting Engine VM OVF from the OVF_STORE
>>>>>>>>>>>>>>> MainThread::INFO::2017-06-29
>>>>>>>>>>>>>>> 10:56:33,938::ovf_store::119::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>>>>>>>>>>>>>>> OVF_STORE volume path:
>>>>>>>>>>>>>>> /rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/images/222610db-7880-4f4f-8559-a3635fd73555/c6e0d29b-eabf-4a09-a330-df54cfdd73f1
>>>>>>>>>>>>>>> MainThread::INFO::2017-06-29
>>>>>>>>>>>>>>> 10:56:33,967::config::431::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>>>>>>>>>>>>>> Found an OVF for HE VM, trying to convert
>>>>>>>>>>>>>>> MainThread::INFO::2017-06-29
>>>>>>>>>>>>>>> 10:56:33,971::config::436::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>>>>>>>>>>>>>> Got vm.conf from OVF_STORE
>>>>>>>>>>>>>>> MainThread::INFO::2017-06-29
>>>>>>>>>>>>>>> 10:56:36,736::states::678::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score)
>>>>>>>>>>>>>>> Score is 0 due to unexpected vm shutdown at Thu Jun 29 10:53:59 2017
>>>>>>>>>>>>>>> MainThread::INFO::2017-06-29
>>>>>>>>>>>>>>> 10:56:36,736::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>>>>>>>>>>>>>>> Current state EngineUnexpectedlyDown (score: 0)
>>>>>>>>>>>>>>> MainThread::INFO::2017-06-29
>>>>>>>>>>>>>>> 10:56:46,772::config::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf)
>>>>>>>>>>>>>>> Reloading vm.conf from the shared storage domain
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> /var/log/messages:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Jun 29 10:53:46 kvm-ldn-02 kernel: dd: sending ioctl 80306d02 to a partition!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> host 1:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> /var/log/messages also in sanlock.log
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Jun 29 11:01:02 kvm-ldn-01 sanlock[2400]: 2017-06-29 11:01:02+0100
>>>>>>>>>>>>>>> 678325 [9132]: s4531 delta_acquire host_id 1 busy1 1 2 1193177
>>>>>>>>>>>>>>> 3d4ec963-8486-43a2-a7d9-afa82508f89f.kvm-ldn-03
>>>>>>>>>>>>>>> Jun 29 11:01:03 kvm-ldn-01 sanlock[2400]: 2017-06-29 11:01:03+0100
>>>>>>>>>>>>>>> 678326 [24159]: s4531 add_lockspace fail result -262
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> /var/log/ovirt-hosted-engine-ha/agent.log:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> MainThread::ERROR::2017-06-27
>>>>>>>>>>>>>>> 15:21:01,143::hosted_engine::822::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_domain_monitor)
>>>>>>>>>>>>>>> Failed to start monitoring domain
>>>>>>>>>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
>>>>>>>>>>>>>>> during domain acquisition
>>>>>>>>>>>>>>> MainThread::WARNING::2017-06-27
>>>>>>>>>>>>>>> 15:21:01,144::hosted_engine::469::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>>>>>>>>>>>>>>> Error while monitoring engine: Failed to start monitoring domain
>>>>>>>>>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
>>>>>>>>>>>>>>> during domain acquisition
>>>>>>>>>>>>>>> MainThread::WARNING::2017-06-27
>>>>>>>>>>>>>>> 15:21:01,144::hosted_engine::472::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>>>>>>>>>>>>>>> Unexpected error
>>>>>>>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>>>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>>>>>>>>>>>>>>> line 443, in start_monitoring
>>>>>>>>>>>>>>>     self._initialize_domain_monitor()
>>>>>>>>>>>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>>>>>>>>>>>>>>> line 823, in _initialize_domain_monitor
>>>>>>>>>>>>>>>     raise Exception(msg)
>>>>>>>>>>>>>>> Exception: Failed to start monitoring domain
>>>>>>>>>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
>>>>>>>>>>>>>>> during domain acquisition
>>>>>>>>>>>>>>> MainThread::ERROR::2017-06-27
>>>>>>>>>>>>>>> 15:21:01,144::hosted_engine::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>>>>>>>>>>>>>>> Shutting down the agent because of 3 failures in a row!
>>>>>>>>>>>>>>> MainThread::INFO::2017-06-27
>>>>>>>>>>>>>>> 15:21:06,717::hosted_engine::848::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status)
>>>>>>>>>>>>>>> VDSM domain monitor status: PENDING
>>>>>>>>>>>>>>> MainThread::INFO::2017-06-27
>>>>>>>>>>>>>>> 15:21:09,335::hosted_engine::776::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_domain_monitor)
>>>>>>>>>>>>>>> Failed to stop monitoring domain
>>>>>>>>>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f): Storage domain is
>>>>>>>>>>>>>>> member of pool: u'domain=207221b2-959b-426b-b945-18e1adfed62f'
>>>>>>>>>>>>>>> MainThread::INFO::2017-06-27
>>>>>>>>>>>>>>> 15:21:09,339::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(run)
>>>>>>>>>>>>>>> Agent shutting down
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for any help,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Cam
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Jun 28, 2017 at 11:25 AM, cmc <iucounu at gmail.com> wrote:
>>>>>>>>>>>>>>>> Hi Martin,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> yes, on two of the machines they have the same host_id. The other has
>>>>>>>>>>>>>>>> a different host_id.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> To update since yesterday: I reinstalled and deployed Hosted Engine on
>>>>>>>>>>>>>>>> the other host (so all three hosts in the cluster now have it
>>>>>>>>>>>>>>>> installed). The second one I deployed said it was able to host the
>>>>>>>>>>>>>>>> engine (unlike the first I reinstalled), so I tried putting the host
>>>>>>>>>>>>>>>> with the Hosted Engine on it into maintenance to see if it would
>>>>>>>>>>>>>>>> migrate over. It managed to move all the VMs except the Hosted Engine.
>>>>>>>>>>>>>>>> And now the host that said it was able to host the engine says
>>>>>>>>>>>>>>>> 'unavailable due to HA score'. The host that it was trying to move
>>>>>>>>>>>>>>>> from has now been in 'preparing for maintenance' for the last 12 hours.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The summary is:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> kvm-ldn-01 - one of the original, pre-Hosted Engine hosts, reinstalled
>>>>>>>>>>>>>>>> with 'Deploy Hosted Engine'. No icon saying it can host the Hosted
>>>>>>>>>>>>>>>> Engine; host_id of '2' in /etc/ovirt-hosted-engine/hosted-engine.conf.
>>>>>>>>>>>>>>>> 'add_lockspace' fails in sanlock.log
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> kvm-ldn-02 - the other host that was pre-existing before Hosted Engine
>>>>>>>>>>>>>>>> was created. Reinstalled with 'Deploy Hosted Engine'. Had an icon
>>>>>>>>>>>>>>>> saying that it was able to host the Hosted Engine, but after migration
>>>>>>>>>>>>>>>> was attempted when putting kvm-ldn-03 into maintenance, it reports:
>>>>>>>>>>>>>>>> 'unavailable due to HA score'. It has a host_id of '1' in
>>>>>>>>>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf. No errors in sanlock.log
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> kvm-ldn-03 - this was the host I deployed Hosted Engine on, which was
>>>>>>>>>>>>>>>> not part of the original cluster. I restored the bare-metal engine
>>>>>>>>>>>>>>>> backup in the Hosted Engine on this host when deploying it, without
>>>>>>>>>>>>>>>> error. It currently has the Hosted Engine on it (as the only VM after
>>>>>>>>>>>>>>>> I put that host into maintenance to test the HA of Hosted Engine).
>>>>>>>>>>>>>>>> Sanlock log shows conflicts
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I will look through all the logs for any other errors. Please let me
>>>>>>>>>>>>>>>> know if you need any logs or other clarification/information.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Campbell
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, Jun 28, 2017 at 9:25 AM, Martin Sivak <msivak at redhat.com> wrote:
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> can you please check the contents of
>>>>>>>>>>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf or
>>>>>>>>>>>>>>>>> /etc/ovirt-hosted-engine-ha/agent.conf (I am not sure which one it is
>>>>>>>>>>>>>>>>> right now) and search for host-id?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Make sure the IDs are different. If they are not, then there is a bug somewhere.
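>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> For example (a sketch; the hostnames are the ones used elsewhere in
>>>>>>>>>>>>>>>>> this thread):
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   # hostnames as reported later in this thread
>>>>>>>>>>>>>>>>>   for h in kvm-ldn-01 kvm-ldn-02 kvm-ldn-03; do
>>>>>>>>>>>>>>>>>       ssh $h grep -H host_id /etc/ovirt-hosted-engine/hosted-engine.conf
>>>>>>>>>>>>>>>>>   done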
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Martin
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Jun 27, 2017 at 6:26 PM, cmc <iucounu at gmail.com> wrote:
>>>>>>>>>>>>>>>>>> I see this on the host it is trying to migrate in /var/log/sanlock:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2017-06-27 17:10:40+0100 527703 [2407]: s3528 lockspace
>>>>>>>>>>>>>>>>>> 207221b2-959b-426b-b945-18e1adfed62f:1:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0
>>>>>>>>>>>>>>>>>> 2017-06-27 17:13:00+0100 527843 [27446]: s3528 delta_acquire host_id 1
>>>>>>>>>>>>>>>>>> busy1 1 2 1042692 3d4ec963-8486-43a2-a7d9-afa82508f89f.kvm-ldn-03
>>>>>>>>>>>>>>>>>> 2017-06-27 17:13:01+0100 527844 [2407]: s3528 add_lockspace fail result -262
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The sanlock service is running. Why would this occur?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> C
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Tue, Jun 27, 2017 at 5:21 PM, cmc <iucounu at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>> Hi Martin,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks for the reply. I have done this, and the deployment completed
>>>>>>>>>>>>>>>>>>> without error. However, it still will not allow the Hosted Engine
>>>>>>>>>>>>>>>>>>> to migrate to another host. The
>>>>>>>>>>>>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf got created ok on the host
>>>>>>>>>>>>>>>>>>> I re-installed, but the ovirt-ha-broker.service, though it starts,
>>>>>>>>>>>>>>>>>>> reports:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> --------------------8<-------------------
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Jun 27 14:58:26 kvm-ldn-01 systemd[1]: Starting oVirt Hosted Engine
>>>>>>>>>>>>>>>>>>> High Availability Communications Broker...
>>>>>>>>>>>>>>>>>>> Jun 27 14:58:27 kvm-ldn-01 ovirt-ha-broker[6101]: ovirt-ha-broker
>>>>>>>>>>>>>>>>>>> ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker ERROR
>>>>>>>>>>>>>>>>>>> Failed to read metadata from
>>>>>>>>>>>>>>>>>>> /rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/hosted-engine.metadata
>>>>>>>>>>>>>>>>>>>                                                   Traceback (most
>>>>>>>>>>>>>>>>>>> recent call last):
>>>>>>>>>>>>>>>>>>>                                                     File
>>>>>>>>>>>>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py",
>>>>>>>>>>>>>>>>>>> line 129, in get_raw_stats_for_service_type
>>>>>>>>>>>>>>>>>>>                                                       f =
>>>>>>>>>>>>>>>>>>> os.open(path, direct_flag | os.O_RDONLY | os.O_SYNC)
>>>>>>>>>>>>>>>>>>>                                                   OSError: [Errno 2]
>>>>>>>>>>>>>>>>>>> No such file or directory:
>>>>>>>>>>>>>>>>>>> '/rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/hosted-engine.metadata'
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> --------------------8<-------------------
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I checked the path, and it exists. I can run 'less -f' on it fine. The
>>>>>>>>>>>>>>>>>>> perms are slightly different on the host that is running the VM vs the
>>>>>>>>>>>>>>>>>>> one that is reporting errors (600 vs 660), ownership is vdsm:qemu. Is
>>>>>>>>>>>>>>>>>>> this a SAN locking issue?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks for any help,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Cam
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Tue, Jun 27, 2017 at 1:41 PM, Martin Sivak <msivak at redhat.com> wrote:
>>>>>>>>>>>>>>>>>>>>> Should it be? It was not in the instructions for the migration from
>>>>>>>>>>>>>>>>>>>>> bare-metal to Hosted VM
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> The hosted engine will only migrate to hosts that have the services
>>>>>>>>>>>>>>>>>>>> running. Please put one other host to maintenance and select Hosted
>>>>>>>>>>>>>>>>>>>> engine action: DEPLOY in the reinstall dialog.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Best regards
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Martin Sivak
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Tue, Jun 27, 2017 at 1:23 PM, cmc <iucounu at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>> I changed the 'os.other.devices.display.protocols.value.3.6 =
>>>>>>>>>>>>>>>>>>>>> spice/qxl,vnc/cirrus,vnc/qxl' line to have the same display protocols
>>>>>>>>>>>>>>>>>>>>> as 4 and the hosted engine now appears in the list of VMs. I am
>>>>>>>>>>>>>>>>>>>>> guessing the compatibility version was causing it to use the 3.6
>>>>>>>>>>>>>>>>>>>>> version. However, I am still unable to migrate the engine VM to
>>>>>>>>>>>>>>>>>>>>> another host. When I try putting the host it is currently on into
>>>>>>>>>>>>>>>>>>>>> maintenance, it reports:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Error while executing action: Cannot switch the Host(s) to Maintenance mode.
>>>>>>>>>>>>>>>>>>>>> There are no available hosts capable of running the engine VM.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Running 'hosted-engine --vm-status' still shows 'Engine status:
>>>>>>>>>>>>>>>>>>>>> unknown stale-data'.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> The ovirt-ha-broker service is only running on one host. It was set to
>>>>>>>>>>>>>>>>>>>>> 'disabled' in systemd. It won't start as there is no
>>>>>>>>>>>>>>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf on the other two hosts.
>>>>>>>>>>>>>>>>>>>>> Should it be? It was not in the instructions for the migration from
>>>>>>>>>>>>>>>>>>>>> bare-metal to Hosted VM
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Cam
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Thu, Jun 22, 2017 at 1:07 PM, cmc <iucounu at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>> Hi Tomas,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> So in my /usr/share/ovirt-engine/conf/osinfo-defaults.properties on my
>>>>>>>>>>>>>>>>>>>>>> engine VM, I have:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> os.other.devices.display.protocols.value = spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus
>>>>>>>>>>>>>>>>>>>>>> os.other.devices.display.protocols.value.3.6 = spice/qxl,vnc/cirrus,vnc/qxl
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> That seems to match - I assume since this is 4.1, the 3.6 should not apply
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Is there somewhere else I should be looking?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Cam
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Thu, Jun 22, 2017 at 11:40 AM, Tomas Jelinek <tjelinek at redhat.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Thu, Jun 22, 2017 at 12:38 PM, Michal Skrivanek
>>>>>>>>>>>>>>>>>>>>>>> <michal.skrivanek at redhat.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> > On 22 Jun 2017, at 12:31, Martin Sivak <msivak at redhat.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>> > Tomas, what fields are needed in a VM to pass the check that causes
>>>>>>>>>>>>>>>>>>>>>>>> > the following error?
>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>> >>>>> WARN  [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
>>>>>>>>>>>>>>>>>>>>>>>> >>>>> (org.ovirt.thread.pool-6-thread-23) [] Validation of action
>>>>>>>>>>>>>>>>>>>>>>>> >>>>> 'ImportVm'
>>>>>>>>>>>>>>>>>>>>>>>> >>>>> failed for user SYSTEM. Reasons: VAR__ACTION__IMPORT
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>> ,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> to match the OS and VM Display type;-)
>>>>>>>>>>>>>>>>>>>>>>>> Configuration is in osinfo… e.g. if that is an import from older releases on
>>>>>>>>>>>>>>>>>>>>>>>> Linux, this is typically caused by the change of cirrus to vga for non-SPICE
>>>>>>>>>>>>>>>>>>>>>>>> VMs.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> yep, the default supported combinations for 4.0+ is this:
>>>>>>>>>>>>>>>>>>>>>>> os.other.devices.display.protocols.value =
>>>>>>>>>>>>>>>>>>>>>>> spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus
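>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> (If the usual osinfo override mechanism applies here, a sketch of
>>>>>>>>>>>>>>>>>>>>>>> setting that without editing the shipped defaults would be to drop a
>>>>>>>>>>>>>>>>>>>>>>> file such as 99-display-protocols.properties, the name is just an
>>>>>>>>>>>>>>>>>>>>>>> example, into /etc/ovirt-engine/osinfo.conf.d/ containing
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>   os.other.devices.display.protocols.value.3.6 = spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> and then restart the engine: systemctl restart ovirt-engine.)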
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>> > Thanks.
>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>> > On Thu, Jun 22, 2017 at 12:19 PM, cmc <iucounu at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> >> Hi Martin,
>>>>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>>>>>>>> >>> just as a random comment, do you still have the database backup from
>>>>>>>>>>>>>>>>>>>>>>>> >>> the bare metal -> VM attempt? It might be possible to just try again
>>>>>>>>>>>>>>>>>>>>>>>> >>> using it. Or, in the worst case, update the offending value there
>>>>>>>>>>>>>>>>>>>>>>>> >>> before restoring it to the new engine instance.
>>>>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>>>>> >> I still have the backup. I'd rather do the latter, as re-running the
>>>>>>>>>>>>>>>>>>>>>>>> >> HE deployment is quite lengthy and involved (I have to re-initialise
>>>>>>>>>>>>>>>>>>>>>>>> >> the FC storage each time). Do you know what the offending value(s)
>>>>>>>>>>>>>>>>>>>>>>>> >> would be? Would it be in the Postgres DB or in a config file
>>>>>>>>>>>>>>>>>>>>>>>> >> somewhere?
>>>>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>>>>> >> Cheers,
>>>>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>>>>> >> Cam
>>>>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>>>>> >>> Regards
>>>>>>>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>>>>>>>> >>> Martin Sivak
>>>>>>>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>>>>>>>> >>> On Thu, Jun 22, 2017 at 11:39 AM, cmc <iucounu at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> >>>> Hi Yanir,
>>>>>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>> Thanks for the reply.
>>>>>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>> First of all, maybe a chain reaction of :
>>>>>>>>>>>>>>>>>>>>>>>> >>>>> WARN  [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
>>>>>>>>>>>>>>>>>>>>>>>> >>>>> (org.ovirt.thread.pool-6-thread-23) [] Validation of action
>>>>>>>>>>>>>>>>>>>>>>>> >>>>> 'ImportVm'
>>>>>>>>>>>>>>>>>>>>>>>> >>>>> failed for user SYSTEM. Reasons: VAR__ACTION__IMPORT
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>> ,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
>>>>>>>>>>>>>>>>>>>>>>>> >>>>> is causing the hosted engine vm not to be set up correctly  and
>>>>>>>>>>>>>>>>>>>>>>>> >>>>> further
>>>>>>>>>>>>>>>>>>>>>>>> >>>>> actions were made when the hosted engine vm wasn't in a stable state.
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>> As for now, are you trying to revert back to a previous/initial
>>>>>>>>>>>>>>>>>>>>>>>> >>>>> state ?
>>>>>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>> I'm not trying to revert it to a previous state for now. This was a
>>>>>>>>>>>>>>>>>>>>>>>> >>>> migration from a bare metal engine, and it didn't report any error
>>>>>>>>>>>>>>>>>>>>>>>> >>>> during the migration. I'd had some problems on my first attempts at
>>>>>>>>>>>>>>>>>>>>>>>> >>>> this migration, whereby it never completed (due to a proxy issue) but
>>>>>>>>>>>>>>>>>>>>>>>> >>>> I managed to resolve this. Do you know of a way to get the Hosted
>>>>>>>>>>>>>>>>>>>>>>>> >>>> Engine VM into a stable state, without rebuilding the entire cluster
>>>>>>>>>>>>>>>>>>>>>>>> >>>> from scratch (since I have a lot of VMs on it)?
>>>>>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>> Thanks for any help.
>>>>>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>> Regards,
>>>>>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>> Cam
>>>>>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>> Regards,
>>>>>>>>>>>>>>>>>>>>>>>> >>>>> Yanir
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>> On Wed, Jun 21, 2017 at 4:32 PM, cmc <iucounu at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>> Hi Jenny/Martin,
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>> Any idea what I can do here? The hosted engine VM has no log on any
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>> host in /var/log/libvirt/qemu, and I fear that if I need to put the
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>> host I created it on (which I think is hosting it) into maintenance,
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>> e.g. to upgrade it, or if it fails for any reason, it won't get
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>> migrated to another host, and I will not be able to manage the
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>> cluster. It seems to be a very dangerous position to be in.
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>> Cam
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>> On Wed, Jun 21, 2017 at 11:48 AM, cmc <iucounu at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> Thanks Martin. The hosts are all part of the same cluster.
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> I get these errors in the engine.log on the engine:
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> 2017-06-19 03:28:05,030Z WARN
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> (org.ovirt.thread.pool-6-thread-23) [] Validation of action
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> 'ImportVm'
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> failed for user SYST
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> EM. Reasons:
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> VAR__ACTION__IMPORT,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> 2017-06-19 03:28:05,030Z INFO
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> (org.ovirt.thread.pool-6-thread-23) [] Lock freed to object
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> 'EngineLock:{exclusiveLocks='[a
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> 79e6b0e-fff4-4cba-a02c-4c00be151300=<VM,
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName HostedEngine>,
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> HostedEngine=<VM_NAME, ACTION_TYPE_FAILED_NAME_ALREADY_USED>]',
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> sharedLocks=
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> '[a79e6b0e-fff4-4cba-a02c-4c00be151300=<REMOTE_VM,
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName HostedEngine>]'}'
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> 2017-06-19 03:28:05,030Z ERROR
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> [org.ovirt.engine.core.bll.HostedEngineImporter]
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> (org.ovirt.thread.pool-6-thread-23) [] Failed importing the Hosted
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> Engine VM
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> The sanlock.log reports conflicts on that same host, and a
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> different
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> error on the other hosts, not sure if they are related.
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> And this in the /var/log/ovirt-hosted-engine-ha/agent log on the
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> host
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> which I deployed the hosted engine VM on:
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> MainThread::ERROR::2017-06-19
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> 13:09:49,743::ovf_store::124::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> Unable to extract HEVM OVF
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> MainThread::ERROR::2017-06-19
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> 13:09:49,743::config::445::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> Failed extracting VM OVF from the OVF_STORE volume, falling back
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> initial vm.conf
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> I've seen some of these issues reported in bugzilla, but they were
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> for
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> older versions of oVirt (and appear to be resolved).
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> I will install that package on the other two hosts, and will
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> put them in maintenance since vdsm is installed as an upgrade. I
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> guess
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> restarting vdsm is a good idea after that?
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> Campbell
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> On Wed, Jun 21, 2017 at 10:51 AM, Martin Sivak <msivak at redhat.com>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> you do not have to install it on all hosts. But you should have
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> more
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> than one and ideally all hosted engine enabled nodes should
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> belong to
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> the same engine cluster.
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Best regards
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Martin Sivak
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> On Wed, Jun 21, 2017 at 11:29 AM, cmc <iucounu at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> Hi Jenny,
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> Does ovirt-hosted-engine-ha need to be installed across all
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> hosts?
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> Could that be the reason it is failing to see it properly?
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> Cam
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> On Mon, Jun 19, 2017 at 1:27 PM, cmc <iucounu at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> Hi Jenny,
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> Logs are attached. I can see errors in there, but am unsure how
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> they
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> arose.
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> Campbell
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> On Mon, Jun 19, 2017 at 12:29 PM, Evgenia Tokar
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> <etokar at redhat.com>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> From the output it looks like the agent is down, try starting
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> it by
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> running:
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> systemctl start ovirt-ha-agent.
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> The engine is supposed to see the hosted engine storage domain
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> import it
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> to the system, then it should import the hosted engine vm.
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> Can you attach the agent log from the host
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> (/var/log/ovirt-hosted-engine-ha/agent.log)
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> and the engine log from the engine vm
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> (/var/log/ovirt-engine/engine.log)?
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> Jenny
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> On Mon, Jun 19, 2017 at 12:41 PM, cmc <iucounu at gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Hi Jenny,
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> What version are you running?
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> 4.1.2.2-1.el7.centos
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> For the hosted engine vm to be imported and displayed in the
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> engine, you
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> must first create a master storage domain.
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> To provide a bit more detail: this was a migration of a
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> bare-metal
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> engine in an existing cluster to a hosted engine VM for that
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> cluster.
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> As part of this migration, I built an entirely new host and
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> ran
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> 'hosted-engine --deploy' (followed these instructions:
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> http://www.ovirt.org/documentation/self-hosted/chap-Migrating_from_Bare_Metal_to_an_EL-Based_Self-Hosted_Environment/).
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> I restored the backup from the engine and it completed
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> without any
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> errors. I didn't see any instructions regarding a master
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> storage
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> domain in the page above. The cluster has two existing master
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> storage
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> domains, one is fibre channel, which is up, and one ISO
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> domain,
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> which
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> is currently offline.
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> What do you mean the hosted engine commands are failing?
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> What
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> happens
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> when
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> you run hosted-engine --vm-status now?
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Interestingly, whereas when I ran it before, it exited with
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> no
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> output
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> and a return code of '1', it now reports:
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> --== Host 1 status ==--
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> conf_on_shared_storage             : True
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Status up-to-date                  : False
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Hostname                           :
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> kvm-ldn-03.ldn.fscfc.co.uk
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Host ID                            : 1
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Engine status                      : unknown stale-data
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Score                              : 0
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> stopped                            : True
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Local maintenance                  : False
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> crc32                              : 0217f07b
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> local_conf_timestamp               : 2911
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Host timestamp                     : 2897
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Extra metadata (valid at timestamp):
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>        metadata_parse_version=1
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>        metadata_feature_version=1
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>        timestamp=2897 (Thu Jun 15 16:22:54 2017)
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>        host-id=1
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>        score=0
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>        vm_conf_refresh_time=2911 (Thu Jun 15 16:23:08 2017)
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>        conf_on_shared_storage=True
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>        maintenance=False
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>        state=AgentStopped
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>        stopped=True
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Yet I can login to the web GUI fine. I guess it is not HA due
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> being
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> in an unknown state currently? Does the hosted-engine-ha rpm
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> need
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> be installed across all nodes in the cluster, btw?
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Thanks for the help,
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Cam
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> Jenny Tokar
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> On Thu, Jun 15, 2017 at 6:32 PM, cmc <iucounu at gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> I've migrated from a bare-metal engine to a hosted engine.
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> There
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> were
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> no errors during the install, however, the hosted engine
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> did not
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> get
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> started. I tried running:
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> hosted-engine --status
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> on the host I deployed it on, and it returns nothing (exit
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> code
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> is 1
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> however). I could not ping it either. So I tried starting
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> it via
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> 'hosted-engine --vm-start' and it returned:
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Virtual machine does not exist
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> But it then became available. I logged into it
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> successfully. It
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> is not
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> in the list of VMs however.
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Any ideas why the hosted-engine commands fail, and why it
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> is not
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> the list of virtual machines?
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks for any help,
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Cam
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Users mailing list
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Users at ovirt.org
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> http://lists.ovirt.org/mailman/listinfo/users
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> Users mailing list
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> Users at ovirt.org
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> http://lists.ovirt.org/mailman/listinfo/users
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>> _______________________________________________
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>> Users mailing list
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>> Users at ovirt.org
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>> http://lists.ovirt.org/mailman/listinfo/users
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>>>>>>> > _______________________________________________
>>>>>>>>>>>>>>>>>>>>>>>> > Users mailing list
>>>>>>>>>>>>>>>>>>>>>>>> > Users at ovirt.org
>>>>>>>>>>>>>>>>>>>>>>>> > http://lists.ovirt.org/mailman/listinfo/users
>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>

