Sorry, I was mistaken: the agent failed on two hosts with the following error:
ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Failed to start monitoring domain (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout during domain acquisition
ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Shutting down the agent because of 3 failures in a row!
What could cause these timeouts? Some other service not running?
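
In case it's useful, this is roughly what I'm checking on each host (my own checklist, so correct me if any of it is off):

  systemctl status ovirt-ha-agent ovirt-ha-broker sanlock
  sanlock client status
  grep host_id /etc/ovirt-hosted-engine/hosted-engine.conf
  hosted-engine --vm-status

I'm mainly using the 'sanlock client status' output to see which lockspaces and host ids each host currently holds for 207221b2-959b-426b-b945-18e1adfed62f.
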
On Thu, Jun 29, 2017 at 5:03 PM, cmc <iucounu(a)gmail.com> wrote:
Both services are up on all three hosts. The broker logs just report:

Thread-6549::INFO::2017-06-29 17:01:51,481::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
Thread-6549::INFO::2017-06-29 17:01:51,483::listener::186::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
Thanks,
Cam
On Thu, Jun 29, 2017 at 4:00 PM, Martin Sivak <msivak(a)redhat.com> wrote:
> Hi,
>
> please make sure that both ovirt-ha-agent and ovirt-ha-broker services
> are restarted and up. The error says the agent can't talk to the
> broker. Is there anything in the broker.log?
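>
> (Concretely, on each host that should be able to run the engine, something
> along the lines of:
>
>   systemctl restart ovirt-ha-broker ovirt-ha-agent
>   systemctl status ovirt-ha-broker ovirt-ha-agent
>
> restarting the broker before the agent, since the agent connects to the
> broker.)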
>
> Best regards
>
> Martin Sivak
>
> On Thu, Jun 29, 2017 at 4:42 PM, cmc <iucounu(a)gmail.com> wrote:
>> I've restarted those two services across all hosts, have taken the
>> Hosted Engine host out of maintenance, and when I try to migrate the
>> Hosted Engine over to another host, it reports that all three hosts
>> 'did not satisfy internal filter HA because it is not a Hosted Engine
>> host'.
>>
>> On the host that the Hosted Engine is currently running on, the agent.log reports:
>>
>> ovirt-ha-agent ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Connection closed: Connection closed
>> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Exception getting service path: Connection closed
>> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Traceback (most recent call last):
>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 191, in _run_agent
>>     return action(he)
>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 64, in action_proper
>>     return he.start_monitoring()
>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 411, in start_monitoring
>>     self._initialize_sanlock()
>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 691, in _initialize_sanlock
>>     constants.SERVICE_TYPE + constants.LOCKSPACE_EXTENSION)
>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 162, in get_service_path
>>     .format(str(e)))
>> RequestError: Failed to get service path: Connection closed
>> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Trying to restart agent
>>
>> On Thu, Jun 29, 2017 at 1:25 PM, Martin Sivak <msivak(a)redhat.com> wrote:
>>> Hi,
>>>
>>> yep, you have to restart the ovirt-ha-agent and ovirt-ha-broker services.
>>>
>>> The scheduling message just means that the host has score 0 or is not
>>> reporting score at all.
>>>
>>> Martin
>>>
>>> On Thu, Jun 29, 2017 at 1:33 PM, cmc <iucounu(a)gmail.com> wrote:
>>>> Thanks Martin, do I have to restart anything? When I try to use the
>>>> 'migrate' operation, it complains that the other two hosts 'did not
>>>> satisfy internal filter HA because it is not a Hosted Engine host..'
>>>> (even though I reinstalled both these hosts with the 'deploy hosted
>>>> engine' option), which suggests that something needs restarting. Should
>>>> I worry about the sanlock errors, or will that be resolved by the
>>>> change in host_id?
>>>>
>>>> Kind regards,
>>>>
>>>> Cam
>>>>
>>>> On Thu, Jun 29, 2017 at 12:22 PM, Martin Sivak <msivak(a)redhat.com> wrote:
>>>>> Change the ids so they are distinct. I need to check if there is a way
>>>>> to read the SPM ids from the engine, as using the same numbers would be the best.
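>>>>>
>>>>> (As a rough sketch of what I mean, on the host that has the duplicate id:
>>>>>
>>>>>   grep host_id /etc/ovirt-hosted-engine/hosted-engine.conf
>>>>>   # edit host_id to a value no other host uses, then
>>>>>   systemctl restart ovirt-ha-broker ovirt-ha-agent
>>>>>
>>>>> Which value to pick is the open question above - ideally the SPM id the
>>>>> engine already assigned to that host.)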
>>>>>
>>>>> Martin
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Jun 29, 2017 at 12:46 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>>> Is there any way of recovering from this situation? I'd prefer to fix
>>>>>> the issue rather than re-deploy, but if there is no recovery path, I
>>>>>> could perhaps try re-deploying the hosted engine. In which case, would
>>>>>> the best option be to take a backup of the Hosted Engine, and then
>>>>>> shut it down, re-initialise the SAN partition (or use another
>>>>>> partition) and retry the deployment? Would it be better to use the
>>>>>> older backup from the bare metal engine that I originally used, or use
>>>>>> a backup from the Hosted Engine? I'm not sure if any VMs have been
>>>>>> added since switching to Hosted Engine.
>>>>>>
>>>>>> Unfortunately I have very little time left to get this working before
>>>>>> I have to hand it over for eval (by end of Friday).
>>>>>>
>>>>>> Here are some log snippets from the cluster that are current.
>>>>>>
>>>>>> In /var/log/vdsm/vdsm.log on the host that has the Hosted Engine:
>>>>>>
>>>>>> 2017-06-29 10:50:15,071+0100 INFO (monitor/207221b) [storage.SANLock] Acquiring host id for domain 207221b2-959b-426b-b945-18e1adfed62f (id: 3) (clusterlock:282)
>>>>>> 2017-06-29 10:50:15,072+0100 ERROR (monitor/207221b) [storage.Monitor] Error acquiring host id 3 for domain 207221b2-959b-426b-b945-18e1adfed62f (monitor:558)
>>>>>> Traceback (most recent call last):
>>>>>>   File "/usr/share/vdsm/storage/monitor.py", line 555, in _acquireHostId
>>>>>>     self.domain.acquireHostId(self.hostId, async=True)
>>>>>>   File "/usr/share/vdsm/storage/sd.py", line 790, in acquireHostId
>>>>>>     self._manifest.acquireHostId(hostId, async)
>>>>>>   File "/usr/share/vdsm/storage/sd.py", line 449, in acquireHostId
>>>>>>     self._domainLock.acquireHostId(hostId, async)
>>>>>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line 297, in acquireHostId
>>>>>>     raise se.AcquireHostIdFailure(self._sdUUID, e)
>>>>>> AcquireHostIdFailure: Cannot acquire host id: ('207221b2-959b-426b-b945-18e1adfed62f', SanlockException(22, 'Sanlock lockspace add failure', 'Invalid argument'))
>>>>>>
>>>>>> From /var/log/ovirt-hosted-engine-ha/agent.log on the same host:
>>>>>>
>>>>>> MainThread::ERROR::2017-06-19 13:30:50,592::hosted_engine::822::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_domain_monitor) Failed to start monitoring domain (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout during domain acquisition
>>>>>> MainThread::WARNING::2017-06-19 13:30:50,593::hosted_engine::469::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Error while monitoring engine: Failed to start monitoring domain (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout during domain acquisition
>>>>>> MainThread::WARNING::2017-06-19 13:30:50,593::hosted_engine::472::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Unexpected error
>>>>>> Traceback (most recent call last):
>>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 443, in start_monitoring
>>>>>>     self._initialize_domain_monitor()
>>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 823, in _initialize_domain_monitor
>>>>>>     raise Exception(msg)
>>>>>> Exception: Failed to start monitoring domain (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout during domain acquisition
>>>>>> MainThread::ERROR::2017-06-19 13:30:50,593::hosted_engine::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Shutting down the agent because of 3 failures in a row!
>>>>>>
>>>>>> From sanlock.log:
>>>>>>
>>>>>> 2017-06-29 11:17:06+0100 1194149 [2530]: add_lockspace 207221b2-959b-426b-b945-18e1adfed62f:3:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0 conflicts with name of list1 s5 207221b2-959b-426b-b945-18e1adfed62f:1:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0
>>>>>>
>>>>>> From the two other hosts:
>>>>>>
>>>>>> host 2:
>>>>>>
>>>>>> vdsm.log
>>>>>>
>>>>>> 2017-06-29 10:53:47,755+0100 ERROR (jsonrpc/4) [jsonrpc.JsonRpcServer] Internal server error (__init__:570)
>>>>>> Traceback (most recent call last):
>>>>>>   File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 565, in _handle_request
>>>>>>     res = method(**params)
>>>>>>   File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 202, in _dynamicMethod
>>>>>>     result = fn(*methodArgs)
>>>>>>   File "/usr/share/vdsm/API.py", line 1454, in getAllVmIoTunePolicies
>>>>>>     io_tune_policies_dict = self._cif.getAllVmIoTunePolicies()
>>>>>>   File "/usr/share/vdsm/clientIF.py", line 448, in getAllVmIoTunePolicies
>>>>>>     'current_values': v.getIoTune()}
>>>>>>   File "/usr/share/vdsm/virt/vm.py", line 2803, in getIoTune
>>>>>>     result = self.getIoTuneResponse()
>>>>>>   File "/usr/share/vdsm/virt/vm.py", line 2816, in getIoTuneResponse
>>>>>>     res = self._dom.blockIoTune(
>>>>>>   File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 47, in __getattr__
>>>>>>     % self.vmid)
>>>>>> NotConnectedError: VM u'a79e6b0e-fff4-4cba-a02c-4c00be151300' was not started yet or was shut down
>>>>>>
>>>>>> /var/log/ovirt-hosted-engine-ha/agent.log
>>>>>>
>>>>>> MainThread::INFO::2017-06-29 10:56:33,636::ovf_store::103::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) Found OVF_STORE: imgUUID:222610db-7880-4f4f-8559-a3635fd73555, volUUID:c6e0d29b-eabf-4a09-a330-df54cfdd73f1
>>>>>> MainThread::INFO::2017-06-29 10:56:33,926::ovf_store::112::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) Extracting Engine VM OVF from the OVF_STORE
>>>>>> MainThread::INFO::2017-06-29 10:56:33,938::ovf_store::119::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) OVF_STORE volume path: /rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/images/222610db-7880-4f4f-8559-a3635fd73555/c6e0d29b-eabf-4a09-a330-df54cfdd73f1
>>>>>> MainThread::INFO::2017-06-29 10:56:33,967::config::431::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Found an OVF for HE VM, trying to convert
>>>>>> MainThread::INFO::2017-06-29 10:56:33,971::config::436::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Got vm.conf from OVF_STORE
>>>>>> MainThread::INFO::2017-06-29 10:56:36,736::states::678::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) Score is 0 due to unexpected vm shutdown at Thu Jun 29 10:53:59 2017
>>>>>> MainThread::INFO::2017-06-29 10:56:36,736::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineUnexpectedlyDown (score: 0)
>>>>>> MainThread::INFO::2017-06-29 10:56:46,772::config::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf) Reloading vm.conf from the shared storage domain
>>>>>>
>>>>>> /var/log/messages:
>>>>>>
>>>>>> Jun 29 10:53:46 kvm-ldn-02 kernel: dd: sending ioctl 80306d02 to a partition!
>>>>>>
>>>>>>
>>>>>> host 1:
>>>>>>
>>>>>> /var/log/messages (also in sanlock.log):
>>>>>>
>>>>>> Jun 29 11:01:02 kvm-ldn-01 sanlock[2400]: 2017-06-29 11:01:02+0100 678325 [9132]: s4531 delta_acquire host_id 1 busy1 1 2 1193177 3d4ec963-8486-43a2-a7d9-afa82508f89f.kvm-ldn-03
>>>>>> Jun 29 11:01:03 kvm-ldn-01 sanlock[2400]: 2017-06-29 11:01:03+0100 678326 [24159]: s4531 add_lockspace fail result -262
>>>>>>
>>>>>> /var/log/ovirt-hosted-engine-ha/agent.log:
>>>>>>
>>>>>> MainThread::ERROR::2017-06-27 15:21:01,143::hosted_engine::822::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_domain_monitor) Failed to start monitoring domain (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout during domain acquisition
>>>>>> MainThread::WARNING::2017-06-27 15:21:01,144::hosted_engine::469::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Error while monitoring engine: Failed to start monitoring domain (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout during domain acquisition
>>>>>> MainThread::WARNING::2017-06-27 15:21:01,144::hosted_engine::472::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Unexpected error
>>>>>> Traceback (most recent call last):
>>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 443, in start_monitoring
>>>>>>     self._initialize_domain_monitor()
>>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 823, in _initialize_domain_monitor
>>>>>>     raise Exception(msg)
>>>>>> Exception: Failed to start monitoring domain (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout during domain acquisition
>>>>>> MainThread::ERROR::2017-06-27 15:21:01,144::hosted_engine::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Shutting down the agent because of 3 failures in a row!
>>>>>> MainThread::INFO::2017-06-27 15:21:06,717::hosted_engine::848::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) VDSM domain monitor status: PENDING
>>>>>> MainThread::INFO::2017-06-27 15:21:09,335::hosted_engine::776::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_domain_monitor) Failed to stop monitoring domain (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f): Storage domain is member of pool: u'domain=207221b2-959b-426b-b945-18e1adfed62f'
>>>>>> MainThread::INFO::2017-06-27 15:21:09,339::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
>>>>>>
>>>>>>
>>>>>> Thanks for any help,
>>>>>>
>>>>>>
>>>>>> Cam
>>>>>>
>>>>>>
>>>>>> On Wed, Jun 28, 2017 at 11:25 AM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>> Hi Martin,
>>>>>>>
>>>>>>> yes, on two of the machines they have the same host_id. The other has
>>>>>>> a different host_id.
>>>>>>>
>>>>>>> To update since yesterday: I reinstalled and deployed Hosted Engine on
>>>>>>> the other host (so all three hosts in the cluster now have it
>>>>>>> installed). The second one I deployed said it was able to host the
>>>>>>> engine (unlike the first I reinstalled), so I tried putting the host
>>>>>>> with the Hosted Engine on it into maintenance to see if it would
>>>>>>> migrate over. It managed to move all the VMs but the Hosted Engine. And
>>>>>>> now the host that said it was able to host the engine says
>>>>>>> 'unavailable due to HA score'. The host that it was trying to move
>>>>>>> from has now been in 'preparing for maintenance' for the last 12 hours.
>>>>>>>
>>>>>>> The summary is:
>>>>>>>
>>>>>>> kvm-ldn-01 - one of the original, pre-Hosted Engine hosts, reinstalled
>>>>>>> with 'Deploy Hosted Engine'. No icon saying it can host the Hosted
>>>>>>> Engine, host_id of '2' in /etc/ovirt-hosted-engine/hosted-engine.conf.
>>>>>>> 'add_lockspace' fails in sanlock.log
>>>>>>>
>>>>>>> kvm-ldn-02 - the other host that was pre-existing before Hosted Engine
>>>>>>> was created. Reinstalled with 'Deploy Hosted Engine'. Had an icon
>>>>>>> saying that it was able to host the Hosted Engine, but after migration
>>>>>>> was attempted when putting kvm-ldn-03 into maintenance, it reports:
>>>>>>> 'unavailable due to HA score'. It has a host_id of '1' in
>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf. No errors in sanlock.log
>>>>>>>
>>>>>>> kvm-ldn-03 - this was the host I deployed Hosted Engine on, which was
>>>>>>> not part of the original cluster. I restored the bare-metal engine
>>>>>>> backup in the Hosted Engine on this host when deploying it, without
>>>>>>> error. It currently has the Hosted Engine on it (as the only VM after
>>>>>>> I put that host into maintenance to test the HA of Hosted Engine).
>>>>>>> Sanlock log shows conflicts
>>>>>>>
>>>>>>> I will look through all the logs for any other errors. Please let me
>>>>>>> know if you need any logs or other clarification/information.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Campbell
>>>>>>>
>>>>>>> On Wed, Jun 28, 2017 at 9:25 AM, Martin Sivak <msivak(a)redhat.com> wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> can you please check the contents of
>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf or
>>>>>>>> /etc/ovirt-hosted-engine-ha/agent.conf (I am not sure which one it is
>>>>>>>> right now) and search for host-id?
>>>>>>>>
>>>>>>>> Make sure the IDs are different. If they are not, then there is a bug somewhere.
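>>>>>>>>
>>>>>>>> (Something like this on every host should show it at a glance, assuming
>>>>>>>> the hosted-engine.conf variant is the one in use:
>>>>>>>>
>>>>>>>>   grep -H host_id /etc/ovirt-hosted-engine/hosted-engine.conf
>>>>>>>>
>>>>>>>> and then compare the numbers between hosts.)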
>>>>>>>>
>>>>>>>> Martin
>>>>>>>>
>>>>>>>> On Tue, Jun 27, 2017 at 6:26 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>>> I see this on the host it is trying to migrate to, in /var/log/sanlock:
>>>>>>>>>
>>>>>>>>> 2017-06-27 17:10:40+0100 527703 [2407]: s3528 lockspace 207221b2-959b-426b-b945-18e1adfed62f:1:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0
>>>>>>>>> 2017-06-27 17:13:00+0100 527843 [27446]: s3528 delta_acquire host_id 1 busy1 1 2 1042692 3d4ec963-8486-43a2-a7d9-afa82508f89f.kvm-ldn-03
>>>>>>>>> 2017-06-27 17:13:01+0100 527844 [2407]: s3528 add_lockspace fail result -262
>>>>>>>>>
>>>>>>>>> The sanlock service is running. Why would this occur?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> C
>>>>>>>>>
>>>>>>>>> On Tue, Jun 27, 2017 at 5:21 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>>>> Hi Martin,
>>>>>>>>>>
>>>>>>>>>> Thanks for the reply. I have done this, and the deployment completed
>>>>>>>>>> without error. However, it still will not allow the Hosted Engine to
>>>>>>>>>> migrate to another host. The
>>>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf got created ok on the host
>>>>>>>>>> I re-installed, but the ovirt-ha-broker.service, though it starts,
>>>>>>>>>> reports:
>>>>>>>>>>
>>>>>>>>>> --------------------8<-------------------
>>>>>>>>>>
>>>>>>>>>> Jun 27 14:58:26 kvm-ldn-01 systemd[1]: Starting oVirt Hosted Engine High Availability Communications Broker...
>>>>>>>>>> Jun 27 14:58:27 kvm-ldn-01 ovirt-ha-broker[6101]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker ERROR Failed to read metadata from /rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/hosted-engine.metadata
>>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 129, in get_raw_stats_for_service_type
>>>>>>>>>>     f = os.open(path, direct_flag | os.O_RDONLY | os.O_SYNC)
>>>>>>>>>> OSError: [Errno 2] No such file or directory: '/rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/hosted-engine.metadata'
>>>>>>>>>>
>>>>>>>>>> --------------------8<-------------------
>>>>>>>>>>
>>>>>>>>>> I checked the path, and it exists. I can run 'less -f' on it fine. The
>>>>>>>>>> perms are slightly different on the host that is running the VM vs the
>>>>>>>>>> one that is reporting errors (600 vs 660), ownership is vdsm:qemu. Is
>>>>>>>>>> this a SAN locking issue?
>>>>>>>>>>
>>>>>>>>>> Thanks for any help,
>>>>>>>>>>
>>>>>>>>>> Cam
>>>>>>>>>>
>>>>>>>>>> On Tue, Jun 27, 2017 at 1:41 PM, Martin Sivak <msivak(a)redhat.com> wrote:
>>>>>>>>>>>> Should it be? It was not in the instructions for the migration from
>>>>>>>>>>>> bare-metal to Hosted VM
>>>>>>>>>>>
>>>>>>>>>>> The hosted engine will only migrate to hosts that have the services
>>>>>>>>>>> running. Please put one other host to maintenance and select Hosted
>>>>>>>>>>> engine action: DEPLOY in the reinstall dialog.
>>>>>>>>>>>
>>>>>>>>>>> Best regards
>>>>>>>>>>>
>>>>>>>>>>> Martin Sivak
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jun 27, 2017 at 1:23 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>>>>>> I changed the 'os.other.devices.display.protocols.value.3.6 =
>>>>>>>>>>>> spice/qxl,vnc/cirrus,vnc/qxl' line to have the same display protocols
>>>>>>>>>>>> as 4 and the hosted engine now appears in the list of VMs. I am
>>>>>>>>>>>> guessing the compatibility version was causing it to use the 3.6
>>>>>>>>>>>> version. However, I am still unable to migrate the engine VM to
>>>>>>>>>>>> another host. When I try putting the host it is currently on into
>>>>>>>>>>>> maintenance, it reports:
>>>>>>>>>>>>
>>>>>>>>>>>> Error while executing action: Cannot switch the Host(s) to Maintenance mode.
>>>>>>>>>>>> There are no available hosts capable of running the engine VM.
>>>>>>>>>>>>
>>>>>>>>>>>> Running 'hosted-engine --vm-status' still shows 'Engine status: unknown stale-data'.
>>>>>>>>>>>>
>>>>>>>>>>>> The ovirt-ha-broker service is only running on one host. It was set to
>>>>>>>>>>>> 'disabled' in systemd. It won't start as there is no
>>>>>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf on the other two hosts.
>>>>>>>>>>>> Should it be? It was not in the instructions for the migration from
>>>>>>>>>>>> bare-metal to Hosted VM
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Cam
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Jun 22, 2017 at 1:07 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>>>>>>> Hi Tomas,
>>>>>>>>>>>>>
>>>>>>>>>>>>> So in my /usr/share/ovirt-engine/conf/osinfo-defaults.properties on my
>>>>>>>>>>>>> engine VM, I have:
>>>>>>>>>>>>>
>>>>>>>>>>>>> os.other.devices.display.protocols.value = spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus
>>>>>>>>>>>>> os.other.devices.display.protocols.value.3.6 = spice/qxl,vnc/cirrus,vnc/qxl
>>>>>>>>>>>>>
>>>>>>>>>>>>> That seems to match - I assume since this is 4.1, the 3.6 should not apply.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Is there somewhere else I should be looking?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cam
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Jun 22, 2017 at 11:40 AM, Tomas Jelinek <tjelinek(a)redhat.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Jun 22, 2017 at 12:38 PM, Michal Skrivanek
>>>>>>>>>>>>>> <michal.skrivanek(a)redhat.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> > On 22 Jun 2017, at 12:31, Martin Sivak <msivak(a)redhat.com> wrote:
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > Tomas, what fields are needed in a VM to pass the check that causes
>>>>>>>>>>>>>>> > the following error?
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> >>>>> WARN [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
>>>>>>>>>>>>>>> >>>>> (org.ovirt.thread.pool-6-thread-23) [] Validation of action 'ImportVm'
>>>>>>>>>>>>>>> >>>>> failed for user SYSTEM. Reasons: VAR__ACTION__IMPORT
>>>>>>>>>>>>>>> >>>>> ,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> to match the OS and VM Display type ;-)
>>>>>>>>>>>>>>> Configuration is in osinfo… e.g. if that is an import from older releases on
>>>>>>>>>>>>>>> Linux this is typically caused by the change of cirrus to vga for non-SPICE
>>>>>>>>>>>>>>> VMs
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> yep, the default supported combinations for 4.0+ is this:
>>>>>>>>>>>>>> os.other.devices.display.protocols.value = spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > Thanks.
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > On Thu, Jun 22, 2017 at 12:19 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>>>>>>>>> >> Hi Martin,
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>> >>> just as a random comment, do you still have the database backup from
>>>>>>>>>>>>>>> >>> the bare metal -> VM attempt? It might be possible to just try again
>>>>>>>>>>>>>>> >>> using it. Or in the worst case.. update the offending value there
>>>>>>>>>>>>>>> >>> before restoring it to the new engine instance.
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> I still have the backup. I'd rather do the latter, as re-running the
>>>>>>>>>>>>>>> >> HE deployment is quite lengthy and involved (I have to re-initialise
>>>>>>>>>>>>>>> >> the FC storage each time). Do you know what the offending value(s)
>>>>>>>>>>>>>>> >> would be? Would it be in the Postgres DB or in a config file
>>>>>>>>>>>>>>> >> somewhere?
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> Cheers,
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> Cam
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >>> Regards
>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>> >>> Martin Sivak
>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>> >>> On Thu, Jun 22, 2017 at 11:39 AM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>>>>>>>>> >>>> Hi Yanir,
>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>> >>>> Thanks for the reply.
>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>> >>>>> First of all, maybe a chain reaction of :
>>>>>>>>>>>>>>> >>>>> WARN [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
>>>>>>>>>>>>>>> >>>>> (org.ovirt.thread.pool-6-thread-23) [] Validation of action 'ImportVm'
>>>>>>>>>>>>>>> >>>>> failed for user SYSTEM. Reasons: VAR__ACTION__IMPORT
>>>>>>>>>>>>>>> >>>>> ,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
>>>>>>>>>>>>>>> >>>>> is causing the hosted engine vm not to be set up correctly and
>>>>>>>>>>>>>>> >>>>> further actions were made when the hosted engine vm wasn't in a stable state.
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>> As for now, are you trying to revert back to a previous/initial
>>>>>>>>>>>>>>> >>>>> state ?
>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>> >>>> I'm not trying to revert it to a previous state for now. This was a
>>>>>>>>>>>>>>> >>>> migration from a bare metal engine, and it didn't report any error
>>>>>>>>>>>>>>> >>>> during the migration. I'd had some problems on my first attempts at
>>>>>>>>>>>>>>> >>>> this migration, whereby it never completed (due to a proxy issue) but
>>>>>>>>>>>>>>> >>>> I managed to resolve this. Do you know of a way to get the Hosted
>>>>>>>>>>>>>>> >>>> Engine VM into a stable state, without rebuilding the entire cluster
>>>>>>>>>>>>>>> >>>> from scratch (since I have a lot of VMs on it)?
>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>> >>>> Thanks for any help.
>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>> >>>> Regards,
>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>> >>>> Cam
>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>> >>>>> Regards,
>>>>>>>>>>>>>>> >>>>> Yanir
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>> On Wed, Jun 21, 2017 at 4:32 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>> >>>>>> Hi Jenny/Martin,
>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>> >>>>>> Any idea what I can do here? The hosted engine VM has no log on any
>>>>>>>>>>>>>>> >>>>>> host in /var/log/libvirt/qemu, and I fear that if I need to put the
>>>>>>>>>>>>>>> >>>>>> host into maintenance, e.g., to upgrade it that I created it on (which
>>>>>>>>>>>>>>> >>>>>> I think is hosting it), or if it fails for any reason, it won't get
>>>>>>>>>>>>>>> >>>>>> migrated to another host, and I will not be able to manage the
>>>>>>>>>>>>>>> >>>>>> cluster. It seems to be a very dangerous position to be in.
>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>> >>>>>> Thanks,
>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>> >>>>>> Cam
>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>> >>>>>> On Wed, Jun 21, 2017 at 11:48 AM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>>>>>>>>> >>>>>>> Thanks Martin. The hosts are all part of the same cluster.
>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>> >>>>>>> I get these errors in the engine.log on the engine:
>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>> >>>>>>> 2017-06-19 03:28:05,030Z WARN [org.ovirt.engine.core.bll.exportimport.ImportVmCommand] (org.ovirt.thread.pool-6-thread-23) [] Validation of action 'ImportVm' failed for user SYSTEM. Reasons: VAR__ACTION__IMPORT,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
>>>>>>>>>>>>>>> >>>>>>> 2017-06-19 03:28:05,030Z INFO [org.ovirt.engine.core.bll.exportimport.ImportVmCommand] (org.ovirt.thread.pool-6-thread-23) [] Lock freed to object 'EngineLock:{exclusiveLocks='[a79e6b0e-fff4-4cba-a02c-4c00be151300=<VM, ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName HostedEngine>, HostedEngine=<VM_NAME, ACTION_TYPE_FAILED_NAME_ALREADY_USED>]', sharedLocks='[a79e6b0e-fff4-4cba-a02c-4c00be151300=<REMOTE_VM, ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName HostedEngine>]'}'
>>>>>>>>>>>>>>> >>>>>>> 2017-06-19 03:28:05,030Z ERROR [org.ovirt.engine.core.bll.HostedEngineImporter] (org.ovirt.thread.pool-6-thread-23) [] Failed importing the Hosted Engine VM
>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>> >>>>>>> The sanlock.log reports conflicts on that same host, and a different
>>>>>>>>>>>>>>> >>>>>>> error on the other hosts, not sure if they are related.
>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>> >>>>>>> And this in the /var/log/ovirt-hosted-engine-ha/agent log on the host
>>>>>>>>>>>>>>> >>>>>>> which I deployed the hosted engine VM on:
>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>> >>>>>>> MainThread::ERROR::2017-06-19 13:09:49,743::ovf_store::124::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) Unable to extract HEVM OVF
>>>>>>>>>>>>>>> >>>>>>> MainThread::ERROR::2017-06-19 13:09:49,743::config::445::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Failed extracting VM OVF from the OVF_STORE volume, falling back to initial vm.conf
>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>> >>>>>>> I've seen some of these issues reported in bugzilla, but they were for
>>>>>>>>>>>>>>> >>>>>>> older versions of oVirt (and appear to be resolved).
>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>> >>>>>>> I will install that package on the other two hosts, for which I will
>>>>>>>>>>>>>>> >>>>>>> put them in maintenance as vdsm is installed as an upgrade. I guess
>>>>>>>>>>>>>>> >>>>>>> restarting vdsm is a good idea after that?
>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>> >>>>>>> Thanks,
>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>> >>>>>>> Campbell
>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>> >>>>>>> On Wed, Jun 21, 2017 at 10:51 AM, Martin Sivak <msivak(a)redhat.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>> you do not have to install it on all hosts. But you
should have
>>>>>>>>>>>>>>>
>>>>>>>> more
>>>>>>>>>>>>>>>
>>>>>>>> than one and ideally all hosted engine enabled nodes
should
>>>>>>>>>>>>>>>
>>>>>>>> belong to
>>>>>>>>>>>>>>>
>>>>>>>> the same engine cluster.
>>>>>>>>>>>>>>>
>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>> Best regards
>>>>>>>>>>>>>>>
>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>> Martin Sivak
>>>>>>>>>>>>>>>
>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>> On Wed, Jun 21, 2017 at 11:29 AM, cmc
<iucounu(a)gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>> Hi Jenny,
>>>>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>> Does ovirt-hosted-engine-ha need to be installed
across all
>>>>>>>>>>>>>>>
>>>>>>>>> hosts?
>>>>>>>>>>>>>>>
>>>>>>>>> Could that be the reason it is failing to see it
properly?
>>>>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>> Cam
>>>>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>> On Mon, Jun 19, 2017 at 1:27 PM, cmc
<iucounu(a)gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>> Hi Jenny,
>>>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>> Logs are attached. I can see errors in there, but
am unsure how
>>>>>>>>>>>>>>>
>>>>>>>>>> they
>>>>>>>>>>>>>>>
>>>>>>>>>> arose.
>>>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>> Campbell
>>>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>> On Mon, Jun 19, 2017 at 12:29 PM, Evgenia Tokar
>>>>>>>>>>>>>>>
>>>>>>>>>> <etokar(a)redhat.com>
>>>>>>>>>>>>>>>
>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>> From the output it looks like the agent is
down, try starting
>>>>>>>>>>>>>>>
>>>>>>>>>>> it by
>>>>>>>>>>>>>>>
>>>>>>>>>>> running:
>>>>>>>>>>>>>>>
>>>>>>>>>>> systemctl start ovirt-ha-agent.
>>>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>> The engine is supposed to see the hosted
engine storage domain
>>>>>>>>>>>>>>>
>>>>>>>>>>> and
>>>>>>>>>>>>>>>
>>>>>>>>>>> import it
>>>>>>>>>>>>>>>
>>>>>>>>>>> to the system, then it should import the
hosted engine vm.
>>>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>> Can you attach the agent log from the host
>>>>>>>>>>>>>>>
>>>>>>>>>>> (/var/log/ovirt-hosted-engine-ha/agent.log)
>>>>>>>>>>>>>>>
>>>>>>>>>>> and the engine log from the engine vm
>>>>>>>>>>>>>>>
>>>>>>>>>>> (/var/log/ovirt-engine/engine.log)?
>>>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>> Jenny
>>>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jun 19, 2017 at 12:41 PM, cmc
<iucounu(a)gmail.com>
>>>>>>>>>>>>>>>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>> Hi Jenny,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>> What version are you running?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>> 4.1.2.2-1.el7.centos
>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>> For the hosted engine vm to be
imported and displayed in the
>>>>>>>>>>>>>>>
>>>>>>>>>>>>> engine, you
>>>>>>>>>>>>>>>
>>>>>>>>>>>>> must first create a master storage
domain.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>> To provide a bit more detail: this was a
migration of a
>>>>>>>>>>>>>>>
>>>>>>>>>>>> bare-metal
>>>>>>>>>>>>>>>
>>>>>>>>>>>> engine in an existing cluster to a hosted
engine VM for that
>>>>>>>>>>>>>>>
>>>>>>>>>>>> cluster.
>>>>>>>>>>>>>>>
>>>>>>>>>>>> As part of this migration, I built an
entirely new host and
>>>>>>>>>>>>>>>
>>>>>>>>>>>> ran
>>>>>>>>>>>>>>>
>>>>>>>>>>>> 'hosted-engine --deploy'
(followed these instructions:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>
http://www.ovirt.org/documentation/self-hosted/chap-Migrating_from_Bare_M...).
>>>>>>>>>>>>>>>
>>>>>>>>>>>> I restored the backup from the engine and
it completed
>>>>>>>>>>>>>>>
>>>>>>>>>>>> without any
>>>>>>>>>>>>>>>
>>>>>>>>>>>> errors. I didn't see any instructions
regarding a master
>>>>>>>>>>>>>>>
>>>>>>>>>>>> storage
>>>>>>>>>>>>>>>
>>>>>>>>>>>> domain in the page above. The cluster has
two existing master
>>>>>>>>>>>>>>>
>>>>>>>>>>>> storage
>>>>>>>>>>>>>>>
>>>>>>>>>>>> domains, one is fibre channel, which is
up, and one ISO
>>>>>>>>>>>>>>>
>>>>>>>>>>>> domain,
>>>>>>>>>>>>>>>
>>>>>>>>>>>> which
>>>>>>>>>>>>>>>
>>>>>>>>>>>> is currently offline.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>> What do you mean the hosted engine
commands are failing?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>> What
>>>>>>>>>>>>>>>
>>>>>>>>>>>>> happens
>>>>>>>>>>>>>>>
>>>>>>>>>>>>> when
>>>>>>>>>>>>>>>
>>>>>>>>>>>>> you run hosted-engine --vm-status
now?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>> Interestingly, whereas when I ran it
before, it exited with
>>>>>>>>>>>>>>>
>>>>>>>>>>>> no
>>>>>>>>>>>>>>>
>>>>>>>>>>>> output
>>>>>>>>>>>>>>>
>>>>>>>>>>>> and a return code of '1', it now
reports:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>> --== Host 1 status ==--
>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>> conf_on_shared_storage :
True
>>>>>>>>>>>>>>>
>>>>>>>>>>>> Status up-to-date :
False
>>>>>>>>>>>>>>>
>>>>>>>>>>>> Hostname :
>>>>>>>>>>>>>>>
>>>>>>>>>>>> kvm-ldn-03.ldn.fscfc.co.uk
>>>>>>>>>>>>>>>
>>>>>>>>>>>> Host ID : 1
>>>>>>>>>>>>>>>
>>>>>>>>>>>> Engine status :
unknown stale-data
>>>>>>>>>>>>>>>
>>>>>>>>>>>> Score : 0
>>>>>>>>>>>>>>>
>>>>>>>>>>>> stopped :
True
>>>>>>>>>>>>>>>
>>>>>>>>>>>> Local maintenance :
False
>>>>>>>>>>>>>>>
>>>>>>>>>>>> crc32 :
0217f07b
>>>>>>>>>>>>>>>
>>>>>>>>>>>> local_conf_timestamp :
2911
>>>>>>>>>>>>>>>
>>>>>>>>>>>> Host timestamp :
2897
>>>>>>>>>>>>>>>
>>>>>>>>>>>> Extra metadata (valid at timestamp):
>>>>>>>>>>>>>>>
>>>>>>>>>>>> metadata_parse_version=1
>>>>>>>>>>>>>>>
>>>>>>>>>>>> metadata_feature_version=1
>>>>>>>>>>>>>>>
>>>>>>>>>>>> timestamp=2897 (Thu Jun 15
16:22:54 2017)
>>>>>>>>>>>>>>>
>>>>>>>>>>>> host-id=1
>>>>>>>>>>>>>>>
>>>>>>>>>>>> score=0
>>>>>>>>>>>>>>>
>>>>>>>>>>>> vm_conf_refresh_time=2911 (Thu Jun
15 16:23:08 2017)
>>>>>>>>>>>>>>>
>>>>>>>>>>>> conf_on_shared_storage=True
>>>>>>>>>>>>>>>
>>>>>>>>>>>> maintenance=False
>>>>>>>>>>>>>>>
>>>>>>>>>>>> state=AgentStopped
>>>>>>>>>>>>>>>
>>>>>>>>>>>> stopped=True
>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>> Yet I can login to the web GUI fine. I
guess it is not HA due
>>>>>>>>>>>>>>>
>>>>>>>>>>>> to
>>>>>>>>>>>>>>>
>>>>>>>>>>>> being
>>>>>>>>>>>>>>>
>>>>>>>>>>>> in an unknown state currently? Does the
hosted-engine-ha rpm
>>>>>>>>>>>>>>>
>>>>>>>>>>>> need
>>>>>>>>>>>>>>>
>>>>>>>>>>>> to
>>>>>>>>>>>>>>>
>>>>>>>>>>>> be installed across all nodes in the
cluster, btw?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for the help,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>> Cam
>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>> Jenny Tokar
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Jun 15, 2017 at 6:32 PM, cmc
<iucounu(a)gmail.com>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I've migrated from a
bare-metal engine to a hosted engine.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> There
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> were
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> no errors during the install,
however, the hosted engine
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> did not
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> get
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> started. I tried running:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> hosted-engine --status
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> on the host I deployed it on, and
it returns nothing (exit
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> code
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> is 1
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> however). I could not ping it
either. So I tried starting
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> it via
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 'hosted-engine
--vm-start' and it returned:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Virtual machine does not exist
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> But it then became available. I
logged into it
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> successfully. It
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> is not
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> in the list of VMs however.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Any ideas why the hosted-engine
commands fail, and why it
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> is not
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> the list of virtual machines?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for any help,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cam
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>