, since ovirt-ha-agent was not running anyway, but it fails with the following error:

ERROR:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Failed to start monitoring domain (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout during domain acquisition
ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 191, in _run_agent
    return action(he)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 67, in action_clean
    return he.clean(options.force_cleanup)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 345, in clean
    self._initialize_domain_monitor()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 823, in _initialize_domain_monitor
    raise Exception(msg)
Exception: Failed to start monitoring domain (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout during domain acquisition
ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Trying to restart agent
WARNING:ovirt_hosted_engine_ha.agent.agent.Agent:Restarting agent, attempt '0'
ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Too many errors occurred, giving up. Please review the log and consider filing a bug.
INFO:ovirt_hosted_engine_ha.agent.agent.Agent:Agent shutting down
On Thu, Jun 29, 2017 at 6:10 PM, cmc <iucounu(a)gmail.com> wrote:
Actually, it looks like a sanlock problem:
"SanlockInitializationError: Failed to initialize sanlock, the number of errors has exceeded the limit"
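A quick way to see what sanlock itself is holding on the affected host (a minimal diagnostic sketch, assuming the stock sanlock client tooling and the default hosted-engine.conf location):

  # lockspaces and resources sanlock currently holds on this host
  sanlock client status
  # recent sanlock activity, useful for spotting add_lockspace failures
  sanlock client log_dump | tail -n 100
  # the host_id this host is configured to register with
  grep ^host_id /etc/ovirt-hosted-engine/hosted-engine.conf

If two hosts try to register the same host_id in the same lockspace, errors like the one above are the expected symptom.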
On Thu, Jun 29, 2017 at 5:10 PM, cmc <iucounu(a)gmail.com> wrote:
> Sorry, I am mistaken, two hosts failed for the agent with the following error:
>
> ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Failed to start monitoring domain (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout during domain acquisition
> ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Shutting down the agent because of 3 failures in a row!
>
> What could cause these timeouts? Some other service not running?
>
> On Thu, Jun 29, 2017 at 5:03 PM, cmc <iucounu(a)gmail.com> wrote:
>> Both services are up on all three hosts. The broker logs just report:
>>
>> Thread-6549::INFO::2017-06-29 17:01:51,481::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
>> Thread-6549::INFO::2017-06-29 17:01:51,483::listener::186::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
>>
>> Thanks,
>>
>> Cam
>>
>> On Thu, Jun 29, 2017 at 4:00 PM, Martin Sivak <msivak(a)redhat.com> wrote:
>>> Hi,
>>>
>>> please make sure that both ovirt-ha-agent and ovirt-ha-broker services
>>> are restarted and up. The error says the agent can't talk to the
>>> broker. Is there anything in the broker.log?
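A minimal sketch of that check, run on each host (assuming the standard ovirt-hosted-engine-ha service names and log location):

  systemctl restart ovirt-ha-broker ovirt-ha-agent
  systemctl status ovirt-ha-broker ovirt-ha-agent
  tail -n 50 /var/log/ovirt-hosted-engine-ha/broker.log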
>>>
>>> Best regards
>>>
>>> Martin Sivak
>>>
>>> On Thu, Jun 29, 2017 at 4:42 PM, cmc <iucounu(a)gmail.com> wrote:
>>>> I've restarted those two services across all hosts, have taken the
>>>> Hosted Engine host out of maintenance, and when I try to migrate the
>>>> Hosted Engine over to another host, it reports that all three hosts
>>>> 'did not satisfy internal filter HA because it is not a Hosted Engine host'.
>>>>
>>>> On the host that the Hosted Engine is currently on, it reports in the agent.log:
>>>>
>>>> ovirt-ha-agent ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Connection closed: Connection closed
>>>> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Exception getting service path: Connection closed
>>>> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Traceback (most recent call last):
>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 191, in _run_agent
>>>>     return action(he)
>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 64, in action_proper
>>>>     return he.start_monitoring()
>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 411, in start_monitoring
>>>>     self._initialize_sanlock()
>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 691, in _initialize_sanlock
>>>>     constants.SERVICE_TYPE + constants.LOCKSPACE_EXTENSION)
>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 162, in get_service_path
>>>>     .format(str(e)))
>>>> RequestError: Failed to get service path: Connection closed
>>>> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Trying to restart agent
>>>>
>>>> On Thu, Jun 29, 2017 at 1:25 PM, Martin Sivak <msivak(a)redhat.com> wrote:
>>>>> Hi,
>>>>>
>>>>> yep, you have to restart the ovirt-ha-agent and ovirt-ha-broker services.
>>>>>
>>>>> The scheduling message just means that the host has score 0 or is not reporting score at all.
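To see the score each host is reporting, one option is to run hosted-engine --vm-status on every host (a sketch, assuming root ssh to the three hosts named in this thread):

  for h in kvm-ldn-01 kvm-ldn-02 kvm-ldn-03; do
      echo "== $h =="
      ssh root@$h 'hosted-engine --vm-status | egrep "Hostname|Score|Engine status"'
  done

A host whose agent is down will either fail to answer or show stale data with a score of 0.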
>>>>>
>>>>> Martin
>>>>>
>>>>> On Thu, Jun 29, 2017 at 1:33 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>>> Thanks Martin, do I have to restart anything? When I try to use the
>>>>>> 'migrate' operation, it complains that the other two hosts 'did not
>>>>>> satisfy internal filter HA because it is not a Hosted Engine host..'
>>>>>> (even though I reinstalled both these hosts with the 'deploy hosted
>>>>>> engine' option), which suggests that something needs restarting. Should
>>>>>> I worry about the sanlock errors, or will that be resolved by the
>>>>>> change in host_id?
>>>>>>
>>>>>> Kind regards,
>>>>>>
>>>>>> Cam
>>>>>>
>>>>>> On Thu, Jun 29, 2017 at 12:22 PM, Martin Sivak <msivak(a)redhat.com> wrote:
>>>>>>> Change the ids so they are distinct. I need to check if there is a way
>>>>>>> to read the SPM ids from the engine as using the same numbers would be
>>>>>>> the best.
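The SPM ids the engine assigned can probably be read from the engine database; a sketch only, assuming the default 'engine' database name and that this version exposes the vds_spm_id column on vds_static (run on the engine VM):

  sudo -u postgres psql engine -c \
      "SELECT vds_name, vds_spm_id FROM vds_static ORDER BY vds_spm_id;"

Matching each host's host_id in hosted-engine.conf to its vds_spm_id would follow the 'same numbers' suggestion above.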
>>>>>>>
>>>>>>> Martin
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jun 29, 2017 at 12:46 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>> Is there any way of recovering from this situation? I'd prefer to fix
>>>>>>>> the issue rather than re-deploy, but if there is no recovery path, I
>>>>>>>> could perhaps try re-deploying the hosted engine. In which case, would
>>>>>>>> the best option be to take a backup of the Hosted Engine, and then
>>>>>>>> shut it down, re-initialise the SAN partition (or use another
>>>>>>>> partition) and retry the deployment? Would it be better to use the
>>>>>>>> older backup from the bare metal engine that I originally used, or use
>>>>>>>> a backup from the Hosted Engine? I'm not sure if any VMs have been
>>>>>>>> added since switching to Hosted Engine.
>>>>>>>>
>>>>>>>> Unfortunately I have very little time left to get this working before
>>>>>>>> I have to hand it over for eval (by end of Friday).
>>>>>>>>
>>>>>>>> Here are some log snippets from the cluster that are current
>>>>>>>>
>>>>>>>> In /var/log/vdsm/vdsm.log on the host that has the Hosted Engine:
>>>>>>>>
>>>>>>>> 2017-06-29 10:50:15,071+0100 INFO (monitor/207221b) [storage.SANLock] Acquiring host id for domain 207221b2-959b-426b-b945-18e1adfed62f (id: 3) (clusterlock:282)
>>>>>>>> 2017-06-29 10:50:15,072+0100 ERROR (monitor/207221b) [storage.Monitor] Error acquiring host id 3 for domain 207221b2-959b-426b-b945-18e1adfed62f (monitor:558)
>>>>>>>> Traceback (most recent call last):
>>>>>>>>   File "/usr/share/vdsm/storage/monitor.py", line 555, in _acquireHostId
>>>>>>>>     self.domain.acquireHostId(self.hostId, async=True)
>>>>>>>>   File "/usr/share/vdsm/storage/sd.py", line 790, in acquireHostId
>>>>>>>>     self._manifest.acquireHostId(hostId, async)
>>>>>>>>   File "/usr/share/vdsm/storage/sd.py", line 449, in acquireHostId
>>>>>>>>     self._domainLock.acquireHostId(hostId, async)
>>>>>>>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line 297, in acquireHostId
>>>>>>>>     raise se.AcquireHostIdFailure(self._sdUUID, e)
>>>>>>>> AcquireHostIdFailure: Cannot acquire host id: ('207221b2-959b-426b-b945-18e1adfed62f', SanlockException(22, 'Sanlock lockspace add failure', 'Invalid argument'))
>>>>>>>>
>>>>>>>> From /var/log/ovirt-hosted-engine-ha/agent.log on the same host:
>>>>>>>>
>>>>>>>> MainThread::ERROR::2017-06-19 13:30:50,592::hosted_engine::822::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_domain_monitor) Failed to start monitoring domain (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout during domain acquisition
>>>>>>>> MainThread::WARNING::2017-06-19 13:30:50,593::hosted_engine::469::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Error while monitoring engine: Failed to start monitoring domain (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout during domain acquisition
>>>>>>>> MainThread::WARNING::2017-06-19 13:30:50,593::hosted_engine::472::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Unexpected error
>>>>>>>> Traceback (most recent call last):
>>>>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 443, in start_monitoring
>>>>>>>>     self._initialize_domain_monitor()
>>>>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 823, in _initialize_domain_monitor
>>>>>>>>     raise Exception(msg)
>>>>>>>> Exception: Failed to start monitoring domain (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout during domain acquisition
>>>>>>>> MainThread::ERROR::2017-06-19 13:30:50,593::hosted_engine::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Shutting down the agent because of 3 failures in a row!
>>>>>>>>
>>>>>>>> From sanlock.log:
>>>>>>>>
>>>>>>>> 2017-06-29 11:17:06+0100 1194149 [2530]: add_lockspace 207221b2-959b-426b-b945-18e1adfed62f:3:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0 conflicts with name of list1 s5 207221b2-959b-426b-b945-18e1adfed62f:1:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0
>>>>>>>>
>>>>>>>> From the two other hosts:
>>>>>>>>
>>>>>>>> host 2:
>>>>>>>>
>>>>>>>> vdsm.log
>>>>>>>>
>>>>>>>> 2017-06-29 10:53:47,755+0100 ERROR (jsonrpc/4) [jsonrpc.JsonRpcServer] Internal server error (__init__:570)
>>>>>>>> Traceback (most recent call last):
>>>>>>>>   File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 565, in _handle_request
>>>>>>>>     res = method(**params)
>>>>>>>>   File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 202, in _dynamicMethod
>>>>>>>>     result = fn(*methodArgs)
>>>>>>>>   File "/usr/share/vdsm/API.py", line 1454, in getAllVmIoTunePolicies
>>>>>>>>     io_tune_policies_dict = self._cif.getAllVmIoTunePolicies()
>>>>>>>>   File "/usr/share/vdsm/clientIF.py", line 448, in getAllVmIoTunePolicies
>>>>>>>>     'current_values': v.getIoTune()}
>>>>>>>>   File "/usr/share/vdsm/virt/vm.py", line 2803, in getIoTune
>>>>>>>>     result = self.getIoTuneResponse()
>>>>>>>>   File "/usr/share/vdsm/virt/vm.py", line 2816, in getIoTuneResponse
>>>>>>>>     res = self._dom.blockIoTune(
>>>>>>>>   File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 47, in __getattr__
>>>>>>>>     % self.vmid)
>>>>>>>> NotConnectedError: VM u'a79e6b0e-fff4-4cba-a02c-4c00be151300' was not started yet or was shut down
>>>>>>>>
>>>>>>>> /var/log/ovirt-hosted-engine-ha/agent.log
>>>>>>>>
>>>>>>>> MainThread::INFO::2017-06-29 10:56:33,636::ovf_store::103::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) Found OVF_STORE: imgUUID:222610db-7880-4f4f-8559-a3635fd73555, volUUID:c6e0d29b-eabf-4a09-a330-df54cfdd73f1
>>>>>>>> MainThread::INFO::2017-06-29 10:56:33,926::ovf_store::112::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) Extracting Engine VM OVF from the OVF_STORE
>>>>>>>> MainThread::INFO::2017-06-29 10:56:33,938::ovf_store::119::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) OVF_STORE volume path: /rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/images/222610db-7880-4f4f-8559-a3635fd73555/c6e0d29b-eabf-4a09-a330-df54cfdd73f1
>>>>>>>> MainThread::INFO::2017-06-29 10:56:33,967::config::431::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Found an OVF for HE VM, trying to convert
>>>>>>>> MainThread::INFO::2017-06-29 10:56:33,971::config::436::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Got vm.conf from OVF_STORE
>>>>>>>> MainThread::INFO::2017-06-29 10:56:36,736::states::678::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) Score is 0 due to unexpected vm shutdown at Thu Jun 29 10:53:59 2017
>>>>>>>> MainThread::INFO::2017-06-29 10:56:36,736::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineUnexpectedlyDown (score: 0)
>>>>>>>> MainThread::INFO::2017-06-29 10:56:46,772::config::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf) Reloading vm.conf from the shared storage domain
>>>>>>>>
>>>>>>>> /var/log/messages:
>>>>>>>>
>>>>>>>> Jun 29 10:53:46 kvm-ldn-02 kernel: dd: sending ioctl 80306d02 to a partition!
>>>>>>>>
>>>>>>>>
>>>>>>>> host 1:
>>>>>>>>
>>>>>>>> From /var/log/messages (also in sanlock.log):
>>>>>>>>
>>>>>>>> Jun 29 11:01:02 kvm-ldn-01 sanlock[2400]: 2017-06-29 11:01:02+0100 678325 [9132]: s4531 delta_acquire host_id 1 busy1 1 2 1193177 3d4ec963-8486-43a2-a7d9-afa82508f89f.kvm-ldn-03
>>>>>>>> Jun 29 11:01:03 kvm-ldn-01 sanlock[2400]: 2017-06-29 11:01:03+0100 678326 [24159]: s4531 add_lockspace fail result -262
>>>>>>>>
>>>>>>>> /var/log/ovirt-hosted-engine-ha/agent.log:
>>>>>>>>
>>>>>>>> MainThread::ERROR::2017-06-27 15:21:01,143::hosted_engine::822::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_domain_monitor) Failed to start monitoring domain (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout during domain acquisition
>>>>>>>> MainThread::WARNING::2017-06-27 15:21:01,144::hosted_engine::469::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Error while monitoring engine: Failed to start monitoring domain (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout during domain acquisition
>>>>>>>> MainThread::WARNING::2017-06-27 15:21:01,144::hosted_engine::472::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Unexpected error
>>>>>>>> Traceback (most recent call last):
>>>>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 443, in start_monitoring
>>>>>>>>     self._initialize_domain_monitor()
>>>>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 823, in _initialize_domain_monitor
>>>>>>>>     raise Exception(msg)
>>>>>>>> Exception: Failed to start monitoring domain (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout during domain acquisition
>>>>>>>> MainThread::ERROR::2017-06-27 15:21:01,144::hosted_engine::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Shutting down the agent because of 3 failures in a row!
>>>>>>>> MainThread::INFO::2017-06-27 15:21:06,717::hosted_engine::848::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) VDSM domain monitor status: PENDING
>>>>>>>> MainThread::INFO::2017-06-27 15:21:09,335::hosted_engine::776::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_domain_monitor) Failed to stop monitoring domain (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f): Storage domain is member of pool: u'domain=207221b2-959b-426b-b945-18e1adfed62f'
>>>>>>>> MainThread::INFO::2017-06-27 15:21:09,339::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks for any help,
>>>>>>>>
>>>>>>>>
>>>>>>>> Cam
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Jun 28, 2017 at 11:25 AM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>>> Hi Martin,
>>>>>>>>>
>>>>>>>>> yes, two of the machines have the same host_id. The other has
>>>>>>>>> a different host_id.
>>>>>>>>>
>>>>>>>>> To update since yesterday: I reinstalled and deployed Hosted Engine on
>>>>>>>>> the other host (so all three hosts in the cluster now have it
>>>>>>>>> installed). The second one I deployed said it was able to host the
>>>>>>>>> engine (unlike the first I reinstalled), so I tried putting the host
>>>>>>>>> with the Hosted Engine on it into maintenance to see if it would
>>>>>>>>> migrate over. It managed to move all the VMs except the Hosted Engine. And
>>>>>>>>> now the host that said it was able to host the engine says
>>>>>>>>> 'unavailable due to HA score'. The host that it was trying to move
>>>>>>>>> from has now been in 'preparing for maintenance' for the last 12 hours.
>>>>>>>>>
>>>>>>>>> The summary is:
>>>>>>>>>
>>>>>>>>> kvm-ldn-01 - one of the original, pre-Hosted Engine hosts, reinstalled
>>>>>>>>> with 'Deploy Hosted Engine'. No icon saying it can host the Hosted
>>>>>>>>> Engine; host_id of '2' in /etc/ovirt-hosted-engine/hosted-engine.conf;
>>>>>>>>> 'add_lockspace' fails in sanlock.log.
>>>>>>>>>
>>>>>>>>> kvm-ldn-02 - the other host that existed before Hosted Engine
>>>>>>>>> was created. Reinstalled with 'Deploy Hosted Engine'. Had an icon
>>>>>>>>> saying that it was able to host the Hosted Engine, but after migration
>>>>>>>>> was attempted when putting kvm-ldn-03 into maintenance, it reports
>>>>>>>>> 'unavailable due to HA score'. It has a host_id of '1' in
>>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf. No errors in sanlock.log.
>>>>>>>>>
>>>>>>>>> kvm-ldn-03 - the host I deployed Hosted Engine on, which was
>>>>>>>>> not part of the original cluster. I restored the bare-metal engine
>>>>>>>>> backup in the Hosted Engine on this host when deploying it, without
>>>>>>>>> error. It currently has the Hosted Engine on it (as the only VM, after
>>>>>>>>> I put that host into maintenance to test the HA of Hosted Engine).
>>>>>>>>> The sanlock log shows conflicts.
>>>>>>>>>
>>>>>>>>> I will look through all the logs for any other errors. Please let me
>>>>>>>>> know if you need any logs or other clarification/information.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Campbell
>>>>>>>>>
>>>>>>>>> On Wed, Jun 28, 2017 at 9:25 AM, Martin Sivak <msivak(a)redhat.com> wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> can you please check the contents of
>>>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf or
>>>>>>>>>> /etc/ovirt-hosted-engine-ha/agent.conf (I am not sure which one it is
>>>>>>>>>> right now) and search for host-id?
>>>>>>>>>>
>>>>>>>>>> Make sure the IDs are different. If they are not, then there is a bug somewhere.
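A quick way to compare the IDs across the three hosts in this thread (a sketch, assuming root ssh and the hosted-engine.conf path that exists on these hosts):

  for h in kvm-ldn-01 kvm-ldn-02 kvm-ldn-03; do
      echo -n "$h: "
      ssh root@$h 'grep ^host_id /etc/ovirt-hosted-engine/hosted-engine.conf'
  done

Every host should print a different host_id.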
>>>>>>>>>>
>>>>>>>>>> Martin
>>>>>>>>>>
>>>>>>>>>> On Tue, Jun 27, 2017 at 6:26 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>>>>> I see this on the host it is trying to migrate to, in /var/log/sanlock:
>>>>>>>>>>>
>>>>>>>>>>> 2017-06-27 17:10:40+0100 527703 [2407]: s3528 lockspace 207221b2-959b-426b-b945-18e1adfed62f:1:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0
>>>>>>>>>>> 2017-06-27 17:13:00+0100 527843 [27446]: s3528 delta_acquire host_id 1 busy1 1 2 1042692 3d4ec963-8486-43a2-a7d9-afa82508f89f.kvm-ldn-03
>>>>>>>>>>> 2017-06-27 17:13:01+0100 527844 [2407]: s3528 add_lockspace fail result -262
>>>>>>>>>>>
>>>>>>>>>>> The sanlock service is running. Why would this occur?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> C
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jun 27, 2017 at 5:21 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>>>>>> Hi Martin,
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for the reply. I have done this, and the deployment completed
>>>>>>>>>>>> without error. However, it still will not allow the Hosted Engine to
>>>>>>>>>>>> migrate to another host. The /etc/ovirt-hosted-engine/hosted-engine.conf
>>>>>>>>>>>> got created ok on the host I re-installed, but the ovirt-ha-broker.service,
>>>>>>>>>>>> though it starts, reports:
>>>>>>>>>>>>
>>>>>>>>>>>> --------------------8<-------------------
>>>>>>>>>>>>
>>>>>>>>>>>> Jun 27 14:58:26 kvm-ldn-01 systemd[1]: Starting oVirt Hosted Engine High Availability Communications Broker...
>>>>>>>>>>>> Jun 27 14:58:27 kvm-ldn-01 ovirt-ha-broker[6101]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker ERROR Failed to read metadata from /rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/hosted-engine.metadata
>>>>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 129, in get_raw_stats_for_service_type
>>>>>>>>>>>>     f = os.open(path, direct_flag | os.O_RDONLY | os.O_SYNC)
>>>>>>>>>>>> OSError: [Errno 2] No such file or directory: '/rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/hosted-engine.metadata'
>>>>>>>>>>>>
>>>>>>>>>>>> --------------------8<-------------------
>>>>>>>>>>>>
>>>>>>>>>>>> I checked the path, and it exists. I can run 'less -f' on it fine. The
>>>>>>>>>>>> perms are slightly different on the host that is running the VM vs the
>>>>>>>>>>>> one that is reporting errors (600 vs 660); ownership is vdsm:qemu. Is
>>>>>>>>>>>> this a sanlock issue?
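One way to reproduce what the broker does with that file rather than what 'less' does (a sketch, using the path from the error above; per the traceback the broker opens it read-only with a direct/sync flag, which is stricter than a normal read):

  ls -lL /rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/
  sudo -u vdsm dd if=/rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/hosted-engine.metadata \
      of=/dev/null bs=1M count=1 iflag=direct

If the dd fails as the vdsm user while less works as root, that points at permissions/ownership or a dangling symlink rather than at sanlock.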
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for any help,
>>>>>>>>>>>>
>>>>>>>>>>>> Cam
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jun 27, 2017 at 1:41 PM, Martin Sivak <msivak(a)redhat.com> wrote:
>>>>>>>>>>>>>> Should it be? It was not in the instructions for the migration from
>>>>>>>>>>>>>> bare-metal to Hosted VM
>>>>>>>>>>>>>
>>>>>>>>>>>>> The hosted engine will only migrate to hosts that have the services
>>>>>>>>>>>>> running. Please put one other host to maintenance and select Hosted
>>>>>>>>>>>>> engine action: DEPLOY in the reinstall dialog.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best regards
>>>>>>>>>>>>>
>>>>>>>>>>>>> Martin Sivak
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jun 27, 2017 at 1:23 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>>>>>>>> I changed the 'os.other.devices.display.protocols.value.3.6 =
>>>>>>>>>>>>>> spice/qxl,vnc/cirrus,vnc/qxl' line to have the same display protocols
>>>>>>>>>>>>>> as 4 and the hosted engine now appears in the list of VMs. I am
>>>>>>>>>>>>>> guessing the compatibility version was causing it to use the 3.6
>>>>>>>>>>>>>> version. However, I am still unable to migrate the engine VM to
>>>>>>>>>>>>>> another host. When I try putting the host it is currently on into
>>>>>>>>>>>>>> maintenance, it reports:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Error while executing action: Cannot switch the Host(s) to Maintenance mode.
>>>>>>>>>>>>>> There are no available hosts capable of running the engine VM.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Running 'hosted-engine --vm-status' still shows 'Engine status:
>>>>>>>>>>>>>> unknown stale-data'.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The ovirt-ha-broker service is only running on one host. It was set to
>>>>>>>>>>>>>> 'disabled' in systemd. It won't start as there is no
>>>>>>>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf on the other two hosts.
>>>>>>>>>>>>>> Should it be? It was not in the instructions for the migration from
>>>>>>>>>>>>>> bare-metal to Hosted VM
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cam
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Jun 22, 2017 at 1:07 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>>>>>>>>> Hi Tomas,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So in my /usr/share/ovirt-engine/conf/osinfo-defaults.properties on my
>>>>>>>>>>>>>>> engine VM, I have:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> os.other.devices.display.protocols.value = spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus
>>>>>>>>>>>>>>> os.other.devices.display.protocols.value.3.6 = spice/qxl,vnc/cirrus,vnc/qxl
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> That seems to match - I assume since this is 4.1, the 3.6 should not apply
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Is there somewhere else I should be looking?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Cam
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Jun 22, 2017 at 11:40 AM, Tomas Jelinek <tjelinek(a)redhat.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Jun 22, 2017 at 12:38 PM, Michal Skrivanek <michal.skrivanek(a)redhat.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> > On 22 Jun 2017, at 12:31, Martin Sivak <msivak(a)redhat.com> wrote:
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > Tomas, what fields are needed in a VM to pass the check that causes
>>>>>>>>>>>>>>>>> > the following error?
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> >>>>> WARN [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
>>>>>>>>>>>>>>>>> >>>>> (org.ovirt.thread.pool-6-thread-23) [] Validation of action 'ImportVm'
>>>>>>>>>>>>>>>>> >>>>> failed for user SYSTEM. Reasons: VAR__ACTION__IMPORT
>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>> >>>>> ,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> to match the OS and VM Display type ;-)
>>>>>>>>>>>>>>>>> Configuration is in osinfo… e.g. if that is an import from older releases on
>>>>>>>>>>>>>>>>> Linux, this is typically caused by the change of cirrus to vga for non-SPICE
>>>>>>>>>>>>>>>>> VMs
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> yep, the default supported combinations for 4.0+ is this:
>>>>>>>>>>>>>>>> os.other.devices.display.protocols.value = spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > Thanks.
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > On Thu, Jun 22,
2017 at 12:19 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>>>>>>>>>>> >> Hi Martin,
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>> just as
a random comment, do you still have the database backup from
>>>>>>>>>>>>>>>>> >>> the bare
metal -> VM attempt? It might be possible to just try again
>>>>>>>>>>>>>>>>> >>> using
it. Or in the worst case.. update the offending value there
>>>>>>>>>>>>>>>>> >>> before
restoring it to the new engine instance.
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >> I still have
the backup. I'd rather do the latter, as re-running the
>>>>>>>>>>>>>>>>> >> HE
deployment is quite lengthy and involved (I have to re-initialise
>>>>>>>>>>>>>>>>> >> the FC
storage each time). Do you know what the offending value(s)
>>>>>>>>>>>>>>>>> >> would be?
Would it be in the Postgres DB or in a config file
>>>>>>>>>>>>>>>>> >> somewhere?
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >> Cheers,
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >> Cam
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >>> Regards
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>> Martin
Sivak
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>> On Thu,
Jun 22, 2017 at 11:39 AM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>>>>>>>>>>> >>>> Hi
Yanir,
>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>> >>>>
Thanks for the reply.
>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>> >>>>>
First of all, maybe a chain reaction of :
>>>>>>>>>>>>>>>>> >>>>>
WARN [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
>>>>>>>>>>>>>>>>> >>>>>
(org.ovirt.thread.pool-6-thread-23) [] Validation of action
>>>>>>>>>>>>>>>>> >>>>>
'ImportVm'
>>>>>>>>>>>>>>>>> >>>>>
failed for user SYSTEM. Reasons: VAR__ACTION__IMPORT
>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>> >>>>>
,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
>>>>>>>>>>>>>>>>> >>>>>
is causing the hosted engine vm not to be set up correctly and
>>>>>>>>>>>>>>>>> >>>>>
further
>>>>>>>>>>>>>>>>> >>>>>
actions were made when the hosted engine vm wasnt in a stable state.
>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>> >>>>>
As for now, are you trying to revert back to a previous/initial
>>>>>>>>>>>>>>>>> >>>>>
state ?
>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>> >>>>
I'm not trying to revert it to a previous state for now. This was a
>>>>>>>>>>>>>>>>> >>>>
migration from a bare metal engine, and it didn't report any error
>>>>>>>>>>>>>>>>> >>>>
during the migration. I'd had some problems on my first attempts at
>>>>>>>>>>>>>>>>> >>>> this
migration, whereby it never completed (due to a proxy issue) but
>>>>>>>>>>>>>>>>> >>>> I
managed to resolve this. Do you know of a way to get the Hosted
>>>>>>>>>>>>>>>>> >>>>
Engine VM into a stable state, without rebuilding the entire cluster
>>>>>>>>>>>>>>>>> >>>> from
scratch (since I have a lot of VMs on it)?
>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>> >>>>
Thanks for any help.
>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>> >>>>
Regards,
>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>> >>>> Cam
>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>> >>>>>
Regards,
>>>>>>>>>>>>>>>>> >>>>>
Yanir
>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>> >>>>>
On Wed, Jun 21, 2017 at 4:32 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>> Hi Jenny/Martin,
>>>>>>>>>>>>>>>>>
>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>> Any idea what I can do here? The hosted engine VM has no log on
any
>>>>>>>>>>>>>>>>>
>>>>>> host in /var/log/libvirt/qemu, and I fear that if I need to put
the
>>>>>>>>>>>>>>>>>
>>>>>> host into maintenance, e.g., to upgrade it that I created it on
>>>>>>>>>>>>>>>>>
>>>>>> (which
>>>>>>>>>>>>>>>>>
>>>>>> I think is hosting it), or if it fails for any reason, it
won't get
>>>>>>>>>>>>>>>>>
>>>>>> migrated to another host, and I will not be able to manage the
>>>>>>>>>>>>>>>>>
>>>>>> cluster. It seems to be a very dangerous position to be in.
>>>>>>>>>>>>>>>>>
>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>> Thanks,
>>>>>>>>>>>>>>>>>
>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>> Cam
>>>>>>>>>>>>>>>>>
>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>> On Wed, Jun 21, 2017 at 11:48 AM, cmc <iucounu(a)gmail.com>
wrote:
>>>>>>>>>>>>>>>>>
>>>>>>> Thanks Martin. The hosts are all part of the same cluster.
>>>>>>>>>>>>>>>>>
>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>> I get these errors in the engine.log on the engine:
>>>>>>>>>>>>>>>>>
>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>> 2017-06-19 03:28:05,030Z WARN
>>>>>>>>>>>>>>>>>
>>>>>>> [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
>>>>>>>>>>>>>>>>>
>>>>>>> (org.ovirt.thread.pool-6-thread-23) [] Validation of action
>>>>>>>>>>>>>>>>>
>>>>>>> 'ImportVm'
>>>>>>>>>>>>>>>>>
>>>>>>> failed for user SYST
>>>>>>>>>>>>>>>>>
>>>>>>> EM. Reasons:
>>>>>>>>>>>>>>>>>
>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>
VAR__ACTION__IMPORT,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
>>>>>>>>>>>>>>>>>
>>>>>>> 2017-06-19 03:28:05,030Z INFO
>>>>>>>>>>>>>>>>>
>>>>>>> [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
>>>>>>>>>>>>>>>>>
>>>>>>> (org.ovirt.thread.pool-6-thread-23) [] Lock freed to object
>>>>>>>>>>>>>>>>>
>>>>>>> 'EngineLock:{exclusiveLocks='[a
>>>>>>>>>>>>>>>>>
>>>>>>> 79e6b0e-fff4-4cba-a02c-4c00be151300=<VM,
>>>>>>>>>>>>>>>>>
>>>>>>> ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName
HostedEngine>,
>>>>>>>>>>>>>>>>>
>>>>>>> HostedEngine=<VM_NAME,
ACTION_TYPE_FAILED_NAME_ALREADY_USED>]',
>>>>>>>>>>>>>>>>>
>>>>>>> sharedLocks=
>>>>>>>>>>>>>>>>>
>>>>>>> '[a79e6b0e-fff4-4cba-a02c-4c00be151300=<REMOTE_VM,
>>>>>>>>>>>>>>>>>
>>>>>>> ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName
HostedEngine>]'}'
>>>>>>>>>>>>>>>>>
>>>>>>> 2017-06-19 03:28:05,030Z ERROR
>>>>>>>>>>>>>>>>>
>>>>>>> [org.ovirt.engine.core.bll.HostedEngineImporter]
>>>>>>>>>>>>>>>>>
>>>>>>> (org.ovirt.thread.pool-6-thread-23) [] Failed importing the
Hosted
>>>>>>>>>>>>>>>>>
>>>>>>> Engine VM
>>>>>>>>>>>>>>>>>
>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>> The sanlock.log reports conflicts on that same host, and a
>>>>>>>>>>>>>>>>>
>>>>>>> different
>>>>>>>>>>>>>>>>>
>>>>>>> error on the other hosts, not sure if they are related.
>>>>>>>>>>>>>>>>>
>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>> And this in the /var/log/ovirt-hosted-engine-ha/agent log on
the
>>>>>>>>>>>>>>>>>
>>>>>>> host
>>>>>>>>>>>>>>>>>
>>>>>>> which I deployed the hosted engine VM on:
>>>>>>>>>>>>>>>>>
>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>> MainThread::ERROR::2017-06-19
>>>>>>>>>>>>>>>>>
>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>
13:09:49,743::ovf_store::124::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>>>>>>>>>>>>>>>>>
>>>>>>> Unable to extract HEVM OVF
>>>>>>>>>>>>>>>>>
>>>>>>> MainThread::ERROR::2017-06-19
>>>>>>>>>>>>>>>>>
>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>
13:09:49,743::config::445::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>>>>>>>>>>>>>>>>
>>>>>>> Failed extracting VM OVF from the OVF_STORE volume, falling
back
>>>>>>>>>>>>>>>>>
>>>>>>> to
>>>>>>>>>>>>>>>>>
>>>>>>> initial vm.conf
>>>>>>>>>>>>>>>>>
>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>> I've seen some of these issues reported in bugzilla, but
they were
>>>>>>>>>>>>>>>>>
>>>>>>> for
>>>>>>>>>>>>>>>>>
>>>>>>> older versions of oVirt (and appear to be resolved).
>>>>>>>>>>>>>>>>>
>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>> I will install that package on the other two hosts, for which
I
>>>>>>>>>>>>>>>>>
>>>>>>> will
>>>>>>>>>>>>>>>>>
>>>>>>> put them in maintenance as vdsm is installed as an upgrade.
I
>>>>>>>>>>>>>>>>>
>>>>>>> guess
>>>>>>>>>>>>>>>>>
>>>>>>> restarting vdsm is a good idea after that?
>>>>>>>>>>>>>>>>>
>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>
>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>> Campbell
>>>>>>>>>>>>>>>>>
>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>> On Wed, Jun 21, 2017 at 10:51 AM, Martin Sivak
<msivak(a)redhat.com>
>>>>>>>>>>>>>>>>>
>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>
>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>> you do not have to install it on all hosts. But you
should have
>>>>>>>>>>>>>>>>>
>>>>>>>> more
>>>>>>>>>>>>>>>>>
>>>>>>>> than one and ideally all hosted engine enabled nodes
should
>>>>>>>>>>>>>>>>>
>>>>>>>> belong to
>>>>>>>>>>>>>>>>>
>>>>>>>> the same engine cluster.
>>>>>>>>>>>>>>>>>
>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>> Best regards
>>>>>>>>>>>>>>>>>
>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>> Martin Sivak
>>>>>>>>>>>>>>>>>
>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>> On Wed, Jun 21, 2017 at 11:29 AM, cmc
<iucounu(a)gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>> Hi Jenny,
>>>>>>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>> Does ovirt-hosted-engine-ha need to be installed
across all
>>>>>>>>>>>>>>>>>
>>>>>>>>> hosts?
>>>>>>>>>>>>>>>>>
>>>>>>>>> Could that be the reason it is failing to see it
properly?
>>>>>>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>> Cam
>>>>>>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>> On Mon, Jun 19, 2017 at 1:27 PM, cmc
<iucounu(a)gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>> Hi Jenny,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>> Logs are attached. I can see errors in there, but
am unsure how
>>>>>>>>>>>>>>>>>
>>>>>>>>>> they
>>>>>>>>>>>>>>>>>
>>>>>>>>>> arose.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>> Campbell
>>>>>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>> On Mon, Jun 19, 2017 at 12:29 PM, Evgenia Tokar
>>>>>>>>>>>>>>>>>
>>>>>>>>>> <etokar(a)redhat.com>
>>>>>>>>>>>>>>>>>
>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>> From the output it looks like the agent is
down, try starting
>>>>>>>>>>>>>>>>>
>>>>>>>>>>> it by
>>>>>>>>>>>>>>>>>
>>>>>>>>>>> running:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>> systemctl start ovirt-ha-agent.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>> The engine is supposed to see the hosted
engine storage domain
>>>>>>>>>>>>>>>>>
>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>
>>>>>>>>>>> import it
>>>>>>>>>>>>>>>>>
>>>>>>>>>>> to the system, then it should import the
hosted engine vm.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>> Can you attach the agent log from the host
>>>>>>>>>>>>>>>>>
>>>>>>>>>>> (/var/log/ovirt-hosted-engine-ha/agent.log)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>> and the engine log from the engine vm
>>>>>>>>>>>>>>>>>
>>>>>>>>>>> (/var/log/ovirt-engine/engine.log)?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>> Jenny
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jun 19, 2017 at 12:41 PM, cmc
<iucounu(a)gmail.com>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> Hi Jenny,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>> What version are you running?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> 4.1.2.2-1.el7.centos
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>> For the hosted engine vm to be
imported and displayed in the
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>> engine, you
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>> must first create a master storage
domain.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> To provide a bit more detail: this was a
migration of a
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> bare-metal
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> engine in an existing cluster to a hosted
engine VM for that
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> cluster.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> As part of this migration, I built an
entirely new host and
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> ran
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> 'hosted-engine --deploy'
(followed these instructions:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
http://www.ovirt.org/documentation/self-hosted/chap-Migrating_from_Bare_M...).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> I restored the backup from the engine and
it completed
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> without any
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> errors. I didn't see any instructions
regarding a master
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> storage
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> domain in the page above. The cluster has
two existing master
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> storage
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> domains, one is fibre channel, which is
up, and one ISO
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> domain,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> which
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> is currently offline.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>> What do you mean the hosted engine
commands are failing?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>> What
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>> happens
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>> when
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>> you run hosted-engine --vm-status
now?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> Interestingly, whereas when I ran it
before, it exited with
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> no
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> output
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> and a return code of '1', it now
reports:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> --== Host 1 status ==--
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> conf_on_shared_storage :
True
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> Status up-to-date :
False
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> Hostname :
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> kvm-ldn-03.ldn.fscfc.co.uk
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> Host ID : 1
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> Engine status :
unknown stale-data
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> Score : 0
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> stopped :
True
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> Local maintenance :
False
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> crc32 :
0217f07b
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> local_conf_timestamp :
2911
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> Host timestamp :
2897
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> Extra metadata (valid at timestamp):
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> metadata_parse_version=1
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> metadata_feature_version=1
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> timestamp=2897 (Thu Jun 15
16:22:54 2017)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> host-id=1
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> score=0
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> vm_conf_refresh_time=2911 (Thu Jun
15 16:23:08 2017)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> conf_on_shared_storage=True
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> maintenance=False
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> state=AgentStopped
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> stopped=True
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> Yet I can login to the web GUI fine. I
guess it is not HA due
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> being
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> in an unknown state currently? Does the
hosted-engine-ha rpm
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> need
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> be installed across all nodes in the
cluster, btw?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for the help,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> Cam
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>> Jenny Tokar
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Jun 15, 2017 at 6:32 PM, cmc
<iucounu(a)gmail.com>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I've migrated from a
bare-metal engine to a hosted engine.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> There
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> were
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> no errors during the install,
however, the hosted engine
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> did not
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> get
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> started. I tried running:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> hosted-engine --status
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> on the host I deployed it on, and
it returns nothing (exit
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> code
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> is 1
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> however). I could not ping it
either. So I tried starting
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> it via
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 'hosted-engine
--vm-start' and it returned:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Virtual machine does not exist
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> But it then became available. I
logged into it
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> successfully. It
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> is not
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> in the list of VMs however.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Any ideas why the hosted-engine
commands fail, and why it
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> is not
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> the list of virtual machines?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for any help,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cam
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
_______________________________________________
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Users mailing list
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Users(a)ovirt.org
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
http://lists.ovirt.org/mailman/listinfo/users
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>>
>>>>>>>>> Users mailing list
>>>>>>>>>>>>>>>>>
>>>>>>>>> Users(a)ovirt.org
>>>>>>>>>>>>>>>>>
>>>>>>>>>
http://lists.ovirt.org/mailman/listinfo/users
>>>>>>>>>>>>>>>>>
>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>>
>>>>>> Users mailing list
>>>>>>>>>>>>>>>>>
>>>>>> Users(a)ovirt.org
>>>>>>>>>>>>>>>>>
>>>>>>
http://lists.ovirt.org/mailman/listinfo/users
>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>> >
_______________________________________________
>>>>>>>>>>>>>>>>> > Users mailing
list
>>>>>>>>>>>>>>>>> > Users(a)ovirt.org
>>>>>>>>>>>>>>>>> >
http://lists.ovirt.org/mailman/listinfo/users
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>