Is there any way of recovering from this situation? I'd prefer to fix
the issue rather than re-deploy, but if there is no recovery path, I
could perhaps try re-deploying the hosted engine. In that case, would
the best option be to take a backup of the Hosted Engine, shut it
down, re-initialise the SAN partition (or use another partition) and
retry the deployment? And would it be better to use the older backup
from the bare-metal engine that I originally used, or a fresh backup
from the Hosted Engine? I'm not sure whether any VMs have been added
since switching to the Hosted Engine.
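If re-deploying turns out to be the only option, I assume I would take
that fresh backup on the Hosted Engine VM itself with something along
these lines (the file names here are just placeholders on my part):

engine-backup --mode=backup --file=engine-backup-20170629.tar.gz --log=engine-backup.log

Please correct me if that is not the right tool to use against a Hosted Engine.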
Unfortunately I have very little time left to get this working before
I have to hand it over for eval (by end of Friday).
Here are some current log snippets from the cluster:
In /var/log/vdsm/vdsm.log on the host that has the Hosted Engine:
2017-06-29 10:50:15,071+0100 INFO (monitor/207221b) [storage.SANLock]
Acquiring host id for domain 207221b2-959b-426b-b945-18e1adfed62f (id:
3) (clusterlock:282)
2017-06-29 10:50:15,072+0100 ERROR (monitor/207221b) [storage.Monitor]
Error acquiring host id 3 for domain
207221b2-959b-426b-b945-18e1adfed62f (monitor:558)
Traceback (most recent call last):
File "/usr/share/vdsm/storage/monitor.py", line 555, in _acquireHostId
self.domain.acquireHostId(self.hostId, async=True)
File "/usr/share/vdsm/storage/sd.py", line 790, in acquireHostId
self._manifest.acquireHostId(hostId, async)
File "/usr/share/vdsm/storage/sd.py", line 449, in acquireHostId
self._domainLock.acquireHostId(hostId, async)
File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py",
line 297, in acquireHostId
raise se.AcquireHostIdFailure(self._sdUUID, e)
AcquireHostIdFailure: Cannot acquire host id:
('207221b2-959b-426b-b945-18e1adfed62f', SanlockException(22, 'Sanlock
lockspace add failure', 'Invalid argument'))
From /var/log/ovirt-hosted-engine-ha/agent.log on the same host:
MainThread::ERROR::2017-06-19
13:30:50,592::hosted_engine::822::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_domain_monitor)
Failed to start monitoring domain
(sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
during domain acquisition
MainThread::WARNING::2017-06-19
13:30:50,593::hosted_engine::469::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Error while monitoring engine: Failed to start monitoring domain
(sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
during domain acquisition
MainThread::WARNING::2017-06-19
13:30:50,593::hosted_engine::472::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Unexpected error
Traceback (most recent call last):
File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
line 443, in start_monitoring
self._initialize_domain_monitor()
File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
line 823, in _initialize_domain_monitor
raise Exception(msg)
Exception: Failed to start monitoring domain
(sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
during domain acquisition
MainThread::ERROR::2017-06-19
13:30:50,593::hosted_engine::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Shutting down the agent because of 3 failures in a row!
From sanlock.log:
2017-06-29 11:17:06+0100 1194149 [2530]: add_lockspace
207221b2-959b-426b-b945-18e1adfed62f:3:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0
conflicts with name of list1 s5
207221b2-959b-426b-b945-18e1adfed62f:1:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0
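If it would help, I can also gather what sanlock itself thinks is
registered in that lockspace on each host; my understanding (please
correct me if wrong) is that something like the following would show it:

sanlock client status
sanlock direct dump /dev/207221b2-959b-426b-b945-18e1adfed62f/ids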
From the two other hosts:
Host 2:
vdsm.log:
2017-06-29 10:53:47,755+0100 ERROR (jsonrpc/4) [jsonrpc.JsonRpcServer]
Internal server error (__init__:570)
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line
565, in _handle_request
res = method(**params)
File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line
202, in _dynamicMethod
result = fn(*methodArgs)
File "/usr/share/vdsm/API.py", line 1454, in getAllVmIoTunePolicies
io_tune_policies_dict = self._cif.getAllVmIoTunePolicies()
File "/usr/share/vdsm/clientIF.py", line 448, in getAllVmIoTunePolicies
'current_values': v.getIoTune()}
File "/usr/share/vdsm/virt/vm.py", line 2803, in getIoTune
result = self.getIoTuneResponse()
File "/usr/share/vdsm/virt/vm.py", line 2816, in getIoTuneResponse
res = self._dom.blockIoTune(
File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line
47, in __getattr__
% self.vmid)
NotConnectedError: VM u'a79e6b0e-fff4-4cba-a02c-4c00be151300' was not
started yet or was shut down
/var/log/ovirt-hosted-engine-ha/agent.log:
MainThread::INFO::2017-06-29
10:56:33,636::ovf_store::103::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan)
Found OVF_STORE: imgUUID:222610db-7880-4f4f-8559-a3635fd73555,
volUUID:c6e0d29b-eabf-4a09-a330-df54cfdd73f1
MainThread::INFO::2017-06-29
10:56:33,926::ovf_store::112::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
Extracting Engine VM OVF from the OVF_STORE
MainThread::INFO::2017-06-29
10:56:33,938::ovf_store::119::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
OVF_STORE volume path:
/rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/images/222610db-7880-4f4f-8559-a3635fd73555/c6e0d29b-eabf-4a09-a330-df54cfdd73f1
MainThread::INFO::2017-06-29
10:56:33,967::config::431::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
Found an OVF for HE VM, trying to convert
MainThread::INFO::2017-06-29
10:56:33,971::config::436::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
Got vm.conf from OVF_STORE
MainThread::INFO::2017-06-29
10:56:36,736::states::678::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score)
Score is 0 due to unexpected vm shutdown at Thu Jun 29 10:53:59 2017
MainThread::INFO::2017-06-29
10:56:36,736::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Current state EngineUnexpectedlyDown (score: 0)
MainThread::INFO::2017-06-29
10:56:46,772::config::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf)
Reloading vm.conf from the shared storage domain
/var/log/messages:
Jun 29 10:53:46 kvm-ldn-02 kernel: dd: sending ioctl 80306d02 to a partition!
Host 1:
/var/log/messages (the same entries also appear in sanlock.log):
Jun 29 11:01:02 kvm-ldn-01 sanlock[2400]: 2017-06-29 11:01:02+0100
678325 [9132]: s4531 delta_acquire host_id 1 busy1 1 2 1193177
3d4ec963-8486-43a2-a7d9-afa82508f89f.kvm-ldn-03
Jun 29 11:01:03 kvm-ldn-01 sanlock[2400]: 2017-06-29 11:01:03+0100
678326 [24159]: s4531 add_lockspace fail result -262
/var/log/ovirt-hosted-engine-ha/agent.log:
MainThread::ERROR::2017-06-27
15:21:01,143::hosted_engine::822::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_domain_monitor)
Failed to start monitoring domain
(sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
during domain acquisition
MainThread::WARNING::2017-06-27
15:21:01,144::hosted_engine::469::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Error while monitoring engine: Failed to start monitoring domain
(sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
during domain acquisition
MainThread::WARNING::2017-06-27
15:21:01,144::hosted_engine::472::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Unexpected error
Traceback (most recent call last):
File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
line 443, in start_monitoring
self._initialize_domain_monitor()
File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
line 823, in _initialize_domain_monitor
raise Exception(msg)
Exception: Failed to start monitoring domain
(sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
during domain acquisition
MainThread::ERROR::2017-06-27
15:21:01,144::hosted_engine::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Shutting down the agent because of 3 failures in a row!
MainThread::INFO::2017-06-27
15:21:06,717::hosted_engine::848::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status)
VDSM domain monitor status: PENDING
MainThread::INFO::2017-06-27
15:21:09,335::hosted_engine::776::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_domain_monitor)
Failed to stop monitoring domain
(sd_uuid=207221b2-959b-426b-b945-18e1adfed62f): Storage domain is
member of pool: u'domain=207221b2-959b-426b-b945-18e1adfed62f'
MainThread::INFO::2017-06-27
15:21:09,339::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(run)
Agent shutting down
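Assuming the root cause is the duplicated host_id, my rough (untested)
plan once the IDs have been made unique would be, on each affected host:

systemctl restart ovirt-ha-broker ovirt-ha-agent
hosted-engine --vm-status

Does that sound like the right sequence, or is something more needed to
clear the stale sanlock registration?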
Thanks for any help,
Cam
On Wed, Jun 28, 2017 at 11:25 AM, cmc <iucounu(a)gmail.com> wrote:
Hi Martin,
Yes, two of the machines have the same host_id. The other has a
different host_id.
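For reference, I checked each host with something along the lines of
(assuming host_id is the relevant key in that file):

grep host_id /etc/ovirt-hosted-engine/hosted-engine.conf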
To update since yesterday: I reinstalled and deployed Hosted Engine on
the other host (so all three hosts in the cluster now have it
installed). The second one I deployed said it was able to host the
engine (unlike the first I reinstalled), so I tried putting the host
with the Hosted Engine on it into maintenance to see if it would
migrate over. It managed to migrate all the VMs except the Hosted
Engine. And now the host that said it was able to host the engine says
'unavailable due to HA score', and the host it was trying to migrate
from has been stuck in 'Preparing for Maintenance' for the last 12 hours.
The summary is:
kvm-ldn-01 - one of the original, pre-Hosted Engine hosts, reinstalled
with 'Deploy Hosted Engine'. No icon saying it can host the Hosted
Engine; host_id of '2' in /etc/ovirt-hosted-engine/hosted-engine.conf.
'add_lockspace' fails in sanlock.log.
kvm-ldn-02 - the other host that was pre-existing before Hosted Engine
was created. Reinstalled with 'Deploy Hosted Engine'. Had an icon
saying that it was able to host the Hosted Engine, but after the
migration attempt (when putting kvm-ldn-03 into maintenance) it reports:
'unavailable due to HA score'. It has a host_id of '1' in
/etc/ovirt-hosted-engine/hosted-engine.conf. No errors in sanlock.log
kvm-ldn-03 - this was the host I deployed Hosted Engine on, which was
not part of the original cluster. I restored the bare-metal engine
backup in the Hosted Engine on this host when deploying it, without
error. It currently has the Hosted Engine on it (as the only VM after
I put that host into maintenance to test the HA of Hosted Engine).
Its sanlock log shows conflicts.
I will look through all the logs for any other errors. Please let me
know if you need any logs or other clarification/information.
Thanks,
Campbell
On Wed, Jun 28, 2017 at 9:25 AM, Martin Sivak <msivak(a)redhat.com> wrote:
> Hi,
>
> can you please check the contents of
> /etc/ovirt-hosted-engine/hosted-engine.conf or
> /etc/ovirt-hosted-engine-ha/agent.conf (I am not sure which one it is
> right now) and search for host-id?
>
> Make sure the IDs are different. If they are not, then there is a bug somewhere.
>
> Martin
>
> On Tue, Jun 27, 2017 at 6:26 PM, cmc <iucounu(a)gmail.com> wrote:
>> I see this in /var/log/sanlock on the host it is trying to migrate to:
>>
>> 2017-06-27 17:10:40+0100 527703 [2407]: s3528 lockspace
>> 207221b2-959b-426b-b945-18e1adfed62f:1:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0
>> 2017-06-27 17:13:00+0100 527843 [27446]: s3528 delta_acquire host_id 1
>> busy1 1 2 1042692 3d4ec963-8486-43a2-a7d9-afa82508f89f.kvm-ldn-03
>> 2017-06-27 17:13:01+0100 527844 [2407]: s3528 add_lockspace fail result -262
>>
>> The sanlock service is running. Why would this occur?
>>
>> Thanks,
>>
>> C
>>
>> On Tue, Jun 27, 2017 at 5:21 PM, cmc <iucounu(a)gmail.com> wrote:
>>> Hi Martin,
>>>
>>> Thanks for the reply. I have done this, and the deployment completed
>>> without error. However, it still will not allow the Hosted Engine to
>>> migrate to another host. The
>>> /etc/ovirt-hosted-engine/hosted-engine.conf got created ok on the host
>>> I re-installed, but the ovirt-ha-broker.service, though it starts,
>>> reports:
>>>
>>> --------------------8<-------------------
>>>
>>> Jun 27 14:58:26 kvm-ldn-01 systemd[1]: Starting oVirt Hosted Engine
>>> High Availability Communications Broker...
>>> Jun 27 14:58:27 kvm-ldn-01 ovirt-ha-broker[6101]: ovirt-ha-broker
>>> ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker ERROR
>>> Failed to read metadata from
>>> /rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/hosted-engine.metadata
>>> Traceback (most recent call last):
>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 129, in get_raw_stats_for_service_type
>>>     f = os.open(path, direct_flag | os.O_RDONLY | os.O_SYNC)
>>> OSError: [Errno 2] No such file or directory:
>>> '/rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/hosted-engine.metadata'
>>>
>>> --------------------8<-------------------
>>>
>>> I checked the path, and it exists. I can run 'less -f' on it fine. The
>>> perms are slightly different on the host that is running the VM vs the
>>> one that is reporting errors (600 vs 660); ownership is vdsm:qemu. Is
>>> this a sanlock issue?
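>>>
>>> For reference, this is roughly how I compared them (same ha_agent path
>>> as in the broker traceback above):
>>>
>>> ls -l /rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/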
>>>
>>> Thanks for any help,
>>>
>>> Cam
>>>
>>> On Tue, Jun 27, 2017 at 1:41 PM, Martin Sivak <msivak(a)redhat.com> wrote:
>>>>> Should it be? It was not in the instructions for the migration from
>>>>> bare-metal to Hosted VM
>>>>
>>>> The hosted engine will only migrate to hosts that have the services
>>>> running. Please put one other host to maintenance and select Hosted
>>>> engine action: DEPLOY in the reinstall dialog.
>>>>
>>>> Best regards
>>>>
>>>> Martin Sivak
>>>>
>>>> On Tue, Jun 27, 2017 at 1:23 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>> I changed the 'os.other.devices.display.protocols.value.3.6 =
>>>>> spice/qxl,vnc/cirrus,vnc/qxl' line to have the same display protocols
>>>>> as 4 and the hosted engine now appears in the list of VMs. I am
>>>>> guessing the compatibility version was causing it to use the 3.6
>>>>> version. However, I am still unable to migrate the engine VM to
>>>>> another host. When I try putting the host it is currently on into
>>>>> maintenance, it reports:
>>>>>
>>>>> Error while executing action: Cannot switch the Host(s) to Maintenance mode.
>>>>> There are no available hosts capable of running the engine VM.
>>>>>
>>>>> Running 'hosted-engine --vm-status' still shows 'Engine status:
>>>>> unknown stale-data'.
>>>>>
>>>>> The ovirt-ha-broker service is only running on one host. It was set to
>>>>> 'disabled' in systemd. It won't start as there is no
>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf on the other two hosts.
>>>>> Should it be? It was not in the instructions for the migration from
>>>>> bare-metal to Hosted VM.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Cam
>>>>>
>>>>> On Thu, Jun 22, 2017 at 1:07 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>>> Hi Tomas,
>>>>>>
>>>>>> So in my /usr/share/ovirt-engine/conf/osinfo-defaults.properties on my
>>>>>> engine VM, I have:
>>>>>>
>>>>>> os.other.devices.display.protocols.value = spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus
>>>>>> os.other.devices.display.protocols.value.3.6 = spice/qxl,vnc/cirrus,vnc/qxl
>>>>>>
>>>>>> That seems to match - I assume since this is 4.1, the 3.6 should not apply.
>>>>>>
>>>>>> Is there somewhere else I should be looking?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Cam
>>>>>>
>>>>>> On Thu, Jun 22, 2017 at 11:40 AM, Tomas Jelinek <tjelinek(a)redhat.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jun 22, 2017 at 12:38 PM, Michal Skrivanek
>>>>>>> <michal.skrivanek(a)redhat.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> > On 22 Jun 2017, at 12:31, Martin Sivak <msivak(a)redhat.com> wrote:
>>>>>>>> >
>>>>>>>> > Tomas, what fields are needed in a VM to pass the check that causes
>>>>>>>> > the following error?
>>>>>>>> >
>>>>>>>> >>>>> WARN [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
>>>>>>>> >>>>> (org.ovirt.thread.pool-6-thread-23) [] Validation of action 'ImportVm'
>>>>>>>> >>>>> failed for user SYSTEM. Reasons: VAR__ACTION__IMPORT
>>>>>>>> >>>>> ,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
>>>>>>>>
>>>>>>>> to match the OS and VM Display type ;-)
>>>>>>>> Configuration is in osinfo... e.g. if that is an import from older releases on
>>>>>>>> Linux, this is typically caused by the change of cirrus to vga for non-SPICE
>>>>>>>> VMs.
>>>>>>>
>>>>>>>
>>>>>>> yep, the default supported combinations for 4.0+ is this:
>>>>>>> os.other.devices.display.protocols.value =
>>>>>>> spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> >
>>>>>>>> > Thanks.
>>>>>>>> >
>>>>>>>> > On Thu, Jun 22, 2017 at 12:19 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>> >> Hi Martin,
>>>>>>>> >>
>>>>>>>> >>>
>>>>>>>> >>> just as a random comment, do you still have the database backup from
>>>>>>>> >>> the bare metal -> VM attempt? It might be possible to just try again
>>>>>>>> >>> using it. Or in the worst case... update the offending value there
>>>>>>>> >>> before restoring it to the new engine instance.
>>>>>>>> >>
>>>>>>>> >> I still have the backup. I'd rather do the latter, as re-running the
>>>>>>>> >> HE deployment is quite lengthy and involved (I have to re-initialise
>>>>>>>> >> the FC storage each time). Do you know what the offending value(s)
>>>>>>>> >> would be? Would it be in the Postgres DB or in a config file
>>>>>>>> >> somewhere?
>>>>>>>> >>
>>>>>>>> >> Cheers,
>>>>>>>> >>
>>>>>>>> >> Cam
>>>>>>>> >>
>>>>>>>> >>> Regards
>>>>>>>> >>>
>>>>>>>> >>> Martin Sivak
>>>>>>>> >>>
>>>>>>>> >>> On Thu, Jun 22, 2017 at 11:39 AM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>> >>>> Hi Yanir,
>>>>>>>> >>>>
>>>>>>>> >>>> Thanks for the reply.
>>>>>>>> >>>>
>>>>>>>> >>>>> First of all, maybe a chain reaction of:
>>>>>>>> >>>>> WARN [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
>>>>>>>> >>>>> (org.ovirt.thread.pool-6-thread-23) [] Validation of action 'ImportVm'
>>>>>>>> >>>>> failed for user SYSTEM. Reasons: VAR__ACTION__IMPORT
>>>>>>>> >>>>> ,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
>>>>>>>> >>>>> is causing the hosted engine vm not to be set up correctly, and further
>>>>>>>> >>>>> actions were made when the hosted engine vm wasn't in a stable state.
>>>>>>>> >>>>>
>>>>>>>> >>>>> As for now, are you trying to revert back to a previous/initial
>>>>>>>> >>>>> state?
>>>>>>>> >>>>
>>>>>>>> >>>> I'm not trying to revert it to a previous state for now. This was a
>>>>>>>> >>>> migration from a bare metal engine, and it didn't report any error
>>>>>>>> >>>> during the migration. I'd had some problems on my first attempts at
>>>>>>>> >>>> this migration, whereby it never completed (due to a proxy issue) but
>>>>>>>> >>>> I managed to resolve this. Do you know of a way to get the Hosted
>>>>>>>> >>>> Engine VM into a stable state, without rebuilding the entire cluster
>>>>>>>> >>>> from scratch (since I have a lot of VMs on it)?
>>>>>>>> >>>>
>>>>>>>> >>>> Thanks for any help.
>>>>>>>> >>>>
>>>>>>>> >>>> Regards,
>>>>>>>> >>>>
>>>>>>>> >>>> Cam
>>>>>>>> >>>>
>>>>>>>> >>>>> Regards,
>>>>>>>> >>>>> Yanir
>>>>>>>> >>>>>
>>>>>>>> >>>>> On Wed, Jun 21, 2017 at 4:32 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> Hi Jenny/Martin,
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> Any idea what I can do here? The hosted engine VM has no log on any
>>>>>>>> >>>>>> host in /var/log/libvirt/qemu, and I fear that if I need to put the
>>>>>>>> >>>>>> host I created it on into maintenance, e.g. to upgrade it (which
>>>>>>>> >>>>>> I think is hosting it), or if it fails for any reason, it won't get
>>>>>>>> >>>>>> migrated to another host, and I will not be able to manage the
>>>>>>>> >>>>>> cluster. It seems to be a very dangerous position to be in.
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> Thanks,
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> Cam
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> On Wed, Jun 21, 2017 at 11:48 AM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>> >>>>>>> Thanks Martin. The hosts are all part of the same cluster.
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> I get these errors in the engine.log on the engine:
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> 2017-06-19 03:28:05,030Z WARN
>>>>>>>> >>>>>>> [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
>>>>>>>> >>>>>>> (org.ovirt.thread.pool-6-thread-23) [] Validation of action 'ImportVm'
>>>>>>>> >>>>>>> failed for user SYSTEM. Reasons:
>>>>>>>> >>>>>>> VAR__ACTION__IMPORT,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
>>>>>>>> >>>>>>> 2017-06-19 03:28:05,030Z INFO
>>>>>>>> >>>>>>> [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
>>>>>>>> >>>>>>> (org.ovirt.thread.pool-6-thread-23) [] Lock freed to object
>>>>>>>> >>>>>>> 'EngineLock:{exclusiveLocks='[a79e6b0e-fff4-4cba-a02c-4c00be151300=<VM,
>>>>>>>> >>>>>>> ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName HostedEngine>,
>>>>>>>> >>>>>>> HostedEngine=<VM_NAME, ACTION_TYPE_FAILED_NAME_ALREADY_USED>]',
>>>>>>>> >>>>>>> sharedLocks='[a79e6b0e-fff4-4cba-a02c-4c00be151300=<REMOTE_VM,
>>>>>>>> >>>>>>> ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName HostedEngine>]'}'
>>>>>>>> >>>>>>> 2017-06-19 03:28:05,030Z ERROR
>>>>>>>> >>>>>>> [org.ovirt.engine.core.bll.HostedEngineImporter]
>>>>>>>> >>>>>>> (org.ovirt.thread.pool-6-thread-23) [] Failed importing the Hosted
>>>>>>>> >>>>>>> Engine VM
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> The sanlock.log reports conflicts on that same host, and a
>>>>>>>> >>>>>>> different error on the other hosts; not sure if they are related.
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> And this in the /var/log/ovirt-hosted-engine-ha/agent log on the
>>>>>>>> >>>>>>> host which I deployed the hosted engine VM on:
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> MainThread::ERROR::2017-06-19
>>>>>>>> >>>>>>> 13:09:49,743::ovf_store::124::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>>>>>>>> >>>>>>> Unable to extract HEVM OVF
>>>>>>>> >>>>>>> MainThread::ERROR::2017-06-19
>>>>>>>> >>>>>>> 13:09:49,743::config::445::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>>>>>>> >>>>>>> Failed extracting VM OVF from the OVF_STORE volume, falling back
>>>>>>>> >>>>>>> to initial vm.conf
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> I've seen some of these issues reported in bugzilla, but they were
>>>>>>>> >>>>>>> for older versions of oVirt (and appear to be resolved).
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> I will install that package on the other two hosts, for which I
>>>>>>>> >>>>>>> will put them in maintenance as vdsm is installed as an upgrade. I
>>>>>>>> >>>>>>> guess restarting vdsm is a good idea after that?
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> Thanks,
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> Campbell
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> On Wed, Jun 21, 2017 at 10:51 AM, Martin Sivak <msivak(a)redhat.com> wrote:
>>>>>>>> >>>>>>>> Hi,
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> you do not have to install it on all hosts. But you should have
>>>>>>>> >>>>>>>> more than one, and ideally all hosted engine enabled nodes should
>>>>>>>> >>>>>>>> belong to the same engine cluster.
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> Best regards
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> Martin Sivak
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> On Wed, Jun 21, 2017 at 11:29 AM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>> >>>>>>>>> Hi Jenny,
>>>>>>>> >>>>>>>>>
>>>>>>>> >>>>>>>>> Does ovirt-hosted-engine-ha need to be installed across all
>>>>>>>> >>>>>>>>> hosts? Could that be the reason it is failing to see it properly?
>>>>>>>> >>>>>>>>>
>>>>>>>> >>>>>>>>> Thanks,
>>>>>>>> >>>>>>>>>
>>>>>>>> >>>>>>>>> Cam
>>>>>>>> >>>>>>>>>
>>>>>>>> >>>>>>>>> On Mon, Jun 19, 2017 at 1:27 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>> >>>>>>>>>> Hi Jenny,
>>>>>>>> >>>>>>>>>>
>>>>>>>> >>>>>>>>>> Logs are attached. I can see errors in there, but am unsure how
>>>>>>>> >>>>>>>>>> they arose.
>>>>>>>> >>>>>>>>>>
>>>>>>>> >>>>>>>>>> Thanks,
>>>>>>>> >>>>>>>>>>
>>>>>>>> >>>>>>>>>> Campbell
>>>>>>>> >>>>>>>>>>
>>>>>>>> >>>>>>>>>> On Mon, Jun 19, 2017 at 12:29 PM, Evgenia Tokar <etokar(a)redhat.com> wrote:
>>>>>>>> >>>>>>>>>>> From the output it looks like the agent is down, try starting
>>>>>>>> >>>>>>>>>>> it by running: systemctl start ovirt-ha-agent.
>>>>>>>> >>>>>>>>>>>
>>>>>>>> >>>>>>>>>>> The engine is supposed to see the hosted engine storage domain
>>>>>>>> >>>>>>>>>>> and import it to the system, then it should import the hosted engine vm.
>>>>>>>> >>>>>>>>>>>
>>>>>>>> >>>>>>>>>>> Can you attach the agent log from the host
>>>>>>>> >>>>>>>>>>> (/var/log/ovirt-hosted-engine-ha/agent.log)
>>>>>>>> >>>>>>>>>>> and the engine log from the engine vm
>>>>>>>> >>>>>>>>>>> (/var/log/ovirt-engine/engine.log)?
>>>>>>>> >>>>>>>>>>>
>>>>>>>> >>>>>>>>>>> Thanks,
>>>>>>>> >>>>>>>>>>> Jenny
>>>>>>>> >>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>
>>>>>>>> >>>>>>>>>>> On Mon, Jun 19, 2017 at 12:41 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>> >>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>> Hi Jenny,
>>>>>>>> >>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>> What version are you running?
>>>>>>>> >>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>> 4.1.2.2-1.el7.centos
>>>>>>>> >>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>> For the hosted engine vm to be imported and displayed in the
>>>>>>>> >>>>>>>>>>>>> engine, you must first create a master storage domain.
>>>>>>>> >>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>> To provide a bit more detail: this was a migration of a
>>>>>>>> >>>>>>>>>>>> bare-metal engine in an existing cluster to a hosted engine VM for that
>>>>>>>> >>>>>>>>>>>> cluster. As part of this migration, I built an entirely new host and
>>>>>>>> >>>>>>>>>>>> ran 'hosted-engine --deploy' (followed these instructions:
>>>>>>>> >>>>>>>>>>>> http://www.ovirt.org/documentation/self-hosted/chap-Migrating_from_Bare_M...).
>>>>>>>> >>>>>>>>>>>> I restored the backup from the engine and it completed without any
>>>>>>>> >>>>>>>>>>>> errors. I didn't see any instructions regarding a master storage
>>>>>>>> >>>>>>>>>>>> domain in the page above. The cluster has two existing master storage
>>>>>>>> >>>>>>>>>>>> domains, one is fibre channel, which is up, and one ISO domain, which
>>>>>>>> >>>>>>>>>>>> is currently offline.
>>>>>>>> >>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>> What do you mean the hosted engine commands are failing? What
>>>>>>>> >>>>>>>>>>>>> happens when you run hosted-engine --vm-status now?
>>>>>>>> >>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>> Interestingly, whereas when I ran it before it exited with no
>>>>>>>> >>>>>>>>>>>> output and a return code of '1', it now reports:
>>>>>>>> >>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>> --== Host 1 status ==--
>>>>>>>> >>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>> conf_on_shared_storage : True
>>>>>>>> >>>>>>>>>>>> Status up-to-date      : False
>>>>>>>> >>>>>>>>>>>> Hostname               : kvm-ldn-03.ldn.fscfc.co.uk
>>>>>>>> >>>>>>>>>>>> Host ID                : 1
>>>>>>>> >>>>>>>>>>>> Engine status          : unknown stale-data
>>>>>>>> >>>>>>>>>>>> Score                  : 0
>>>>>>>> >>>>>>>>>>>> stopped                : True
>>>>>>>> >>>>>>>>>>>> Local maintenance      : False
>>>>>>>> >>>>>>>>>>>> crc32                  : 0217f07b
>>>>>>>> >>>>>>>>>>>> local_conf_timestamp   : 2911
>>>>>>>> >>>>>>>>>>>> Host timestamp         : 2897
>>>>>>>> >>>>>>>>>>>> Extra metadata (valid at timestamp):
>>>>>>>> >>>>>>>>>>>>     metadata_parse_version=1
>>>>>>>> >>>>>>>>>>>>     metadata_feature_version=1
>>>>>>>> >>>>>>>>>>>>     timestamp=2897 (Thu Jun 15 16:22:54 2017)
>>>>>>>> >>>>>>>>>>>>     host-id=1
>>>>>>>> >>>>>>>>>>>>     score=0
>>>>>>>> >>>>>>>>>>>>     vm_conf_refresh_time=2911 (Thu Jun 15 16:23:08 2017)
>>>>>>>> >>>>>>>>>>>>     conf_on_shared_storage=True
>>>>>>>> >>>>>>>>>>>>     maintenance=False
>>>>>>>> >>>>>>>>>>>>     state=AgentStopped
>>>>>>>> >>>>>>>>>>>>     stopped=True
>>>>>>>> >>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>> Yet I can login to the web GUI fine. I guess it is not HA due
>>>>>>>> >>>>>>>>>>>> to being in an unknown state currently? Does the hosted-engine-ha rpm
>>>>>>>> >>>>>>>>>>>> need to be installed across all nodes in the cluster, btw?
>>>>>>>> >>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>> Thanks for the help,
>>>>>>>> >>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>> Cam
>>>>>>>> >>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>> Jenny Tokar
>>>>>>>> >>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>> On Thu, Jun 15, 2017 at 6:32 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>> Hi,
>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>> I've migrated from a bare-metal engine to a hosted engine. There were
>>>>>>>> >>>>>>>>>>>>>> no errors during the install, however, the hosted engine did not get
>>>>>>>> >>>>>>>>>>>>>> started. I tried running:
>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>> hosted-engine --status
>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>> on the host I deployed it on, and it returns nothing (exit code is 1
>>>>>>>> >>>>>>>>>>>>>> however). I could not ping it either. So I tried starting it via
>>>>>>>> >>>>>>>>>>>>>> 'hosted-engine --vm-start' and it returned:
>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>> Virtual machine does not exist
>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>> But it then became available. I logged into it successfully. It is not
>>>>>>>> >>>>>>>>>>>>>> in the list of VMs however.
>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>> Any ideas why the hosted-engine commands fail, and why it is not in
>>>>>>>> >>>>>>>>>>>>>> the list of virtual machines?
>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>> Thanks for any help,
>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>> >>>>>>>>>>>>>> Cam