Change the ids so they are distinct. I need to check whether there is a
way to read the SPM ids from the engine, since using the same numbers
would be best.
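As a quick check, something like the following should show the current
assignments on each side. This is only a rough sketch: the
vds_spm_id_map/vds_static table names are from memory, so please verify
them against your engine schema first.

--------------------8<-------------------
# on each host: the id the HA agent registers with sanlock
grep '^host_id' /etc/ovirt-hosted-engine/hosted-engine.conf

# on the engine VM: the SPM ids the engine has assigned per host
sudo -u postgres psql engine -c "
  SELECT s.vds_name, m.vds_spm_id
  FROM vds_spm_id_map m
  JOIN vds_static s ON m.vds_id = s.vds_id;"
--------------------8<-------------------

If each host's host_id matches its vds_spm_id and the values are unique
across the cluster, sanlock should have nothing to complain about.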
Martin
On Thu, Jun 29, 2017 at 12:46 PM, cmc <iucounu(a)gmail.com> wrote:
Is there any way of recovering from this situation? I'd prefer to fix
the issue rather than re-deploy, but if there is no recovery path, I
could perhaps try re-deploying the hosted engine. In that case, would
the best option be to take a backup of the Hosted Engine, then shut it
down, re-initialise the SAN partition (or use another partition) and
retry the deployment? Would it be better to use the older backup from
the bare-metal engine that I originally used, or a backup from the
Hosted Engine? I'm not sure if any VMs have been added since switching
to the Hosted Engine.
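(For reference, taking that backup on the engine VM should just be a
standard engine-backup run; the file and log names below are only
examples:)

--------------------8<-------------------
# run on the engine VM
engine-backup --mode=backup \
  --file=/root/he-backup-$(date +%Y%m%d).tar.gz \
  --log=/root/he-backup.log
--------------------8<-------------------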
Unfortunately I have very little time left to get this working before
I have to hand it over for eval (by end of Friday).
Here are some current log snippets from the cluster.
In /var/log/vdsm/vdsm.log on the host that has the Hosted Engine:
2017-06-29 10:50:15,071+0100 INFO (monitor/207221b) [storage.SANLock] Acquiring host id for domain 207221b2-959b-426b-b945-18e1adfed62f (id: 3) (clusterlock:282)
2017-06-29 10:50:15,072+0100 ERROR (monitor/207221b) [storage.Monitor] Error acquiring host id 3 for domain 207221b2-959b-426b-b945-18e1adfed62f (monitor:558)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/monitor.py", line 555, in _acquireHostId
    self.domain.acquireHostId(self.hostId, async=True)
  File "/usr/share/vdsm/storage/sd.py", line 790, in acquireHostId
    self._manifest.acquireHostId(hostId, async)
  File "/usr/share/vdsm/storage/sd.py", line 449, in acquireHostId
    self._domainLock.acquireHostId(hostId, async)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line 297, in acquireHostId
    raise se.AcquireHostIdFailure(self._sdUUID, e)
AcquireHostIdFailure: Cannot acquire host id: ('207221b2-959b-426b-b945-18e1adfed62f', SanlockException(22, 'Sanlock lockspace add failure', 'Invalid argument'))
From /var/log/ovirt-hosted-engine-ha/agent.log on the same host:
MainThread::ERROR::2017-06-19 13:30:50,592::hosted_engine::822::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_domain_monitor) Failed to start monitoring domain (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout during domain acquisition
MainThread::WARNING::2017-06-19 13:30:50,593::hosted_engine::469::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Error while monitoring engine: Failed to start monitoring domain (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout during domain acquisition
MainThread::WARNING::2017-06-19 13:30:50,593::hosted_engine::472::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Unexpected error
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 443, in start_monitoring
    self._initialize_domain_monitor()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 823, in _initialize_domain_monitor
    raise Exception(msg)
Exception: Failed to start monitoring domain (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout during domain acquisition
MainThread::ERROR::2017-06-19 13:30:50,593::hosted_engine::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Shutting down the agent because of 3 failures in a row!
From sanlock.log:
2017-06-29 11:17:06+0100 1194149 [2530]: add_lockspace 207221b2-959b-426b-b945-18e1adfed62f:3:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0 conflicts with name of list1 s5 207221b2-959b-426b-b945-18e1adfed62f:1:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0
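(The conflict line above is the crux: sanlock on this host already holds
the lockspace for this domain under host id 1, and the agent is now
trying to add it again under host id 3. To list the lockspaces and host
ids sanlock currently holds, something like this should work on each
host; the exact output format varies between sanlock versions:)

--------------------8<-------------------
# 's' lines show active lockspaces and the local host id used in each
sanlock client status
--------------------8<-------------------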
From the two other hosts:
host 2:
vdsm.log
2017-06-29 10:53:47,755+0100 ERROR (jsonrpc/4) [jsonrpc.JsonRpcServer] Internal server error (__init__:570)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 565, in _handle_request
    res = method(**params)
  File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 202, in _dynamicMethod
    result = fn(*methodArgs)
  File "/usr/share/vdsm/API.py", line 1454, in getAllVmIoTunePolicies
    io_tune_policies_dict = self._cif.getAllVmIoTunePolicies()
  File "/usr/share/vdsm/clientIF.py", line 448, in getAllVmIoTunePolicies
    'current_values': v.getIoTune()}
  File "/usr/share/vdsm/virt/vm.py", line 2803, in getIoTune
    result = self.getIoTuneResponse()
  File "/usr/share/vdsm/virt/vm.py", line 2816, in getIoTuneResponse
    res = self._dom.blockIoTune(
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 47, in __getattr__
    % self.vmid)
NotConnectedError: VM u'a79e6b0e-fff4-4cba-a02c-4c00be151300' was not started yet or was shut down
From /var/log/ovirt-hosted-engine-ha/agent.log:
MainThread::INFO::2017-06-29 10:56:33,636::ovf_store::103::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) Found OVF_STORE: imgUUID:222610db-7880-4f4f-8559-a3635fd73555, volUUID:c6e0d29b-eabf-4a09-a330-df54cfdd73f1
MainThread::INFO::2017-06-29 10:56:33,926::ovf_store::112::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) Extracting Engine VM OVF from the OVF_STORE
MainThread::INFO::2017-06-29 10:56:33,938::ovf_store::119::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) OVF_STORE volume path: /rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/images/222610db-7880-4f4f-8559-a3635fd73555/c6e0d29b-eabf-4a09-a330-df54cfdd73f1
MainThread::INFO::2017-06-29 10:56:33,967::config::431::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Found an OVF for HE VM, trying to convert
MainThread::INFO::2017-06-29 10:56:33,971::config::436::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Got vm.conf from OVF_STORE
MainThread::INFO::2017-06-29 10:56:36,736::states::678::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) Score is 0 due to unexpected vm shutdown at Thu Jun 29 10:53:59 2017
MainThread::INFO::2017-06-29 10:56:36,736::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineUnexpectedlyDown (score: 0)
MainThread::INFO::2017-06-29 10:56:46,772::config::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf) Reloading vm.conf from the shared storage domain
/var/log/messages:
Jun 29 10:53:46 kvm-ldn-02 kernel: dd: sending ioctl 80306d02 to a partition!
host 1:
From /var/log/messages (the same appears in sanlock.log):
Jun 29 11:01:02 kvm-ldn-01 sanlock[2400]: 2017-06-29 11:01:02+0100 678325 [9132]: s4531 delta_acquire host_id 1 busy1 1 2 1193177 3d4ec963-8486-43a2-a7d9-afa82508f89f.kvm-ldn-03
Jun 29 11:01:03 kvm-ldn-01 sanlock[2400]: 2017-06-29 11:01:03+0100 678326 [24159]: s4531 add_lockspace fail result -262
/var/log/ovirt-hosted-engine-ha/agent.log:
MainThread::ERROR::2017-06-27 15:21:01,143::hosted_engine::822::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_domain_monitor) Failed to start monitoring domain (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout during domain acquisition
MainThread::WARNING::2017-06-27 15:21:01,144::hosted_engine::469::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Error while monitoring engine: Failed to start monitoring domain (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout during domain acquisition
MainThread::WARNING::2017-06-27 15:21:01,144::hosted_engine::472::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Unexpected error
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 443, in start_monitoring
    self._initialize_domain_monitor()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 823, in _initialize_domain_monitor
    raise Exception(msg)
Exception: Failed to start monitoring domain (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout during domain acquisition
MainThread::ERROR::2017-06-27 15:21:01,144::hosted_engine::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Shutting down the agent because of 3 failures in a row!
MainThread::INFO::2017-06-27 15:21:06,717::hosted_engine::848::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) VDSM domain monitor status: PENDING
MainThread::INFO::2017-06-27 15:21:09,335::hosted_engine::776::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_domain_monitor) Failed to stop monitoring domain (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f): Storage domain is member of pool: u'domain=207221b2-959b-426b-b945-18e1adfed62f'
MainThread::INFO::2017-06-27 15:21:09,339::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
Thanks for any help,
Cam
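(If the duplicate ids are confirmed, a minimal recovery sketch along the
lines of Martin's suggestion at the top of this thread would be to give
the conflicting host a free, cluster-unique id and restart its HA
services. The value '3' below is only an example; ideally the id should
match the engine's SPM id for that host:)

--------------------8<-------------------
# on the host whose host_id collides with another host's
sed -i 's/^host_id=.*/host_id=3/' /etc/ovirt-hosted-engine/hosted-engine.conf
systemctl restart ovirt-ha-broker ovirt-ha-agent
--------------------8<-------------------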
On Wed, Jun 28, 2017 at 11:25 AM, cmc <iucounu(a)gmail.com> wrote:
> Hi Martin,
>
> yes, on two of the machines they have the same host_id. The other has
> a different host_id.
>
> To update since yesterday: I reinstalled and deployed Hosted Engine on
> the other host (so all three hosts in the cluster now have it
> installed). The second one I deployed said it was able to host the
> engine (unlike the first I reinstalled), so I tried putting the host
> with the Hosted Engine on it into maintenance to see if it would
> migrate over. It managed to move all the VMs except the Hosted Engine. And
> now the host that said it was able to host the engine says
> 'unavailable due to HA score'. The host that it was trying to move
> from is now in 'preparing for maintenance' for the last 12 hours.
>
> The summary is:
>
> kvm-ldn-01 - one of the original, pre-Hosted Engine hosts, reinstalled
> with 'Deploy Hosted Engine'. No icon saying it can host the Hosted
> Engine; host_id of '2' in /etc/ovirt-hosted-engine/hosted-engine.conf.
> 'add_lockspace' fails in sanlock.log
>
> kvm-ldn-02 - the other host that was pre-existing before Hosted Engine
> was created. Reinstalled with 'Deploy Hosted Engine'. Had an icon
> saying that it was able to host the Hosted Engine, but after migration
> was attempted when putting kvm-ldn-03 into maintenance, it reports:
> 'unavailable due to HA score'. It has a host_id of '1' in
> /etc/ovirt-hosted-engine/hosted-engine.conf. No errors in sanlock.log
>
> kvm-ldn-03 - this was the host I deployed Hosted Engine on, which was
> not part of the original cluster. I restored the bare-metal engine
> backup in the Hosted Engine on this host when deploying it, without
> error. It currently has the Hosted Engine on it (as the only VM after
> I put that host into maintenance to test the HA of Hosted Engine).
> Sanlock log shows conflicts
>
> I will look through all the logs for any other errors. Please let me
> know if you need any logs or other clarification/information.
>
> Thanks,
>
> Campbell
>
> On Wed, Jun 28, 2017 at 9:25 AM, Martin Sivak <msivak(a)redhat.com> wrote:
>> Hi,
>>
>> can you please check the contents of
>> /etc/ovirt-hosted-engine/hosted-engine.conf or
>> /etc/ovirt-hosted-engine-ha/agent.conf (I am not sure which one it is
>> right now) and search for host-id?
>>
>> Make sure the IDs are different. If they are not, then there is a bug somewhere.
>>
>> Martin
>>
>> On Tue, Jun 27, 2017 at 6:26 PM, cmc <iucounu(a)gmail.com> wrote:
>>> I see this on the host it is trying to migrate to, in /var/log/sanlock:
>>>
>>> 2017-06-27 17:10:40+0100 527703 [2407]: s3528 lockspace 207221b2-959b-426b-b945-18e1adfed62f:1:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0
>>> 2017-06-27 17:13:00+0100 527843 [27446]: s3528 delta_acquire host_id 1 busy1 1 2 1042692 3d4ec963-8486-43a2-a7d9-afa82508f89f.kvm-ldn-03
>>> 2017-06-27 17:13:01+0100 527844 [2407]: s3528 add_lockspace fail result -262
>>>
>>> The sanlock service is running. Why would this occur?
>>>
>>> Thanks,
>>>
>>> C
>>>
>>> On Tue, Jun 27, 2017 at 5:21 PM, cmc <iucounu(a)gmail.com> wrote:
>>>> Hi Martin,
>>>>
>>>> Thanks for the reply. I have done this, and the deployment completed
>>>> without error. However, it still will not allow the Hosted Engine
>>>> migrate to another host. The
>>>> /etc/ovirt-hosted-engine/hosted-engine.conf got created ok on the host
>>>> I re-installed, but the ovirt-ha-broker.service, though it starts,
>>>> reports:
>>>>
>>>> --------------------8<-------------------
>>>>
>>>> Jun 27 14:58:26 kvm-ldn-01 systemd[1]: Starting oVirt Hosted Engine High Availability Communications Broker...
>>>> Jun 27 14:58:27 kvm-ldn-01 ovirt-ha-broker[6101]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker ERROR Failed to read metadata from /rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/hosted-engine.metadata
>>>> Traceback (most recent call last):
>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 129, in get_raw_stats_for_service_type
>>>>     f = os.open(path, direct_flag | os.O_RDONLY | os.O_SYNC)
>>>> OSError: [Errno 2] No such file or directory: '/rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/hosted-engine.metadata'
>>>>
>>>> --------------------8<-------------------
>>>>
>>>> I checked the path, and it exists. I can run 'less -f' on it fine. The
>>>> perms are slightly different on the host that is running the VM vs the
>>>> one that is reporting errors (600 vs 660); ownership is vdsm:qemu. Is
>>>> this a sanlock issue?
>>>>
>>>> Thanks for any help,
>>>>
>>>> Cam
>>>>
>>>> On Tue, Jun 27, 2017 at 1:41 PM, Martin Sivak <msivak(a)redhat.com> wrote:
>>>>>> Should it be? It was not in the instructions for the migration from
>>>>>> bare-metal to Hosted VM
>>>>>
>>>>> The hosted engine will only migrate to hosts that have the services
>>>>> running. Please put one other host to maintenance and select Hosted
>>>>> engine action: DEPLOY in the reinstall dialog.
>>>>>
>>>>> Best regards
>>>>>
>>>>> Martin Sivak
>>>>>
>>>>> On Tue, Jun 27, 2017 at 1:23 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>>> I changed the 'os.other.devices.display.protocols.value.3.6 =
>>>>>> spice/qxl,vnc/cirrus,vnc/qxl' line to have the same display protocols
>>>>>> as 4 and the hosted engine now appears in the list of VMs. I am
>>>>>> guessing the compatibility version was causing it to use the 3.6
>>>>>> version. However, I am still unable to migrate the engine VM to
>>>>>> another host. When I try putting the host it is currently on into
>>>>>> maintenance, it reports:
>>>>>>
>>>>>> Error while executing action: Cannot switch the Host(s) to Maintenance mode.
>>>>>> There are no available hosts capable of running the engine VM.
>>>>>>
>>>>>> Running 'hosted-engine --vm-status' still shows 'Engine status:
>>>>>> unknown stale-data'.
>>>>>>
>>>>>> The ovirt-ha-broker service is only running on one host. It was set to
>>>>>> 'disabled' in systemd. It won't start as there is no
>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf on the other two hosts.
>>>>>> Should it be? It was not in the instructions for the migration from
>>>>>> bare-metal to Hosted VM.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Cam
>>>>>>
>>>>>> On Thu, Jun 22, 2017 at 1:07 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>> Hi Tomas,
>>>>>>>
>>>>>>> So in my /usr/share/ovirt-engine/conf/osinfo-defaults.properties on my
>>>>>>> engine VM, I have:
>>>>>>>
>>>>>>> os.other.devices.display.protocols.value = spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus
>>>>>>> os.other.devices.display.protocols.value.3.6 = spice/qxl,vnc/cirrus,vnc/qxl
>>>>>>>
>>>>>>> That seems to match - I assume since this is 4.1, the 3.6 should not apply.
>>>>>>>
>>>>>>> Is there somewhere else I should be looking?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Cam
>>>>>>>
>>>>>>> On Thu, Jun 22, 2017 at 11:40 AM, Tomas Jelinek <tjelinek(a)redhat.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Jun 22, 2017 at 12:38 PM, Michal Skrivanek
>>>>>>>> <michal.skrivanek(a)redhat.com> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> > On 22 Jun 2017, at 12:31, Martin Sivak <msivak(a)redhat.com> wrote:
>>>>>>>>> >
>>>>>>>>> > Tomas, what fields are needed in a VM to pass the check that causes
>>>>>>>>> > the following error?
>>>>>>>>> >
>>>>>>>>> >>>>> WARN [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
>>>>>>>>> >>>>> (org.ovirt.thread.pool-6-thread-23) [] Validation of action 'ImportVm'
>>>>>>>>> >>>>> failed for user SYSTEM. Reasons: VAR__ACTION__IMPORT
>>>>>>>>> >>>>> ,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
>>>>>>>>>
>>>>>>>>> to match the OS and VM display type ;-)
>>>>>>>>> Configuration is in osinfo… e.g. if that is an import from older releases on
>>>>>>>>> Linux, this is typically caused by the change of cirrus to vga for non-SPICE
>>>>>>>>> VMs.
>>>>>>>>
>>>>>>>>
>>>>>>>> yep, the default supported combinations for 4.0+ are these:
>>>>>>>> os.other.devices.display.protocols.value = spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> >
>>>>>>>>> > Thanks.
>>>>>>>>> >
>>>>>>>>> > On Thu, Jun 22, 2017 at 12:19 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>>> >> Hi Martin,
>>>>>>>>> >>
>>>>>>>>> >>>
>>>>>>>>> >>> just as a random comment, do you still have the database backup from
>>>>>>>>> >>> the bare metal -> VM attempt? It might be possible to just try again
>>>>>>>>> >>> using it. Or in the worst case... update the offending value there
>>>>>>>>> >>> before restoring it to the new engine instance.
>>>>>>>>> >>
>>>>>>>>> >> I still have the backup. I'd rather do the latter, as re-running the
>>>>>>>>> >> HE deployment is quite lengthy and involved (I have to re-initialise
>>>>>>>>> >> the FC storage each time). Do you know what the offending value(s)
>>>>>>>>> >> would be? Would it be in the Postgres DB or in a config file
>>>>>>>>> >> somewhere?
>>>>>>>>> >>
>>>>>>>>> >> Cheers,
>>>>>>>>> >>
>>>>>>>>> >> Cam
>>>>>>>>> >>
>>>>>>>>> >>> Regards
>>>>>>>>> >>>
>>>>>>>>> >>> Martin Sivak
>>>>>>>>> >>>
>>>>>>>>> >>> On Thu, Jun 22, 2017 at 11:39 AM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>>> >>>> Hi Yanir,
>>>>>>>>> >>>>
>>>>>>>>> >>>> Thanks for the reply.
>>>>>>>>> >>>>
>>>>>>>>> >>>>> First of all, maybe a chain reaction of:
>>>>>>>>> >>>>> WARN [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
>>>>>>>>> >>>>> (org.ovirt.thread.pool-6-thread-23) [] Validation of action 'ImportVm'
>>>>>>>>> >>>>> failed for user SYSTEM. Reasons: VAR__ACTION__IMPORT
>>>>>>>>> >>>>> ,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
>>>>>>>>> >>>>> is causing the hosted engine vm not to be set up correctly, and
>>>>>>>>> >>>>> further actions were made when the hosted engine vm wasn't in a stable state.
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> As for now, are you trying to revert back to a previous/initial state?
>>>>>>>>> >>>>
>>>>>>>>> >>>> I'm not trying to revert it to a previous state for now. This was a
>>>>>>>>> >>>> migration from a bare metal engine, and it didn't report any error
>>>>>>>>> >>>> during the migration. I'd had some problems on my first attempts at
>>>>>>>>> >>>> this migration, whereby it never completed (due to a proxy issue), but
>>>>>>>>> >>>> I managed to resolve this. Do you know of a way to get the Hosted
>>>>>>>>> >>>> Engine VM into a stable state, without rebuilding the entire cluster
>>>>>>>>> >>>> from scratch (since I have a lot of VMs on it)?
>>>>>>>>> >>>>
>>>>>>>>> >>>> Thanks for any help.
>>>>>>>>> >>>>
>>>>>>>>> >>>> Regards,
>>>>>>>>> >>>>
>>>>>>>>> >>>> Cam
>>>>>>>>> >>>>
>>>>>>>>> >>>>> Regards,
>>>>>>>>> >>>>> Yanir
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> On Wed, Jun 21, 2017 at 4:32 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>>> >>>>>>
>>>>>>>>> >>>>>> Hi Jenny/Martin,
>>>>>>>>> >>>>>>
>>>>>>>>> >>>>>> Any idea what I can do here? The hosted engine VM has no log on any
>>>>>>>>> >>>>>> host in /var/log/libvirt/qemu, and I fear that if I need to put the
>>>>>>>>> >>>>>> host I created it on (which I think is hosting it) into maintenance,
>>>>>>>>> >>>>>> e.g. to upgrade it, or if it fails for any reason, it won't get
>>>>>>>>> >>>>>> migrated to another host, and I will not be able to manage the
>>>>>>>>> >>>>>> cluster. It seems to be a very dangerous position to be in.
>>>>>>>>> >>>>>>
>>>>>>>>> >>>>>> Thanks,
>>>>>>>>> >>>>>>
>>>>>>>>> >>>>>> Cam
>>>>>>>>> >>>>>>
>>>>>>>>> >>>>>> On Wed, Jun 21, 2017 at 11:48 AM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>>> >>>>>>> Thanks Martin. The hosts are all part of the same cluster.
>>>>>>>>> >>>>>>>
>>>>>>>>> >>>>>>> I get these errors in the engine.log on the engine:
>>>>>>>>> >>>>>>>
>>>>>>>>> >>>>>>> 2017-06-19 03:28:05,030Z WARN [org.ovirt.engine.core.bll.exportimport.ImportVmCommand] (org.ovirt.thread.pool-6-thread-23) [] Validation of action 'ImportVm' failed for user SYSTEM. Reasons: VAR__ACTION__IMPORT,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
>>>>>>>>> >>>>>>> 2017-06-19 03:28:05,030Z INFO [org.ovirt.engine.core.bll.exportimport.ImportVmCommand] (org.ovirt.thread.pool-6-thread-23) [] Lock freed to object 'EngineLock:{exclusiveLocks='[a79e6b0e-fff4-4cba-a02c-4c00be151300=<VM, ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName HostedEngine>, HostedEngine=<VM_NAME, ACTION_TYPE_FAILED_NAME_ALREADY_USED>]', sharedLocks='[a79e6b0e-fff4-4cba-a02c-4c00be151300=<REMOTE_VM, ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName HostedEngine>]'}'
>>>>>>>>> >>>>>>> 2017-06-19 03:28:05,030Z ERROR [org.ovirt.engine.core.bll.HostedEngineImporter] (org.ovirt.thread.pool-6-thread-23) [] Failed importing the Hosted Engine VM
>>>>>>>>> >>>>>>>
>>>>>>>>> >>>>>>> The sanlock.log reports conflicts on that same host, and a different
>>>>>>>>> >>>>>>> error on the other hosts; not sure if they are related.
>>>>>>>>> >>>>>>>
>>>>>>>>> >>>>>>> And this in the /var/log/ovirt-hosted-engine-ha/agent log on the host
>>>>>>>>> >>>>>>> which I deployed the hosted engine VM on:
>>>>>>>>> >>>>>>>
>>>>>>>>> >>>>>>> MainThread::ERROR::2017-06-19 13:09:49,743::ovf_store::124::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) Unable to extract HEVM OVF
>>>>>>>>> >>>>>>> MainThread::ERROR::2017-06-19 13:09:49,743::config::445::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Failed extracting VM OVF from the OVF_STORE volume, falling back to initial vm.conf
>>>>>>>>> >>>>>>>
>>>>>>>>> >>>>>>> I've seen some of these issues reported in bugzilla, but they were for
>>>>>>>>> >>>>>>> older versions of oVirt (and appear to be resolved).
>>>>>>>>> >>>>>>>
>>>>>>>>> >>>>>>> I will install that package on the other two hosts, for which I will
>>>>>>>>> >>>>>>> put them in maintenance as vdsm is installed as an upgrade. I guess
>>>>>>>>> >>>>>>> restarting vdsm is a good idea after that?
>>>>>>>>> >>>>>>>
>>>>>>>>> >>>>>>> Thanks,
>>>>>>>>> >>>>>>>
>>>>>>>>> >>>>>>> Campbell
>>>>>>>>> >>>>>>>
>>>>>>>>> >>>>>>> On Wed, Jun 21, 2017 at 10:51 AM, Martin Sivak <msivak(a)redhat.com> wrote:
>>>>>>>>> >>>>>>>> Hi,
>>>>>>>>> >>>>>>>>
>>>>>>>>> >>>>>>>> you do not have to install it on all hosts. But you should have more
>>>>>>>>> >>>>>>>> than one, and ideally all hosted engine enabled nodes should belong to
>>>>>>>>> >>>>>>>> the same engine cluster.
>>>>>>>>> >>>>>>>>
>>>>>>>>> >>>>>>>> Best regards
>>>>>>>>> >>>>>>>>
>>>>>>>>> >>>>>>>> Martin Sivak
>>>>>>>>> >>>>>>>>
>>>>>>>>> >>>>>>>> On Wed, Jun 21, 2017 at 11:29 AM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>>> >>>>>>>>> Hi Jenny,
>>>>>>>>> >>>>>>>>>
>>>>>>>>> >>>>>>>>> Does ovirt-hosted-engine-ha need to be installed across all hosts?
>>>>>>>>> >>>>>>>>> Could that be the reason it is failing to see it properly?
>>>>>>>>> >>>>>>>>>
>>>>>>>>> >>>>>>>>> Thanks,
>>>>>>>>> >>>>>>>>>
>>>>>>>>> >>>>>>>>> Cam
>>>>>>>>> >>>>>>>>>
>>>>>>>>> >>>>>>>>> On Mon, Jun 19, 2017 at 1:27 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>>> >>>>>>>>>> Hi Jenny,
>>>>>>>>> >>>>>>>>>>
>>>>>>>>> >>>>>>>>>> Logs are attached. I can see errors in there, but am unsure how they
>>>>>>>>> >>>>>>>>>> arose.
>>>>>>>>> >>>>>>>>>>
>>>>>>>>> >>>>>>>>>> Thanks,
>>>>>>>>> >>>>>>>>>>
>>>>>>>>> >>>>>>>>>> Campbell
>>>>>>>>> >>>>>>>>>>
>>>>>>>>> >>>>>>>>>> On Mon, Jun 19, 2017 at 12:29 PM, Evgenia Tokar <etokar(a)redhat.com> wrote:
>>>>>>>>> >>>>>>>>>>> From the output it looks like the agent is down; try starting it by
>>>>>>>>> >>>>>>>>>>> running: systemctl start ovirt-ha-agent
>>>>>>>>> >>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>> The engine is supposed to see the hosted engine storage domain and
>>>>>>>>> >>>>>>>>>>> import it to the system; then it should import the hosted engine vm.
>>>>>>>>> >>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>> Can you attach the agent log from the host
>>>>>>>>> >>>>>>>>>>> (/var/log/ovirt-hosted-engine-ha/agent.log)
>>>>>>>>> >>>>>>>>>>> and the engine log from the engine vm
>>>>>>>>> >>>>>>>>>>> (/var/log/ovirt-engine/engine.log)?
>>>>>>>>> >>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>> Thanks,
>>>>>>>>> >>>>>>>>>>> Jenny
>>>>>>>>> >>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>> On Mon, Jun 19, 2017 at 12:41 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>> Hi Jenny,
>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>> What version are you running?
>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>> 4.1.2.2-1.el7.centos
>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>> For the hosted engine vm to be imported and displayed in the engine, you
>>>>>>>>> >>>>>>>>>>>>> must first create a master storage domain.
>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>> To provide a bit more detail: this was a migration of a bare-metal
>>>>>>>>> >>>>>>>>>>>> engine in an existing cluster to a hosted engine VM for that cluster.
>>>>>>>>> >>>>>>>>>>>> As part of this migration, I built an entirely new host and ran
>>>>>>>>> >>>>>>>>>>>> 'hosted-engine --deploy' (followed these instructions:
>>>>>>>>> >>>>>>>>>>>> http://www.ovirt.org/documentation/self-hosted/chap-Migrating_from_Bare_M...).
>>>>>>>>> >>>>>>>>>>>> I restored the backup from the engine and it completed without any
>>>>>>>>> >>>>>>>>>>>> errors. I didn't see any instructions regarding a master storage
>>>>>>>>> >>>>>>>>>>>> domain in the page above. The cluster has two existing master storage
>>>>>>>>> >>>>>>>>>>>> domains: one is fibre channel, which is up, and one ISO domain, which
>>>>>>>>> >>>>>>>>>>>> is currently offline.
>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>> What do you mean the hosted engine commands are failing? What happens
>>>>>>>>> >>>>>>>>>>>>> when you run hosted-engine --vm-status now?
>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>> Interestingly, whereas when I ran it before, it exited with no output
>>>>>>>>> >>>>>>>>>>>> and a return code of '1', it now reports:
>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>> --== Host 1 status ==--
>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>> conf_on_shared_storage : True
>>>>>>>>> >>>>>>>>>>>> Status up-to-date : False
>>>>>>>>> >>>>>>>>>>>> Hostname : kvm-ldn-03.ldn.fscfc.co.uk
>>>>>>>>> >>>>>>>>>>>> Host ID : 1
>>>>>>>>> >>>>>>>>>>>> Engine status : unknown stale-data
>>>>>>>>> >>>>>>>>>>>> Score : 0
>>>>>>>>> >>>>>>>>>>>> stopped : True
>>>>>>>>> >>>>>>>>>>>> Local maintenance : False
>>>>>>>>> >>>>>>>>>>>> crc32 : 0217f07b
>>>>>>>>> >>>>>>>>>>>> local_conf_timestamp : 2911
>>>>>>>>> >>>>>>>>>>>> Host timestamp : 2897
>>>>>>>>> >>>>>>>>>>>> Extra metadata (valid at timestamp):
>>>>>>>>> >>>>>>>>>>>> metadata_parse_version=1
>>>>>>>>> >>>>>>>>>>>> metadata_feature_version=1
>>>>>>>>> >>>>>>>>>>>> timestamp=2897 (Thu Jun 15 16:22:54 2017)
>>>>>>>>> >>>>>>>>>>>> host-id=1
>>>>>>>>> >>>>>>>>>>>> score=0
>>>>>>>>> >>>>>>>>>>>> vm_conf_refresh_time=2911 (Thu Jun 15 16:23:08 2017)
>>>>>>>>> >>>>>>>>>>>> conf_on_shared_storage=True
>>>>>>>>> >>>>>>>>>>>> maintenance=False
>>>>>>>>> >>>>>>>>>>>> state=AgentStopped
>>>>>>>>> >>>>>>>>>>>> stopped=True
>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>> Yet I can login to the web GUI fine. I guess it is not HA due to being
>>>>>>>>> >>>>>>>>>>>> in an unknown state currently? Does the hosted-engine-ha rpm need to
>>>>>>>>> >>>>>>>>>>>> be installed across all nodes in the cluster, btw?
>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>> Thanks for the help,
>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>> Cam
>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>> Jenny Tokar
>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>> On Thu, Jun 15, 2017 at 6:32 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>> Hi,
>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>> I've migrated from a bare-metal engine to a hosted engine. There were
>>>>>>>>> >>>>>>>>>>>>>> no errors during the install; however, the hosted engine did not get
>>>>>>>>> >>>>>>>>>>>>>> started. I tried running:
>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>> hosted-engine --status
>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>> on the host I deployed it on, and it returns nothing (exit code is 1
>>>>>>>>> >>>>>>>>>>>>>> however). I could not ping it either. So I tried starting it via
>>>>>>>>> >>>>>>>>>>>>>> 'hosted-engine --vm-start' and it returned:
>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>> Virtual machine does not exist
>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>> But it then became available. I logged into it successfully. It is not
>>>>>>>>> >>>>>>>>>>>>>> in the list of VMs however.
>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>> Any ideas why the hosted-engine commands fail, and why it is not in
>>>>>>>>> >>>>>>>>>>>>>> the list of virtual machines?
>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>> Thanks for any help,
>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>> Cam