Thanks Martin. Do I have to restart anything? When I try to use the
'migrate' operation, it complains that the other two hosts 'did not
satisfy internal filter HA because it is not a Hosted Engine host'
(even though I reinstalled both of these hosts with the 'Deploy Hosted
Engine' option), which suggests that something needs restarting. Should
I worry about the sanlock errors, or will those be resolved by the
change in host_id?
Kind regards,
Cam
On Thu, Jun 29, 2017 at 12:22 PM, Martin Sivak <msivak(a)redhat.com> wrote:
Change the ids so they are distinct. I need to check if there is a way
to read the SPM ids from the engine, as using the same numbers would be
best.
Martin
On Thu, Jun 29, 2017 at 12:46 PM, cmc <iucounu(a)gmail.com> wrote:
> Is there any way of recovering from this situation? I'd prefer to fix
> the issue rather than re-deploy, but if there is no recovery path, I
> could perhaps try re-deploying the hosted engine. In which case, would
> the best option be to take a backup of the Hosted Engine, and then
> shut it down, re-initialise the SAN partition (or use another
> partition) and retry the deployment? Would it be better to use the
> older backup from the bare metal engine that I originally used, or use
> a backup from the Hosted Engine? I'm not sure if any VMs have been
> added since switching to Hosted Engine.
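>
> (If I do end up re-deploying, I'd take a fresh backup of the current Hosted
> Engine first; I believe the command is along these lines, though I haven't
> verified the exact flags:
>
> engine-backup --mode=backup --scope=all --file=he-backup.tar.gz --log=he-backup.log
>
> and then restore from that during the re-deployment.)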
>
> Unfortunately I have very little time left to get this working before
> I have to hand it over for eval (by end of Friday).
>
> Here are some current log snippets from the cluster:
>
> In /var/log/vdsm/vdsm.log on the host that has the Hosted Engine:
>
> 2017-06-29 10:50:15,071+0100 INFO (monitor/207221b) [storage.SANLock]
> Acquiring host id for domain 207221b2-959b-426b-b945-18e1adfed62f (id:
> 3) (clusterlock:282)
> 2017-06-29 10:50:15,072+0100 ERROR (monitor/207221b) [storage.Monitor]
> Error acquiring host id 3 for domain
> 207221b2-959b-426b-b945-18e1adfed62f (monitor:558)
> Traceback (most recent call last):
> File "/usr/share/vdsm/storage/monitor.py", line 555, in _acquireHostId
> self.domain.acquireHostId(self.hostId, async=True)
> File "/usr/share/vdsm/storage/sd.py", line 790, in acquireHostId
> self._manifest.acquireHostId(hostId, async)
> File "/usr/share/vdsm/storage/sd.py", line 449, in acquireHostId
> self._domainLock.acquireHostId(hostId, async)
> File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py",
> line 297, in acquireHostId
> raise se.AcquireHostIdFailure(self._sdUUID, e)
> AcquireHostIdFailure: Cannot acquire host id:
> ('207221b2-959b-426b-b945-18e1adfed62f', SanlockException(22, 'Sanlock
> lockspace add failure', 'Invalid argument'))
>
> From /var/log/ovirt-hosted-engine-ha/agent.log on the same host:
>
> MainThread::ERROR::2017-06-19
>
13:30:50,592::hosted_engine::822::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_domain_monitor)
> Failed to start monitoring domain
> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
> during domain acquisition
> MainThread::WARNING::2017-06-19
>
13:30:50,593::hosted_engine::469::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Error while monitoring engine: Failed to start monitoring domain
> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
> during domain acquisition
> MainThread::WARNING::2017-06-19
>
13:30:50,593::hosted_engine::472::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Unexpected error
> Traceback (most recent call last):
> File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
> line 443, in start_monitoring
> self._initialize_domain_monitor()
> File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
> line 823, in _initialize_domain_monitor
> raise Exception(msg)
> Exception: Failed to start monitoring domain
> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
> during domain acquisition
> MainThread::ERROR::2017-06-19
>
13:30:50,593::hosted_engine::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Shutting down the agent because of 3 failures in a row!
>
> From sanlock.log:
>
> 2017-06-29 11:17:06+0100 1194149 [2530]: add_lockspace
>
207221b2-959b-426b-b945-18e1adfed62f:3:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0
> conflicts with name of list1 s5
>
207221b2-959b-426b-b945-18e1adfed62f:1:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0
>
> From the two other hosts:
>
> host 2:
>
> vdsm.log
>
> 2017-06-29 10:53:47,755+0100 ERROR (jsonrpc/4) [jsonrpc.JsonRpcServer]
> Internal server error (__init__:570)
> Traceback (most recent call last):
> File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line
> 565, in _handle_request
> res = method(**params)
> File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line
> 202, in _dynamicMethod
> result = fn(*methodArgs)
> File "/usr/share/vdsm/API.py", line 1454, in getAllVmIoTunePolicies
> io_tune_policies_dict = self._cif.getAllVmIoTunePolicies()
> File "/usr/share/vdsm/clientIF.py", line 448, in getAllVmIoTunePolicies
> 'current_values': v.getIoTune()}
> File "/usr/share/vdsm/virt/vm.py", line 2803, in getIoTune
> result = self.getIoTuneResponse()
> File "/usr/share/vdsm/virt/vm.py", line 2816, in getIoTuneResponse
> res = self._dom.blockIoTune(
> File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line
> 47, in __getattr__
> % self.vmid)
> NotConnectedError: VM u'a79e6b0e-fff4-4cba-a02c-4c00be151300' was not
> started yet or was shut down
>
> /var/log/ovirt-hosted-engine-ha/agent.log
>
> MainThread::INFO::2017-06-29
>
10:56:33,636::ovf_store::103::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan)
> Found OVF_STORE: imgUUID:222610db-7880-4f4f-8559-a3635fd73555,
> volUUID:c6e0d29b-eabf-4a09-a330-df54cfdd73f1
> MainThread::INFO::2017-06-29
>
10:56:33,926::ovf_store::112::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
> Extracting Engine VM OVF from the OVF_STORE
> MainThread::INFO::2017-06-29
>
10:56:33,938::ovf_store::119::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
> OVF_STORE volume path:
>
/rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/images/222610db-7880-4f4f-8559-a3635fd73555/c6e0d29b-eabf-4a09-a330-df54cfdd73f1
> MainThread::INFO::2017-06-29
>
10:56:33,967::config::431::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
> Found an OVF for HE VM, trying to convert
> MainThread::INFO::2017-06-29
>
10:56:33,971::config::436::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
> Got vm.conf from OVF_STORE
> MainThread::INFO::2017-06-29
>
10:56:36,736::states::678::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score)
> Score is 0 due to unexpected vm shutdown at Thu Jun 29 10:53:59 2017
> MainThread::INFO::2017-06-29
>
10:56:36,736::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Current state EngineUnexpectedlyDown (score: 0)
> MainThread::INFO::2017-06-29
>
10:56:46,772::config::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf)
> Reloading vm.conf from the shared storage domain
>
> /var/log/messages:
>
> Jun 29 10:53:46 kvm-ldn-02 kernel: dd: sending ioctl 80306d02 to a partition!
>
>
> host 1:
>
> From /var/log/messages (the same errors appear in sanlock.log):
>
> Jun 29 11:01:02 kvm-ldn-01 sanlock[2400]: 2017-06-29 11:01:02+0100
> 678325 [9132]: s4531 delta_acquire host_id 1 busy1 1 2 1193177
> 3d4ec963-8486-43a2-a7d9-afa82508f89f.kvm-ldn-03
> Jun 29 11:01:03 kvm-ldn-01 sanlock[2400]: 2017-06-29 11:01:03+0100
> 678326 [24159]: s4531 add_lockspace fail result -262
>
> /var/log/ovirt-hosted-engine-ha/agent.log:
>
> MainThread::ERROR::2017-06-27
>
15:21:01,143::hosted_engine::822::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_domain_monitor)
> Failed to start monitoring domain
> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
> during domain acquisition
> MainThread::WARNING::2017-06-27
>
15:21:01,144::hosted_engine::469::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Error while monitoring engine: Failed to start monitoring domain
> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
> during domain acquisition
> MainThread::WARNING::2017-06-27
>
15:21:01,144::hosted_engine::472::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Unexpected error
> Traceback (most recent call last):
> File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
> line 443, in start_monitoring
> self._initialize_domain_monitor()
> File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
> line 823, in _initialize_domain_monitor
> raise Exception(msg)
> Exception: Failed to start monitoring domain
> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
> during domain acquisition
> MainThread::ERROR::2017-06-27
>
15:21:01,144::hosted_engine::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Shutting down the agent because of 3 failures in a row!
> MainThread::INFO::2017-06-27
>
15:21:06,717::hosted_engine::848::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status)
> VDSM domain monitor status: PENDING
> MainThread::INFO::2017-06-27
>
15:21:09,335::hosted_engine::776::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_domain_monitor)
> Failed to stop monitoring domain
> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f): Storage domain is
> member of pool: u'domain=207221b2-959b-426b-b945-18e1adfed62f'
> MainThread::INFO::2017-06-27
> 15:21:09,339::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(run)
> Agent shutting down
>
>
> Thanks for any help,
>
>
> Cam
>
>
> On Wed, Jun 28, 2017 at 11:25 AM, cmc <iucounu(a)gmail.com> wrote:
>> Hi Martin,
>>
>> yes, two of the machines have the same host_id; the other has a
>> different host_id.
>>
>> To update since yesterday: I reinstalled and deployed Hosted Engine on
>> the other host (so all three hosts in the cluster now have it
>> installed). The second one I deployed said it was able to host the
>> engine (unlike the first I reinstalled), so I tried putting the host
>> with the Hosted Engine on it into maintenance to see if it would
>> migrate over. It managed to migrate all the VMs except the Hosted
>> Engine. And now the host that said it was able to host the engine says
>> 'unavailable due to HA score'. The host it was trying to migrate away
>> from has now been in 'preparing for maintenance' for the last 12 hours.
>>
>> The summary is:
>>
>> kvm-ldn-01 - one of the original, pre-Hosted Engine hosts, reinstalled
>> with 'Deploy Hosted Engine'. No icon saying it can host the Hosted
>> Engine; host_id of '2' in /etc/ovirt-hosted-engine/hosted-engine.conf.
>> 'add_lockspace' fails in sanlock.log
>>
>> kvm-ldn-02 - the other host that was pre-existing before Hosted Engine
>> was created. Reinstalled with 'Deploy Hosted Engine'. Had an icon
>> saying that it was able to host the Hosted Engine, but after migration
>> was attempted when putting kvm-ldn-03 into maintenance, it reports:
>> 'unavailable due to HA score'. It has a host_id of '1' in
>> /etc/ovirt-hosted-engine/hosted-engine.conf. No errors in sanlock.log
>>
>> kvm-ldn-03 - this was the host I deployed Hosted Engine on, which was
>> not part of the original cluster. I restored the bare-metal engine
>> backup in the Hosted Engine on this host when deploying it, without
>> error. It currently has the Hosted Engine on it (as the only VM after
>> I put that host into maintenance to test the HA of Hosted Engine).
>> Sanlock log shows conflicts
>>
>> I will look through all the logs for any other errors. Please let me
>> know if you need any logs or other clarification/information.
>>
>> Thanks,
>>
>> Campbell
>>
>> On Wed, Jun 28, 2017 at 9:25 AM, Martin Sivak <msivak(a)redhat.com> wrote:
>>> Hi,
>>>
>>> can you please check the contents of
>>> /etc/ovirt-hosted-engine/hosted-engine.conf or
>>> /etc/ovirt-hosted-engine-ha/agent.conf (I am not sure which one it is
>>> right now) and search for host-id?
>>>
>>> Make sure the IDs are different. If they are not, then there is a bug
>>> somewhere.
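>>>
>>> (A quick check might be to run 'grep host_id /etc/ovirt-hosted-engine/hosted-engine.conf'
>>> on each hosted-engine host and confirm that every host reports a different
>>> value - assuming host_id is the key name used in that file.)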
>>>
>>> Martin
>>>
>>> On Tue, Jun 27, 2017 at 6:26 PM, cmc <iucounu(a)gmail.com> wrote:
>>>> I see this in /var/log/sanlock on the host it is trying to migrate to:
>>>>
>>>> 2017-06-27 17:10:40+0100 527703 [2407]: s3528 lockspace
>>>>
207221b2-959b-426b-b945-18e1adfed62f:1:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0
>>>> 2017-06-27 17:13:00+0100 527843 [27446]: s3528 delta_acquire host_id 1
>>>> busy1 1 2 1042692 3d4ec963-8486-43a2-a7d9-afa82508f89f.kvm-ldn-03
>>>> 2017-06-27 17:13:01+0100 527844 [2407]: s3528 add_lockspace fail result -262
>>>>
>>>> The sanlock service is running. Why would this occur?
>>>>
>>>> Thanks,
>>>>
>>>> C
>>>>
>>>> On Tue, Jun 27, 2017 at 5:21 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>> Hi Martin,
>>>>>
>>>>> Thanks for the reply. I have done this, and the deployment completed
>>>>> without error. However, it still will not allow the Hosted Engine to
>>>>> migrate to another host. The /etc/ovirt-hosted-engine/hosted-engine.conf
>>>>> got created OK on the host I re-installed, but the ovirt-ha-broker.service,
>>>>> though it starts, reports:
>>>>>
>>>>> --------------------8<-------------------
>>>>>
>>>>> Jun 27 14:58:26 kvm-ldn-01 systemd[1]: Starting oVirt Hosted Engine
>>>>> High Availability Communications Broker...
>>>>> Jun 27 14:58:27 kvm-ldn-01 ovirt-ha-broker[6101]: ovirt-ha-broker
>>>>> ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker ERROR
>>>>> Failed to read metadata from
>>>>>
/rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/hosted-engine.metadata
>>>>> Traceback (most
>>>>> recent call last):
>>>>> File
>>>>>
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py",
>>>>> line 129, in get_raw_stats_for_service_type
>>>>> f =
>>>>> os.open(path, direct_flag | os.O_RDONLY | os.O_SYNC)
>>>>> OSError: [Errno 2]
>>>>> No such file or directory:
>>>>>
'/rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/hosted-engine.metadata'
>>>>>
>>>>> --------------------8<-------------------
>>>>>
>>>>> I checked the path, and it exists. I can run 'less -f' on it fine. The
>>>>> perms are slightly different on the host that is running the VM vs the
>>>>> one that is reporting errors (600 vs 660); ownership is vdsm:qemu. Is
>>>>> this a sanlock issue?
>>>>>
>>>>> Thanks for any help,
>>>>>
>>>>> Cam
>>>>>
>>>>>> On Tue, Jun 27, 2017 at 1:41 PM, Martin Sivak <msivak(a)redhat.com> wrote:
>>>>>>> Should it be? It was not in the instructions for the migration from
>>>>>>> bare-metal to Hosted VM
>>>>>>
>>>>>> The hosted engine will only migrate to hosts that have the services
>>>>>> running. Please put one other host to maintenance and select Hosted
>>>>>> engine action: DEPLOY in the reinstall dialog.
>>>>>>
>>>>>> Best regards
>>>>>>
>>>>>> Martin Sivak
>>>>>>
>>>>>>> On Tue, Jun 27, 2017 at 1:23 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>> I changed the 'os.other.devices.display.protocols.value.3.6 =
>>>>>>> spice/qxl,vnc/cirrus,vnc/qxl' line to have the same display protocols
>>>>>>> as 4 and the hosted engine now appears in the list of VMs. I am
>>>>>>> guessing the compatibility version was causing it to use the 3.6
>>>>>>> version. However, I am still unable to migrate the engine VM to
>>>>>>> another host. When I try putting the host it is currently on into
>>>>>>> maintenance, it reports:
>>>>>>>
>>>>>>> Error while executing action: Cannot switch the Host(s) to Maintenance mode.
>>>>>>> There are no available hosts capable of running the engine VM.
>>>>>>>
>>>>>>> Running 'hosted-engine --vm-status' still shows 'Engine status:
>>>>>>> unknown stale-data'.
>>>>>>>
>>>>>>> The ovirt-ha-broker service is only running on one host. It was set to
>>>>>>> 'disabled' in systemd. It won't start as there is no
>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf on the other two hosts.
>>>>>>> Should it be? It was not in the instructions for the migration from
>>>>>>> bare-metal to Hosted VM
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Cam
>>>>>>>
>>>>>>> On Thu, Jun 22, 2017 at 1:07 PM, cmc
<iucounu(a)gmail.com> wrote:
>>>>>>>> Hi Tomas,
>>>>>>>>
>>>>>>>> So in my /usr/share/ovirt-engine/conf/osinfo-defaults.properties on my
>>>>>>>> engine VM, I have:
>>>>>>>>
>>>>>>>> os.other.devices.display.protocols.value = spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus
>>>>>>>> os.other.devices.display.protocols.value.3.6 = spice/qxl,vnc/cirrus,vnc/qxl
>>>>>>>>
>>>>>>>> That seems to match - I assume since this is 4.1, the 3.6 should not apply
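>>>>>>>>
>>>>>>>> (If that 3.6 line does need changing, my understanding - not verified - is
>>>>>>>> that a drop-in override is preferred to editing the defaults file directly,
>>>>>>>> e.g. an /etc/ovirt-engine/osinfo.conf.d/99-display.properties containing:
>>>>>>>>
>>>>>>>> os.other.devices.display.protocols.value.3.6 = spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus
>>>>>>>>
>>>>>>>> followed by a restart of ovirt-engine.)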
>>>>>>>>
>>>>>>>> Is there somewhere else I should be looking?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Cam
>>>>>>>>
>>>>>>>> On Thu, Jun 22, 2017 at 11:40 AM, Tomas Jelinek
<tjelinek(a)redhat.com> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Jun 22, 2017 at 12:38 PM, Michal Skrivanek
>>>>>>>>> <michal.skrivanek(a)redhat.com> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> > On 22 Jun 2017, at 12:31, Martin Sivak <msivak(a)redhat.com> wrote:
>>>>>>>>>> >
>>>>>>>>>> > Tomas, what fields are needed in a VM to pass the check that causes
>>>>>>>>>> > the following error?
>>>>>>>>>> >
>>>>>>>>>> >>>>> WARN
[org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
>>>>>>>>>> >>>>>
(org.ovirt.thread.pool-6-thread-23) [] Validation of action
>>>>>>>>>> >>>>> 'ImportVm'
>>>>>>>>>> >>>>> failed for user SYSTEM.
Reasons: VAR__ACTION__IMPORT
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>>
,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
>>>>>>>>>>
>>>>>>>>>> to match the OS and VM Display type ;-)
>>>>>>>>>> Configuration is in osinfo… e.g. if that is import from older releases on
>>>>>>>>>> Linux this is typically caused by the change of cirrus to vga for non-SPICE
>>>>>>>>>> VMs
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> yep, the default supported combinations for 4.0+ are these:
>>>>>>>>> os.other.devices.display.protocols.value =
>>>>>>>>> spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> >
>>>>>>>>>> > Thanks.
>>>>>>>>>> >
>>>>>>>>>> > On Thu, Jun 22, 2017 at 12:19 PM, cmc
<iucounu(a)gmail.com> wrote:
>>>>>>>>>> >> Hi Martin,
>>>>>>>>>> >>
>>>>>>>>>> >>>
>>>>>>>>>> >>> just as a random comment, do you
still have the database backup from
>>>>>>>>>> >>> the bare metal -> VM attempt? It
might be possible to just try again
>>>>>>>>>> >>> using it. Or in the worst case..
update the offending value there
>>>>>>>>>> >>> before restoring it to the new
engine instance.
>>>>>>>>>> >>
>>>>>>>>>> >> I still have the backup. I'd rather
do the latter, as re-running the
>>>>>>>>>> >> HE deployment is quite lengthy and
involved (I have to re-initialise
>>>>>>>>>> >> the FC storage each time). Do you know
what the offending value(s)
>>>>>>>>>> >> would be? Would it be in the Postgres DB
or in a config file
>>>>>>>>>> >> somewhere?
>>>>>>>>>> >>
>>>>>>>>>> >> Cheers,
>>>>>>>>>> >>
>>>>>>>>>> >> Cam
>>>>>>>>>> >>
>>>>>>>>>> >>> Regards
>>>>>>>>>> >>>
>>>>>>>>>> >>> Martin Sivak
>>>>>>>>>> >>>
>>>>>>>>>> >>> On Thu, Jun 22, 2017 at 11:39 AM,
cmc <iucounu(a)gmail.com> wrote:
>>>>>>>>>> >>>> Hi Yanir,
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> Thanks for the reply.
>>>>>>>>>> >>>>
>>>>>>>>>> >>>>> First of all, maybe a chain
reaction of :
>>>>>>>>>> >>>>> WARN
[org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
>>>>>>>>>> >>>>>
(org.ovirt.thread.pool-6-thread-23) [] Validation of action
>>>>>>>>>> >>>>> 'ImportVm'
>>>>>>>>>> >>>>> failed for user SYSTEM.
Reasons: VAR__ACTION__IMPORT
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>>
,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
>>>>>>>>>> >>>>> is causing the hosted engine
vm not to be set up correctly and
>>>>>>>>>> >>>>> further
>>>>>>>>>> >>>>> actions were made when the
hosted engine vm wasnt in a stable state.
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> As for now, are you trying
to revert back to a previous/initial
>>>>>>>>>> >>>>> state ?
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> I'm not trying to revert it
to a previous state for now. This was a
>>>>>>>>>> >>>> migration from a bare metal
engine, and it didn't report any error
>>>>>>>>>> >>>> during the migration. I'd
had some problems on my first attempts at
>>>>>>>>>> >>>> this migration, whereby it never
completed (due to a proxy issue) but
>>>>>>>>>> >>>> I managed to resolve this. Do
you know of a way to get the Hosted
>>>>>>>>>> >>>> Engine VM into a stable state,
without rebuilding the entire cluster
>>>>>>>>>> >>>> from scratch (since I have a lot
of VMs on it)?
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> Thanks for any help.
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> Regards,
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> Cam
>>>>>>>>>> >>>>
>>>>>>>>>> >>>>> Regards,
>>>>>>>>>> >>>>> Yanir
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> On Wed, Jun 21, 2017 at 4:32
PM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>>>> >>>>>>
>>>>>>>>>> >>>>>> Hi Jenny/Martin,
>>>>>>>>>> >>>>>>
>>>>>>>>>> >>>>>> Any idea what I can do
here? The hosted engine VM has no log on any
>>>>>>>>>> >>>>>> host in
/var/log/libvirt/qemu, and I fear that if I need to put the
>>>>>>>>>> >>>>>> host into maintenance,
e.g., to upgrade it that I created it on
>>>>>>>>>> >>>>>> (which
>>>>>>>>>> >>>>>> I think is hosting it),
or if it fails for any reason, it won't get
>>>>>>>>>> >>>>>> migrated to another
host, and I will not be able to manage the
>>>>>>>>>> >>>>>> cluster. It seems to be
a very dangerous position to be in.
>>>>>>>>>> >>>>>>
>>>>>>>>>> >>>>>> Thanks,
>>>>>>>>>> >>>>>>
>>>>>>>>>> >>>>>> Cam
>>>>>>>>>> >>>>>>
>>>>>>>>>> >>>>>> On Wed, Jun 21, 2017 at
11:48 AM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>>>> >>>>>>> Thanks Martin. The
hosts are all part of the same cluster.
>>>>>>>>>> >>>>>>>
>>>>>>>>>> >>>>>>> I get these errors
in the engine.log on the engine:
>>>>>>>>>> >>>>>>>
>>>>>>>>>> >>>>>>> 2017-06-19
03:28:05,030Z WARN
>>>>>>>>>> >>>>>>>
[org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
>>>>>>>>>> >>>>>>>
(org.ovirt.thread.pool-6-thread-23) [] Validation of action
>>>>>>>>>> >>>>>>> 'ImportVm'
>>>>>>>>>> >>>>>>> failed for user SYSTEM. Reasons:
>>>>>>>>>> >>>>>>>
>>>>>>>>>> >>>>>>>
VAR__ACTION__IMPORT,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
>>>>>>>>>> >>>>>>> 2017-06-19
03:28:05,030Z INFO
>>>>>>>>>> >>>>>>>
[org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
>>>>>>>>>> >>>>>>>
(org.ovirt.thread.pool-6-thread-23) [] Lock freed to object
>>>>>>>>>> >>>>>>>
'EngineLock:{exclusiveLocks='[a
>>>>>>>>>> >>>>>>>
79e6b0e-fff4-4cba-a02c-4c00be151300=<VM,
>>>>>>>>>> >>>>>>>
ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName HostedEngine>,
>>>>>>>>>> >>>>>>>
HostedEngine=<VM_NAME, ACTION_TYPE_FAILED_NAME_ALREADY_USED>]',
>>>>>>>>>> >>>>>>> sharedLocks=
>>>>>>>>>> >>>>>>>
'[a79e6b0e-fff4-4cba-a02c-4c00be151300=<REMOTE_VM,
>>>>>>>>>> >>>>>>>
ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName HostedEngine>]'}'
>>>>>>>>>> >>>>>>> 2017-06-19
03:28:05,030Z ERROR
>>>>>>>>>> >>>>>>>
[org.ovirt.engine.core.bll.HostedEngineImporter]
>>>>>>>>>> >>>>>>>
(org.ovirt.thread.pool-6-thread-23) [] Failed importing the Hosted
>>>>>>>>>> >>>>>>> Engine VM
>>>>>>>>>> >>>>>>>
>>>>>>>>>> >>>>>>> The sanlock.log
reports conflicts on that same host, and a
>>>>>>>>>> >>>>>>> different
>>>>>>>>>> >>>>>>> error on the other
hosts, not sure if they are related.
>>>>>>>>>> >>>>>>>
>>>>>>>>>> >>>>>>> And this in the
/var/log/ovirt-hosted-engine-ha/agent log on the
>>>>>>>>>> >>>>>>> host
>>>>>>>>>> >>>>>>> which I deployed the
hosted engine VM on:
>>>>>>>>>> >>>>>>>
>>>>>>>>>> >>>>>>>
MainThread::ERROR::2017-06-19
>>>>>>>>>> >>>>>>>
>>>>>>>>>> >>>>>>>
>>>>>>>>>> >>>>>>>
13:09:49,743::ovf_store::124::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>>>>>>>>>> >>>>>>> Unable to extract
HEVM OVF
>>>>>>>>>> >>>>>>>
MainThread::ERROR::2017-06-19
>>>>>>>>>> >>>>>>>
>>>>>>>>>> >>>>>>>
>>>>>>>>>> >>>>>>>
13:09:49,743::config::445::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>>>>>>>>> >>>>>>> Failed extracting VM
OVF from the OVF_STORE volume, falling back
>>>>>>>>>> >>>>>>> to
>>>>>>>>>> >>>>>>> initial vm.conf
>>>>>>>>>> >>>>>>>
>>>>>>>>>> >>>>>>> I've seen some
of these issues reported in bugzilla, but they were
>>>>>>>>>> >>>>>>> for
>>>>>>>>>> >>>>>>> older versions of
oVirt (and appear to be resolved).
>>>>>>>>>> >>>>>>>
>>>>>>>>>> >>>>>>> I will install that
package on the other two hosts, for which I
>>>>>>>>>> >>>>>>> will
>>>>>>>>>> >>>>>>> put them in
maintenance as vdsm is installed as an upgrade. I
>>>>>>>>>> >>>>>>> guess
>>>>>>>>>> >>>>>>> restarting vdsm is a
good idea after that?
>>>>>>>>>> >>>>>>>
>>>>>>>>>> >>>>>>> Thanks,
>>>>>>>>>> >>>>>>>
>>>>>>>>>> >>>>>>> Campbell
>>>>>>>>>> >>>>>>>
>>>>>>>>>> >>>>>>> On Wed, Jun 21, 2017
at 10:51 AM, Martin Sivak <msivak(a)redhat.com>
>>>>>>>>>> >>>>>>> wrote:
>>>>>>>>>> >>>>>>>> Hi,
>>>>>>>>>> >>>>>>>>
>>>>>>>>>> >>>>>>>> you do not have
to install it on all hosts. But you should have
>>>>>>>>>> >>>>>>>> more
>>>>>>>>>> >>>>>>>> than one and
ideally all hosted engine enabled nodes should
>>>>>>>>>> >>>>>>>> belong to
>>>>>>>>>> >>>>>>>> the same engine
cluster.
>>>>>>>>>> >>>>>>>>
>>>>>>>>>> >>>>>>>> Best regards
>>>>>>>>>> >>>>>>>>
>>>>>>>>>> >>>>>>>> Martin Sivak
>>>>>>>>>> >>>>>>>>
>>>>>>>>>> >>>>>>>> On Wed, Jun 21,
2017 at 11:29 AM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>>>> >>>>>>>>> Hi Jenny,
>>>>>>>>>> >>>>>>>>>
>>>>>>>>>> >>>>>>>>> Does
ovirt-hosted-engine-ha need to be installed across all
>>>>>>>>>> >>>>>>>>> hosts?
>>>>>>>>>> >>>>>>>>> Could that
be the reason it is failing to see it properly?
>>>>>>>>>> >>>>>>>>>
>>>>>>>>>> >>>>>>>>> Thanks,
>>>>>>>>>> >>>>>>>>>
>>>>>>>>>> >>>>>>>>> Cam
>>>>>>>>>> >>>>>>>>>
>>>>>>>>>> >>>>>>>>> On Mon, Jun
19, 2017 at 1:27 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>>>> >>>>>>>>>> Hi
Jenny,
>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>> >>>>>>>>>> Logs are
attached. I can see errors in there, but am unsure how
>>>>>>>>>> >>>>>>>>>> they
>>>>>>>>>> >>>>>>>>>> arose.
>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>> >>>>>>>>>> Thanks,
>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>
Campbell
>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>> >>>>>>>>>> On Mon,
Jun 19, 2017 at 12:29 PM, Evgenia Tokar
>>>>>>>>>> >>>>>>>>>>
<etokar(a)redhat.com>
>>>>>>>>>> >>>>>>>>>> wrote:
>>>>>>>>>> >>>>>>>>>>> From
the output it looks like the agent is down, try starting
>>>>>>>>>> >>>>>>>>>>> it
by
>>>>>>>>>> >>>>>>>>>>>
running:
>>>>>>>>>> >>>>>>>>>>>
systemctl start ovirt-ha-agent.
>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>> The
engine is supposed to see the hosted engine storage domain
>>>>>>>>>> >>>>>>>>>>> and
>>>>>>>>>> >>>>>>>>>>>
import it
>>>>>>>>>> >>>>>>>>>>> to
the system, then it should import the hosted engine vm.
>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>> Can
you attach the agent log from the host
>>>>>>>>>> >>>>>>>>>>>
(/var/log/ovirt-hosted-engine-ha/agent.log)
>>>>>>>>>> >>>>>>>>>>> and
the engine log from the engine vm
>>>>>>>>>> >>>>>>>>>>>
(/var/log/ovirt-engine/engine.log)?
>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>
Thanks,
>>>>>>>>>> >>>>>>>>>>>
Jenny
>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>> On
Mon, Jun 19, 2017 at 12:41 PM, cmc <iucounu(a)gmail.com>
>>>>>>>>>> >>>>>>>>>>>
wrote:
>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>
Hi Jenny,
>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>> What version are you running?
>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>
4.1.2.2-1.el7.centos
>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>> For the hosted engine vm to be
imported and displayed in the
>>>>>>>>>>
>>>>>>>>>>>>> engine, you
>>>>>>>>>>
>>>>>>>>>>>>> must first create a master storage
domain.
>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>
To provide a bit more detail: this was a migration of a
>>>>>>>>>> >>>>>>>>>>>>
bare-metal
>>>>>>>>>> >>>>>>>>>>>>
engine in an existing cluster to a hosted engine VM for that
>>>>>>>>>> >>>>>>>>>>>>
cluster.
>>>>>>>>>> >>>>>>>>>>>>
As part of this migration, I built an entirely new host and
>>>>>>>>>> >>>>>>>>>>>>
ran
>>>>>>>>>> >>>>>>>>>>>>
'hosted-engine --deploy' (followed these instructions:
>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>
http://www.ovirt.org/documentation/self-hosted/chap-Migrating_from_Bare_M...).
>>>>>>>>>> >>>>>>>>>>>>
I restored the backup from the engine and it completed
>>>>>>>>>> >>>>>>>>>>>>
without any
>>>>>>>>>> >>>>>>>>>>>>
errors. I didn't see any instructions regarding a master
>>>>>>>>>> >>>>>>>>>>>>
storage
>>>>>>>>>> >>>>>>>>>>>>
domain in the page above. The cluster has two existing master
>>>>>>>>>> >>>>>>>>>>>>
storage
>>>>>>>>>> >>>>>>>>>>>>
domains, one is fibre channel, which is up, and one ISO
>>>>>>>>>> >>>>>>>>>>>>
domain,
>>>>>>>>>> >>>>>>>>>>>>
which
>>>>>>>>>> >>>>>>>>>>>>
is currently offline.
>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>> What do you mean the hosted engine
commands are failing?
>>>>>>>>>>
>>>>>>>>>>>>> What
>>>>>>>>>>
>>>>>>>>>>>>> happens
>>>>>>>>>>
>>>>>>>>>>>>> when
>>>>>>>>>>
>>>>>>>>>>>>> you run hosted-engine --vm-status
now?
>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>
Interestingly, whereas when I ran it before, it exited with
>>>>>>>>>> >>>>>>>>>>>>
no
>>>>>>>>>> >>>>>>>>>>>>
output
>>>>>>>>>> >>>>>>>>>>>>
and a return code of '1', it now reports:
>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>
--== Host 1 status ==--
>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>
conf_on_shared_storage : True
>>>>>>>>>> >>>>>>>>>>>>
Status up-to-date : False
>>>>>>>>>> >>>>>>>>>>>>
Hostname :
>>>>>>>>>> >>>>>>>>>>>>
kvm-ldn-03.ldn.fscfc.co.uk
>>>>>>>>>> >>>>>>>>>>>>
Host ID : 1
>>>>>>>>>> >>>>>>>>>>>>
Engine status : unknown stale-data
>>>>>>>>>> >>>>>>>>>>>>
Score : 0
>>>>>>>>>> >>>>>>>>>>>>
stopped : True
>>>>>>>>>> >>>>>>>>>>>>
Local maintenance : False
>>>>>>>>>> >>>>>>>>>>>>
crc32 : 0217f07b
>>>>>>>>>> >>>>>>>>>>>>
local_conf_timestamp : 2911
>>>>>>>>>> >>>>>>>>>>>>
Host timestamp : 2897
>>>>>>>>>> >>>>>>>>>>>>
Extra metadata (valid at timestamp):
>>>>>>>>>> >>>>>>>>>>>>
metadata_parse_version=1
>>>>>>>>>> >>>>>>>>>>>>
metadata_feature_version=1
>>>>>>>>>> >>>>>>>>>>>>
timestamp=2897 (Thu Jun 15 16:22:54 2017)
>>>>>>>>>> >>>>>>>>>>>>
host-id=1
>>>>>>>>>> >>>>>>>>>>>>
score=0
>>>>>>>>>> >>>>>>>>>>>>
vm_conf_refresh_time=2911 (Thu Jun 15 16:23:08 2017)
>>>>>>>>>> >>>>>>>>>>>>
conf_on_shared_storage=True
>>>>>>>>>> >>>>>>>>>>>>
maintenance=False
>>>>>>>>>> >>>>>>>>>>>>
state=AgentStopped
>>>>>>>>>> >>>>>>>>>>>>
stopped=True
>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>
Yet I can login to the web GUI fine. I guess it is not HA due
>>>>>>>>>> >>>>>>>>>>>>
to
>>>>>>>>>> >>>>>>>>>>>>
being
>>>>>>>>>> >>>>>>>>>>>>
in an unknown state currently? Does the hosted-engine-ha rpm
>>>>>>>>>> >>>>>>>>>>>>
need
>>>>>>>>>> >>>>>>>>>>>>
to
>>>>>>>>>> >>>>>>>>>>>>
be installed across all nodes in the cluster, btw?
>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>
Thanks for the help,
>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>>
Cam
>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>> Jenny Tokar
>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Jun 15, 2017 at 6:32 PM, cmc
<iucounu(a)gmail.com>
>>>>>>>>>>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>>> I've migrated from a
bare-metal engine to a hosted engine.
>>>>>>>>>>
>>>>>>>>>>>>>> There
>>>>>>>>>>
>>>>>>>>>>>>>> were
>>>>>>>>>>
>>>>>>>>>>>>>> no errors during the install,
however, the hosted engine
>>>>>>>>>>
>>>>>>>>>>>>>> did not
>>>>>>>>>>
>>>>>>>>>>>>>> get
>>>>>>>>>>
>>>>>>>>>>>>>> started. I tried running:
>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>>> hosted-engine --status
>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>>> on the host I deployed it on, and
it returns nothing (exit
>>>>>>>>>>
>>>>>>>>>>>>>> code
>>>>>>>>>>
>>>>>>>>>>>>>> is 1
>>>>>>>>>>
>>>>>>>>>>>>>> however). I could not ping it
either. So I tried starting
>>>>>>>>>>
>>>>>>>>>>>>>> it via
>>>>>>>>>>
>>>>>>>>>>>>>> 'hosted-engine
--vm-start' and it returned:
>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>>> Virtual machine does not exist
>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>>> But it then became available. I
logged into it
>>>>>>>>>>
>>>>>>>>>>>>>> successfully. It
>>>>>>>>>>
>>>>>>>>>>>>>> is not
>>>>>>>>>>
>>>>>>>>>>>>>> in the list of VMs however.
>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>>> Any ideas why the hosted-engine
commands fail, and why it
>>>>>>>>>>
>>>>>>>>>>>>>> is not
>>>>>>>>>>
>>>>>>>>>>>>>> in
>>>>>>>>>>
>>>>>>>>>>>>>> the list of virtual machines?
>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for any help,
>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>>> Cam
>>>>>>>>>>
>>>>>>>>>>>>>>
_______________________________________________
>>>>>>>>>>
>>>>>>>>>>>>>> Users mailing list
>>>>>>>>>>
>>>>>>>>>>>>>> Users(a)ovirt.org
>>>>>>>>>>
>>>>>>>>>>>>>>
http://lists.ovirt.org/mailman/listinfo/users
>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>> >>>>>>>>>
_______________________________________________
>>>>>>>>>> >>>>>>>>> Users
mailing list
>>>>>>>>>> >>>>>>>>>
Users(a)ovirt.org
>>>>>>>>>> >>>>>>>>>
http://lists.ovirt.org/mailman/listinfo/users
>>>>>>>>>> >>>>>>
_______________________________________________
>>>>>>>>>> >>>>>> Users mailing list
>>>>>>>>>> >>>>>> Users(a)ovirt.org
>>>>>>>>>> >>>>>>
http://lists.ovirt.org/mailman/listinfo/users
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>>
>>>>>>>>>> >
_______________________________________________
>>>>>>>>>> > Users mailing list
>>>>>>>>>> > Users(a)ovirt.org
>>>>>>>>>> >
http://lists.ovirt.org/mailman/listinfo/users
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>>
>>>>>>>>>