I had no choice but to power up the old bare-metal engine in order to
start the VMs. This is probably a really bad idea, but I had to get the
VMs running.
My guess now is that if a host is shut down rather than simply rebooted,
the VMs will not restart when the host powers back up. This would not
have been such a problem if the Hosted Engine had started.
So I'm not sure where to go from here...
I guess it's back to starting from scratch?
On Fri, Jun 30, 2017 at 3:19 PM, cmc <iucounu(a)gmail.com> wrote:
Help! I put the cluster into global maintenance and then powered all of
the nodes off and back on. I have since taken it out of global
maintenance. No VM has started, including the hosted engine. This is
very bad. I am going to look through the logs to see why nothing has
started. Help greatly appreciated.
Thanks,
Cam
On Fri, Jun 30, 2017 at 1:00 PM, cmc <iucounu(a)gmail.com> wrote:
> So I can run from any node: hosted-engine --set-maintenance
> --mode=global. By 'agents', you mean the ovirt-ha-agent, right? This
> shouldn't affect the running of any VMs, correct? Sorry for the
> questions, just want to do it correctly and not make assumptions :)
>
> Cheers,
>
> C
>
> On Fri, Jun 30, 2017 at 12:12 PM, Martin Sivak <msivak(a)redhat.com> wrote:
>> Hi,
>>
>>> Just to clarify: you mean the host_id in
>>> /etc/ovirt-hosted-engine/hosted-engine.conf should match the spm_id,
>>> correct?
>>
>> Exactly.
>>
>> Put the cluster into global maintenance first, or kill all the agents
>> (which has the same effect).
>>
>> Martin
>>
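A minimal sketch of both options, for anyone who lands on this thread later.
This assumes the standard oVirt 4.1 hosted-engine command and systemd unit
names; verify against your own install. Run on a hosted-engine host:

    # option 1: put the whole HA cluster into global maintenance
    hosted-engine --set-maintenance --mode=global

    # option 2: stop the HA agents on every hosted-engine host instead
    systemctl stop ovirt-ha-agent ovirt-ha-broker

    # later, to leave global maintenance again
    hosted-engine --set-maintenance --mode=none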
>> On Fri, Jun 30, 2017 at 12:47 PM, cmc <iucounu(a)gmail.com> wrote:
>>> Just to clarify: you mean the host_id in
>>> /etc/ovirt-hosted-engine/hosted-engine.conf should match the spm_id,
>>> correct?
>>>
>>> On Fri, Jun 30, 2017 at 9:47 AM, Martin Sivak <msivak(a)redhat.com> wrote:
>>>> Hi,
>>>>
>>>> cleaning metadata won't help in this case. Try transferring the
>>>> spm_ids you got from the engine to the proper hosted engine hosts so
>>>> the hosted engine ids match the spm_ids. Then restart all hosted
>>>> engine services. I would actually recommend restarting all hosts after
>>>> this change, but I have no idea how many VMs you have running.
>>>>
>>>> Martin
>>>>
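A sketch of what that could look like in practice. It assumes the engine
database is the local PostgreSQL 'engine' DB and that the SPM ids live in the
vds_spm_id_map table; please verify both before relying on it:

    # on the engine VM: list host name / SPM id pairs (assumed schema)
    sudo -u postgres psql engine -c \
      "select vds_name, vds_spm_id from vds_spm_id_map join vds_static using (vds_id);"

    # on each hosted-engine host: make host_id match that host's SPM id
    grep ^host_id= /etc/ovirt-hosted-engine/hosted-engine.conf
    vi /etc/ovirt-hosted-engine/hosted-engine.conf   # set host_id=<this host's SPM id>

    # then restart the HA services
    systemctl restart ovirt-ha-broker ovirt-ha-agent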
>>>> On Thu, Jun 29, 2017 at 8:27 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>> Tried running 'hosted-engine --clean-metadata' as per
>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1350539, since
>>>>> ovirt-ha-agent was not running anyway, but it fails with the
>>>>> following error:
>>>>>
>>>>> ERROR:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Failed
>>>>> to start monitoring domain
>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
>>>>> during domain acquisition
>>>>> ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Traceback (most recent call last):
>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 191, in _run_agent
>>>>>     return action(he)
>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 67, in action_clean
>>>>>     return he.clean(options.force_cleanup)
>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 345, in clean
>>>>>     self._initialize_domain_monitor()
>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 823, in _initialize_domain_monitor
>>>>>     raise Exception(msg)
>>>>> Exception: Failed to start monitoring domain
>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
>>>>> during domain acquisition
>>>>> ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Trying to restart agent
>>>>> WARNING:ovirt_hosted_engine_ha.agent.agent.Agent:Restarting agent, attempt '0'
>>>>> ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Too many errors
>>>>> occurred, giving up. Please review the log and consider filing a bug.
>>>>> INFO:ovirt_hosted_engine_ha.agent.agent.Agent:Agent shutting down
>>>>>
>>>>> On Thu, Jun 29, 2017 at 6:10 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>>> Actually, it looks like sanlock problems:
>>>>>>
>>>>>> "SanlockInitializationError: Failed to initialize
sanlock, the
>>>>>> number of errors has exceeded the limit"
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Jun 29, 2017 at 5:10 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>> Sorry, I am mistaken: the agent failed on two hosts with the
>>>>>>> following error:
>>>>>>>
>>>>>>> ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine
>>>>>>> ERROR Failed to start monitoring domain
>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
>>>>>>> during domain acquisition
>>>>>>> ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine
>>>>>>> ERROR Shutting down the agent because of 3 failures in a row!
>>>>>>>
>>>>>>> What could cause these timeouts? Some other service not running?
>>>>>>>
>>>>>>> On Thu, Jun 29, 2017 at 5:03 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>> Both services are up on all three hosts. The broker logs just report:
>>>>>>>>
>>>>>>>> Thread-6549::INFO::2017-06-29 17:01:51,481::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup)
>>>>>>>> Connection established
>>>>>>>> Thread-6549::INFO::2017-06-29 17:01:51,483::listener::186::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle)
>>>>>>>> Connection closed
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Cam
>>>>>>>>
>>>>>>>> On Thu, Jun 29, 2017 at 4:00 PM, Martin Sivak <msivak(a)redhat.com> wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> please make sure that both ovirt-ha-agent and ovirt-ha-broker services
>>>>>>>>> are restarted and up. The error says the agent can't talk to the
>>>>>>>>> broker. Is there anything in the broker.log?
>>>>>>>>>
>>>>>>>>> Best regards
>>>>>>>>>
>>>>>>>>> Martin Sivak
>>>>>>>>>
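In case it helps anyone else, a quick way to do that check; this assumes the
standard unit names and that the broker log sits next to agent.log:

    systemctl restart ovirt-ha-broker ovirt-ha-agent
    systemctl status ovirt-ha-broker ovirt-ha-agent

    # watch the broker while the agent reconnects
    tail -f /var/log/ovirt-hosted-engine-ha/broker.log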
>>>>>>>>> On Thu, Jun 29, 2017 at 4:42 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>>>> I've restarted those two services across all hosts, have taken the
>>>>>>>>>> Hosted Engine host out of maintenance, and when I try to migrate the
>>>>>>>>>> Hosted Engine over to another host, it reports that all three hosts
>>>>>>>>>> 'did not satisfy internal filter HA because it is not a Hosted Engine
>>>>>>>>>> host'.
>>>>>>>>>>
>>>>>>>>>> On the host that the Hosted Engine is currently on it reports in the
>>>>>>>>>> agent.log:
>>>>>>>>>>
>>>>>>>>>> ovirt-ha-agent ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR
>>>>>>>>>> Connection closed: Connection closed
>>>>>>>>>> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent
>>>>>>>>>> ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Exception
>>>>>>>>>> getting service path: Connection closed
>>>>>>>>>> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent
>>>>>>>>>> ovirt_hosted_engine_ha.agent.agent.Agent ERROR Traceback (most recent
>>>>>>>>>> call last):
>>>>>>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 191, in _run_agent
>>>>>>>>>>     return action(he)
>>>>>>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 64, in action_proper
>>>>>>>>>>     return he.start_monitoring()
>>>>>>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 411, in start_monitoring
>>>>>>>>>>     self._initialize_sanlock()
>>>>>>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 691, in _initialize_sanlock
>>>>>>>>>>     constants.SERVICE_TYPE + constants.LOCKSPACE_EXTENSION)
>>>>>>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 162, in get_service_path
>>>>>>>>>>     .format(str(e)))
>>>>>>>>>> RequestError: Failed to get service path: Connection closed
>>>>>>>>>> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent
>>>>>>>>>> ovirt_hosted_engine_ha.agent.agent.Agent ERROR Trying to restart agent
>>>>>>>>>>
>>>>>>>>>> On Thu, Jun 29, 2017 at 1:25 PM, Martin Sivak <msivak(a)redhat.com> wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> yep, you have to restart the ovirt-ha-agent and ovirt-ha-broker services.
>>>>>>>>>>>
>>>>>>>>>>> The scheduling message just means that the host has score 0 or is not
>>>>>>>>>>> reporting score at all.
>>>>>>>>>>>
>>>>>>>>>>> Martin
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jun 29, 2017 at 1:33 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>>>>>> Thanks Martin, do I have to restart anything? When I try to use the
>>>>>>>>>>>> 'migrate' operation, it complains that the other two hosts 'did not
>>>>>>>>>>>> satisfy internal filter HA because it is not a Hosted Engine host..'
>>>>>>>>>>>> (even though I reinstalled both of these hosts with the 'deploy hosted
>>>>>>>>>>>> engine' option), which suggests that something needs restarting. Should
>>>>>>>>>>>> I worry about the sanlock errors, or will they be resolved by the
>>>>>>>>>>>> change in host_id?
>>>>>>>>>>>>
>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>
>>>>>>>>>>>> Cam
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Jun 29, 2017 at 12:22 PM, Martin Sivak <msivak(a)redhat.com> wrote:
>>>>>>>>>>>>> Change the ids so they are distinct. I need to check if there is a way
>>>>>>>>>>>>> to read the SPM ids from the engine, as using the same numbers would be
>>>>>>>>>>>>> best.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Martin
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Jun 29, 2017 at 12:46 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>>>>>>>> Is there any way of recovering from this situation? I'd prefer to fix
>>>>>>>>>>>>>> the issue rather than re-deploy, but if there is no recovery path, I
>>>>>>>>>>>>>> could perhaps try re-deploying the hosted engine. In which case, would
>>>>>>>>>>>>>> the best option be to take a backup of the Hosted Engine, and then
>>>>>>>>>>>>>> shut it down, re-initialise the SAN partition (or use another
>>>>>>>>>>>>>> partition) and retry the deployment? Would it be better to use the
>>>>>>>>>>>>>> older backup from the bare metal engine that I originally used, or use
>>>>>>>>>>>>>> a backup from the Hosted Engine? I'm not sure if any VMs have been
>>>>>>>>>>>>>> added since switching to Hosted Engine.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Unfortunately I have very little time left to get this working before
>>>>>>>>>>>>>> I have to hand it over for eval (by end of Friday).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here are some current log snippets from the cluster:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In /var/log/vdsm/vdsm.log on the host that has the Hosted Engine:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2017-06-29 10:50:15,071+0100 INFO
(monitor/207221b) [storage.SANLock]
>>>>>>>>>>>>>> Acquiring host id for domain
207221b2-959b-426b-b945-18e1adfed62f (id:
>>>>>>>>>>>>>> 3) (clusterlock:282)
>>>>>>>>>>>>>> 2017-06-29 10:50:15,072+0100
ERROR (monitor/207221b) [storage.Monitor]
>>>>>>>>>>>>>> Error acquiring host id 3 for
domain
>>>>>>>>>>>>>>
207221b2-959b-426b-b945-18e1adfed62f (monitor:558)
>>>>>>>>>>>>>> Traceback (most recent call
last):
>>>>>>>>>>>>>> File
"/usr/share/vdsm/storage/monitor.py", line 555, in _acquireHostId
>>>>>>>>>>>>>>
self.domain.acquireHostId(self.hostId, async=True)
>>>>>>>>>>>>>> File
"/usr/share/vdsm/storage/sd.py", line 790, in acquireHostId
>>>>>>>>>>>>>>
self._manifest.acquireHostId(hostId, async)
>>>>>>>>>>>>>> File
"/usr/share/vdsm/storage/sd.py", line 449, in acquireHostId
>>>>>>>>>>>>>>
self._domainLock.acquireHostId(hostId, async)
>>>>>>>>>>>>>> File
"/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py",
>>>>>>>>>>>>>> line 297, in acquireHostId
>>>>>>>>>>>>>> raise
se.AcquireHostIdFailure(self._sdUUID, e)
>>>>>>>>>>>>>> AcquireHostIdFailure: Cannot
acquire host id:
>>>>>>>>>>>>>>
('207221b2-959b-426b-b945-18e1adfed62f', SanlockException(22, 'Sanlock
>>>>>>>>>>>>>> lockspace add failure',
'Invalid argument'))
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> From
/var/log/ovirt-hosted-engine-ha/agent.log on the same host:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> MainThread::ERROR::2017-06-19
>>>>>>>>>>>>>>
13:30:50,592::hosted_engine::822::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_domain_monitor)
>>>>>>>>>>>>>> Failed to start monitoring
domain
>>>>>>>>>>>>>>
(sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
>>>>>>>>>>>>>> during domain acquisition
>>>>>>>>>>>>>> MainThread::WARNING::2017-06-19
>>>>>>>>>>>>>>
13:30:50,593::hosted_engine::469::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>>>>>>>>>>>>>> Error while monitoring engine:
Failed to start monitoring domain
>>>>>>>>>>>>>>
(sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
>>>>>>>>>>>>>> during domain acquisition
>>>>>>>>>>>>>> MainThread::WARNING::2017-06-19
>>>>>>>>>>>>>>
13:30:50,593::hosted_engine::472::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>>>>>>>>>>>>>> Unexpected error
>>>>>>>>>>>>>> Traceback (most recent call
last):
>>>>>>>>>>>>>> File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>>>>>>>>>>>>>> line 443, in start_monitoring
>>>>>>>>>>>>>>
self._initialize_domain_monitor()
>>>>>>>>>>>>>> File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>>>>>>>>>>>>>> line 823, in
_initialize_domain_monitor
>>>>>>>>>>>>>> raise Exception(msg)
>>>>>>>>>>>>>> Exception: Failed to start
monitoring domain
>>>>>>>>>>>>>>
(sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
>>>>>>>>>>>>>> during domain acquisition
>>>>>>>>>>>>>> MainThread::ERROR::2017-06-19
>>>>>>>>>>>>>>
13:30:50,593::hosted_engine::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>>>>>>>>>>>>>> Shutting down the agent because
of 3 failures in a row!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> From sanlock.log:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2017-06-29 11:17:06+0100 1194149
[2530]: add_lockspace
>>>>>>>>>>>>>>
207221b2-959b-426b-b945-18e1adfed62f:3:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0
>>>>>>>>>>>>>> conflicts with name of list1 s5
>>>>>>>>>>>>>>
207221b2-959b-426b-b945-18e1adfed62f:1:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> From the two other hosts:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> host 2:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> vdsm.log
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2017-06-29 10:53:47,755+0100
ERROR (jsonrpc/4) [jsonrpc.JsonRpcServer]
>>>>>>>>>>>>>> Internal server error
(__init__:570)
>>>>>>>>>>>>>> Traceback (most recent call
last):
>>>>>>>>>>>>>> File
"/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line
>>>>>>>>>>>>>> 565, in _handle_request
>>>>>>>>>>>>>> res = method(**params)
>>>>>>>>>>>>>> File
"/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line
>>>>>>>>>>>>>> 202, in _dynamicMethod
>>>>>>>>>>>>>> result = fn(*methodArgs)
>>>>>>>>>>>>>> File
"/usr/share/vdsm/API.py", line 1454, in getAllVmIoTunePolicies
>>>>>>>>>>>>>> io_tune_policies_dict =
self._cif.getAllVmIoTunePolicies()
>>>>>>>>>>>>>> File
"/usr/share/vdsm/clientIF.py", line 448, in getAllVmIoTunePolicies
>>>>>>>>>>>>>> 'current_values':
v.getIoTune()}
>>>>>>>>>>>>>> File
"/usr/share/vdsm/virt/vm.py", line 2803, in getIoTune
>>>>>>>>>>>>>> result =
self.getIoTuneResponse()
>>>>>>>>>>>>>> File
"/usr/share/vdsm/virt/vm.py", line 2816, in getIoTuneResponse
>>>>>>>>>>>>>> res = self._dom.blockIoTune(
>>>>>>>>>>>>>> File
"/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line
>>>>>>>>>>>>>> 47, in __getattr__
>>>>>>>>>>>>>> % self.vmid)
>>>>>>>>>>>>>> NotConnectedError: VM
u'a79e6b0e-fff4-4cba-a02c-4c00be151300' was not
>>>>>>>>>>>>>> started yet or was shut down
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
/var/log/ovirt-hosted-engine-ha/agent.log
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> MainThread::INFO::2017-06-29
>>>>>>>>>>>>>>
10:56:33,636::ovf_store::103::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan)
>>>>>>>>>>>>>> Found OVF_STORE:
imgUUID:222610db-7880-4f4f-8559-a3635fd73555,
>>>>>>>>>>>>>>
volUUID:c6e0d29b-eabf-4a09-a330-df54cfdd73f1
>>>>>>>>>>>>>> MainThread::INFO::2017-06-29
>>>>>>>>>>>>>>
10:56:33,926::ovf_store::112::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>>>>>>>>>>>>>> Extracting Engine VM OVF from the
OVF_STORE
>>>>>>>>>>>>>> MainThread::INFO::2017-06-29
>>>>>>>>>>>>>>
10:56:33,938::ovf_store::119::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>>>>>>>>>>>>>> OVF_STORE volume path:
>>>>>>>>>>>>>>
/rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/images/222610db-7880-4f4f-8559-a3635fd73555/c6e0d29b-eabf-4a09-a330-df54cfdd73f1
>>>>>>>>>>>>>> MainThread::INFO::2017-06-29
>>>>>>>>>>>>>>
10:56:33,967::config::431::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>>>>>>>>>>>>> Found an OVF for HE VM, trying to
convert
>>>>>>>>>>>>>> MainThread::INFO::2017-06-29
>>>>>>>>>>>>>>
10:56:33,971::config::436::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>>>>>>>>>>>>> Got vm.conf from OVF_STORE
>>>>>>>>>>>>>> MainThread::INFO::2017-06-29
>>>>>>>>>>>>>>
10:56:36,736::states::678::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score)
>>>>>>>>>>>>>> Score is 0 due to unexpected vm
shutdown at Thu Jun 29 10:53:59 2017
>>>>>>>>>>>>>> MainThread::INFO::2017-06-29
>>>>>>>>>>>>>>
10:56:36,736::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>>>>>>>>>>>>>> Current state
EngineUnexpectedlyDown (score: 0)
>>>>>>>>>>>>>> MainThread::INFO::2017-06-29
>>>>>>>>>>>>>>
10:56:46,772::config::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf)
>>>>>>>>>>>>>> Reloading vm.conf from the shared
storage domain
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> /var/log/messages:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Jun 29 10:53:46 kvm-ldn-02
kernel: dd: sending ioctl 80306d02 to a partition!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> host 1:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> /var/log/messages also in
sanlock.log
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Jun 29 11:01:02 kvm-ldn-01
sanlock[2400]: 2017-06-29 11:01:02+0100
>>>>>>>>>>>>>> 678325 [9132]: s4531
delta_acquire host_id 1 busy1 1 2 1193177
>>>>>>>>>>>>>>
3d4ec963-8486-43a2-a7d9-afa82508f89f.kvm-ldn-03
>>>>>>>>>>>>>> Jun 29 11:01:03 kvm-ldn-01
sanlock[2400]: 2017-06-29 11:01:03+0100
>>>>>>>>>>>>>> 678326 [24159]: s4531
add_lockspace fail result -262
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
/var/log/ovirt-hosted-engine-ha/agent.log:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> MainThread::ERROR::2017-06-27
>>>>>>>>>>>>>>
15:21:01,143::hosted_engine::822::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_domain_monitor)
>>>>>>>>>>>>>> Failed to start monitoring
domain
>>>>>>>>>>>>>>
(sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
>>>>>>>>>>>>>> during domain acquisition
>>>>>>>>>>>>>> MainThread::WARNING::2017-06-27
>>>>>>>>>>>>>>
15:21:01,144::hosted_engine::469::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>>>>>>>>>>>>>> Error while monitoring engine:
Failed to start monitoring domain
>>>>>>>>>>>>>>
(sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
>>>>>>>>>>>>>> during domain acquisition
>>>>>>>>>>>>>> MainThread::WARNING::2017-06-27
>>>>>>>>>>>>>>
15:21:01,144::hosted_engine::472::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>>>>>>>>>>>>>> Unexpected error
>>>>>>>>>>>>>> Traceback (most recent call
last):
>>>>>>>>>>>>>> File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>>>>>>>>>>>>>> line 443, in start_monitoring
>>>>>>>>>>>>>>
self._initialize_domain_monitor()
>>>>>>>>>>>>>> File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>>>>>>>>>>>>>> line 823, in
_initialize_domain_monitor
>>>>>>>>>>>>>> raise Exception(msg)
>>>>>>>>>>>>>> Exception: Failed to start
monitoring domain
>>>>>>>>>>>>>>
(sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
>>>>>>>>>>>>>> during domain acquisition
>>>>>>>>>>>>>> MainThread::ERROR::2017-06-27
>>>>>>>>>>>>>>
15:21:01,144::hosted_engine::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>>>>>>>>>>>>>> Shutting down the agent because
of 3 failures in a row!
>>>>>>>>>>>>>> MainThread::INFO::2017-06-27
>>>>>>>>>>>>>>
15:21:06,717::hosted_engine::848::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status)
>>>>>>>>>>>>>> VDSM domain monitor status:
PENDING
>>>>>>>>>>>>>> MainThread::INFO::2017-06-27
>>>>>>>>>>>>>>
15:21:09,335::hosted_engine::776::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_domain_monitor)
>>>>>>>>>>>>>> Failed to stop monitoring domain
>>>>>>>>>>>>>>
(sd_uuid=207221b2-959b-426b-b945-18e1adfed62f): Storage domain is
>>>>>>>>>>>>>> member of pool:
u'domain=207221b2-959b-426b-b945-18e1adfed62f'
>>>>>>>>>>>>>> MainThread::INFO::2017-06-27
>>>>>>>>>>>>>>
15:21:09,339::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(run)
>>>>>>>>>>>>>> Agent shutting down
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for any help,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cam
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Jun 28, 2017 at 11:25 AM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>>>>>>>>> Hi Martin,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> yes, on two of the machines they have the same host_id. The other has
>>>>>>>>>>>>>>> a different host_id.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> To update since yesterday: I reinstalled and deployed Hosted Engine on
>>>>>>>>>>>>>>> the other host (so all three hosts in the cluster now have it
>>>>>>>>>>>>>>> installed). The second one I deployed said it was able to host the
>>>>>>>>>>>>>>> engine (unlike the first I reinstalled), so I tried putting the host
>>>>>>>>>>>>>>> with the Hosted Engine on it into maintenance to see if it would
>>>>>>>>>>>>>>> migrate over. It managed to migrate all the VMs except the Hosted
>>>>>>>>>>>>>>> Engine. And now the host that said it was able to host the engine says
>>>>>>>>>>>>>>> 'unavailable due to HA score'. The host that it was trying to move
>>>>>>>>>>>>>>> from has now been in 'preparing for maintenance' for the last 12 hours.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The summary is:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> kvm-ldn-01 - one of the original, pre-Hosted Engine hosts, reinstalled
>>>>>>>>>>>>>>> with 'Deploy Hosted Engine'. No icon saying it can host the Hosted
>>>>>>>>>>>>>>> Engine; host_id of '2' in /etc/ovirt-hosted-engine/hosted-engine.conf.
>>>>>>>>>>>>>>> 'add_lockspace' fails in sanlock.log.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> kvm-ldn-02 - the other host that was pre-existing before Hosted Engine
>>>>>>>>>>>>>>> was created. Reinstalled with 'Deploy Hosted Engine'. Had an icon
>>>>>>>>>>>>>>> saying that it was able to host the Hosted Engine, but after migration
>>>>>>>>>>>>>>> was attempted when putting kvm-ldn-03 into maintenance, it reports:
>>>>>>>>>>>>>>> 'unavailable due to HA score'. It has a host_id of '1' in
>>>>>>>>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf. No errors in sanlock.log.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> kvm-ldn-03 - this was the host I deployed Hosted Engine on, which was
>>>>>>>>>>>>>>> not part of the original cluster. I restored the bare-metal engine
>>>>>>>>>>>>>>> backup in the Hosted Engine on this host when deploying it, without
>>>>>>>>>>>>>>> error. It currently has the Hosted Engine on it (as the only VM, after
>>>>>>>>>>>>>>> I put that host into maintenance to test the HA of Hosted Engine).
>>>>>>>>>>>>>>> The sanlock log shows conflicts.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I will look through all the logs for any other errors. Please let me
>>>>>>>>>>>>>>> know if you need any logs or other clarification/information.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Campbell
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Jun 28, 2017 at 9:25
AM, Martin Sivak <msivak(a)redhat.com> wrote:
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> can you please check the
contents of
>>>>>>>>>>>>>>>>
/etc/ovirt-hosted-engine/hosted-engine.conf or
>>>>>>>>>>>>>>>>
/etc/ovirt-hosted-engine-ha/agent.conf (I am not sure which one it is
>>>>>>>>>>>>>>>> right now) and search for
host-id?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Make sure the IDs are
different. If they are not, then there is a bug somewhere.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Martin
>>>>>>>>>>>>>>>>
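A quick way to compare those ids across the cluster; a sketch that assumes
root SSH between the hosts and the hosted-engine.conf path already mentioned
in this thread:

    for h in kvm-ldn-01 kvm-ldn-02 kvm-ldn-03; do
        echo -n "$h: "
        ssh "$h" grep ^host_id= /etc/ovirt-hosted-engine/hosted-engine.conf
    done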
>>>>>>>>>>>>>>>> On Tue, Jun 27, 2017 at
6:26 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>>>>>>>>>>> I see this on the
host it is trying to migrate in /var/log/sanlock:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2017-06-27
17:10:40+0100 527703 [2407]: s3528 lockspace
>>>>>>>>>>>>>>>>>
207221b2-959b-426b-b945-18e1adfed62f:1:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0
>>>>>>>>>>>>>>>>> 2017-06-27
17:13:00+0100 527843 [27446]: s3528 delta_acquire host_id 1
>>>>>>>>>>>>>>>>> busy1 1 2 1042692
3d4ec963-8486-43a2-a7d9-afa82508f89f.kvm-ldn-03
>>>>>>>>>>>>>>>>> 2017-06-27
17:13:01+0100 527844 [2407]: s3528 add_lockspace fail result -262
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The sanlock service
is running. Why would this occur?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> C
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Jun 27, 2017
at 5:21 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>>>>>>>>>>>> Hi Martin,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks for the
reply. I have done this, and the deployment completed
>>>>>>>>>>>>>>>>>> without error.
However, it still will not allow the Hosted Engine
>>>>>>>>>>>>>>>>>> migrate to
another host. The
>>>>>>>>>>>>>>>>>>
/etc/ovirt-hosted-engine/hosted-engine.conf got created ok on the host
>>>>>>>>>>>>>>>>>> I re-installed,
but the ovirt-ha-broker.service, though it starts,
>>>>>>>>>>>>>>>>>> reports:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
--------------------8<-------------------
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Jun 27 14:58:26
kvm-ldn-01 systemd[1]: Starting oVirt Hosted Engine
>>>>>>>>>>>>>>>>>> High Availability
Communications Broker...
>>>>>>>>>>>>>>>>>> Jun 27 14:58:27
kvm-ldn-01 ovirt-ha-broker[6101]: ovirt-ha-broker
>>>>>>>>>>>>>>>>>>
ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker ERROR
>>>>>>>>>>>>>>>>>> Failed to read
metadata from
>>>>>>>>>>>>>>>>>>
/rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/hosted-engine.metadata
>>>>>>>>>>>>>>>>>>
Traceback (most
>>>>>>>>>>>>>>>>>> recent call
last):
>>>>>>>>>>>>>>>>>>
File
>>>>>>>>>>>>>>>>>>
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py",
>>>>>>>>>>>>>>>>>> line 129, in
get_raw_stats_for_service_type
>>>>>>>>>>>>>>>>>>
f =
>>>>>>>>>>>>>>>>>> os.open(path,
direct_flag | os.O_RDONLY | os.O_SYNC)
>>>>>>>>>>>>>>>>>>
OSError: [Errno 2]
>>>>>>>>>>>>>>>>>> No such file or
directory:
>>>>>>>>>>>>>>>>>>
'/rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/hosted-engine.metadata'
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
--------------------8<-------------------
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I checked the
path, and it exists. I can run 'less -f' on it fine. The
>>>>>>>>>>>>>>>>>> perms are
slightly different on the host that is running the VM vs the
>>>>>>>>>>>>>>>>>> one that is
reporting errors (600 vs 660), ownership is vdsm:qemu. Is
>>>>>>>>>>>>>>>>>> this a san
locking issue?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks for any
help,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Cam
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Tue, Jun 27,
2017 at 1:41 PM, Martin Sivak <msivak(a)redhat.com> wrote:
>>>>>>>>>>>>>>>>>>>> Should it
be? It was not in the instructions for the migration from
>>>>>>>>>>>>>>>>>>>>
bare-metal to Hosted VM
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The hosted
engine will only migrate to hosts that have the services
>>>>>>>>>>>>>>>>>>> running.
Please put one other host to maintenance and select Hosted
>>>>>>>>>>>>>>>>>>> engine
action: DEPLOY in the reinstall dialog.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Best regards
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Martin Sivak
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Tue, Jun
27, 2017 at 1:23 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> I changed
the 'os.other.devices.display.protocols.value.3.6 =
>>>>>>>>>>>>>>>>>>>>
spice/qxl,vnc/cirrus,vnc/qxl' line to have the same display protocols
>>>>>>>>>>>>>>>>>>>> as 4 and
the hosted engine now appears in the list of VMs. I am
>>>>>>>>>>>>>>>>>>>> guessing
the compatibility version was causing it to use the 3.6
>>>>>>>>>>>>>>>>>>>> version.
However, I am still unable to migrate the engine VM to
>>>>>>>>>>>>>>>>>>>> another
host. When I try putting the host it is currently on into
>>>>>>>>>>>>>>>>>>>>
maintenance, it reports:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Error
while executing action: Cannot switch the Host(s) to Maintenance mode.
>>>>>>>>>>>>>>>>>>>> There are
no available hosts capable of running the engine VM.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Running
'hosted-engine --vm-status' still shows 'Engine status:
>>>>>>>>>>>>>>>>>>>> unknown
stale-data'.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> The
ovirt-ha-broker service is only running on one host. It was set to
>>>>>>>>>>>>>>>>>>>>
'disabled' in systemd. It won't start as there is no
>>>>>>>>>>>>>>>>>>>>
/etc/ovirt-hosted-engine/hosted-engine.conf on the other two hosts.
>>>>>>>>>>>>>>>>>>>> Should it
be? It was not in the instructions for the migration from
>>>>>>>>>>>>>>>>>>>>
bare-metal to Hosted VM
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Cam
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu,
Jun 22, 2017 at 1:07 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>> Hi
Tomas,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> So in
my /usr/share/ovirt-engine/conf/osinfo-defaults.properties on my
>>>>>>>>>>>>>>>>>>>>>
engine VM, I have:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
os.other.devices.display.protocols.value = spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus
>>>>>>>>>>>>>>>>>>>>>
os.other.devices.display.protocols.value.3.6 = spice/qxl,vnc/cirrus,vnc/qxl
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> That
seems to match - I assume since this is 4.1, the 3.6 should not apply
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Is
there somewhere else I should be looking?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
Thanks,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Cam
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On
Thu, Jun 22, 2017 at 11:40 AM, Tomas Jelinek <tjelinek(a)redhat.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
On Thu, Jun 22, 2017 at 12:38 PM, Michal Skrivanek
>>>>>>>>>>>>>>>>>>>>>>
<michal.skrivanek(a)redhat.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
> On 22 Jun 2017, at 12:31, Martin Sivak <msivak(a)redhat.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>>>>>
> Tomas, what fields are needed in a VM to pass the check that causes
>>>>>>>>>>>>>>>>>>>>>>>
> the following error?
>>>>>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>> WARN [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
>>>>>>>>>>>>>>>>>>>>>>>
>>>>> (org.ovirt.thread.pool-6-thread-23) [] Validation of action
>>>>>>>>>>>>>>>>>>>>>>>
>>>>> 'ImportVm'
>>>>>>>>>>>>>>>>>>>>>>>
>>>>> failed for user SYSTEM. Reasons: VAR__ACTION__IMPORT
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>
,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
to match the OS and VM Display type;-)
>>>>>>>>>>>>>>>>>>>>>>>
Configuration is in osinfo….e.g. if that is import from older releases on
>>>>>>>>>>>>>>>>>>>>>>>
Linux this is typically caused by the change of cirrus to vga for non-SPICE
>>>>>>>>>>>>>>>>>>>>>>>
VMs
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
yep, the default supported combinations for 4.0+ is this:
>>>>>>>>>>>>>>>>>>>>>>
os.other.devices.display.protocols.value =
>>>>>>>>>>>>>>>>>>>>>>
spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus
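If that value ever needs overriding, my understanding (an assumption worth
checking against the osinfo documentation for your version) is that it goes
in a drop-in file on the engine rather than in osinfo-defaults.properties
itself, for example:

    # /etc/ovirt-engine/osinfo.conf.d/99-display-protocols.properties
    os.other.devices.display.protocols.value = spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus

    # pick up the change
    systemctl restart ovirt-engine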
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>>>>>
> Thanks.
>>>>>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>>>>>
> On Thu, Jun 22, 2017 at 12:19 PM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>> Hi Martin,
>>>>>>>>>>>>>>>>>>>>>>>
>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>> just as a random comment, do you still have the database backup from
>>>>>>>>>>>>>>>>>>>>>>>
>>> the bare metal -> VM attempt? It might be possible to just try again
>>>>>>>>>>>>>>>>>>>>>>>
>>> using it. Or in the worst case.. update the offending value there
>>>>>>>>>>>>>>>>>>>>>>>
>>> before restoring it to the new engine instance.
>>>>>>>>>>>>>>>>>>>>>>>
>>
>>>>>>>>>>>>>>>>>>>>>>>
>> I still have the backup. I'd rather do the latter, as re-running the
>>>>>>>>>>>>>>>>>>>>>>>
>> HE deployment is quite lengthy and involved (I have to re-initialise
>>>>>>>>>>>>>>>>>>>>>>>
>> the FC storage each time). Do you know what the offending value(s)
>>>>>>>>>>>>>>>>>>>>>>>
>> would be? Would it be in the Postgres DB or in a config file
>>>>>>>>>>>>>>>>>>>>>>>
>> somewhere?
>>>>>>>>>>>>>>>>>>>>>>>
>>
>>>>>>>>>>>>>>>>>>>>>>>
>> Cheers,
>>>>>>>>>>>>>>>>>>>>>>>
>>
>>>>>>>>>>>>>>>>>>>>>>>
>> Cam
>>>>>>>>>>>>>>>>>>>>>>>
>>
>>>>>>>>>>>>>>>>>>>>>>>
>>> Regards
>>>>>>>>>>>>>>>>>>>>>>>
>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>> Martin Sivak
>>>>>>>>>>>>>>>>>>>>>>>
>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>> On Thu, Jun 22, 2017 at 11:39 AM, cmc <iucounu(a)gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>> Hi Yanir,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>> Thanks for the reply.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>> First of all, maybe a chain reaction of :
>>>>>>>>>>>>>>>>>>>>>>>
>>>>> WARN [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
>>>>>>>>>>>>>>>>>>>>>>>
>>>>> (org.ovirt.thread.pool-6-thread-23) [] Validation of action
>>>>>>>>>>>>>>>>>>>>>>>
>>>>> 'ImportVm'
>>>>>>>>>>>>>>>>>>>>>>>
>>>>> failed for user SYSTEM. Reasons: VAR__ACTION__IMPORT
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>
,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
>>>>>>>>>>>>>>>>>>>>>>>
>>>>> is causing the hosted engine vm not to be set up correctly and
>>>>>>>>>>>>>>>>>>>>>>>
>>>>> further
>>>>>>>>>>>>>>>>>>>>>>>
>>>>> actions were made when the hosted engine vm wasn't in a stable state.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>> As for now, are you trying to revert back to a previous/initial
>>>>>>>>>>>>>>>>>>>>>>>
>>>>> state ?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>> I'm not trying to revert it to a previous state for now. This was a
>>>>>>>>>>>>>>>>>>>>>>>
>>>> migration from a bare metal engine, and it didn't report any error
>>>>>>>>>>>>>>>>>>>>>>>
>>>> during the migration. I'd had some problems on my first attempts at
>>>>>>>>>>>>>>>>>>>>>>>
>>>> this migration, whereby it never completed (due to a proxy issue) but
>>>>>>>>>>>>>>>>>>>>>>>
>>>> I managed to resolve this. Do you know of a way to get the Hosted
>>>>>>>>>>>>>>>>>>>>>>>
>>>> Engine VM into a stable state, without rebuilding the entire cluster
>>>>>>>>>>>>>>>>>>>>>>>
>>>> from scratch (since I have a lot of VMs on it)?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>> Thanks for any help.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>> Regards,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>> Cam
>>>>>>>>>>>>>>>>>>>>>>>
>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>> Regards,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>> Yanir
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>> On Wed, Jun 21, 2017 at 4:32 PM, cmc <iucounu(a)gmail.com>
wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>> Hi Jenny/Martin,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>> Any idea what I can do here? The hosted engine VM has no log on
any
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>> host in /var/log/libvirt/qemu, and I fear that if I need to put
the
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>> host into maintenance, e.g., to upgrade it that I created it on
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>> (which
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>> I think is hosting it), or if it fails for any reason, it
won't get
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>> migrated to another host, and I will not be able to manage the
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>> cluster. It seems to be a very dangerous position to be in.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>> Cam
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>> On Wed, Jun 21, 2017 at 11:48 AM, cmc <iucounu(a)gmail.com>
wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>> Thanks Martin. The hosts are all part of the same cluster.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>> I get these errors in the engine.log on the engine:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>> 2017-06-19 03:28:05,030Z WARN
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>> [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>> (org.ovirt.thread.pool-6-thread-23) [] Validation of action
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>> 'ImportVm'
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>> failed for user SYST
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>> EM. Reasons:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>
VAR__ACTION__IMPORT,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>> 2017-06-19 03:28:05,030Z INFO
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>> [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>> (org.ovirt.thread.pool-6-thread-23) [] Lock freed to object
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>> 'EngineLock:{exclusiveLocks='[a
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>> 79e6b0e-fff4-4cba-a02c-4c00be151300=<VM,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>> ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName
HostedEngine>,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>> HostedEngine=<VM_NAME,
ACTION_TYPE_FAILED_NAME_ALREADY_USED>]',
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>> sharedLocks=
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>> '[a79e6b0e-fff4-4cba-a02c-4c00be151300=<REMOTE_VM,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>> ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName
HostedEngine>]'}'
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>> 2017-06-19 03:28:05,030Z ERROR
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>> [org.ovirt.engine.core.bll.HostedEngineImporter]
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>> (org.ovirt.thread.pool-6-thread-23) [] Failed importing the
Hosted
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>> Engine VM
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>> The sanlock.log reports conflicts on that same host, and a
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>> different
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>> error on the other hosts, not sure if they are related.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>> And this in the /var/log/ovirt-hosted-engine-ha/agent log on
the
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>> host
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>> which I deployed the hosted engine VM on:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>> MainThread::ERROR::2017-06-19
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>
13:09:49,743::ovf_store::124::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>> Unable to extract HEVM OVF
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>> MainThread::ERROR::2017-06-19
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>
13:09:49,743::config::445::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>> Failed extracting VM OVF from the OVF_STORE volume, falling
back
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>> initial vm.conf
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>> I've seen some of these issues reported in bugzilla, but
they were
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>> for
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>> older versions of oVirt (and appear to be resolved).
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>> I will install that package on the other two hosts, for which
I
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>> will
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>> put them in maintenance as vdsm is installed as an upgrade.
I
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>> guess
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>> restarting vdsm is a good idea after that?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>> Campbell
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>> On Wed, Jun 21, 2017 at 10:51 AM, Martin Sivak
<msivak(a)redhat.com>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>> you do not have to install it on all hosts. But you
should have
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>> more
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>> than one and ideally all hosted engine enabled nodes
should
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>> belong to
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>> the same engine cluster.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>> Best regards
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>> Martin Sivak
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>> On Wed, Jun 21, 2017 at 11:29 AM, cmc
<iucounu(a)gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>> Hi Jenny,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>> Does ovirt-hosted-engine-ha need to be installed
across all
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>> hosts?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>> Could that be the reason it is failing to see it
properly?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>> Cam
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>> On Mon, Jun 19, 2017 at 1:27 PM, cmc
<iucounu(a)gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>> Hi Jenny,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>> Logs are attached. I can see errors in there, but
am unsure how
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>> they
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>> arose.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>> Campbell
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>> On Mon, Jun 19, 2017 at 12:29 PM, Evgenia Tokar
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>> <etokar(a)redhat.com>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>> From the output it looks like the agent is
down, try starting
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>> it by
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>> running:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>> systemctl start ovirt-ha-agent.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>> The engine is supposed to see the hosted
engine storage domain
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>> import it
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>> to the system, then it should import the
hosted engine vm.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>> Can you attach the agent log from the host
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>> (/var/log/ovirt-hosted-engine-ha/agent.log)
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>> and the engine log from the engine vm
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>> (/var/log/ovirt-engine/engine.log)?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>> Jenny
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jun 19, 2017 at 12:41 PM, cmc
<iucounu(a)gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> Hi Jenny,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>> What version are you running?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> 4.1.2.2-1.el7.centos
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>> For the hosted engine vm to be
imported and displayed in the
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>> engine, you
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>> must first create a master storage
domain.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> To provide a bit more detail: this was a
migration of a
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> bare-metal
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> engine in an existing cluster to a hosted
engine VM for that
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> cluster.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> As part of this migration, I built an
entirely new host and
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> ran
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> 'hosted-engine --deploy'
(followed these instructions:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
http://www.ovirt.org/documentation/self-hosted/chap-Migrating_from_Bare_M...).
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> I restored the backup from the engine and
it completed
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> without any
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> errors. I didn't see any instructions
regarding a master
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> storage
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> domain in the page above. The cluster has
two existing master
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> storage
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> domains, one is fibre channel, which is
up, and one ISO
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> domain,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> which
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> is currently offline.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>> What do you mean the hosted engine
commands are failing?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>> What
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>> happens
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>> when
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>> you run hosted-engine --vm-status
now?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> Interestingly, whereas when I ran it
before, it exited with
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> no
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> output
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> and a return code of '1', it now
reports:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> --== Host 1 status ==--
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> conf_on_shared_storage :
True
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> Status up-to-date :
False
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> Hostname :
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> kvm-ldn-03.ldn.fscfc.co.uk
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> Host ID : 1
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> Engine status :
unknown stale-data
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> Score : 0
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> stopped :
True
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> Local maintenance :
False
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> crc32 :
0217f07b
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> local_conf_timestamp :
2911
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> Host timestamp :
2897
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> Extra metadata (valid at timestamp):
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> metadata_parse_version=1
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> metadata_feature_version=1
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> timestamp=2897 (Thu Jun 15
16:22:54 2017)
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> host-id=1
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> score=0
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> vm_conf_refresh_time=2911 (Thu Jun
15 16:23:08 2017)
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> conf_on_shared_storage=True
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> maintenance=False
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> state=AgentStopped
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> stopped=True
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> Yet I can login to the web GUI fine. I
guess it is not HA due
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> being
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> in an unknown state currently? Does the
hosted-engine-ha rpm
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> need
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> be installed across all nodes in the
cluster, btw?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for the help,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> Cam
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>> Jenny Tokar
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Jun 15, 2017 at 6:32 PM, cmc
<iucounu(a)gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I've migrated from a
bare-metal engine to a hosted engine.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> There
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> were
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> no errors during the install,
however, the hosted engine
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> did not
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> get
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> started. I tried running:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> hosted-engine --status
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> on the host I deployed it on, and
it returns nothing (exit
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> code
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> is 1
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> however). I could not ping it
either. So I tried starting
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> it via
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 'hosted-engine
--vm-start' and it returned:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Virtual machine does not exist
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> But it then became available. I
logged into it
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> successfully. It
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> is not
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> in the list of VMs however.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Any ideas why the hosted-engine
commands fail, and why it
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> is not
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> the list of virtual machines?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for any help,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cam