
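The message just below describes a cluster where nothing, including the hosted engine, starts after power-cycling all nodes and leaving global maintenance. As a first pass for that state, a minimal sketch of checks; the service names and log paths are the defaults referred to elsewhere in this thread:

--------------------8<-------------------
# On each hosted-engine host: are the HA services, VDSM and sanlock up?
systemctl status ovirt-ha-broker ovirt-ha-agent vdsmd sanlock

# What do the HA agents think of the engine VM and its storage?
hosted-engine --vm-status

# If the agents look healthy but the engine VM is down, try starting it by hand
hosted-engine --vm-start

# Then follow the HA logs for the reason nothing is starting
tail -f /var/log/ovirt-hosted-engine-ha/agent.log /var/log/ovirt-hosted-engine-ha/broker.log
--------------------8<-------------------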
Help! I put the cluster into global maintenance, then powered off and powered back on all of the nodes. I have since taken it out of global maintenance. No VM has started, including the hosted engine. This is very bad. I am going to look through the logs to see why nothing has started. Help greatly appreciated.

Thanks,

Cam

On Fri, Jun 30, 2017 at 1:00 PM, cmc <iucounu@gmail.com> wrote:
So I can run from any node: hosted-engine --set-maintenance --mode=global. By 'agents', you mean the ovirt-ha-agent, right? This shouldn't affect the running of any VMs, correct? Sorry for the questions, just want to do it correctly and not make assumptions :)
Cheers,
C
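For reference, a minimal sketch of the global maintenance commands being discussed here; this only pauses the HA agents' handling of the engine VM and is not expected to affect other running VMs (treat the exact status wording as indicative):

--------------------8<-------------------
# Enter global maintenance (stops the HA agents acting on the engine VM)
hosted-engine --set-maintenance --mode=global

# Verify: the status output should flag global maintenance
hosted-engine --vm-status

# Leave global maintenance again afterwards
hosted-engine --set-maintenance --mode=none
--------------------8<-------------------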
On Fri, Jun 30, 2017 at 12:12 PM, Martin Sivak <msivak@redhat.com> wrote:
Hi,
> Just to clarify: you mean the host_id in /etc/ovirt-hosted-engine/hosted-engine.conf should match the spm_id, correct?
Exactly.
Put the cluster into global maintenance first. Or kill all agents (which has the same effect).
Martin
On Fri, Jun 30, 2017 at 12:47 PM, cmc <iucounu@gmail.com> wrote:
Just to clarify: you mean the host_id in /etc/ovirt-hosted-engine/hosted-engine.conf should match the spm_id, correct?
On Fri, Jun 30, 2017 at 9:47 AM, Martin Sivak <msivak@redhat.com> wrote:
Hi,
Cleaning the metadata won't help in this case. Try transferring the spm_ids you got from the engine to the corresponding hosted engine hosts, so that each host's hosted engine host_id matches its spm_id. Then restart all the hosted engine services. I would actually recommend restarting all hosts after this change, but I have no idea how many VMs you have running.
Martin
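A rough sketch of what transferring the spm_ids can look like in practice. The SQL assumes the engine database is called 'engine' and that the per-host SPM id is exposed as vds_spm_id in vds_static, so treat the query as illustrative rather than a verified recipe:

--------------------8<-------------------
# On the engine VM: list the SPM id the engine has assigned to each host
sudo -u postgres psql engine -c "SELECT vds_name, vds_spm_id FROM vds_static;"

# On each hosted engine host: check the locally configured host_id
grep host_id /etc/ovirt-hosted-engine/hosted-engine.conf

# If it differs from that host's vds_spm_id, edit host_id=<spm_id> and restart
systemctl restart ovirt-ha-broker ovirt-ha-agent
--------------------8<-------------------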
On Thu, Jun 29, 2017 at 8:27 PM, cmc <iucounu@gmail.com> wrote:
Tried running 'hosted-engine --clean-metadata' as per https://bugzilla.redhat.com/show_bug.cgi?id=1350539, since ovirt-ha-agent was not running anyway, but it fails with the following error:
ERROR:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Failed to start monitoring domain (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout during domain acquisition
ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 191, in _run_agent
    return action(he)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 67, in action_clean
    return he.clean(options.force_cleanup)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 345, in clean
    self._initialize_domain_monitor()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 823, in _initialize_domain_monitor
    raise Exception(msg)
Exception: Failed to start monitoring domain (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout during domain acquisition
ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Trying to restart agent
WARNING:ovirt_hosted_engine_ha.agent.agent.Agent:Restarting agent, attempt '0'
ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Too many errors occurred, giving up. Please review the log and consider filing a bug.
INFO:ovirt_hosted_engine_ha.agent.agent.Agent:Agent shutting down
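The "timeout during domain acquisition" above (and the SanlockInitializationError mentioned in the next message) points at sanlock, so it can help to look at which lockspaces and host ids sanlock actually holds on each host before retrying. A small sketch, assuming the standard sanlock client tooling; option spellings may vary between versions:

--------------------8<-------------------
# Sanlock's view of its lockspaces and resources on this host
sanlock client status

# Delta leases (host ids) held in the hosted engine storage domain's lockspace
sanlock client host_status -s 207221b2-959b-426b-b945-18e1adfed62f

# The daemon log usually names the conflicting host id
tail -n 50 /var/log/sanlock.log
--------------------8<-------------------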
On Thu, Jun 29, 2017 at 6:10 PM, cmc <iucounu@gmail.com> wrote:
Actually, it looks like sanlock problems:
"SanlockInitializationError: Failed to initialize sanlock, the number of errors has exceeded the limit"
On Thu, Jun 29, 2017 at 5:10 PM, cmc <iucounu@gmail.com> wrote: > Sorry, I am mistaken, two hosts failed for the agent with the following error: > > ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine > ERROR Failed to start monitoring domain > (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout > during domain acquisition > ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine > ERROR Shutting down the agent because of 3 failures in a row! > > What could cause these timeouts? Some other service not running? > > On Thu, Jun 29, 2017 at 5:03 PM, cmc <iucounu@gmail.com> wrote: >> Both services are up on all three hosts. The broke logs just report: >> >> Thread-6549::INFO::2017-06-29 >> 17:01:51,481::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) >> Connection established >> Thread-6549::INFO::2017-06-29 >> 17:01:51,483::listener::186::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) >> Connection closed >> >> Thanks, >> >> Cam >> >> On Thu, Jun 29, 2017 at 4:00 PM, Martin Sivak <msivak@redhat.com> wrote: >>> Hi, >>> >>> please make sure that both ovirt-ha-agent and ovirt-ha-broker services >>> are restarted and up. The error says the agent can't talk to the >>> broker. Is there anything in the broker.log? >>> >>> Best regards >>> >>> Martin Sivak >>> >>> On Thu, Jun 29, 2017 at 4:42 PM, cmc <iucounu@gmail.com> wrote: >>>> I've restarted those two services across all hosts, have taken the >>>> Hosted Engine host out of maintenance, and when I try to migrate the >>>> Hosted Engine over to another host, it reports that all three hosts >>>> 'did not satisfy internal filter HA because it is not a Hosted Engine >>>> host'. >>>> >>>> On the host that the Hosted Engine is currently on it reports in the agent.log: >>>> >>>> ovirt-ha-agent ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR >>>> Connection closed: Connection closed >>>> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent >>>> ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Exception >>>> getting service path: Connection closed >>>> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent >>>> ovirt_hosted_engine_ha.agent.agent.Agent ERROR Traceback (most recent >>>> call last): >>>> File >>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", >>>> line 191, in _run_agent >>>> return action(he) >>>> File >>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", >>>> line 64, in action_proper >>>> return >>>> he.start_monitoring() >>>> File >>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>> line 411, in start_monitoring >>>> self._initialize_sanlock() >>>> File >>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>> line 691, in _initialize_sanlock >>>> >>>> constants.SERVICE_TYPE + constants.LOCKSPACE_EXTENSION) >>>> File >>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", >>>> line 162, in get_service_path >>>> .format(str(e))) >>>> RequestError: Failed >>>> to get service path: Connection closed >>>> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent >>>> ovirt_hosted_engine_ha.agent.agent.Agent ERROR Trying to restart agent >>>> >>>> On Thu, Jun 29, 2017 at 1:25 PM, Martin Sivak <msivak@redhat.com> wrote: >>>>> Hi, >>>>> >>>>> yep, you have to restart the ovirt-ha-agent and ovirt-ha-broker services. 
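A minimal sketch of that restart and a quick sanity check afterwards; restarting the broker before the agent is just one reasonable ordering, since the agent connects to the broker:

--------------------8<-------------------
# On every hosted engine host
systemctl restart ovirt-ha-broker
systemctl restart ovirt-ha-agent
systemctl status ovirt-ha-broker ovirt-ha-agent

# Expect "Connection established" rather than RequestError / "Connection closed"
tail -f /var/log/ovirt-hosted-engine-ha/broker.log /var/log/ovirt-hosted-engine-ha/agent.log
--------------------8<-------------------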
>>>>> >>>>> The scheduling message just means that the host has score 0 or is not >>>>> reporting score at all. >>>>> >>>>> Martin >>>>> >>>>> On Thu, Jun 29, 2017 at 1:33 PM, cmc <iucounu@gmail.com> wrote: >>>>>> Thanks Martin, do I have to restart anything? When I try to use the >>>>>> 'migrate' operation, it complains that the other two hosts 'did not >>>>>> satisfy internal filter HA because it is not a Hosted Engine host..' >>>>>> (even though I reinstalled both these hosts with the 'deploy hosted >>>>>> engine' option, which suggests that something needs restarting. Should >>>>>> I worry about the sanlock errors, or will that be resolved by the >>>>>> change in host_id? >>>>>> >>>>>> Kind regards, >>>>>> >>>>>> Cam >>>>>> >>>>>> On Thu, Jun 29, 2017 at 12:22 PM, Martin Sivak <msivak@redhat.com> wrote: >>>>>>> Change the ids so they are distinct. I need to check if there is a way >>>>>>> to read the SPM ids from the engine as using the same numbers would be >>>>>>> the best. >>>>>>> >>>>>>> Martin >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Thu, Jun 29, 2017 at 12:46 PM, cmc <iucounu@gmail.com> wrote: >>>>>>>> Is there any way of recovering from this situation? I'd prefer to fix >>>>>>>> the issue rather than re-deploy, but if there is no recovery path, I >>>>>>>> could perhaps try re-deploying the hosted engine. In which case, would >>>>>>>> the best option be to take a backup of the Hosted Engine, and then >>>>>>>> shut it down, re-initialise the SAN partition (or use another >>>>>>>> partition) and retry the deployment? Would it be better to use the >>>>>>>> older backup from the bare metal engine that I originally used, or use >>>>>>>> a backup from the Hosted Engine? I'm not sure if any VMs have been >>>>>>>> added since switching to Hosted Engine. >>>>>>>> >>>>>>>> Unfortunately I have very little time left to get this working before >>>>>>>> I have to hand it over for eval (by end of Friday). 
>>>>>>>> >>>>>>>> Here are some log snippets from the cluster that are current >>>>>>>> >>>>>>>> In /var/log/vdsm/vdsm.log on the host that has the Hosted Engine: >>>>>>>> >>>>>>>> 2017-06-29 10:50:15,071+0100 INFO (monitor/207221b) [storage.SANLock] >>>>>>>> Acquiring host id for domain 207221b2-959b-426b-b945-18e1adfed62f (id: >>>>>>>> 3) (clusterlock:282) >>>>>>>> 2017-06-29 10:50:15,072+0100 ERROR (monitor/207221b) [storage.Monitor] >>>>>>>> Error acquiring host id 3 for domain >>>>>>>> 207221b2-959b-426b-b945-18e1adfed62f (monitor:558) >>>>>>>> Traceback (most recent call last): >>>>>>>> File "/usr/share/vdsm/storage/monitor.py", line 555, in _acquireHostId >>>>>>>> self.domain.acquireHostId(self.hostId, async=True) >>>>>>>> File "/usr/share/vdsm/storage/sd.py", line 790, in acquireHostId >>>>>>>> self._manifest.acquireHostId(hostId, async) >>>>>>>> File "/usr/share/vdsm/storage/sd.py", line 449, in acquireHostId >>>>>>>> self._domainLock.acquireHostId(hostId, async) >>>>>>>> File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", >>>>>>>> line 297, in acquireHostId >>>>>>>> raise se.AcquireHostIdFailure(self._sdUUID, e) >>>>>>>> AcquireHostIdFailure: Cannot acquire host id: >>>>>>>> ('207221b2-959b-426b-b945-18e1adfed62f', SanlockException(22, 'Sanlock >>>>>>>> lockspace add failure', 'Invalid argument')) >>>>>>>> >>>>>>>> From /var/log/ovirt-hosted-engine-ha/agent.log on the same host: >>>>>>>> >>>>>>>> MainThread::ERROR::2017-06-19 >>>>>>>> 13:30:50,592::hosted_engine::822::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_domain_monitor) >>>>>>>> Failed to start monitoring domain >>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout >>>>>>>> during domain acquisition >>>>>>>> MainThread::WARNING::2017-06-19 >>>>>>>> 13:30:50,593::hosted_engine::469::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) >>>>>>>> Error while monitoring engine: Failed to start monitoring domain >>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout >>>>>>>> during domain acquisition >>>>>>>> MainThread::WARNING::2017-06-19 >>>>>>>> 13:30:50,593::hosted_engine::472::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) >>>>>>>> Unexpected error >>>>>>>> Traceback (most recent call last): >>>>>>>> File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>>>>>> line 443, in start_monitoring >>>>>>>> self._initialize_domain_monitor() >>>>>>>> File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>>>>>> line 823, in _initialize_domain_monitor >>>>>>>> raise Exception(msg) >>>>>>>> Exception: Failed to start monitoring domain >>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout >>>>>>>> during domain acquisition >>>>>>>> MainThread::ERROR::2017-06-19 >>>>>>>> 13:30:50,593::hosted_engine::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) >>>>>>>> Shutting down the agent because of 3 failures in a row! 
>>>>>>>> >>>>>>>> From sanlock.log: >>>>>>>> >>>>>>>> 2017-06-29 11:17:06+0100 1194149 [2530]: add_lockspace >>>>>>>> 207221b2-959b-426b-b945-18e1adfed62f:3:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0 >>>>>>>> conflicts with name of list1 s5 >>>>>>>> 207221b2-959b-426b-b945-18e1adfed62f:1:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0 >>>>>>>> >>>>>>>> From the two other hosts: >>>>>>>> >>>>>>>> host 2: >>>>>>>> >>>>>>>> vdsm.log >>>>>>>> >>>>>>>> 2017-06-29 10:53:47,755+0100 ERROR (jsonrpc/4) [jsonrpc.JsonRpcServer] >>>>>>>> Internal server error (__init__:570) >>>>>>>> Traceback (most recent call last): >>>>>>>> File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line >>>>>>>> 565, in _handle_request >>>>>>>> res = method(**params) >>>>>>>> File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line >>>>>>>> 202, in _dynamicMethod >>>>>>>> result = fn(*methodArgs) >>>>>>>> File "/usr/share/vdsm/API.py", line 1454, in getAllVmIoTunePolicies >>>>>>>> io_tune_policies_dict = self._cif.getAllVmIoTunePolicies() >>>>>>>> File "/usr/share/vdsm/clientIF.py", line 448, in getAllVmIoTunePolicies >>>>>>>> 'current_values': v.getIoTune()} >>>>>>>> File "/usr/share/vdsm/virt/vm.py", line 2803, in getIoTune >>>>>>>> result = self.getIoTuneResponse() >>>>>>>> File "/usr/share/vdsm/virt/vm.py", line 2816, in getIoTuneResponse >>>>>>>> res = self._dom.blockIoTune( >>>>>>>> File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line >>>>>>>> 47, in __getattr__ >>>>>>>> % self.vmid) >>>>>>>> NotConnectedError: VM u'a79e6b0e-fff4-4cba-a02c-4c00be151300' was not >>>>>>>> started yet or was shut down >>>>>>>> >>>>>>>> /var/log/ovirt-hosted-engine-ha/agent.log >>>>>>>> >>>>>>>> MainThread::INFO::2017-06-29 >>>>>>>> 10:56:33,636::ovf_store::103::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) >>>>>>>> Found OVF_STORE: imgUUID:222610db-7880-4f4f-8559-a3635fd73555, >>>>>>>> volUUID:c6e0d29b-eabf-4a09-a330-df54cfdd73f1 >>>>>>>> MainThread::INFO::2017-06-29 >>>>>>>> 10:56:33,926::ovf_store::112::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) >>>>>>>> Extracting Engine VM OVF from the OVF_STORE >>>>>>>> MainThread::INFO::2017-06-29 >>>>>>>> 10:56:33,938::ovf_store::119::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) >>>>>>>> OVF_STORE volume path: >>>>>>>> /rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/images/222610db-7880-4f4f-8559-a3635fd73555/c6e0d29b-eabf-4a09-a330-df54cfdd73f1 >>>>>>>> MainThread::INFO::2017-06-29 >>>>>>>> 10:56:33,967::config::431::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>>>>>>> Found an OVF for HE VM, trying to convert >>>>>>>> MainThread::INFO::2017-06-29 >>>>>>>> 10:56:33,971::config::436::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>>>>>>> Got vm.conf from OVF_STORE >>>>>>>> MainThread::INFO::2017-06-29 >>>>>>>> 10:56:36,736::states::678::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) >>>>>>>> Score is 0 due to unexpected vm shutdown at Thu Jun 29 10:53:59 2017 >>>>>>>> MainThread::INFO::2017-06-29 >>>>>>>> 10:56:36,736::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) >>>>>>>> Current state EngineUnexpectedlyDown (score: 0) >>>>>>>> MainThread::INFO::2017-06-29 >>>>>>>> 10:56:46,772::config::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf) >>>>>>>> Reloading vm.conf 
from the shared storage domain >>>>>>>> >>>>>>>> /var/log/messages: >>>>>>>> >>>>>>>> Jun 29 10:53:46 kvm-ldn-02 kernel: dd: sending ioctl 80306d02 to a partition! >>>>>>>> >>>>>>>> >>>>>>>> host 1: >>>>>>>> >>>>>>>> /var/log/messages also in sanlock.log >>>>>>>> >>>>>>>> Jun 29 11:01:02 kvm-ldn-01 sanlock[2400]: 2017-06-29 11:01:02+0100 >>>>>>>> 678325 [9132]: s4531 delta_acquire host_id 1 busy1 1 2 1193177 >>>>>>>> 3d4ec963-8486-43a2-a7d9-afa82508f89f.kvm-ldn-03 >>>>>>>> Jun 29 11:01:03 kvm-ldn-01 sanlock[2400]: 2017-06-29 11:01:03+0100 >>>>>>>> 678326 [24159]: s4531 add_lockspace fail result -262 >>>>>>>> >>>>>>>> /var/log/ovirt-hosted-engine-ha/agent.log: >>>>>>>> >>>>>>>> MainThread::ERROR::2017-06-27 >>>>>>>> 15:21:01,143::hosted_engine::822::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_domain_monitor) >>>>>>>> Failed to start monitoring domain >>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout >>>>>>>> during domain acquisition >>>>>>>> MainThread::WARNING::2017-06-27 >>>>>>>> 15:21:01,144::hosted_engine::469::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) >>>>>>>> Error while monitoring engine: Failed to start monitoring domain >>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout >>>>>>>> during domain acquisition >>>>>>>> MainThread::WARNING::2017-06-27 >>>>>>>> 15:21:01,144::hosted_engine::472::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) >>>>>>>> Unexpected error >>>>>>>> Traceback (most recent call last): >>>>>>>> File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>>>>>> line 443, in start_monitoring >>>>>>>> self._initialize_domain_monitor() >>>>>>>> File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>>>>>> line 823, in _initialize_domain_monitor >>>>>>>> raise Exception(msg) >>>>>>>> Exception: Failed to start monitoring domain >>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout >>>>>>>> during domain acquisition >>>>>>>> MainThread::ERROR::2017-06-27 >>>>>>>> 15:21:01,144::hosted_engine::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) >>>>>>>> Shutting down the agent because of 3 failures in a row! >>>>>>>> MainThread::INFO::2017-06-27 >>>>>>>> 15:21:06,717::hosted_engine::848::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) >>>>>>>> VDSM domain monitor status: PENDING >>>>>>>> MainThread::INFO::2017-06-27 >>>>>>>> 15:21:09,335::hosted_engine::776::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_domain_monitor) >>>>>>>> Failed to stop monitoring domain >>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f): Storage domain is >>>>>>>> member of pool: u'domain=207221b2-959b-426b-b945-18e1adfed62f' >>>>>>>> MainThread::INFO::2017-06-27 >>>>>>>> 15:21:09,339::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(run) >>>>>>>> Agent shutting down >>>>>>>> >>>>>>>> >>>>>>>> Thanks for any help, >>>>>>>> >>>>>>>> >>>>>>>> Cam >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Jun 28, 2017 at 11:25 AM, cmc <iucounu@gmail.com> wrote: >>>>>>>>> Hi Martin, >>>>>>>>> >>>>>>>>> yes, on two of the machines they have the same host_id. The other has >>>>>>>>> a different host_id. >>>>>>>>> >>>>>>>>> To update since yesterday: I reinstalled and deployed Hosted Engine on >>>>>>>>> the other host (so all three hosts in the cluster now have it >>>>>>>>> installed). 
The second one I deployed said it was able to host the >>>>>>>>> engine (unlike the first I reinstalled), so I tried putting the host >>>>>>>>> with the Hosted Engine on it into maintenance to see if it would >>>>>>>>> migrate over. It managed to move all hosts but the Hosted Engine. And >>>>>>>>> now the host that said it was able to host the engine says >>>>>>>>> 'unavailable due to HA score'. The host that it was trying to move >>>>>>>>> from is now in 'preparing for maintenance' for the last 12 hours. >>>>>>>>> >>>>>>>>> The summary is: >>>>>>>>> >>>>>>>>> kvm-ldn-01 - one of the original, pre-Hosted Engine hosts, reinstalled >>>>>>>>> with 'Deploy Hosted Engine'. No icon saying it can host the Hosted >>>>>>>>> Hngine, host_id of '2' in /etc/ovirt-hosted-engine/hosted-engine.conf. >>>>>>>>> 'add_lockspace' fails in sanlock.log >>>>>>>>> >>>>>>>>> kvm-ldn-02 - the other host that was pre-existing before Hosted Engine >>>>>>>>> was created. Reinstalled with 'Deploy Hosted Engine'. Had an icon >>>>>>>>> saying that it was able to host the Hosted Engine, but after migration >>>>>>>>> was attempted when putting kvm-ldn-03 into maintenance, it reports: >>>>>>>>> 'unavailable due to HA score'. It has a host_id of '1' in >>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf. No errors in sanlock.log >>>>>>>>> >>>>>>>>> kvm-ldn-03 - this was the host I deployed Hosted Engine on, which was >>>>>>>>> not part of the original cluster. I restored the bare-metal engine >>>>>>>>> backup in the Hosted Engine on this host when deploying it, without >>>>>>>>> error. It currently has the Hosted Engine on it (as the only VM after >>>>>>>>> I put that host into maintenance to test the HA of Hosted Engine). >>>>>>>>> Sanlock log shows conflicts >>>>>>>>> >>>>>>>>> I will look through all the logs for any other errors. Please let me >>>>>>>>> know if you need any logs or other clarification/information. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> Campbell >>>>>>>>> >>>>>>>>> On Wed, Jun 28, 2017 at 9:25 AM, Martin Sivak <msivak@redhat.com> wrote: >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> can you please check the contents of >>>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf or >>>>>>>>>> /etc/ovirt-hosted-engine-ha/agent.conf (I am not sure which one it is >>>>>>>>>> right now) and search for host-id? >>>>>>>>>> >>>>>>>>>> Make sure the IDs are different. If they are not, then there is a bug somewhere. >>>>>>>>>> >>>>>>>>>> Martin >>>>>>>>>> >>>>>>>>>> On Tue, Jun 27, 2017 at 6:26 PM, cmc <iucounu@gmail.com> wrote: >>>>>>>>>>> I see this on the host it is trying to migrate in /var/log/sanlock: >>>>>>>>>>> >>>>>>>>>>> 2017-06-27 17:10:40+0100 527703 [2407]: s3528 lockspace >>>>>>>>>>> 207221b2-959b-426b-b945-18e1adfed62f:1:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0 >>>>>>>>>>> 2017-06-27 17:13:00+0100 527843 [27446]: s3528 delta_acquire host_id 1 >>>>>>>>>>> busy1 1 2 1042692 3d4ec963-8486-43a2-a7d9-afa82508f89f.kvm-ldn-03 >>>>>>>>>>> 2017-06-27 17:13:01+0100 527844 [2407]: s3528 add_lockspace fail result -262 >>>>>>>>>>> >>>>>>>>>>> The sanlock service is running. Why would this occur? >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> >>>>>>>>>>> C >>>>>>>>>>> >>>>>>>>>>> On Tue, Jun 27, 2017 at 5:21 PM, cmc <iucounu@gmail.com> wrote: >>>>>>>>>>>> Hi Martin, >>>>>>>>>>>> >>>>>>>>>>>> Thanks for the reply. I have done this, and the deployment completed >>>>>>>>>>>> without error. However, it still will not allow the Hosted Engine >>>>>>>>>>>> migrate to another host. 
The >>>>>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf got created ok on the host >>>>>>>>>>>> I re-installed, but the ovirt-ha-broker.service, though it starts, >>>>>>>>>>>> reports: >>>>>>>>>>>> >>>>>>>>>>>> --------------------8<------------------- >>>>>>>>>>>> >>>>>>>>>>>> Jun 27 14:58:26 kvm-ldn-01 systemd[1]: Starting oVirt Hosted Engine >>>>>>>>>>>> High Availability Communications Broker... >>>>>>>>>>>> Jun 27 14:58:27 kvm-ldn-01 ovirt-ha-broker[6101]: ovirt-ha-broker >>>>>>>>>>>> ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker ERROR >>>>>>>>>>>> Failed to read metadata from >>>>>>>>>>>> /rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/hosted-engine.metadata >>>>>>>>>>>> Traceback (most >>>>>>>>>>>> recent call last): >>>>>>>>>>>> File >>>>>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", >>>>>>>>>>>> line 129, in get_raw_stats_for_service_type >>>>>>>>>>>> f = >>>>>>>>>>>> os.open(path, direct_flag | os.O_RDONLY | os.O_SYNC) >>>>>>>>>>>> OSError: [Errno 2] >>>>>>>>>>>> No such file or directory: >>>>>>>>>>>> '/rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/hosted-engine.metadata' >>>>>>>>>>>> >>>>>>>>>>>> --------------------8<------------------- >>>>>>>>>>>> >>>>>>>>>>>> I checked the path, and it exists. I can run 'less -f' on it fine. The >>>>>>>>>>>> perms are slightly different on the host that is running the VM vs the >>>>>>>>>>>> one that is reporting errors (600 vs 660), ownership is vdsm:qemu. Is >>>>>>>>>>>> this a san locking issue? >>>>>>>>>>>> >>>>>>>>>>>> Thanks for any help, >>>>>>>>>>>> >>>>>>>>>>>> Cam >>>>>>>>>>>> >>>>>>>>>>>> On Tue, Jun 27, 2017 at 1:41 PM, Martin Sivak <msivak@redhat.com> wrote: >>>>>>>>>>>>>> Should it be? It was not in the instructions for the migration from >>>>>>>>>>>>>> bare-metal to Hosted VM >>>>>>>>>>>>> >>>>>>>>>>>>> The hosted engine will only migrate to hosts that have the services >>>>>>>>>>>>> running. Please put one other host to maintenance and select Hosted >>>>>>>>>>>>> engine action: DEPLOY in the reinstall dialog. >>>>>>>>>>>>> >>>>>>>>>>>>> Best regards >>>>>>>>>>>>> >>>>>>>>>>>>> Martin Sivak >>>>>>>>>>>>> >>>>>>>>>>>>> On Tue, Jun 27, 2017 at 1:23 PM, cmc <iucounu@gmail.com> wrote: >>>>>>>>>>>>>> I changed the 'os.other.devices.display.protocols.value.3.6 = >>>>>>>>>>>>>> spice/qxl,vnc/cirrus,vnc/qxl' line to have the same display protocols >>>>>>>>>>>>>> as 4 and the hosted engine now appears in the list of VMs. I am >>>>>>>>>>>>>> guessing the compatibility version was causing it to use the 3.6 >>>>>>>>>>>>>> version. However, I am still unable to migrate the engine VM to >>>>>>>>>>>>>> another host. When I try putting the host it is currently on into >>>>>>>>>>>>>> maintenance, it reports: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Error while executing action: Cannot switch the Host(s) to Maintenance mode. >>>>>>>>>>>>>> There are no available hosts capable of running the engine VM. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Running 'hosted-engine --vm-status' still shows 'Engine status: >>>>>>>>>>>>>> unknown stale-data'. >>>>>>>>>>>>>> >>>>>>>>>>>>>> The ovirt-ha-broker service is only running on one host. It was set to >>>>>>>>>>>>>> 'disabled' in systemd. It won't start as there is no >>>>>>>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf on the other two hosts. >>>>>>>>>>>>>> Should it be? 
It was not in the instructions for the migration from >>>>>>>>>>>>>> bare-metal to Hosted VM >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Cam >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Thu, Jun 22, 2017 at 1:07 PM, cmc <iucounu@gmail.com> wrote: >>>>>>>>>>>>>>> Hi Tomas, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> So in my /usr/share/ovirt-engine/conf/osinfo-defaults.properties on my >>>>>>>>>>>>>>> engine VM, I have: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> os.other.devices.display.protocols.value = spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus >>>>>>>>>>>>>>> os.other.devices.display.protocols.value.3.6 = spice/qxl,vnc/cirrus,vnc/qxl >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> That seems to match - I assume since this is 4.1, the 3.6 should not apply >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Is there somewhere else I should be looking? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Cam >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Thu, Jun 22, 2017 at 11:40 AM, Tomas Jelinek <tjelinek@redhat.com> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Thu, Jun 22, 2017 at 12:38 PM, Michal Skrivanek >>>>>>>>>>>>>>>> <michal.skrivanek@redhat.com> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> > On 22 Jun 2017, at 12:31, Martin Sivak <msivak@redhat.com> wrote: >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > Tomas, what fields are needed in a VM to pass the check that causes >>>>>>>>>>>>>>>>> > the following error? >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> >>>>> WARN [org.ovirt.engine.core.bll.exportimport.ImportVmCommand] >>>>>>>>>>>>>>>>> >>>>> (org.ovirt.thread.pool-6-thread-23) [] Validation of action >>>>>>>>>>>>>>>>> >>>>> 'ImportVm' >>>>>>>>>>>>>>>>> >>>>> failed for user SYSTEM. Reasons: VAR__ACTION__IMPORT >>>>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>>>> >>>>> ,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> to match the OS and VM Display type;-) >>>>>>>>>>>>>>>>> Configuration is in osinfo….e.g. if that is import from older releases on >>>>>>>>>>>>>>>>> Linux this is typically caused by the cahgen of cirrus to vga for non-SPICE >>>>>>>>>>>>>>>>> VMs >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> yep, the default supported combinations for 4.0+ is this: >>>>>>>>>>>>>>>> os.other.devices.display.protocols.value = >>>>>>>>>>>>>>>> spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > Thanks. >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > On Thu, Jun 22, 2017 at 12:19 PM, cmc <iucounu@gmail.com> wrote: >>>>>>>>>>>>>>>>> >> Hi Martin, >>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>> >>> just as a random comment, do you still have the database backup from >>>>>>>>>>>>>>>>> >>> the bare metal -> VM attempt? It might be possible to just try again >>>>>>>>>>>>>>>>> >>> using it. Or in the worst case.. update the offending value there >>>>>>>>>>>>>>>>> >>> before restoring it to the new engine instance. >>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>> >> I still have the backup. I'd rather do the latter, as re-running the >>>>>>>>>>>>>>>>> >> HE deployment is quite lengthy and involved (I have to re-initialise >>>>>>>>>>>>>>>>> >> the FC storage each time). Do you know what the offending value(s) >>>>>>>>>>>>>>>>> >> would be? Would it be in the Postgres DB or in a config file >>>>>>>>>>>>>>>>> >> somewhere? 
>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>> >> Cheers, >>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>> >> Cam >>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>> >>> Regards >>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>> >>> Martin Sivak >>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>> >>> On Thu, Jun 22, 2017 at 11:39 AM, cmc <iucounu@gmail.com> wrote: >>>>>>>>>>>>>>>>> >>>> Hi Yanir, >>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>> >>>> Thanks for the reply. >>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>> >>>>> First of all, maybe a chain reaction of : >>>>>>>>>>>>>>>>> >>>>> WARN [org.ovirt.engine.core.bll.exportimport.ImportVmCommand] >>>>>>>>>>>>>>>>> >>>>> (org.ovirt.thread.pool-6-thread-23) [] Validation of action >>>>>>>>>>>>>>>>> >>>>> 'ImportVm' >>>>>>>>>>>>>>>>> >>>>> failed for user SYSTEM. Reasons: VAR__ACTION__IMPORT >>>>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>>>> >>>>> ,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS >>>>>>>>>>>>>>>>> >>>>> is causing the hosted engine vm not to be set up correctly and >>>>>>>>>>>>>>>>> >>>>> further >>>>>>>>>>>>>>>>> >>>>> actions were made when the hosted engine vm wasnt in a stable state. >>>>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>>>> >>>>> As for now, are you trying to revert back to a previous/initial >>>>>>>>>>>>>>>>> >>>>> state ? >>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>> >>>> I'm not trying to revert it to a previous state for now. This was a >>>>>>>>>>>>>>>>> >>>> migration from a bare metal engine, and it didn't report any error >>>>>>>>>>>>>>>>> >>>> during the migration. I'd had some problems on my first attempts at >>>>>>>>>>>>>>>>> >>>> this migration, whereby it never completed (due to a proxy issue) but >>>>>>>>>>>>>>>>> >>>> I managed to resolve this. Do you know of a way to get the Hosted >>>>>>>>>>>>>>>>> >>>> Engine VM into a stable state, without rebuilding the entire cluster >>>>>>>>>>>>>>>>> >>>> from scratch (since I have a lot of VMs on it)? >>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>> >>>> Thanks for any help. >>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>> >>>> Regards, >>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>> >>>> Cam >>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>> >>>>> Regards, >>>>>>>>>>>>>>>>> >>>>> Yanir >>>>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>>>> >>>>> On Wed, Jun 21, 2017 at 4:32 PM, cmc <iucounu@gmail.com> wrote: >>>>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>>>> >>>>>> Hi Jenny/Martin, >>>>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>>>> >>>>>> Any idea what I can do here? The hosted engine VM has no log on any >>>>>>>>>>>>>>>>> >>>>>> host in /var/log/libvirt/qemu, and I fear that if I need to put the >>>>>>>>>>>>>>>>> >>>>>> host into maintenance, e.g., to upgrade it that I created it on >>>>>>>>>>>>>>>>> >>>>>> (which >>>>>>>>>>>>>>>>> >>>>>> I think is hosting it), or if it fails for any reason, it won't get >>>>>>>>>>>>>>>>> >>>>>> migrated to another host, and I will not be able to manage the >>>>>>>>>>>>>>>>> >>>>>> cluster. It seems to be a very dangerous position to be in. >>>>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>>>> >>>>>> Thanks, >>>>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>>>> >>>>>> Cam >>>>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>>>> >>>>>> On Wed, Jun 21, 2017 at 11:48 AM, cmc <iucounu@gmail.com> wrote: >>>>>>>>>>>>>>>>> >>>>>>> Thanks Martin. The hosts are all part of the same cluster. 
>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>> >>>>>>> I get these errors in the engine.log on the engine: >>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>> >>>>>>> 2017-06-19 03:28:05,030Z WARN >>>>>>>>>>>>>>>>> >>>>>>> [org.ovirt.engine.core.bll.exportimport.ImportVmCommand] >>>>>>>>>>>>>>>>> >>>>>>> (org.ovirt.thread.pool-6-thread-23) [] Validation of action >>>>>>>>>>>>>>>>> >>>>>>> 'ImportVm' >>>>>>>>>>>>>>>>> >>>>>>> failed for user SYST >>>>>>>>>>>>>>>>> >>>>>>> EM. Reasons: >>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>> >>>>>>> VAR__ACTION__IMPORT,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS >>>>>>>>>>>>>>>>> >>>>>>> 2017-06-19 03:28:05,030Z INFO >>>>>>>>>>>>>>>>> >>>>>>> [org.ovirt.engine.core.bll.exportimport.ImportVmCommand] >>>>>>>>>>>>>>>>> >>>>>>> (org.ovirt.thread.pool-6-thread-23) [] Lock freed to object >>>>>>>>>>>>>>>>> >>>>>>> 'EngineLock:{exclusiveLocks='[a >>>>>>>>>>>>>>>>> >>>>>>> 79e6b0e-fff4-4cba-a02c-4c00be151300=<VM, >>>>>>>>>>>>>>>>> >>>>>>> ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName HostedEngine>, >>>>>>>>>>>>>>>>> >>>>>>> HostedEngine=<VM_NAME, ACTION_TYPE_FAILED_NAME_ALREADY_USED>]', >>>>>>>>>>>>>>>>> >>>>>>> sharedLocks= >>>>>>>>>>>>>>>>> >>>>>>> '[a79e6b0e-fff4-4cba-a02c-4c00be151300=<REMOTE_VM, >>>>>>>>>>>>>>>>> >>>>>>> ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName HostedEngine>]'}' >>>>>>>>>>>>>>>>> >>>>>>> 2017-06-19 03:28:05,030Z ERROR >>>>>>>>>>>>>>>>> >>>>>>> [org.ovirt.engine.core.bll.HostedEngineImporter] >>>>>>>>>>>>>>>>> >>>>>>> (org.ovirt.thread.pool-6-thread-23) [] Failed importing the Hosted >>>>>>>>>>>>>>>>> >>>>>>> Engine VM >>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>> >>>>>>> The sanlock.log reports conflicts on that same host, and a >>>>>>>>>>>>>>>>> >>>>>>> different >>>>>>>>>>>>>>>>> >>>>>>> error on the other hosts, not sure if they are related. >>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>> >>>>>>> And this in the /var/log/ovirt-hosted-engine-ha/agent log on the >>>>>>>>>>>>>>>>> >>>>>>> host >>>>>>>>>>>>>>>>> >>>>>>> which I deployed the hosted engine VM on: >>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>> >>>>>>> MainThread::ERROR::2017-06-19 >>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>> >>>>>>> 13:09:49,743::ovf_store::124::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) >>>>>>>>>>>>>>>>> >>>>>>> Unable to extract HEVM OVF >>>>>>>>>>>>>>>>> >>>>>>> MainThread::ERROR::2017-06-19 >>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>> >>>>>>> 13:09:49,743::config::445::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>>>>>>>>>>>>>>>> >>>>>>> Failed extracting VM OVF from the OVF_STORE volume, falling back >>>>>>>>>>>>>>>>> >>>>>>> to >>>>>>>>>>>>>>>>> >>>>>>> initial vm.conf >>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>> >>>>>>> I've seen some of these issues reported in bugzilla, but they were >>>>>>>>>>>>>>>>> >>>>>>> for >>>>>>>>>>>>>>>>> >>>>>>> older versions of oVirt (and appear to be resolved). >>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>> >>>>>>> I will install that package on the other two hosts, for which I >>>>>>>>>>>>>>>>> >>>>>>> will >>>>>>>>>>>>>>>>> >>>>>>> put them in maintenance as vdsm is installed as an upgrade. I >>>>>>>>>>>>>>>>> >>>>>>> guess >>>>>>>>>>>>>>>>> >>>>>>> restarting vdsm is a good idea after that? 
>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>> >>>>>>> Thanks, >>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>> >>>>>>> Campbell >>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>> >>>>>>> On Wed, Jun 21, 2017 at 10:51 AM, Martin Sivak <msivak@redhat.com> >>>>>>>>>>>>>>>>> >>>>>>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>> Hi, >>>>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>> you do not have to install it on all hosts. But you should have >>>>>>>>>>>>>>>>> >>>>>>>> more >>>>>>>>>>>>>>>>> >>>>>>>> than one and ideally all hosted engine enabled nodes should >>>>>>>>>>>>>>>>> >>>>>>>> belong to >>>>>>>>>>>>>>>>> >>>>>>>> the same engine cluster. >>>>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>> Best regards >>>>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>> Martin Sivak >>>>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>> On Wed, Jun 21, 2017 at 11:29 AM, cmc <iucounu@gmail.com> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>> Hi Jenny, >>>>>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>> Does ovirt-hosted-engine-ha need to be installed across all >>>>>>>>>>>>>>>>> >>>>>>>>> hosts? >>>>>>>>>>>>>>>>> >>>>>>>>> Could that be the reason it is failing to see it properly? >>>>>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>> Cam >>>>>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>> On Mon, Jun 19, 2017 at 1:27 PM, cmc <iucounu@gmail.com> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>> Hi Jenny, >>>>>>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>> Logs are attached. I can see errors in there, but am unsure how >>>>>>>>>>>>>>>>> >>>>>>>>>> they >>>>>>>>>>>>>>>>> >>>>>>>>>> arose. >>>>>>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>> Campbell >>>>>>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>> On Mon, Jun 19, 2017 at 12:29 PM, Evgenia Tokar >>>>>>>>>>>>>>>>> >>>>>>>>>> <etokar@redhat.com> >>>>>>>>>>>>>>>>> >>>>>>>>>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>> From the output it looks like the agent is down, try starting >>>>>>>>>>>>>>>>> >>>>>>>>>>> it by >>>>>>>>>>>>>>>>> >>>>>>>>>>> running: >>>>>>>>>>>>>>>>> >>>>>>>>>>> systemctl start ovirt-ha-agent. >>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>> The engine is supposed to see the hosted engine storage domain >>>>>>>>>>>>>>>>> >>>>>>>>>>> and >>>>>>>>>>>>>>>>> >>>>>>>>>>> import it >>>>>>>>>>>>>>>>> >>>>>>>>>>> to the system, then it should import the hosted engine vm. >>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>> Can you attach the agent log from the host >>>>>>>>>>>>>>>>> >>>>>>>>>>> (/var/log/ovirt-hosted-engine-ha/agent.log) >>>>>>>>>>>>>>>>> >>>>>>>>>>> and the engine log from the engine vm >>>>>>>>>>>>>>>>> >>>>>>>>>>> (/var/log/ovirt-engine/engine.log)? >>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>> >>>>>>>>>>> Jenny >>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>> On Mon, Jun 19, 2017 at 12:41 PM, cmc <iucounu@gmail.com> >>>>>>>>>>>>>>>>> >>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>> Hi Jenny, >>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>> What version are you running? 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>> 4.1.2.2-1.el7.centos >>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>> For the hosted engine vm to be imported and displayed in the >>>>>>>>>>>>>>>>> >>>>>>>>>>>>> engine, you >>>>>>>>>>>>>>>>> >>>>>>>>>>>>> must first create a master storage domain. >>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>> To provide a bit more detail: this was a migration of a >>>>>>>>>>>>>>>>> >>>>>>>>>>>> bare-metal >>>>>>>>>>>>>>>>> >>>>>>>>>>>> engine in an existing cluster to a hosted engine VM for that >>>>>>>>>>>>>>>>> >>>>>>>>>>>> cluster. >>>>>>>>>>>>>>>>> >>>>>>>>>>>> As part of this migration, I built an entirely new host and >>>>>>>>>>>>>>>>> >>>>>>>>>>>> ran >>>>>>>>>>>>>>>>> >>>>>>>>>>>> 'hosted-engine --deploy' (followed these instructions: >>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>> http://www.ovirt.org/documentation/self-hosted/chap-Migrating_from_Bare_Meta...). >>>>>>>>>>>>>>>>> >>>>>>>>>>>> I restored the backup from the engine and it completed >>>>>>>>>>>>>>>>> >>>>>>>>>>>> without any >>>>>>>>>>>>>>>>> >>>>>>>>>>>> errors. I didn't see any instructions regarding a master >>>>>>>>>>>>>>>>> >>>>>>>>>>>> storage >>>>>>>>>>>>>>>>> >>>>>>>>>>>> domain in the page above. The cluster has two existing master >>>>>>>>>>>>>>>>> >>>>>>>>>>>> storage >>>>>>>>>>>>>>>>> >>>>>>>>>>>> domains, one is fibre channel, which is up, and one ISO >>>>>>>>>>>>>>>>> >>>>>>>>>>>> domain, >>>>>>>>>>>>>>>>> >>>>>>>>>>>> which >>>>>>>>>>>>>>>>> >>>>>>>>>>>> is currently offline. >>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>> What do you mean the hosted engine commands are failing? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>> What >>>>>>>>>>>>>>>>> >>>>>>>>>>>>> happens >>>>>>>>>>>>>>>>> >>>>>>>>>>>>> when >>>>>>>>>>>>>>>>> >>>>>>>>>>>>> you run hosted-engine --vm-status now? 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>> Interestingly, whereas when I ran it before, it exited with >>>>>>>>>>>>>>>>> >>>>>>>>>>>> no >>>>>>>>>>>>>>>>> >>>>>>>>>>>> output >>>>>>>>>>>>>>>>> >>>>>>>>>>>> and a return code of '1', it now reports: >>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>> --== Host 1 status ==-- >>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>> conf_on_shared_storage : True >>>>>>>>>>>>>>>>> >>>>>>>>>>>> Status up-to-date : False >>>>>>>>>>>>>>>>> >>>>>>>>>>>> Hostname : >>>>>>>>>>>>>>>>> >>>>>>>>>>>> kvm-ldn-03.ldn.fscfc.co.uk >>>>>>>>>>>>>>>>> >>>>>>>>>>>> Host ID : 1 >>>>>>>>>>>>>>>>> >>>>>>>>>>>> Engine status : unknown stale-data >>>>>>>>>>>>>>>>> >>>>>>>>>>>> Score : 0 >>>>>>>>>>>>>>>>> >>>>>>>>>>>> stopped : True >>>>>>>>>>>>>>>>> >>>>>>>>>>>> Local maintenance : False >>>>>>>>>>>>>>>>> >>>>>>>>>>>> crc32 : 0217f07b >>>>>>>>>>>>>>>>> >>>>>>>>>>>> local_conf_timestamp : 2911 >>>>>>>>>>>>>>>>> >>>>>>>>>>>> Host timestamp : 2897 >>>>>>>>>>>>>>>>> >>>>>>>>>>>> Extra metadata (valid at timestamp): >>>>>>>>>>>>>>>>> >>>>>>>>>>>> metadata_parse_version=1 >>>>>>>>>>>>>>>>> >>>>>>>>>>>> metadata_feature_version=1 >>>>>>>>>>>>>>>>> >>>>>>>>>>>> timestamp=2897 (Thu Jun 15 16:22:54 2017) >>>>>>>>>>>>>>>>> >>>>>>>>>>>> host-id=1 >>>>>>>>>>>>>>>>> >>>>>>>>>>>> score=0 >>>>>>>>>>>>>>>>> >>>>>>>>>>>> vm_conf_refresh_time=2911 (Thu Jun 15 16:23:08 2017) >>>>>>>>>>>>>>>>> >>>>>>>>>>>> conf_on_shared_storage=True >>>>>>>>>>>>>>>>> >>>>>>>>>>>> maintenance=False >>>>>>>>>>>>>>>>> >>>>>>>>>>>> state=AgentStopped >>>>>>>>>>>>>>>>> >>>>>>>>>>>> stopped=True >>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>> Yet I can login to the web GUI fine. I guess it is not HA due >>>>>>>>>>>>>>>>> >>>>>>>>>>>> to >>>>>>>>>>>>>>>>> >>>>>>>>>>>> being >>>>>>>>>>>>>>>>> >>>>>>>>>>>> in an unknown state currently? Does the hosted-engine-ha rpm >>>>>>>>>>>>>>>>> >>>>>>>>>>>> need >>>>>>>>>>>>>>>>> >>>>>>>>>>>> to >>>>>>>>>>>>>>>>> >>>>>>>>>>>> be installed across all nodes in the cluster, btw? >>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>> Thanks for the help, >>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>> Cam >>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>> Jenny Tokar >>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>> On Thu, Jun 15, 2017 at 6:32 PM, cmc <iucounu@gmail.com> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> I've migrated from a bare-metal engine to a hosted engine. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> There >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> were >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> no errors during the install, however, the hosted engine >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> did not >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> get >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> started. I tried running: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> hosted-engine --status >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> on the host I deployed it on, and it returns nothing (exit >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> code >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> is 1 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> however). I could not ping it either. 
So I tried starting >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> it via >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> 'hosted-engine --vm-start' and it returned: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Virtual machine does not exist >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> But it then became available. I logged into it >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> successfully. It >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> is not >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> in the list of VMs however. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Any ideas why the hosted-engine commands fail, and why it >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> is not >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> in >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> the list of virtual machines? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks for any help, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Cam >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Users mailing list >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Users@ovirt.org >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> http://lists.ovirt.org/mailman/listinfo/users >>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>>> >>>>>>>>> Users mailing list >>>>>>>>>>>>>>>>> >>>>>>>>> Users@ovirt.org >>>>>>>>>>>>>>>>> >>>>>>>>> http://lists.ovirt.org/mailman/listinfo/users >>>>>>>>>>>>>>>>> >>>>>> _______________________________________________ >>>>>>>>>>>>>>>>> >>>>>> Users mailing list >>>>>>>>>>>>>>>>> >>>>>> Users@ovirt.org >>>>>>>>>>>>>>>>> >>>>>> http://lists.ovirt.org/mailman/listinfo/users >>>>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>>>> > _______________________________________________ >>>>>>>>>>>>>>>>> > Users mailing list >>>>>>>>>>>>>>>>> > Users@ovirt.org >>>>>>>>>>>>>>>>> > http://lists.ovirt.org/mailman/listinfo/users >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>