
> The OVF_STORE volume is going to get periodically recreated by the engine, so you need at least a running engine. In order to avoid this kind of issue we have two OVF_STORE disks, in your case:
>
> MainThread::INFO::2019-03-06 06:50:02,391::ovf_store::120::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) Found OVF_STORE: imgUUID:441abdc8-6cb1-49a4-903f-a1ec0ed88429, volUUID:c3309fc0-8707-4de1-903d-8d4bbb024f81
> MainThread::INFO::2019-03-06 06:50:02,748::ovf_store::120::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) Found OVF_STORE: imgUUID:94ade632-6ecc-4901-8cec-8e39f3d69cb0, volUUID:9460fc4b-54f3-48e3-b7b6-da962321ecf4
>
> Can you please check if you have at least the second copy?

The second copy is empty too:

[root@ovirt1 ~]# ll /rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/441abdc8-6cb1-49a4-903f-a1ec0ed88429
total 66561
-rw-rw----. 1 vdsm kvm       0 Mar  4 05:23 c3309fc0-8707-4de1-903d-8d4bbb024f81
-rw-rw----. 1 vdsm kvm 1048576 Jan 31 13:24 c3309fc0-8707-4de1-903d-8d4bbb024f81.lease
-rw-r--r--. 1 vdsm kvm     435 Mar  4 05:24 c3309fc0-8707-4de1-903d-8d4bbb024f81.meta
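For reference, an intact OVF_STORE volume is simply a tar archive holding the engine VM's OVF, so a healthy copy can be sanity-checked in place. A rough sketch, reusing the volUUID path from the listing above (here both copies are 0 bytes, so tar is expected to complain about an invalid archive):

cd /rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/441abdc8-6cb1-49a4-903f-a1ec0ed88429
# list the OVF files packed inside the OVF_STORE volume (read-only check)
tar -tvf c3309fc0-8707-4de1-903d-8d4bbb024f81
# optionally unpack into a scratch directory for inspection, without touching the original
mkdir -p /tmp/ovf_check && tar -xvf c3309fc0-8707-4de1-903d-8d4bbb024f81 -C /tmp/ovf_check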
> And even if you lost both, we store the initial vm.conf on the shared storage:
>
> MainThread::ERROR::2019-03-06 06:50:02,971::config_ovf::70::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm::(_get_vm_conf_content_from_ovf_store) Failed extracting VM OVF from the OVF_STORE volume, falling back to initial vm.conf
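That fallback can be confirmed on the hosts by searching the HA agent log for the message quoted above, for example:

# count and show the most recent fallback events in the HA agent log (run on each host)
grep -c 'falling back to initial vm.conf' /var/log/ovirt-hosted-engine-ha/agent.log
grep 'falling back to initial vm.conf' /var/log/ovirt-hosted-engine-ha/agent.log | tail -n 5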
> Can you please check what you have in /var/run/ovirt-hosted-engine-ha/vm.conf ?

It exists and has the following content:

[root@ovirt1 ~]# cat /var/run/ovirt-hosted-engine-ha/vm.conf
# Editing the hosted engine VM is only possible via the manager UI\API
# This file was generated at Thu Mar 7 15:37:26 2019
vmId=8474ae07-f172-4a20-b516-375c73903df7
memSize=4096
display=vnc
devices={index:2,iface:ide,address:{ controller:0, target:0,unit:0, bus:1, type:drive},specParams:{},readonly:true,deviceId:,path:,device:cdrom,shared:false,type:disk}
devices={index:0,iface:virtio,format:raw,poolID:00000000-0000-0000-0000-000000000000,volumeID:a9ab832f-c4f2-4b9b-9d99-6393fd995979,imageID:8ec7a465-151e-4ac3-92a7-965ecf854501,specParams:{},readonly:false,domainID:808423f9-8a5c-40cd-bc9f-2568c85b8c74,optional:false,deviceId:a9ab832f-c4f2-4b9b-9d99-6393fd995979,address:{bus:0x00, slot:0x06, domain:0x0000, type:pci, function:0x0},device:disk,shared:exclusive,propagateErrors:off,type:disk,bootOrder:1}
devices={device:scsi,model:virtio-scsi,type:controller}
devices={nicModel:pv,macAddr:00:16:3e:62:72:c8,linkActive:true,network:ovirtmgmt,specParams:{},deviceId:,address:{bus:0x00, slot:0x03, domain:0x0000, type:pci, function:0x0},device:bridge,type:interface}
devices={device:console,type:console}
devices={device:vga,alias:video0,type:video}
devices={device:vnc,type:graphics}
vmName=HostedEngine
spiceSecureChannels=smain,sdisplay,sinputs,scursor,splayback,srecord,ssmartcard,susbredir
smp=1
maxVCpus=8
cpuType=Opteron_G5
emulatedMachine=emulated_machine_list.json['values']['system_option_value'][0]['value'].replace('[','').replace(']','').split(', ')|first
devices={device:virtio,specParams:{source:urandom},model:virtio,type:rng}

Also, I think this happened when I was upgrading ovirt1 (the last host in the gluster cluster) from 4.3.0 to 4.3.1. The engine got restarted because I forgot to enable global maintenance.
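Global maintenance can be toggled from either host with the standard hosted-engine switches, which is the step I skipped here:

# enter global maintenance before upgrading the hosts or touching the engine VM
hosted-engine --set-maintenance --mode=global
# ... perform the upgrade ...
# leave global maintenance once everything is back up
hosted-engine --set-maintenance --mode=none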
> Sorry, I don't understand. Can you please explain what happened?

I updated the engine first -> all OK. Next was the arbiter -> again no issues with it. Next was the empty host, ovirt2, and everything went OK.
After that I migrated the engine to ovirt2 and tried to update ovirt1. The web UI showed that the installation failed, but "yum update" was working. During the update of ovirt1 via yum, the engine app crashed and restarted on ovirt2.
After the reboot of ovirt1 I noticed the error about pinging the gateway, so I stopped the engine and stopped the following services on both hosts (global maintenance): ovirt-ha-agent, ovirt-ha-broker, vdsmd, supervdsmd, sanlock.
Next was a reinitialization of the sanlock space via 'sanlock direct -s'. In the end I managed to power on the hosted engine and it was running for a while. As the errors did not stop, I decided to shut everything down, then power it up, heal gluster and check what would happen.
Currently I'm not able to power up the engine:
[root@ovirt1 ovirt-hosted-engine-ha]# hosted-engine --vm-status

!! Cluster is in GLOBAL MAINTENANCE mode !!

--== Host ovirt1.localdomain (id: 1) status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt1.localdomain
Host ID                            : 1
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 45e6772b
local_conf_timestamp               : 288
Host timestamp                     : 287
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=287 (Thu Mar 7 15:34:06 2019)
        host-id=1
        score=3400
        vm_conf_refresh_time=288 (Thu Mar 7 15:34:07 2019)
        conf_on_shared_storage=True
        maintenance=False
        state=GlobalMaintenance
        stopped=False

--== Host ovirt2.localdomain (id: 2) status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt2.localdomain
Host ID                            : 2
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 2e9a0444
local_conf_timestamp               : 3886
Host timestamp                     : 3885
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=3885 (Thu Mar 7 15:34:05 2019)
        host-id=2
        score=3400
        vm_conf_refresh_time=3886 (Thu Mar 7 15:34:06 2019)
        conf_on_shared_storage=True
        maintenance=False
        state=GlobalMaintenance
        stopped=False

!! Cluster is in GLOBAL MAINTENANCE mode !!

[root@ovirt1 ovirt-hosted-engine-ha]# hosted-engine --vm-start
Command VM.getStats with args {'vmID': '8474ae07-f172-4a20-b516-375c73903df7'} failed:
(code=1, message=Virtual machine does not exist: {'vmId': u'8474ae07-f172-4a20-b516-375c73903df7'})

[root@ovirt1 ovirt-hosted-engine-ha]# hosted-engine --vm-start
VM exists and is down, cleaning up and restarting

[root@ovirt1 ovirt-hosted-engine-ha]# hosted-engine --vm-status

!! Cluster is in GLOBAL MAINTENANCE mode !!

--== Host ovirt1.localdomain (id: 1) status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt1.localdomain
Host ID                            : 1
Engine status                      : {"reason": "bad vm status", "health": "bad", "vm": "down", "detail": "Down"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 6b086b7c
local_conf_timestamp               : 328
Host timestamp                     : 327
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=327 (Thu Mar 7 15:34:46 2019)
        host-id=1
        score=3400
        vm_conf_refresh_time=328 (Thu Mar 7 15:34:47 2019)
        conf_on_shared_storage=True
        maintenance=False
        state=GlobalMaintenance
        stopped=False

--== Host ovirt2.localdomain (id: 2) status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt2.localdomain
Host ID                            : 2
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : c5890e9c
local_conf_timestamp               : 3926
Host timestamp                     : 3925
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=3925 (Thu Mar 7 15:34:45 2019)
        host-id=2
        score=3400
        vm_conf_refresh_time=3926 (Thu Mar 7 15:34:45 2019)
        conf_on_shared_storage=True
        maintenance=False
        state=GlobalMaintenance
        stopped=False

!! Cluster is in GLOBAL MAINTENANCE mode !!

[root@ovirt1 ovirt-hosted-engine-ha]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     HostedEngine                   shut off

I am really puzzled why both volumes are wiped out.

Best Regards,
Strahil Nikolov
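P.S. Before retrying, it may be worth confirming whether gluster still has pending heals or split-brain entries on the engine volume, and trying to start the VM from the locally cached vm.conf while the OVF_STORE copies get regenerated. A rough sketch (the volume name "engine" is taken from the mount path above; the --vm-conf switch is assumed to be available in this hosted-engine version):

# check for pending heals / split-brain on the engine volume
gluster volume heal engine info
gluster volume heal engine info split-brain

# attempt to start the hosted engine from the locally cached configuration
hosted-engine --vm-start --vm-conf=/var/run/ovirt-hosted-engine-ha/vm.conf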