Hi Simone,
and thanks for your help.
So far I have found out that there is some problem with the local copy of the HostedEngine
config (see the attached part of vdsm.log).
I found an older XML configuration (in an old vdsm.log) and defining the VM
works, but powering it on reports:
[root@ovirt1 ~]# virsh define hosted-engine.xml
Domain HostedEngine defined from hosted-engine.xml

[root@ovirt1 ~]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     HostedEngine                   shut off

[root@ovirt1 ~]# virsh start HostedEngine
error: Failed to start domain HostedEngine
error: Network not found: no network with matching name 'vdsm-ovirtmgmt'
[root@ovirt1 ~]# virsh net-list --all
 Name                 State      Autostart     Persistent
----------------------------------------------------------
 ;vdsmdummy;          active     no            no
 default              inactive   no            yes

[root@ovirt1 ~]# brctl show
bridge name     bridge id               STP enabled     interfaces
;vdsmdummy;     8000.000000000000       no
ovirtmgmt       8000.bc5ff467f5b3       no              enp2s0
[root@ovirt1 ~]# ip a s
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq master ovirtmgmt state UP group default qlen 1000
    link/ether bc:5f:f4:67:f5:b3 brd ff:ff:ff:ff:ff:ff
3: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether f6:78:c7:2d:32:f9 brd ff:ff:ff:ff:ff:ff
4: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 66:36:dd:63:dc:48 brd ff:ff:ff:ff:ff:ff
20: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
    link/ether bc:5f:f4:67:f5:b3 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.90/24 brd 192.168.1.255 scope global ovirtmgmt
       valid_lft forever preferred_lft forever
    inet 192.168.1.243/24 brd 192.168.1.255 scope global secondary ovirtmgmt
       valid_lft forever preferred_lft forever
    inet6 fe80::be5f:f4ff:fe67:f5b3/64 scope link
       valid_lft forever preferred_lft forever
21: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether ce:36:8d:b7:64:bd brd ff:ff:ff:ff:ff:ff
192.168.1.243/24 is one of the IPs in ctdb.
So now comes the question: is there an XML in the logs that defines the network? My
hope is to power up the HostedEngine properly, and that it will then push all the
configurations to the right places... maybe this is way too optimistic.
At least I have learned a lot about oVirt.
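If it helps: as far as I know vdsm normally defines that libvirt network as a bridge-type network named vdsm-<bridge name>, so a minimal sketch for recreating it by hand (my assumption, to be double-checked against an old vdsm.log; vdsm should redefine it anyway once the host is back in sync) could be:

cat > /tmp/vdsm-ovirtmgmt.xml <<'EOF'
<network>
  <name>vdsm-ovirtmgmt</name>
  <forward mode='bridge'/>
  <bridge name='ovirtmgmt'/>
</network>
EOF
virsh net-define /tmp/vdsm-ovirtmgmt.xml     # make the network known to libvirt
virsh net-start vdsm-ovirtmgmt               # activate it so the VM can attach
virsh net-autostart vdsm-ovirtmgmt           # bring it up on host reboot as well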
Best Regards,
Strahil Nikolov
On Thursday, 7 March 2019, 17:55:12 GMT+2, Simone Tiraboschi
<stirabos(a)redhat.com> wrote:
On Thu, Mar 7, 2019 at 2:54 PM Strahil Nikolov <hunter86_bg(a)yahoo.com> wrote:
The OVF_STORE volume is going to get periodically recreated by the
engine so at least you need a running engine.
In order to avoid this kind of issue we have two OVF_STORE disks, in your case:
MainThread::INFO::2019-03-06 06:50:02,391::ovf_store::120::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) Found OVF_STORE: imgUUID:441abdc8-6cb1-49a4-903f-a1ec0ed88429, volUUID:c3309fc0-8707-4de1-903d-8d4bbb024f81
MainThread::INFO::2019-03-06 06:50:02,748::ovf_store::120::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) Found OVF_STORE: imgUUID:94ade632-6ecc-4901-8cec-8e39f3d69cb0, volUUID:9460fc4b-54f3-48e3-b7b6-da962321ecf4
Can you please check if you have at least the second copy?
Second copy is empty too:

[root@ovirt1 ~]# ll /rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/441abdc8-6cb1-49a4-903f-a1ec0ed88429
total 66561
-rw-rw----. 1 vdsm kvm 0 Mar 4 05:23 c3309fc0-8707-4de1-903d-8d4bbb024f81
-rw-rw----. 1 vdsm kvm 1048576 Jan 31 13:24 c3309fc0-8707-4de1-903d-8d4bbb024f81.lease
-rw-r--r--. 1 vdsm kvm 435 Mar 4 05:24 c3309fc0-8707-4de1-903d-8d4bbb024f81.meta
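For reference, a healthy OVF_STORE volume on file-based storage is, as I understand it, a tar archive holding the engine VM's OVF, so with a non-empty copy one could check its content with something like:

# list the archived OVF entries inside the OVF_STORE volume (path taken from above)
tar -tvf /rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/441abdc8-6cb1-49a4-903f-a1ec0ed88429/c3309fc0-8707-4de1-903d-8d4bbb024f81

Here both volumes are 0 bytes, so there is nothing to extract.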
And even in the case you lost both, we are storing the initial vm.conf on the shared storage:

MainThread::ERROR::2019-03-06 06:50:02,971::config_ovf::70::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm::(_get_vm_conf_content_from_ovf_store) Failed extracting VM OVF from the OVF_STORE volume, falling back to initial vm.conf

Can you please check what you have in /var/run/ovirt-hosted-engine-ha/vm.conf ?

It exists and has the following:
[root@ovirt1 ~]# cat /var/run/ovirt-hosted-engine-ha/vm.conf
# Editing the hosted engine VM is only possible via the manager UI\API
# This file was generated at Thu Mar 7 15:37:26 2019
vmId=8474ae07-f172-4a20-b516-375c73903df7
memSize=4096
display=vnc
devices={index:2,iface:ide,address:{ controller:0, target:0,unit:0, bus:1,
type:drive},specParams:{},readonly:true,deviceId:,path:,device:cdrom,shared:false,type:disk}
devices={index:0,iface:virtio,format:raw,poolID:00000000-0000-0000-0000-000000000000,volumeID:a9ab832f-c4f2-4b9b-9d99-6393fd995979,imageID:8ec7a465-151e-4ac3-92a7-965ecf854501,specParams:{},readonly:false,domainID:808423f9-8a5c-40cd-bc9f-2568c85b8c74,optional:false,deviceId:a9ab832f-c4f2-4b9b-9d99-6393fd995979,address:{bus:0x00,
slot:0x06, domain:0x0000, type:pci,
function:0x0},device:disk,shared:exclusive,propagateErrors:off,type:disk,bootOrder:1}
devices={device:scsi,model:virtio-scsi,type:controller}
devices={nicModel:pv,macAddr:00:16:3e:62:72:c8,linkActive:true,network:ovirtmgmt,specParams:{},deviceId:,address:{bus:0x00,
slot:0x03, domain:0x0000, type:pci, function:0x0},device:bridge,type:interface}
devices={device:console,type:console}
devices={device:vga,alias:video0,type:video}
devices={device:vnc,type:graphics}
vmName=HostedEngine
spiceSecureChannels=smain,sdisplay,sinputs,scursor,splayback,srecord,ssmartcard,susbredir
smp=1
maxVCpus=8
cpuType=Opteron_G5
emulatedMachine=emulated_machine_list.json['values']['system_option_value'][0]['value'].replace('[','').replace(']','').split(',
')|first
devices={device:virtio,specParams:{source:urandom},model:virtio,type:rng}
You should be able to copy it to /root/myvm.conf and start the engine VM with
hosted-engine --vm-start --vm-conf=/root/myvm.conf
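For example, a minimal sketch of that recovery path (the file name is only the one suggested above) would be:

cp /var/run/ovirt-hosted-engine-ha/vm.conf /root/myvm.conf
hosted-engine --vm-start --vm-conf=/root/myvm.conf
hosted-engine --vm-status    # watch until the engine health reports "good"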
Also, I think this happened when I was upgrading ovirt1 (the last host in the gluster cluster) from
4.3.0 to 4.3.1. The engine got restarted because I forgot to enable global
maintenance.
Sorry, I don't understand. Can you please explain what happened?
I updated the engine first -> all OK. Next was the arbiter -> again no
issues with it. Next was the empty host, ovirt2, and everything went OK. After that I
migrated the engine to ovirt2 and tried to update ovirt1. The web UI showed that the
installation failed, but "yum update" was working. During the update of ovirt1 via
yum, the engine app crashed and restarted on ovirt2. After the reboot of
ovirt1 I noticed the error about pinging the gateway, so I stopped the engine and
stopped the following services on both hosts (global maintenance): ovirt-ha-agent,
ovirt-ha-broker, vdsmd, supervdsmd, sanlock.
Next was a reinitialization of the sanlock lockspace via 'sanlock direct -s'.
In the end I managed to power on the hosted engine and it was running for a while.
As the errors did not stop, I decided to shut down everything, power it up again,
heal gluster and check what would happen.
Currently I'm not able to power up the engine:
[root@ovirt1 ovirt-hosted-engine-ha]# hosted-engine --vm-status
!! Cluster is in GLOBAL MAINTENANCE mode !!
Please notice that in global maintenance mode nothing will try to start the engine VM for
you. I assume you tried to exit global maintenance mode, or at least you tried to manually
start it with hosted-engine --vm-start, right?
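For reference, assuming the standard hosted-engine CLI, exiting global maintenance and retrying would look roughly like:

hosted-engine --set-maintenance --mode=none   # leave global maintenance
hosted-engine --vm-start
hosted-engine --vm-status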
--== Host ovirt1.localdomain (id: 1) status ==--
conf_on_shared_storage : True
Status up-to-date : True
Hostname : ovirt1.localdomain
Host ID : 1
Engine status : {"reason": "vm not running on this
host", "health": "bad", "vm": "down",
"detail": "unknown"}
Score : 3400
stopped : False
Local maintenance : False
crc32 : 45e6772b
local_conf_timestamp : 288
Host timestamp : 287
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=287 (Thu Mar 7 15:34:06 2019)
host-id=1
score=3400
vm_conf_refresh_time=288 (Thu Mar 7 15:34:07 2019)
conf_on_shared_storage=True
maintenance=False
state=GlobalMaintenance
stopped=False
--== Host ovirt2.localdomain (id: 2) status ==--
conf_on_shared_storage : True
Status up-to-date : True
Hostname : ovirt2.localdomain
Host ID : 2
Engine status : {"reason": "vm not running on this
host", "health": "bad", "vm": "down",
"detail": "unknown"}
Score : 3400
stopped : False
Local maintenance : False
crc32 : 2e9a0444
local_conf_timestamp : 3886
Host timestamp : 3885
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=3885 (Thu Mar 7 15:34:05 2019)
host-id=2
score=3400
vm_conf_refresh_time=3886 (Thu Mar 7 15:34:06 2019)
conf_on_shared_storage=True
maintenance=False
state=GlobalMaintenance
stopped=False
!! Cluster is in GLOBAL MAINTENANCE mode !!
[root@ovirt1 ovirt-hosted-engine-ha]# hosted-engine --vm-start
Command VM.getStats with args {'vmID':
'8474ae07-f172-4a20-b516-375c73903df7'} failed:
(code=1, message=Virtual machine does not exist: {'vmId':
u'8474ae07-f172-4a20-b516-375c73903df7'})
[root@ovirt1 ovirt-hosted-engine-ha]# hosted-engine --vm-start
VM exists and is down, cleaning up and restarting
[root@ovirt1 ovirt-hosted-engine-ha]# hosted-engine --vm-status
!! Cluster is in GLOBAL MAINTENANCE mode !!
--== Host ovirt1.localdomain (id: 1) status ==--
conf_on_shared_storage : True
Status up-to-date : True
Hostname : ovirt1.localdomain
Host ID : 1
Engine status : {"reason": "bad vm status",
"health": "bad", "vm": "down", "detail":
"Down"}
Score : 3400
stopped : False
Local maintenance : False
crc32 : 6b086b7c
local_conf_timestamp : 328
Host timestamp : 327
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=327 (Thu Mar 7 15:34:46 2019)
host-id=1
score=3400
vm_conf_refresh_time=328 (Thu Mar 7 15:34:47 2019)
conf_on_shared_storage=True
maintenance=False
state=GlobalMaintenance
stopped=False
--== Host ovirt2.localdomain (id: 2) status ==--
conf_on_shared_storage : True
Status up-to-date : True
Hostname : ovirt2.localdomain
Host ID : 2
Engine status : {"reason": "vm not running on this
host", "health": "bad", "vm": "down",
"detail": "unknown"}
Score : 3400
stopped : False
Local maintenance : False
crc32 : c5890e9c
local_conf_timestamp : 3926
Host timestamp : 3925
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=3925 (Thu Mar 7 15:34:45 2019)
host-id=2
score=3400
vm_conf_refresh_time=3926 (Thu Mar 7 15:34:45 2019)
conf_on_shared_storage=True
maintenance=False
state=GlobalMaintenance
stopped=False
!! Cluster is in GLOBAL MAINTENANCE mode !!
[root@ovirt1 ovirt-hosted-engine-ha]# virsh list --all
Id Name State
----------------------------------------------------
- HostedEngine shut off
I am really puzzled why both volumes are wiped out.
This is really scary: can you please double-check the gluster logs for warnings and errors?
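For example, a quick sweep on each host (assuming the default gluster log locations) could be something like:

# show recent error (E) and warning (W) entries from the client and brick logs
grep -E ' [EW] \[' /var/log/glusterfs/*.log /var/log/glusterfs/bricks/*.log | tail -n 200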
Best Regards,
Strahil Nikolov