[ovirt-users] 2 hosts starting the engine at the same time?

Gianluca Cecchi gianluca.cecchi at gmail.com
Sun Jul 9 19:54:09 UTC 2017


Hello.
I'm on 4.1.3 with self hosted engine and glusterfs as storage.
I updated the kernel on the engine, so I executed these steps:

- enable global maintenance from the web admin GUI
- wait some minutes
- shut down the engine VM from inside its OS
- wait some minutes
- execute on one host
[root@ovirt02 ~]# hosted-engine --set-maintenance --mode=none

I see that the qemu-kvm process for the engine starts on two hosts, and then
on one of them it gets a "kill -15" and stops.
Is this expected behaviour? It seems somewhat dangerous to me.

- when in maintenance

[root@ovirt02 ~]# hosted-engine --vm-status


!! Cluster is in GLOBAL MAINTENANCE mode !!


--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt01.localdomain.local
Host ID                            : 1
Engine status                      : {"health": "good", "vm": "up",
"detail": "up"}
Score                              : 2597
stopped                            : False
Local maintenance                  : False
crc32                              : 7931c5c3
local_conf_timestamp               : 19811
Host timestamp                     : 19794
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=19794 (Sun Jul  9 21:31:50 2017)
    host-id=1
    score=2597
    vm_conf_refresh_time=19811 (Sun Jul  9 21:32:06 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=GlobalMaintenance
    stopped=False


--== Host 2 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : 192.168.150.103
Host ID                            : 2
Engine status                      : {"reason": "vm not running on this
host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 616ceb02
local_conf_timestamp               : 2829
Host timestamp                     : 2812
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=2812 (Sun Jul  9 21:31:52 2017)
    host-id=2
    score=3400
    vm_conf_refresh_time=2829 (Sun Jul  9 21:32:09 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=GlobalMaintenance
    stopped=False


--== Host 3 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt03.localdomain.local
Host ID                            : 3
Engine status                      : {"reason": "vm not running on this
host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 871204b2
local_conf_timestamp               : 24584
Host timestamp                     : 24567
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=24567 (Sun Jul  9 21:31:52 2017)
    host-id=3
    score=3400
    vm_conf_refresh_time=24584 (Sun Jul  9 21:32:09 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=GlobalMaintenance
    stopped=False


!! Cluster is in GLOBAL MAINTENANCE mode !!
[root@ovirt02 ~]#
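
The status output above has a regular shape: one "--== Host N status ==--" header per host, "Name : value" fields, and "key=value" lines under "Extra metadata". A minimal sketch of how it could be split into per-host records follows. The function name parse_vm_status and the "fields"/"metadata" layout are hypothetical, not part of any oVirt tool, and the sample text is an abbreviated copy of the output above.

```python
import re

HOST_HEADER = re.compile(r"^--== Host (\d+) status ==--$")
FIELD = re.compile(r"^([A-Za-z][\w \-]*?)\s+: (.*)$")
METADATA = re.compile(r"^([\w\-]+)=(.*)$")


def parse_vm_status(text):
    """Split status text into {host_number: {"fields": {...}, "metadata": {...}}}.

    Hypothetical helper for illustration only. A wrapped "Engine status"
    value keeps just its first line here; continuation lines are ignored.
    """
    hosts = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HOST_HEADER.match(line)
        if header:
            current = {"fields": {}, "metadata": {}}
            hosts[int(header.group(1))] = current
            continue
        if current is None:
            continue  # banner lines before the first host block
        field = FIELD.match(line)
        if field:
            current["fields"][field.group(1)] = field.group(2)
            continue
        meta = METADATA.match(line)
        if meta:
            current["metadata"][meta.group(1)] = meta.group(2)
    return hosts


# Abbreviated copy of the output shown above.
SAMPLE = """\
!! Cluster is in GLOBAL MAINTENANCE mode !!

--== Host 1 status ==--

Hostname                           : ovirt01.localdomain.local
Host ID                            : 1
Score                              : 2597
Extra metadata (valid at timestamp):
    score=2597
    state=GlobalMaintenance

--== Host 2 status ==--

Hostname                           : 192.168.150.103
Host ID                            : 2
Score                              : 3400
Extra metadata (valid at timestamp):
    score=3400
    state=GlobalMaintenance
"""

hosts = parse_vm_status(SAMPLE)
for number, record in sorted(hosts.items()):
    print(number, record["fields"]["Score"], record["metadata"]["state"])
```

All values stay as strings, so a score comparison would need an explicit int() conversion.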


- then I exit global maintenance
[root@ovirt02 ~]# hosted-engine --set-maintenance --mode=none


- While monitoring the status, at some point I see "EngineStart" on both
host2 and host3:

[root@ovirt02 ~]# hosted-engine --vm-status


--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt01.localdomain.local
Host ID                            : 1
Engine status                      : {"reason": "bad vm status", "health":
"bad", "vm": "down", "detail": "down"}
Score                              : 3230
stopped                            : False
Local maintenance                  : False
crc32                              : 25cadbfb
local_conf_timestamp               : 20055
Host timestamp                     : 20040
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=20040 (Sun Jul  9 21:35:55 2017)
    host-id=1
    score=3230
    vm_conf_refresh_time=20055 (Sun Jul  9 21:36:11 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineDown
    stopped=False


--== Host 2 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : 192.168.150.103
Host ID                            : 2
Engine status                      : {"reason": "vm not running on this
host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : e6951128
local_conf_timestamp               : 3075
Host timestamp                     : 3058
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=3058 (Sun Jul  9 21:35:59 2017)
    host-id=2
    score=3400
    vm_conf_refresh_time=3075 (Sun Jul  9 21:36:15 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineStart
    stopped=False


--== Host 3 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt03.localdomain.local
Host ID                            : 3
Engine status                      : {"reason": "vm not running on this
host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 382efde5
local_conf_timestamp               : 24832
Host timestamp                     : 24816
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=24816 (Sun Jul  9 21:36:01 2017)
    host-id=3
    score=3400
    vm_conf_refresh_time=24832 (Sun Jul  9 21:36:17 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineStart
    stopped=False
[root@ovirt02 ~]#
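
To spot this situation without reading the whole output, the "host-id=" and "state=" lines of the extra metadata can be paired up. This is only a sketch: the helper names are hypothetical, and treating "EngineStart" and "EngineStarting" as the states of interest is an assumption based on the outputs in this message, not on the agent's documentation.

```python
# Assumption: these two state names, as seen in the outputs in this message,
# are the ones that mean "this host is trying to start the engine VM".
STARTING_STATES = {"EngineStart", "EngineStarting"}


def hosts_by_state(text):
    """Map each host-id to the state= value that follows it in the metadata."""
    states = {}
    host_id = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("host-id="):
            host_id = int(line.split("=", 1)[1])
        elif line.startswith("state=") and host_id is not None:
            states[host_id] = line.split("=", 1)[1]
            host_id = None
    return states


def concurrent_starters(text):
    """Return the sorted host-ids whose state is one of STARTING_STATES."""
    states = hosts_by_state(text)
    return sorted(h for h, s in states.items() if s in STARTING_STATES)


# Metadata lines copied from the three host blocks above.
SAMPLE = """\
    host-id=1
    score=3230
    state=EngineDown
    host-id=2
    score=3400
    state=EngineStart
    host-id=3
    score=3400
    state=EngineStart
"""

starters = concurrent_starters(SAMPLE)
print(starters)  # [2, 3]
```

More than one entry in the result only says that several hosts reported a starting state in the same snapshot; it says nothing about whether the guest OS actually ran on more than one of them.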

and then

[root@ovirt02 ~]# hosted-engine --vm-status


--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt01.localdomain.local
Host ID                            : 1
Engine status                      : {"reason": "bad vm status", "health":
"bad", "vm": "down", "detail": "down"}
Score                              : 3253
stopped                            : False
Local maintenance                  : False
crc32                              : 3fc39f31
local_conf_timestamp               : 20087
Host timestamp                     : 20070
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=20070 (Sun Jul  9 21:36:26 2017)
    host-id=1
    score=3253
    vm_conf_refresh_time=20087 (Sun Jul  9 21:36:43 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineDown
    stopped=False


--== Host 2 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : 192.168.150.103
Host ID                            : 2
Engine status                      : {"reason": "vm not running on this
host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 4a05c31e
local_conf_timestamp               : 3109
Host timestamp                     : 3079
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=3079 (Sun Jul  9 21:36:19 2017)
    host-id=2
    score=3400
    vm_conf_refresh_time=3109 (Sun Jul  9 21:36:49 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineStarting
    stopped=False


--== Host 3 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt03.localdomain.local
Host ID                            : 3
Engine status                      : {"reason": "vm not running on this
host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 382efde5
local_conf_timestamp               : 24832
Host timestamp                     : 24816
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=24816 (Sun Jul  9 21:36:01 2017)
    host-id=3
    score=3400
    vm_conf_refresh_time=24832 (Sun Jul  9 21:36:17 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineStart
    stopped=False
[root@ovirt02 ~]#

and

[root@ovirt02 ~]# hosted-engine --vm-status


--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt01.localdomain.local
Host ID                            : 1
Engine status                      : {"reason": "bad vm status", "health":
"bad", "vm": "down", "detail": "down"}
Score                              : 3253
stopped                            : False
Local maintenance                  : False
crc32                              : 3fc39f31
local_conf_timestamp               : 20087
Host timestamp                     : 20070
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=20070 (Sun Jul  9 21:36:26 2017)
    host-id=1
    score=3253
    vm_conf_refresh_time=20087 (Sun Jul  9 21:36:43 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineDown
    stopped=False


--== Host 2 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : 192.168.150.103
Host ID                            : 2
Engine status                      : {"reason": "vm not running on this
host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 4a05c31e
local_conf_timestamp               : 3109
Host timestamp                     : 3079
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=3079 (Sun Jul  9 21:36:19 2017)
    host-id=2
    score=3400
    vm_conf_refresh_time=3109 (Sun Jul  9 21:36:49 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineStarting
    stopped=False


--== Host 3 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt03.localdomain.local
Host ID                            : 3
Engine status                      : {"reason": "vm not running on this
host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : fc1e8cf9
local_conf_timestamp               : 24868
Host timestamp                     : 24836
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=24836 (Sun Jul  9 21:36:21 2017)
    host-id=3
    score=3400
    vm_conf_refresh_time=24868 (Sun Jul  9 21:36:53 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineStarting
    stopped=False
[root@ovirt02 ~]#

and at the end host3 goes to "EngineForceStop" for the engine:

[root@ovirt02 ~]# hosted-engine --vm-status


--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt01.localdomain.local
Host ID                            : 1
Engine status                      : {"reason": "bad vm status", "health":
"bad", "vm": "down", "detail": "down"}
Score                              : 3312
stopped                            : False
Local maintenance                  : False
crc32                              : e9d53432
local_conf_timestamp               : 20120
Host timestamp                     : 20102
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=20102 (Sun Jul  9 21:36:58 2017)
    host-id=1
    score=3312
    vm_conf_refresh_time=20120 (Sun Jul  9 21:37:15 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineDown
    stopped=False


--== Host 2 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : 192.168.150.103
Host ID                            : 2
Engine status                      : {"reason": "bad vm status", "health":
"bad", "vm": "up", "detail": "powering up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 7d2330be
local_conf_timestamp               : 3141
Host timestamp                     : 3124
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=3124 (Sun Jul  9 21:37:04 2017)
    host-id=2
    score=3400
    vm_conf_refresh_time=3141 (Sun Jul  9 21:37:21 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineStarting
    stopped=False


--== Host 3 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt03.localdomain.local
Host ID                            : 3
Engine status                      : {"reason": "Storage of VM is locked.
Is another host already starting the VM?", "health": "bad", "vm":
"already_locked", "detail": "down"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 179825e8
local_conf_timestamp               : 24900
Host timestamp                     : 24883
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=24883 (Sun Jul  9 21:37:08 2017)
    host-id=3
    score=3400
    vm_conf_refresh_time=24900 (Sun Jul  9 21:37:24 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineForceStop
    stopped=False
[root@ovirt02 ~]#
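
The "Engine status" value is a JSON object, but in this mail it is wrapped over several lines. A small sketch of rejoining it and reading the "vm" key, which is where host3 reports "already_locked", could look like this. The function name engine_status is hypothetical, and joining the wrapped lines with a single space is an assumption about how the original line looked before the mail client wrapped it.

```python
import json


def engine_status(lines):
    """Return the first "Engine status" JSON object as a dict, or None.

    Wrapped continuation lines are appended (joined by one space, which is an
    assumption) until the accumulated text parses as JSON.
    """
    buffer = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("Engine status"):
            buffer = line.split(":", 1)[1].strip()
        elif buffer is not None:
            buffer += " " + line
        else:
            continue
        try:
            return json.loads(buffer)
        except json.JSONDecodeError:
            continue  # object still incomplete, keep appending lines
    return None


# Host 3 lines copied from the output above.
SAMPLE = [
    'Engine status                      : {"reason": "Storage of VM is locked.',
    'Is another host already starting the VM?", "health": "bad", "vm":',
    '"already_locked", "detail": "down"}',
    "Score                              : 3400",
]

status = engine_status(SAMPLE)
print(status["vm"])      # already_locked
print(status["reason"])
```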


Comparing /var/log/libvirt/qemu/HostedEngine of host2 and host3

Host2:

2017-07-09 19:36:36.094+0000: starting up libvirt version: 2.0.0, package:
10.el7_3.9 (CentOS BuildSystem <http://bugs.centos.org>,
2017-05-25-20:52:28, c1bm.rdu2.centos.org), qemu version: 2.6.0
(qemu-kvm-ev-2.6.0-28.el7.10.1), hostname: ovirt02.localdomain.local
 ... char device redirected to /dev/pts/1 (label charconsole0)
warning: host doesn't support requested feature: CPUID.07H:EBX.erms [bit 9]


Host3:

2017-07-09 19:36:38.143+0000: starting up libvirt version: 2.0.0, package:
10.el7_3.9 (CentOS BuildSystem <http://bugs.centos.org>,
2017-05-25-20:52:28, c1bm.rdu2.centos.org), qemu version: 2.6.0
(qemu-kvm-ev-2.6.0-28.el7.10.1), hostname: ovirt03.localdomain.local
 ... char device redirected to /dev/pts/1 (label charconsole0)
2017-07-09 19:36:38.584+0000: shutting down
2017-07-09T19:36:38.589729Z qemu-kvm: terminating on signal 15 from pid 1835
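
For reference, the interval between the two starts and the lifetime of the qemu-kvm process on host3 can be worked out from the three libvirt timestamps above. This is just the arithmetic, with variable names chosen for illustration:

```python
from datetime import datetime

# Format of the libvirt log timestamps quoted above, e.g.
# "2017-07-09 19:36:36.094+0000".
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f%z"


def parse_log_ts(stamp):
    """Parse one libvirt log timestamp into a timezone-aware datetime."""
    return datetime.strptime(stamp, LOG_TS_FORMAT)


host2_start = parse_log_ts("2017-07-09 19:36:36.094+0000")
host3_start = parse_log_ts("2017-07-09 19:36:38.143+0000")
host3_stop = parse_log_ts("2017-07-09 19:36:38.584+0000")

start_gap = (host3_start - host2_start).total_seconds()
host3_lifetime = (host3_stop - host3_start).total_seconds()

print(f"host3 started {start_gap:.3f} s after host2")              # 2.049 s
print(f"host3 qemu-kvm was shut down after {host3_lifetime:.3f} s")  # 0.441 s
```

These numbers only describe what the two logs show; on their own they do not tell whether the guest on host3 got as far as touching its disk.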

Any comments?
Is it only a matter of powering on the VM in paused mode before starting
the OS itself, or do I risk corruption due to two qemu-kvm processes trying
to start the engine VM's OS?

Thanks,
Gianluca