2 hosts starting the engine at the same time?

Hello. I'm on 4.1.3 with self-hosted engine and glusterfs as storage. I updated the kernel on the engine, so I executed these steps:
- enable global maintenance from the web admin gui
- wait some minutes
- shutdown the engine vm from inside its OS
- wait some minutes
- execute on one host:
[root@ovirt02 ~]# hosted-engine --set-maintenance --mode=none

I see that the qemu-kvm process for the engine starts on two hosts and then on one of them it gets a "kill -15" and stops. Is it expected behaviour? It seems somehow dangerous to me..

- when in maintenance:
[root@ovirt02 ~]# hosted-engine --vm-status
!! Cluster is in GLOBAL MAINTENANCE mode !!
--== Host 1 status ==--
conf_on_shared_storage : True Status up-to-date : True Hostname : ovirt01.localdomain.local Host ID : 1 Engine status : {"health": "good", "vm": "up", "detail": "up"} Score : 2597 stopped : False Local maintenance : False crc32 : 7931c5c3 local_conf_timestamp : 19811 Host timestamp : 19794
Extra metadata (valid at timestamp): metadata_parse_version=1 metadata_feature_version=1 timestamp=19794 (Sun Jul 9 21:31:50 2017) host-id=1 score=2597 vm_conf_refresh_time=19811 (Sun Jul 9 21:32:06 2017) conf_on_shared_storage=True maintenance=False state=GlobalMaintenance stopped=False
--== Host 2 status ==--
conf_on_shared_storage : True Status up-to-date : True Hostname : 192.168.150.103 Host ID : 2 Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"} Score : 3400 stopped : False Local maintenance : False crc32 : 616ceb02 local_conf_timestamp : 2829 Host timestamp : 2812
Extra metadata (valid at timestamp): metadata_parse_version=1 metadata_feature_version=1 timestamp=2812 (Sun Jul 9 21:31:52 2017) host-id=2 score=3400 vm_conf_refresh_time=2829 (Sun Jul 9 21:32:09 2017) conf_on_shared_storage=True maintenance=False state=GlobalMaintenance stopped=False
--== Host 3 status ==--
conf_on_shared_storage : True Status up-to-date : True Hostname : ovirt03.localdomain.local Host ID : 3 Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"} Score : 3400 stopped : False Local maintenance : False crc32 : 871204b2 local_conf_timestamp : 24584 Host timestamp : 24567
Extra metadata (valid at timestamp): metadata_parse_version=1 metadata_feature_version=1 timestamp=24567 (Sun Jul 9 21:31:52 2017) host-id=3 score=3400 vm_conf_refresh_time=24584 (Sun Jul 9 21:32:09 2017) conf_on_shared_storage=True maintenance=False state=GlobalMaintenance stopped=False
!! Cluster is in GLOBAL MAINTENANCE mode !!
[root@ovirt02 ~]#

- then I exit global maintenance:
[root@ovirt02 ~]# hosted-engine --set-maintenance --mode=none

- during monitoring of the status, at some point I see "EngineStart" on both host2 and host3:
[root@ovirt02 ~]# hosted-engine --vm-status
--== Host 1 status ==--
conf_on_shared_storage : True Status up-to-date : True Hostname : ovirt01.localdomain.local Host ID : 1 Engine status : {"reason": "bad vm status", "health": "bad", "vm": "down", "detail": "down"} Score : 3230 stopped : False Local maintenance : False crc32 : 25cadbfb local_conf_timestamp : 20055 Host timestamp : 20040
Extra metadata (valid at timestamp): metadata_parse_version=1 metadata_feature_version=1 timestamp=20040 (Sun Jul 9 21:35:55 2017) host-id=1 score=3230 vm_conf_refresh_time=20055 (Sun Jul 9 21:36:11 2017) conf_on_shared_storage=True maintenance=False state=EngineDown stopped=False
--== Host 2 status ==--
conf_on_shared_storage : True Status up-to-date : True Hostname : 192.168.150.103 Host ID : 2 Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"} Score : 3400 stopped : False Local maintenance : False crc32 : e6951128 local_conf_timestamp : 3075 Host timestamp : 3058
Extra metadata (valid at timestamp): metadata_parse_version=1 metadata_feature_version=1 timestamp=3058 (Sun Jul 9 21:35:59 2017) host-id=2 score=3400 vm_conf_refresh_time=3075 (Sun Jul 9 21:36:15 2017) conf_on_shared_storage=True maintenance=False state=EngineStart stopped=False
--== Host 3 status ==--
conf_on_shared_storage : True Status up-to-date : True Hostname : ovirt03.localdomain.local Host ID : 3 Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"} Score : 3400 stopped : False Local maintenance : False crc32 : 382efde5 local_conf_timestamp : 24832 Host timestamp : 24816
Extra metadata (valid at timestamp): metadata_parse_version=1 metadata_feature_version=1 timestamp=24816 (Sun Jul 9 21:36:01 2017) host-id=3 score=3400 vm_conf_refresh_time=24832 (Sun Jul 9 21:36:17 2017) conf_on_shared_storage=True maintenance=False state=EngineStart stopped=False
[root@ovirt02 ~]#

and then:
[root@ovirt02 ~]# hosted-engine --vm-status
--== Host 1 status ==--
conf_on_shared_storage : True Status up-to-date : True Hostname : ovirt01.localdomain.local Host ID : 1 Engine status : {"reason": "bad vm status", "health": "bad", "vm": "down", "detail": "down"} Score : 3253 stopped : False Local maintenance : False crc32 : 3fc39f31 local_conf_timestamp : 20087 Host timestamp : 20070
Extra metadata (valid at timestamp): metadata_parse_version=1 metadata_feature_version=1 timestamp=20070 (Sun Jul 9 21:36:26 2017) host-id=1 score=3253 vm_conf_refresh_time=20087 (Sun Jul 9 21:36:43 2017) conf_on_shared_storage=True maintenance=False state=EngineDown stopped=False
--== Host 2 status ==--
conf_on_shared_storage : True Status up-to-date : True Hostname : 192.168.150.103 Host ID : 2 Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"} Score : 3400 stopped : False Local maintenance : False crc32 : 4a05c31e local_conf_timestamp : 3109 Host timestamp : 3079
Extra metadata (valid at timestamp): metadata_parse_version=1 metadata_feature_version=1 timestamp=3079 (Sun Jul 9 21:36:19 2017) host-id=2 score=3400 vm_conf_refresh_time=3109 (Sun Jul 9 21:36:49 2017) conf_on_shared_storage=True maintenance=False state=EngineStarting stopped=False
--== Host 3 status ==--
conf_on_shared_storage : True Status up-to-date : True Hostname : ovirt03.localdomain.local Host ID : 3 Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"} Score : 3400 stopped : False Local maintenance : False crc32 : 382efde5 local_conf_timestamp : 24832 Host timestamp : 24816
Extra metadata (valid at timestamp): metadata_parse_version=1 metadata_feature_version=1 timestamp=24816 (Sun Jul 9 21:36:01 2017) host-id=3 score=3400 vm_conf_refresh_time=24832 (Sun Jul 9 21:36:17 2017) conf_on_shared_storage=True maintenance=False state=EngineStart stopped=False
[root@ovirt02 ~]#

and:
[root@ovirt02 ~]# hosted-engine --vm-status
--== Host 1 status ==--
conf_on_shared_storage : True Status up-to-date : True Hostname : ovirt01.localdomain.local Host ID : 1 Engine status : {"reason": "bad vm status", "health": "bad", "vm": "down", "detail": "down"} Score : 3253 stopped : False Local maintenance : False crc32 : 3fc39f31 local_conf_timestamp : 20087 Host timestamp : 20070
Extra metadata (valid at timestamp): metadata_parse_version=1 metadata_feature_version=1 timestamp=20070 (Sun Jul 9 21:36:26 2017) host-id=1 score=3253 vm_conf_refresh_time=20087 (Sun Jul 9 21:36:43 2017) conf_on_shared_storage=True maintenance=False state=EngineDown stopped=False
--== Host 2 status ==--
conf_on_shared_storage : True Status up-to-date : True Hostname : 192.168.150.103 Host ID : 2 Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"} Score : 3400 stopped : False Local maintenance : False crc32 : 4a05c31e local_conf_timestamp : 3109 Host timestamp : 3079
Extra metadata (valid at timestamp): metadata_parse_version=1 metadata_feature_version=1 timestamp=3079 (Sun Jul 9 21:36:19 2017) host-id=2 score=3400 vm_conf_refresh_time=3109 (Sun Jul 9 21:36:49 2017) conf_on_shared_storage=True maintenance=False state=EngineStarting stopped=False
--== Host 3 status ==--
conf_on_shared_storage : True Status up-to-date : True Hostname : ovirt03.localdomain.local Host ID : 3 Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"} Score : 3400 stopped : False Local maintenance : False crc32 : fc1e8cf9 local_conf_timestamp : 24868 Host timestamp : 24836
Extra metadata (valid at timestamp): metadata_parse_version=1 metadata_feature_version=1 timestamp=24836 (Sun Jul 9 21:36:21 2017) host-id=3 score=3400 vm_conf_refresh_time=24868 (Sun Jul 9 21:36:53 2017) conf_on_shared_storage=True maintenance=False state=EngineStarting stopped=False
[root@ovirt02 ~]#

and at the end Host3 goes to "ForceStop" for the engine:
[root@ovirt02 ~]# hosted-engine --vm-status
--== Host 1 status ==--
conf_on_shared_storage : True Status up-to-date : True Hostname : ovirt01.localdomain.local Host ID : 1 Engine status : {"reason": "bad vm status", "health": "bad", "vm": "down", "detail": "down"} Score : 3312 stopped : False Local maintenance : False crc32 : e9d53432 local_conf_timestamp : 20120 Host timestamp : 20102
Extra metadata (valid at timestamp): metadata_parse_version=1 metadata_feature_version=1 timestamp=20102 (Sun Jul 9 21:36:58 2017) host-id=1 score=3312 vm_conf_refresh_time=20120 (Sun Jul 9 21:37:15 2017) conf_on_shared_storage=True maintenance=False state=EngineDown stopped=False
--== Host 2 status ==--
conf_on_shared_storage : True Status up-to-date : True Hostname : 192.168.150.103 Host ID : 2 Engine status : {"reason": "bad vm status", "health": "bad", "vm": "up", "detail": "powering up"} Score : 3400 stopped : False Local maintenance : False crc32 : 7d2330be local_conf_timestamp : 3141 Host timestamp : 3124
Extra metadata (valid at timestamp): metadata_parse_version=1 metadata_feature_version=1 timestamp=3124 (Sun Jul 9 21:37:04 2017) host-id=2 score=3400 vm_conf_refresh_time=3141 (Sun Jul 9 21:37:21 2017) conf_on_shared_storage=True maintenance=False state=EngineStarting stopped=False
--== Host 3 status ==--
conf_on_shared_storage : True Status up-to-date : True Hostname : ovirt03.localdomain.local Host ID : 3 Engine status : {"reason": "Storage of VM is locked. Is another host already starting the VM?", "health": "bad", "vm": "already_locked", "detail": "down"} Score : 3400 stopped : False Local maintenance : False crc32 : 179825e8 local_conf_timestamp : 24900 Host timestamp : 24883
Extra metadata (valid at timestamp): metadata_parse_version=1 metadata_feature_version=1 timestamp=24883 (Sun Jul 9 21:37:08 2017) host-id=3 score=3400 vm_conf_refresh_time=24900 (Sun Jul 9 21:37:24 2017) conf_on_shared_storage=True maintenance=False state=EngineForceStop stopped=False
[root@ovirt02 ~]#

Comparing /var/log/libvirt/qemu/HostedEngine.log of host2 and host3:

Host2:
2017-07-09 19:36:36.094+0000: starting up libvirt version: 2.0.0, package: 10.el7_3.9 (CentOS BuildSystem <http://bugs.centos.org>, 2017-05-25-20:52:28, c1bm.rdu2.centos.org), qemu version: 2.6.0 (qemu-kvm-ev-2.6.0-28.el7.10.1), hostname: ovirt02.localdomain.local
...
char device redirected to /dev/pts/1 (label charconsole0)
warning: host doesn't support requested feature: CPUID.07H:EBX.erms [bit 9]

Host3:
2017-07-09 19:36:38.143+0000: starting up libvirt version: 2.0.0, package: 10.el7_3.9 (CentOS BuildSystem <http://bugs.centos.org>, 2017-05-25-20:52:28, c1bm.rdu2.centos.org), qemu version: 2.6.0 (qemu-kvm-ev-2.6.0-28.el7.10.1), hostname: ovirt03.localdomain.local
...
char device redirected to /dev/pts/1 (label charconsole0)
2017-07-09 19:36:38.584+0000: shutting down
2017-07-09T19:36:38.589729Z qemu-kvm: terminating on signal 15 from pid 1835

Any comment? Is it only a matter of powering on the VM in paused mode before starting the OS itself, or do I risk corruption due to two qemu-kvm processes trying to start the engine vm OS?
Thanks,
Gianluca
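For reference, the lock that decides which host may actually run the engine VM is a sanlock lease kept on the shared storage. A minimal, read-only way to look at it from any of the hosts could be something like the sketch below (standard sanlock and hosted-engine tooling; exact output varies):

# lockspaces and resource leases sanlock currently holds on this host
sanlock client status
# the HA agent's summarized view of the same information
hosted-engine --vm-status | grep -E 'Hostname|Engine status|state='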

On Sun, Jul 9, 2017 at 9:54 PM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
Hello. I'm on 4.1.3 with self hosted engine and glusterfs as storage. I updated the kernel on engine so I executed these steps:
- enable global maintenace from the web admin gui - wait some minutes - shutdown the engine vm from inside its OS - wait some minutes - execute on one host [root@ovirt02 ~]# hosted-engine --set-maintenance --mode=none
I see that the qemu-kvm process for the engine starts on two hosts and then on one of them it gets a "kill -15" and stops Is it expected behaviour? It seems somehow dangerous to me..
And I don't know how related, but the engine vm doesn't come up. Connecting to its vnc console I get it "booting from hard disk" ....:
https://drive.google.com/file/d/0BwoPbcrMv8mvOEJWeVRvNThmTWc/view?usp=sharin...
Gluster volume for the engine vm storage domain seems ok...
[root@ovirt01 vdsm]# gluster volume heal engine info
Brick ovirt01.localdomain.local:/gluster/brick1/engine
Status: Connected
Number of entries: 0
Brick ovirt02.localdomain.local:/gluster/brick1/engine
Status: Connected
Number of entries: 0
Brick ovirt03.localdomain.local:/gluster/brick1/engine
Status: Connected
Number of entries: 0
[root@ovirt01 vdsm]#
and in HostedEngine.log:
2017-07-09 19:59:20.660+0000: starting up libvirt version: 2.0.0, package: 10.el7_3.9 (CentOS BuildSystem <http://bugs.centos.org>, 2017-05-25-20:52:28, c1bm.rdu2.centos.org), qemu version: 2.6.0 (qemu-kvm-ev-2.6.0-28.el7.10.1), hostname: ovirt01.localdomain.local
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name guest=HostedEngine,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-3-HostedEngine/master-key.aes -machine pc-i440fx-rhel7.3.0,accel=kvm,usb=off -cpu Broadwell,+rtm,+hle -m 6144 -realtime mlock=off -smp 1,maxcpus=16,sockets=16,cores=1,threads=1 -uuid 87fd6bdb-535d-45b8-81d4-7e3101a6c364 -smbios 'type=1,manufacturer=oVirt,product=oVirt Node,version=7-3.1611.el7.centos,serial=564D777E-B638-E808-9044-680BA4957704,uuid=87fd6bdb-535d-45b8-81d4-7e3101a6c364' -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-3-HostedEngine/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2017-07-09T19:59:20,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-reboot -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x4 -drive file=/var/run/vdsm/storage/e9e4a478-f391-42e5-9bb8-ed22a33e5cab/cf8b8f4e-fa01-457e-8a96-c5a27f8408f8/94c46bac-0a9f-49e8-9188-627fa0caf2b6,format=raw,if=none,id=drive-virtio-disk0,serial=cf8b8f4e-fa01-457e-8a96-c5a27f8408f8,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive if=none,id=drive-ide0-1-0,readonly=on -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=32 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:16:3e:0a:e7:ba,bus=pci.0,addr=0x3 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/87fd6bdb-535d-45b8-81d4-7e3101a6c364.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/87fd6bdb-535d-45b8-81d4-7e3101a6c364.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev socket,id=charchannel2,path=/var/lib/libvirt/qemu/channels/87fd6bdb-535d-45b8-81d4-7e3101a6c364.org.ovirt.hosted-engine-setup.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=org.ovirt.hosted-engine-setup.0 -chardev pty,id=charconsole0 -device virtconsole,chardev=charconsole0,id=console0 -vnc 0:0,password -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -object rng-random,id=objrng0,filename=/dev/urandom -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x6 -msg timestamp=on
char device redirected to /dev/pts/1 (label charconsole0)
warning: host doesn't support requested feature: CPUID.07H:EBX.erms [bit 9]
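As a further check on the gluster side, a couple of read-only commands can help rule out replication trouble on the engine volume; a small sketch, using the volume name from this thread:

# files in split-brain on the engine volume (should report none)
gluster volume heal engine info split-brain
# bricks and self-heal daemon status
gluster volume status engine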

On Sun, Jul 9, 2017 at 11:12 PM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Sun, Jul 9, 2017 at 9:54 PM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
Hello. I'm on 4.1.3 with self hosted engine and glusterfs as storage. I updated the kernel on engine so I executed these steps:
- enable global maintenace from the web admin gui - wait some minutes - shutdown the engine vm from inside its OS - wait some minutes - execute on one host [root@ovirt02 ~]# hosted-engine --set-maintenance --mode=none
I see that the qemu-kvm process for the engine starts on two hosts and then on one of them it gets a "kill -15" and stops Is it expected behaviour?
In the 'hosted-engine' script itself, in the function cmd_vm_start, there is a comment:
# TODO: Check first the sanlock status, and if allows:
Perhaps ha-agent checks sanlock status before starting the VM? Adding Martin.
Please also check/share agent.log.
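A quick way to pull the relevant state transitions out of agent.log on each host is sketched below (default ovirt-hosted-engine-ha log path assumed):

# show the most recent engine start/stop state changes seen by the HA agent
grep -E 'EngineDown|EngineStart|EngineStarting|EngineForceStop|EngineUp' \
    /var/log/ovirt-hosted-engine-ha/agent.log | tail -n 40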
It seems somehow dangerous to me..
And I don't know how related, but the engine vm doesn't come up. Connecting to its vnc console I get it "booting from hard disk" ....: https://drive.google.com/file/d/0BwoPbcrMv8mvOEJWeVRvNThmTWc/view?usp=sharin...
Gluster volume for the engine vm storage domain seems ok...
[root@ovirt01 vdsm]# gluster volume heal engine info Brick ovirt01.localdomain.local:/gluster/brick1/engine Status: Connected Number of entries: 0
Brick ovirt02.localdomain.local:/gluster/brick1/engine Status: Connected Number of entries: 0
Brick ovirt03.localdomain.local:/gluster/brick1/engine Status: Connected Number of entries: 0
[root@ovirt01 vdsm]#
and in HostedEngine.log
2017-07-09 19:59:20.660+0000: starting up libvirt version: 2.0.0, package: 10.el7_3.9 (CentOS BuildSystem <http://bugs.centos.org>, 2017-05-25-20:52:28, c1bm.rdu2.centos.org), qemu version: 2.6.0 (qemu-kvm-ev-2.6.0-28.el7.10.1), hostname: ovirt01.localdomain.local LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name guest=HostedEngine,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-3-HostedEngine/master-key.aes -machine pc-i440fx-rhel7.3.0,accel=kvm,usb=off -cpu Broadwell,+rtm,+hle -m 6144 -realtime mlock=off -smp 1,maxcpus=16,sockets=16,cores=1,threads=1 -uuid 87fd6bdb-535d-45b8-81d4-7e3101a6c364 -smbios 'type=1,manufacturer=oVirt,product=oVirt Node,version=7-3.1611.el7.centos,serial=564D777E-B638-E808-9044-680BA4957704,uuid=87fd6bdb-535d-45b8-81d4-7e3101a6c364' -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-3-HostedEngine/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2017-07-09T19:59:20,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-reboot -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x4 -drive file=/var/run/vdsm/storage/e9e4a478-f391-42e5-9bb8-ed22a33e5cab/cf8b8f4e-fa01-457e-8a96-c5a27f8408f8/94c46bac-0a9f-49e8-9188-627fa0caf2b6,format=raw,if=none,id=drive-virtio-disk0,serial=cf8b8f4e-fa01-457e-8a96-c5a27f8408f8,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive if=none,id=drive-ide0-1-0,readonly=on -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=32 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:16:3e:0a:e7:ba,bus=pci.0,addr=0x3 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/87fd6bdb-535d-45b8-81d4-7e3101a6c364.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/87fd6bdb-535d-45b8-81d4-7e3101a6c364.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev socket,id=charchannel2,path=/var/lib/libvirt/qemu/channels/87fd6bdb-535d-45b8-81d4-7e3101a6c364.org.ovirt.hosted-engine-setup.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=org.ovirt.hosted-engine-setup.0 -chardev pty,id=charconsole0 -device virtconsole,chardev=charconsole0,id=console0 -vnc 0:0,password -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -object rng-random,id=objrng0,filename=/dev/urandom -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x6 -msg timestamp=on char device redirected to /dev/pts/1 (label charconsole0) warning: host doesn't support requested feature: CPUID.07H:EBX.erms [bit 9]
-- Didi

On Mon, Jul 10, 2017 at 8:11 AM, Yedidyah Bar David <didi@redhat.com> wrote:
On Sun, Jul 9, 2017 at 11:12 PM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Sun, Jul 9, 2017 at 9:54 PM, Gianluca Cecchi <
gianluca.cecchi@gmail.com>
wrote:
Hello. I'm on 4.1.3 with self hosted engine and glusterfs as storage. I updated the kernel on engine so I executed these steps:
- enable global maintenace from the web admin gui - wait some minutes - shutdown the engine vm from inside its OS - wait some minutes - execute on one host [root@ovirt02 ~]# hosted-engine --set-maintenance --mode=none
I see that the qemu-kvm process for the engine starts on two hosts and then on one of them it gets a "kill -15" and stops Is it expected behaviour?
In the 'hosted-engine' script itself, in the function cmd_vm_start, there is a comment: # TODO: Check first the sanlock status, and if allows:
Perhaps ha-agent checks sanlock status before starting the VM? Adding Martin.
Please also check/share agent.log.
-- Didi
Agent.log in gzip format for ovirt01 here: https://drive.google.com/file/d/0BwoPbcrMv8mvTnk2TWtvc0Fwakk/view?usp=sharin...
Agent.log in gzip format for ovirt02 here: https://drive.google.com/file/d/0BwoPbcrMv8mvZV9seEVLSGRkeGM/view?usp=sharin...
Agent.log in gzip format for ovirt03 here: https://drive.google.com/file/d/0BwoPbcrMv8mvN2E1dmlld2Ftekk/view?usp=sharin...
Do you understand from these logs why my engine vm remains at the "Booting from Hard Disk" screen?
Thanks, Gianluca
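While the VM is down everywhere, one low-level thing that can be checked is whether the engine disk image is fully readable from a host. A rough sketch, using the image path from the qemu command line quoted earlier (the /var/run/vdsm/storage path only exists while the storage domain is active on that host):

# print basic image information (the HE disk is raw)
qemu-img info /var/run/vdsm/storage/e9e4a478-f391-42e5-9bb8-ed22a33e5cab/cf8b8f4e-fa01-457e-8a96-c5a27f8408f8/94c46bac-0a9f-49e8-9188-627fa0caf2b6
# read the whole image once to make sure every block is reachable on gluster (read-only)
dd if=/var/run/vdsm/storage/e9e4a478-f391-42e5-9bb8-ed22a33e5cab/cf8b8f4e-fa01-457e-8a96-c5a27f8408f8/94c46bac-0a9f-49e8-9188-627fa0caf2b6 of=/dev/null bs=1M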

On Mon, Jul 10, 2017 at 12:42 PM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
Do you understand why my engine vm remains in "Booting from Hard Disk" screen from these logs?
Thanks, Gianluca
Could one of the possible causes of my engine being unable to boot be this? In 4.1.2 I was already on CentOS 7.3, with the same qemu-kvm-ev and libvirt versions as in 4.1.3. If I compare the command line of the VM, it was:

diff 4.1.2 4.1.3
< 2017-07-04 16:42:19.418+0000: starting up libvirt version: 2.0.0, package: 10.el7_3.9 (CentOS BuildSystem <http://bugs.centos.org>, 2017-05-25-20:52:28, c1bm.rdu2.centos.org), qemu version: 2.6.0 (qemu-kvm-ev-2.6.0-28.el7.10.1), hostname: ovirt02.localdomain.local
---
> 2017-07-09 23:09:13.894+0000: starting up libvirt version: 2.0.0, package: 10.el7_3.9 (CentOS BuildSystem <http://bugs.centos.org>, 2017-05-25-20:52:28, c1bm.rdu2.centos.org), qemu version: 2.6.0 (qemu-kvm-ev-2.6.0-28.el7.10.1), hostname: ovirt02.localdomain.local
5,6c5,6
< -machine pc-i440fx-rhel7.2.0,accel=kvm,usb=off
< -cpu qemu64,-svm -m 6144
---
> -machine pc-i440fx-rhel7.3.0,accel=kvm,usb=off
> -cpu Broadwell,+rtm,+hle -m 6144
9c9
< -smbios 'type=1,manufacturer=oVirt,product=oVirt Node,version=7-2.1511.el7.centos.2.10,serial=564D7100-F0D4-3ACC-795A-145A595604C0,uuid=87fd6bdb-535d-45b8-81d4-7e3101a6c364'
---
> -smbios 'type=1,manufacturer=oVirt,product=oVirt Node,version=7-3.1611.el7.centos,serial=564D8F16-993A-33E1-3B2E-E1740F99C542,uuid=87fd6bdb-535d-45b8-81d4-7e3101a6c364'
13c13
< -rtc base=2017-07-04T16:42:19,driftfix=slew
---
> -rtc base=2017-07-09T23:09:13,driftfix=slew
21c21
< -netdev tap,fd=31,id=hostnet0,vhost=on,vhostfd=33
---
> -netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=32
28a29,30
> -chardev pty,id=charconsole0
> -device virtconsole,chardev=charconsole0,id=console0
31c33,35
< -incoming defer -msg timestamp=on
---
> -object rng-random,id=objrng0,filename=/dev/urandom
> -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x6
> -msg timestamp=on

So it seems it changed both "machine" (from pc-i440fx-rhel7.2.0 to pc-i440fx-rhel7.3.0) and "cpu" (from qemu64,-svm to Broadwell,+rtm,+hle).
Can I revert so that I can check if it has any influence?
BTW: this is a nested environment, where the L0 host is ESX 6.0 U2.
Thanks for any suggestion to fix the engine start
Gianluca
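As a quick comparison of what each host actually passed to qemu-kvm for the engine VM, something like the loop below can be used (a sketch; host names and log path as used in this thread):

for h in ovirt01 ovirt02 ovirt03; do
    echo "== $h =="
    # print the last -machine and -cpu arguments recorded for the HostedEngine VM
    ssh root@$h.localdomain.local \
        "grep -oE -- '-(machine|cpu) [^ ]+' /var/log/libvirt/qemu/HostedEngine.log | tail -n 2"
done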

On Mon, Jul 10, 2017 at 1:40 PM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Mon, Jul 10, 2017 at 12:42 PM, Gianluca Cecchi < gianluca.cecchi@gmail.com> wrote:
Do you understand why my engine vm remains in "Booting from Hard Disk" screen from these logs?
Thanks, Gianluca
One of possible causes of my engine unable to boot up could be this?
In 4.1.2 I was already in 7.3 and with the same qemu-kvm-ev and libvirt versions of 4.1.3, if I compare the command line of the VM, it was:
diff 4.1.2 4.1.3
2017-07-09 23:09:13.894+0000: starting up libvirt version: 2.0.0,
< 2017-07-04 16:42:19.418+0000: starting up libvirt version: 2.0.0, package: 10.el7_3.9 (CentOS BuildSystem <http://bugs.centos.org>, 2017-05-25-20:52:28, c1bm.rdu2.centos.org), qemu version: 2.6.0 (qemu-kvm-ev-2.6.0-28.el7.10.1), hostname: ovirt02.localdomain.local --- package: 10.el7_3.9 (CentOS BuildSystem <http://bugs.centos.org>, 2017-05-25-20:52:28, c1bm.rdu2.centos.org), qemu version: 2.6.0 (qemu-kvm-ev-2.6.0-28.el7.10.1), hostname: ovirt02.localdomain.local 5,6c5,6 < -machine pc-i440fx-rhel7.2.0,accel=kvm,usb=off < -cpu qemu64,-svm -m 6144 ---
-machine pc-i440fx-rhel7.3.0,accel=kvm,usb=off -cpu Broadwell,+rtm,+hle -m 6144 9c9 < -smbios 'type=1,manufacturer=oVirt,product=oVirt Node,version=7-2.1511.el7.centos.2.10,serial=564D7100- F0D4-3ACC-795A-145A595604C0,uuid=87fd6bdb-535d-45b8-81d4-7e3101a6c364'
-smbios 'type=1,manufacturer=oVirt,product=oVirt Node,version=7-3.1611.el7.centos,serial=564D8F16-993A- 33E1-3B2E-E1740F99C542,uuid=87fd6bdb-535d-45b8-81d4-7e3101a6c364' 13c13 < -rtc base=2017-07-04T16:42:19,driftfix=slew
-rtc base=2017-07-09T23:09:13,driftfix=slew 21c21 < -netdev tap,fd=31,id=hostnet0,vhost=on,vhostfd=33
-netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=32 28a29,30 -chardev pty,id=charconsole0 -device virtconsole,chardev=charconsole0,id=console0 31c33,35 < -incoming defer -msg timestamp=on
-object rng-random,id=objrng0,filename=/dev/urandom -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x6 -msg timestamp=on
So it seems it changed both "machine" (from pc-i440fx-rhel7.2.0 to pc-i440fx-rhel7.3.0) and "cpu" (from qemu64,-svm to Broadwell,+rtm,+hle) Can I revert so that I can check if it has any influence?
BTW: this is a nested environment, where the L0 host is ESX 6.0 U2.
Thanks for any suggestin to fix engine start
Gianluca
Currently I have set the environment in global maintenance.
Does it make sense to try to start the HostedEngine with an alternate vm.conf, to crosscheck if it is then able to start ok?
I see that there is the file /var/run/ovirt-hosted-engine-ha/vm.conf, which seems to be refreshed every minute. It seems that I can copy it to another place, modify it, and try to start the engine with the modified file using
hosted-engine --vm-start --vm-conf=/alternate/path_vm.conf
Is that correct? The modified file would be such that:
[root@ovirt02 images]# diff /var/run/ovirt-hosted-engine-ha/vm.conf /root/alternate_vm.conf
1,2c1,2
< cpuType=Broadwell
< emulatedMachine=pc-i440fx-rhel7.3.0
---
> cpuType=qemu64
> emulatedMachine=pc-i440fx-rhel7.2.0
[root@ovirt02 images]#
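Spelled out as commands, the approach described above would be roughly the following (a sketch; the sed lines are just one way of editing the two keys, and the cluster is assumed to still be in global maintenance):

# take a private copy of the file the HA agent keeps refreshing
cp /var/run/ovirt-hosted-engine-ha/vm.conf /root/alternate_vm.conf
# change only cpuType and emulatedMachine in the copy
sed -i -e 's/^cpuType=.*/cpuType=qemu64/' \
       -e 's/^emulatedMachine=.*/emulatedMachine=pc-i440fx-rhel7.2.0/' /root/alternate_vm.conf
# start the engine VM from the modified copy
hosted-engine --vm-start --vm-conf=/root/alternate_vm.conf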

On Mon, Jul 10, 2017 at 5:12 PM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Mon, Jul 10, 2017 at 1:40 PM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Mon, Jul 10, 2017 at 12:42 PM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
Do you understand why my engine vm remains in "Booting from Hard Disk" screen from these logs?
Thanks, Gianluca
One of possible causes of my engine unable to boot up could be this?
In 4.1.2 I was already in 7.3 and with the same qemu-kvm-ev and libvirt versions of 4.1.3, if I compare the command line of the VM, it was:
diff 4.1.2 4.1.3
< 2017-07-04 16:42:19.418+0000: starting up libvirt version: 2.0.0, package: 10.el7_3.9 (CentOS BuildSystem <http://bugs.centos.org>, 2017-05-25-20:52:28, c1bm.rdu2.centos.org), qemu version: 2.6.0 (qemu-kvm-ev-2.6.0-28.el7.10.1), hostname: ovirt02.localdomain.local ---
2017-07-09 23:09:13.894+0000: starting up libvirt version: 2.0.0, package: 10.el7_3.9 (CentOS BuildSystem <http://bugs.centos.org>, 2017-05-25-20:52:28, c1bm.rdu2.centos.org), qemu version: 2.6.0 (qemu-kvm-ev-2.6.0-28.el7.10.1), hostname: ovirt02.localdomain.local 5,6c5,6 < -machine pc-i440fx-rhel7.2.0,accel=kvm,usb=off < -cpu qemu64,-svm -m 6144
-machine pc-i440fx-rhel7.3.0,accel=kvm,usb=off -cpu Broadwell,+rtm,+hle -m 6144 9c9 < -smbios 'type=1,manufacturer=oVirt,product=oVirt Node,version=7-2.1511.el7.centos.2.10,serial=564D7100-F0D4-3ACC-795A-145A595604C0,uuid=87fd6bdb-535d-45b8-81d4-7e3101a6c364'
-smbios 'type=1,manufacturer=oVirt,product=oVirt Node,version=7-3.1611.el7.centos,serial=564D8F16-993A-33E1-3B2E-E1740F99C542,uuid=87fd6bdb-535d-45b8-81d4-7e3101a6c364' 13c13 < -rtc base=2017-07-04T16:42:19,driftfix=slew
-rtc base=2017-07-09T23:09:13,driftfix=slew 21c21 < -netdev tap,fd=31,id=hostnet0,vhost=on,vhostfd=33
-netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=32 28a29,30 -chardev pty,id=charconsole0 -device virtconsole,chardev=charconsole0,id=console0 31c33,35 < -incoming defer -msg timestamp=on
-object rng-random,id=objrng0,filename=/dev/urandom -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x6 -msg timestamp=on
So it seems it changed both "machine" (from pc-i440fx-rhel7.2.0 to pc-i440fx-rhel7.3.0) and "cpu" (from qemu64,-svm to Broadwell,+rtm,+hle) Can I revert so that I can check if it has any influence?
BTW: this is a nested environment, where the L0 host is ESX 6.0 U2.
Thanks for any suggestin to fix engine start
Gianluca
Currently I have set the environment in global maintenance. Does it make sense to try to start the HostedEngine with an alternate vm.conf to crosscheck it it is then able to start ok?
Not sure. Depends on why you think it currently fails. Sorry but I didn't check your logs yet.
I see that there is the file /var/run/ovirt-hosted-engine-ha/vm.conf that seems refreshed every minute
Indeed. In recent versions it's possible to change some of the HE VM configuration from the engine itself, just like any other VM, so HA has to update this file.
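The refresh is easy to see from the file's modification time; a trivial check (standard path):

stat -c '%y  %n' /var/run/ovirt-hosted-engine-ha/vm.conf
sleep 90
stat -c '%y  %n' /var/run/ovirt-hosted-engine-ha/vm.conf   # the mtime should have moved forward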
It seems that apparently I can copy it into another place, modify it and try to start engine with the modified file using
hosted-engine --vm-start --vm-conf=/alternate/path_vm.conf
is it correct?
Yes, as also written here: https://www.ovirt.org/documentation/how-to/hosted-engine/#handle-engine-vm-b...
the modified file would be such that:
[root@ovirt02 images]# diff /var/run/ovirt-hosted-engine-ha/vm.conf /root/alternate_vm.conf 1,2c1,2 < cpuType=Broadwell < emulatedMachine=pc-i440fx-rhel7.3.0 ---
cpuType=qemu64 emulatedMachine=pc-i440fx-rhel7.2.0 [root@ovirt02 images]#
No idea about your specific issue or whether this can fix it, but you can try. Especially if you can test on a test env... Best, -- Didi

On Mon, Jul 10, 2017 at 4:38 PM, Yedidyah Bar David <didi@redhat.com> wrote:
Currently I have set the environment in global maintenance. Does it make sense to try to start the HostedEngine with an alternate vm.conf to crosscheck it it is then able to start ok?
Not sure. Depends on why you think it currently fails. Sorry but I didn't check your logs yet.
Because in 4.1.2 it started. During the update to 4.1.3 I moved the engine vm to the already updated hosts without problems. But I suppose the qemu-kvm command line is preserved during migration, isn't it? Only when I planned to update the kernel of the engine vm, and so had to power it off, did I begin to have problems....
I see that there is the file /var/run/ovirt-hosted-engine-ha/vm.conf that seems refreshed every minute
Indeed. In recent versions it's possible to change some of the HE VM configuration from the engine itself, just like any other VM, so HA has to update this file.
It seems that apparently I can copy it into another place, modify it and try to start engine with the modified file using
hosted-engine --vm-start --vm-conf=/alternate/path_vm.conf
is it correct?
Yes, as also written here:
https://www.ovirt.org/documentation/how-to/hosted-engine/#handle-engine-vm-boot-problems
the modified file would be such that:
[root@ovirt02 images]# diff /var/run/ovirt-hosted-engine-ha/vm.conf /root/alternate_vm.conf 1,2c1,2 < cpuType=Broadwell < emulatedMachine=pc-i440fx-rhel7.3.0 ---
cpuType=qemu64 emulatedMachine=pc-i440fx-rhel7.2.0 [root@ovirt02 images]#
No idea about your specific issue or whether this can fix it, but you can try. Especially if you can test on a test env...
Best, -- Didi
And in fact the engine vm is now up again with the customized parameters.
How can I make them permanent? From where on shared storage are they read?
Possibly the problem is generated by my nested environment:
L0 = ESXi 6.0 U2 on NUC6i5syh
L1 are my oVirt hosts on this L0
L2 is my engine VM that doesn't start in 4.1.3
It seems that cpuType=Broadwell and/or emulatedMachine=pc-i440fx-rhel7.3.0 generates problems....
And I think I would have the same problems with ordinary VMs, correct?
Gianluca
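To confirm which machine type and CPU model the running engine VM actually got, a read-only libvirt query on the host currently running it can be used; a small sketch:

# -r opens a read-only connection, so no vdsm/libvirt credentials are needed
virsh -r dumpxml HostedEngine | grep -E "machine=|<model"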

On Mon, Jul 10, 2017 at 6:42 PM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Mon, Jul 10, 2017 at 4:38 PM, Yedidyah Bar David <didi@redhat.com> wrote:
Currently I have set the environment in global maintenance. Does it make sense to try to start the HostedEngine with an alternate vm.conf to crosscheck it it is then able to start ok?
Not sure. Depends on why you think it currently fails. Sorry but I didn't check your logs yet.
Because in 4.1.2 it started. During update to 4.1.3 I moved the engine vm without problems to the already updated hosts. But I suppose during migration the qemu-kvm command line is preserverd, isn't it? And only when I planned to update kernel of the engine vm so I had to power off it, I began to have problems....
I see that there is the file /var/run/ovirt-hosted-engine-ha/vm.conf that seems refreshed every minute
Indeed. In recent versions it's possible to change some of the HE VM configuration from the engine itself, just like any other VM, so HA has to update this file.
It seems that apparently I can copy it into another place, modify it and try to start engine with the modified file using
hosted-engine --vm-start --vm-conf=/alternate/path_vm.conf
is it correct?
Yes, as also written here:
https://www.ovirt.org/documentation/how-to/hosted-engine/#handle-engine-vm-b...
the modified file would be such that:
[root@ovirt02 images]# diff /var/run/ovirt-hosted-engine-ha/vm.conf /root/alternate_vm.conf 1,2c1,2 < cpuType=Broadwell < emulatedMachine=pc-i440fx-rhel7.3.0 ---
cpuType=qemu64 emulatedMachine=pc-i440fx-rhel7.2.0 [root@ovirt02 images]#
No idea about your specific issue or whether this can fix it, but you can try. Especially if you can test on a test env...
Best, -- Didi
And in fact the engine vm is now up again with the customized parameters How can I have it permanent? Frome where on shared storage are they read?
I think you are supposed to change such parameters from the engine ui, not by directly manipulating the shared storage using [1]. But I didn't try this myself, not sure if cpu type can be changed. [1] http://www.ovirt.org/develop/release-management/features/sla/hosted-engine-e...
Possibly the problems is generated by my nested environment:
L0 = ESXi 6.0 U2 on NUC6i5syh L1 are my oVirt hosts on this L0 L2 is my engine VM that doesn't start in 4.1.3
it seems that cpuType=Broadwell and/or emulatedMachine=pc-i440fx-rhel7.3.0 generates problems....
No idea about this, adding Michal. Best,
And I think I would have same problems with ordinary VMs, correct? Gianluca
-- Didi

On Wed, Jul 12, 2017 at 12:16 PM, Yedidyah Bar David <didi@redhat.com> wrote:
the modified file would be such that:
[root@ovirt02 images]# diff /var/run/ovirt-hosted-engine-ha/vm.conf /root/alternate_vm.conf 1,2c1,2 < cpuType=Broadwell < emulatedMachine=pc-i440fx-rhel7.3.0 ---
cpuType=qemu64 emulatedMachine=pc-i440fx-rhel7.2.0 [root@ovirt02 images]#
No idea about your specific issue or whether this can fix it, but you can try. Especially if you can test on a test env...
Best, -- Didi
And in fact the engine vm is now up again with the customized parameters How can I have it permanent? Frome where on shared storage are they read?
I think you are supposed to change such parameters from the engine ui, not by directly manipulating the shared storage using [1]. But I didn't try this myself, not sure if cpu type can be changed.
[1] http://www.ovirt.org/develop/release-management/features/sla/hosted-engine-edit-configuration-on-shared-storage/
In the Engine VM System settings, advanced parameters, I currently have this config: https://drive.google.com/file/d/0BwoPbcrMv8mvQ29ickF2cWlMNUE/view?usp=sharin...
I still have to check whether the problem is caused by both parameters or by only one of them.
Nevertheless, if I try to change Custom Emulated Machine I then get:
" Error while executing action: HostedEngine: There was an attempt to change Hosted Engine VM values that are locked. "
and the same if I try to change Custom CPU Type...
Possibly the problems is generated by my nested environment:
L0 = ESXi 6.0 U2 on NUC6i5syh L1 are my oVirt hosts on this L0 L2 is my engine VM that doesn't start in 4.1.3
it seems that cpuType=Broadwell and/or emulatedMachine=pc-i440fx-rhel7.3.0 generates problems....
No idea about this, adding Michal.
Best,
And I think I would have same problems with ordinary VMs, correct? Gianluca
-- Didi
On the Hypervisors (actually ESXi VMs) I get this when running the command:
# vdsClient -s 0 getVdsCapabilities
...
clusterLevels = ['3.6', '4.0', '4.1']
containers = False
cpuCores = '2'
cpuFlags = 'fpu,vme,de,pse,tsc,msr,pae,mce,cx8,apic,sep,mtrr,pge,mca,cmov,pat,pse36,clflush,dts,mmx,fxsr,sse,sse2,ss,ht,syscall,nx,pdpe1gb,rdtscp,lm,constant_tsc,arch_perfmon,pebs,bts,nopl,xtopology,tsc_reliable,nonstop_tsc,aperfmperf,eagerfpu,pni,pclmulqdq,vmx,ssse3,fma,cx16,pcid,sse4_1,sse4_2,x2apic,movbe,popcnt,tsc_deadline_timer,aes,xsave,avx,f16c,rdrand,hypervisor,lahf_lm,abm,3dnowprefetch,ida,arat,epb,pln,pts,dtherm,hwp,hwp_noitfy,hwp_act_window,hwp_epp,tpr_shadow,vnmi,ept,vpid,fsgsbase,tsc_adjust,bmi1,hle,avx2,smep,bmi2,invpcid,rtm,rdseed,adx,smap,xsaveopt,model_Haswell,model_Broadwell,model_Haswell-noTSX,model_Nehalem,model_Conroe,model_Penryn,model_IvyBridge,model_Westmere,model_Broadwell-noTSX,model_SandyBridge'
cpuModel = 'Intel(R) Core(TM) i3-6100U CPU @ 2.30GHz'
cpuSockets = '1'
cpuSpeed = '2302.749'
cpuThreads = '2'
emulatedMachines = ['pc-i440fx-rhel7.1.0', 'pc-q35-rhel7.3.0', 'rhel6.3.0', 'pc-i440fx-rhel7.0.0', 'rhel6.1.0', 'rhel6.6.0', 'rhel6.2.0', 'pc', 'pc-i440fx-rhel7.3.0', 'q35', 'pc-i440fx-rhel7.2.0', 'rhel6.4.0', 'rhel6.0.0', 'rhel6.5.0']
...
Gianluca
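To compare the three hosts at a glance, the same capability query can be filtered; a rough sketch (host names from this thread; long lists may wrap in the real output, so the grep may need -A to show them fully):

for h in ovirt01 ovirt02 ovirt03; do
    echo "== $h =="
    ssh root@$h.localdomain.local \
        "vdsClient -s 0 getVdsCapabilities | grep -E 'cpuModel|emulatedMachines'"
done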

In the meantime I have verified that the problem is with emulatedMachine=pc-i440fx-rhel7.3.0.
Summarizing:
- the default boot of the engine VM after the update to 4.1.3 is with cpuType=Broadwell and emulatedMachine=pc-i440fx-rhel7.3.0, and the engine vm hangs at the "Booting from Hard Disk" screen
- when starting with cpuType=qemu64 and emulatedMachine=pc-i440fx-rhel7.2.0 the engine comes up normally
- when starting with cpuType=Broadwell and emulatedMachine=pc-i440fx-rhel7.2.0 the engine comes up normally
- when starting with cpuType=qemu64 and emulatedMachine=pc-i440fx-rhel7.3.0 the engine vm hangs at the "Booting from Hard Disk" screen
Gianluca
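Each combination can be tried by starting the VM from a modified copy of vm.conf, the mechanism discussed earlier in this thread; generating the four variants could look like this (a sketch, the output file names are made up):

for cpu in Broadwell qemu64; do
    for machine in pc-i440fx-rhel7.2.0 pc-i440fx-rhel7.3.0; do
        out="/root/vm_${cpu}_${machine}.conf"
        cp /var/run/ovirt-hosted-engine-ha/vm.conf "$out"
        sed -i -e "s/^cpuType=.*/cpuType=$cpu/" \
               -e "s/^emulatedMachine=.*/emulatedMachine=$machine/" "$out"
    done
done
# each variant is then booted one at a time, in global maintenance, with:
#   hosted-engine --vm-start --vm-conf=/root/vm_<cpu>_<machine>.conf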

On 12 Jul 2017, at 16:30, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
In the mean time I have verified that the problem is with
emulatedMachine=pc-i440fx-rhel7.3.0
Might be. The fact that it works with 7.2 in an ESXi nested environment is nice, but definitely not supported. Use a lower-than-Broadwell CPU - that might help. Instead of qemu64, which is emulated... but if it works fast enough and reliably for you then it's fine
Summarizing:
default boot ofengine VM after update to 4.1.3 is with cpuType=Broadwell and emulatedMachine=pc-i440fx-rhel7.3.0 and the engine vm hangs at "Booting from Hard Disk " screen
When starting with cpuType=qemu64 and emulatedMachine=pc-i440fx-rhel7.2.0 the engine comes up normally When starting with cpuType=Broadwell and emulatedMachine=pc-i440fx-rhel7.2.0 the engine comes up normally When starting with cpuType=qemu64 and emulatedMachine=pc-i440fx-rhel7.3.0 the engine vm hangs at "Booting from Hard Disk " screen
Gianluca

On Thu, Jul 13, 2017 at 11:08 AM, Michal Skrivanek <mskrivan@redhat.com> wrote:
On 12 Jul 2017, at 16:30, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
In the mean time I have verified that the problem is with
emulatedMachine=pc-i440fx-rhel7.3.0
Might be. The fact that it works with 7.2 in esxi nested environment is nice, but definitely not supported. Use lower-than-broadwell CPU - that might help. Instead of qemu64 which is emulated...but if if works fast enough and reliably for you then it's fine
The qemu64 is not a problem actually, because if I set cpu Broadwell and machine type pc-i440fx-rhel7.2.0 things go well. Also, on an older physical hw with Westmere CPUs where I have oVirt 4.1 too, the VMs start with emulatedMachine=pc-i440fx-rhel7.3.0, so this parameter doesn't depend on the cpu itself.
I think emulatedMachine is comparable to vSphere Virtual HW instead, correct? And that this functionality is provided actually by qemu-kvm-ev (and perhaps in conjunction with seabios?).
If I run
rpm -q --changelog qemu-kvm-ev
I see in fact
...
* Mon Jun 06 2016 Miroslav Rezanina <mrezanin@redhat.com> - rhev-2.6.0-5.el7
...
- kvm-pc-New-default-pc-i440fx-rhel7.3.0-machine-type.patch [bz#1305121]
...
So it means that at a certain point the default machine type used by qemu-kvm-ev became 7.3, and this generates problems in my specific lab environment now (not searching "official" support for it.. ;-).
For the other ordinary L2 VMs defined inside this oVirt nested environment, I can set in System --> Advanced Parameters --> Custom Emulated Machine the value pc-i440fx-rhel7.2.0 and they are able to start.
The problem still remains for the engine vm itself, where I cannot manually set it.
Possibly is there a qemu-kvm-ev overall system configuration where I can tell it to force the emulated machine type to pc-i440fx-rhel7.2.0 (without downgrading qemu-kvm-ev)?
Otherwise I know that when I have to power off/restart the engine vm I have to manually start it in 7.2 mode, as I'm testing right now.
Hope I have clarified better...
Gianluca
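The machine types (and the current default alias) provided by the installed qemu-kvm-ev can be listed directly from the binary; a quick check, using the path from the command lines above:

# list the i440fx machine types known to this qemu build and show which one is the default alias
/usr/libexec/qemu-kvm -machine help | grep -E 'rhel7|default'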

On 13 Jul 2017, at 11:51, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Thu, Jul 13, 2017 at 11:08 AM, Michal Skrivanek <mskrivan@redhat.com> wrote:
On 12 Jul 2017, at 16:30, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
In the mean time I have verified that the problem is with
emulatedMachine=pc-i440fx-rhel7.3.0
Might be. The fact that it works with 7.2 in an ESXi nested environment is nice, but definitely not supported. Use a lower-than-Broadwell CPU - that might help. Instead of qemu64, which is emulated... but if it works fast enough and reliably for you then it's fine
The qemu64 is not a problem actually, because if I set cpu Broadwell and machine type pc-i440fx-rhel7.2.0 things go well. Also, on an older physical hw with Westmere CPUs where I have oVirt 4.1 too, the VMs start with emulatedMachine=pc-i440fx-rhel7.3.0, so this parameter doesn't depend on the cpu itself.
I think emulatedMachine is comparable to vSphere Virtual HW instead, correct?
yes
And that this functionality is provided actually by qemu-kvm-ev (and perhaps in conjunction with seabios?).
yes. By using the -7.2.0 type you're basically just using the backward compatibility code. Likely there was some change in how the hardware looks in the guest which affected ESXi nesting for some CPUs.
If I run
rpm -q --changelog qemu-kvm-ev
I see in fact
...
* Mon Jun 06 2016 Miroslav Rezanina <mrezanin@redhat.com> - rhev-2.6.0-5.el7
...
- kvm-pc-New-default-pc-i440fx-rhel7.3.0-machine-type.patch [bz#1305121]
...
So it means that at a certain point the default machine type used by qemu-kvm-ev became 7.3, and this generates problems in my specific lab environment now (not searching "official" support for it.. ;-). For the other ordinary L2 VMs defined inside this oVirt nested environment, I can set in System --> Advanced Parameters --> Custom Emulated Machine the value pc-i440fx-rhel7.2.0 and they are able to start. The problem still remains for the engine vm itself, where I cannot manually set it.
Possibly is there a qemu-kvm-ev overall system configuration where I can tell it to force the emulated machine type to pc-i440fx-rhel7.2.0 (without downgrading qemu-kvm-ev)?
I suppose you can define it in HE OVF? Didi? That would be cleaner. You can also use a vdsm hook just for that...
Otherwise I know that when I have to power off/restart the engine vm I have to manually start it in 7.2 mode, as I'm testing right now.
Hope I have clarified better...
Gianluca
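For the vdsm hook route, a minimal sketch of such a hook is shown below. It assumes the standard vdsm hook mechanism, where an executable dropped into /usr/libexec/vdsm/hooks/before_vm_start/ is run with the path of the domain XML in the _hook_domxml environment variable; this is untested here and only illustrates the idea:

#!/bin/bash
# before_vm_start hook (illustrative only): force the older machine type
# for the HostedEngine VM and leave every other VM alone.
set -e
xml="$_hook_domxml"
if grep -q '<name>HostedEngine</name>' "$xml"; then
    sed -i "s/machine='pc-i440fx-rhel7.3.0'/machine='pc-i440fx-rhel7.2.0'/" "$xml"
fi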

In the 'hosted-engine' script itself, in the function cmd_vm_start, there is a comment: # TODO: Check first the sanlock status, and if allows:
Perhaps ha-agent checks sanlock status before starting the VM? Adding Martin.
QEMU does that by itself. It starts, asks for a lease and dies if it can't get it. So:
I see that the qemu-kvm process for the engine starts on two hosts and then on one of them it gets a "kill -15" and stops Is it expected behaviour?
This is how it should behave, unless the reason for it is something else.
Martin
On Mon, Jul 10, 2017 at 8:11 AM, Yedidyah Bar David <didi@redhat.com> wrote:
On Sun, Jul 9, 2017 at 11:12 PM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Sun, Jul 9, 2017 at 9:54 PM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
Hello. I'm on 4.1.3 with self hosted engine and glusterfs as storage. I updated the kernel on engine so I executed these steps:
- enable global maintenace from the web admin gui - wait some minutes - shutdown the engine vm from inside its OS - wait some minutes - execute on one host [root@ovirt02 ~]# hosted-engine --set-maintenance --mode=none
I see that the qemu-kvm process for the engine starts on two hosts and then on one of them it gets a "kill -15" and stops Is it expected behaviour?
In the 'hosted-engine' script itself, in the function cmd_vm_start, there is a comment: # TODO: Check first the sanlock status, and if allows:
Perhaps ha-agent checks sanlock status before starting the VM? Adding Martin.
Please also check/share agent.log.
It seems somehow dangerous to me..
And I don't know how related, but the engine vm doesn't come up. Connecting to its vnc console I get it "booting from hard disk" ....: https://drive.google.com/file/d/0BwoPbcrMv8mvOEJWeVRvNThmTWc/view?usp=sharin...
Gluster volume for the engine vm storage domain seems ok...
[root@ovirt01 vdsm]# gluster volume heal engine info Brick ovirt01.localdomain.local:/gluster/brick1/engine Status: Connected Number of entries: 0
Brick ovirt02.localdomain.local:/gluster/brick1/engine Status: Connected Number of entries: 0
Brick ovirt03.localdomain.local:/gluster/brick1/engine Status: Connected Number of entries: 0
[root@ovirt01 vdsm]#
and in HostedEngine.log
2017-07-09 19:59:20.660+0000: starting up libvirt version: 2.0.0, package: 10.el7_3.9 (CentOS BuildSystem <http://bugs.centos.org>, 2017-05-25-20:52:28, c1bm.rdu2.centos.org), qemu version: 2.6.0 (qemu-kvm-ev-2.6.0-28.el7.10.1), hostname: ovirt01.localdomain.local LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name guest=HostedEngine,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-3-HostedEngine/master-key.aes -machine pc-i440fx-rhel7.3.0,accel=kvm,usb=off -cpu Broadwell,+rtm,+hle -m 6144 -realtime mlock=off -smp 1,maxcpus=16,sockets=16,cores=1,threads=1 -uuid 87fd6bdb-535d-45b8-81d4-7e3101a6c364 -smbios 'type=1,manufacturer=oVirt,product=oVirt Node,version=7-3.1611.el7.centos,serial=564D777E-B638-E808-9044-680BA4957704,uuid=87fd6bdb-535d-45b8-81d4-7e3101a6c364' -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-3-HostedEngine/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2017-07-09T19:59:20,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-reboot -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x4 -drive file=/var/run/vdsm/storage/e9e4a478-f391-42e5-9bb8-ed22a33e5cab/cf8b8f4e-fa01-457e-8a96-c5a27f8408f8/94c46bac-0a9f-49e8-9188-627fa0caf2b6,format=raw,if=none,id=drive-virtio-disk0,serial=cf8b8f4e-fa01-457e-8a96-c5a27f8408f8,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive if=none,id=drive-ide0-1-0,readonly=on -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=32 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:16:3e:0a:e7:ba,bus=pci.0,addr=0x3 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/87fd6bdb-535d-45b8-81d4-7e3101a6c364.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/87fd6bdb-535d-45b8-81d4-7e3101a6c364.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev socket,id=charchannel2,path=/var/lib/libvirt/qemu/channels/87fd6bdb-535d-45b8-81d4-7e3101a6c364.org.ovirt.hosted-engine-setup.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=org.ovirt.hosted-engine-setup.0 -chardev pty,id=charconsole0 -device virtconsole,chardev=charconsole0,id=console0 -vnc 0:0,password -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -object rng-random,id=objrng0,filename=/dev/urandom -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x6 -msg timestamp=on char device redirected to /dev/pts/1 (label charconsole0) warning: host doesn't support requested feature: CPUID.07H:EBX.erms [bit 9]
-- Didi

On Mon, Jul 10, 2017 at 4:45 PM, Martin Sivak <msivak@redhat.com> wrote:
In the 'hosted-engine' script itself, in the function cmd_vm_start, there is a comment: # TODO: Check first the sanlock status, and if allows:
Perhaps ha-agent checks sanlock status before starting the VM? Adding Martin.
QEMU does that by itself. It starts, asks for a lease and dies if it can't get it.
So:
I see that the qemu-kvm process for the engine starts on two hosts and then on one of them it gets a "kill -15" and stops Is it expected behaviour?
This is how it should behave, unless the reason for it is something else.
Martin
Ok. My question was: is no operating system loaded at all before getting the lease, so that there is no risk of corruption? It seems so, and that only the qemu process is started...
Thanks
Gianluca

My question was: no operating system is loaded at all before getting the lease, so that don't have any risk of corruption?
That is exactly what it is supposed to do. The process dies before starting the OS.
Martin
On Mon, Jul 10, 2017 at 5:45 PM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Mon, Jul 10, 2017 at 4:45 PM, Martin Sivak <msivak@redhat.com> wrote:
In the 'hosted-engine' script itself, in the function cmd_vm_start, there is a comment: # TODO: Check first the sanlock status, and if allows:
Perhaps ha-agent checks sanlock status before starting the VM? Adding Martin.
QEMU does that by itself. It starts, asks for a lease and dies if it can't get it.
So:
I see that the qemu-kvm process for the engine starts on two hosts and then on one of them it gets a "kill -15" and stops Is it expected behaviour?
This is how it should behave, unless the reason for it is something else.
Martin
Ok. My question was: no operating system is loaded at all before getting the lease, so that don't have any risk of corruption? It seems so and that only the qemu process is started... Thanks Gianluca
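This matches what the logs quoted earlier in the thread show: qemu-kvm is started with -S (paused), and the host that loses the race is told to terminate before the guest is ever resumed, so no OS code runs on two hosts at once. A rough way to observe it on the hosts (standard commands and paths, shown as a sketch):

# the lockspace and the current holder of the engine VM lease, from any host
sanlock client status
# on the host that lost the race, libvirt logs the forced termination
grep 'terminating on signal 15' /var/log/libvirt/qemu/HostedEngine.log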
participants (4)
- Gianluca Cecchi
- Martin Sivak
- Michal Skrivanek
- Yedidyah Bar David