On Thu, May 28, 2020 at 10:22 AM Lucia Jelinkova <ljelinko@redhat.com> wrote:
This might help: https://lists.ovirt.org/pipermail/users/2017-January/079157.html

On Thu, May 28, 2020 at 10:17 AM Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:


On Wed, May 27, 2020 at 4:13 PM Mark R <ovirtlist@beanz.33mail.com> wrote:
Replying to my own post here, I've verified that it's currently not possible to do a greenfield deploy of oVirt 4.4.0 w/ hosted engine on EPYC CPUs due to the system setting a requirement of 'virt-ssbd' for the final HE definition after the move to shared storage. The local HE runs perfectly through the setup process because it correctly uses 'amd-ssbd' but unfortunately that doesn't stick after the disks are moved.

You can work around it via 'virsh -r dumpxml HostedEngine > /tmp/he.xml', then editing that file to simply change 'virt-ssbd' to 'amd-ssbd'. Start it via 'virsh create /tmp/he.xml' and now it runs fine. You can get into the admin interface and, if you want to do something hacky, you could at this point change the cluster CPU type from 'Secure AMD EPYC' to just 'AMD EPYC', take everything down and bring it back up cleanly... the hosted engine will now work because the requirement for virt-ssbd (not available on EPYC with CentOS 8.1 or, presumably, RHEL 8; amd-ssbd is needed instead) is gone.
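In short, the workaround amounts to something like this (a rough sketch; the sed line is just shorthand for manually changing 'virt-ssbd' to 'amd-ssbd' in the dumped XML):

virsh -r dumpxml HostedEngine > /tmp/he.xml
sed -i 's/virt-ssbd/amd-ssbd/' /tmp/he.xml
virsh create /tmp/he.xml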


I would like to test it during the final stage of deployment, but the "virsh create" command requires a password.

[root@novirt2 log]# virsh create /tmp/he.xml
Please enter your authentication name:

I don't remember how to set up a user so that I can run it on the oVirt host... any input?

Just for my testing lab on 4.4, obviously...
Thanks,
Gianluca


Thanks Lucia, in the meantime I found another e-mail reporting the same credentials stored in that file (they are static and not randomly generated...).

Unfortunately, while the strategy does get the hosted engine VM started, it seems the VM is not recognized...

So I'm in the middle of a single-host HCI deployment, at the final stage, hitting the unsupported tsx-ctrl CPU flag; see this thread too:
https://lists.ovirt.org/archives/list/users@ovirt.org/thread/5LBCJGWTVRVTEWC5VSDQ2OINQ3OHKQ7K/

It's strange that in /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-ansible-create_target_vm-202042810039-c8h9k9.log I have this:

2020-05-28 10:01:16,234+0200 DEBUG var changed: host "localhost" var "server_cpu_dict" type "<class 'dict'>" value: "{
    "AMD EPYC": "EPYC",
    "AMD Opteron G4": "Opteron_G4",
    "AMD Opteron G5": "Opteron_G5",
    "IBM POWER8": "POWER8",
    "IBM POWER9": "POWER9",
    "IBM z114, z196": "z196-base",
    "IBM z13s, z13": "z13-base",
    "IBM z14": "z14-base",
    "IBM zBC12, zEC12": "zEC12-base",
    "Intel Broadwell Family": "Broadwell-noTSX",
    "Intel Cascadelake Server Family": "Cascadelake-Server,-hle,-rtm,+arch-capabilities",
    "Intel Haswell Family": "Haswell-noTSX",
    "Intel IvyBridge Family": "IvyBridge",
    "Intel Nehalem Family": "Nehalem",
    "Intel SandyBridge Family": "SandyBridge",
    "Intel Skylake Client Family": "Skylake-Client,-hle,-rtm",
    "Intel Skylake Server Family": "Skylake-Server,-hle,-rtm",
    "Intel Westmere Family": "Westmere",
    "Secure AMD EPYC": "EPYC,+ibpb,+virt-ssbd",
    "Secure Intel Broadwell Family": "Broadwell-noTSX,+spec-ctrl,+ssbd,+md-clear",
    "Secure Intel Cascadelake Server Family": "Cascadelake-Server,+md-clear,+mds-no,-hle,-rtm,+tsx-ctrl,+arch-capabilities",
    "Secure Intel Haswell Family": "Haswell-noTSX,+spec-ctrl,+ssbd,+md-clear",
    "Secure Intel IvyBridge Family": "IvyBridge,+pcid,+spec-ctrl,+ssbd,+md-clear",
    "Secure Intel Nehalem Family": "Nehalem,+spec-ctrl,+ssbd,+md-clear",
    "Secure Intel SandyBridge Family": "SandyBridge,+pcid,+spec-ctrl,+ssbd,+md-clear",
    "Secure Intel Skylake Client Family": "Skylake-Client,+spec-ctrl,+ssbd,+md-clear,-hle,-rtm",
    "Secure Intel Skylake Server Family": "Skylake-Server,+spec-ctrl,+ssbd,+md-clear,-hle,-rtm",
    "Secure Intel Westmere Family": "Westmere,+pcid,+spec-ctrl,+ssbd,+md-clear"
}"

and the tsx-ctrl flag is there...

but when, during deployment, the final engine VM is started after copying the local one to the Gluster storage, I get:

May 28 10:06:59 novirt2 journal[13368]: unsupported configuration: unknown CPU feature: tsx-ctrl

The VM is started and then stopped, over and over, so I try the dump command several times:

virsh -r dumpxml HostedEngine > /tmp/he.xml

until I catch it; then I edit the file, remove the line with the tsx-ctrl flag, and start the VM.
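Roughly, something along these lines would automate the catch-and-edit part I did by hand (just a sketch; the loop and the sed are my shorthand for retrying the dump and deleting the tsx-ctrl line):

while ! virsh -r dumpxml HostedEngine > /tmp/he.xml 2>/dev/null; do sleep 1; done
sed -i '/tsx-ctrl/d' /tmp/he.xml

After that I start the VM with virsh create as shown below.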

[root@novirt2 log]# virsh create /tmp/he.xml
Please enter your authentication name: vdsm@ovirt
Please enter your password:
Domain HostedEngine created from /tmp/he.xml

So far so good: the engine VM starts, it is reachable on its final hostname, the engine service is up and the web admin GUI is accessible, but the VM is not recognized by the host.

The qemu-kvm command is this one:

qemu     20826     1 99 10:16 ?        00:00:07 /usr/libexec/qemu-kvm -name guest=HostedEngine,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-2-HostedEngine/master-key.aes -machine pc-q35-rhel8.1.0,accel=kvm,usb=off,dump-guest-core=off -cpu Cascadelake-Server,md-clear=on,mds-no=on,hle=off,rtm=off,arch-capabilities=on -m size=16777216k,slots=16,maxmem=67108864k -overcommit mem-lock=off -smp 2,maxcpus=32,sockets=16,cores=2,threads=1 -object iothread,id=iothread1 -numa node,nodeid=0,cpus=0-31,mem=16384 -uuid b572d924-b278-41c7-a9da-52c4f590aac1 -smbios type=1,manufacturer=oVirt,product=RHEL,version=8-1.1911.0.9.el8,serial=d584e962-5461-4fa5-affa-db413e17590c,uuid=b572d924-b278-41c7-a9da-52c4f590aac1,family=oVirt -no-user-config -nodefaults -device sga -chardev socket,id=charmonitor,fd=40,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2020-05-28T08:16:29,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-reboot -global ICH9-LPC.disable_s3=1 -global ICH9-LPC.disable_s4=1 -boot strict=on -device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 -device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 -device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 -device pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 -device pcie-root-port,port=0x14,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 -device pcie-root-port,port=0x15,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5 -device pcie-root-port,port=0x16,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x6 -device pcie-root-port,port=0x17,chassis=8,id=pci.8,bus=pcie.0,addr=0x2.0x7 -device pcie-root-port,port=0x18,chassis=9,id=pci.9,bus=pcie.0,multifunction=on,addr=0x3 -device pcie-root-port,port=0x19,chassis=10,id=pci.10,bus=pcie.0,addr=0x3.0x1 -device pcie-root-port,port=0x1a,chassis=11,id=pci.11,bus=pcie.0,addr=0x3.0x2 -device pcie-root-port,port=0x1b,chassis=12,id=pci.12,bus=pcie.0,addr=0x3.0x3 -device pcie-root-port,port=0x1c,chassis=13,id=pci.13,bus=pcie.0,addr=0x3.0x4 -device pcie-root-port,port=0x1d,chassis=14,id=pci.14,bus=pcie.0,addr=0x3.0x5 -device pcie-root-port,port=0x1e,chassis=15,id=pci.15,bus=pcie.0,addr=0x3.0x6 -device pcie-root-port,port=0x1f,chassis=16,id=pci.16,bus=pcie.0,addr=0x3.0x7 -device pcie-root-port,port=0x20,chassis=17,id=pci.17,bus=pcie.0,addr=0x4 -device pcie-pci-bridge,id=pci.18,bus=pci.1,addr=0x0 -device qemu-xhci,p2=8,p3=8,id=ua-b630a65c-8156-4542-b8e8-98b4d2c48f67,bus=pci.4,addr=0x0 -device virtio-scsi-pci,iothread=iothread1,id=ua-f6a70e5e-ebea-4468-a59a-cc760b9bcea4,bus=pci.5,addr=0x0 -device virtio-serial-pci,id=ua-608f9599-30b2-4ee6-a0d3-d5fb588583ad,max_ports=16,bus=pci.3,addr=0x0 -drive if=none,id=drive-ua-fa671f6c-dc42-4c59-a66d-ccfa3d5d422b,readonly=on -device ide-cd,bus=ide.2,drive=drive-ua-fa671f6c-dc42-4c59-a66d-ccfa3d5d422b,id=ua-fa671f6c-dc42-4c59-a66d-ccfa3d5d422b,werror=report,rerror=report -drive file=/var/run/vdsm/storage/3df8f6d4-d572-4d2b-9ab2-8abc456a396f/df02bff9-2c4b-4e14-a0a3-591a84ccaed9/bf435645-2999-4fb2-8d0e-5becab5cf389,format=raw,if=none,id=drive-ua-df02bff9-2c4b-4e14-a0a3-591a84ccaed9,cache=none,aio=threads -device virtio-blk-pci,iothread=iothread1,scsi=off,bus=pci.6,addr=0x0,drive=drive-ua-df02bff9-2c4b-4e14-a0a3-591a84ccaed9,id=ua-df02bff9-2c4b-4e14-a0a3-591a84ccaed9,bootindex=1,write-cache=on,serial=df02bff9-2c4b-4e14-a0a3-591a84ccaed9,werror=stop,rerror=stop -netdev 
tap,fds=42:43,id=hostua-b29ca99f-a53e-4de7-8655-b65ef4ba5dc4,vhost=on,vhostfds=44:45 -device virtio-net-pci,mq=on,vectors=6,host_mtu=1500,netdev=hostua-b29ca99f-a53e-4de7-8655-b65ef4ba5dc4,id=ua-b29ca99f-a53e-4de7-8655-b65ef4ba5dc4,mac=00:16:3e:0a:96:80,bus=pci.2,addr=0x0 -chardev socket,id=charserial0,fd=46,server,nowait -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,fd=47,server,nowait -device virtserialport,bus=ua-608f9599-30b2-4ee6-a0d3-d5fb588583ad.0,nr=1,chardev=charchannel0,id=channel0,name=ovirt-guest-agent.0 -chardev socket,id=charchannel1,fd=48,server,nowait -device virtserialport,bus=ua-608f9599-30b2-4ee6-a0d3-d5fb588583ad.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel2,name=vdagent -device virtserialport,bus=ua-608f9599-30b2-4ee6-a0d3-d5fb588583ad.0,nr=3,chardev=charchannel2,id=channel2,name=com.redhat.spice.0 -chardev socket,id=charchannel3,fd=49,server,nowait -device virtserialport,bus=ua-608f9599-30b2-4ee6-a0d3-d5fb588583ad.0,nr=4,chardev=charchannel3,id=channel3,name=org.ovirt.hosted-engine-setup.0 -vnc 172.19.0.224:0 -k en-us -spice port=5901,tls-port=5902,addr=172.19.0.224,disable-ticketing,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=main,tls-channel=display,tls-channel=inputs,tls-channel=cursor,tls-channel=playback,tls-channel=record,tls-channel=smartcard,tls-channel=usbredir,seamless-migration=on -device qxl-vga,id=ua-ac7aed4d-d824-40f9-aaa8-1c0be702e38c,ram_size=67108864,vram_size=33554432,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pcie.0,addr=0x1 -device intel-hda,id=ua-c7078e28-5585-4866-bdbd-528ebddd8854,bus=pci.18,addr=0x1 -device hda-duplex,id=ua-c7078e28-5585-4866-bdbd-528ebddd8854-codec0,bus=ua-c7078e28-5585-4866-bdbd-528ebddd8854.0,cad=0 -device virtio-balloon-pci,id=ua-fc4f6b20-0b17-4198-b059-b5753893584d,bus=pci.7,addr=0x0 -object rng-random,id=objua-c4c3e5e7-1c19-4582-a87c-4f3fee4a0ee5,filename=/dev/urandom -device virtio-rng-pci,rng=objua-c4c3e5e7-1c19-4582-a87c-4f3fee4a0ee5,id=ua-c4c3e5e7-1c19-4582-a87c-4f3fee4a0ee5,bus=pci.8,addr=0x0 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on

The status remains down:

[root@novirt2 log]# hosted-engine --vm-status


--== Host novirt2.example.net (id: 1) status ==--

Host ID                            : 1
Host timestamp                     : 34180
Score                              : 0
Engine status                      : {"vm": "down_unexpected", "health": "bad", "detail": "Down", "reason": "bad vm status"}
Hostname                           : novirt2.example.net
Local maintenance                  : False
stopped                            : False
crc32                              : b297faaa
conf_on_shared_storage             : True
local_conf_timestamp               : 34180
Status up-to-date                  : True
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=34180 (Thu May 28 10:17:57 2020)
host-id=1
score=0
vm_conf_refresh_time=34180 (Thu May 28 10:17:57 2020)
conf_on_shared_storage=True
maintenance=False
state=EngineUnexpectedlyDown
stopped=False
timeout=Thu Jan  1 10:30:29 1970
[root@novirt2 log]#

Also, restarting ovirt-ha-agent on the host doesn't change anything...
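(just a plain "systemctl restart ovirt-ha-agent" on the host)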

In agent.log I have:

MainThread::ERROR::2020-05-28 10:23:45,817::hosted_engine::953::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_engine_vm) Failed to stop engine VM: Command VM.destroy with args {'vmID': 'b572d924-b278-41c7-a9da-52c4f590aac1'} failed:
(code=1, message=Virtual machine does not exist: {'vmId': 'b572d924-b278-41c7-a9da-52c4f590aac1'})

MainThread::INFO::2020-05-28 10:23:45,843::brokerlink::73::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineForceStop-ReinitializeFSM) sent? ignored
MainThread::INFO::2020-05-28 10:23:45,850::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state ReinitializeFSM (score: 0)

That is strange, because the VM was started with that UUID: the qemu-kvm command line in fact contains the same UUID referenced in agent.log: "uuid=b572d924-b278-41c7-a9da-52c4f590aac1"
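My suspicion is that a VM created directly with virsh bypasses VDSM, so VDSM has no record of that vmId and the HA agent cannot manage it; if that is the case, I guess a check like this on the host (assuming vdsm-client can be used here) would not list the HostedEngine VM:

vdsm-client Host getVMList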

In the meantime the deployment fails because, from the host's point of view, the VM was not able to reach the up state... and to keep testing I put the host in global maintenance.
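(global maintenance set with: hosted-engine --set-maintenance --mode=global)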

I can access the engine web admin portal, but it seems the engine is recognized as a sort of external engine...?
Perhaps I have to remove some other line inside the generated he.xml file to have the engine recognized?

See the screenshots from the web admin view showing the hosted engine VM, the cluster CPU type that has been set up by the deployment, and the storage view (the data and VM storage domains do not appear, only the engine one):
https://drive.google.com/file/d/1by4mEOo3iQv1fMbRsqGvZCEOUFUJbPk4/view?usp=sharing
https://drive.google.com/file/d/199tOdLfSnWIb_rxCGM0zpgOjF3XFLUFm/view?usp=sharing
https://drive.google.com/file/d/1jVaMGvImRhyf3xQx5gn6bCu6OyfZ1Fq_/view?usp=sharing

and the content of he.xml:
https://drive.google.com/file/d/1vOBB0t-vKD5f_7wUUMaIcHxVDXQ4Cfbe/view?usp=sharing

Any input to solve this, so I can at least try some features of 4.4 on this hardware environment?

Thanks,
Gianluca