
On Thu, May 28, 2020 at 10:22 AM Lucia Jelinkova <ljelinko@redhat.com> wrote:
This might help: https://lists.ovirt.org/pipermail/users/2017-January/079157.html
On Thu, May 28, 2020 at 10:17 AM Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Wed, May 27, 2020 at 4:13 PM Mark R <ovirtlist@beanz.33mail.com> wrote:
Replying to my own post here, I've verified that it's currently not possible to do a greenfield deploy of oVirt 4.4.0 w/ hosted engine on EPYC CPUs due to the system setting a requirement of 'virt-ssbd' for the final HE definition after the move to shared storage. The local HE runs perfectly through the setup process because it correctly uses 'amd-ssbd' but unfortunately that doesn't stick after the disks are moved.
You can work around it via 'virsh -r dumpxml HostedEngine > /tmp/he.xml', then editing that file to simply change 'virt-ssbd' to 'amd-ssbd'. Start it via 'virsh create /tmp/he.xml' and it runs fine. You can then get into the admin interface and, if you want to try something hacky, you could at this point change the cluster CPU type from 'Secure AMD EPYC' to plain 'AMD EPYC', take everything down and bring it back up cleanly... the hosted engine will now work because the requirement for virt-ssbd (which is not available on EPYC with CentOS 8.1, and presumably RHEL 8, where amd-ssbd is needed) is gone.
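For reference, a minimal sketch of the workaround Mark describes (the domain name, path and flag names are the ones from his message; the sed one-liner is just a convenience for the manual edit):

virsh -r dumpxml HostedEngine > /tmp/he.xml
# swap the CPU feature libvirt cannot satisfy on EPYC/EL8 for the one it can
sed -i 's/virt-ssbd/amd-ssbd/' /tmp/he.xml
# read-write virsh prompts for libvirt SASL credentials (see further down the thread)
virsh create /tmp/he.xml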
I would like to test it during the final stage of deployment, but the "virsh create" command requires a password.
[root@novirt2 log]# virsh create /tmp/he.xml
Please enter your authentication name:
I don't remember how to set up a user so that I can run it on an oVirt host... any input?
Just for my testing lab in 4.4, obviously... Thanks, Gianluca
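For what it's worth, libvirt's own authentication documentation suggests a SASL user can be added with something like the following; 'testuser' is only an example name, and on a vdsm-managed host this is probably appropriate only for a throwaway test box:

# add 'testuser' to the SASL database libvirtd uses (it prompts for the new password)
saslpasswd2 -a libvirt testuser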
Thanks Lucia, in the meantime I found another e-mail reporting the same credentials stored in that file (they are static and not randomly generated...).
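Assuming the hosted-engine auth file is in its usual location on the host (an assumption; adjust the path if yours differs), those static credentials can also be fed to virsh non-interactively:

# point virsh at the credentials file used by the hosted-engine tooling itself
virsh -c 'qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf' create /tmp/he.xml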
Unfortunately the strategy works to get the hosted engine VM started, but it seems it is not recognized... So I'm in the situation of being in the final stage of a single host HCI deployment with the unsupported tsx-ctrl cpu flag, see this thread too: https://lists.ovirt.org/archives/list/users@ovirt.org/thread/5LBCJGWTVRVTEWC...

The strange thing is that in /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-ansible-create_target_vm-202042810039-c8h9k9.log I have this:

2020-05-28 10:01:16,234+0200 DEBUG var changed: host "localhost" var "server_cpu_dict" type "<class 'dict'>" value: "{ "AMD EPYC": "EPYC", "AMD Opteron G4": "Opteron_G4", "AMD Opteron G5": "Opteron_G5", "IBM POWER8": "POWER8", "IBM POWER9": "POWER9", "IBM z114, z196": "z196-base", "IBM z13s, z13": "z13-base", "IBM z14": "z14-base", "IBM zBC12, zEC12": "zEC12-base", "Intel Broadwell Family": "Broadwell-noTSX", "Intel Cascadelake Server Family": "Cascadelake-Server,-hle,-rtm,+arch-capabilities", "Intel Haswell Family": "Haswell-noTSX", "Intel IvyBridge Family": "IvyBridge", "Intel Nehalem Family": "Nehalem", "Intel SandyBridge Family": "SandyBridge", "Intel Skylake Client Family": "Skylake-Client,-hle,-rtm", "Intel Skylake Server Family": "Skylake-Server,-hle,-rtm", "Intel Westmere Family": "Westmere", "Secure AMD EPYC": "EPYC,+ibpb,+virt-ssbd", "Secure Intel Broadwell Family": "Broadwell-noTSX,+spec-ctrl,+ssbd,+md-clear", "Secure Intel Cascadelake Server Family": "Cascadelake-Server,+md-clear,+mds-no,-hle,-rtm,+tsx-ctrl,+arch-capabilities", "Secure Intel Haswell Family": "Haswell-noTSX,+spec-ctrl,+ssbd,+md-clear", "Secure Intel IvyBridge Family": "IvyBridge,+pcid,+spec-ctrl,+ssbd,+md-clear", "Secure Intel Nehalem Family": "Nehalem,+spec-ctrl,+ssbd,+md-clear", "Secure Intel SandyBridge Family": "SandyBridge,+pcid,+spec-ctrl,+ssbd,+md-clear", "Secure Intel Skylake Client Family": "Skylake-Client,+spec-ctrl,+ssbd,+md-clear,-hle,-rtm", "Secure Intel Skylake Server Family": "Skylake-Server,+spec-ctrl,+ssbd,+md-clear,-hle,-rtm", "Secure Intel Westmere Family": "Westmere,+pcid,+spec-ctrl,+ssbd,+md-clear" }"

so the tsx-ctrl flag is there... but when the deployment starts the final engine VM, after copying the local one to gluster storage, it gets:

May 28 10:06:59 novirt2 journal[13368]: unsupported configuration: unknown CPU feature: tsx-ctrl

The VM is started and then stopped, over and over, so I retry the dump command "virsh -r dumpxml HostedEngine > /tmp/he.xml" a few times until I catch it, then I edit the file, remove the line with the tsx-ctrl flag and start the VM (a rough sketch of this catch-and-edit loop is at the end of this message):

[root@novirt2 log]# virsh create /tmp/he.xml
Please enter your authentication name: vdsm@ovirt
Please enter your password:
Domain HostedEngine created from /tmp/he.xml

So far so good: the engine VM starts, it is reachable on its final hostname, the engine service is up and the web admin GUI is accessible, but the VM is not recognized by the host. The qemu-kvm command is this one:

qemu 20826 1 99 10:16 ?
00:00:07 /usr/libexec/qemu-kvm -name guest=HostedEngine,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-2-HostedEngine/master-key.aes -machine pc-q35-rhel8.1.0,accel=kvm,usb=off,dump-guest-core=off -cpu Cascadelake-Server,md-clear=on,mds-no=on,hle=off,rtm=off,arch-capabilities=on -m size=16777216k,slots=16,maxmem=67108864k -overcommit mem-lock=off -smp 2,maxcpus=32,sockets=16,cores=2,threads=1 -object iothread,id=iothread1 -numa node,nodeid=0,cpus=0-31,mem=16384 -uuid b572d924-b278-41c7-a9da-52c4f590aac1 -smbios type=1,manufacturer=oVirt,product=RHEL,version=8-1.1911.0.9.el8,serial=d584e962-5461-4fa5-affa-db413e17590c,uuid=b572d924-b278-41c7-a9da-52c4f590aac1,family=oVirt -no-user-config -nodefaults -device sga -chardev socket,id=charmonitor,fd=40,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2020-05-28T08:16:29,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-reboot -global ICH9-LPC.disable_s3=1 -global ICH9-LPC.disable_s4=1 -boot strict=on -device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 -device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 -device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 -device pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 -device pcie-root-port,port=0x14,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 -device pcie-root-port,port=0x15,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5 -device pcie-root-port,port=0x16,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x6 -device pcie-root-port,port=0x17,chassis=8,id=pci.8,bus=pcie.0,addr=0x2.0x7 -device pcie-root-port,port=0x18,chassis=9,id=pci.9,bus=pcie.0,multifunction=on,addr=0x3 -device pcie-root-port,port=0x19,chassis=10,id=pci.10,bus=pcie.0,addr=0x3.0x1 -device pcie-root-port,port=0x1a,chassis=11,id=pci.11,bus=pcie.0,addr=0x3.0x2 -device pcie-root-port,port=0x1b,chassis=12,id=pci.12,bus=pcie.0,addr=0x3.0x3 -device pcie-root-port,port=0x1c,chassis=13,id=pci.13,bus=pcie.0,addr=0x3.0x4 -device pcie-root-port,port=0x1d,chassis=14,id=pci.14,bus=pcie.0,addr=0x3.0x5 -device pcie-root-port,port=0x1e,chassis=15,id=pci.15,bus=pcie.0,addr=0x3.0x6 -device pcie-root-port,port=0x1f,chassis=16,id=pci.16,bus=pcie.0,addr=0x3.0x7 -device pcie-root-port,port=0x20,chassis=17,id=pci.17,bus=pcie.0,addr=0x4 -device pcie-pci-bridge,id=pci.18,bus=pci.1,addr=0x0 -device qemu-xhci,p2=8,p3=8,id=ua-b630a65c-8156-4542-b8e8-98b4d2c48f67,bus=pci.4,addr=0x0 -device virtio-scsi-pci,iothread=iothread1,id=ua-f6a70e5e-ebea-4468-a59a-cc760b9bcea4,bus=pci.5,addr=0x0 -device virtio-serial-pci,id=ua-608f9599-30b2-4ee6-a0d3-d5fb588583ad,max_ports=16,bus=pci.3,addr=0x0 -drive if=none,id=drive-ua-fa671f6c-dc42-4c59-a66d-ccfa3d5d422b,readonly=on -device ide-cd,bus=ide.2,drive=drive-ua-fa671f6c-dc42-4c59-a66d-ccfa3d5d422b,id=ua-fa671f6c-dc42-4c59-a66d-ccfa3d5d422b,werror=report,rerror=report -drive file=/var/run/vdsm/storage/3df8f6d4-d572-4d2b-9ab2-8abc456a396f/df02bff9-2c4b-4e14-a0a3-591a84ccaed9/bf435645-2999-4fb2-8d0e-5becab5cf389,format=raw,if=none,id=drive-ua-df02bff9-2c4b-4e14-a0a3-591a84ccaed9,cache=none,aio=threads -device virtio-blk-pci,iothread=iothread1,scsi=off,bus=pci.6,addr=0x0,drive=drive-ua-df02bff9-2c4b-4e14-a0a3-591a84ccaed9,id=ua-df02bff9-2c4b-4e14-a0a3-591a84ccaed9,bootindex=1,write-cache=on,serial=df02bff9-2c4b-4e14-a0a3-591a84ccaed9,werror=stop,rerror=stop -netdev tap,fds=42:43,id=hostua-b29ca99f-a53e-4de7-8655-b65ef4ba5dc4,vhost=on,vhostfds=44:45 -device 
virtio-net-pci,mq=on,vectors=6,host_mtu=1500,netdev=hostua-b29ca99f-a53e-4de7-8655-b65ef4ba5dc4,id=ua-b29ca99f-a53e-4de7-8655-b65ef4ba5dc4,mac=00:16:3e:0a:96:80,bus=pci.2,addr=0x0 -chardev socket,id=charserial0,fd=46,server,nowait -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,fd=47,server,nowait -device virtserialport,bus=ua-608f9599-30b2-4ee6-a0d3-d5fb588583ad.0,nr=1,chardev=charchannel0,id=channel0,name=ovirt-guest-agent.0 -chardev socket,id=charchannel1,fd=48,server,nowait -device virtserialport,bus=ua-608f9599-30b2-4ee6-a0d3-d5fb588583ad.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel2,name=vdagent -device virtserialport,bus=ua-608f9599-30b2-4ee6-a0d3-d5fb588583ad.0,nr=3,chardev=charchannel2,id=channel2,name=com.redhat.spice.0 -chardev socket,id=charchannel3,fd=49,server,nowait -device virtserialport,bus=ua-608f9599-30b2-4ee6-a0d3-d5fb588583ad.0,nr=4,chardev=charchannel3,id=channel3,name=org.ovirt.hosted-engine-setup.0 -vnc 172.19.0.224:0 -k en-us -spice port=5901,tls-port=5902,addr=172.19.0.224,disable-ticketing,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=main,tls-channel=display,tls-channel=inputs,tls-channel=cursor,tls-channel=playback,tls-channel=record,tls-channel=smartcard,tls-channel=usbredir,seamless-migration=on -device qxl-vga,id=ua-ac7aed4d-d824-40f9-aaa8-1c0be702e38c,ram_size=67108864,vram_size=33554432,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pcie.0,addr=0x1 -device intel-hda,id=ua-c7078e28-5585-4866-bdbd-528ebddd8854,bus=pci.18,addr=0x1 -device hda-duplex,id=ua-c7078e28-5585-4866-bdbd-528ebddd8854-codec0,bus=ua-c7078e28-5585-4866-bdbd-528ebddd8854.0,cad=0 -device virtio-balloon-pci,id=ua-fc4f6b20-0b17-4198-b059-b5753893584d,bus=pci.7,addr=0x0 -object rng-random,id=objua-c4c3e5e7-1c19-4582-a87c-4f3fee4a0ee5,filename=/dev/urandom -device virtio-rng-pci,rng=objua-c4c3e5e7-1c19-4582-a87c-4f3fee4a0ee5,id=ua-c4c3e5e7-1c19-4582-a87c-4f3fee4a0ee5,bus=pci.8,addr=0x0 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on

The status remains down:

[root@novirt2 log]# hosted-engine --vm-status

--== Host novirt2.example.net (id: 1) status ==--

Host ID : 1
Host timestamp : 34180
Score : 0
Engine status : {"vm": "down_unexpected", "health": "bad", "detail": "Down", "reason": "bad vm status"}
Hostname : novirt2.example.net
Local maintenance : False
stopped : False
crc32 : b297faaa
conf_on_shared_storage : True
local_conf_timestamp : 34180
Status up-to-date : True
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=34180 (Thu May 28 10:17:57 2020)
        host-id=1
        score=0
        vm_conf_refresh_time=34180 (Thu May 28 10:17:57 2020)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineUnexpectedlyDown
        stopped=False
        timeout=Thu Jan 1 10:30:29 1970
[root@novirt2 log]#

Also restarting ovirt-ha-agent on the host doesn't change things... In agent.log I have:

MainThread::ERROR::2020-05-28 10:23:45,817::hosted_engine::953::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_engine_vm) Failed to stop engine VM: Command VM.destroy with args {'vmID': 'b572d924-b278-41c7-a9da-52c4f590aac1'} failed: (code=1, message=Virtual machine does not exist: {'vmId': 'b572d924-b278-41c7-a9da-52c4f590aac1'})
MainThread::INFO::2020-05-28 10:23:45,843::brokerlink::73::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineForceStop-ReinitializeFSM) sent? ignored
MainThread::INFO::2020-05-28 10:23:45,850::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state ReinitializeFSM (score: 0)

That is strange, because the VM has started with that uuid; in fact the qemu-kvm command contains the same uuid referred to in agent.log: "uuid=b572d924-b278-41c7-a9da-52c4f590aac1".

In the meantime the deploy fails because, from the host's point of view, the VM was never able to reach the "up" state... and to test further I put the host in global maintenance. I can access the engine web admin portal, but it seems the engine is recognized as a sort of external engine...? Perhaps I have to remove some other line inside the generated he.xml file to have the engine recognized?

See screenshots from the web admin view of the hosted engine VM, the cluster type set up by the deploy, and the storage view (data and vm storage domains not appearing, only the engine one):
https://drive.google.com/file/d/1by4mEOo3iQv1fMbRsqGvZCEOUFUJbPk4/view?usp=s...
https://drive.google.com/file/d/199tOdLfSnWIb_rxCGM0zpgOjF3XFLUFm/view?usp=s...
https://drive.google.com/file/d/1jVaMGvImRhyf3xQx5gn6bCu6OyfZ1Fq_/view?usp=s...
and the content of he.xml:
https://drive.google.com/file/d/1vOBB0t-vKD5f_7wUUMaIcHxVDXQ4Cfbe/view?usp=s...

Any input to solve this and at least try some features of 4.4 on this hardware environment? Thanks, Gianluca
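For anyone hitting the same race, a rough sketch of the catch-and-edit step described above (HostedEngine, /tmp/he.xml and the tsx-ctrl flag are the names from this thread; the loop is only a convenience so you don't have to retry the dump by hand):

# keep retrying until the transient HostedEngine domain exists long enough to dump
until virsh -r dumpxml HostedEngine > /tmp/he.xml 2>/dev/null; do sleep 1; done
# drop the CPU feature line that libvirt rejects on this host
sed -i '/tsx-ctrl/d' /tmp/he.xml
# re-create the domain; read-write virsh asks for the vdsm@ovirt SASL credentials
virsh create /tmp/he.xml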