On Thu, May 28, 2020 at 10:22 AM Lucia Jelinkova <ljelinko(a)redhat.com>
wrote:
This might help:
https://lists.ovirt.org/pipermail/users/2017-January/079157.html
On Thu, May 28, 2020 at 10:17 AM Gianluca Cecchi <
gianluca.cecchi(a)gmail.com> wrote:
>
>
> On Wed, May 27, 2020 at 4:13 PM Mark R <ovirtlist(a)beanz.33mail.com>
> wrote:
>
>> Replying to my own post here, I've verified that it's currently not
>> possible to do a greenfield deploy of oVirt 4.4.0 w/ hosted engine on
>> EPYC CPUs due to the system setting a requirement of 'virt-ssbd' for the
>> final HE definition after the move to shared storage. The local HE runs
>> perfectly through the setup process because it correctly uses 'amd-ssbd'
>> but unfortunately that doesn't stick after the disks are moved.
>>
>> You can work around it via 'virsh -r dumpxml HostedEngine > /tmp/he.xml',
>> then editing that file to simply change 'virt-ssbd' to 'amd-ssbd'. Start
>> it via 'virsh create /tmp/he.xml' and now it runs fine. You can get into
>> the admin interface and if you want to run something hacky could at this
>> point change the cluster CPU type from 'Secure AMD EPYC' to just
>> 'AMD EPYC', take everything down and bring it back up cleanly... hosted
>> engine will now work because the requirement for virt-ssbd (not available
>> on EPYC when using CentOS 8.1 or presumably RHEL 8, amd-ssbd is needed)
>> is gone.
>>
>>
> I would like to test it during the final stage of deployment, but the
> "virsh create" command requires a password.
>
> [root@novirt2 log]# virsh create /tmp/he.xml
> Please enter your authentication name:
>
> I don't remember how to set up a user so that I can run it on the oVirt
> host... any input?
>
> Just for my testing lab in 4.4, obviously...
> Thanks,
> Gianluca
>
>
Thanks Lucia, in the meantime I found another e-mail reporting the same
credentials stored in that file (they are static and not randomly
generated...).
Unfortunately the strategy works to get the hosted engine VM started, but
it seems it is not recognized...
So I'm in the situation of being in the middle of a single-host HCI
deployment, at the final stage, with the unsupported tsx-ctrl CPU flag; see
this thread too:
https://lists.ovirt.org/archives/list/users@ovirt.org/thread/5LBCJGWTVRVT...
It is strange that in
/var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-ansible-create_target_vm-202042810039-c8h9k9.log
I have this:
2020-05-28 10:01:16,234+0200 DEBUG var changed: host "localhost" var "server_cpu_dict" type "<class 'dict'>" value: "{
    "AMD EPYC": "EPYC",
    "AMD Opteron G4": "Opteron_G4",
    "AMD Opteron G5": "Opteron_G5",
    "IBM POWER8": "POWER8",
    "IBM POWER9": "POWER9",
    "IBM z114, z196": "z196-base",
    "IBM z13s, z13": "z13-base",
    "IBM z14": "z14-base",
    "IBM zBC12, zEC12": "zEC12-base",
    "Intel Broadwell Family": "Broadwell-noTSX",
    "Intel Cascadelake Server Family": "Cascadelake-Server,-hle,-rtm,+arch-capabilities",
    "Intel Haswell Family": "Haswell-noTSX",
    "Intel IvyBridge Family": "IvyBridge",
    "Intel Nehalem Family": "Nehalem",
    "Intel SandyBridge Family": "SandyBridge",
    "Intel Skylake Client Family": "Skylake-Client,-hle,-rtm",
    "Intel Skylake Server Family": "Skylake-Server,-hle,-rtm",
    "Intel Westmere Family": "Westmere",
    "Secure AMD EPYC": "EPYC,+ibpb,+virt-ssbd",
    "Secure Intel Broadwell Family": "Broadwell-noTSX,+spec-ctrl,+ssbd,+md-clear",
    "Secure Intel Cascadelake Server Family": "Cascadelake-Server,+md-clear,+mds-no,-hle,-rtm,+tsx-ctrl,+arch-capabilities",
    "Secure Intel Haswell Family": "Haswell-noTSX,+spec-ctrl,+ssbd,+md-clear",
    "Secure Intel IvyBridge Family": "IvyBridge,+pcid,+spec-ctrl,+ssbd,+md-clear",
    "Secure Intel Nehalem Family": "Nehalem,+spec-ctrl,+ssbd,+md-clear",
    "Secure Intel SandyBridge Family": "SandyBridge,+pcid,+spec-ctrl,+ssbd,+md-clear",
    "Secure Intel Skylake Client Family": "Skylake-Client,+spec-ctrl,+ssbd,+md-clear,-hle,-rtm",
    "Secure Intel Skylake Server Family": "Skylake-Server,+spec-ctrl,+ssbd,+md-clear,-hle,-rtm",
    "Secure Intel Westmere Family": "Westmere,+pcid,+spec-ctrl,+ssbd,+md-clear"
}"
and the tsx-ctrl flag is there...
But when, during deployment, it starts the final engine VM after copying
the local one to gluster storage, it gets:
May 28 10:06:59 novirt2 journal[13368]: unsupported configuration: unknown
CPU feature: tsx-ctrl
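Just as a sanity check (my own guess at a useful query, not something taken
from the deploy logs), a read-only virsh call should show whether libvirt on
the host knows the flag at all, and -r needs no credentials:

# CPU features libvirt would accept for a guest on this host
virsh -r domcapabilities | grep -i tsx
# what the host CPU itself exposes according to libvirt
virsh -r capabilities | grep -i tsx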
The VM is started and then stopped, over and over, so I try the dump
command a few times
virsh -r dumpxml HostedEngine > /tmp/he.xml
until I catch it; then I edit the file, remove the line with the tsx-ctrl
flag and start the VM (the whole loop is recapped in the sketch after the
virsh output below):
[root@novirt2 log]# virsh create /tmp/he.xml
Please enter your authentication name: vdsm@ovirt
Please enter your password:
Domain HostedEngine created from /tmp/he.xml
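For clarity, the manual loop described above boils down to something like
this (the sed expression is only an illustration: I actually removed the
tsx-ctrl line from the XML by hand):

# keep trying until the transient HostedEngine domain exists long enough to be dumped
until virsh -r dumpxml HostedEngine > /tmp/he.xml 2>/dev/null; do sleep 1; done
# drop the line carrying the offending CPU feature
sed -i '/tsx-ctrl/d' /tmp/he.xml
# recreate the domain from the edited definition (this asks for the vdsm@ovirt credentials)
virsh create /tmp/he.xml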
So far so good: the engine VM starts, it is reachable on its final
hostname, the engine service is up and the web admin GUI is accessible, but
the VM is not recognized by the host.
The qemu-kvm command is this one:
qemu 20826 1 99 10:16 ? 00:00:07 /usr/libexec/qemu-kvm -name
guest=HostedEngine,debug-threads=on -S -object
secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-2-HostedEngine/master-key.aes
-machine pc-q35-rhel8.1.0,accel=kvm,usb=off,dump-guest-core=off -cpu
Cascadelake-Server,md-clear=on,mds-no=on,hle=off,rtm=off,arch-capabilities=on
-m size=16777216k,slots=16,maxmem=67108864k -overcommit mem-lock=off -smp
2,maxcpus=32,sockets=16,cores=2,threads=1 -object iothread,id=iothread1
-numa node,nodeid=0,cpus=0-31,mem=16384 -uuid
b572d924-b278-41c7-a9da-52c4f590aac1 -smbios
type=1,manufacturer=oVirt,product=RHEL,version=8-1.1911.0.9.el8,serial=d584e962-5461-4fa5-affa-db413e17590c,uuid=b572d924-b278-41c7-a9da-52c4f590aac1,family=oVirt
-no-user-config -nodefaults -device sga -chardev
socket,id=charmonitor,fd=40,server,nowait -mon
chardev=charmonitor,id=monitor,mode=control -rtc
base=2020-05-28T08:16:29,driftfix=slew -global
kvm-pit.lost_tick_policy=delay -no-hpet -no-reboot -global
ICH9-LPC.disable_s3=1 -global ICH9-LPC.disable_s4=1 -boot strict=on -device
pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2
-device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1
-device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2
-device pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3
-device pcie-root-port,port=0x14,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4
-device pcie-root-port,port=0x15,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5
-device pcie-root-port,port=0x16,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x6
-device pcie-root-port,port=0x17,chassis=8,id=pci.8,bus=pcie.0,addr=0x2.0x7
-device
pcie-root-port,port=0x18,chassis=9,id=pci.9,bus=pcie.0,multifunction=on,addr=0x3
-device
pcie-root-port,port=0x19,chassis=10,id=pci.10,bus=pcie.0,addr=0x3.0x1
-device
pcie-root-port,port=0x1a,chassis=11,id=pci.11,bus=pcie.0,addr=0x3.0x2
-device
pcie-root-port,port=0x1b,chassis=12,id=pci.12,bus=pcie.0,addr=0x3.0x3
-device
pcie-root-port,port=0x1c,chassis=13,id=pci.13,bus=pcie.0,addr=0x3.0x4
-device
pcie-root-port,port=0x1d,chassis=14,id=pci.14,bus=pcie.0,addr=0x3.0x5
-device
pcie-root-port,port=0x1e,chassis=15,id=pci.15,bus=pcie.0,addr=0x3.0x6
-device
pcie-root-port,port=0x1f,chassis=16,id=pci.16,bus=pcie.0,addr=0x3.0x7
-device pcie-root-port,port=0x20,chassis=17,id=pci.17,bus=pcie.0,addr=0x4
-device pcie-pci-bridge,id=pci.18,bus=pci.1,addr=0x0 -device
qemu-xhci,p2=8,p3=8,id=ua-b630a65c-8156-4542-b8e8-98b4d2c48f67,bus=pci.4,addr=0x0
-device
virtio-scsi-pci,iothread=iothread1,id=ua-f6a70e5e-ebea-4468-a59a-cc760b9bcea4,bus=pci.5,addr=0x0
-device
virtio-serial-pci,id=ua-608f9599-30b2-4ee6-a0d3-d5fb588583ad,max_ports=16,bus=pci.3,addr=0x0
-drive if=none,id=drive-ua-fa671f6c-dc42-4c59-a66d-ccfa3d5d422b,readonly=on
-device
ide-cd,bus=ide.2,drive=drive-ua-fa671f6c-dc42-4c59-a66d-ccfa3d5d422b,id=ua-fa671f6c-dc42-4c59-a66d-ccfa3d5d422b,werror=report,rerror=report
-drive
file=/var/run/vdsm/storage/3df8f6d4-d572-4d2b-9ab2-8abc456a396f/df02bff9-2c4b-4e14-a0a3-591a84ccaed9/bf435645-2999-4fb2-8d0e-5becab5cf389,format=raw,if=none,id=drive-ua-df02bff9-2c4b-4e14-a0a3-591a84ccaed9,cache=none,aio=threads
-device
virtio-blk-pci,iothread=iothread1,scsi=off,bus=pci.6,addr=0x0,drive=drive-ua-df02bff9-2c4b-4e14-a0a3-591a84ccaed9,id=ua-df02bff9-2c4b-4e14-a0a3-591a84ccaed9,bootindex=1,write-cache=on,serial=df02bff9-2c4b-4e14-a0a3-591a84ccaed9,werror=stop,rerror=stop
-netdev
tap,fds=42:43,id=hostua-b29ca99f-a53e-4de7-8655-b65ef4ba5dc4,vhost=on,vhostfds=44:45
-device
virtio-net-pci,mq=on,vectors=6,host_mtu=1500,netdev=hostua-b29ca99f-a53e-4de7-8655-b65ef4ba5dc4,id=ua-b29ca99f-a53e-4de7-8655-b65ef4ba5dc4,mac=00:16:3e:0a:96:80,bus=pci.2,addr=0x0
-chardev socket,id=charserial0,fd=46,server,nowait -device
isa-serial,chardev=charserial0,id=serial0 -chardev
socket,id=charchannel0,fd=47,server,nowait -device
virtserialport,bus=ua-608f9599-30b2-4ee6-a0d3-d5fb588583ad.0,nr=1,chardev=charchannel0,id=channel0,name=ovirt-guest-agent.0
-chardev socket,id=charchannel1,fd=48,server,nowait -device
virtserialport,bus=ua-608f9599-30b2-4ee6-a0d3-d5fb588583ad.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0
-chardev spicevmc,id=charchannel2,name=vdagent -device
virtserialport,bus=ua-608f9599-30b2-4ee6-a0d3-d5fb588583ad.0,nr=3,chardev=charchannel2,id=channel2,name=com.redhat.spice.0
-chardev socket,id=charchannel3,fd=49,server,nowait -device
virtserialport,bus=ua-608f9599-30b2-4ee6-a0d3-d5fb588583ad.0,nr=4,chardev=charchannel3,id=channel3,name=org.ovirt.hosted-engine-setup.0
-vnc 172.19.0.224:0 -k en-us -spice
port=5901,tls-port=5902,addr=172.19.0.224,disable-ticketing,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=main,tls-channel=display,tls-channel=inputs,tls-channel=cursor,tls-channel=playback,tls-channel=record,tls-channel=smartcard,tls-channel=usbredir,seamless-migration=on
-device
qxl-vga,id=ua-ac7aed4d-d824-40f9-aaa8-1c0be702e38c,ram_size=67108864,vram_size=33554432,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pcie.0,addr=0x1
-device
intel-hda,id=ua-c7078e28-5585-4866-bdbd-528ebddd8854,bus=pci.18,addr=0x1
-device
hda-duplex,id=ua-c7078e28-5585-4866-bdbd-528ebddd8854-codec0,bus=ua-c7078e28-5585-4866-bdbd-528ebddd8854.0,cad=0
-device
virtio-balloon-pci,id=ua-fc4f6b20-0b17-4198-b059-b5753893584d,bus=pci.7,addr=0x0
-object
rng-random,id=objua-c4c3e5e7-1c19-4582-a87c-4f3fee4a0ee5,filename=/dev/urandom
-device
virtio-rng-pci,rng=objua-c4c3e5e7-1c19-4582-a87c-4f3fee4a0ee5,id=ua-c4c3e5e7-1c19-4582-a87c-4f3fee4a0ee5,bus=pci.8,addr=0x0
-sandbox
on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny
-msg timestamp=on
The status remains down:
[root@novirt2 log]# hosted-engine --vm-status

--== Host novirt2.example.net (id: 1) status ==--

Host ID                            : 1
Host timestamp                     : 34180
Score                              : 0
Engine status                      : {"vm": "down_unexpected", "health": "bad", "detail": "Down", "reason": "bad vm status"}
Hostname                           : novirt2.example.net
Local maintenance                  : False
stopped                            : False
crc32                              : b297faaa
conf_on_shared_storage             : True
local_conf_timestamp               : 34180
Status up-to-date                  : True
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=34180 (Thu May 28 10:17:57 2020)
    host-id=1
    score=0
    vm_conf_refresh_time=34180 (Thu May 28 10:17:57 2020)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineUnexpectedlyDown
    stopped=False
    timeout=Thu Jan  1 10:30:29 1970
[root@novirt2 log]#
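For what it's worth, the same status should also be available in
machine-readable form, assuming the --json switch is present in this build
(I haven't double-checked):

# JSON output of the same status, handy for scripting
hosted-engine --vm-status --json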
Also, restarting ovirt-ha-agent on the host doesn't change things...
In agent.log I have:
MainThread::ERROR::2020-05-28
10:23:45,817::hosted_engine::953::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_engine_vm)
Failed to stop engine VM: Command VM.destroy with args {'vmID':
'b572d924-b278-41c7-a9da-52c4f590aac1'} failed:
(code=1, message=Virtual machine does not exist: {'vmId':
'b572d924-b278-41c7-a9da-52c4f590aac1'})
MainThread::INFO::2020-05-28
10:23:45,843::brokerlink::73::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
Success, was notification of state_transition
(EngineForceStop-ReinitializeFSM) sent? ignored
MainThread::INFO::2020-05-28
10:23:45,850::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop)
Current state ReinitializeFSM (score: 0)
which is strange, because the VM was started with that uuid. The qemu-kvm
command in fact contains the same uuid referenced in agent.log:
"uuid=b572d924-b278-41c7-a9da-52c4f590aac1"
In the meantime the deploy fails, because from the host's point of view
the VM was never able to reach the "up" state... and to keep testing I put
the setup in global maintenance.
I can access the engine web admin portal, but it seems the VM is recognized
as a sort of external engine...?
Perhaps I have to remove some other line inside the generated he.xml file
to have the engine recognized?
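For example, I wonder whether the ovirt-vm metadata section that vdsm
normally adds to the domains it creates is still intact in the file I
started the VM from; a quick look (just my own guess at what to inspect)
would be:

# check that the vdsm/oVirt metadata block survived the manual edit
grep -A20 '<metadata' /tmp/he.xml | head -40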
See the screenshots from the web admin view of the hosted engine VM, the
cluster CPU type that has been set up by the deploy, and the storage view
(the data and vm storage domains do not appear, only the engine one):
https://drive.google.com/file/d/1by4mEOo3iQv1fMbRsqGvZCEOUFUJbPk4/view?us...
https://drive.google.com/file/d/199tOdLfSnWIb_rxCGM0zpgOjF3XFLUFm/view?us...
https://drive.google.com/file/d/1jVaMGvImRhyf3xQx5gn6bCu6OyfZ1Fq_/view?us...
and content of he.xml
https://drive.google.com/file/d/1vOBB0t-vKD5f_7wUUMaIcHxVDXQ4Cfbe/view?us...
Any input on how to solve this and at least try some features of 4.4 on
this hw environment?
Thanks,
Gianluca