So this looks like a bug in qemu/libvirtd.

We had avic=1 set for kvm_amd, but with that set the qemu capabilities cache showed all EPYC/AMD variants as unusable, blocked by a missing 'x2apic' feature. My guess is that it should probably be looking for the avic flag instead. My other guess is that avic doesn't actually get enabled at all when it is turned on.

Even though I had disabled avic earlier in testing, libvirt did not pick up the capabilities change until I cleared its cache.
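
For anyone wanting to check the same thing on their own host, here is a rough sketch of how it could be inspected. It assumes the usual sysfs location for module parameters and the cache directory used in the steps quoted further down; the helper names are made up for illustration, and the value printed for avic may differ between kernel versions.

```shell
#!/bin/sh
# Hypothetical helpers; both paths are assumptions and can be overridden.
PARAM_FILE="${PARAM_FILE:-/sys/module/kvm_amd/parameters/avic}"
CACHE_DIR="${CACHE_DIR:-/var/cache/libvirt/qemu/capabilities}"

# Print the avic setting of the loaded kvm_amd module, or "unknown"
# if the parameter file is not readable (module not loaded, non-AMD host).
avic_state() {
    if [ -r "$PARAM_FILE" ]; then
        cat "$PARAM_FILE"
    else
        echo "unknown"
    fi
}

# List the cached capability files that mention x2apic at all.
cache_mentions_x2apic() {
    grep -l "x2apic" "$CACHE_DIR"/*.xml 2>/dev/null
}

echo "avic: $(avic_state)"
cache_mentions_x2apic || echo "no cached capability file mentions x2apic"
```

If a file is listed, opening it and looking at the entries around x2apic should show whether the CPU models are still marked unusable in the cache.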

I'm also assuming either some of the verification code changed in ovirt from 4.2 to 4.3, or libvirt was updated and exposed this.

Apparently qemu will just drop unsupported flags when starting a VM, which is why this was working before.

Mystery solved.

Regards,

Ryan

On Sun, Feb 10, 2019 at 8:54 AM Greg Sheremeta <gshereme@redhat.com> wrote:
Thanks, Ryan.
I opened https://bugzilla.redhat.com/show_bug.cgi?id=1674265 to track this.

Greg

On Sat, Feb 9, 2019 at 5:50 PM Ryan Bullock <rrb3942@gmail.com> wrote:
Got a host activated!

1. Update host to 4.3
2. rm /var/cache/libvirt/qemu/capabilities/*.xml
3. systemctl restart libvirtd
4. Activate host
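
Steps 2 and 3 above could be wrapped in a small script along these lines. This is only a sketch: the `run` and `reset_capabilities_cache` names and the `DRY_RUN` switch are made up for illustration, and it defaults to printing the commands rather than running them.

```shell
#!/bin/sh
# Sketch of steps 2 and 3. With DRY_RUN=1 (the default here) the commands
# are only printed; set DRY_RUN=0 and run as root to actually execute them.
CACHE_DIR="${CACHE_DIR:-/var/cache/libvirt/qemu/capabilities}"
DRY_RUN="${DRY_RUN:-1}"

# Either print or execute the given command, depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Remove the cached capability XML files, then restart libvirtd so the
# capabilities are probed again.
reset_capabilities_cache() {
    for f in "$CACHE_DIR"/*.xml; do
        [ -e "$f" ] || continue   # glob matched nothing
        run rm -- "$f"
    done
    run systemctl restart libvirtd
}

reset_capabilities_cache
```

Updating the host and activating it (steps 1 and 4) are still done from the engine as before.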

Seems like some kind of stuck state going from 4.2 -> 4.3

Hope this helps someone else.

On Sat, Feb 9, 2019 at 1:12 PM Ryan Bullock <rrb3942@gmail.com> wrote:
I tried that too, but it still complains about an unsupported CPU in the new cluster. Even if I leave the cluster level at 4.2, if I update the host to 4.3 it can't activate under a 4.2 cluster.
Makes me think something changed in how it verifies the CPU support and for some reason it is not liking my EPYC systems.

On Sat, Feb 9, 2019 at 10:18 AM Juhani Rautiainen <juhani.rautiainen@gmail.com> wrote:
On Sat, Feb 9, 2019 at 7:43 PM Ryan Bullock <rrb3942@gmail.com> wrote:
>
> So I tried making a new cluster with a 4.2 compatibility level and moving one of my EPYC hosts into it. I then updated the host to 4.3, switched the cluster version to 4.3 and set the cluster CPU to the new AMD EPYC IBPB SSBD (also tried plain AMD EPYC). It still fails to make the host operational, complaining that 'CPU type is not supported in this cluster compatibility version or is not supported at all'.
>
When I did this with Epyc I made a new cluster with 4.3 level and Epyc
CPU, and then moved the nodes to it. Maybe try that? I also had to
move a couple of VMs to the new cluster because the old cluster couldn't
upgrade with those. Once the nodes and the couple of problem VMs were in
the new cluster I could upgrade the old cluster to the new level.

-Juhani


--

GREG SHEREMETA

SENIOR SOFTWARE ENGINEER - TEAM LEAD - RHV UX

Red Hat NA

gshereme@redhat.com    IRC: gshereme