Hi,

I think I found the problem: our definition of CPU types in the database is not correct. We autodetect the CPU type from the CPU flags, but those definitions are not in sync with what we send to VDSM.

https://github.com/oVirt/ovirt-engine/blob/874e390a40ee2f23ea108955e2946c7c419f067e/packaging/dbscripts/upgrade/pre_upgrade/0000_config.sql#L1067

And yes, it happened for both of your servers: Secure Intel Cascadelake Server Family and Secure AMD EPYC.

As for the he_cluster_cpu_type option, I think its values are taken from the DB configuration as well. In your case it should be "Intel Cascadelake Server Family".

Regards,

Lucia

On Fri, May 29, 2020 at 11:39 AM Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Fri, May 29, 2020 at 9:34 AM Simone Tiraboschi <stirabos@redhat.com> wrote:


On Thu, May 28, 2020 at 11:56 PM Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Thu, May 28, 2020 at 3:09 PM Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:

[snip]


As for the cluster type, in the meantime I was able to change it to "Intel Cascadelake Server Family" from the web admin GUI. Now I have to try these steps and see whether the engine starts automatically, without manual operations:

1) set global maintenance
2) shut down engine
3) exit maintenance
4) see if the engine vm starts without the cpu flag....
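
The four steps above can be sketched as a small script; this is only a minimal sketch, assuming it runs as root on the hosted-engine host and that step 2 is done with "hosted-engine --vm-shutdown" (the steps above do not say how the engine was shut down). The helper name run_steps and its dry_run switch are made up for illustration.

```python
import subprocess

# One hosted-engine call per step; step 2 via --vm-shutdown is an assumption.
STEPS = [
    ("set global maintenance", ["hosted-engine", "--set-maintenance", "--mode=global"]),
    ("shut down the engine VM", ["hosted-engine", "--vm-shutdown"]),
    ("exit maintenance", ["hosted-engine", "--set-maintenance", "--mode=none"]),
    ("check whether the engine VM came back", ["hosted-engine", "--vm-status"]),
]


def run_steps(steps=STEPS, runner=subprocess.run, dry_run=True):
    """Print each step and, unless dry_run is set, execute it.

    Returns the list of argv lists that were printed (and possibly run).
    """
    handled = []
    for label, argv in steps:
        print(f"{label}: {' '.join(argv)}")
        if not dry_run:
            # check=True aborts the sequence if a command fails
            runner(argv, check=True)
        handled.append(argv)
    return handled


if __name__ == "__main__":
    # Dry run by default: only prints the commands
    run_steps(dry_run=True)
```

In a real run the steps should not be fired back to back: the VM needs time to go down before leaving maintenance, and the HA agent may take a few minutes to restart it, so the last status check would have to be repeated.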


I confirm that step 4) was successful: the engine VM was able to autostart after I changed the cluster type.

As expected.
In my opinion, the point now is just to understand why the engine detected your host with the wrong CPU feature set.

To be fully honest, as you can see in https://github.com/oVirt/ovirt-ansible-hosted-engine-setup/blob/master/README.md#L46 , we already have a variable (he_cluster_cpu_type) to force a cluster CPU type from the ansible role, but I don't think it is exposed in the interactive installer.
 
 
Can I set it artificially in a playbook, just to verify that the setup workflow completes correctly, or do you think it will be overwritten at run time anyway by what is detected?
In my opinion it is not clear what this sentence means: "cluster CPU type to be used in hosted-engine cluster (the same as HE host or lower)"
Does "as HE host" mean what is obtained from the vdsm capabilities, or something else?
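
For such a test, a minimal sketch of how the variable could be supplied: it writes an Ansible extra-vars file containing he_cluster_cpu_type. The helper name write_extra_vars and the file name are made up for illustration, and this says nothing about whether the role would still override the value at run time.

```python
import json


def write_extra_vars(path, cpu_type):
    """Write an Ansible extra-vars file that sets he_cluster_cpu_type."""
    extra_vars = {"he_cluster_cpu_type": cpu_type}
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(extra_vars, handle, indent=2)
    return extra_vars


if __name__ == "__main__":
    # File name is hypothetical; the CPU type is the one discussed in this thread
    write_extra_vars("he_cpu_type.json", "Intel Cascadelake Server Family")
```

The file could then be passed with "ansible-playbook -e @he_cpu_type.json" to a playbook of your own that applies the hosted-engine setup role.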
 
That one is just a leftover from the install process.
It's normally cleaned up automatically as one of the last actions in the ansible role used for the deployment.
I suspect that, due to the wrongly detected CPU type, something in your case failed very close to the end of the deployment, hence the leftover; you can safely delete it manually.
 

Yes, the deploy failed because it was not able to detect the final engine as up....

As asked by Lucia: the Origin of the VM was "External".
The VM had no disks and no network interfaces. I was able to remove it, without problems so far.
Thanks,
Gianluca