Some feedback of an upgrade from 4.2 to 4.3 (EPYC Cluster)

Hi,
1) Thank you all for the official support of the AMD EPYC processors in version 4.3! :-)
2) During the upgrade process I removed the old guest ISO file and replaced it with the new one (with a different filename, of course). When I then migrated a virtual machine that still had the old ISO file mounted, the VM crashed:
"Migration failed: Lost connection with qemu process (VM: <vmname>, Source: <ovirt-node-host>)."
Yes, I know this is not a real bug (dumb user), but maybe it would be good to prevent users from doing something stupid ;-)
3) The log contains the following message after upgrading the node:
"VDSM <ovirt-node-host> command Get Host Statistics failed: Internal JSON-RPC error: {'reason': '[Errno 19] genev_sys_6081 is not present in the system'}"
This stops after some time (~20-30 minutes), but comes back after a restart of a node and then disappears again after a while. At the moment we have no problems and are not treating this as a real issue.
4) Windows 10 guests newer than build version 1709 crash on our EPYC cluster; on the SandyBridge cluster the issue doesn't exist. The reason is the wrong CPU model in QEMU, see: <https://bugzilla.redhat.com/show_bug.cgi?id=1593190>
We fixed this problem with a line in "/etc/modprobe.d/kvm.conf": "options kvm ignore_msrs=1".
Thank you all for this great piece of software :-)
Regards,
Tobias
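For anyone applying the same workaround from point 4, here is a minimal Python sketch for checking that the option is both configured and active on a host. The helper names are made up for illustration, and the sysfs path /sys/module/kvm/parameters/ignore_msrs is an assumption based on the usual Linux layout for module parameters; it is not taken from this thread. Note that a modprobe.d entry normally only takes effect once the kvm module is reloaded or the host is rebooted, so the two checks can disagree.

```python
from pathlib import Path


def conf_sets_ignore_msrs(conf_text):
    """Return True if modprobe config text has an 'options kvm ... ignore_msrs=1' line."""
    for raw in conf_text.splitlines():
        stripped = raw.strip()
        # Skip blank lines and comment lines.
        if not stripped or stripped.startswith("#"):
            continue
        tokens = stripped.split()
        # Expected shape: options kvm key=value [key=value ...]
        if len(tokens) >= 3 and tokens[0] == "options" and tokens[1] == "kvm":
            if "ignore_msrs=1" in tokens[2:]:
                return True
    return False


def runtime_ignore_msrs(param_path="/sys/module/kvm/parameters/ignore_msrs"):
    """Return True/False for the live module parameter, or None if it cannot be read.

    The default path is an assumption (standard sysfs layout for module parameters).
    """
    try:
        value = Path(param_path).read_text().strip()
    except OSError:
        return None
    # Boolean module parameters are usually reported as Y or N.
    return value in ("Y", "1")
```

On a node one could call conf_sets_ignore_msrs(Path("/etc/modprobe.d/kvm.conf").read_text()) and runtime_ignore_msrs(); if the first is True and the second is False, the setting is most likely configured but not yet loaded.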

On Mon, Apr 29, 2019 at 3:18 PM Tobias Scheinert <tobias.scheinert@uni-ulm.de> wrote:
Hi,
1) Thank you all for the official support of the AMD EPYC processors in version 4.3! :-)
:-)
2) During the upgrade process I removed the old guest ISO file and replaced it with the new one (with a different filename, of course). When I then migrated a virtual machine that still had the old ISO file mounted, the VM crashed:
"Migration failed: Lost connection with qemu process (VM: <vmname>, Source: <ovirt-node-host>)."
Yes, I know this is not a real bug (dumb user), but maybe it would be good to prevent users from doing something stupid ;-)
Would you like to open a bugzilla bug to track this?
3) The log contains the following message after upgrading the node:
"VDSM <ovirt-node-host> command Get Host Statistics failed: Internal JSON-RPC error: {'reason': '[Errno 19] genev_sys_6081 is not present in the system'}"
This stops after some time (~20-30 minutes), but comes back after a restart of a node and then disappears again after a while. At the moment we have no problems and are not treating this as a real issue.
No idea, but I see this is discussed in a separate thread currently.
4) Windows 10 guests newer than build version 1709 crash on our EPYC cluster; on the SandyBridge cluster the issue doesn't exist. The reason is the wrong CPU model in QEMU, see: <https://bugzilla.redhat.com/show_bug.cgi?id=1593190>
We fixed this problem with a line in "/etc/modprobe.d/kvm.conf" "options kvm ignore_msrs=1".
Thank you all for this great piece of software :-)
Thanks for the report!
Best regards,
-- Didi

On 30 Apr 2019, at 07:51, Yedidyah Bar David <didi@redhat.com> wrote:
On Mon, Apr 29, 2019 at 3:18 PM Tobias Scheinert <tobias.scheinert@uni-ulm.de> wrote:
Hi,
1) Thank you all for the official support of the AMD EPYC processors in version 4.3! :-)
:-)
2) During the upgrade process I removed the old guest ISO file and replaced it with the new one (with a different filename, of course). When I then migrated a virtual machine that still had the old ISO file mounted, the VM crashed:
"Migration failed: Lost connection with qemu process (VM: <vmname>, Source: <ovirt-node-host>)."
Yes, I know this is not a real bug (dumb user), but maybe it would be good to prevent users from doing something stupid ;-)
How exactly did you remove it? Did you delete it from the actual NFS storage? If so, there's no way we could prevent that.
Would you like to open a bugzilla bug to track this?
3) The log contains the following message after upgrading the node:
"VDSM <ovirt-node-host> command Get Host Statistics failed: Internal JSON-RPC error: {'reason': '[Errno 19] genev_sys_6081 is not present in the system'}"
This stops after some time (~20-30 minutes), but comes back after a restart of a node and then disappears again after a while. At the moment we have no problems and are not treating this as a real issue.
No idea, but I see this is discussed in a separate thread currently.
4) Windows 10 guests newer than build version 1709 crash on our EPYC cluster; on the SandyBridge cluster the issue doesn't exist. The reason is the wrong CPU model in QEMU, see: <https://bugzilla.redhat.com/show_bug.cgi?id=1593190>
We fixed this problem with a line in "/etc/modprobe.d/kvm.conf": "options kvm ignore_msrs=1".
It may take time to fix. Fedora hosts might work, but they have different issues and I wouldn't recommend them in general. So if that workaround is fine for now, please keep using it until a fixed qemu finds its way to CentOS.
Thanks,
michal
Thank you all for this great piece of software :-)
Thanks for the report!
Best regards,
-- Didi
_______________________________________________
Devel mailing list -- devel@ovirt.org
To unsubscribe send an email to devel-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/devel@ovirt.org/message/3LFGQIFKVOD6XR...
participants (3)
- Michal Skrivanek
- Tobias Scheinert
- Yedidyah Bar David