All hosts non operational after engine upgrade

Greetings.

I upgraded my Engine from a pre-3.6 version (not sure which version, but it is only a month old or so) to the RC today and found, to my great surprise, all hosts marked as Non Operational.

The cluster, which was working fine just hours ago, now reports some mismatch with my hosts' CPUs. The cluster is set to Opteron G4 and all hosts have Opteron 4226 CPUs installed.

2015-10-19 20:23:09,509 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-62) [36da0975] Correlation ID: 36da0975, Job ID: 44cf7ac9-5fe1-4151-9f6f-f1396124f179, Call Stack: null, Custom Event ID: -1, Message: Host patty.elementary.se does not comply with the cluster AMD_G4 emulated machines. The current cluster compatibility level supports [pc-i440fx-rhel7.2.0, pc-i440fx-2.1, pseries-rhel7.2.0] and the host emulated machines are pc-i440fx-rhel7.1.0,rhel6.3.0,pc-q35-rhel7.0.0,rhel6.1.0,rhel6.6.0,rhel6.2.0,pc,pc-q35-rhel7.1.0,q35,rhel6.4.0,rhel6.0.0,rhel6.5.0,pc-i440fx-rhel7.0.0.

The error above looks qemu-related to me, so I tried reinstalling one of the nodes with the latest Ovirt-node snapshot downloaded from Jenkins --> http://jenkins.ovirt.org/job/ovirt-node_ovirt-3.6_create-iso-el7_merged/last...

That, however, made no difference.

I could use some help.

TIA

Rgds Jonas
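The warning above lists two sets of emulated machine types, and the complaint appears to be that they do not overlap. As a rough sketch only (the helper name `common_machines` is hypothetical, and the assumption that the engine simply looks for a common entry is inferred from the message wording, not from the engine source), the two lists can be compared like this:

```shell
#!/bin/sh
# Lists copied from the engine warning quoted above.
cluster_supported="pc-i440fx-rhel7.2.0, pc-i440fx-2.1, pseries-rhel7.2.0"
host_machines="pc-i440fx-rhel7.1.0,rhel6.3.0,pc-q35-rhel7.0.0,rhel6.1.0,rhel6.6.0,rhel6.2.0,pc,pc-q35-rhel7.1.0,q35,rhel6.4.0,rhel6.0.0,rhel6.5.0,pc-i440fx-rhel7.0.0"

# common_machines SUPPORTED HOST (hypothetical helper):
# prints every machine type present in both comma-separated lists, one per line.
common_machines() {
  # Drop spaces and split the supported list into one entry per line.
  cm_supported=$(printf '%s' "$1" | tr -d ' ' | tr ',' '\n')
  for cm_machine in $(printf '%s' "$2" | tr -d ' ' | tr ',' '\n'); do
    # -F fixed string, -x whole-line match, -q quiet
    printf '%s\n' "$cm_supported" | grep -Fxq -- "$cm_machine" \
      && printf '%s\n' "$cm_machine"
  done
  return 0
}

matches=$(common_machines "$cluster_supported" "$host_machines")
if [ -z "$matches" ]; then
  echo "no common emulated machine"
else
  echo "common emulated machines: $matches"
fi
```

With the two lists from the log, nothing is printed by the helper, which matches the host being flagged as non-compliant.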

On Mon, Oct 19, 2015 at 8:40 PM, Jonas Israelsson <jonas@israelsson.com> wrote:
Greetings.
I upgraded my Engine from a pre-3.6 version (not sure which version, but it is only a month old or so) to the RC today and found, to my great surprise, all hosts marked as Non Operational.
The cluster, which was working fine just hours ago, now reports some mismatch with my hosts' CPUs. The cluster is set to Opteron G4 and all hosts have Opteron 4226 CPUs installed.
2015-10-19 20:23:09,509 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-62) [36da0975] Correlation ID: 36da0975, Job ID: 44cf7ac9-5fe1-4151-9f6f-f1396124f179, Call Stack: null, Custom Event ID: -1, Message: Host patty.elementary.se does not comply with the cluster AMD_G4 emulated machines. The current cluster compatibility level supports [pc-i440fx-rhel7.2.0, pc-i440fx-2.1, pseries-rhel7.2.0] and the host emulated machines are pc-i440fx-rhel7.1.0,rhel6.3.0,pc-q35-rhel7.0.0,rhel6.1.0,rhel6.6.0,rhel6.2.0,pc,pc-q35-rhel7.1.0,q35,rhel6.4.0,rhel6.0.0,rhel6.5.0,pc-i440fx-rhel7.0.0.
The error above looks qemu-related to me, so I tried reinstalling one of the nodes with the latest Ovirt-node snapshot downloaded from Jenkins --> http://jenkins.ovirt.org/job/ovirt-node_ovirt-3.6_create-iso-el7_merged/last...
That, however, made no difference.
The issue seems to be in node: the latest engine build requires pc-i440fx-rhel7.2.0, which comes with qemu 2.3, while node is probably still built from CentOS 7.1 with an older qemu.
For standard el7 hosts we are releasing it in the oVirt repo ( http://resources.ovirt.org/pub/ovirt-3.6-pre/rpm/el7/x86_64/qemu-kvm-ev-2.3.... ) but it seems that we didn't include it in node.
Please wait for a new build of node with that. If you are in a hurry you can also try to fix it manually on your hosts:
$ remount / read-write
$ cd /tmp
$ wget all the qemu 2.3 rpm files from http://resources.ovirt.org/pub/ovirt-3.6-pre/rpm/el7/x86_64/ (there is no yum on node)
$ rpm … to install/update the involved rpms
I could use some help..
TIA
Rgds Jonas
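The diagnosis in the reply above comes down to whether the qemu installed on the host is at least 2.3. A minimal sketch of that check follows; `version_ge` is a hypothetical helper, the threshold `2.3.0` is taken from the package file name used later in this thread, and the `rpm -q` line is left commented out because the package name on a given host is an assumption.

```shell
#!/bin/sh
# version_ge A B (hypothetical helper): succeeds when version A >= version B.
# Relies on GNU sort -V (version sort).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Threshold assumed from the qemu-kvm-ev-2.3.0 package name in this thread.
required="2.3.0"

# On a real host the installed version could be read with something like:
#   installed=$(rpm -q --qf '%{VERSION}' qemu-kvm-ev)
# A placeholder value is used here so the sketch runs anywhere.
installed="2.3.0"

if version_ge "$installed" "$required"; then
  echo "qemu $installed is new enough"
else
  echo "qemu $installed is older than $required"
fi
```

Version sort is used instead of a plain string comparison so that a release such as 2.10.0 is not treated as older than 2.3.0.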

Thanks a million for looking into this.
Please wait for a new build of node with that. If you are in a hurry you can also try to fix it manually on your hosts:
$ remount / read-write
$ cd /tmp
$ wget all the qemu 2.3 rpm files from http://resources.ovirt.org/pub/ovirt-3.6-pre/rpm/el7/x86_64/ (there is no yum on node)
$ rpm … to install/update the involved rpms
You are absolutely right, and after updating the qemu packages along with the libraries they depend on I was able to bring the node back online.
This is what I had to do:
<snip>
mount -o remount,rw /
cd /tmp
wget http://resources.ovirt.org/pub/ovirt-3.6-pre/rpm/el7/x86_64/libunwind-1.1-5....
wget http://resources.ovirt.org/pub/ovirt-3.6-pre/rpm/el7/x86_64/gperftools-libs-...
wget http://resources.ovirt.org/pub/ovirt-3.6-pre/rpm/el7/x86_64/qemu-kvm-ev-2.3....
wget http://resources.ovirt.org/pub/ovirt-3.6-pre/rpm/el7/x86_64/qemu-kvm-common-...
wget http://resources.ovirt.org/pub/ovirt-3.6-pre/rpm/el7/x86_64/qemu-img-ev-2.3....
wget http://resources.ovirt.org/pub/ovirt-3.6-pre/rpm/el7/x86_64/qemu-kvm-tools-e...
# Libs
rpm -Uhv libunwind-1.1-5.el7.x86_64.rpm
rpm -Uhv gperftools-libs-2.4-7.el7.x86_64.rpm
# Qemu
rpm -Uvh qemu-kvm-common-ev-2.3.0-29.1.el7.x86_64.rpm --nodeps
rpm -Uvh qemu-img-ev-2.3.0-29.1.el7.x86_64.rpm --nodeps
rpm -Uhv qemu-kvm-tools-ev-2.3.0-29.1.el7.x86_64.rpm
rpm -Uhv qemu-kvm-ev-2.3.0-29.1.el7.x86_64.rpm
systemctl restart vdsmd
Activate host
</snip>
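After an update like the one above, it may be worth confirming that qemu now offers the machine type the engine asked for before activating the host. The following is only a sketch: `has_machine` is a hypothetical helper, the binary path `/usr/libexec/qemu-kvm` is an assumption about el7 hosts, and the sample text is illustrative rather than captured from a real system.

```shell
#!/bin/sh
# has_machine TYPE (hypothetical helper): reads "qemu-kvm -machine help"
# style output on stdin and succeeds when TYPE appears in the first column.
has_machine() {
  awk -v want="$1" '$1 == want { found = 1 } END { exit !found }'
}

# On a host, assuming the binary lives at /usr/libexec/qemu-kvm:
#   /usr/libexec/qemu-kvm -machine help | has_machine pc-i440fx-rhel7.2.0

# Illustrative sample input, not real output from any host.
sample='Supported machines are:
pc-i440fx-rhel7.2.0  example description
pc-i440fx-rhel7.0.0  example description'

if printf '%s\n' "$sample" | has_machine pc-i440fx-rhel7.2.0; then
  echo "machine type available"
else
  echo "machine type missing"
fi
```

Matching on the first column only avoids false positives from descriptions that mention another machine type as an alias.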

On 2015-10-20 09:54, Simone Tiraboschi wrote:
On Mon, Oct 19, 2015 at 8:40 PM, Jonas Israelsson <jonas@israelsson.com> wrote:
Greetings.
I upgraded my Engine from a pre-3.6 version (not sure which version, but it is only a month old or so) to the RC today and found, to my great surprise, all hosts marked as Non Operational.
The cluster, which was working fine just hours ago, now reports some mismatch with my hosts' CPUs. The cluster is set to Opteron G4 and all hosts have Opteron 4226 CPUs installed.
2015-10-19 20:23:09,509 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-62) [36da0975] Correlation ID: 36da0975, Job ID: 44cf7ac9-5fe1-4151-9f6f-f1396124f179, Call Stack: null, Custom Event ID: -1, Message: Host patty.elementary.se does not comply with the cluster AMD_G4 emulated machines. The current cluster compatibility level supports [pc-i440fx-rhel7.2.0, pc-i440fx-2.1, pseries-rhel7.2.0] and the host emulated machines are pc-i440fx-rhel7.1.0,rhel6.3.0,pc-q35-rhel7.0.0,rhel6.1.0,rhel6.6.0,rhel6.2.0,pc,pc-q35-rhel7.1.0,q35,rhel6.4.0,rhel6.0.0,rhel6.5.0,pc-i440fx-rhel7.0.0.
The error above looks qemu-related to me, so I tried reinstalling one of the nodes with the latest Ovirt-node snapshot downloaded from Jenkins --> http://jenkins.ovirt.org/job/ovirt-node_ovirt-3.6_create-iso-el7_merged/lastSuccessfulBuild/artifact/exported-artifacts/ovirt-node-iso-3.6-0.999.201510190928.el7.centos.iso
That, however, made no difference.
The issue seems to be in node: the latest engine build requires pc-i440fx-rhel7.2.0, which comes with qemu 2.3, while node is probably still built from CentOS 7.1 with an older qemu. For standard el7 hosts we are releasing it in the oVirt repo ( http://resources.ovirt.org/pub/ovirt-3.6-pre/rpm/el7/x86_64/qemu-kvm-ev-2.3.0-29.1.el7.x86_64.rpm ) but it seems that we didn't include it in node.
Do you know whether that version bump affects all architectures (not only my AMD family), and whether the iso-node is to be considered bricked?

If so, does this not qualify as a blocker, or is the iso-node not part of the 3.6 release?
participants (3)
- Jonas Israelsson
- Jonas Israelsson
- Simone Tiraboschi