[Users] 6.4 CR: oVirt 3.1 breaks with missing cpu features after update to CentOS 6.4 (6.3 + CR)

Dan Kenigsberg danken at redhat.com
Thu Mar 7 20:31:12 UTC 2013


On Thu, Mar 07, 2013 at 04:50:16PM +0100, Patrick Hurrelmann wrote:
> On 07.03.2013 16:18, Dan Kenigsberg wrote:
> > On Thu, Mar 07, 2013 at 03:59:27PM +0100, Patrick Hurrelmann wrote:
> >> On 05.03.2013 13:49, Dan Kenigsberg wrote:
> >>> On Tue, Mar 05, 2013 at 12:32:31PM +0100, Patrick Hurrelmann wrote:
> >>>> On 05.03.2013 11:14, Dan Kenigsberg wrote:
> >>>> <snip>
> >>>>>>>>
> >>>>>>>> My version of vdsm as stated by Dreyou:
> >>>>>>>> v 4.10.0-0.46 (.15), built from
> >>>>>>>> b59c8430b2a511bcea3bc1a954eee4ca1c0f4861 (branch ovirt-3.1)
> >>>>>>>>
> >>>>>>>> I can't see that Ia241b09c96fa16441ba9421f61a2f9a417f0d978 was merged to
> >>>>>>>> 3.1 Branch?
> >>>>>>>>
> >>>>>>>> I applied that patch locally and restarted vdsmd but this does not
> >>>>>>>> change anything. Supported cpu is still as low as Conroe instead of
> >>>>>>>> Nehalem. Or is there more to do than patching libvirtvm.py?
> >>>>>>>
> >>>>>>> What is libvirt's opinion about your cpu compatibility?
> >>>>>>>
> >>>>>>>      virsh -r cpu-compare <(echo '<cpu match="minimum"><model>Nehalem</model><vendor>Intel</vendor></cpu>')
> >>>>>>>
> >>>>>>> If you do not get "Host CPU is a superset of CPU described in bla", then
> >>>>>>> the problem is within libvirt.
> >>>>>>>
> >>>>>>> Dan.
> >>>>>>
> >>>>>> Hi Dan,
> >>>>>>
> >>>>>> virsh -r cpu-compare <(echo '<cpu
> >>>>>> match="minimum"><model>Nehalem</model><vendor>Intel</vendor></cpu>')
> >>>>>> Host CPU is a superset of CPU described in /dev/fd/63
> >>>>>>
> >>>>>> So libvirt obviously is fine. Something different would have surprised
> >>>>>> me, as virsh capabilities seemed correct anyway.
> >>>>>
> >>>>> So maybe, just maybe, libvirt has changed their cpu_map, a map that
> >>>>> ovirt-3.1 had a bug reading.
> >>>>>
> >>>>> Would you care to apply http://gerrit.ovirt.org/5035 to see if this is
> >>>>> it?
> >>>>>
> >>>>> Dan.
> >>>>
> >>>> Hi Dan,
> >>>>
> >>>> success! Applying that patch made the cpu recognition work again. The
> >>>> cpu type in admin portal shows again as Nehalem. Output from getVdsCaps:
> >>>>
> >>>>    cpuCores = 4
> >>>>    cpuFlags = fpu,vme,de,pse,tsc,msr,pae,mce,cx8,apic,sep,mtrr,pge,
> >>>>               mca,cmov,pat,pse36,clflush,dts,acpi,mmx,fxsr,sse,sse2,
> >>>>               ss,ht,tm,pbe,syscall,nx,rdtscp,lm,constant_tsc,
> >>>>               arch_perfmon,pebs,bts,rep_good,xtopology,nonstop_tsc,
> >>>>               aperfmperf,pni,dtes64,monitor,ds_cpl,vmx,smx,est,tm2,
> >>>>               ssse3,cx16,xtpr,pdcm,sse4_1,sse4_2,popcnt,lahf_lm,ida,
> >>>>               dts,tpr_shadow,vnmi,flexpriority,ept,vpid,model_Nehalem,
> >>>>               model_Conroe,model_coreduo,model_core2duo,model_Penryn,
> >>>>               model_n270
> >>>>    cpuModel = Intel(R) Xeon(R) CPU           X3430  @ 2.40GHz
> >>>>    cpuSockets = 1
> >>>>    cpuSpeed = 2393.769
> >>>>
> >>>>
> >>>> I compared libvirt's cpu_map.xml on both CentOS 6.3 and CentOS 6.4 and
> >>>> indeed they do differ in large portions. So this patch should probably
> >>>> be merged to 3.1 branch? I will contact Dreyou and request that this
> >>>> patch will also be included in his builds. I guess otherwise there will
> >>>> be quite some fallout after people start picking CentOS 6.4 for oVirt 3.1.
> >>>>
> >>>> Thanks again and best regards
> >>>
> >>> Thank you for reporting this issue and verifying its fix.
> >>>
> >>> I'm not completely sure that we should keep maintaining the ovirt-3.1
> >>> branch upstream - but a build destined for el6.4 must have it.
> >>>
> >>> If you believe we should release a fix version for 3.1, please verify
> >>> that http://gerrit.ovirt.org/12723 has no ill effects.
> >>>
> >>> Dan.
> >>
> >> I did some additional tests and the new CentOS 6.4 host failed to start or
> >> migrate any vm. It always boils down to:
> >>
> >> Thread-43::ERROR::2013-03-07
> >> 15:02:51,950::task::853::TaskManager.Task::(_setError)
> >> Task=`52a9f96f-3dfd-4bcf-8d7a-db14e650b4c1`::Unexpected error
> >> Traceback (most recent call last):
> >>   File "/usr/share/vdsm/storage/task.py", line 861, in _run
> >>     return fn(*args, **kargs)
> >>   File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
> >>     res = f(*args, **kwargs)
> >>   File "/usr/share/vdsm/storage/hsm.py", line 2551, in getVolumeSize
> >>     apparentsize = str(volume.Volume.getVSize(sdUUID, imgUUID, volUUID,
> >> bs=1))
> >>   File "/usr/share/vdsm/storage/volume.py", line 283, in getVSize
> >>     return mysd.getVolumeClass().getVSize(mysd, imgUUID, volUUID, bs)
> >>   File "/usr/share/vdsm/storage/blockVolume.py", line 101, in getVSize
> >>     return int(int(lvm.getLV(sdobj.sdUUID, volUUID).size) / bs)
> >>   File "/usr/share/vdsm/storage/lvm.py", line 772, in getLV
> >>     lv = _lvminfo.getLv(vgName, lvName)
> >>   File "/usr/share/vdsm/storage/lvm.py", line 567, in getLv
> >>     lvs = self._reloadlvs(vgName)
> >>   File "/usr/share/vdsm/storage/lvm.py", line 419, in _reloadlvs
> >>     self._lvs.pop((vgName, lvName), None)
> >>   File "/usr/lib64/python2.6/contextlib.py", line 34, in __exit__
> >>     self.gen.throw(type, value, traceback)
> >>   File "/usr/share/vdsm/storage/misc.py", line 1219, in acquireContext
> >>     yield self
> >>   File "/usr/share/vdsm/storage/lvm.py", line 404, in _reloadlvs
> >>     lv = makeLV(*fields)
> >>   File "/usr/share/vdsm/storage/lvm.py", line 218, in makeLV
> >>     attrs = _attr2NamedTuple(args[LV._fields.index("attr")],
> >> LV_ATTR_BITS, "LV_ATTR")
> >>   File "/usr/share/vdsm/storage/lvm.py", line 188, in _attr2NamedTuple
> >>     attrs = Attrs(*values)
> >> TypeError: __new__() takes exactly 9 arguments (10 given)
> >>
> >> and followed by:
> >>
> >> Thread-43::ERROR::2013-03-07
> >> 15:02:51,987::dispatcher::69::Storage.Dispatcher.Protect::(run)
> >> __new__() takes exactly 9 arguments (10 given)
> >> Traceback (most recent call last):
> >>   File "/usr/share/vdsm/storage/dispatcher.py", line 61, in run
> >>     result = ctask.prepare(self.func, *args, **kwargs)
> >>   File "/usr/share/vdsm/storage/task.py", line 1164, in prepare
> >>     raise self.error
> >> TypeError: __new__() takes exactly 9 arguments (10 given)
> >> Thread-43::DEBUG::2013-03-07
> >> 15:02:51,987::vm::580::vm.Vm::(_startUnderlyingVm)
> >> vmId=`7db86f12-8c57-4d2b-a853-a6fd6f7ee82d`::_ongoingCreations released
> >> Thread-43::ERROR::2013-03-07
> >> 15:02:51,987::vm::604::vm.Vm::(_startUnderlyingVm)
> >> vmId=`7db86f12-8c57-4d2b-a853-a6fd6f7ee82d`::The vm start process failed
> >> Traceback (most recent call last):
> >>   File "/usr/share/vdsm/vm.py", line 570, in _startUnderlyingVm
> >>     self._run()
> >>   File "/usr/share/vdsm/libvirtvm.py", line 1289, in _run
> >>     devices = self.buildConfDevices()
> >>   File "/usr/share/vdsm/vm.py", line 431, in buildConfDevices
> >>     self._normalizeVdsmImg(drv)
> >>   File "/usr/share/vdsm/vm.py", line 358, in _normalizeVdsmImg
> >>     drv['truesize'] = res['truesize']
> >> KeyError: 'truesize'
> >>
> >> In webadmin the start and migrate operations fail with 'truesize'.
> >>
> >> I could find BZ#876958 which has the very same error. So I tried to
> >> apply patch http://gerrit.ovirt.org/9317. I had to apply it manually
> >> (guess patch would need a rebase for 3.1), but it works.
> > 
> > Thanks for the report. I've made a public backport for this in
> > http://gerrit.ovirt.org/12836/ and would ask you again to tick that it
> > is verified by you.
> > 
> >>
> >> I now can start new virtual machines successfully on a CentOS 6.4 /
> >> oVirt 3.1 host. Migration of VMs from CentOS 6.3 hosts works, but not the
> >> other way around. Migration from 6.4 to 6.3 fails:
> >>
> >> Thread-1296::ERROR::2013-03-07 15:55:24,845::vm::176::vm.Vm::(_recover)
> >> vmId=`c978cbf8-6b4d-4d6f-9435-480d9fed31c4`::internal error Process
> >> exited while reading console log output: Supported machines are:
> >> pc         RHEL 6.3.0 PC (alias of rhel6.3.0)
> >> rhel6.3.0  RHEL 6.3.0 PC (default)
> >> rhel6.2.0  RHEL 6.2.0 PC
> >> rhel6.1.0  RHEL 6.1.0 PC
> >> rhel6.0.0  RHEL 6.0.0 PC
> >> rhel5.5.0  RHEL 5.5.0 PC
> >> rhel5.4.4  RHEL 5.4.4 PC
> >> rhel5.4.0  RHEL 5.4.0 PC
> >>
> >> Thread-1296::ERROR::2013-03-07 15:55:24,988::vm::240::vm.Vm::(run)
> >> vmId=`c978cbf8-6b4d-4d6f-9435-480d9fed31c4`::Failed to migrate
> >> Traceback (most recent call last):
> >>   File "/usr/share/vdsm/vm.py", line 223, in run
> >>     self._startUnderlyingMigration()
> >>   File "/usr/share/vdsm/libvirtvm.py", line 451, in
> >> _startUnderlyingMigration
> >>     None, maxBandwidth)
> >>   File "/usr/share/vdsm/libvirtvm.py", line 491, in f
> >>     ret = attr(*args, **kwargs)
> >>   File "/usr/lib/python2.6/site-packages/vdsm/libvirtconnection.py",
> >> line 82, in wrapper
> >>     ret = f(*args, **kwargs)
> >>   File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1178, in
> >> migrateToURI2
> >>     if ret == -1: raise libvirtError ('virDomainMigrateToURI2() failed',
> >> dom=self)
> >> libvirtError: internal error Process exited while reading console log
> >> output: Supported machines are:
> >> pc         RHEL 6.3.0 PC (alias of rhel6.3.0)
> >> rhel6.3.0  RHEL 6.3.0 PC (default)
> >> rhel6.2.0  RHEL 6.2.0 PC
> >> rhel6.1.0  RHEL 6.1.0 PC
> >> rhel6.0.0  RHEL 6.0.0 PC
> >> rhel5.5.0  RHEL 5.5.0 PC
> >> rhel5.4.4  RHEL 5.4.4 PC
> >> rhel5.4.0  RHEL 5.4.0 PC
> >>
> >> But I guess this is fine and migration from higher host version to a
> >> lower version is probably not supported, right?
> > 
> > Well, I suppose that qemu would allow migration if you begin with a
> > *guest* of version rhel6.3.0. Please try it out.
> > 
> > Dan.
> 
> Alright, just verified it. A vm started on a 6.3 host can be
> successfully migrated to the new 6.4 host and then back to any other 6.3
> host. It just won't migrate a vm started on 6.4 to any host running 6.3.

This surprises me. Engine should have used the same emulatedMachine
value, independent of the initial host. Could you share the vdsm.log
lines mentioning "emulatedMachine" in both cases?
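
In case it helps, here is a minimal Python sketch for collecting those
lines. The function names are made up for illustration, and the default
path /var/log/vdsm/vdsm.log is an assumption; adjust it to wherever vdsm
logs on your hosts. A plain grep would do equally well.

```python
def emulated_machine_lines(lines, needle="emulatedMachine"):
    """Return (line_number, text) pairs for every line containing needle.

    `lines` is any iterable of strings, e.g. an open log file.
    """
    matches = []
    for number, line in enumerate(lines, 1):
        if needle in line:
            matches.append((number, line.rstrip("\n")))
    return matches


def print_matches(path="/var/log/vdsm/vdsm.log"):
    """Print matching lines from a log file (the path is an assumed default)."""
    with open(path) as log:
        for number, text in emulated_machine_lines(log):
            print("%d: %s" % (number, text))
```

Running print_matches() once on the 6.3 host and once on the 6.4 host,
after starting the same VM on each, should give the two sets of lines to
compare.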

Dan.


