On Thu, Mar 07, 2013 at 03:59:27PM +0100, Patrick Hurrelmann wrote:
> On 05.03.2013 13:49, Dan Kenigsberg wrote:
>> On Tue, Mar 05, 2013 at 12:32:31PM +0100, Patrick Hurrelmann wrote:
>>> On 05.03.2013 11:14, Dan Kenigsberg wrote:
>>> <snip>
>>>>>>>
>>>>>>> My version of vdsm as stated by Dreyou:
>>>>>>> v 4.10.0-0.46 (.15), builded from
>>>>>>> b59c8430b2a511bcea3bc1a954eee4ca1c0f4861 (branch ovirt-3.1)
>>>>>>>
>>>>>>> I can't see that Ia241b09c96fa16441ba9421f61a2f9a417f0d978 was
>>>>>>> merged to the 3.1 branch?
>>>>>>>
>>>>>>> I applied that patch locally and restarted vdsmd, but this does
>>>>>>> not change anything. The supported CPU is still as low as Conroe
>>>>>>> instead of Nehalem. Or is there more to do than patching
>>>>>>> libvirtvm.py?
>>>>>>
>>>>>> What is libvirt's opinion about your cpu compatibility?
>>>>>>
>>>>>> virsh -r cpu-compare <(echo '<cpu match="minimum"><model>Nehalem</model><vendor>Intel</vendor></cpu>')
>>>>>>
>>>>>> If you do not get "Host CPU is a superset of CPU described in
>>>>>> bla", then the problem is within libvirt.
>>>>>>
>>>>>> Dan.
>>>>>
>>>>> Hi Dan,
>>>>>
>>>>> virsh -r cpu-compare <(echo '<cpu match="minimum"><model>Nehalem</model><vendor>Intel</vendor></cpu>')
>>>>> Host CPU is a superset of CPU described in /dev/fd/63
>>>>>
>>>>> So libvirt obviously is fine. Anything different would have
>>>>> surprised me, as virsh capabilities seemed correct anyway.
>>>>
>>>> So maybe, just maybe, libvirt has changed their cpu_map, a map that
>>>> ovirt-3.1 had a bug reading.
>>>>
>>>> Would you care to apply http://gerrit.ovirt.org/5035 to see if
>>>> this is it?
>>>>
>>>> Dan.
>>>
>>> Hi Dan,
>>>
>>> Success! Applying that patch made the CPU recognition work again.
>>> The CPU type in the admin portal shows as Nehalem again. Output from
>>> getVdsCaps:
>>>
>>> cpuCores = 4
>>> cpuFlags = fpu,vme,de,pse,tsc,msr,pae,mce,cx8,apic,sep,mtrr,pge,
>>> mca,cmov,pat,pse36,clflush,dts,acpi,mmx,fxsr,sse,sse2,
>>> ss,ht,tm,pbe,syscall,nx,rdtscp,lm,constant_tsc,
>>> arch_perfmon,pebs,bts,rep_good,xtopology,nonstop_tsc,
>>> aperfmperf,pni,dtes64,monitor,ds_cpl,vmx,smx,est,tm2,
>>> ssse3,cx16,xtpr,pdcm,sse4_1,sse4_2,popcnt,lahf_lm,ida,
>>> dts,tpr_shadow,vnmi,flexpriority,ept,vpid,model_Nehalem,
>>> model_Conroe,model_coreduo,model_core2duo,model_Penryn,
>>> model_n270
>>> cpuModel = Intel(R) Xeon(R) CPU X3430 @ 2.40GHz
>>> cpuSockets = 1
>>> cpuSpeed = 2393.769
>>>
>>>
>>> I compared libvirt's cpu_map.xml on both CentOS 6.3 and CentOS 6.4,
>>> and indeed they differ substantially. So this patch should probably
>>> be merged to the 3.1 branch? I will contact Dreyou and request that
>>> this patch also be included in his builds. I guess otherwise there
>>> will be quite some fallout once people start picking CentOS 6.4 for
>>> oVirt 3.1.
>>>
>>> Thanks again and best regards
>>
>> Thank you for reporting this issue and verifying its fix.
>>
>> I'm not completely sure that we should keep maintaining the ovirt-3.1
>> branch upstream - but a build destined for el6.4 must have it.
>>
>> If you believe we should release a fix version for 3.1, please verify
>> that http://gerrit.ovirt.org/12723 has no ill effects.
>>
>> Dan.
>
> I did some additional tests, and the new CentOS 6.4 host failed to
> start or migrate any VM. It always boils down to:
>
> Thread-43::ERROR::2013-03-07
> 15:02:51,950::task::853::TaskManager.Task::(_setError)
> Task=`52a9f96f-3dfd-4bcf-8d7a-db14e650b4c1`::Unexpected error
> Traceback (most recent call last):
> File "/usr/share/vdsm/storage/task.py", line 861, in _run
> return fn(*args, **kargs)
> File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
> res = f(*args, **kwargs)
> File "/usr/share/vdsm/storage/hsm.py", line 2551, in getVolumeSize
> apparentsize = str(volume.Volume.getVSize(sdUUID, imgUUID, volUUID,
> bs=1))
> File "/usr/share/vdsm/storage/volume.py", line 283, in getVSize
> return mysd.getVolumeClass().getVSize(mysd, imgUUID, volUUID, bs)
> File "/usr/share/vdsm/storage/blockVolume.py", line 101, in getVSize
> return int(int(lvm.getLV(sdobj.sdUUID, volUUID).size) / bs)
> File "/usr/share/vdsm/storage/lvm.py", line 772, in getLV
> lv = _lvminfo.getLv(vgName, lvName)
> File "/usr/share/vdsm/storage/lvm.py", line 567, in getLv
> lvs = self._reloadlvs(vgName)
> File "/usr/share/vdsm/storage/lvm.py", line 419, in _reloadlvs
> self._lvs.pop((vgName, lvName), None)
> File "/usr/lib64/python2.6/contextlib.py", line 34, in __exit__
> self.gen.throw(type, value, traceback)
> File "/usr/share/vdsm/storage/misc.py", line 1219, in acquireContext
> yield self
> File "/usr/share/vdsm/storage/lvm.py", line 404, in _reloadlvs
> lv = makeLV(*fields)
> File "/usr/share/vdsm/storage/lvm.py", line 218, in makeLV
> attrs = _attr2NamedTuple(args[LV._fields.index("attr")],
> LV_ATTR_BITS, "LV_ATTR")
> File "/usr/share/vdsm/storage/lvm.py", line 188, in _attr2NamedTuple
> attrs = Attrs(*values)
> TypeError: __new__() takes exactly 9 arguments (10 given)
>
> and followed by:
>
> Thread-43::ERROR::2013-03-07
> 15:02:51,987::dispatcher::69::Storage.Dispatcher.Protect::(run)
> __new__() takes exactly 9 arguments (10 given)
> Traceback (most recent call last):
> File "/usr/share/vdsm/storage/dispatcher.py", line 61, in run
> result = ctask.prepare(self.func, *args, **kwargs)
> File "/usr/share/vdsm/storage/task.py", line 1164, in prepare
> raise self.error
> TypeError: __new__() takes exactly 9 arguments (10 given)
> Thread-43::DEBUG::2013-03-07
> 15:02:51,987::vm::580::vm.Vm::(_startUnderlyingVm)
> vmId=`7db86f12-8c57-4d2b-a853-a6fd6f7ee82d`::_ongoingCreations released
> Thread-43::ERROR::2013-03-07
> 15:02:51,987::vm::604::vm.Vm::(_startUnderlyingVm)
> vmId=`7db86f12-8c57-4d2b-a853-a6fd6f7ee82d`::The vm start process failed
> Traceback (most recent call last):
> File "/usr/share/vdsm/vm.py", line 570, in _startUnderlyingVm
> self._run()
> File "/usr/share/vdsm/libvirtvm.py", line 1289, in _run
> devices = self.buildConfDevices()
> File "/usr/share/vdsm/vm.py", line 431, in buildConfDevices
> self._normalizeVdsmImg(drv)
> File "/usr/share/vdsm/vm.py", line 358, in _normalizeVdsmImg
> drv['truesize'] = res['truesize']
> KeyError: 'truesize'
>
> In webadmin the start and migrate operations fail with 'truesize'.
>
> I found BZ#876958, which has the very same error. So I tried to apply
> patch http://gerrit.ovirt.org/9317. I had to apply it manually (I
> guess the patch would need a rebase for 3.1), but it works.
Thanks for the report. I've made a public backport for this in
http://gerrit.ovirt.org/12836/ and would again ask you to mark it as
verified once you have tested it.
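
For readers puzzling over the TypeError above: judging from the
traceback, _attr2NamedTuple unpacks each character of lvm2's lv_attr
string into a fixed-width namedtuple, and the newer lvm2 on el6.4
reports one more attribute character than the 3.1 code expects. A
minimal sketch of that failure mode (the field names and attr strings
below are illustrative, not vdsm's actual LV_ATTR_BITS):

```python
from collections import namedtuple

# 8 fields, one per attr character the old code expects
# (names are made up for illustration).
Attrs = namedtuple("LV_ATTR", ["voltype", "permission", "allocation",
                               "minor", "state", "devopen", "target",
                               "zero"])

old_attr = "-wi-ao--"   # 8 characters, as older lvm2 reports
new_attr = "-wi-ao---"  # 9 characters: newer lvm2 appends an extra bit

print(Attrs(*old_attr).state)  # unpacks fine

try:
    Attrs(*new_attr)  # one value too many for the 8-field tuple
except TypeError as exc:
    print(exc)  # e.g. "__new__() takes exactly 9 arguments (10 given)"
```

The patch referenced above resolves this on the vdsm side; the sketch
only shows why the error message counts one argument more than the
tuple accepts (the extra one being the implicit class argument).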
>
> I can now start new virtual machines successfully on a CentOS 6.4 /
> oVirt 3.1 host. Migration of VMs from CentOS 6.3 hosts works, but not
> the other way around. Migration from 6.4 to 6.3 fails:
>
> Thread-1296::ERROR::2013-03-07 15:55:24,845::vm::176::vm.Vm::(_recover)
> vmId=`c978cbf8-6b4d-4d6f-9435-480d9fed31c4`::internal error Process
> exited while reading console log output: Supported machines are:
> pc RHEL 6.3.0 PC (alias of rhel6.3.0)
> rhel6.3.0 RHEL 6.3.0 PC (default)
> rhel6.2.0 RHEL 6.2.0 PC
> rhel6.1.0 RHEL 6.1.0 PC
> rhel6.0.0 RHEL 6.0.0 PC
> rhel5.5.0 RHEL 5.5.0 PC
> rhel5.4.4 RHEL 5.4.4 PC
> rhel5.4.0 RHEL 5.4.0 PC
>
> Thread-1296::ERROR::2013-03-07 15:55:24,988::vm::240::vm.Vm::(run)
> vmId=`c978cbf8-6b4d-4d6f-9435-480d9fed31c4`::Failed to migrate
> Traceback (most recent call last):
> File "/usr/share/vdsm/vm.py", line 223, in run
> self._startUnderlyingMigration()
> File "/usr/share/vdsm/libvirtvm.py", line 451, in
> _startUnderlyingMigration
> None, maxBandwidth)
> File "/usr/share/vdsm/libvirtvm.py", line 491, in f
> ret = attr(*args, **kwargs)
> File "/usr/lib/python2.6/site-packages/vdsm/libvirtconnection.py",
> line 82, in wrapper
> ret = f(*args, **kwargs)
> File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1178, in
> migrateToURI2
> if ret == -1: raise libvirtError ('virDomainMigrateToURI2() failed',
> dom=self)
> libvirtError: internal error Process exited while reading console log
> output: Supported machines are:
> pc RHEL 6.3.0 PC (alias of rhel6.3.0)
> rhel6.3.0 RHEL 6.3.0 PC (default)
> rhel6.2.0 RHEL 6.2.0 PC
> rhel6.1.0 RHEL 6.1.0 PC
> rhel6.0.0 RHEL 6.0.0 PC
> rhel5.5.0 RHEL 5.5.0 PC
> rhel5.4.4 RHEL 5.4.4 PC
> rhel5.4.0 RHEL 5.4.0 PC
>
> But I guess this is fine; migration from a higher host version to a
> lower one is probably not supported, right?
Well, I suppose that qemu would allow migration if you begin with a
*guest* of version rhel6.3.0. Please try it out.
Dan.
Alright, just verified it. A VM started on a 6.3 host can be
successfully migrated to the new 6.4 host and then back to any other
6.3 host. But a VM started on the 6.4 host will not migrate to any
host running 6.3.
Regards
Patrick
--
Lobster LOGsuite GmbH, Münchner Straße 15a, D-82319 Starnberg
HRB 178831, Amtsgericht München
Geschäftsführer: Dr. Martin Fischer, Rolf Henrich