code merges in ovirt-engine repo
by Michal Skrivanek
Hi all,
we’re moving to 4.4.z development now and we need to keep a closer eye on automation results and make sure the build is not broken. For these reasons we’re considering moving to a similar model as vdsm: a smaller set of people with merge rights, to make sure the patches get in in the right order and meet our sanity standards (OST, bug’s TM).
Any objections/comments?
Thanks,
michal
Re: device compatibility interface for live migration with assigned devices
by Sean Mooney
resending with the full cc list since i had this typed up.
i would blame my email provider, but my email client does not seem to like long cc lists.
we probably want to continue on Alex's thread to not split the discussion,
but i have responded inline with some examples of how OpenStack schedules and what i meant by different mdev_types.
On Tue, 2020-07-14 at 20:29 +0100, Sean Mooney wrote:
> On Tue, 2020-07-14 at 11:01 -0600, Alex Williamson wrote:
> > On Tue, 14 Jul 2020 13:33:24 +0100
> > Sean Mooney <smooney(a)redhat.com> wrote:
> >
> > > On Tue, 2020-07-14 at 11:21 +0100, Daniel P. Berrangé wrote:
> > > > On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote:
> > > > > hi folks,
> > > > > we are defining a device migration compatibility interface that helps upper
> > > > > layer stack like openstack/ovirt/libvirt to check if two devices are
> > > > > live migration compatible.
> > > > > The "devices" here could be MDEVs, physical devices, or hybrid of the two.
> > > > > e.g. we could use it to check whether
> > > > > - a src MDEV can migrate to a target MDEV,
> > >
> > > mdev live migration is completely possible to do, but I agree with Dan Berrangé's comments.
> > > From the point of view of OpenStack integration I don't see calling out to a vendor-specific
> > > tool as an acceptable
> >
> > As I replied to Dan, I'm hoping Yan was referring more to vendor
> > specific knowledge rather than actual tools.
> >
> > > solution for device compatibility checking. The sysfs filesystem
> > > that describes the mdevs that can be created should also
> > > contain the relevant information, such
> > > that Nova could integrate it via the libvirt XML representation, or directly retrieve the
> > > info from
> > > sysfs.
> > > > > - a src VF in SRIOV can migrate to a target VF in SRIOV,
> > >
> > > So VF-to-VF migration is not possible in the general case, as there is no standardised
> > > way to transfer the device state as part of the SR-IOV specs produced by the PCI-SIG;
> > > as such there is no vendor-neutral way to support SR-IOV live migration.
> >
> > We're not talking about a general case, we're talking about physical
> > devices which have vfio wrappers or hooks with device specific
> > knowledge in order to support the vfio migration interface. The point
> > is that a discussion around vfio device migration cannot be limited to
> > mdev devices.
>
> OK, upstream in OpenStack at least we do not plan to support generic live migration
> for passthrough devices. We cheat with network interfaces, since operating systems generally
> handle hotplug of a NIC somewhat safely, so where no abstraction layer like
> an mdev or a macvtap device is present we hot-unplug the NIC before the migration
> and attach a new one after. For GPUs or crypto cards this likely would not be viable,
> since you cannot bond generic hardware devices to hide the removal and re-addition of a generic
> PCI device. We were hoping that there would be a convergence around mdevs as a way to provide
> that abstraction going forward for generic devices, or some other new mechanism in the future.
> >
> > > > > - a src MDEV can migrate to a target VF in SRIOV.
> > >
> > > that also makes this unviable
> > > > > (e.g. SIOV/SRIOV backward compatibility case)
> > > > >
> > > > > The upper layer stack could use this interface as the last step to check
> > > > > if one device is able to migrate to another device before triggering a real
> > > > > live migration procedure.
> > >
> > > Well, actually that is already too late. Ideally we would want to do this compatibility
> > > check much sooner, to avoid the migration failing. In an OpenStack environment, at least,
> > > by the time we invoke libvirt (assuming you're using the libvirt driver) to do the migration we have already
> > > finished scheduling the instance to the new host. If we do the compatibility check at this point
> > > and it fails, then the live migration is aborted and will not be retried. These types of late checks lead to a
> > > poor user experience, as unless you check the migration detail it basically looks like the migration was ignored:
> > > it starts to migrate and then continues running on the original host.
> > >
> > > When using generic PCI passthrough with OpenStack, the PCI alias is intended to reference a single vendor
> > > id/product id, so you will have one or more aliases for each type of device. That allows OpenStack to schedule
> > > based on the availability of a compatible device, because we track inventories of PCI devices and can query
> > > that when selecting a host.
> > >
> > > If we were to support mdev live migration in the future we would want to take the same declarative approach:
> > > 1. introspect the capabilities of the devices we manage
> > > 2. create inventories of the allocatable devices and their capabilities
> > > 3. schedule the instance to a host based on the device-type/capabilities and claim it atomically to prevent races
> > > 4. have the lower-level hypervisors do additional validation if needed pre live migration
> > >
> > > This proposal seems to be targeting extending step 4, whereas ideally we should focus on providing the info that
> > > would be relevant in step 1, preferably in a vendor-neutral way via a kernel interface like /sys.
> >
> > I think this is reading a whole lot into the phrase "last step". We
> > want to make the information available for a management engine to
> > consume as needed to make informed decisions regarding likely
> > compatible target devices.
>
> Well, OpenStack as a management engine has 3 stages for scheduling and assignment.
> In response to a live migration request the API does minimal validation, then hands the task off to the conductor
> service to orchestrate. The conductor invokes an RPC to the scheduler service, which makes a REST call to the
> placement service. The placement service generates a set of allocation candidates for hosts, based on quantitative
> and qualitative queries against an abstract resource provider tree model of the hosts.
> Currently device passthrough is not modeled in placement, so placement is basically returning a set of hosts that
> have enough CPU, RAM and disk for the instance. In the special case of vGPU they technically are modelled in
> placement, but not in a way that would guarantee compatibility for migration. A generic PCI device request is
> handled in the second phase of scheduling, called filtering and weighing. In this phase the Nova scheduler applies
> a series of filters to the list of hosts returned by placement to assert things like anti-affinity, tenant
> isolation or, in the case of this conversation, NUMA affinity and PCI device availability. When we have filtered
> the possible set of hosts down to X number, we weigh the list to select an optimal host and a set of alternative
> hosts. We then enter the code that this mail suggests modifying, which does an RPC call to the destination host
> from the conductor to have it assert compatibility, which internally calls back to the source host.
>
> So my point is we have done a lot of work by the time we call check_can_live_migrate_destination, and failing
> at this point is considered quite a late failure, but it's still better than failing when QEMU actually tries to migrate.
> In general we would prefer to move the compatibility check as early in that workflow as possible, but to be fair we don't
> actually check CPU model compatibility until check_can_live_migrate_destination.
>
https://github.com/openstack/nova/blob/8988316b8c132c9662dea6cf0345975e87...
>
> If we needed to, we could read the version string on the source and write the version string on the dest at this
> point.
> Doing so, however, would be considered inelegant; we have found this does not scale as the first compatibility check.
> For CPUs, for example, there are ways to filter hosts by grouping sets of hosts with the same CPU, or by filtering on CPU feature
> flags, which happen in the placement or filter stage, both of which are very early and cheap to do at runtime.
>
> the "read for version, write for compatibility" workflow could be used as a final safe check if required but
> probing for compatibility via writes is basicaly considered an anti patteren in openstack. we try to always
> assert compatibility by reading avaiable info and asserting requirement over it not testing to see if it works.
>
> This has come up in the past in the context of virtio feature flags, where the idea of spawning an instance, or trying
> to add a virtio port to OVS-DPDK, that requested a specific feature flag was rejected as unacceptable from a performance
> and security point of view.
>
> >
> > > > > we are not sure if this interface is of value or help to you. please don't
> > > > > hesitate to drop your valuable comments.
> > > > >
> > > > >
> > > > > (1) interface definition
> > > > > The interface is defined in below way:
> > > > >
> > > > > __ userspace
> > > > > /\ \
> > > > > / \write
> > > > > / read \
> > > > > ________/__________ ___\|/_____________
> > > > > | migration_version | | migration_version |-->check migration
> > > > > --------------------- --------------------- compatibility
> > > > > device A device B
> > > > >
> > > > >
> > > > > a device attribute named migration_version is defined under each device's
> > > > > sysfs node. e.g. (/sys/bus/pci/devices/0000\:00\:02.0/$mdev_UUID/migration_version).
> > >
> > > This might be useful, as we could tag the inventory with the migration version and only migrate to
> > > devices with the same version.
> >
> > Is cross version compatibility something that you'd consider using?
>
> Yes, but it would depend on what cross-version actually meant.
>
> The version of an mdev is not something we would want to be exposed to end users.
> It would be a security risk to do so, as the version string would potentially allow an untrusted user
> to discover whether a device has an unpatched vulnerability. As a result, in the context of live migration
> we can only support cross-version compatibility if the device in the guest does not alter as
> part of the migration and the behavior does not change.
>
> Going from version 1.0 with feature X to version 1.1 with features X and Y but only X enabled would
> be fine. Going from 1.0 to 2.0 where there is only feature Y would not be OK.
> Being abstract makes it a little harder to reason about, but I guess I would summarise it as: if it's
> transparent to the guest for the lifetime of the QEMU process, then it's OK for the backing version to change.
> If a VM is rebooted it's also OK for the VM to pick up feature Y from the 1.1 device, although at that point
> it could not be migrated back to the 1.0 host, as it now has features X and Y and 1.0 only has X, so that would be
> an observable change if it was dropped as a result of the live migration.
> >
> > > > > userspace tools read the migration_version as a string from the source device,
> > > > > and write it to the migration_version sysfs attribute in the target device.
> > >
> > > This would not be useful, as the scheduler cannot directly connect to the compute host,
> > > and even if it could, it would be extremely slow to do this for 1000s of hosts and potentially
> > > multiple devices per host.
> >
> > Seems similar to Dan's requirement, looks like the 'read for version,
> > write for compatibility' test idea isn't really viable.
>
> It's inefficient, and we have rejected adding such a test in the case of virtio feature flag compatibility
> in the past, so it's more an option of last resort if we have no other way to support compatibility
> checking.
> >
> > > > >
> > > > > The userspace should treat ANY of below conditions as two devices not compatible:
> > > > > - any one of the two devices does not have a migration_version attribute
> > > > > - error when reading from migration_version attribute of one device
> > > > > - error when writing migration_version string of one device to
> > > > > migration_version attribute of the other device
> > > > >
> > > > > The string read from migration_version attribute is defined by device vendor
> > > > > driver and is completely opaque to the userspace.
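> > > > > e.g. a minimal userspace sketch of this check (paths here are only illustrative):
> > > > >
> > > > > def compatible(src_dev_path, dst_dev_path):
> > > > >     # any missing attribute, read error or write error means "not compatible"
> > > > >     try:
> > > > >         with open(src_dev_path + "/migration_version") as f:
> > > > >             version = f.read()
> > > > >         with open(dst_dev_path + "/migration_version", "w") as f:
> > > > >             f.write(version)
> > > > >     except OSError:
> > > > >         return False
> > > > >     return True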
> > >
> > > Opaque vendor-specific strings that higher-level orchestrators have to pass from host
> > > to host and can't reason about are evil. When allowed they proliferate and
> > > make any idea of a vendor-neutral abstraction and interoperability between systems
> > > impossible to reason about. That said, there is a way to make it opaque but still useful
> > > to userspace; see below.
> > > > > for an Intel vGPU, the string format can be defined like
> > > > > "parent device PCI ID" + "version of gvt driver" + "mdev type" + "aggregator count".
> > > > >
> > > > > for an NVMe VF connecting to a remote storage. it could be
> > > > > "PCI ID" + "driver version" + "configured remote storage URL"
> > > > >
> > > > > for a QAT VF, it may be
> > > > > "PCI ID" + "driver version" + "supported encryption set".
> > > > >
> > > > > (to avoid namespace conflicts between vendors, we may prefix a driver name to
> > > > > each migration_version string. e.g. i915-v1-8086-591d-i915-GVTg_V5_8-1)
> > >
> > > Honestly, I would much prefer if the version string was just a semver string,
> > > e.g. {major}.{minor}.{bugfix}.
> > >
> > > If you do a driver/firmware update and break compatibility with an older version, bump the
> > > major version.
> > >
> > > If you add an optional feature that does not break backwards compatibility if you migrate
> > > an older instance to the new host, then just bump the minor/feature number.
> > >
> > > If you have a fix for a bug that does not change the feature set or compatibility backwards or
> > > forwards, then bump the bugfix number.
> > >
> > > Then the check is as simple as:
> > > 1.) is the mdev type the same
> > > 2.) is the major version the same
> > > 3.) am I going from the same version to the same version, or from the same version to a newer version
> > >
> > > If all 3 are true we can migrate, e.g.
> > > 2.0.1 -> 2.1.1 (ok: same major version, migrating from an older feature release to a newer one)
> > > 2.1.1 -> 2.0.1 (not ok: same major version, but migrating from a newer feature release to an older one may be
> > > incompatible)
> > > 2.0.0 -> 3.0.0 (not ok: changing major version)
> > > 2.0.1 -> 2.0.0 (ok: same major and minor version, all bugfixes in the same minor release should be compatible)
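> > >
> > > A rough sketch of that comparison (a hypothetical helper; it assumes the string is exactly
> > > {major}.{minor}.{bugfix} and that the mdev type is known for both ends):
> > >
> > > def can_migrate(src_type, dst_type, src_ver, dst_ver):
> > >     # 1.) both ends must expose the same mdev type
> > >     if src_type != dst_type:
> > >         return False
> > >     src_major, src_minor, _ = (int(x) for x in src_ver.split("."))
> > >     dst_major, dst_minor, _ = (int(x) for x in dst_ver.split("."))
> > >     # 2.) same major version, 3.) never from a newer feature release to an older one
> > >     return src_major == dst_major and src_minor <= dst_minor
> > >
> > > can_migrate("vendor-type-x", "vendor-type-x", "2.0.1", "2.1.1")  # True
> > > can_migrate("vendor-type-x", "vendor-type-x", "2.1.1", "2.0.1")  # False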
> >
> > What's the value of the bugfix field in this scheme?
>
> It's not required, but really it's for a non-visible change from a feature standpoint.
> A rather contrived example: if it was quadratic to initialise a set of queues or device buffers
> in 1.0.0 and you made it linear in 1.0.1, that is a performance improvement in the device initialisation time,
> which is great, but it would not affect the feature set or compatibility in any way. You could call it
> a feature, but it's really just an internal change for which you might still want to bump the version number.
> >
> > The simplicity is good, but is it too simple? It's not immediately
> > clear to me whether all features can be hidden behind a minor version.
> > For instance, if we have an mdev device that supports this notion of
> > aggregation, which is proposed as a solution to the problem that
> > physical hardware might support lots and lots of assignable interfaces
> > which can be combined into arbitrary sets for mdev devices, making it
> > impractical to expose an mdev type for every possible enumeration of
> > assignable interfaces within a device.
>
> So this is a modeling problem, and likely a limitation of the current way an mdev_type is exposed.
> Stealing some Linux doc examples:
>
>
> |- [parent physical device]
> |--- Vendor-specific-attributes [optional]
> |--- [mdev_supported_types]
> | |--- [<type-id>]
> | | |--- create
> | | |--- name
> | | |--- available_instances
> | | |--- device_api
> | | |--- description
>
> You could address this in one of at least 3 ways:
> 1.) an mdev type for each enumeration, which is fine for 1-2 variables; otherwise it's a combinatorial explosion.
> 2.) report each of the consumable sub-components as an mdev type, create multiple mdevs and assign them to the VM.
> 3.) provide an API to dynamically compose mdev types which statically partition the resources and can then be consumed,
> preferably embedding the resource information in the description field in a human/machine readable form.
>
> 2 and 3 would work well with OpenStack; however, they both have their challenges.
> 1 doesn't really work for anyone outside of a demo.
> > We therefore expose a base type
> > where the aggregation is built later. This essentially puts us in a
> > scenario where even within an mdev type running on the same driver,
> > there are devices that are not directly compatible with each other.
> >
> > > We don't need the vendor to re-encode the driver name or vendor id and product id in the string. That info is already
> > > available both to the device driver and to userspace via /sys; we just need to know if versions of
> > > the same mdev are compatible, so a simple semver version string, which is well known in the software world
> > > at least, is a clean abstraction we can reuse.
> >
> > This presumes there's no cross device migration.
>
> No, but it does assume no cross-mdev_type migration.
> It assumes that nvidia_mdev_type_x on host 1 is the same as nvidia_mdev_type_x on host 2.
> If the parent devices differ but support the same mdev type, we are asserting that they
> should be compatible, or a different mdev_type name should be used on each device.
>
> So we are presuming the mdev type can't change as part of a live migration, and if the type
> were to change it would no longer be a live migration operation; it would be something else.
> That is based on the premise that changing the mdev type would change the capabilities of the mdev.
>
> > An mdev type can only
> > be migrated to the same mdev type, all of the devices within that type
> > have some base compatibility; a physical device can only be migrated to
> > the same physical device. In the latter case what defines the type?
>
> The type-id in sysfs.
>
> /sys/devices/virtual/mtty/mtty/
> |-- mdev_supported_types
> | |-- mtty-1 <---- this is an mdev type
> | | |-- available_instances
> | | |-- create
> | | |-- device_api
> | | |-- devices
> | | `-- name
> | `-- mtty-2 <---- as is this
> | |-- available_instances
> | |-- create
> | |-- device_api
> | |-- devices
> | `-- name
>
> |- [parent phy device]
> |--- [$MDEV_UUID]
> |--- remove
> |--- mdev_type {link to its type} <-- here
> |--- vendor-specific-attributes [optional]
>
> > If
> > it's a PCI device, is it only vendor:device IDs?
>
> No, the mdev type is not defined by the vendor:device id of the parent device,
> although the capabilities of that device will determine what mdev types, if any, it supports.
> > What about revision?
> > What about subsystem IDs?
>
> At least for NVIDIA GPUs, I don't think the capability would change if you buy an EVGA-branded V100 vs a PNY-branded
> one, but I do know that certainly the capabilities of a Dell-branded Intel NIC and an Intel-branded
> one can differ, e.g. I have seen OEM SKU NICs without SR-IOV even though the same NIC from Intel supports it.
> SR-IOV was deliberately disabled in the Dell firmware even though it shared the same vendor and product id but a different
> subsystem id.
>
> If the ODM made an incompatible change like that which affects an mdev type in some way, I guess I would expect them to
> change the name or the description field content to signal that.
>
> > What about possibly an onboard ROM or
> > internal firmware?
>
> I would expect that updating the firmware/ROM could result in changing the version string; that is how I was imagining
> it would change.
> > The information may be available, but which things
> > are relevant to migration?
>
> That I don't know, and I really would not like to encode that knowledge in a vendor-specific way in higher-level
> tools like OpenStack or even libvirt. Declarative version string comparisons, or even simple feature flag
> checks where an abstract heuristic can be applied across vendors, would be fine. But yes, I don't know
> what info would be needed in this case.
> > We already see desires to allow migration
> > between physical and mdev,
>
> Migration between a physical device and an mdev would not generally be considered a live migration in OpenStack.
> That would be a different operation, as it would be user-visible within the guest VM.
> > but also to expose mdev types that might be
> > composable to be compatible with other types. Thanks,
>
> I think composable mdev types are really challenging without some kind of feature flag concept,
> like CPU flags or ethtool NIC capabilities, that are both human-readable and easily parsable.
>
> We have the capability to schedule on CPU flags or GPU CUDA level using a traits abstraction,
> so instead of saying "I want a VM on a host with an Intel 2695v3 to ensure it has AVX"
> you say "I want a VM that is capable of using AVX":
> https://github.com/openstack/os-traits/blob/master/os_traits/hw/cpu/x86/_...
>
> We also have traits for CUDA level, so instead of asking for a specific mdev type or NVIDIA
> GPU, the idea was you would describe what feature (CUDA in this example) you need:
> https://github.com/openstack/os-traits/blob/master/os_traits/hw/gpu/cuda....
>
> That is what we call qualitative scheduling, and it is why we created the placement service.
> Without going into the weeds, we try to decouple the quantitative request, such as 4 CPUs and 1G of RAM,
> from the qualitative "I need AVX support",
>
> e.g. resources:VCPU=4,resources:MEMORY_MB=1024 traits:required=HW_CPU_X86_AVX
>
> Declarative quantitative and capability reporting of resources fits easily into that model.
> Dynamic quantities that change as other mdevs are allocated from the parent device, or as
> new mdev types are composed on the fly, are very challenging.
>
> >
> > Alex
> >
>
>
Import failures for networking vdsm modules in basic and he suite
by Marcin Sobczyk
Hi,
I'm observing some issues with network imports failing. For the basic suite,
supervdsmd fails to start with:
Jul 13 17:24:11 lago-basic-suite-master-host-0 python3[29380]: detected
unhandled Python exception in '/usr/share/vdsm/supervdsmd'
Jul 13 17:24:11 lago-basic-suite-master-host-0 abrt-server[29382]: Not
saving repeating crash in '/usr/share/vdsm/supervdsmd'
Jul 13 17:24:11 lago-basic-suite-master-host-0 daemonAdapter[29380]:
Traceback (most recent call last):
Jul 13 17:24:11 lago-basic-suite-master-host-0 daemonAdapter[29380]:
File "/usr/share/vdsm/supervdsmd", line 24, in <module>
Jul 13 17:24:11 lago-basic-suite-master-host-0 daemonAdapter[29380]:
from vdsm import supervdsm_server
Jul 13 17:24:11 lago-basic-suite-master-host-0 daemonAdapter[29380]:
File "/usr/lib/python3.6/site-packages/vdsm/supervdsm_server.py", line
66, in <module>
Jul 13 17:24:11 lago-basic-suite-master-host-0 daemonAdapter[29380]:
from vdsm.network.initializer import init_privileged_network_components
Jul 13 17:24:11 lago-basic-suite-master-host-0 daemonAdapter[29380]:
File "/usr/lib/python3.6/site-packages/vdsm/network/initializer.py",
line 32, in <module>
Jul 13 17:24:11 lago-basic-suite-master-host-0 daemonAdapter[29380]:
from vdsm.network import nmstate
Jul 13 17:24:11 lago-basic-suite-master-host-0 daemonAdapter[29380]:
ImportError: cannot import name 'nmstate'
While deploying the HE basic suite I can see something like this (this one's a bit
weird because it's extracted from ansible logs, so it might be trimmed):
File \"/usr/libexec/vdsm/vm_libvirt_hook.py\", line 29, in <module>
from vdsm.virt.vmdevices import storage
File
\"/usr/$ib/python3.6/site-packages/vdsm/virt/vmdevices/__init__.py\",
line 27, in <module>
from . import graphics
File
\"/usr/lib/python3.6/site-packages/vdsm/virt/vmdevices$graphics.py\",
line 27, in <module>
from vdsm.virt import displaynetwork
File
\"/usr/lib/python3.6/site-packages/vdsm/virt/displaynetwork.py\", line
23, in <module>
from vdsm.network import api as net_api
File \"/usr/lib/python3.6/site-packages/vdsm/network/api.py\", line
34, in <module>
from vdsm.network import netswitch
File
\"/usr/lib/python3.6/site-packages/vdsm/network/netswitch/__init__.py\",
line 23, in <module>
from . import configurator
File \"/usr/lib/python3.6/site-packages$vdsm/
Ales, could you take a look?
Regards, Marcin
Re: [ovirt-users] Using ovirt imageio
by Nir Soffer
On Tue, Jul 7, 2020 at 5:05 PM Łukasz Kołaciński <l.kolacinski(a)storware.eu>
wrote:
> Dear ovirt community,
>
Hi Łukasz,
Adding devel(a)ovirt.org since this topic is more appropriate for the devel
list.
> I am trying to use ovirt imageio api to receive changed blocks (dirty
> bitmap) on ovirt 4.4. Could anyone tell me how to get them step by step? On
> the documentation I saw endpoint "GET /images/ticket-uuid/map". I don't
> know what ticket-uuid is and how to generate it. I also need to know how to
> use this api because I can't reach it via /ovirt-engine/api/
>
> I am asking about this endpoint:
>
> https://www.ovirt.org/documentation/incremental-backup-guide/incremental-...
>
This guide is outdated and should not be used now.
The most up to date information is here:
https://www.ovirt.org/develop/release-management/features/storage/increme...
However the extents API is also outdated in the feature page. We are
working on updating it.
So here is an example.
First you must start a backup with the from_checkpoint_id argument:
backup = backups_service.add(
    types.Backup(
        disks=disks,
        from_checkpoint_id="checkpoint-id",
    )
)
>
"checkpoint-id" is the checkpoint created in the last backup.
This starts a backup in incremental mode. Dirty extents are available only
in this mode.
Then you start a transfer for download, using the backup id:
transfer = imagetransfer.create_transfer(
    connection,
    disk,
    types.ImageTransferDirection.DOWNLOAD,
    backup=types.Backup(id=backup_uuid))
The transfer.transfer_url is the URL to download from, for example:
https://host:54322/images/53787351-3f72-44a1-8a26-1323524fac4a
Connect to host:54322 and send this request:
GET /images/53787351-3f72-44a1-8a26-1323524fac4a/extents?context=dirty
And parse the returned JSON list, containing objects like:
[
{"start": 0, "length": 65536, "dirty": true},
{"start": 65536, "length": 1048576, "dirty": false},
...
]
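A minimal sketch of fetching and parsing that list with the standard library
(the ticket id is the one from the URL above; "host" and the CA file path are
placeholders):

import json
import ssl
from http import client

ctx = ssl.create_default_context(cafile="ca.pem")
con = client.HTTPSConnection("host", 54322, context=ctx)
con.request(
    "GET",
    "/images/53787351-3f72-44a1-8a26-1323524fac4a/extents?context=dirty")
res = con.getresponse()
extents = json.loads(res.read())

for extent in extents:
    if extent["dirty"]:
        print("dirty start={} length={}".format(
            extent["start"], extent["length"]))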
For example code of using the imageio API, see imageio http backend:
https://github.com/oVirt/ovirt-imageio/blob/d5aa0e1fe659f1bf1247516f83c71...
https://github.com/oVirt/ovirt-imageio/blob/d5aa0e1fe659f1bf1247516f83c71...
We are adding an ImageioClient API that makes it easier to consume without
writing any HTTP code:
https://gerrit.ovirt.org/c/110068
With this you can use:
with ImageioClient(transfer.transfer_url, cafile=args.cafile) as client:
    for extent in client.extent("dirty"):
        if extent.dirty:
            print("##dirty start={} length={}".format(extent.start, extent.length))
            client.write_to(sys.stdout.buffer, extent.start, extent.length)
            print()
This will stream the dirty extents to stdout. Not very useful as is, but
illustrates how
you can consume the data.
Here is an example writing extents to a sparse stream format:
https://gerrit.ovirt.org/c/110069
For complete backup example code see:
https://github.com/oVirt/ovirt-engine-sdk/blob/master/sdk/examples/backup...
Note the new imagetransfer helper module:
https://github.com/oVirt/ovirt-engine-sdk/blob/master/sdk/examples/helper...
Nir
VM export: task stuck
by francesco@shellrent.com
Hi all,
every day at 01:00 AM we perform a daily backup on many VMs hosted on multiple hosts (all with oVirt 4.3) using a custom script written in python3 (using the SDK) and everything works "almost" fine.
There is one single VM (Windows Server 2016) with a disk of 600 GB (the real disk usage is about 150 GB), hosted on a single node, that has a strange behaviour.
1) The export starts a few seconds after the execution of the script; we use the "vm_service.export_to_path_on_host" call for exporting the VM as an OVA file (a rough sketch of the call is near the end of this mail);
2) After a few minutes I see in the engine the command "START, DumpXmlsVDSCommand", which completes instantly both on the host side and the engine side, and it's fine:
2020-07-02 01:05:45,428+0200 INFO (jsonrpc/7) [api.host] START dumpxmls(vmList=[u'10e88cab-ec4f-4491-b51f-94e3d2e81a0a']) from=::ffff:$ENGINE_IP,39308 (api:48)
2020-07-02 01:05:45,428+0200 INFO (jsonrpc/7) [api.host] FINISH dumpxmls return={... LONG XML ...}
3) after 3 hours i see the following logs about the export task:
2020-07-02 04:11:39,201+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-91) [895d6936-f45c-4766-afd4-408a4e4e9a41] START, GetVolumeInfoVDSCommand(HostName = $OVIRT_HOST, GetVolumeInfoVDSCommandParameters:{hostId='e8c07142-fcd8-4f78-9158-ffb2caa06dc5', storagePoolId='79d774b7-ca5b-49c2-baf8-9275ba3f1a84', storageDomainId='6775c41c-7d67-451b-8beb-4fd086eade2e', imageGroupId='a084fa36-0f93-45c2-a323-ea9ca2d16677', imageId='55b3eac5-05b2-4bae-be50-37cde7050697'}), log id: 3cbf2c7c
2020-07-02 04:11:39,299+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetQemuImageInfoVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-91) [895d6936-f45c-4766-afd4-408a4e4e9a41] START, GetQemuImageInfoVDSCommand(HostName = $OVIRT_HOST, GetVolumeInfoVDSCommandParameters:{hostId='e8c07142-fcd8-4f78-9158-ffb2caa06dc5', storagePoolId='79d774b7-ca5b-49c2-baf8-9275ba3f1a84', storageDomainId='6775c41c-7d67-451b-8beb-4fd086eade2e', imageGroupId='a084fa36-0f93-45c2-a323-ea9ca2d16677', imageId='55b3eac5-05b2-4bae-be50-37cde7050697'}), log id: 47d5122e
2020-07-02 04:43:21,339+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.PrepareImageVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-19) [312261bb] START, PrepareImageVDSCommand(HostName = $OVIRT_HOST, PrepareImageVDSCommandParameters:{hostId='e8c07142-fcd8-4f78-9158-ffb2caa06dc5'}), log id: 28c880d12020-07-02 04:43:21,650+02 INFO [org.ovirt.engine.core.common.utils.ansible.AnsibleExecutor] (EE-ManagedThreadFactory-engineScheduled-Thread-19) [312261bb] Executing Ansible command: ANSIBLE_STDOUT_CALLBACK=imagemeasureplugin /usr/bin/ansible-playbook --ssh-common-args=-F /var/lib/ovirt-engine/.ssh/config --private-key=/etc/pki/ovirt-engine/keys/engine_id_rsa --inventory=/tmp/ansible-inventory1121551177697847734 --extra-vars=image_path="/rhev/data-center/mnt/_data/6775c41c-7d67-451b-8beb-4fd086eade2e/images/a084fa36-0f93-45c2-a323-ea9ca2d16677/5b76cc6c-1a6b-4e02-8ce4-dc80faf9ed04" /usr/share/ovirt-engine/playbooks/ovirt-image-measure.yml [Logfile: /var/log/ovirt-engine/ova
/ovirt-image-measure-ansible-20200702044321-$OVIRT_HOST-312261bb.log]
2020-07-02 05:05:20,428+02 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.UploadStreamVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-8) [64c95c4d] START, UploadStreamVDSCommand(HostName = $OVIRT_HOST, UploadStreamVDSCommandParameters:{hostId='e8c07142-fcd8-4f78-9158-ffb2caa06dc5'}), log id: 666809cb
2020-07-02 05:05:22,104+02 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.UploadStreamVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-8) [64c95c4d] START, UploadStreamVDSCommand(HostName = $OVIRT_HOST, UploadStreamVDSCommandParameters:{hostId='e8c07142-fcd8-4f78-9158-ffb2caa06dc5'}), log id: 33ff46bf
2020-07-02 05:05:29,602+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand] (EE-ManagedThreadFactory-engine-Thread-287) [64c95c4d] START, HSMClearTaskVDSCommand(HostName = $OVIRT_HOST, HSMTaskGuidBaseVDSCommandParameters:{hostId='e8c07142-fcd8-4f78-9158-ffb2caa06dc5', taskId='e40a2740-37f6-455d-bed7-554efef761ff'}), log id: f01ff26
2020-07-02 05:05:29,619+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand] (EE-ManagedThreadFactory-engine-Thread-269) [64c95c4d] START, HSMClearTaskVDSCommand(HostName = $OVIRT_HOST, HSMTaskGuidBaseVDSCommandParameters:{hostId='e8c07142-fcd8-4f78-9158-ffb2caa06dc5', taskId='1c94df38-88c1-4bf8-89a2-8c322513b21b'}), log id: 6150e13b
4) And then, after several hours, the export fails with the following logs:
engine logs.
2020-07-02 12:43:21,653+02 ERROR [org.ovirt.engine.core.bll.CreateOvaCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-19) [312261bb] Failed to measure image: null. Please check logs for more details: /var/log/ovirt-engine/ova/ovirt-image-measure-ansible-20200702044321-$OVIRT_HOST-312261bb.log
2020-07-02 12:43:23,741+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-28) [] EVENT_ID: IMPORTEXPORT_EXPORT_VM_TO_OVA_FAILED(1,225), Failed to export Vm $OVIRT_VM_TO_EXPORT as a Virtual Appliance to path /backup/$OVIRT_VM_TO_EXPORT_daily_2020Jul02.ova on Host $OVIRT_HOST
2020-07-02 12:43:26,162+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-9) [54b569e8] START, MergeVDSCommand(HostName = $OVIRT_HOST, MergeVDSCommandParameters:{hostId='e8c07142-fcd8-4f78-9158-ffb2caa06dc5', vmId='10e88cab-ec4f-4491-b51f-94e3d2e81a0a', storagePoolId='79d774b7-ca5b-49c2-baf8-9275ba3f1a84', storageDomainId='6775c41c-7d67-451b-8beb-4fd086eade2e', imageGroupId='a084fa36-0f93-45c2-a323-ea9ca2d16677', imageId='55b3eac5-05b2-4bae-be50-37cde7050697', baseImageId='5b76cc6c-1a6b-4e02-8ce4-dc80faf9ed04', topImageId='55b3eac5-05b2-4bae-be50-37cde7050697', bandwidth='0'}), log id: a0b417d
2020-07-02 12:43:26,328+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-commandCoordinator-Thread-9) [54b569e8] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM $OVIRT_HOST command MergeVDS failed: Merge failed
2020-07-02 12:43:26,328+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-9) [54b569e8] HostName = $OVIRT_HOST
2020-07-02 12:43:26,328+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-9) [54b569e8] Command 'MergeVDSCommand(HostName = $OVIRT_HOST, MergeVDSCommandParameters:{hostId='e8c07142-fcd8-4f78-9158-ffb2caa06dc5', vmId='10e88cab-ec4f-4491-b51f-94e3d2e81a0a', storagePoolId='79d774b7-ca5b-49c2-baf8-9275ba3f1a84', storageDomainId='6775c41c-7d67-451b-8beb-4fd086eade2e', imageGroupId='a084fa36-0f93-45c2-a323-ea9ca2d16677', imageId='55b3eac5-05b2-4bae-be50-37cde7050697', baseImageId='5b76cc6c-1a6b-4e02-8ce4-dc80faf9ed04', topImageId='55b3eac5-05b2-4bae-be50-37cde7050697', bandwidth='0'})' execution failed: VDSGenericException: VDSErrorException: Failed to MergeVDS, error = Merge failed, code = 52
2020-07-02 12:43:28,996+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DumpXmlsVDSCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-2) [54b569e8] START, DumpXmlsVDSCommand(HostName = $OVIRT_HOST, Params:{hostId='e8c07142-fcd8-4f78-9158-ffb2caa06dc5', vmIds='[10e88cab-ec4f-4491-b51f-94e3d2e81a0a]'}), log id: 65709a96
2020-07-02 13:05:38,206+02 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.UploadStreamVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-45) [5179fcb] START, UploadStreamVDSCommand(HostName = $OVIRT_HOST, UploadStreamVDSCommandParameters:{hostId='e8c07142-fcd8-4f78-9158-ffb2caa06dc5'}), log id: 44e9ec23
2020-07-02 13:05:39,851+02 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.UploadStreamVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-45) [5179fcb] START, UploadStreamVDSCommand(HostName = $OVIRT_HOST, UploadStreamVDSCommandParameters:{hostId='e8c07142-fcd8-4f78-9158-ffb2caa06dc5'}), log id: 68231c70
2020-07-02 13:05:40,356+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand] (EE-ManagedThreadFactory-engine-Thread-240) [5179fcb] START, HSMClearTaskVDSCommand(HostName = $OVIRT_HOST, HSMTaskGuidBaseVDSCommandParameters:{hostId='e8c07142-fcd8-4f78-9158-ffb2caa06dc5', taskId='5d61682b-cf47-4637-9014-c647eb5265ee'}), log id: 718885f3
2020-07-02 13:05:50,357+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand] (EE-ManagedThreadFactory-engine-Thread-290) [5179fcb] START, HSMClearTaskVDSCommand(HostName = $OVIRT_HOST, HSMTaskGuidBaseVDSCommandParameters:{hostId='e8c07142-fcd8-4f78-9158-ffb2caa06dc5', taskId='5709eb6c-a656-4a4d-b67f-72dd0bebf3e4'}), log id: 40c18d57
And here the vdsm.log of the host:
2020-07-02 12:43:26,169+0200 INFO (jsonrpc/6) [api.virt] START merge(drive={u'imageID': u'a084fa36-0f93-45c2-a323-ea9ca2d16677', u'volumeID': u'55b3eac5-05b2-4bae-be50-37cde7050697', u'domainID': u'6775c41c-7d67-451b-8beb-4fd086eade2e', u'poolID': u'79d774b7-ca5b-49c2-baf8-9275ba3f1a84'}, baseVolUUID=u'5b76cc6c-1a6b-4e02-8ce4-dc80faf9ed04', topVolUUID=u'55b3eac5-05b2-4bae-be50-37cde7050697', bandwidth=u'0', jobUUID=u'b49fa67b-4a5a-4f73-823f-9e33a8e20cd2') from=::ffff:$ENGINE_IP,39308, flow_id=54b569e8, vmId=10e88cab-ec4f-4491-b51f-94e3d2e81a0a (api:48)
2020-07-02 12:43:26,196+0200 INFO (jsonrpc/6) [vdsm.api] START getVolumeInfo(sdUUID=u'6775c41c-7d67-451b-8beb-4fd086eade2e', spUUID='79d774b7-ca5b-49c2-baf8-9275ba3f1a84', imgUUID=u'a084fa36-0f93-45c2-a323-ea9ca2d16677', volUUID=u'5b76cc6c-1a6b-4e02-8ce4-dc80faf9ed04', options=None) from=::ffff:$ENGINE_IP,39308, flow_id=54b569e8, task_id=bb1d1fd0-9746-4949-9c6c-b069947f5875 (api:48)
2020-07-02 12:43:26,199+0200 INFO (jsonrpc/6) [storage.VolumeManifest] Info request: sdUUID=6775c41c-7d67-451b-8beb-4fd086eade2e imgUUID=a084fa36-0f93-45c2-a323-ea9ca2d16677 volUUID = 5b76cc6c-1a6b-4e02-8ce4-dc80faf9ed04 (volume:240)
2020-07-02 12:43:26,205+0200 INFO (jsonrpc/6) [storage.VolumeManifest] 6775c41c-7d67-451b-8beb-4fd086eade2e/a084fa36-0f93-45c2-a323-ea9ca2d16677/5b76cc6c-1a6b-4e02-8ce4-dc80faf9ed04 info is {'status': 'OK', 'domain': '6775c41c-7d67-451b-8beb-4fd086eade2e', 'voltype': 'INTERNAL', 'description': '', 'parent': '7e802d0d-d6cd-4979-8f8d-5f4c5d5c3013', 'format': 'COW', 'generation': 0, 'image': 'a084fa36-0f93-45c2-a323-ea9ca2d16677', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '6635716608', 'children': [], 'pool': '', 'ctime': '1593558283', 'capacity': '644245094400', 'uuid': u'5b76cc6c-1a6b-4e02-8ce4-dc80faf9ed04', 'truesize': '6637277184', 'type': 'SPARSE'} (volume:279)
2020-07-02 12:43:26,205+0200 INFO (jsonrpc/6) [vdsm.api] FINISH getVolumeInfo return={'info': {'status': 'OK', 'domain': '6775c41c-7d67-451b-8beb-4fd086eade2e', 'voltype': 'INTERNAL', 'description': '', 'parent': '7e802d0d-d6cd-4979-8f8d-5f4c5d5c3013', 'format': 'COW', 'generation': 0, 'image': 'a084fa36-0f93-45c2-a323-ea9ca2d16677', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '6635716608','children': [], 'pool': '', 'ctime': '1593558283', 'capacity': '644245094400', 'uuid': u'5b76cc6c-1a6b-4e02-8ce4-dc80faf9ed04', 'truesize': '6637277184', 'type': 'SPARSE'}} from=::ffff:$ENGINE_IP,39308, flow_id=54b569e8, task_id=bb1d1fd0-9746-4949-9c6c-b069947f5875 (api:54)
2020-07-02 12:43:26,206+0200 INFO (jsonrpc/6) [vdsm.api] START getVolumeInfo(sdUUID=u'6775c41c-7d67-451b-8beb-4fd086eade2e', spUUID='79d774b7-ca5b-49c2-baf8-9275ba3f1a84', imgUUID=u'a084fa36-0f93-45c2-a323-ea9ca2d16677', volUUID=u'55b3eac5-05b2-4bae-be50-37cde7050697', options=None) from=::ffff:$ENGINE_IP,39308, flow_id=54b569e8, task_id=32ce9b21-b9f3-4319-a229-b32df1fb9283 (api:48)
2020-07-02 12:43:26,208+0200 INFO (jsonrpc/6) [storage.VolumeManifest] Info request: sdUUID=6775c41c-7d67-451b-8beb-4fd086eade2e imgUUID=a084fa36-0f93-45c2-a323-ea9ca2d16677 volUUID = 55b3eac5-05b2-4bae-be50-37cde7050697 (volume:240)
2020-07-02 12:43:26,219+0200 INFO (jsonrpc/6) [storage.VolumeManifest] 6775c41c-7d67-451b-8beb-4fd086eade2e/a084fa36-0f93-45c2-a323-ea9ca2d16677/55b3eac5-05b2-4bae-be50-37cde7050697 info is {'status': 'OK', 'domain': '6775c41c-7d67-451b-8beb-4fd086eade2e', 'voltype': 'LEAF', 'description': '', 'parent': '5b76cc6c-1a6b-4e02-8ce4-dc80faf9ed04', 'format': 'COW', 'generation': 0, 'image': 'a084fa36-0f93-45c2-a323-ea9ca2d16677', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '4077387776', 'children': [], 'pool': '', 'ctime': '1593644682', 'capacity': '644245094400', 'uuid': u'55b3eac5-05b2-4bae-be50-37cde7050697', 'truesize': '4077940736', 'type': 'SPARSE'} (volume:279)
2020-07-02 12:43:26,219+0200 INFO (jsonrpc/6) [vdsm.api] FINISH getVolumeInfo return={'info': {'status': 'OK', 'domain': '6775c41c-7d67-451b-8beb-4fd086eade2e', 'voltype': 'LEAF', 'description': '', 'parent': '5b76cc6c-1a6b-4e02-8ce4-dc80faf9ed04', 'format': 'COW', 'generation': 0, 'image': 'a084fa36-0f93-45c2-a323-ea9ca2d16677', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '4077387776', 'children': [], 'pool': '', 'ctime': '1593644682', 'capacity': '644245094400', 'uuid': u'55b3eac5-05b2-4bae-be50-37cde7050697', 'truesize': '4077940736', 'type': 'SPARSE'}} from=::ffff:$ENGINE_IP,39308, flow_id=54b569e8, task_id=32ce9b21-b9f3-4319-a229-b32df1fb9283 (api:54)
2020-07-02 12:43:26,235+0200 INFO (jsonrpc/6) [virt.vm] (vmId='10e88cab-ec4f-4491-b51f-94e3d2e81a0a') Starting merge with jobUUID=u'b49fa67b-4a5a-4f73-823f-9e33a8e20cd2', original chain=e2bad960-6ad4-44b6-8ff0-69dcca6bf722 < 7e802d0d-d6cd-4979-8f8d-5f4c5d5c3013 < 5b76cc6c-1a6b-4e02-8ce4-dc80faf9ed04 < 55b3eac5-05b2-4bae-be50-37cde7050697 (top), disk='sda', base='sda[1]', top=None, bandwidth=0, flags=12 (vm:5951)
2020-07-02 12:43:26,280+0200 ERROR (jsonrpc/6) [virt.vm] (vmId='10e88cab-ec4f-4491-b51f-94e3d2e81a0a') Live merge failed (job: b49fa67b-4a5a-4f73-823f-9e33a8e20cd2) (vm:5957)
2020-07-02 12:43:26,296+0200 INFO (jsonrpc/6) [api.virt] FINISH merge return={'status': {'message': 'Merge failed', 'code': 52}} from=::ffff:$ENGINE_IP,39308, flow_id=54b569e8, vmId=10e88cab-ec4f-4491-b51f-94e3d2e81a0a (api:54)
2020-07-02 12:43:26,296+0200 INFO (jsonrpc/6) [jsonrpc.JsonRpcServer] RPC call VM.merge failed (error 52) in 0.13 seconds (__init__:312)
2020-07-02 12:43:26,627+0200 INFO (jsonrpc/1) [vdsm.api] START getSpmStatus(spUUID=u'79d774b7-ca5b-49c2-baf8-9275ba3f1a84', options=None) from=::ffff:$ENGINE_IP,39308, flow_id=4968c8cb, task_id=ee82b2d6-39f5-43ad-83a3-803f1af8e234 (api:48)
2020-07-02 12:43:26,628+0200 INFO (jsonrpc/1) [vdsm.api] FINISH getSpmStatus return={'spm_st': {'spmId': 1, 'spmStatus': 'SPM', 'spmLver': -1}} from=::ffff:$ENGINE_IP,39308, flow_id=4968c8cb, task_id=ee82b2d6-39f5-43ad-83a3-803f1af8e234 (api:54)
2020-07-02 12:43:26,628+0200 INFO (jsonrpc/1) [jsonrpc.JsonRpcServer] RPC call StoragePool.getSpmStatus succeeded in 0.00 seconds (__init__:312)
2020-07-02 12:43:26,645+0200 INFO (jsonrpc/3) [vdsm.api] START getStoragePoolInfo(spUUID=u'79d774b7-ca5b-49c2-baf8-9275ba3f1a84', options=None) from=::ffff:$ENGINE_IP,39304, flow_id=4968c8cb, task_id=ab1124bc-651d-4ade-aae4-ff9fea23f6a0 (api:48)
2020-07-02 12:43:26,646+0200 INFO (jsonrpc/3) [vdsm.api] FINISH getStoragePoolInfo return={'info': {'name': 'No Description', 'isoprefix': '', 'pool_status': 'connected', 'lver': -1, 'domains': u'6775c41c-7d67-451b-8beb-4fd086eade2e:Active', 'master_uuid': u'6775c41c-7d67-451b-8beb-4fd086eade2e', 'version': '5', 'spm_id': 1, 'type': 'LOCALFS', 'master_ver': 1}, 'dominfo': {u'6775c41c-7d67-451b-8beb-4fd086eade2e': {'status': u'Active', 'diskfree': '3563234983936', 'isoprefix': '', 'alerts': [], 'disktotal': '3923683311616', 'version': 5}}} from=::ffff:$ENGINE_IP,39304, flow_id=4968c8cb, task_id=ab1124bc-651d-4ade-aae4-ff9fea23f6a0 (api:54)
2020-07-02 12:43:26,647+0200 INFO (jsonrpc/3) [jsonrpc.JsonRpcServer] RPC call StoragePool.getInfo succeeded in 0.01 seconds (__init__:312)
5) This task seems to block any other export tasks, leaving them pending. There were about 5 export tasks and 1 template import task, and as soon as we killed the following process, the other 4 tasks and the import one completed after a few minutes.
2020-07-02 04:43:21,339+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.PrepareImageVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-19) [312261bb] START, PrepareImageVDSCommand(HostName = $OVIRT_HOST, PrepareImageVDSCommandParameters:{hostId='e8c07142-fcd8-4f78-9158-ffb2caa06dc5'}), log id: 28c880d12020-07-02 04:43:21,650+02 INFO [org.ovirt.engine.core.common.utils.ansible.AnsibleExecutor] (EE-ManagedThreadFactory-engineScheduled-Thread-19) [312261bb] Executing Ansible command: ANSIBLE_STDOUT_CALLBACK=imagemeasureplugin /usr/bin/ansible-playbook --ssh-common-args=-F /var/lib/ovirt-engine/.ssh/config --private-key=/etc/pki/ovirt-engine/keys/engine_id_rsa --inventory=/tmp/ansible-inventory1121551177697847734 --extra-vars=image_path="/rhev/data-center/mnt/_data/6775c41c-7d67-451b-8beb-4fd086eade2e/images/a084fa36-0f93-45c2-a323-ea9ca2d16677/5b76cc6c-1a6b-4e02-8ce4-dc80faf9ed04" /usr/share/ovirt-engine/playbooks/ovirt-image-measure.yml [Logfile: /var/log/ovirt-engine/ova
/ovirt-image-measure-ansible-20200702044321-$OVIRT_HOST-312261bb.log]
The process that we killed was:
vdsm 25338 25332 99 04:14 pts/0 07:40:09 qemu-img measure -O qcow2 /rhev/data-center/mnt/_data/6775c41c-7d67-451b-8beb-4fd086eade2e/images/a084fa36-0f93-45c2-a323-ea9ca2d16677/55b3eac5-05b2-4bae-be50-37cde705069
A strace -p of PID 25338 showed the following lines:
lseek(11, 3056795648, SEEK_DATA) = 3056795648
lseek(11, 3056795648, SEEK_HOLE) = 13407092736
lseek(14, 128637468672, SEEK_DATA) = 128637468672
lseek(14, 128637468672, SEEK_HOLE) = 317708828672
lseek(14, 128646250496, SEEK_DATA) = 128646250496
lseek(14, 128646250496, SEEK_HOLE) = 317708828672
lseek(14, 128637730816, SEEK_DATA) = 128637730816
lseek(14, 128637730816, SEEK_HOLE) = 317708828672
lseek(14, 128646774784, SEEK_DATA) = 128646774784
lseek(14, 128646774784, SEEK_HOLE) = 317708828672
lseek(14, 128646709248, SEEK_DATA) = 128646709248
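For reference, the export call in our script looks roughly like this (a sketch, not the exact code; the connection details are placeholders, the VM and host ids are the ones from the logs above):

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

connection = sdk.Connection(
    url="https://engine.example.com/ovirt-engine/api",
    username="admin@internal",
    password="...",
    ca_file="ca.pem",
)
vm_service = connection.system_service().vms_service().vm_service(
    "10e88cab-ec4f-4491-b51f-94e3d2e81a0a")
vm_service.export_to_path_on_host(
    host=types.Host(id="e8c07142-fcd8-4f78-9158-ffb2caa06dc5"),
    directory="/backup",
    filename="vm_daily_2020Jul02.ova",
)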
Can you help us figure out what is going on?
If any other information is needed I'll provide it ASAP.
Thank you for your patience and your time,
Regards
Francesco
oVirt vdsm Bulk-API deep dive
by Nir Soffer
I want to share an interesting talk on the new StorageDomain dump API.
When:
Thursday, July 9⋅16:00 – 16:45 (Israel time)
Where:
https://redhat.bluejeans.com/212781836
Who:
Amit Bawer
StorageDomain dump API is a faster way to query oVirt storage domain
contents, optimized
for bulk operations such as collecting sos reports or importing
storage domains during disaster
recovery.
In recent tests we found that it is 295 times faster compared with
vdsm-tool dump-volume-chains:
https://bugzilla.redhat.com/show_bug.cgi?id=1557147#c26
What can you do with this API?
Query volumes metadata by disk id:
# vdsm-client StorageDomain dump
sd_id=56ecc03c-4bb5-4792-8971-3c51ea924d2e | jq '.volumes | .[] |
select(.image=="d7ead22a-0fbf-475c-a62f-6f7bc8473acb")'
{
"apparentsize": 6442450944,
"capacity": 6442450944,
"ctime": 1593128884,
"description":
"{\"DiskAlias\":\"fedora-32.raw\",\"DiskDescription\":\"Uploaded
disk\"}",
"disktype": "DATA",
"format": "RAW",
"generation": 0,
"image": "d7ead22a-0fbf-475c-a62f-6f7bc8473acb",
"legality": "LEGAL",
"parent": "00000000-0000-0000-0000-000000000000",
"status": "OK",
"truesize": 6442455040,
"type": "PREALLOCATED",
"voltype": "LEAF"
}
Find all templates:
# vdsm-client StorageDomain dump
sd_id=56ecc03c-4bb5-4792-8971-3c51ea924d2e | jq '.volumes | .[] |
select(.voltype=="SHARED")'
{
"apparentsize": 6442450944,
"capacity": 6442450944,
"ctime": 1593116563,
"description":
"{\"DiskAlias\":\"fedora-32.raw\",\"DiskDescription\":\"Uploaded
disk\"}",
"disktype": "DATA",
"format": "RAW",
"generation": 0,
"image": "9b62b5fa-920e-4d0c-baf6-40406106e48e",
"legality": "LEGAL",
"parent": "00000000-0000-0000-0000-000000000000",
"status": "OK",
"truesize": 6442516480,
"type": "PREALLOCATED",
"voltype": "SHARED"
}
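Or load the dump in Python, for example to list volumes that are not in an OK
state (a sketch, assuming the dump's "volumes" member maps volume ids to their
metadata):

import json
import subprocess

out = subprocess.check_output([
    "vdsm-client", "StorageDomain", "dump",
    "sd_id=56ecc03c-4bb5-4792-8971-3c51ea924d2e",
])
volumes = json.loads(out)["volumes"]

# assumed structure: {volume-id: {metadata...}, ...}
for vol_id, vol in volumes.items():
    if vol["status"] != "OK":
        print(vol_id, vol["image"], vol["legality"])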
Please join the talk if you want to learn more.
Cheers,
Nir
Re: [oVirt Jenkins] ovirt-system-tests_he-basic-suite-master - Build # 1655 - Still Failing!
by Yedidyah Bar David
On Sun, Jun 28, 2020 at 6:23 AM <jenkins(a)jenkins.phx.ovirt.org> wrote:
>
> Project: https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/
> Build: https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/1655/
This has been failing for some time now. I checked the last one ^^, and it failed in:
https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/16...
2020-06-27 23:05:02,253-0400 INFO ansible task start {'status': 'OK',
'ansible_type': 'task', 'ansible_playbook':
'/usr/share/ovirt-hosted-engine-setup/ansible/trigger_role.yml',
'ansible_task': 'ovirt.hosted_engine_setup : Check OVF_STORE volume
status'}
2020-06-27 23:05:02,253-0400 DEBUG ansible on_any args TASK:
ovirt.hosted_engine_setup : Check OVF_STORE volume status kwargs
is_conditional:False
2020-06-27 23:05:02,254-0400 DEBUG ansible on_any args localhostTASK:
ovirt.hosted_engine_setup : Check OVF_STORE volume status kwargs
2020-06-27 23:05:03,816-0400 DEBUG ansible on_any args
<ansible.executor.task_result.TaskResult object at 0x7f454e3d1eb8>
kwargs
...
2020-06-27 23:09:39,166-0400 DEBUG var changed: host "localhost" var
"ovf_store_status" type "<class 'dict'>" value: "{
"changed": true,
"failed": true,
"msg": "All items completed",
"results": [
{
"ansible_loop_var": "item",
"attempts": 12,
Meaning, it ran the command below 12 times:
vdsm-client Volume getInfo
storagepoolID=41c9fdea-b8e9-11ea-ae2a-5452c0a8c863
storagedomainID=10a69775-8fb6-437d-9e78-2ecfd77c0a45
imageID=c2ad2065-1c8b-4ec1-afdd-f7cefc708cf9
volumeID=6b835f55-a512-4f83-9d25-f6837d8b5cb1
and never got a result with output including "Updated". Any idea?
Perhaps it's just a bug in the test, and we should test something else
other than "Updated"?
Or perhaps there was some other problem but I failed to find it?
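For reference, roughly what the task retries, as a sketch (the sleep interval
is a guess; the ids are the ones above):

import subprocess
import time

for attempt in range(12):
    out = subprocess.check_output([
        "vdsm-client", "Volume", "getInfo",
        "storagepoolID=41c9fdea-b8e9-11ea-ae2a-5452c0a8c863",
        "storagedomainID=10a69775-8fb6-437d-9e78-2ecfd77c0a45",
        "imageID=c2ad2065-1c8b-4ec1-afdd-f7cefc708cf9",
        "volumeID=6b835f55-a512-4f83-9d25-f6837d8b5cb1",
    ]).decode()
    if "Updated" in out:
        break
    time.sleep(10)  # interval is illustrative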
Thanks,
> Build Number: 1655
> Build Status: Still Failing
> Triggered By: Started by timer
>
> -------------------------------------
> Changes Since Last Success:
> -------------------------------------
> Changes for Build #1643
> [Galit] Add missing repo which has collectd-write_syslog
>
> [Sandro Bonazzola] Revert "ovirt-release: run only on fc29 nodes"
>
>
> Changes for Build #1644
> [Michal Skrivanek] make GLANCE failures fatal
>
>
> Changes for Build #1645
> [Michal Skrivanek] use // in glance URL
>
>
> Changes for Build #1646
> [Galit] Fix: No module named ost_utils.memoized
>
>
> Changes for Build #1647
> [Martin Necas] ansible: fix deploy scripts to el8
>
> [arachmani] Add arachman(a)redhat.com to jenkins recipient lists
>
>
> Changes for Build #1648
> [Martin Necas] ansible: fix deploy scripts to el8
>
> [Evgheni Dereveanchin] Do not run jenkins check-patch job on fc29
>
>
> Changes for Build #1649
> [Martin Necas] ansible: fix deploy scripts to el8
>
>
> Changes for Build #1650
> [Martin Necas] ansible: fix deploy scripts to el8
>
>
> Changes for Build #1651
> [Galit] Add centos8.2 official to templates
>
>
> Changes for Build #1652
> [Evgeny Slutsky] he-basic-suite-master: Add Domain_name to the host when vm_run
>
> [Gal Ben Haim] Add gcc to fcraw
>
> [Sandro Bonazzola] pipelines: add ovirt-dependencies
>
>
> Changes for Build #1653
> [Evgeny Slutsky] he-basic-suite-master: Add Domain_name to the host when vm_run
>
>
> Changes for Build #1654
> [Galit] Add centos 8.2 image: el8.2-base
>
> [Evgheni Dereveanchin] do not expand values in run_oc_playbook
>
>
> Changes for Build #1655
> [Galit] Add centos 8.2 image: el8.2-base
>
>
>
>
> -----------------
> Failed Tests:
> -----------------
> No tests ran.
--
Didi
sic transit gloria mundi
by Dan Kenigsberg
Seven years ago, in June 2013, Giuseppe Vallarelli introduced
tests/functional/networkTests.py in
https://gerrit.ovirt.org/#/c/14840/. It held high aspirations: to
provide a comprehensive functional test for Vdsm's network API, in
order to ease development of new code and reduce chances of
reintroducing old bugs.
390 commits later, Andrej Cernek removed the last lines of code from
that file in https://gerrit.ovirt.org/#/c/109182/.
In between, many many other developers contributed to the set of
tests, and later dissipated its content to multiple pytest-driven test
modules. tests/functional/networkTests.py was never an emblem of great
software design; it makes me happy to see it gone. But it was very
useful and its spirit lives on: API tests are now the rule of Vdsm
development, not an exception.
I'd like to thank all the people who made this possible.
$ git log 1c5f15c9b~ -- tests/functional/networkTests.py|grep
Author|sort|uniq -c|sort -nr
72 Author: Edward Haas <edwardh(a)redhat.com>
58 Author: Ido Barkan <ibarkan(a)redhat.com>
46 Author: Ondřej Svoboda <osvoboda(a)redhat.com>
40 Author: Petr Horáček <phoracek(a)redhat.com>
35 Author: Dan Kenigsberg <danken(a)redhat.com>
34 Author: Antoni S. Puimedon <asegurap(a)redhat.com>
24 Author: Bell Levin <blevin(a)redhat.com>
23 Author: Leon Goldberg <lgoldber(a)redhat.com>
18 Author: Giuseppe Vallarelli <gvallare(a)redhat.com>
8 Author: Assaf Muller <amuller(a)redhat.com>
5 Author: Sveta Haas <sveta.haas(a)gmail.com>
4 Author: Mark Wu <wudxw(a)linux.vnet.ibm.com>
3 Author: Nir Soffer <nsoffer(a)redhat.com>
2 Author: Yaniv Bronhaim <ybronhei(a)redhat.com>
2 Author: Sagi Shnaidman <sshnaidm(a)redhat.com>
2 Author: Petr Sebek <psebek(a)redhat.com>
2 Author: Miguel Angel Ajo <miguelangel(a)ajo.es>
2 Author: Andrej Cernek <acernek(a)redhat.com>
1 Author: Viliam Serecun <v.serecun(a)gmail.com>
1 Author: Vered Volansky <vvolansk(a)redhat.com>
1 Author: Timothy Asir Jeyasingh <tjeyasin(a)redhat.com>
1 Author: shlad <a.shlado(a)gmail.com>
1 Author: pkliczewski <piotr.kliczewski(a)gmail.com>
1 Author: Petr Benas <pbenas(a)redhat.com>
1 Author: Marcin Sobczyk <msobczyk(a)redhat.com>
1 Author: Marcin Mirecki <mmirecki(a)redhat.com>
1 Author: Ilia Meerovich <iliam(a)redhat.com>
1 Author: Genadi Chereshnya <gcheresh(a)redhat.com>
1 Author: Dominik Holler <dholler(a)redhat.com>