April 2020 - Devel - oVirt List Archives

PoC - using pre-built VM images in OST
by Marcin Sobczyk 29 Apr '20


Hi,

recently I've been working on a PoC for OST that replaces the usage of lago templates with pre-built, layered VM images packed in RPMs [2][7].

What's the motivation?

There are two big pains around OST: first, it's slow, and second, it uses lago, which is unmaintained.

How does OST work currently?

Lago launches VMs based on templates. It has its own mechanism for VM templating - you can find the templates that we currently use here [1].

How are these templates created?

There is a multi-page doc somewhere that describes the process, but few are familiar with it. These templates are nothing special really - just an xz-compressed qcow with some metadata attached.

The proposition here is to replace those templates with RPMs with qcows inside. The RPMs themselves would be built by a CI pipeline. An example of such a pipeline can be found here [2].

Why RPMs?

It ticks all the boxes really. RPMs provide:

- tried and well-known mechanisms for packaging, versioning, and distribution instead of lago's custom ones
- dependencies, which let us layer the VM images in a controllable way
- easy adoption: we already install RPMs when running OST, so using the new ones is a matter of adding some dependencies

How does the image-building pipeline work? [3]

- we download a DVD ISO for installation of the distro
- we use 'virt-install' with the DVD ISO + a kickstart file to build a 'base' layer qcow image
- we create another qcow image that has the 'base' image as its backing image. In this image we use 'virt-customize' to run 'dnf upgrade'. This is our 'upgrade' layer.
- we create two more qcow images that have the 'upgrade' image as their backing image. On one of them we install the 'ovirt-host' package and on the other the 'ovirt-engine' package. These are our 'host-installed' and 'engine-installed' layers.
- we create 4 RPMs for these qcows:
  * ost-images-base
  * ost-images-upgrade
  * ost-images-host-installed
  * ost-images-engine-installed
- we publish the RPMs to the templates.ovirt.org/yum/ DNF repository (not implemented yet)

Each of those RPMs holds its respective qcow image. They also have proper dependencies set up - since the 'upgrade' layer requires the 'base' layer to be functional, its RPM requires that package. The same goes for the '*-installed' packages, which depend on the 'upgrade' package.

Since this is only a PoC, there's still a lot of room for improvement around the pipeline. The 'base' RPM would be built very rarely, since it's a bare distro, and the 'upgrade' and '*-installed' RPMs would be built nightly. This would allow us to simply type 'dnf upgrade' on any machine and have a fresh set of VMs ready to be used with OST.

Advantages:

- we have CI for building OST images instead of the current, obscure template-creating process
- we get rid of lots of unnecessary preparations that are done during each OST run by moving stuff from 'deploy scripts' [4] to the image-building pipeline - this should speed up the runs a lot
- if the nightly pipeline for building images is not successful, the RPMs won't be published - OST will use the older ones. This makes a nice "early error detection" mechanism and can partially mitigate situations where everything is blocked because of e.g. dependency issues.
- it's another step towards removing responsibilities from lago
- the pre-built VM images can be used for much more than OST - functional testing of vdsm/engine on a VM? We have an image for that
- we can build images for multiple distros, both u/s and d/s, easily

Caveats:

- we have to download the RPMs before running OST and that takes time, since they're big. This can be handled by having them cached on the CI slaves though.
- current limitations of CI and lago force us to make a copy of the images after installation so they can be seen both by the processes in the chroot and by libvirt, which is running outside of the chroot. Right now they're placed in '/dev/shm' (which would actually make some sense if they could be shared among all OST runs on the slave, but that's another story). There are some possible workarounds for that problem too (like running pipelines on bare metal machines with libvirt running inside the chroot)
- multiple qcow layers can slow down the runs because there's a lot of jumping around. This can be handled e.g. by introducing a meta package that squashes all the layers into one.
- we need a way to run OST with custom-built artifacts. There are multiple ways we can approach it:
  * use the 'upgrade' layer and not the '*-installed' one
  * first build your artifacts, then build VM image RPMs that have your artifacts installed and pass those RPMs to the OST run
  * add 'ci build vms' that will do both of the steps above for you

  Even here we can still benefit from using pre-built images - we can create a 'deps-installed' layer that sits between 'upgrade' and '*-installed' and contains all of vdsm's/engine's dependencies.

Some numbers

So let's take a look at two OST runs - one that uses the old way of working [5] and one that uses the new pre-built VM images [6]. The hacky change that allows us to use the pre-built images is here [7]. Here are some running times:

- chroot init: 00:34 for the old way vs 14:03 for pre-built images
  This happens because the slave didn't have the new RPMs and chroot cached, so it took a lot of time to download them - the RPMs are ~2GB currently. Once they are available in the cache, it will get much closer to the old-way timing.
- deployment times:
  * engine: 08:09 for the old way vs 03:31 for pre-built images
  * host-1: 05:05 for the old way vs 02:00 for pre-built images

Here we can clearly see the benefits.
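To put the deployment numbers quoted above into proportion, here is a small sketch that converts the mm:ss values into seconds and computes the speedup. Only the four deployment timings and the two chroot init timings come from the mail; the function and variable names are made up for illustration.

```python
def to_seconds(mmss):
    """Convert an 'MM:SS' string into a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def speedup(old, new):
    """Return how many times faster 'new' is compared to 'old'."""
    return to_seconds(old) / to_seconds(new)


# (old way, pre-built images), as quoted in the mail
DEPLOYMENT = {
    "engine": ("08:09", "03:31"),
    "host-1": ("05:05", "02:00"),
}
CHROOT_INIT = ("00:34", "14:03")

if __name__ == "__main__":
    for name, (old, new) in DEPLOYMENT.items():
        saved = to_seconds(old) - to_seconds(new)
        print(f"{name}: {speedup(old, new):.2f}x faster, {saved}s saved")
    penalty = to_seconds(CHROOT_INIT[1]) - to_seconds(CHROOT_INIT[0])
    print(f"chroot init without cache: {penalty}s slower")
```

With these inputs the deployment speedup comes out at roughly 2.3x for the engine and 2.5x for host-1, while the uncached chroot init costs about 13.5 minutes extra.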
This is without any special fine-tuning really - even when using pre-built images there's still some deployment done, which can be moved to the image-creating pipeline.

Further improvements

We could probably get rid of all the funny custom repository stuff that we're doing right now, because we can put everything that's necessary into the pre-built VM images.

We can ship the images with an ssh key injected - currently lago injects an ssh key for the root user in each run, which requires selinux relabeling, which takes a lot of time.

We can try creating 'ovirt-deployed' images, where the whole ovirt solution would already be deployed for some tests.

WDYT?

Regards, Marcin

[1] https://templates.ovirt.org/repo/
[2] https://gerrit.ovirt.org/#/c/108430/
[3] https://gerrit.ovirt.org/#/c/108430/6/ost-images/Makefile.am
[4] https://github.com/oVirt/ovirt-system-tests/tree/master/common/deploy-scrip…
[5] https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-test…
[6] https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/9027/…
[7] https://gerrit.ovirt.org/#/c/108610/
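The layering described in the PoC mail (a 'base' image, an 'upgrade' layer on top of it, and two '*-installed' layers on top of 'upgrade') can be sketched as a small plan generator. This is not the Makefile from [3]; it is a minimal illustration that only prints the commands it would run. The image file names, the `LAYERS` table, and the use of `virt-customize --install` for the package installation are assumptions; the mail only says that 'virt-customize' runs 'dnf upgrade' and that the packages get installed.

```python
import shlex

# Hypothetical layer table: layer name -> (parent layer, virt-customize args).
# The 'base' layer is built separately with virt-install and is not listed.
LAYERS = {
    "upgrade": ("base", ["--run-command", "dnf upgrade -y"]),
    "host-installed": ("upgrade", ["--install", "ovirt-host"]),
    "engine-installed": ("upgrade", ["--install", "ovirt-engine"]),
}


def image_path(layer):
    """Hypothetical file name for a layer's qcow2 image."""
    return f"ost-images-{layer}.qcow2"


def build_plan(layers=LAYERS):
    """Return argv lists that create every layer after its parent exists."""
    plan = []
    done = {"base"}
    pending = dict(layers)
    while pending:
        ready = [n for n, (parent, _) in pending.items() if parent in done]
        if not ready:
            raise ValueError(f"unknown or cyclic parent for: {sorted(pending)}")
        for name in ready:
            parent, customize = pending.pop(name)
            # New qcow2 image that uses the parent image as its backing file.
            plan.append(["qemu-img", "create", "-f", "qcow2", "-F", "qcow2",
                         "-b", image_path(parent), image_path(name)])
            # Changes land in the new layer only; the parent stays untouched.
            plan.append(["virt-customize", "-a", image_path(name)] + customize)
            done.add(name)
    return plan


if __name__ == "__main__":
    for argv in build_plan():
        print(" ".join(shlex.quote(arg) for arg in argv))
```

The same parent relation is what the RPM dependencies encode: a layer's package requires its parent's package, so installing 'ost-images-engine-installed' would pull in 'upgrade' and 'base' as well.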


git pull throws error fatal: the remote end hung up unexpectedly
by Ritesh Chikatwar 29 Apr '20


Hello,

git pull fails with:

  fatal: the remote end hung up unexpectedly

How can I solve this problem?

1 1

ovirt-system-tests_he-basic-suite-4.3 fails on storage domain unreachable
by Sandro Bonazzola 29 Apr '20


Debugging https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-4.3/427

In engine.log, at 2020-04-28 23:48:18,378-04, I see:

SetVdsStatusVDSCommandParameters:{ hostId='b34db269-5351-4653-9a0c-90a9154cd687', status='NonOperational', nonOperationalReason='STORAGE_DOMAIN_UNREACHABLE', stopSpmFailureLogged='false', maintenanceReason='null'}

So, when the test tries to put host1 in local maintenance at 2020-04-28 23:59:51, it fails with:

Validation of action 'MaintenanceNumberOfVdss' failed for user admin@internal-authz. Reasons: VAR__TYPE__HOST,VAR__ACTION__MAINTENANCE,VDS_CANNOT_MAINTENANCE_NO_ALTERNATE_HOST_FOR_HOSTED_ENGINE

vdsm on host0 shows a traceback:

2020-04-28 23:43:04,944-0400 ERROR (jsonrpc/0) [vds] setKsmTune API call failed. (API:1660)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 1657, in setKsmTune
    supervdsm.getProxy().ksmTune(tuningParams)
  File "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 56, in __call__
    return callMethod()
  File "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 54, in <lambda>
    **kwargs)
  File "<string>", line 2, in ksmTune
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 773, in _callmethod
    raise convert_to_error(kind, result)
IOError: [Errno 22] Invalid argument

which seems unrelated but may be worth investigating by the storage team.
+Tal Nisan <tnisan(a)redhat.com> can you look into this?

Closer to the failure, on host0 I see:

2020-04-28 23:49:58,775-0400 ERROR (vm/b6ca2e94) [virt.vm] (vmId='b6ca2e94-df8b-48e9-b0ee-2bc0f939786a') The vm start process failed (vm:934)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 868, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2895, in _run
    dom.createWithFlags(flags)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 94, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1110, in createWithFlags
    if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
libvirtError: internal error: qemu unexpectedly closed the monitor: 2020-04-29T03:49:55.484660Z qemu-kvm: warning: All CPU(s) up to maxcpus should be described in NUMA config, ability to start up with partial NUMA mappings is obsoleted and will be removed in future
2020-04-29T03:49:55.582536Z qemu-kvm: -device virtio-blk-pci,iothread=iothread1,scsi=off,bus=pci.0,addr=0x7,drive=drive-ua-cfb2266f-5d47-4418-b30f-9c1d3fbf512c,id=ua-cfb2266f-5d47-4418-b30f-9c1d3fbf512c,bootindex=1,write-cache=on: Failed to get shared "write" lock
Is another process using the image [/var/run/vdsm/storage/fc1a55d5-deb4-4423-be56-e7313645798b/cfb2266f-5d47-4418-b30f-9c1d3fbf512c/68d04a61-9f34-4a1b-8d6e-bca43a7b9339]?
2020-04-28 23:49:58,775-0400 INFO (vm/b6ca2e94) [virt.vm] (vmId='b6ca2e94-df8b-48e9-b0ee-2bc0f939786a') Changed state to Down: internal error: qemu unexpectedly closed the monitor: 2020-04-29T03:49:55.484660Z qemu-kvm: warning: All CPU(s) up to maxcpus should be described in NUMA config, ability to start up with partial NUMA mappings is obsoleted and will be removed in future
2020-04-29T03:49:55.582536Z qemu-kvm: -device virtio-blk-pci,iothread=iothread1,scsi=off,bus=pci.0,addr=0x7,drive=drive-ua-cfb2266f-5d47-4418-b30f-9c1d3fbf512c,id=ua-cfb2266f-5d47-4418-b30f-9c1d3fbf512c,bootindex=1,write-cache=on: Failed to get shared "write" lock
Is another process using the image [/var/run/vdsm/storage/fc1a55d5-deb4-4423-be56-e7313645798b/cfb2266f-5d47-4418-b30f-9c1d3fbf512c/68d04a61-9f34-4a1b-8d6e-bca43a7b9339]? (code=1) (vm:1702)
2020-04-28 23:49:58,799-0400 INFO (vm/b6ca2e94) [virt.vm] (vmId='b6ca2e94-df8b-48e9-b0ee-2bc0f939786a') Stopping connection (guestagent:455)
2020-04-28 23:49:58,849-0400 INFO (jsonrpc/1) [api.virt] START destroy(gracefulAttempts=1) from=::ffff:192.168.200.99,49938, vmId=b6ca2e94-df8b-48e9-b0ee-2bc0f939786a (api:48)
2020-04-28 23:49:58,851-0400 INFO (jsonrpc/1) [virt.vm] (vmId='b6ca2e94-df8b-48e9-b0ee-2bc0f939786a') Release VM resources (vm:5186)
2020-04-28 23:49:58,851-0400 WARN (jsonrpc/1) [virt.vm] (vmId='b6ca2e94-df8b-48e9-b0ee-2bc0f939786a') trying to set state to Powering down when already Down (vm:626)
2020-04-28 23:49:58,851-0400 INFO (jsonrpc/1) [virt.vm] (vmId='b6ca2e94-df8b-48e9-b0ee-2bc0f939786a') Stopping connection (guestagent:455)

+Ryan Barry <rbarry(a)redhat.com> can you check the qemu-kvm warning?

Help understanding why the storage domain became unreachable is welcome.

--
Sandro Bonazzola
MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
Red Hat EMEA <https://www.redhat.com/>
sbonazzo(a)redhat.com

Red Hat respects your work life balance. Therefore there is no need to answer this email out of your office hours.
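When reading the logs quoted above, note that the vdsm lines carry a local offset (-0400) while the embedded qemu-kvm lines are stamped in UTC (trailing "Z"). A minimal sketch for putting both on one timeline is shown below; the helper names are made up for illustration, and the two timestamps are copied from the quoted log.

```python
from datetime import datetime, timedelta, timezone


def parse_vdsm(stamp):
    """Parse a vdsm timestamp such as '2020-04-28 23:49:58,775-0400'."""
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f%z")


def parse_qemu(stamp):
    """Parse a qemu timestamp such as '2020-04-29T03:49:55.582536Z' (UTC)."""
    naive = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    return naive.replace(tzinfo=timezone.utc)


VDSM_ERROR = parse_vdsm("2020-04-28 23:49:58,775-0400")   # vm start failed
QEMU_LOCK = parse_qemu("2020-04-29T03:49:55.582536Z")     # write lock failure

if __name__ == "__main__":
    local = timezone(timedelta(hours=-4))
    print("qemu lock failure, local time:", QEMU_LOCK.astimezone(local))
    print("seconds until vdsm reported it:",
          (VDSM_ERROR - QEMU_LOCK).total_seconds())
```

Converted this way, the qemu lock failure happened at 23:49:55 local time, about three seconds before vdsm logged the failed VM start.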


ovirt-system-tests_he-basic-suite-4.3 - internal repo not available
by Sandro Bonazzola 29 Apr '20


Hi,

I was trying to debug https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-4.3/427 and in the logs at https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-4.3/427/art… I see:

2020-04-29 02:14:21,289::ssh.py::ssh::96::lago.ssh::DEBUG::Command 81de70d2 on lago-he-basic-suite-4-3-host-0 errors:
http://192.168.200.1:8585/default/el7/repodata/repomd.xml: [Errno 14] HTTP Error 404 - Not Found
Trying other mirror.

Can you check why the repo is not available at this stage?

--
Sandro Bonazzola

1 0

Ansible runner service update
by Martin Necas 29 Apr '20


Hello everybody,

please *update* to the newest version of *ansible-runner-service-dev*, which should be ansible-runner-service-dev-1.0.2. We have merged patch [3] on the engine to master; if you have issues with host installation, please try to rebase. If you have further issues, please let me know.

If needed, here is the list of builds for the development environment: [1] [2].

Martin Necas

[1] https://cbs.centos.org/koji/packageinfo?packageID=7695
[2] https://copr.fedorainfracloud.org/coprs/mnecas/ansible-runner/
[3] https://gerrit.ovirt.org/#/c/108532/

1 0

ovirt-imageio image upload feedback on engine 4.4.0-31
by Guilherme De Oliveira Santos 28 Apr '20


Hi people,

I recently had to upload an image on a 4.4.0-31 engine and Nir told me to share the feedback here.

I installed ovirt-imageio-common-2.0.5-0.el8ev.x86_64 and created the following script to upload it using the sdk:

    #!/usr/bin/bash
    python3 /root/ovirt-engine-sdk/sdk/examples/upload_disk.py \
        --engine-url "https://`hostname`/" \
        --username admin@internal \
        --password-file pssword \
        --cafile /tmp/ca.pem \
        --disk-format qcow2 \
        --disk-sparse \
        --sd-name nfs_0 \
        /tmp/rhel-guest-image-7.6-210.x86_64.qcow2

The image got uploaded pretty quickly and I didn't have any trouble...

Cheers, Gui

2 1

engine-setup failing on restarting ovirt-imageio
by Sandro Bonazzola 28 Apr '20


Failing job is https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6794/

Here's the relevant log in engine-setup:

2020-04-27 09:25:35,457-0400 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:813 execute: ('/usr/bin/systemctl', 'stop', 'ovirt-imageio.service'), executable='None', cwd='None', env=None
2020-04-27 09:25:35,652-0400 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:863 execute-result: ('/usr/bin/systemctl', 'stop', 'ovirt-imageio.service'), rc=0
2020-04-27 09:25:35,653-0400 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:921 execute-output: ('/usr/bin/systemctl', 'stop', 'ovirt-imageio.service') stdout:
2020-04-27 09:25:35,653-0400 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:926 execute-output: ('/usr/bin/systemctl', 'stop', 'ovirt-imageio.service') stderr:
2020-04-27 09:25:35,653-0400 DEBUG otopi.plugins.otopi.services.systemd systemd.state:170 starting service ovirt-imageio
2020-04-27 09:25:35,654-0400 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:813 execute: ('/usr/bin/systemctl', 'start', 'ovirt-imageio.service'), executable='None', cwd='None', env=None
2020-04-27 09:25:36,631-0400 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:863 execute-result: ('/usr/bin/systemctl', 'start', 'ovirt-imageio.service'), rc=1
2020-04-27 09:25:36,632-0400 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:921 execute-output: ('/usr/bin/systemctl', 'start', 'ovirt-imageio.service') stdout:
2020-04-27 09:25:36,632-0400 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:926 execute-output: ('/usr/bin/systemctl', 'start', 'ovirt-imageio.service') stderr:
Job for ovirt-imageio.service failed because the control process exited with error code.
See "systemctl status ovirt-imageio.service" and "journalctl -xe" for details.
2020-04-27 09:25:36,633-0400 DEBUG otopi.context context._executeMethod:145 method exception
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/otopi/context.py", line 132, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-engine/setup/bin/../plugins/ovirt-engine-setup/ovirt_imageio/config.py", line 190, in _closeup_resatrt_service
    state=state,
  File "/usr/share/otopi/plugins/otopi/services/systemd.py", line 181, in state
    service=name,
RuntimeError: Failed to start service 'ovirt-imageio'
2020-04-27 09:25:36,636-0400 ERROR otopi.context context._executeMethod:154 Failed to execute stage 'Closing up': Failed to start service 'ovirt-imageio'

--
Sandro Bonazzola
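In a long engine-setup log, the failing step can be located by looking for 'execute-result' lines with a non-zero 'rc', as in the excerpt above. The following is a minimal sketch of such a filter. It assumes the line layout shown in the quoted log (a Python-style tuple followed by ', rc=N'); the function name and the regular expression are illustrative and not part of otopi.

```python
import ast
import re

# Matches e.g.: execute-result: ('/usr/bin/systemctl', 'start', 'x'), rc=1
RESULT_RE = re.compile(r"execute-result: (?P<cmd>\(.*?\)), rc=(?P<rc>-?\d+)")


def failed_commands(log_lines):
    """Return (command tuple, return code) for every non-zero result line."""
    failures = []
    for line in log_lines:
        match = RESULT_RE.search(line)
        if match is None:
            continue
        rc = int(match.group("rc"))
        if rc != 0:
            # The command is logged as a Python tuple literal of strings.
            failures.append((ast.literal_eval(match.group("cmd")), rc))
    return failures


# Two result lines copied from the log excerpt above.
SAMPLE = [
    "2020-04-27 09:25:35,652-0400 DEBUG otopi.plugins.otopi.services.systemd "
    "plugin.executeRaw:863 execute-result: "
    "('/usr/bin/systemctl', 'stop', 'ovirt-imageio.service'), rc=0",
    "2020-04-27 09:25:36,631-0400 DEBUG otopi.plugins.otopi.services.systemd "
    "plugin.executeRaw:863 execute-result: "
    "('/usr/bin/systemctl', 'start', 'ovirt-imageio.service'), rc=1",
]

if __name__ == "__main__":
    for command, rc in failed_commands(SAMPLE):
        print(rc, " ".join(command))
```

On the sample lines this reports only the 'systemctl start ovirt-imageio.service' call; the reason for the failure still has to be read from 'systemctl status' or the journal on the engine machine, as the stderr message suggests.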


Uploading iso / ovirt-imageio
by Artur Socha 27 Apr '20


Hi,

After a long break I am trying to set up my engine so that I can upload images (ISO) there and use them when spinning up new VMs. First I tried recent master (FC31) and then the latest 4.4 beta from RPMs (CentOS 8). I have the ovirt-imageio daemon running on the same host the engine is running on (with the simplest possible configuration - TLS disabled).

Unfortunately, when I hit 'test connection' in admin ui/storage/domain, I keep seeing an error message complaining about the ovirt-imageio-proxy setup.

I know that recently there have been quite a lot of changes related to the next version of imageio. Is there something else besides the ovirt-imageio daemon that I should have installed/configured? Perhaps there is some documentation I could use?

thanks!

--
Artur Socha
Senior Software Engineer, RHV
Red Hat

2 4

Failed to run engine setup on existing dev-env
by Eyal Shenitzky 26 Apr '20


Hi,

I tried to run engine-setup on an existing dev-env and it failed with the following error:

[ ERROR ] Failed to execute stage 'Misc configuration': Cannot locate application option ImageProxyAddress

The error appears only after patch https://gerrit.ovirt.org/#/c/108349/ was merged. Are any manual steps needed?

--
Regards,
Eyal Shenitzky

3 6

Backup is not supported by the host, Ovirt 4.4 Beta
by francesco＠shellrent.com 26 Apr '20


Hi all,

I'm trying to experiment with incremental backup using the example script provided on the sdk git page (https://github.com/oVirt/ovirt-engine-sdk/blob/master/sdk/examples/backup_v…)

I installed a fresh engine on a CentOS 8 VM and a fresh node on CentOS 8 as well, connected to the engine without any problems. All libvirt packages are version 5.6 and ovirt-imageio-common is version 2.0.3.

When I pass all the needed args to the script (I hardcoded connection variables like user and password), it throws the following error:

```
[root@centos8 ~]# python3 backup_vm.py full --engine-url https://ovirt-engine-fqdn.com --username admin@internal --password-file ./passwd --backup-dir /home 11d70eb0-4d7c-4308-82a6-470e21d80ecd
[ 0.0 ] Starting full backup for vm 11d70eb0-4d7c-4308-82a6-470e21d80ecd
Traceback (most recent call last):
  File "backup_vm.py", line 397, in <module>
    main()
  File "backup_vm.py", line 141, in main
    args.command(args)
  File "backup_vm.py", line 154, in cmd_full
    backup = start_backup(connection, args)
  File "backup_vm.py", line 247, in start_backup
    disks=disks
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/services.py", line 33583, in add
    return self._internal_add(backup, headers, query, wait)
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 232, in _internal_add
    return future.wait() if wait else future
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 55, in wait
    return self._code(response)
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 229, in callback
    self._check_fault(response)
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 132, in _check_fault
    self._raise_error(response, body)
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 118, in _raise_error
    raise error
ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is "[Cannot backup VM. Backup is not supported by the host (centos8).]". HTTP response code is 409.
```

The VM is created from a CentOS 7 template imported from ovirt-image-repository, and the option "Enable Incremental Backup" on the disk is ticked.

I'm definitely missing something... But what? At the top of the script, I read "Using this example requires a special libvirt version supporting incremental backup.". Which version of libvirt? I guessed 5.6 (or at least not the 4.5 shipped in CentOS 7).

Thank you for your time and help,
Francesco

4 4