Snapshot Deletion Issue
by Shubha Kulkarni
Hello,
I am investigating an issue with deleting a snapshot in oVirt 4.3.10. Basically, the delete snapshot operation fails and I am seeing the following error in the VDSM (vdsm-4.30.46) log:
=============================================================================================================================================
2021-03-17 21:38:01,346-0400 INFO (jsonrpc/1) [virt.vm] (vmId='4b04639c-386e-463c-8a8a-dfd3bc46d306') Starting merge with jobUUID=u'e6206d9f-6899-40b7-857a-3be3dd42d77d', original chain=ac065664-ad6c-4ad6-aea2-9558df71d41d < 210900dd-ff19-4a00-8706-926cb192b0db < 273fc001-1c18-4234-b2b6-f0485e5d13ef < e1e10354-ae43-456f-9341-1bb81c5bf960 < 7985a853-4d47-428b-bb5b-b34a38cc57e0 (top), disk='sda', base='sda[2]', top='sda[1]', bandwidth=0, flags=8 (vm:5954)
2021-03-17 21:38:01,354-0400 ERROR (jsonrpc/1) [virt.vm] (vmId='4b04639c-386e-463c-8a8a-dfd3bc46d306') Live merge failed (job: e6206d9f-6899-40b7-857a-3be3dd42d77d) (vm:5960)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 5958, in merge
    bandwidth, flags)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 100, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 94, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 719, in blockCommit
    if ret == -1: raise libvirtError ('virDomainBlockCommit() failed', dom=self)
libvirtError: internal error: qemu block name 'json:{"backing": {"backing": {"backing": {"driver": "raw", "file": {"driver": "file", "filename": "/rhev/data-center/mnt/nash-nfs7:_nfs_shubha__ovirt__2_data/8bce5be7-aede-4743-b3fd-1c199880892f/images/eb3cb8f2-9544-472f-946b-646eab9c621f/ac065664-ad6c-4ad6-aea2-9558df71d41d"}}, "driver": "qcow2", "file": {"driver": "file", "filename": "/rhev/data-center/mnt/nash-nfs7:_nfs_shubha__ovirt__2_data/8bce5be7-aede-4743-b3fd-1c199880892f/images/eb3cb8f2-9544-472f-946b-646eab9c621f/210900dd-ff19-4a00-8706-926cb192b0db"}}, "driver": "qcow2", "file": {"driver": "file", "filename": "/rhev/data-center/mnt/nash-nfs7:_nfs_shubha__ovirt__2_data/8bce5be7-aede-4743-b3fd-1c199880892f/images/eb3cb8f2-9544-472f-946b-646eab9c621f/273fc001-1c18-4234-b2b6-f0485e5d13ef"}}, "driver": "qcow2", "file": {"driver": "file", "filename": "/rhev/data-center/mnt/nash-nfs7:_nfs_shubha__ovirt__2_data/8bce5be7-aede-4743-b3fd-1c199880892f/images/eb3cb8f2-9544-472f-946b-646eab9c621f/e1e10354-ae43-456f-9341-1bb81c5bf960"}}' doesn
2021-03-17 21:38:01,368-0400 INFO (jsonrpc/1) [api.virt] FINISH merge return={'status': {'message': 'Merge failed', 'code': 52}}
============================================================================================================================================
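For reference, the failing call corresponds roughly to the following libvirt-python sketch, reconstructed from the merge log line above (the connection URI and the UUID lookup are illustrative assumptions, not taken from vdsm's code):

    import libvirt

    conn = libvirt.open('qemu:///system')
    dom = conn.lookupByUUIDString('4b04639c-386e-463c-8a8a-dfd3bc46d306')

    # flags=8 in the log is VIR_DOMAIN_BLOCK_COMMIT_RELATIVE; base='sda[2]'
    # and top='sda[1]' are indexes into the backing chain printed above.
    dom.blockCommit('sda', 'sda[2]', 'sda[1]', bandwidth=0,
                    flags=libvirt.VIR_DOMAIN_BLOCK_COMMIT_RELATIVE)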
I found that there is an issue logged here: https://bugzilla.redhat.com/show_bug.cgi?id=1785939. However, I am not able to see how it was fixed and whether the fix was merged into oVirt. Where can I find the relevant commit (in vdsm and/or libvirt)?
Thanks
Re: docs: pointers to more in-depth internals?
by Nir Soffer
On Tue, Mar 16, 2021 at 10:47 PM Greg King <greg.king(a)oracle.com> wrote:
> I am new to vdsm and trying to understand the architecture/internals much
> better
>
Welcome to vdsm, Greg!
> The ovirt documentation for architecture I have found so far seems to be
> relatively high level
>
And it is mostly outdated, but we don't have anything better.
> My effort to understand the architecture by walking through the vdsm code
> using pdb/rpdb is slow and probably not all that efficient
>
>
>
> Does anyone have pointers to documentation that might explain the vdsm
> modules, classes and internals a little more in depth?
>
I don't think we have more detailed documentation, but there are a lot of
talks and slide decks that give more info on specific topics, and are
usually more up to date:
https://www.ovirt.org/community/archived_conferences_presentations.html
There is also a lot of content on YouTube; here are some examples that I
could find easily:
- [oVirt 3.6 deep dive] - live storage migration between mixed domains
https://www.youtube.com/watch?v=BPy29Q__VV4
- oVirt 4.1 deep dive - VM leases
https://www.youtube.com/watch?v=MVa-4fQo2V8
- Back to the future – incremental backup in oVirt
https://www.youtube.com/watch?v=X-xHD9ddN6s
- oVirt 4k - teaching an old dog new tricks
https://www.youtube.com/watch?v=Q1VQxjYEzDY
>
> I’d also like to understand where I might be able to add rpdb.set_trace()
> so I can step through functions being called in libvirt.py
>
I don't think using a debugger is very helpful with vdsm, since vdsm is not
designed for stopping a thread for an unlimited time. In some cases the
system will log a warning and a traceback every 60 seconds about the blocked
worker. In other cases monitoring code may fail to update stats, which may
cause the engine to deactivate the host, migrate VMs, or cause other trouble.
The best way to debug and understand vdsm is to follow the logs, and add
more logs when needed. The main advantage compared with a debugger is
that the time spent with the logs will pay off when you have to debug real
issues in a user's setup, when logs are the only available resource.
Having said that, being able to follow the entire flow by printing a
traceback is a great way to understand how the system works.
You can use vdsm.common.concurrent.format_traceback:
https://github.com/oVirt/vdsm/blob/114121ab122a0cd5e529807b938b3506f247f4...
to print a traceback at interesting points. For tracing functions from the
libvirt Python binding, you can modify libvirtconnection.py:
https://github.com/oVirt/vdsm/blob/114121ab122a0cd5e529807b938b3506f247f4...
This module creates a connection, and wraps libvirt.virDomain with a wrapper
that panics on fatal errors. You can modify the wrapper to log a traceback
for all or some of libvirt.virDomain functions.
Another option is to modify the virDomain wrapper to log a traceback:
https://github.com/oVirt/vdsm/blob/114121ab122a0cd5e529807b938b3506f247f4...
For example here:
https://github.com/oVirt/vdsm/blob/114121ab122a0cd5e529807b938b3506f247f4...
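A minimal sketch of that idea (the names here are illustrative, not
vdsm's actual wrapper code), using only the standard library:

    import logging
    import traceback

    log = logging.getLogger(__name__)

    class TracingDomain(object):
        """Wraps a libvirt.virDomain and logs a traceback on each call."""

        def __init__(self, dom, traced=None):
            self._dom = dom        # the real libvirt.virDomain
            self._traced = traced  # set of method names; None traces all

        def __getattr__(self, name):
            attr = getattr(self._dom, name)
            if not callable(attr):
                return attr

            def wrapper(*args, **kwargs):
                if self._traced is None or name in self._traced:
                    log.debug("libvirt call %s\n%s", name,
                              "".join(traceback.format_stack()))
                return attr(*args, **kwargs)

            return wrapper

Wrapping a domain with TracingDomain(dom, traced={"blockCommit"}) would
then log where every blockCommit call came from, while delegating the
actual work to libvirt unchanged.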
Good luck with your vdsm ride!
Nir
ovirt-system-tests hosted-engine suites and CentOS Stream - current status
by Yedidyah Bar David
Hi all,
Right now, the ovirt appliance is built nightly, publishing (to
ovirt-master-snapshot) two packages: ovirt-engine-appliance (the
"regular" one, based on CentOS Linux) and
ovirt-engine-appliance-centos-stream (based on CentOS Stream).
This can already be used by anyone who wants to try this manually.
For trying in CI:
1. A pending patch to ost-images [1] can be merged once it passes the
manual build I ran (link in a gerrit comment there). Should take a few
more hours. This patch changes the existing generated image, called
ost-images-el8stream-he-installed, to include a stream appliance (from
current repos).
2. A patch for OST [2] can be used, as needed, for manual runs. I
updated it now to expect the output of [1] once it's merged, so it
should probably be rebased and/or re-run with 'ci test'/'ci build' once
[1] is merged (and the image is published).
TODO:
1. Decide how we want to continue going forward. If we aim at a
complete move to Stream in 4.4.6, perhaps now is the time to start...
An alternative is to somehow support both in parallel - that will be
more complex, obviously.
2. Handle ovirt-node
Best regards,
[1] https://gerrit.ovirt.org/c/ost-images/+/113633
[2] https://gerrit.ovirt.org/c/ovirt-system-tests/+/112977
--
Didi
Re: basic suite fails on test_metrics_and_log_collector
by Michal Skrivanek
> On 17. 3. 2021, at 13:53, Dana Elfassy <delfassy(a)redhat.com> wrote:
>
> Adding +Marcin Sobczyk <msobczyk(a)redhat.com>
>
> On Mon, Mar 15, 2021 at 9:59 AM Yedidyah Bar David <didi(a)redhat.com> wrote:
> On Mon, Mar 15, 2021 at 7:55 AM Yedidyah Bar David <didi(a)redhat.com> wrote:
> >
> > Hi all,
> >
> > This started a few days ago [1] and randomly happens since then:
> >
> > E DEBUG: Configuration:
> > E DEBUG: command: collect
> > E DEBUG: Traceback (most recent call last):
> > E DEBUG: File
> > "/usr/lib/python3.6/site-packages/ovirt_log_collector/__main__.py",
> > line 2067, in <module>
> > E DEBUG: '%s directory is not empty.' % (conf["local_tmp_dir"])
> > E DEBUG: Exception: /dev/shm/log directory is not
> > empty.ERROR: /dev/shm/log directory is not empty.non-zero return code
> >
> > Michal tried to fix this by using a random directory but it still fails [2]:
> >
> > DEBUG: command: collect
> > DEBUG: Traceback (most recent call last):
> > DEBUG: File "/usr/lib/python3.6/site-packages/ovirt_log_collector/__main__.py",
> > line 2067, in <module>
> > DEBUG: '%s directory is not empty.' % (conf["local_tmp_dir"])
> > DEBUG: Exception: /dev/shm/kaN7uY directory is not empty.ERROR:
> > /dev/shm/kaN7uY directory is not empty.non-zero return code
> >
> > Since I suppose that the randomness of mktemp is good enough, it must
> > be something else. Also, the last successful run before [1] used the
> > same OST git commit (same code), so I do not think it's something in
> > OST's code.
> >
> > Any idea?
> >
> > I think I'll push a patch to create and use the directory right before
> > calling ovirt-log-collector, which is probably better in other ways.
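(A minimal sketch of that idea in python, assuming the collector is
invoked directly rather than via a shell one-liner:

    import subprocess
    import tempfile

    # Create a fresh, guaranteed-empty directory immediately before the
    # run, so nothing else can have written into it in the meantime.
    local_tmp = tempfile.mkdtemp(dir='/dev/shm')
    subprocess.check_call([
        'ovirt-log-collector', '--verbose', '--batch', '--no-hypervisors',
        '--local-tmp=' + local_tmp,
        '--conf-file=/root/ovirt-log-collector.conf',
    ])

The flags mirror the ansible invocation quoted below.)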
>
> My patch [1] still fails, with a somewhat different error message, but
> this made me check further, and while I still do not understand, I have
> this to add:
>
> In the failing runs, ovirt-log-collector is called *twice* in parallel. E.g.
> in [2] (the check-patch of [1]):
>
> Mar 15 07:38:59 lago-basic-suite-master-engine platform-python[59099]:
> ansible-command Invoked with _raw_params=lctmp=$(mktemp -d -p
> /dev/shm); ovirt-log-collector --verbose --batch --no-hypervisors
> --local-tmp="${lctmp}" --conf-file=/root/ovirt-log-collector.conf
> _uses_shell=True warn=True stdin_add_newline=True
> strip_empty_ends=True argv=None chdir=None executable=None
> creates=None removes=None stdin=None
> Mar 15 07:38:59 lago-basic-suite-master-engine platform-python[59124]:
> ansible-command Invoked with _raw_params=lctmp=$(mktemp -d -p
> /dev/shm); ovirt-log-collector --verbose --batch --no-hypervisors
> --local-tmp="${lctmp}" --conf-file=/root/ovirt-log-collector.conf
> _uses_shell=True warn=True stdin_add_newline=True
> strip_empty_ends=True argv=None chdir=None executable=None
> creates=None removes=None stdin=None
>
> It also generates two logs, which you can check/compare.
>
> It's the same for previous ones, e.g. latest nightly [3][4]:
>
> Mar 15 06:23:30 lago-basic-suite-master-engine platform-python[59343]:
> ansible-command Invoked with _raw_params=ovirt-log-collector --verbose
> --batch --no-hypervisors --conf-file=/root/ovirt-log-collector.conf
> _uses_shell=True warn=True stdin_add_newline=True
> strip_empty_ends=True argv=None chdir=None executable=None
> creates=None removes=None stdin=None
> Mar 15 06:23:30 lago-basic-suite-master-engine setroubleshoot[58889]:
> SELinux is preventing /usr/lib/systemd/systemd from unlink access on
> the sock_file ansible-ssh-lago-basic-suite-master-host-1-22-root. For
> complete SELinux messages run: sealert -l
> d03a8655-9430-4fcf-9892-3b4df1939899
> Mar 15 06:23:30 lago-basic-suite-master-engine setroubleshoot[58889]:
> SELinux is preventing /usr/lib/systemd/systemd from unlink access on
> the sock_file ansible-ssh-lago-basic-suite-master-host-1-22-root.#012#012*****
> Plugin catchall (100. confidence) suggests
> **************************#012#012If you believe that systemd should
> be allowed unlink access on the
> ansible-ssh-lago-basic-suite-master-host-1-22-root sock_file by
> default.#012Then you should report this as a bug.#012You can generate
> a local policy module to allow this access.#012Do#012allow this access
> for now by executing:#012# ausearch -c 'systemd' --raw | audit2allow
> -M my-systemd#012# semodule -X 300 -i my-systemd.pp#012
> Mar 15 06:23:30 lago-basic-suite-master-engine platform-python[59361]:
> ansible-command Invoked with _raw_params=ovirt-log-collector --verbose
> --batch --no-hypervisors --conf-file=/root/ovirt-log-collector.conf
> _uses_shell=True warn=True stdin_add_newline=True
> strip_empty_ends=True argv=None chdir=None executable=None
> creates=None removes=None stdin=None
>
> Any idea what might have caused this to start happening? Perhaps
> a bug in ansible, or ansible-runner? It reminds me of [5].
> Adding Dana and Martin.
>
> I think [5] is quite a serious bug, btw, should be a 4.4.5 blocker.
it's from January and there are no comments there. Dana, any update?
It does look serious, but perhaps it's not really hit in real-world scenarios?
>
> Best regards,
>
> [1] https://gerrit.ovirt.org/c/ovirt-system-tests/+/113875
>
> [2] https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/159...
>
> [3] https://jenkins.ovirt.org/job/ovirt-system-tests_basic-suite-master_night...
>
> [4] https://jenkins.ovirt.org/job/ovirt-system-tests_basic-suite-master_night...
>
> [5] https://bugzilla.redhat.com/show_bug.cgi?id=1917707
>
> >
> > Best regards,
> >
> > [1] https://jenkins.ovirt.org/job/ovirt-system-tests_basic-suite-master_night...
> >
> > [2] https://jenkins.ovirt.org/job/ovirt-system-tests_basic-suite-master_night...
> >
> >
> > --
> > Didi
>
>
>
> --
> Didi
>
Migrate VMs between DCs
by Daniel Gurgel
How can we migrate virtual servers between different data centers that do not share the same SAN?
There are limitations, including the connection technology, depending on the storage model used.
For example, Dell PowerStore uses only 1 iSCSI network, while Dell EMC Compellent uses 2 separate iSCSI networks.
Connecting them together is not possible, if only for security and redundancy reasons.
Export Domains are marked as deprecated in 4.4.x.
Are there plans for live storage migration to be implemented in 4.4.x or 4.5.x?
Re: [oVirt Jenkins] ovirt-system-tests_basic-suite-master_nightly - Build # 962 - Still Failing!
by Yedidyah Bar David
On Tue, Mar 16, 2021 at 7:06 AM <jenkins(a)jenkins.phx.ovirt.org> wrote:
>
> Project: https://jenkins.ovirt.org/job/ovirt-system-tests_basic-suite-master_nightly/
> Build: https://jenkins.ovirt.org/job/ovirt-system-tests_basic-suite-master_night...
> Build Number: 962
> Build Status: Still Failing
> Triggered By: Started by timer
>
> -------------------------------------
> Changes Since Last Success:
> -------------------------------------
> Changes for Build #953
> [Michal Skrivanek] randomize /dev/shm logcollector tmp directory
>
>
> Changes for Build #954
> [Michal Skrivanek] randomize /dev/shm logcollector tmp directory
>
>
> Changes for Build #955
> [Michal Skrivanek] randomize /dev/shm logcollector tmp directory
>
>
> Changes for Build #956
> [Michal Skrivanek] randomize /dev/shm logcollector tmp directory
>
>
> Changes for Build #957
> [Michal Skrivanek] randomize /dev/shm logcollector tmp directory
>
>
> Changes for Build #958
> [Michal Skrivanek] randomize /dev/shm logcollector tmp directory
>
>
> Changes for Build #959
> [Michal Skrivanek] randomize /dev/shm logcollector tmp directory
>
>
> Changes for Build #960
> [Andrej Cernek] pylint: Upgrade to 2.7
>
>
> Changes for Build #961
> [Andrej Cernek] pylint: Upgrade to 2.7
>
>
> Changes for Build #962
> [Andrej Cernek] pylint: Upgrade to 2.7
>
>
>
>
> -----------------
> Failed Tests:
> -----------------
> 1 tests failed.
> FAILED: basic-suite-master.test-scenarios.test_001_initialize_engine.test_set_hostnames
>
> Error Message:
> failed on setup with "TypeError: __new__() missing 2 required positional arguments: 'version' and 'repo'"
>
> Stack Trace:
> ansible_by_hostname = <function module_mapper_for at 0x7ffbad0acc80>
>
> @pytest.fixture(scope="session", autouse=True)
> def check_installed_packages(ansible_by_hostname):
> vms_pckgs_dict_list = []
> for hostname in backend.default_backend().hostnames():
> vm_pckgs_dict = _get_custom_repos_packages(
> > ansible_by_hostname(hostname))
>
> ost_utils/ost_utils/pytest/fixtures/check_repos.py:39:
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> ost_utils/ost_utils/pytest/fixtures/check_repos.py:55: in _get_custom_repos_packages
> repo_name)
> ost_utils/ost_utils/pytest/fixtures/check_repos.py:69: in _get_installed_packages
> Package(*line) for line in result
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>
> .0 = <list_iterator object at 0x7ffba6e97860>
>
> > Package(*line) for line in result
> ]
> E TypeError: __new__() missing 2 required positional arguments: 'version' and 'repo'
This failed because 'dnf repo-pkgs' split the output for one package across
two lines, so the first line didn't include a version [1]:
lago-basic-suite-master-host-1 | CHANGED | rc=0 >>
Installed Packages
ovirt-ansible-collection.noarch 1.3.2-0.1.master.20210315141358.el8 @extra-src-1
python3-ovirt-engine-sdk4.x86_64
4.4.10-1.20210315.gitf8b9f2a.el8 @extra-src-1
We should either give up on this, rewrite the 'dnf repo-pkgs' call in some
other language that does not require parsing human-targeted output (perhaps
python or ansible), or amend the current code a bit and hope it will survive
longer... Trying the last one:
https://gerrit.ovirt.org/c/ovirt-system-tests/+/113895
[1] https://jenkins.ovirt.org/job/ovirt-system-tests_basic-suite-master_night...
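A minimal sketch of the python option, querying the rpmdb through the
dnf API instead of parsing 'dnf repo-pkgs' output ('from_repo' is an
assumption - it only exists on newer dnf versions, hence the fallback
to reponame):

    import dnf

    def installed_packages():
        base = dnf.Base()
        base.fill_sack()  # loads the installed-package sack (rpmdb)
        for pkg in base.sack.query().installed():
            repo = getattr(pkg, 'from_repo', '') or pkg.reponame
            yield (pkg.name, pkg.evr, pkg.arch, repo)

    for name, evr, arch, repo in installed_packages():
        print('%s\t%s\t%s\t%s' % (name, evr, arch, repo))

Tab-separated output like this cannot be wrapped across lines, so the
fixture's parsing would stay trivial.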
>
> ost_utils/ost_utils/pytest/fixtures/check_repos.py:69: TypeError
--
Didi
docs: pointers to more in-depth internals?
by Greg King
I am new to vdsm and trying to understand the architecture/internals much better
The ovirt documentation for architecture I have found so far seems to be relatively high level
My effort to understand the architecture by walking through the vdsm code using pdb/rpdb is slow and probably not all that efficient
Does anyone have pointers to documentation that might explain the vdsm modules, classes and internals a little more in depth?
I'd also like to understand where I might be able to add rpdb.set_trace() so I can step through functions being called in libvirt.py
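For reference, the rpdb usage meant here is typically just (a sketch; the
address and port are rpdb's defaults):

    # Drop these two lines at a point of interest, then attach from
    # another terminal with: nc 127.0.0.1 4444
    import rpdb
    rpdb.set_trace()  # blocks until a client connects on 127.0.0.1:4444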
Gregory King | Software Development Manager | +1.303.272.2427
Oracle Virtualization Sustaining Engineering
500 Eldorado Boulevard Build 5 | Broomfield Colorado 80021
Mobile: +1.303.968.8169 | Fax: +1.303.272.2427