Cannot delete snapshot
by Giulio Casella
Hi,
I'm having (another) issue working with snapshots.
For a few days now, my backup system (Storware vProtect) has been unable to
delete a snapshot. The message reported by the SPM is:
HSMGetAllTasksStatusesVDS failed: value=Volume does not exist:
('6180c2e8-141a-4126-bcb1-5af5caa98175',) abortedcode=201
Other snapshots from other VMs are working fine (created and deleted as
needed).
I checked: on the SPM filesystem I can reach other snapshots of the
same VM, but that one doesn't exist.
I suspect that during normal backup management the snapshot was
correctly removed from the file system but still lives in the oVirt database.
I'm pretty sure vProtect uses the oVirt API to operate and doesn't
interact directly with the oVirt database. That's why I'm asking you and not
the Storware guys.
Is there a (safe) way to get rid of that snapshot?
TIA
Regards,
Giulio Casella
3 years, 7 months
Re: Ansible ovirt.ovirt_vm nics
by Matthew.Stier@fujitsu.com
I believe I have identified the issue: variable inheritance.
The inventory I include to define the hosts contains several classes (groups). One is ‘virtualmachines’, which the script uses to get the list of VMs it is going to process; all VM defaults are defined in its ‘:vars’ section.
I also have groups of VMs where I want to override variables. Unfortunately, what is defined in ‘virtualmachines’ is not being overridden at the class level, because the classes are treated as siblings.
I found the ‘ansible_group_priority’ variable, which can be set on each class. I set it to 2 in [virtualmachines:vars] (the default is 1) and to 3 in the other classes.
Now the variables in [class:vars] override those in [virtualmachines:vars]. Of course, host-level variables can still override everything.
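The effect of ‘ansible_group_priority’ can be illustrated with a small Python model. This is only a sketch of the documented merge rule (groups at the same level are merged in order of priority, then name, and groups merged later win), not Ansible's actual code. The group name ‘class_a’ and the profile values are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    name: str
    priority: int = 1  # models ansible_group_priority (default 1)
    depth: int = 1     # parent/child level; sibling groups share a depth
    vars: dict = field(default_factory=dict)


def merge_group_vars(groups):
    """Merge group vars; groups merged later overwrite earlier ones."""
    merged = {}
    for group in sorted(groups, key=lambda g: (g.depth, g.priority, g.name)):
        merged.update(group.vars)
    return merged


# Hypothetical inventory: a defaults group and one sibling "class" group.
defaults = {"vm_nic1_profile": "vlan20"}
override = {"vm_nic1_profile": "vlan22"}

# Same priority: siblings are merged alphabetically, so
# 'virtualmachines' is merged last and its default wins.
before = merge_group_vars([
    Group("virtualmachines", vars=defaults),
    Group("class_a", vars=override),
])

# Priorities 2 and 3: the class is merged last and its value wins.
after = merge_group_vars([
    Group("virtualmachines", priority=2, vars=defaults),
    Group("class_a", priority=3, vars=override),
])

print(before["vm_nic1_profile"], after["vm_nic1_profile"])
```

With equal priorities the class-level value is lost simply because ‘virtualmachines’ sorts after ‘class_a’; raising the class priority changes the merge order rather than the group hierarchy.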
From: Matthew.Stier(a)fujitsu.com <Matthew.Stier(a)fujitsu.com>
Sent: Thursday, April 22, 2021 2:01 PM
To: Martin Perina <mperina(a)redhat.com>
Cc: users(a)ovirt.org; Martin Necas <mnecas(a)redhat.com>
Subject: [ovirt-users] Re: Ansible ovirt.ovirt_vm nics
It’s basically the one listed in the URL. The changes I’ve made are some additional ‘defaults’ and the code I mentioned below, which selects different NIC profiles for VMs that are on different networks.
I’m running an Oracle Linux 7u9 system, and the version of Ansible is 2.9.18.
In my environment, I have created different networks and labeled them based on their VLAN tag number (vlan20, vlan21, vlan22, …).
A VM can be created on any of these VLANs, so I need to be able to select the NIC profile within the ‘ini’ file that holds all the configuration information (hostname, FQDN, IP address, etc.).
From: Martin Perina <mperina(a)redhat.com>
Sent: Thursday, April 22, 2021 2:51 AM
To: Stier, Matthew <Matthew.Stier(a)fujitsu.com>
Cc: users(a)ovirt.org; Martin Necas <mnecas(a)redhat.com>
Subject: Re: [ovirt-users] Ansible ovirt.ovirt_vm nics
Hi Matthew,
Could you please share your playbook with us? Which Ansible version are you using? Are you using the ovirt_vm module from the oVirt Ansible Collection, which is newer than the ovirt_vm module included in Ansible 2.9?
https://docs.ansible.com/ansible/latest/collections/ovirt/ovirt/ovirt_vm_...
Thanks,
Martin
On Thu, Apr 22, 2021 at 9:28 AM Matthew.Stier(a)fujitsu.com <Matthew.Stier(a)fujitsu.com> wrote:
The ‘nics’ section of the ovirt_vm documentation is vague and has almost no examples.
My playbook is based upon https://blogs.oracle.com/scoter/ansible-with-oracle-linux-virtualization-...
I’ve made several modifications (more defaults) and added a few lines that I believe should assign a vNIC profile to the primary vNIC (nic1), based on ‘vm_nic1_profile’ as defined in an included ‘ini’ file (the profile defaults to blank if it is not defined in the ini file).
It isn’t doing its job.
nics:
  - name: "nic1"
    profile_name: "{{ hostvars[item]['vm_nic1_profile'] | default('') }}"
The playbook runs without complaint. If I run it with the ‘-vvv’ option, part of the output lists the variables, but the ‘nics’ variable is an empty list (nics[]).
Any hints on what I’m doing wrong? I’ve checked the forum, but it tends to strip leading spaces, which is bad for indentation-sensitive code.
_______________________________________________
Users mailing list -- users(a)ovirt.org
To unsubscribe send an email to users-leave(a)ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/MEZLXGSSRCR...
--
Martin Perina
Manager, Software Engineering
Red Hat Czech s.r.o.
3 years, 8 months
HCI - oVirt for CEPH
by penguin pages
I have been building out an HCI stack with KVM/RHEV + oVirt using the HCI deployment process. This is very nice for small / remote site use cases, but with Gluster being announced as EOL in 18 months, what is the replacement plan?
Are there working projects and plans to replace Gluster with CEPH?
Are there deployment plans to get an HCI stack onto a supported file system?
I liked Gluster for the control plane (the oVirt engine and smaller utility VMs): because each system has a full copy, I can retrieve / extract a copy of a VM without having all bricks back. It was just "easy" to use. CEPH just means more complexity. Though it scales better and has better features, repair means having a critical mass of nodes up before you can extract data (whereas any disk can be pulled out of a Gluster node and plugged into my laptop, and I can at least extract the data).
I guess I am not trying to debate the shift to CEPH. It does not matter; that ship has sailed. What I am asking is when, and what, the plans are for replacing Gluster for HCI. Because right now, for small HCI sites, once Gluster is no longer supported and if CEPH does not make it, the option is to go to VMware and vSAN or some other totally different stack.
3 years, 8 months
Re: Attempting to detach a storage domain
by Matthew.Stier@fujitsu.com
Problem solved.
I used Ansible and ran 'grep Duplicate /var/log/vdsm/vdsm.log' on all 121 of my hosts. Only the log of the host currently running the hosted-engine had entries.
I migrated the hosted-engine and all other VMs to other hosts, and then put the affected host into maintenance. I then undeployed it, patched it, and finally deployed it again.
It's back up and running and taking VMs, with no more 'Duplicate key' errors.
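For reference, the per-host check amounts to something like the following Python sketch. It assumes the vdsm.log contents have already been collected per host (how they are fetched, for example with Ansible, is left out). The host names and the helper name are placeholders, and the sample lines are shortened versions of the vdsm.log entries from this thread.

```python
def hosts_with_duplicate_errors(logs_by_host, needle="Duplicate"):
    """Return {host: matching lines} for hosts whose log contains `needle`."""
    hits = {}
    for host, text in logs_by_host.items():
        matches = [line for line in text.splitlines() if needle in line]
        if matches:
            hits[host] = matches
    return hits


# Placeholder data: two hosts, only one of which logged the error.
sample_logs = {
    "host-a": (
        "INFO (jsonrpc/2) [api.host] FINISH getCapabilities "
        "error=internal error: Duplicate key (api:52)\n"
        "libvirtError: internal error: Duplicate key\n"
    ),
    "host-b": "INFO (jsonrpc/2) [api.host] START getCapabilities() (api:48)\n",
}

affected = hosts_with_duplicate_errors(sample_logs)
for host, lines in affected.items():
    print(host, len(lines))
```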
-----Original Message-----
From: Matthew.Stier(a)fujitsu.com <Matthew.Stier(a)fujitsu.com>
Sent: Monday, April 26, 2021 7:18 PM
To: Nir Soffer <nsoffer(a)redhat.com>
Cc: users(a)ovirt.org
Subject: [ovirt-users] Re: Attempting to detach a storage domain
There is no /var/log/vdsm directory on the hosted-engine.
This is from the current host, hosting the engine: aaa.bbb.ccc.100 is the hosted-engine
2021-04-26 19:12:42,760-0500 INFO (jsonrpc/2) [api.host] START getCapabilities() from=::ffff:aaa.bbb.ccc.100,37812 (api:48)
2021-04-26 19:12:42,763-0500 INFO (jsonrpc/2) [api.host] FINISH getCapabilities error=internal error: Duplicate key from=::ffff:168.127.151.100,37812 (api:52)
2021-04-26 19:12:42,763-0500 ERROR (jsonrpc/2) [jsonrpc.JsonRpcServer] Internal server error (__init__:350)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 345, in _handle_request
    res = method(**params)
  File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 198, in _dynamicMethod
    result = fn(*methodArgs)
  File "<string>", line 2, in getCapabilities
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 50, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 1371, in getCapabilities
    c = caps.get()
  File "/usr/lib/python2.7/site-packages/vdsm/host/caps.py", line 95, in get
    machinetype.compatible_cpu_models())
  File "/usr/lib/python2.7/site-packages/vdsm/common/cache.py", line 43, in __call__
    value = self.func(*args)
  File "/usr/lib/python2.7/site-packages/vdsm/machinetype.py", line 142, in compatible_cpu_models
    all_models = domain_cpu_models(c, arch, cpu_mode)
  File "/usr/lib/python2.7/site-packages/vdsm/machinetype.py", line 97, in domain_cpu_models
    domcaps = conn.getDomainCapabilities(None, arch, None, virt_type, 0)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 94, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3844, in getDomainCapabilities
    if ret is None: raise libvirtError ('virConnectGetDomainCapabilities() failed', conn=self)
libvirtError: internal error: Duplicate key
2021-04-26 19:12:42,764-0500 INFO (jsonrpc/2) [jsonrpc.JsonRpcServer] RPC call Host.getCapabilities failed (error -32603) in 0.00 seconds (__init__:312)
-----Original Message-----
From: Nir Soffer <nsoffer(a)redhat.com>
Sent: Monday, April 26, 2021 5:12 PM
To: Stier, Matthew <Matthew.Stier(a)fujitsu.com>
Cc: Benny Zlotnik <bzlotnik(a)redhat.com>; users(a)ovirt.org
Subject: Re: [ovirt-users] Re: Attempting to detach a storage domain
On Tue, Apr 27, 2021 at 12:48 AM Matthew.Stier(a)fujitsu.com <Matthew.Stier(a)fujitsu.com> wrote:
>
> I'm getting tons of these in the
> hosted-engine:/var/log/ovirt-engine/engine.log
>
> 2021-04-26 16:40:40,260-05 ERROR
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (EE-ManagedThreadFactory-engineScheduled-Thread-8) [] EVENT_ID:
> VDS_BROKER_COMMAND_FAILURE(10,802), VDSM xxxx.yyyy.zzz command Get
> Host Capabilities failed: Internal JSON-RPC error: {'reason':
> 'internal error: Duplicate key'}
This looks like a bad response from vdsm. Can you share the vdsm log from the host xxxx.yyyy.zzz?
/var/log/vdsm/vdsm.log
Nir
3 years, 8 months
Re: Attempting to detach a storage domain
by Nir Soffer
On Tue, Apr 27, 2021 at 12:48 AM Matthew.Stier(a)fujitsu.com
<Matthew.Stier(a)fujitsu.com> wrote:
>
> I'm getting tons of these in the hosted-engine:/var/log/ovirt-engine/engine.log
>
> 2021-04-26 16:40:40,260-05 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-8) [] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM xxxx.yyyy.zzz command Get Host Capabilities failed: Internal JSON-RPC error: {'reason': 'internal error: Duplicate key'}
This looks like a bad response from vdsm. Can you share the vdsm log from
the host xxxx.yyyy.zzz?
/var/log/vdsm/vdsm.log
Nir
3 years, 8 months
Re: Attempting to detach a storage domain
by Benny Zlotnik
I think engine.log would be a good place to start; it would likely
tell us which host has a problem (if there is one).
On Sun, Apr 25, 2021 at 10:22 AM Matthew.Stier(a)fujitsu.com
<Matthew.Stier(a)fujitsu.com> wrote:
>
> Which logs, on which hosts should I be looking through?
>
> I have a hint of that from my research, but all 121 hosts are up and running.
>
> -----Original Message-----
> From: Benny Zlotnik <bzlotnik(a)redhat.com>
> Sent: Sunday, April 25, 2021 1:53 AM
> To: Stier, Matthew <Matthew.Stier(a)fujitsu.com>
> Cc: users(a)ovirt.org
> Subject: Re: [ovirt-users] Attempting to detach a storage domain
>
> What do the logs say?
> This usually means that not all hosts were able to disconnect from it
>
> On Sun, Apr 25, 2021 at 9:45 AM Matthew.Stier(a)fujitsu.com <Matthew.Stier(a)fujitsu.com> wrote:
> >
> > Ovirt: 4.3.10
> >
> > Storage: iSCSI
> >
> >
> >
> > Problem: Attempting to place a storage domain into ‘maintenance’ in preparation for detachment and destruction has left it hung in the ‘Preparing for Maintenance’ state.
> >
> >
> >
> > I have three storage domains I need to put into maintenance, detach and delete. When I placed the first and smallest (100GB) into maintenance mode, it switched to ‘Preparing for Maintenance’ and has been stuck there for hours.
> >
> >
> >
> > Early on, I was able to re-activate it, but I do want to remove it for use somewhere else, and I want to make sure I can remove it before I try to do the same with two 11TB storage domains.
> >
>
3 years, 8 months
How to reduce load during create VM
by ovirt.org@nevim.eu
Hi everyone, is it possible to limit the "Create VM" process somewhere, for example by setting up a quota, so that it does not load the server so much, especially the storage when the template disk is duplicated?
3 years, 8 months
oVirt on GCP
by mario.gandara@gmail.com
Hello, guys!
I've been trying to deploy oVirt on GCP for the last two weeks.
Has anyone done this on GCP using nested virtualization?
My scenario is:
One standalone oVirt Engine and two hosts.
In the future I expect to scale this solution.
Thank you.
Mário Gândara
3 years, 8 months
Problem with iSCSI multipathing identifying a failed path to a LUN every few seconds
by Gal Villaret
Hi All,
In the past few days, we have been having problems with hosts becoming unmanageable due to multipathd identifying false failed paths and VDSM crashing because of it.
We were running version 4.4.1, and upgrading to 4.4.5 (engine) and 4.4.6 (nodes) seems to have resolved the VDSM crashes.
However, we see that the multipathing events continue.
From what we have observed, the events start around the time the host reports low swap space. The low swap space seems to be related to a running Commvault backup operation. By running top on the host while a backup operation is running, I can see swap being consumed to 100% although there is plenty of RAM available.
Once the multipath events start, the only way to stop them has been to reboot the host.
This is the warning in oVirt UI:
Apr 24, 2021, 9:14:07 PM - Available swap memory of host Ovirt-Node2 [953 MB] is under defined threshold [1024 MB].
This is the first appearance of the multipath event in /var/log/messages:
Apr 24 21:14:58 ovirt-node2 kernel: sd 21:0:0:9: [sdgi] tag#77 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=15s
Apr 24 21:14:58 ovirt-node2 kernel: sd 21:0:0:9: [sdgi] tag#77 Sense Key : Aborted Command [current]
Apr 24 21:14:58 ovirt-node2 kernel: sd 21:0:0:9: [sdgi] tag#77 <<vendor>>ASC=0xc1 ASCQ=0x1
Apr 24 21:14:58 ovirt-node2 kernel: sd 21:0:0:9: [sdgi] tag#77 CDB: Read(16) 88 00 00 00 00 00 79 ac 75 a0 00 00 06 00 00 00
Apr 24 21:14:58 ovirt-node2 kernel: blk_update_request: I/O error, dev sdgi, sector 2041345440 op 0x0:(READ) flags 0x4200 phys_seg 192 prio class 0
Apr 24 21:14:58 ovirt-node2 kernel: device-mapper: multipath: 253:20: Failing path 131:224.
Apr 24 21:14:58 ovirt-node2 multipathd[3044]: sdgi: mark as failed
Apr 24 21:14:58 ovirt-node2 multipathd[3044]: 3600000e00d2c0000002cb4a8000b0000: remaining active paths: 11
Apr 24 21:15:03 ovirt-node2 multipathd[3044]: 3600000e00d2c0000002cb4a8000b0000: sdgi - tur checker reports path is up
Apr 24 21:15:03 ovirt-node2 multipathd[3044]: 131:224: reinstated
Apr 24 21:15:03 ovirt-node2 multipathd[3044]: 3600000e00d2c0000002cb4a8000b0000: remaining active paths: 12
Apr 24 21:15:03 ovirt-node2 kernel: device-mapper: multipath: 253:20: Reinstating path 131:224.
Apr 24 21:15:03 ovirt-node2 kernel: sd 21:0:0:9: alua: port group 8091 state A preferred supports toluSNA
Apr 24 21:15:03 ovirt-node2 kernel: sd 21:0:0:9: alua: port group 8091 state A preferred supports toluSNA
Apr 24 21:15:13 ovirt-node2 kernel: sd 13:0:0:9: [sdau] tag#25 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=15s
Apr 24 21:15:13 ovirt-node2 kernel: sd 13:0:0:9: [sdau] tag#25 Sense Key : Aborted Command [current]
Apr 24 21:15:13 ovirt-node2 kernel: sd 13:0:0:9: [sdau] tag#25 <<vendor>>ASC=0xc1 ASCQ=0x1
Apr 24 21:15:13 ovirt-node2 kernel: sd 13:0:0:9: [sdau] tag#25 CDB: Read(16) 88 00 00 00 00 00 79 ac 75 a0 00 00 06 00 00 00
Apr 24 21:15:13 ovirt-node2 kernel: blk_update_request: I/O error, dev sdau, sector 2041345440 op 0x0:(READ) flags 0x4200 phys_seg 192 prio class 0
Apr 24 21:15:13 ovirt-node2 kernel: device-mapper: multipath: 253:20: Failing path 66:224.
Apr 24 21:15:13 ovirt-node2 multipathd[3044]: sdau: mark as failed
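To see how often each path flaps, the multipathd lines can be tallied with a short script. This is a rough sketch that assumes the message format shown in the excerpt above; the function name is arbitrary, and the sample lines are copied from that excerpt.

```python
import re
from collections import Counter

# Patterns based on the multipathd messages quoted above.
FAILED = re.compile(r"multipathd\[\d+\]: (\w+): mark as failed")
RESTORED = re.compile(
    r"multipathd\[\d+\]: \w+: (\w+) - tur checker reports path is up"
)


def tally_path_events(lines):
    """Count 'mark as failed' and 'path is up' events per sd device."""
    failed, restored = Counter(), Counter()
    for line in lines:
        match = FAILED.search(line)
        if match:
            failed[match.group(1)] += 1
            continue
        match = RESTORED.search(line)
        if match:
            restored[match.group(1)] += 1
    return failed, restored


sample = [
    "Apr 24 21:14:58 ovirt-node2 multipathd[3044]: sdgi: mark as failed",
    "Apr 24 21:15:03 ovirt-node2 multipathd[3044]: "
    "3600000e00d2c0000002cb4a8000b0000: sdgi - tur checker reports path is up",
    "Apr 24 21:15:13 ovirt-node2 multipathd[3044]: sdau: mark as failed",
]

failed, restored = tally_path_events(sample)
print(dict(failed), dict(restored))
```

On a host, the same function could be fed the lines of /var/log/messages (for example an open file object) instead of the sample list.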
Underlying storage is Fujitsu DX200 S5 with all SSD drives.
Each host has two 10Gbit network adapters dedicated to iSCSI.
Any help with this would be highly appreciated.
Thanks,
Gal Villaret
3 years, 8 months