large size OVA import fails
by jason.l.cox@l3harris.com
I have a fairly large OVA (~200GB) that was exported from oVirt 4.3.5. I'm trying to import it into a new cluster, also running oVirt 4.3.5. The import starts fine but fails again and again.
Everything I can find online appears to be outdated, mentioning incorrect log file locations and saying virt-v2v does the import.
On the engine, in /var/log/ovirt-engine/engine.log, I can see where it runs the CreateImageVDSCommand, then a few entries about adding the disk, which end with USER_ADD_DISK_TO_VM_FINISHED_SUCCESS, then the Ansible command:
2019-08-20 15:40:38,653-04
Executing Ansible command: /usr/bin/ansible-playbook --ssh-common-args=-F /var/lib/ovirt-engine/.ssh/config -v --private-key=/etc/pki/ovirt-engine/keys/engine_id_rsa --inventory=/tmp/ansible-inventory8416464991088315694 --extra-vars=ovirt_import_ova_path="/mnt/vm_backups/myvm.ova" --extra-vars=ovirt_import_ova_disks="['/rhev/data-center/mnt/glusterSD/myhost.mydomain.com:_vmstore/59502c8b-fd1e-482b-bff7-39c699c196b3/images/886a3313-19a9-435d-aeac-64c2d507bb54/465ce2ba-8883-4378-bae7-e231047ea09d']" --extra-vars=ovirt_import_ova_image_mappings="{}" /usr/share/ovirt-engine/playbooks/ovirt-ova-import.yml [Logfile: /var/log/ovirt-engine/ova/ovirt-import-ova-ansible-20190820154038-myhost.mydomain.com-25f6ac6f-9bdc-4301-b896-d357712dbf01.log]
Then nothing about the import until:
2019-08-20 16:11:08,859-04 INFO [org.ovirt.engine.core.bll.exportimport.ImportVmFromOvaCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-88) [3321d4f6] Lock freed to object 'EngineLock:{exclusiveLocks='[myvm=VM_NAME, 464a25ba-8f0a-421d-a6ab-13eff67b4c96=VM]', sharedLocks=''}'
2019-08-20 16:11:08,894-04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-88) [3321d4f6] EVENT_ID: IMPORTEXPORT_IMPORT_VM_FAILED(1,153), Failed to import Vm myvm to Data Center Default, Cluster Default
I've found the import logs on the engine, in /var/log/ovirt-engine/ova, but the ovirt-import-ova-ansible*.log files for the imports in question only contain:
2019-08-20 19:59:48,799 p=44701 u=ovirt | Using /usr/share/ovirt-engine/playbooks/ansible.cfg as config file
2019-08-20 19:59:49,271 p=44701 u=ovirt | PLAY [all] *********************************************************************
2019-08-20 19:59:49,280 p=44701 u=ovirt | TASK [ovirt-ova-extract : Run extraction script] *******************************
Watching the host selected for the import, I can see the qemu-img convert process running, but then the engine frees the lock on the VM and reports the import as having failed. However, the qemu-img process continues to run on the host. I don't know where else to look to try and find out what's going on and I cannot see anything that says why the import failed.
Since the qemu-img process on the host is still running after the engine log shows the lock has been freed and import failed, I'm guessing what's happening is on the engine side.
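For what it's worth, this is how I'm keeping an eye on the host side (nothing oVirt-specific, just confirming the conversion stays alive after the engine gives up):
# on the host selected for the import
pgrep -af 'qemu-img convert'
# the process keeps running and writing long after the engine has logged
# IMPORTEXPORT_IMPORT_VM_FAILED and freed the lock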
Looking at the time between the start of the Ansible command and when the lock is freed, it is consistently around 30 minutes:
# first try
2019-08-20 15:40:38,653-04 ansible command start
2019-08-20 16:11:08,859-04 lock freed
31 minutes
# second try
2019-08-20 19:59:48,463-04 ansible command start
2019-08-20 20:30:21,697-04 lock freed
30 minutes, 33 seconds
# third try
2019-08-21 09:16:42,706-04 ansible command start
2019-08-21 09:46:47,103-04 lock freed
30 minutes, 45 seconds
With that in mind, I took a look at the available configuration keys from engine-config --list. After checking each one, the only key set to ~30 minutes that looked like it could be the problem was SSHInactivityHardTimeoutSeconds (set to 1800 seconds). I set it to 3600 and tried the import again, but it still failed at ~30 minutes, so that's apparently not the correct key.
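For reference, this is how I checked and changed the key (as far as I know, engine-config changes only take effect after an engine restart):
engine-config --list
engine-config -g SSHInactivityHardTimeoutSeconds
# it was 1800, so I raised it to an hour
engine-config -s SSHInactivityHardTimeoutSeconds=3600
systemctl restart ovirt-engine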
Also, just FYI, I tried to import the OVA using virt-v2v, but that fails immediately:
virt-v2v: error: expecting XML expression to return an integer (expression:
rasd:Parent/text(), matching string: 00000000-0000-0000-0000-000000000000)
If reporting bugs, run virt-v2v with debugging enabled and include the
complete output:
virt-v2v -v -x [...]
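For context, the invocation was along these lines (reconstructed from memory; the output mode and directory here are just placeholders, the real options may have differed):
virt-v2v -i ova /mnt/vm_backups/myvm.ova -o local -os /var/tmp/v2v-test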
Does virt-v2v not support OVAs created by the oVirt 'export to ova' option?
So my main question is: is there a timeout for VM imports through the engine web UI?
And if so, is it configurable?
Thanks in advance.
ovn networking
by Staniforth, Paul
Hello
In the latest release of the engine (4.3.5.5-1.el7), the commands
ovn-nbctl show
ovn-sbctl show
both seem to work but produce the errors:
net_mlx5: cannot load glue library: libibverbs.so.1: cannot open shared object file: No such file or directory
net_mlx5: cannot initialize PMD due to missing run-time dependency on rdma-core libraries (libibverbs, libmlx5)
PMD: net_mlx4: cannot load glue library: libibverbs.so.1: cannot open shared object file: No such file or directory
PMD: net_mlx4: cannot initialize PMD due to missing run-time dependency on rdma-core libraries (libibverbs, libmlx4)
should libibverbs be a dependency?
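In case it helps, this is how I was checking whether the library is actually present on the host (package names are my assumption for EL7; the messages look like the DPDK Mellanox poll-mode drivers failing to load rather than an OVN error):
ldconfig -p | grep libibverbs
rpm -q rdma-core libibverbs
# if they're missing, installing rdma-core should presumably satisfy the PMDs
yum install rdma-core libibverbs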
Paul S.
VM won't migrate back to original node
by Alexander Schichow
I set up a highly available VM which uses a shareable LUN. Migrating the VM away from its original node works fine, but when I try to migrate it back, I get an error.
If I suspend or shut down the VM, it returns to the original host, and then the cycle repeats. I'm really out of ideas, so if any of you happen to have one I would really appreciate it.
Thanks
Custom driver in oVirt node image?
by thomas@hoberg.net
I'm using three Mini-ITX Atom J5005 boards to build a three-node HCI test environment (actually there is a fourth to simulate a remote DC).
These boards have a Realtek 8169 GBit controller, which for hardware/firmware or driver reasons comes up just fine on cold boots, but struggles on warm boots or whenever the interface is re-configured, as is the case when the OVS overlay is configured, ...
* unless link-speed autonegotiation is enabled
after which it works just fine.
So for a normal system, I can just put a proper ETHTOOL_OPTS="autoneg on" into /etc/sysconfig/network-scripts/ifcfg-<device-name> to avoid trouble on warm boots, but once the overlay network is put on top, NetworkManager no longer controls the device and the standard "ignore" setting of the driver/hardware evidently causes the link to fail: dmesg reports the interface toggling and OVS isn't happy either.
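On a plain CentOS host that amounts to something like this (the device name is just an example):
# /etc/sysconfig/network-scripts/ifcfg-enp1s0   (example device name)
DEVICE=enp1s0
BOOTPROTO=none
ONBOOT=yes
ETHTOOL_OPTS="autoneg on"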
The Ethernet switch is unmanaged, so there is nothing I can do there to speed up or eliminate negotiations, and I currently see three options:
1. Find/be told where to re-enable Ethernet link-speed autonegotiation with OVS/VDSM/?? so the RTL8169 is happy again
2. Use a known USB3 GBit NIC (no space in the chassis to put e.g. an Intel NIC)
3. Make a USB3 2.5GBit NIC based on Realtek's RTL8156 work, but currently that requires compiling the driver from source and loading it at boot (roughly as sketched below)
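For reference, this is roughly what option 3 involves on a plain CentOS host today (the tarball name and Makefile targets are assumptions based on Realtek's out-of-tree r8152 driver, which also covers the RTL8156):
# build and load the vendor driver against the running kernel
yum install gcc make "kernel-devel-$(uname -r)"
tar xf r8152-*.tar.bz2 && cd r8152-*/
make
make install          # assumes the vendor Makefile provides the usual install target
depmod -a
modprobe r8152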
Option 3 works just fine with a CentOS base, but I had tons of problems making the oVirt hosted engine work with that.
Option 3 for the oVirt node image seems to clash with the read-only nature of the oVirt node image: while I can load a pre-compiled driver interactively, I can't make it stick with dracut -f.
I haven't immediately found a way to patch/fix/build derivative oVirt node images with that driver included, and I am somewhat too impatient to wait until it's mainstream, so any help would be appreciated.
I've verified option 2 with a single USB3 GBit adapter natively supported by the kernel that I already had available, but I then opted to jump the gun and buy the slightly more expensive 2.5GBit variant as a better match for a Gluster environment: a mistake in this context, as it turned out.
[ANN] oVirt 4.3.6 Third Release Candidate is now available for testing
by Sandro Bonazzola
The oVirt Project is pleased to announce the availability of the oVirt
4.3.6 Third Release Candidate for testing, as of August 22nd, 2019.
This update is a release candidate of the sixth in a series of
stabilization updates to the 4.3 series.
This is pre-release software. This pre-release should not be used in
production.
This release is available now on x86_64 architecture for:
* Red Hat Enterprise Linux 7.7 or later (but <8)
* CentOS Linux (or similar) 7.7 or later (but <8)
This release supports Hypervisor Hosts on x86_64 and ppc64le architectures
for:
* Red Hat Enterprise Linux 7.7 or later (but <8)
* CentOS Linux (or similar) 7.7 or later (but <8)
* oVirt Node 4.3 (available for x86_64 only) will be made available when
CentOS 7.7 is released.
See the release notes [1] for known issues, new features and bugs fixed.
Notes:
- oVirt Appliance is already available
- oVirt Node is not yet available, pending the CentOS 7.7 release
Additional Resources:
* Read more about the oVirt 4.3.6 release highlights:
http://www.ovirt.org/release/4.3.6/
* Get more oVirt Project updates on Twitter: https://twitter.com/ovirt
* Check out the latest project news on the oVirt blog:
http://www.ovirt.org/blog/
[1] http://www.ovirt.org/release/4.3.6/
--
Sandro Bonazzola
MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
Red Hat EMEA <https://www.redhat.com/>
sbonazzo(a)redhat.com
Red Hat respects your work life balance. Therefore there is no need to
answer this email out of your office hours.
Re: Where to configure iscsi initiator name?
by Gianluca Cecchi
On Wed, Aug 21, 2019 at 7:07 PM Dan Poltawski <dan.poltawski(a)tnp.net.uk>
wrote:
> When I added the first node a 'random' initiator name was generated of
> form:
>
> # cat /etc/iscsi/initiatorname.iscsi
> InitiatorName=iqn.1994-05.com.redhat:[RANDOM]
>
> Having attempted to add another node, this node has another initiator name
> generated and can't access the storage. Is there a way to configure this
> initiator name to a static value which will be configured when new nodes
> get added to the cluster? Or is there some reason for this i'm missing?
>
> thanks,
>
> Dan
>
oVirt nodes are iSCSI clients and each node needs to have a different
InitiatorName value.
If needed, you can modify the initiator name of a node before running discovery
from oVirt, then reboot it (or restart the iSCSI services).
You can edit the file or use the iscsi-iname command.
It must be named with an iqn name in the format of
iqn.YYYY-MM.reverse.domain.name:OptionalIdentifier
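For example, on the new host, before adding it to oVirt (the static name below is just an illustration, pick your own):
iscsi-iname                              # prints a fresh iqn.1994-05.com.redhat:... value
cat /etc/iscsi/initiatorname.iscsi
# or set a static, self-chosen value and restart the initiator
echo 'InitiatorName=iqn.2019-08.uk.net.tnp:ovirt-node2' > /etc/iscsi/initiatorname.iscsi
systemctl restart iscsid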
Typically you configure your iSCSI storage array to give all the oVirt hosts'
initiators access to the LUN to be used as a storage domain. Or you can
use CHAP authentication, or both.
See also here:
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/...
https://www.certdepot.net/rhel7-configure-iscsi-target-initiator-persiste...
HIH,
Gianluca
Data Center Contending/Non responsive - No SPM
by Matthew B
Hello,
I'm having a problem with oVirt - we're running 4.3 on CentOS 7:
[root@ovirt ~]# rpm -q ovirt-engine; uname -r
ovirt-engine-4.3.4.3-1.el7.noarch
3.10.0-957.21.2.el7.x86_64
Currently the Data Center alternates between Non responsive and contending
status, and SPM selection fails.
The error in the events tab is:
VDSM compute5.domain command HSMGetTaskStatusVDS failed: Volume does not exist: (u'2bffb8d0-dfb5-4b08-9c6b-716e11f280c2',)
Full error:
2019-08-21 13:53:53,507-0700 ERROR (tasks/8) [storage.TaskManager.Task] (Task='3076bb8c-7462-4573-a832-337da478ae0e') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 336, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 333, in startSpm
    self._upgradePool(expectedDomVersion, __securityOverride=True)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 484, in _upgradePool
    str(targetDomVersion))
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 1108, in _convertDomain
    targetFormat)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/formatconverter.py", line 447, in convert
    converter(repoPath, hostId, imageRepo, isMsd)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/formatconverter.py", line 405, in v5DomainConverter
    domain.convert_volumes_metadata(target_version)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/fileSD.py", line 813, in convert_volumes_metadata
    for vol in self.iter_volumes():
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 764, in iter_volumes
    yield self.produceVolume(img_id, vol_id)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 846, in produceVolume
    volUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/glusterVolume.py", line 45, in __init__
    volUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 817, in __init__
    self._manifest = self.manifestClass(repoPath, sdUUID, imgUUID, volUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/fileVolume.py", line 71, in __init__
    volUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 86, in __init__
    self.validate()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 112, in validate
    self.validateVolumePath()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/fileVolume.py", line 131, in validateVolumePath
    raise se.VolumeDoesNotExist(self.volUUID)
VolumeDoesNotExist: Volume does not exist: (u'2bffb8d0-dfb5-4b08-9c6b-716e11f280c2',)
Most of our VMs are still running, but we can't start or restart any VMs.
All of the domains show as down since SPM selection fails. Any thoughts?
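In case it's relevant, the next thing I was going to check is whether the volume from the traceback exists anywhere under the gluster domain mount (the path is assumed to be the standard glusterSD location on a host):
find /rhev/data-center/mnt/glusterSD/ -name '2bffb8d0-dfb5-4b08-9c6b-716e11f280c2*'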
Thanks,
-Matthew
Re: Where to configure iscsi initiator name?
by Vincent Royer
You have to add all the initiators for each node to the iSCSI LUN. At
least that's how I got it working. Then I think it's recommended to set up
multipath.
However, take this with a grain of salt, because iSCSI did not work well for me.
On Wed, Aug 21, 2019, 10:08 AM Dan Poltawski <dan.poltawski(a)tnp.net.uk>
wrote:
> When I added the first node a 'random' initiator name was generated of
> form:
>
> # cat /etc/iscsi/initiatorname.iscsi
> InitiatorName=iqn.1994-05.com.redhat:[RANDOM]
>
> Having attempted to add another node, this node has another initiator name
> generated and can't access the storage. Is there a way to configure this
> initiator name to a static value which will be configured when new nodes
> get added to the cluster? Or is there some reason for this i'm missing?
>
> thanks,
>
> Dan
hosted engine setup, iSCSI no LUNs shown
by billyburly@gmail.com
I'm trying to set up the hosted engine on top of iSCSI storage. It successfully logs in and gets the target, but the process errors out claiming there are no LUNs. Yet if you look on the host, the disks were added to the system.
[ INFO ] TASK [ovirt.hosted_engine_setup : iSCSI discover with REST API]
[ INFO ] ok: [localhost]
The following targets have been found:
[1] iqn.2001-04.com.billdurr.durrnet.vm-int:vmdata
TPGT: 1, portals:
192.168.47.10:3260
Please select a target (1) [1]: 1
[ INFO ] Getting iSCSI LUNs list
...
[ INFO ] TASK [ovirt.hosted_engine_setup : Get iSCSI LUNs]
[ INFO ] ok: [localhost]
[ ERROR ] Cannot find any LUN on the selected target
[ ERROR ] Unable to get target list
Here's what the config in targetcli looks like
[root@vm1 ~]# targetcli ls
o- / ..................................................................... [...]
o- backstores .......................................................... [...]
| o- block .............................................. [Storage Objects: 2]
| | o- p_iscsi_lun1 .............. [/dev/drbd0 (62.0GiB) write-thru activated]
| | | o- alua ............................................... [ALUA Groups: 1]
| | | o- default_tg_pt_gp ................... [ALUA state: Active/optimized]
| | o- p_iscsi_lun2 ............. [/dev/drbd1 (310.6GiB) write-thru activated]
| | o- alua ............................................... [ALUA Groups: 1]
| | o- default_tg_pt_gp ................... [ALUA state: Active/optimized]
| o- fileio ............................................. [Storage Objects: 0]
| o- pscsi .............................................. [Storage Objects: 0]
| o- ramdisk ............................................ [Storage Objects: 0]
o- iscsi ........................................................ [Targets: 1]
| o- iqn.2001-04.com.billdurr.durrnet.vm-int:vmdata ................ [TPGs: 1]
| o- tpg1 .............................................. [gen-acls, no-auth]
| o- acls ...................................................... [ACLs: 0]
| o- luns ...................................................... [LUNs: 2]
| | o- lun0 ......... [block/p_iscsi_lun1 (/dev/drbd0) (default_tg_pt_gp)]
| | o- lun1 ......... [block/p_iscsi_lun2 (/dev/drbd1) (default_tg_pt_gp)]
| o- portals ................................................ [Portals: 1]
| o- 192.168.47.10:3260 ........................................... [OK]
o- loopback ..................................................... [Targets: 0]
o- srpt ......................................................... [Targets: 0]
The two LUNs show up on the host after the hosted engine setup tries to enumerate the LUNs for the target
[root@vm1 ~]# lsscsi
[0:0:0:0] storage HP P420i 8.32 -
[0:1:0:0] disk HP LOGICAL VOLUME 8.32 /dev/sda
[0:1:0:1] disk HP LOGICAL VOLUME 8.32 /dev/sdb
[0:1:0:2] disk HP LOGICAL VOLUME 8.32 /dev/sdc
[11:0:0:0] disk LIO-ORG p_iscsi_lun1 4.0 /dev/sdd
[11:0:0:1] disk LIO-ORG p_iscsi_lun2 4.0 /dev/sde
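For completeness, this is how I confirmed the target and LUNs manually from the host (same portal and target as above; just the standard iscsiadm steps, nothing hosted-engine specific):
iscsiadm -m discovery -t sendtargets -p 192.168.47.10:3260
iscsiadm -m node -T iqn.2001-04.com.billdurr.durrnet.vm-int:vmdata -p 192.168.47.10:3260 --login
lsscsi                                   # the two LIO-ORG disks show up, as listed above
iscsiadm -m session -P 3 | grep -i 'Attached scsi disk'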
Reaction and timeouts to APD (All Paths Down)
by Gianluca Cecchi
Hello,
similar questions/discussions have already taken place in the past, but I think
it is worth digging a bit deeper if possible.
I focus on block-based storage; my environments are iSCSI based with
multipath, connected to Equallogic arrays.
Currently I have set no_path_retry to 4, so I have a 20-second timeout
(polling_interval=5).
Sometimes a planned network activity (not under my control) that should
have no impact "doesn't go so well" and I get longer outages (also
60-70 seconds), and then the usual oVirt reactions: soft fencing, VMs going
into paused state or "question mark" state...
Compared to vSphere, the same events apparently don't cause anything there (and
I see the same lost-path events in the datastore and ESXi host monitoring -->
Events pane).
This basically depends on vSphere's default APD timeout, which seems to be 140 seconds.
This difference in behavior gives the appearance of a better SLA for vSphere
during these short outages, and it is a sort of show-stopper for extending the
oVirt implementation....
Interesting vSphere KBs here:
Storage device has entered the All Paths Down state (2032934)
https://kb.vmware.com/s/article/2032934
Containing:
"Note: By default, the APD timeout is set to 140 seconds."
All Paths Down timeout for a storage device has expired (2032940)
https://kb.vmware.com/s/article/2032940
Path redundancy to the storage device is degraded (1009555)
https://kb.vmware.com/s/article/1009555
Storage device has recovered from the APD state (2032945)
https://kb.vmware.com/s/article/2032945
So the question is: what real risks do I run if I simulate that behavior
and set a no_path_retry value so that polling_interval x no_path_retry = 140
(with the default polling_interval=5 that would mean no_path_retry = 28)?
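Concretely, the change I would test looks something like the following drop-in (the vendor/product strings are what I believe my Equallogic reports, and my understanding is that vdsm leaves /etc/multipath/conf.d/*.conf alone, so treat both as assumptions):
# /etc/multipath/conf.d/eqlogic.conf
devices {
    device {
        vendor         "EQLOGIC"
        product        "100E-00"
        # 28 * polling_interval (5 s) = 140 s of queueing before failing I/O,
        # roughly matching vSphere's default APD timeout
        no_path_retry  28
    }
}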
BTW: I also have an environment based on RHV 4.3.5 and iSCSI, and in
parallel I opened a case (02452597) asking for clarification on what chances
there are of remaining in a supported configuration, so for logs and so on it
could help if Red Hat developers look into it.
Thanks,
Gianluca