large size OVA import fails
by jason.l.cox@l3harris.com
I have a fairly large OVA (~200GB) that was exported from oVirt 4.3.5. I'm trying to import it into a new cluster, also running oVirt 4.3.5. The import starts fine but fails again and again.
Everything I can find online appears to be outdated, mentioning incorrect log file locations and saying virt-v2v does the import.
On the engine, in /var/log/ovirt-engine/engine.log, I can see where it runs the CreateImageVDSCommand, then a few entries about adding the disk, which end with USER_ADD_DISK_TO_VM_FINISHED_SUCCESS, then the Ansible command:
2019-08-20 15:40:38,653-04
Executing Ansible command: /usr/bin/ansible-playbook --ssh-common-args=-F /var/lib/ovirt-engine/.ssh/config -v --private-key=/etc/pki/ovirt-engine/keys/engine_id_rsa --inventory=/tmp/ansible-inventory8416464991088315694 --extra-vars=ovirt_import_ova_path="/mnt/vm_backups/myvm.ova" --extra-vars=ovirt_import_ova_disks="['/rhev/data-center/mnt/glusterSD/myhost.mydomain.com:_vmstore/59502c8b-fd1e-482b-bff7-39c699c196b3/images/886a3313-19a9-435d-aeac-64c2d507bb54/465ce2ba-8883-4378-bae7-e231047ea09d']" --extra-vars=ovirt_import_ova_image_mappings="{}" /usr/share/ovirt-engine/playbooks/ovirt-ova-import.yml [Logfile: /var/log/ovirt-engine/ova/ovirt-import-ova-ansible-20190820154038-myhost.mydomain.com-25f6ac6f-9bdc-4301-b896-d357712dbf01.log]
Then nothing about the import until:
2019-08-20 16:11:08,859-04 INFO [org.ovirt.engine.core.bll.exportimport.ImportVmFromOvaCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-88) [3321d4f6] Lock freed to object 'EngineLock:{exclusiveLocks='[myvm=VM_NAME, 464a25ba-8f0a-421d-a6ab-13eff67b4c96=VM]', sharedLocks=''}'
2019-08-20 16:11:08,894-04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-88) [3321d4f6] EVENT_ID: IMPORTEXPORT_IMPORT_VM_FAILED(1,153), Failed to import Vm myvm to Data Center Default, Cluster Default
I've found the import logs on the engine, in /var/log/ovirt-engine/ova, but the ovirt-import-ova-ansible*.log files for the imports in question only contain:
2019-08-20 19:59:48,799 p=44701 u=ovirt | Using /usr/share/ovirt-engine/playbooks/ansible.cfg as config file
2019-08-20 19:59:49,271 p=44701 u=ovirt | PLAY [all] *********************************************************************
2019-08-20 19:59:49,280 p=44701 u=ovirt | TASK [ovirt-ova-extract : Run extraction script] *******************************
Watching the host selected for the import, I can see the qemu-img convert process running, but then the engine frees the lock on the VM and reports the import as having failed. However, the qemu-img process continues to run on the host. I don't know where else to look to try and find out what's going on and I cannot see anything that says why the import failed.
Since the qemu-img process on the host is still running after the engine log shows the lock has been freed and import failed, I'm guessing what's happening is on the engine side.
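For what it's worth, this is how I'm keeping an eye on the host side (nothing oVirt-specific, just confirming the conversion stays alive after the engine gives up):
# on the host selected for the import
pgrep -af 'qemu-img convert'
# the process keeps running and writing long after the engine has logged
# IMPORTEXPORT_IMPORT_VM_FAILED and freed the lock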
Looking at the time between the start of the Ansible command and when the lock is freed, it is consistently around 30 minutes:
# first try
2019-08-20 15:40:38,653-04 ansible command start
2019-08-20 16:11:08,859-04 lock freed
31 minutes
# second try
2019-08-20 19:59:48,463-04 ansible command start
2019-08-20 20:30:21,697-04 lock freed
30 minutes, 33 seconds
# third try
2019-08-21 09:16:42,706-04 ansible command start
2019-08-21 09:46:47,103-04 lock freed
30 minutes, 45 seconds
With that in mind, I took a look at the available configuration keys from engine-config --list. After checking each one, the only key set to ~30 minutes that looked like it could be the problem was SSHInactivityHardTimeoutSeconds (set to 1800 seconds). I set it to 3600 and tried the import again, but it still failed at ~30 minutes, so that's apparently not the correct key.
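For reference, this is how I checked and changed the key (as far as I know, engine-config changes only take effect after an engine restart):
engine-config --list
engine-config -g SSHInactivityHardTimeoutSeconds
# it was 1800, so I raised it to an hour
engine-config -s SSHInactivityHardTimeoutSeconds=3600
systemctl restart ovirt-engine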
Also, just FYI, I tried to import the OVA using virt-v2v, but that fails immediately:
virt-v2v: error: expecting XML expression to return an integer (expression:
rasd:Parent/text(), matching string: 00000000-0000-0000-0000-000000000000)
If reporting bugs, run virt-v2v with debugging enabled and include the
complete output:
virt-v2v -v -x [...]
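For context, the invocation was along these lines (reconstructed from memory; the output mode and directory here are just placeholders, the real options may have differed):
virt-v2v -i ova /mnt/vm_backups/myvm.ova -o local -os /var/tmp/v2v-test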
Does virt-v2v not support OVAs created by the oVirt 'export to ova' option?
So my main question is: is there a timeout for VM imports through the engine web UI?
And if so, is it configurable?
Thanks in advance.
ovn networking
by Staniforth, Paul
Hello
In the latest release of the engine (4.3.5.5-1.el7), the commands
ovn-nbctl show
ovn-sbctl show
both seem to work but produce the errors:
net_mlx5: cannot load glue library: libibverbs.so.1: cannot open shared object file: No such file or directory
net_mlx5: cannot initialize PMD due to missing run-time dependency on rdma-core libraries (libibverbs, libmlx5)
PMD: net_mlx4: cannot load glue library: libibverbs.so.1: cannot open shared object file: No such file or directory
PMD: net_mlx4: cannot initialize PMD due to missing run-time dependency on rdma-core libraries (libibverbs, libmlx4)
should libibverbs be a dependency?
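In case it helps, this is how I was checking whether the library is actually present on the host (package names are my assumption for EL7; the messages look like the DPDK Mellanox poll-mode drivers failing to load rather than an OVN error):
ldconfig -p | grep libibverbs
rpm -q rdma-core libibverbs
# if they're missing, installing rdma-core should presumably satisfy the PMDs
yum install rdma-core libibverbs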
Paul S.
VM won't migrate back to original node
by Alexander Schichow
I set up a highly available VM which uses a shareable LUN. Migrating the VM away from its original node works fine, but when I try to migrate it back, I get an error.
If I suspend or shut down the VM, it returns to the original host, and then the cycle repeats. I'm really out of ideas, so if any of you happen to have one I would really appreciate it.
Thanks
Custom driver in oVirt node image?
by thomas@hoberg.net
I'm using three Mini-ITX Atom J5005 boards to build a three-node HCI test environment (actually there is a fourth to simulate a remote DC).
These boards have a Realtek 8169 GBit controller, which for hardware/firmware or driver reasons comes up just fine on cold boots, but struggles on warm boots or whenever the interface is re-configured, as is the case when the OVS overlay is configured, ...
* unless link-speed autonegotiation is enabled
after which it works just fine.
So for a normal system, I can just put a proper ETHTOOL_OPTS="autoneg on" into /etc/sysconfig/network-scripts/ifcfg-<device-name> to avoid trouble on warm boots, but once the overlay network is put on top, NetworkManager no longer controls the device and the standard "ignore" setting of the driver/hardware evidently causes the link to fail: dmesg reports the interface toggling and OVS isn't happy either.
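On a plain CentOS host that amounts to something like this (the device name is just an example):
# /etc/sysconfig/network-scripts/ifcfg-enp1s0   (example device name)
DEVICE=enp1s0
BOOTPROTO=none
ONBOOT=yes
ETHTOOL_OPTS="autoneg on"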
The Ethernet switch is unmanaged, so there is nothing I can do there to speed up or eliminate negotiations, and I currently see three options:
1. Find/be told where to re-enable Ethernet link-speed autonegotiation with OVS/VDSM/?? so the RTL8169 is happy again
2. Use a known USB3 GBit NIC (no space in the chassis to put e.g. an Intel NIC)
3. Make a USB3 2.5GBit NIC based on Realtek's RTL8156 work, but currently that requires compiling the driver from source and loading it at boot (roughly as sketched below)
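For reference, this is roughly what option 3 involves on a plain CentOS host today (the tarball name and Makefile targets are assumptions based on Realtek's out-of-tree r8152 driver, which also covers the RTL8156):
# build and load the vendor driver against the running kernel
yum install gcc make "kernel-devel-$(uname -r)"
tar xf r8152-*.tar.bz2 && cd r8152-*/
make
make install          # assumes the vendor Makefile provides the usual install target
depmod -a
modprobe r8152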
Option 3 works just fine with a CentOS base, but I had tons of problems making the oVirt hosted engine work with that.
Option 3 for the oVirt node image seems to clash with the read-only nature of the oVirt node image: while I can load a pre-compiled driver interactively, I can't make it stick with dracut -f.
I haven't immediately found a way to patch/fix/build derivative oVirt node images with that driver included, and I am somewhat too impatient to wait until it's mainstream, so any help would be appreciated.
I've verified option 2 with a single USB3 GBit adapter natively supported by the kernel that I already had available, but I then opted to jump the gun and buy the slightly more expensive 2.5GBit variant as a better match for a Gluster environment: a mistake in this context, as it turned out.
[ANN] oVirt 4.3.6 Third Release Candidate is now available for testing
by Sandro Bonazzola
The oVirt Project is pleased to announce the availability of the oVirt
4.3.6 Third Release Candidate for testing, as of August 22nd, 2019.
This update is a release candidate of the sixth in a series of
stabilization updates to the 4.3 series.
This is pre-release software. This pre-release should not be used in
production.
This release is available now on x86_64 architecture for:
* Red Hat Enterprise Linux 7.7 or later (but <8)
* CentOS Linux (or similar) 7.7 or later (but <8)
This release supports Hypervisor Hosts on x86_64 and ppc64le architectures
for:
* Red Hat Enterprise Linux 7.7 or later (but <8)
* CentOS Linux (or similar) 7.7 or later (but <8)
* oVirt Node 4.3 (available for x86_64 only) will be made available when
CentOS 7.7 is released.
See the release notes [1] for known issues, new features and bugs fixed.
Notes:
- oVirt Appliance is already available
- oVirt Node is not yet available, pending the CentOS 7.7 release
Additional Resources:
* Read more about the oVirt 4.3.6 release highlights:
http://www.ovirt.org/release/4.3.6/
* Get more oVirt Project updates on Twitter: https://twitter.com/ovirt
* Check out the latest project news on the oVirt blog:
http://www.ovirt.org/blog/
[1] http://www.ovirt.org/release/4.3.6/
--
Sandro Bonazzola
MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
Red Hat EMEA <https://www.redhat.com/>
sbonazzo(a)redhat.com
Red Hat respects your work life balance. Therefore there is no need to
answer this email out of your office hours.
Re: Where to configure iscsi initiator name?
by Gianluca Cecchi
On Wed, Aug 21, 2019 at 7:07 PM Dan Poltawski <dan.poltawski(a)tnp.net.uk>
wrote:
> When I added the first node a 'random' initiator name was generated of
> form:
>
> # cat /etc/iscsi/initiatorname.iscsi
> InitiatorName=iqn.1994-05.com.redhat:[RANDOM]
>
> Having attempted to add another node, this node has another initiator name
> generated and can't access the storage. Is there a way to configure this
> initiator name to a static value which will be configured when new nodes
> get added to the cluster? Or is there some reason for this i'm missing?
>
> thanks,
>
> Dan
>
oVirt nodes are iSCSI clients and each node needs to have a different
InitiatorName value.
If needed, you can modify the initiator name of a node before running discovery
from oVirt, then reboot it (or restart the iSCSI services).
You can edit the file or use the iscsi-iname command.
It must be named with an iqn name in the format of
iqn.YYYY-MM.reverse.domain.name:OptionalIdentifier
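For example, on the new host, before adding it to oVirt (the static name below is just an illustration, pick your own):
iscsi-iname                              # prints a fresh iqn.1994-05.com.redhat:... value
cat /etc/iscsi/initiatorname.iscsi
# or set a static, self-chosen value and restart the initiator
echo 'InitiatorName=iqn.2019-08.uk.net.tnp:ovirt-node2' > /etc/iscsi/initiatorname.iscsi
systemctl restart iscsid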
Typically you configure your iSCSI storage array to give all the oVirt hosts'
initiators access to the LUN to be used as a storage domain. Or you can
use CHAP authentication, or both.
See also here:
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/...
https://www.certdepot.net/rhel7-configure-iscsi-target-initiator-persiste...
HIH,
Gianluca
Data Center Contending/Non responsive - No SPM
by Matthew B
Hello,
I'm having a problem with oVirt - we're running 4.3 on CentOS 7:
[root@ovirt ~]# rpm -q ovirt-engine; uname -r
ovirt-engine-4.3.4.3-1.el7.noarch
3.10.0-957.21.2.el7.x86_64
Currently the Data Center alternates between Non responsive and contending
status, and SPM selection fails.
The error in the events tab is:
VDSM compute5.domain command HSMGetTaskStatusVDS failed: Volume does not exist: (u'2bffb8d0-dfb5-4b08-9c6b-716e11f280c2',)
Full error:
2019-08-21 13:53:53,507-0700 ERROR (tasks/8) [storage.TaskManager.Task] (Task='3076bb8c-7462-4573-a832-337da478ae0e') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 336, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 333, in startSpm
    self._upgradePool(expectedDomVersion, __securityOverride=True)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 484, in _upgradePool
    str(targetDomVersion))
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 1108, in _convertDomain
    targetFormat)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/formatconverter.py", line 447, in convert
    converter(repoPath, hostId, imageRepo, isMsd)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/formatconverter.py", line 405, in v5DomainConverter
    domain.convert_volumes_metadata(target_version)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/fileSD.py", line 813, in convert_volumes_metadata
    for vol in self.iter_volumes():
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 764, in iter_volumes
    yield self.produceVolume(img_id, vol_id)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 846, in produceVolume
    volUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/glusterVolume.py", line 45, in __init__
    volUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 817, in __init__
    self._manifest = self.manifestClass(repoPath, sdUUID, imgUUID, volUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/fileVolume.py", line 71, in __init__
    volUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 86, in __init__
    self.validate()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 112, in validate
    self.validateVolumePath()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/fileVolume.py", line 131, in validateVolumePath
    raise se.VolumeDoesNotExist(self.volUUID)
VolumeDoesNotExist: Volume does not exist: (u'2bffb8d0-dfb5-4b08-9c6b-716e11f280c2',)
Most of our VMs are still running, but we can't start or restart any VMs.
All of the domains show as down since SPM selection fails. Any thoughts?
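In case it's relevant, the next thing I was going to check is whether the volume from the traceback exists anywhere under the gluster domain mount (the path is assumed to be the standard glusterSD location on a host):
find /rhev/data-center/mnt/glusterSD/ -name '2bffb8d0-dfb5-4b08-9c6b-716e11f280c2*'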
Thanks,
-Matthew
Re: Where to configure iscsi initiator name?
by Vincent Royer
You have to add all the initiators for each node to the iSCSI LUN. At
least that's how I got it working. Then I think it's recommended to set up
multipath.
However, take this with a grain of salt, because iSCSI did not work well for me.
On Wed, Aug 21, 2019, 10:08 AM Dan Poltawski <dan.poltawski(a)tnp.net.uk>
wrote:
> When I added the first node a 'random' initiator name was generated of
> form:
>
> # cat /etc/iscsi/initiatorname.iscsi
> InitiatorName=iqn.1994-05.com.redhat:[RANDOM]
>
> Having attempted to add another node, this node has another initiator name
> generated and can't access the storage. Is there a way to configure this
> initiator name to a static value which will be configured when new nodes
> get added to the cluster? Or is there some reason for this i'm missing?
>
> thanks,
>
> Dan
hosted engine setup, iSCSI no LUNs shown
by billyburly@gmail.com
I'm trying to set up the hosted engine on top of iSCSI storage. It successfully logs in and gets the target, but the process errors out claiming there are no LUNs. Yet if you look on the host, the disks were added to the system.
[ INFO ] TASK [ovirt.hosted_engine_setup : iSCSI discover with REST API]
[ INFO ] ok: [localhost]
The following targets have been found:
[1] iqn.2001-04.com.billdurr.durrnet.vm-int:vmdata
TPGT: 1, portals:
192.168.47.10:3260
Please select a target (1) [1]: 1
[ INFO ] Getting iSCSI LUNs list
...
[ INFO ] TASK [ovirt.hosted_engine_setup : Get iSCSI LUNs]
[ INFO ] ok: [localhost]
[ ERROR ] Cannot find any LUN on the selected target
[ ERROR ] Unable to get target list
Here's what the config in targetcli looks like
[root@vm1 ~]# targetcli ls
o- / ..................................................................... [...]
o- backstores .......................................................... [...]
| o- block .............................................. [Storage Objects: 2]
| | o- p_iscsi_lun1 .............. [/dev/drbd0 (62.0GiB) write-thru activated]
| | | o- alua ............................................... [ALUA Groups: 1]
| | | o- default_tg_pt_gp ................... [ALUA state: Active/optimized]
| | o- p_iscsi_lun2 ............. [/dev/drbd1 (310.6GiB) write-thru activated]
| | o- alua ............................................... [ALUA Groups: 1]
| | o- default_tg_pt_gp ................... [ALUA state: Active/optimized]
| o- fileio ............................................. [Storage Objects: 0]
| o- pscsi .............................................. [Storage Objects: 0]
| o- ramdisk ............................................ [Storage Objects: 0]
o- iscsi ........................................................ [Targets: 1]
| o- iqn.2001-04.com.billdurr.durrnet.vm-int:vmdata ................ [TPGs: 1]
| o- tpg1 .............................................. [gen-acls, no-auth]
| o- acls ...................................................... [ACLs: 0]
| o- luns ...................................................... [LUNs: 2]
| | o- lun0 ......... [block/p_iscsi_lun1 (/dev/drbd0) (default_tg_pt_gp)]
| | o- lun1 ......... [block/p_iscsi_lun2 (/dev/drbd1) (default_tg_pt_gp)]
| o- portals ................................................ [Portals: 1]
| o- 192.168.47.10:3260 ........................................... [OK]
o- loopback ..................................................... [Targets: 0]
o- srpt ......................................................... [Targets: 0]
The two LUNs show up on the host after the hosted engine setup tries to enumerate the LUNs for the target
[root@vm1 ~]# lsscsi
[0:0:0:0] storage HP P420i 8.32 -
[0:1:0:0] disk HP LOGICAL VOLUME 8.32 /dev/sda
[0:1:0:1] disk HP LOGICAL VOLUME 8.32 /dev/sdb
[0:1:0:2] disk HP LOGICAL VOLUME 8.32 /dev/sdc
[11:0:0:0] disk LIO-ORG p_iscsi_lun1 4.0 /dev/sdd
[11:0:0:1] disk LIO-ORG p_iscsi_lun2 4.0 /dev/sde
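For completeness, this is how I confirmed the target and LUNs manually from the host (same portal and target as above; just the standard iscsiadm steps, nothing hosted-engine specific):
iscsiadm -m discovery -t sendtargets -p 192.168.47.10:3260
iscsiadm -m node -T iqn.2001-04.com.billdurr.durrnet.vm-int:vmdata -p 192.168.47.10:3260 --login
lsscsi                                   # the two LIO-ORG disks show up, as listed above
iscsiadm -m session -P 3 | grep -i 'Attached scsi disk'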
Reaction and timeouts to APD (All Paths Down)
by Gianluca Cecchi
Hello,
similar questions/discussions have already taken place in the past, but I think
it is worth digging a bit deeper if possible.
I focus on block-based storage; my environments are iSCSI based with
multipath, connected to Equallogic arrays.
Currently I have set no_path_retry to 4, so I have a 20-second timeout
(polling_interval=5).
Sometimes a planned network activity (not under my control) that should
have no impact "doesn't go so well" and I get longer outages (also
60-70 seconds), and then the usual oVirt reactions: soft fencing, VMs going
into paused state or "question mark" state...
Compared to vSphere, the same events apparently don't cause anything there (and
I see the same lost-path events in the datastore and ESXi host monitoring -->
Events pane).
This basically depends on vSphere's default APD timeout, which seems to be 140 seconds.
This difference in behavior gives the appearance of a better SLA for vSphere
during these short outages, and it is a sort of show-stopper for extending the
oVirt implementation....
Interesting vSphere KBs here:
Storage device has entered the All Paths Down state (2032934)
https://kb.vmware.com/s/article/2032934
Containing:
"Note: By default, the APD timeout is set to 140 seconds."
All Paths Down timeout for a storage device has expired (2032940)
https://kb.vmware.com/s/article/2032940
Path redundancy to the storage device is degraded (1009555)
https://kb.vmware.com/s/article/1009555
Storage device has recovered from the APD state (2032945)
https://kb.vmware.com/s/article/2032945
So the question is: what real risks do I run if I simulate that behavior
and set a no_path_retry value so that polling_interval x no_path_retry = 140
(with the default polling_interval=5 that would mean no_path_retry = 28)?
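Concretely, the change I would test looks something like the following drop-in (the vendor/product strings are what I believe my Equallogic reports, and my understanding is that vdsm leaves /etc/multipath/conf.d/*.conf alone, so treat both as assumptions):
# /etc/multipath/conf.d/eqlogic.conf
devices {
    device {
        vendor         "EQLOGIC"
        product        "100E-00"
        # 28 * polling_interval (5 s) = 140 s of queueing before failing I/O,
        # roughly matching vSphere's default APD timeout
        no_path_retry  28
    }
}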
BTW: I also have an environment based on RHV 4.3.5 and iSCSI, and in
parallel I opened a case (02452597) asking for clarification on what chances
there are of remaining in a supported configuration, so for logs and so on it
could help if Red Hat developers look into it.
Thanks,
Gianluca