Engine restore errors out on "Wait for OVF_STORE disk content"

Hello,

when trying to deploy the engine using "hosted-engine --deploy --restore-from-file=myenginebackup" the ansible playbook errors out at:

[ INFO ] TASK [ovirt.hosted_engine_setup : Trigger hosted engine OVF update and enable the serial console]
[ INFO ] changed: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Wait until OVF update finishes]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Parse OVF_STORE disk list]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Check OVF_STORE volume status]
[ INFO ] changed: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Wait for OVF_STORE disk content]
[ ERROR ] {u'_ansible_parsed': True, u'stderr_lines': [u'20+0 records in', u'20+0 records out', u'10240 bytes (10 kB) copied, 0.000141645 s, 72.3 MB/s', u'tar: ebb09b0e-2d03-40f0-8fa4-c40b18612a54.ovf: Not found in archive', u'tar: Exiting with failure status due to previous errors'], u'changed': True, u'end': u'2019-05-08 15:21:47.595195', u'_ansible_item_label': {u'image_id': u'65fd6c57-033c-4c95-87c1-b16c26e4bc98', u'name': u'OVF_STORE', u'id': u'9ff8b389-5e24-4166-9842-f1d6104b662b'}, u'stdout': u'', u'failed': True, u'_ansible_item_result': True, u'msg': u'non-zero return code', u'rc': 2, u'start': u'2019-05-08 15:21:46.906877', u'attempts': 12, u'cmd': u"vdsm-client Image prepare storagepoolID=597f329c-0296-03af-0369-000000000139 storagedomainID=f708ced4-e339-4d02-a07f-78f1a30fc2a8 imageID=9ff8b389-5e24-4166-9842-f1d6104b662b volumeID=65fd6c57-033c-4c95-87c1-b16c26e4bc98 | grep path | awk '{ print $2 }' | xargs -I{} sudo -u vdsm dd if={} | tar -tvf - ebb09b0e-2d03-40f0-8fa4-c40b18612a54.ovf", u'item': {u'image_id': u'65fd6c57-033c-4c95-87c1-b16c26e4bc98', u'name': u'OVF_STORE', u'id': u'9ff8b389-5e24-4166-9842-f1d6104b662b'}, u'delta': u'0:00:00.688318', u'invocation': {u'module_args': {u'warn': False, u'executable': None, u'_uses_shell': True, u'_raw_params': u"vdsm-client Image prepare storagepoolID=597f329c-0296-03af-0369-000000000139 storagedomainID=f708ced4-e339-4d02-a07f-78f1a30fc2a8 imageID=9ff8b389-5e24-4166-9842-f1d6104b662b volumeID=65fd6c57-033c-4c95-87c1-b16c26e4bc98 | grep path | awk '{ print $2 }' | xargs -I{} sudo -u vdsm dd if={} | tar -tvf - ebb09b0e-2d03-40f0-8fa4-c40b18612a54.ovf", u'removes': None, u'argv': None, u'creates': None, u'chdir': None, u'stdin': None}}, u'stdout_lines': [], u'stderr': u'20+0 records in\n20+0 records out\n10240 bytes (10 kB) copied, 0.000141645 s, 72.3 MB/s\ntar: ebb09b0e-2d03-40f0-8fa4-c40b18612a54.ovf: Not found in archive\ntar: Exiting with failure status due to previous errors', u'_ansible_no_log': False}
[ ERROR ] {u'_ansible_parsed': True, u'stderr_lines': [u'20+0 records in', u'20+0 records out', u'10240 bytes (10 kB) copied, 0.000140541 s, 72.9 MB/s', u'tar: ebb09b0e-2d03-40f0-8fa4-c40b18612a54.ovf: Not found in archive', u'tar: Exiting with failure status due to previous errors'], u'changed': True, u'end': u'2019-05-08 15:24:01.387469', u'_ansible_item_label': {u'image_id': u'dacf9ad8-77b9-4205-8ca2-d6877627ad4a', u'name': u'OVF_STORE', u'id': u'8691076a-8e45-4429-a18a-5faebef866cc'}, u'stdout': u'', u'failed': True, u'_ansible_item_result': True, u'msg': u'non-zero return code', u'rc': 2, u'start': u'2019-05-08 15:24:00.660309', u'attempts': 12, u'cmd': u"vdsm-client Image prepare storagepoolID=597f329c-0296-03af-0369-000000000139 storagedomainID=f708ced4-e339-4d02-a07f-78f1a30fc2a8 imageID=8691076a-8e45-4429-a18a-5faebef866cc volumeID=dacf9ad8-77b9-4205-8ca2-d6877627ad4a | grep path | awk '{ print $2 }' | xargs -I{} sudo -u vdsm dd if={} | tar -tvf - ebb09b0e-2d03-40f0-8fa4-c40b18612a54.ovf", u'item': {u'image_id': u'dacf9ad8-77b9-4205-8ca2-d6877627ad4a', u'name': u'OVF_STORE', u'id': u'8691076a-8e45-4429-a18a-5faebef866cc'}, u'delta': u'0:00:00.727160', u'invocation': {u'module_args': {u'warn': False, u'executable': None, u'_uses_shell': True, u'_raw_params': u"vdsm-client Image prepare storagepoolID=597f329c-0296-03af-0369-000000000139 storagedomainID=f708ced4-e339-4d02-a07f-78f1a30fc2a8 imageID=8691076a-8e45-4429-a18a-5faebef866cc volumeID=dacf9ad8-77b9-4205-8ca2-d6877627ad4a | grep path | awk '{ print $2 }' | xargs -I{} sudo -u vdsm dd if={} | tar -tvf - ebb09b0e-2d03-40f0-8fa4-c40b18612a54.ovf", u'removes': None, u'argv': None, u'creates': None, u'chdir': None, u'stdin': None}}, u'stdout_lines': [], u'stderr': u'20+0 records in\n20+0 records out\n10240 bytes (10 kB) copied, 0.000140541 s, 72.9 MB/s\ntar: ebb09b0e-2d03-40f0-8fa4-c40b18612a54.ovf: Not found in archive\ntar: Exiting with failure status due to previous errors', u'_ansible_no_log': False}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook

I tried twice. Same result. Should I retry? (A manual way to reproduce the failing check is sketched in the P.S. below.) Is it safe to use the local hosted engine for starting/stopping VMs? I'm kind of headless for some days :-)

Best regards.
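
P.S. In case it helps with debugging: the failing check can be reproduced by hand to see what the OVF_STORE tar really contains. A rough, read-only sketch using the IDs from the first error above — it is the same pipeline the task runs, only without naming a specific OVF file, so tar lists every entry in the archive:

    # run on the deployment host, as root
    vdsm-client Image prepare \
        storagepoolID=597f329c-0296-03af-0369-000000000139 \
        storagedomainID=f708ced4-e339-4d02-a07f-78f1a30fc2a8 \
        imageID=9ff8b389-5e24-4166-9842-f1d6104b662b \
        volumeID=65fd6c57-033c-4c95-87c1-b16c26e4bc98 \
      | grep path | awk '{ print $2 }' \
      | xargs -I{} sudo -u vdsm dd if={} | tar -tvf -

If the VM's OVF (ebb09b0e-2d03-40f0-8fa4-c40b18612a54.ovf in my case) is missing from that listing, the task will keep failing no matter how often it retries.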

On Wed, May 8, 2019 at 3:44 PM Andreas Elvers <andreas.elvers+ovirtforum@solutions.work> wrote:
Hello,
when trying to deploy the engine using "hosted-engine --deploy --restore-from-file=myenginebackup" the ansible playbook errors out at
[ INFO ] TASK [ovirt.hosted_engine_setup : Trigger hosted engine OVF update and enable the serial console]
[ INFO ] changed: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Wait until OVF update finishes]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Parse OVF_STORE disk list]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Check OVF_STORE volume status]
[ INFO ] changed: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Wait for OVF_STORE disk content]
[ ERROR ] [...]
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook
I tried twice. Same result. Should I retry?
We had a bug about that in the past: https://bugzilla.redhat.com/show_bug.cgi?id=1644748 but it's reported as CLOSED CURRENTRELEASE. Can I ask which versions of ovirt-hosted-engine-setup, ovirt-ansible-hosted-engine-setup and ovirt-engine-appliance you are using? (A quick way to list them is sketched below.) I see that you sent more than one email in the last few days, and in general all the issues you are reporting are due to timeouts/race conditions. Can you please provide some more info about your storage configuration?
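For instance, something along these lines on the host should show all three packages (just a convenience one-liner, adapt as needed):

    rpm -qa | grep -E 'ovirt-hosted-engine-setup|ovirt-ansible-hosted-engine-setup|ovirt-engine-appliance'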
Is it safe to use the local hosted engine for starting/stopping VMs? I'm kind of headless for some days :-)
Best regards.
--
Simone Tiraboschi
He / Him / His
Principal Software Engineer
Red Hat <https://www.redhat.com/>
stirabos@redhat.com

Yeah, I dug up that bug report as well.

Our oVirt setup comprises 6 oVirt hosts which form 2 datacenters.

Datacenter Luise (has a Ceph storage domain):
- Cluster Luise01: node01 to node03, a hyperconverged oVirt GlusterFS setup. My original goal is to move the engine onto this cluster.
- Cluster Luise02: node06 is currently the only node. This cluster will gain all nodes from the Default datacenter as soon as that one is decommissioned.

Datacenter Default:
- Cluster Default: node04 and node05. It uses NFS storage (see below). The engine used to live here.

All nodes are Supermicro, 4 cores, 48 GB RAM, Intel Xeon E5506 @ 2.13 GHz. They use two bonded 1 Gbit links for connectivity. I was sceptical about performance, but we are running around 10-20 TYPO3 CMSes plus less IO-hungry things in around 3 to 6 VMs per node. They all use Ceph as the storage domain. All nodes I tried to deploy to are without virtual load.

Storage:
========
The GlusterFS is provided by the hyperconverged setup of oVirt Node NG. It is quite small because it is just our gateway to boot the Cinder VM that guides our Ceph nodes to their RBD destiny. Every node has a 250 GB SSD. This gives us 90 GB for the :/engine Gluster volume. There is also a 100 GB volume for the supporting VMs that provide the bootstrap for Cinder. There is little IO going to these volumes (a couple of quick health checks for them are sketched at the end of this mail). The Ceph cluster consists of 3 nodes with SSD OSDs, connected at 10 Gbit; the oVirt nodes connect to it via Cisco switches (see correction below). The NFS server serves a RAID-10 btrfs-formatted filesystem backed by bcache.

My migration history so far:
============================
1. Tried to migrate to the Luise datacenter. I made a typo, entering Luise1 instead of the correct cluster name Luise01. Ironically, that migration succeeded.
2. Tried to migrate back to Default. The first attempt erred out with "Disk Image is not available in the cluster". The second one succeeded after following your advice to try a second time. I removed the accidentally created Luise1 cluster.
3. Tried to migrate to Luise again. This time I forgot to completely shut down the hosted engine on the Default datacenter. I shut it down and tried again. All subsequent retries to deploy to the hyperconverged Luise01 cluster failed. I used the local engine to re-install node05, which was hosting the engine I had forgotten to shut down; I was not able to access the web interface for some reason, so I assumed it was damaged in some way.
4. After 4 deployment attempts to our hyperconverged cluster that all erred with "Disk Image is not available", I'm now trying to go back to the Default datacenter. Twice it erred out at "Wait for OVF_STORE disk content".
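
P.S. In case it's useful for the timing/race-condition angle: before a retry the Gluster side of the engine volume can be sanity-checked with something like the following (just a rough sketch; "engine" is the volume name in our setup):

    gluster volume status engine        # are all bricks and self-heal daemons up?
    gluster volume heal engine info     # any pending heals on the volume?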

Short correction: all nodes, oVirt and Ceph, are connected via 3 stacked Cisco 3750E switches providing 10 Gbit and 1 Gbit ports.

I'm currently on 4.3.3.1. The software versions I am using:

ovirt-hosted-engine-setup: 2.3.7
ovirt-ansible-hosted-engine-setup: 1.0.17
ovirt-engine-appliance: 4.2

I think I found the bug. :-/

I updated all nodes to 4.3.3.1, but I had not removed the 4.2 repo from /etc/yum.repos.d. I have now removed the old repos, and ovirt-engine-appliance is now shown as 4.3. I tried deployment to the hyperconverged cluster; the first try failed again with "The target Data Center does not contain the Virtual Disk.". I'm retrying. Should this fail again, I will try a deployment to the old NFS-backed Default datacenter. (A quick way to double-check the repos and the appliance version is sketched below.)
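Just in case it helps someone else, the double-check would be something like this (nothing official; repo file names differ per installation):

    ls /etc/yum.repos.d/ | grep -i ovirt     # which oVirt release repos are still configured
    rpm -q ovirt-engine-appliance            # which appliance package is actually installed
    yum list ovirt-engine-appliance          # which one the enabled repos would install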

I re-checked the versions with:

[root@node05 ~]# rpm -qa | grep engine
ovirt-ansible-engine-setup-1.1.9-1.el7.noarch
ovirt-engine-appliance-4.3-20190416.1.el7.x86_64
ovirt-hosted-engine-ha-2.3.1-1.el7.noarch
ovirt-hosted-engine-setup-2.3.7-1.el7.noarch
ovirt-ansible-hosted-engine-setup-1.0.17-1.el7.noarch
python-ovirt-engine-sdk4-4.3.1-2.el7.x86_64

so probably I was already on the latest versions. I did not do any updates; earlier I had used "rpm info" to check the version.

Deployment on NFS again failed with TASK [ovirt.hosted_engine_setup : Wait for OVF_STORE disk content]. I'm lost.

Any update on this? If you are still stuck, please provide more information, such as the relevant parts of the relevant log files (setup logs, engine, vdsm, etc.); the usual locations are listed below.
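For reference, the files I would look at are typically these (exact file names change per run, and paths can vary slightly between versions):

    /var/log/ovirt-hosted-engine-setup/     # deployment and ansible logs, on the host
    /var/log/vdsm/vdsm.log                  # VDSM log, on the host
    /var/log/ovirt-engine/engine.log        # engine log, on the (local) engine VM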
Thanks and best regards,

On Thu, May 9, 2019 at 4:07 PM Andreas Elvers <andreas.elvers+ovirtforum@solutions.work> wrote:

Deployment on NFS again failed with TASK [ovirt.hosted_engine_setup : Wait for OVF_STORE disk content].
I'm lost.
-- Didi

Sorry, the thread became separated. Yeah, I finished the migration. Simone proposed that I had a corrupted backup, and yes: I was backtracking my backups to the point where I had switched all datacenters to 4.3 compatibility and before doing any engine migration. Lucky me, I had that backup available.

I think my backups became corrupted because I took a backup from the engine after I had moved it to a cluster that was originally not defined (I had a typo in the cluster name). I used that backup and tried to get the engine back to my NFS storage; that worked. I took another backup at that place again, and from that point on that backup would always break the deployment, for whatever reason. (The basic backup/restore commands are sketched below for reference.)

The logs are available at https://lists.ovirt.org/archives/list/users@ovirt.org/thread/HXJZWN7L6WXSDLQ...
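
For completeness, the basic backup/restore sequence is roughly the following (file names are only examples; check the engine-backup documentation for your version):

    # on the engine VM, while the engine is in a known-good state
    engine-backup --mode=backup --scope=all --file=engine-backup.tar.gz --log=engine-backup.log

    # copy the file to the deployment host, then restore it into a fresh hosted-engine deployment
    hosted-engine --deploy --restore-from-file=engine-backup.tar.gz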

On Wed, May 22, 2019 at 3:04 PM Andreas Elvers <andreas.elvers+ovirtforum@solutions.work> wrote:
Sorry. The thread became separated. Yeah. I finished the migration. Simone proposed I had a corrupted backup. And yes. I was backtracking my backups to that point where I switched all datacenters to 4.3 compatibility and before doing any engine migration. Lucky me, I had that backup available.
I think my backups became corrupted, because I did a backup from the engine after I moved it to a cluster that was originally not defined. I had a typo in the cluster name. And used that backup and tried to get the engine back to my nfs storage. That worked. I did another backup at that place again and from that point on the backup would always break the deployment for whatever reason.
The logs are available at
https://lists.ovirt.org/archives/list/users@ovirt.org/thread/HXJZWN7L6WXSDLQ...
Sorry that I didn't follow all the threads. It sounds like you went through a lot of hard work with your oVirt setups lately :-( If you can now take the time to think about changes/improvements that would have helped you, or (more importantly) would have prevented some of the mistakes you made, please open RFEs/bugs. I'll also discuss restore with Simone in private; we might come up with our own RFE.

Thanks and best regards,

-- Didi

No need to worry. I'm very impressed with the quality of oVirt. It supports so many different setup scenarios, mine even being a tech-preview-style one ;-) And good idea to file a bug for usability.

To wrap things up: me typing Luise1 instead of Luise01 was my error to begin with. It triggered everything. Luise01 is our hyperconverged oVirt Node cluster; it is our Ceph bootstrap. Yeah... how could you forget the zero... node01, node02 and node03 form Luise01. I ordered deployment of the engine to node01 on Luise1, and hosted-engine happily did so, creating a new cluster Luise1 on node01 and masking the GlusterFS bricks on node01 from the remaining Luise01 cluster nodes.

This most probably also triggered this issue: https://lists.ovirt.org/archives/list/users@ovirt.org/thread/L3YCRPRAGPUMBZI... It has already been addressed by Nir Soffer, see https://lists.ovirt.org/archives/list/users@ovirt.org/message/OSZGHT65OUUSGK...

I will create a bug report for that: a node that has already joined a cluster should not be a viable destination for an engine deployment that creates a new cluster.
Participants (3):
- Andreas Elvers
- Simone Tiraboschi
- Yedidyah Bar David