Attach the snapshot to the backup virtual machine and activate the disk
by smidhunraj@gmail.com
I have a question regarding the REST API.
<https://ovirt.org/documentation/admin-guide/chap-Backups_and_Migration.ht...>
Can you please tell me what the response to the API request in step 4 would be? Will it be a success or an error?
=====================================================================
Attach the snapshot to the backup virtual machine and activate the disk:
POST /api/vms/22222222-2222-2222-2222-222222222222/disks/ HTTP/1.1
Accept: application/xml
Content-type: application/xml
<disk id="11111111-1111-1111-1111-111111111111">
<snapshot id="11111111-1111-1111-1111-111111111111"/>
<active>true</active>
</disk>
==============================================================================================
What happens if we set up a bare VM without any backup mechanism in it and then try to attach the snapshot disk to it?
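As a minimal sketch of how to check this yourself (the engine URL and credentials below are placeholders, and the path and payload are copied verbatim from the step above; on the version-4 API the endpoint is typically /ovirt-engine/api/vms/<id>/diskattachments, so adjust as needed), the request can be fired with Python requests and the response inspected directly:

# Sketch only: engine URL and credentials are placeholders, not values from
# the documentation page. verify=False is for a lab setup only.
import requests

ENGINE = "https://engine.example.com/api"  # placeholder base URL
VM_ID = "22222222-2222-2222-2222-222222222222"

xml = """
<disk id="11111111-1111-1111-1111-111111111111">
  <snapshot id="11111111-1111-1111-1111-111111111111"/>
  <active>true</active>
</disk>
"""

resp = requests.post(
    f"{ENGINE}/vms/{VM_ID}/disks/",
    data=xml,
    headers={"Content-Type": "application/xml", "Accept": "application/xml"},
    auth=("admin@internal", "password"),  # placeholder credentials
    verify=False,
)
print(resp.status_code)
print(resp.text)

Generally, the engine answers a successful request with a 2xx status and the created resource in the body, and a failed one with a non-2xx status and a <fault> element describing the reason; so whether step 4 succeeds against a bare backup VM mostly depends on whether the disk and snapshot IDs you pass actually exist in that environment.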
Re: ovirt-engine-appliance ova
by Strahil
It looks like a kickstart error.
Maybe a parameter in the kickstart is missing?
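For example (a sketch, not the actual appliance kickstart): the two 'mandatory spokes' anaconda complains about further down in this thread, 'Installation source' and 'Software selection', normally correspond to an installation-source directive (url/cdrom/nfs/harddrive/liveimg) and a %packages section in the kickstart, so a quick sanity check could look like this:

# Sketch: check that the kickstart contains the directives behind the two
# "mandatory spokes" from the anaconda error. The filename matches the --ks
# argument in the make output quoted below; everything else is an assumption.
import re
import sys

KS = "ovirt-engine-appliance.ks"
text = open(KS).read()

checks = {
    # "Installation source" spoke: url/cdrom/nfs/harddrive/liveimg (or repo)
    "installation source": re.search(
        r"^\s*(url|cdrom|nfs|harddrive|liveimg|repo)\b", text, re.MULTILINE),
    # "Software selection" spoke: a %packages ... %end section
    "software selection": re.search(r"^\s*%packages\b", text, re.MULTILINE),
}

for name, match in checks.items():
    print(f"{name}: {'found' if match else 'MISSING?'}")

sys.exit(0 if all(checks.values()) else 1)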
Best Regards,
Strahil Nikolov
On Jul 23, 2019 09:08, Yedidyah Bar David <didi(a)redhat.com> wrote:
>
> On Mon, Jul 22, 2019 at 11:53 PM Jingjie Jiang <jingjie.jiang(a)oracle.com> wrote:
>>
>> Hi David,
>
>
> (Actually it's "Yedidyah" or "Didi")
>
>>
>> Thanks for your info.
>>
>> Please check my reply inline.
>>
>>
>> -Jingjie
>>
>> On 7/16/19 3:55 AM, Yedidyah Bar David wrote:
>>>
>>> On Thu, Jul 11, 2019 at 10:46 PM <jingjie.jiang(a)oracle.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Can someone tell me how to generate ovirt-engine-appliance ova file in ovirt-engine-appliance-4.3-20190610.1.el7.x86_64.rpm?
>>>>
>>> You might want to check the project's source code:
>>>
>>>
>>>
>>> https://github.com/ovirt/ovirt-appliance/
>>>
>>>
>>>
>>> Or study the logs of a CI build of it:
>>>
>>>
>>>
>>> https://jenkins.ovirt.org/job/ovirt-appliance_master_build-artifacts-el7-...
>>>
>>>
>>>
>>> I never tried building it myself locally, though.
>>
>> I tried to build after checking out the source code from https://github.com/ovirt/ovirt-appliance/,
>>
>> but the build failed.
>>
>> # make
>> livemedia-creator --make-disk --ram=2048 --vcpus=4 --iso=boot.iso --ks=ovirt-engine-appliance.ks --qcow2 --image-name=ovirt-engine-appliance.qcow2
>> 2019-07-22 12:34:00,095: livemedia-creator 19.7.19-1
>> 2019-07-22 12:34:00,154: disk_size = 51GiB
>> 2019-07-22 12:34:00,154: disk_img = /var/tmp/ovirt-engine-appliance.qcow2
>> 2019-07-22 12:34:00,154: install_log = /root/ovirt/ovirt-appliance/engine-appliance/virt-install.log
>> mount: /dev/loop0 is write-protected, mounting read-only
>> Formatting '/var/tmp/ovirt-engine-appliance.qcow2', fmt=qcow2 size=54760833024 encryption=off cluster_size=65536 lazy_refcounts=off
>> 2019-07-22 12:34:10,195: Running virt-install.
>>
>> Starting install...
>> Retrieving file vmlinuz... | 6.3 MB 00:00
>> Retrieving file initrd.img... | 50 MB 00:00
>> Domain installation still in progress. You can reconnect to
>> the console to complete the installation process.
>> ......
>> 2019-07-22 12:35:15,281: Installation error detected. See logfile.
>> 2019-07-22 12:35:15,283: Shutting down LiveOS-27f2dc2b-4b30-4eb1-adcd-b5ab50fdbf55
>> Domain LiveOS-27f2dc2b-4b30-4eb1-adcd-b5ab50fdbf55 destroyed
>>
>> Domain LiveOS-27f2dc2b-4b30-4eb1-adcd-b5ab50fdbf55 has been undefined
>>
>> 2019-07-22 12:35:15,599: unmounting the iso
>> 2019-07-22 12:35:20,612: Install failed: virt_install failed
>> 2019-07-22 12:35:20,613: Removing bad disk image
>> 2019-07-22 12:35:20,613: virt_install failed
>> make: *** [ovirt-engine-appliance.qcow2] Error 1
>>
>> In virt-install.log I found the following error:
>>
>> 16:35:07,472 ERR anaconda:CmdlineError: The following mandatory spokes are not completed:#012Installation source#012Software selection
>> 16:35:07,472 DEBUG anaconda:running handleException
>> 16:35:07,473 CRIT anaconda:Traceback (most recent call last):#012#012 File "/usr/lib64/python2.7/site-packages/pyanaconda/ui/tui/simpleline/base.py", line 352, in _mainloop#012 prompt = last_screen.prompt(self._screens[-1][1])#012#012 File "/usr/lib64/python2.7/site-packages/pyanaconda/ui/tui/hubs/summary.py", line 107, in prompt#012 raise CmdlineError(errtxt)#012#012CmdlineError: The following mandatory spokes are not completed:#012Installation source#012Software selection
>> 16:35:08,020 DEBUG anaconda:Gtk cannot be initialized
>> 16:35:08,020 DEBUG anaconda:In the main thread, running exception handler
>> 16:35:08,386 NOTICE multipathd:zram0: add path (uevent)
>> 16:35:08,386 NOTICE multipathd:zram0: spurious uevent, path already in pathvec
>> 16:35:08,386 NOTICE multipathd:zram0: HDIO_GETGEO failed with 25
>> 16:35:08,386 ERR multipathd:zram0: failed to get path uid
>> 16:35:08,388 ERR multipathd:uevent trigger error
>>
>> Can you help me to fix the issue?
>
>
> Sorry, I never tried to build it myself, nor do I have experience with livemedia-creator. As I wrote above, I suggest comparing your output/result with that of the oVirt CI. Otherwise, I'd probably start debugging by searching the net for the error messages you received.
>
> Good luck and best regards,
>
>>
>>
>>>> I tried to import the ovirt-engine-appliance OVA (ovirt-engine-appliance-4.3-20190610.1.el7.ova) from ovirt-engine, but I got the following error:
>>>>
>>>> Failed to load VM configuration from OVA file: /var/tmp/ovirt-engine-appliance-4.2-20190121.1.el7.ova
>>>>
>>> No idea why this failed.
>>>
>>>
>>>
>>>> I guess ovirt-engine-appliance-4.2-20190121.1.el7.ova contains more than just CentOS 7.6.
>>>>
>>> It has CentOS + oVirt engine.
>>>
>>>
>>>
>>> The only major use for it is by hosted-engine --deploy. In theory you
>>>
>>> can try importing it elsewhere, but I do not recall reports about
>>>
>>> people that tried this and whether it works.
>>>
>>>
>>>
>>> Best regards,
>>>
>
>
> --
> Didi
guest agent w10
by suporte@logicworks.pt
Hello,
I installed QEMU guest agent 7.4.5 on Windows 10. The service is running, but I cannot see the IP address or the FQDN in the engine.
I'm running oVirt 4.3.4.3-1.el7
Am I forgetting something?
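One way to check whether the agent is actually answering (a sketch; the domain name is a placeholder, and this runs on the oVirt host the VM lives on, not on the engine):

# Sketch: ask the qemu guest agent for the guest's interfaces via libvirt.
# The domain name is a placeholder; `virsh list` shows the real one.
import subprocess

DOMAIN = "my-windows10-vm"  # placeholder libvirt domain name

result = subprocess.run(
    ["virsh", "domifaddr", DOMAIN, "--source", "agent"],
    capture_output=True, text=True,
)
# If this prints the guest's NICs and IPs, the agent channel works and the
# problem is more likely on the engine/guest-tools side; if it errors out,
# the qemu-guest-agent channel itself is not responding.
print(result.stdout or result.stderr)

On oVirt 4.3 the Windows guest tools ISO also ships a separate oVirt guest agent; it may be worth checking whether that one is installed and running as well.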
Thanks
--
Jose Ferradeira
http://www.logicworks.pt
Request for clarification of logline
by Vrgotic, Marko
Dear oVirt,
I was not able to find out the meaning of the following WARN message:
WARN [org.ovirt.engine.core.vdsbroker.libvirt.VmDevicesConverter] (EE-ManagedThreadFactory-engineScheduled-Thread-50) [] unmanaged disk with path '/var/run/vdsm/payload/0687ad1a-4827-48d9-8629-b29a000df280.760410382a41ff5e6b1ee9365856517e.img' is ignored
Would somebody be so kind as to clarify its actual meaning, or point me in the right direction regarding the cause?
I can provide more information if required.
Kindly awaiting your reply.
— — —
Met vriendelijke groet / Kind regards,
Marko Vrgotic
ActiveVideo
Cinderlib managed block storage, ceph jewel
by mathias.schwenke@uni-dortmund.de
Hi.
I tried to use managed block storage to connect our oVirt cluster (version 4.3.4.3-1.el7) to our Ceph storage (version 10.2.11). I used the instructions from https://ovirt.org/develop/release-management/features/storage/cinderlib-i...
At the moment, in the oVirt Administration Portal I can create and delete Ceph volumes (oVirt disks) and attach them to virtual machines. If I try to launch a VM with a connected Ceph block storage volume, starting it fails:
2019-07-16 19:39:09,251+02 WARN [org.ovirt.engine.core.vdsbroker.vdsbroker.AttachManagedBlockStorageVolumeVDSCommand] (default task-53) [7cada945] Unexpected return value: Status [code=926, message=Managed Volume Helper failed.: ('Error executing helper: Command [\'/usr/libexec/vdsm/managedvolume-helper\', \'attach\'] failed with rc=1 out=\'\' err=\'oslo.privsep.daemon: Running privsep helper: [\\\'sudo\\\', \\\'privsep-helper\\\', \\\'--privsep_context\\\', \\\'os_brick.privileged.default\\\', \\\'--privsep_sock_path\\\', \\\'/tmp/tmpB6ZBAs/privsep.sock\\\']\\noslo.privsep.daemon: Spawned new privsep daemon via rootwrap\\noslo.privsep.daemon: privsep daemon starting\\noslo.privsep.daemon: privsep process running with uid/gid: 0/0\\noslo.privsep.daemon: privsep process running with capabilities (eff/prm/inh): CAP_SYS_ADMIN/CAP_SYS_ADMIN/none\\noslo.privsep.daemon: privsep daemon running as pid 112531\\nTraceback (most recent call last):\\n File "/usr/libexec/vdsm/managedvolume-help
er", line 154, in <module>\\n sys.exit(main(sys.argv[1:]))\\n File "/usr/libexec/vdsm/managedvolume-helper", line 77, in main\\n args.command(args)\\n File "/usr/libexec/vdsm/managedvolume-helper", line 137, in attach\\n attachment = conn.connect_volume(conn_info[\\\'data\\\'])\\n File "/usr/lib/python2.7/site-packages/vdsm/storage/nos_brick.py", line 96, in connect_volume\\n run_as_root=True)\\n File "/usr/lib/python2.7/site-packages/os_brick/executor.py", line 52, in _execute\\n result = self.__execute(*args, **kwargs)\\n File "/usr/lib/python2.7/site-packages/os_brick/privileged/rootwrap.py", line 169, in execute\\n return execute_root(*cmd, **kwargs)\\n File "/usr/lib/python2.7/site-packages/oslo_privsep/priv_context.py", line 205, in _wrap\\n return self.channel.remote_call(name, args, kwargs)\\n File "/usr/lib/python2.7/site-packages/oslo_privsep/daemon.py", line 202, in remote_call\\n raise exc_type(*result[2])\\noslo_concurrency.processutils.Pr
ocessExecutionError: Unexpected error while running command.\\nCommand: rbd map volume-a57dbd5c-2f66-460f-b37f-5f7dfa95d254 --pool ovirt-volumes --conf /tmp/brickrbd_TLMTkR --id ovirtcinderlib --mon_host 192.168.61.1:6789 --mon_host 192.168.61.2:6789 --mon_host 192.168.61.3:6789\\nExit code: 6\\nStdout: u\\\'RBD image feature set mismatch. You can disable features unsupported by the kernel with "rbd feature disable".\\\\nIn some cases useful info is found in syslog - try "dmesg | tail" or so.\\\\n\\\'\\nStderr: u\\\'rbd: sysfs write failed\\\\nrbd: map failed: (6) No such device or address\\\\n\\\'\\n\'',)]
2019-07-16 19:39:09,251+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.AttachManagedBlockStorageVolumeVDSCommand] (default task-53) [7cada945] Failed in 'AttachManagedBlockStorageVolumeVDS' method
After disconnecting the disk, I can delete it (the volume disappears from Ceph), but the disk stays in my oVirt Administration Portal, because cinderlib thinks the disk is still connected:
2019-07-16 19:42:53,551+02 INFO [org.ovirt.engine.core.bll.storage.disk.RemoveDiskCommand] (EE-ManagedThreadFactory-engine-Thread-487362) [887b4d11-302f-4f8d-a3f9-7443a80a47ba] Running command: RemoveDiskCommand internal: false. Entities affected : ID: a57dbd5c-2f66-460f-b37f-5f7dfa95d254 Type: DiskAction group DELETE_DISK with role type USER
2019-07-16 19:42:53,559+02 INFO [org.ovirt.engine.core.bll.storage.disk.managedblock.RemoveManagedBlockStorageDiskCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-8) [] Running command: RemoveManagedBlockStorageDiskCommand internal: true.
2019-07-16 19:42:56,240+02 ERROR [org.ovirt.engine.core.common.utils.cinderlib.CinderlibExecutor] (EE-ManagedThreadFactory-commandCoordinator-Thread-8) [] cinderlib execution failed
DBReferenceError: (psycopg2.IntegrityError) update or delete on table "volumes" violates foreign key constraint "volume_attachment_volume_id_fkey" on table "volume_attachment"
2019-07-16 19:42:55,958 - cinderlib-client - INFO - Deleting volume 'a57dbd5c-2f66-460f-b37f-5f7dfa95d254' [887b4d11-302f-4f8d-a3f9-7443a80a47ba]
2019-07-16 19:42:56,099 - cinderlib-client - ERROR - Failure occurred when trying to run command 'delete_volume': (psycopg2.IntegrityError) update or delete on table "volumes" violates foreign key constraint "volume_attachment_volume_id_fkey" on table "volume_attachment"
DETAIL: Key (id)=(a57dbd5c-2f66-460f-b37f-5f7dfa95d254) is still referenced from table "volume_attachment".
[SQL: 'DELETE FROM volumes WHERE volumes.deleted = false AND volumes.id = %(id_1)s'] [parameters: {'id_1': u'a57dbd5c-2f66-460f-b37f-5f7dfa95d254'}] [887b4d11-302f-4f8d-a3f9-7443a80a47ba]
What are your experiences with oVirt, cinderlib and Ceph? Should this setup work?
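For what it's worth, the 'RBD image feature set mismatch' part of the first error is the kernel RBD client refusing image features it cannot map, and the message itself points at 'rbd feature disable'. A hedged sketch (check 'rbd info' first and only disable what is actually enabled; the feature list below is an assumption based on typical jewel-era defaults, not taken from your cluster):

# Sketch: disable RBD image features the old kernel client cannot map.
# Pool and image names are taken from the quoted rbd command above; the
# feature list is an assumption, so run `rbd info` first and only disable
# what is actually enabled.
import subprocess

POOL = "ovirt-volumes"
IMAGE = "volume-a57dbd5c-2f66-460f-b37f-5f7dfa95d254"
FEATURES = ["exclusive-lock", "object-map", "fast-diff", "deep-flatten"]

subprocess.run(["rbd", "info", f"{POOL}/{IMAGE}"], check=True)
subprocess.run(["rbd", "feature", "disable", f"{POOL}/{IMAGE}", *FEATURES],
               check=True)

Alternatively, lowering rbd_default_features on the client side so that new volumes are created without those features may be an option; whether either approach is acceptable depends on your setup.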
Re: Storage domain 'Inactive' but still functional
by Strahil
If you can give directions (some kind of history), the devs might try to reproduce this type of issue.
If it is reproduceable - a fix can be provided.
Based on my experience, when something as widely used as Linux LVM gets broken, the case is very hard to reproduce.
Best Regards,
Strahil Nikolov
On Jul 22, 2019 10:17, Martijn Grendelman <Martijn.Grendelman(a)isaac.nl> wrote:
>
> Hi,
>
> Thanks for the tips! I didn't know about 'pvmove', thanks.
>
> In the meantime, I managed to get it fixed by restoring the VG metadata on the iSCSI server, i.e. on the underlying zvol directly, rather than via the iSCSI session on the oVirt host. That allowed me to perform the restore without bringing all VMs down, which was important to me, because if I had to shut down VMs, I was sure I wouldn't be able to restart them before the storage domain was back online.
>
> Of course this is more of a Linux problem than an oVirt problem, but oVirt did cause it ;-)
>
> Thanks,
> Martijn.
>
>
>
> On 19-7-2019 at 19:06, Strahil Nikolov wrote:
>>
>> Hi Martijn,
>>
>> First check what went wrong with the VG -as it could be something simple.
>> vgcfgbackup -f VGname will create a file which you can use to compare current metadata with a previous version.
>>
>> If you have Linux boxes - you can add disks from another storage and then pvmove the data inside the VM. Of course , you will need to reinstall grub on the new OS disk , or you won't be able to boot afterwards.
>> If possible, try with a test VM before proceeding with important ones.
>>
>> Backing up the VMs is very important , because working on LVM metadata is quite risky.
>> Last time I had such an issue , I was working on clustered LVs which got their PVs "Missing". For me , restore from VG backup fixed the issue - but that might not be always the case.
>>
>> Just get the vgcfgbackup's output and compare with diff or vimdiff and check what is different.
>>
>> Sadly, I think that this is more a Linux problem , than an oVirt problem.
>>
>> Best Regards,
>> Strahil Nikolov
>>
>> On Thursday, 18 July 2019 at 18:51:32 GMT+3, Martijn Grendelman <Martijn.Grendelman(a)isaac.nl> wrote:
>>
>>
>> Hi!
>>
>> Thanks. Like I wrote, I have metadata backups from /etc/lvm/backup and -/archive, and I also have the current metadata as it exis
LiveStorageMigration failed
by Christoph Köhler
Hello,
I am trying to migrate a disk of a running VM from gluster 3.12.15 to gluster 3.12.15, but it fails. libgfapi is set to true by engine-config.
Taking a snapshot first works. Then, in the engine log:
2019-07-18 09:29:13,932+02 ERROR
[org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskStartVDSCommand]
(EE-ManagedThreadFactory-engineScheduled-Thread-84)
[c957e011-37e0-43aa-abe7-9bb633c38c5f] Failed in
'VmReplicateDiskStartVDS' method
2019-07-18 09:29:13,936+02 ERROR
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(EE-ManagedThreadFactory-engineScheduled-Thread-84)
[c957e011-37e0-43aa-abe7-9bb633c38c5f] EVENT_ID:
VDS_BROKER_COMMAND_FAILURE(10,802), VDSM ovvirt07 command
VmReplicateDiskStartVDS failed: Drive replication error
2019-07-18 09:29:13,936+02 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskStartVDSCommand]
(EE-ManagedThreadFactory-engineScheduled-Thread-84)
[c957e011-37e0-43aa-abe7-9bb633c38c5f] Command
'org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskStartVDSCommand'
return value 'StatusOnlyReturn [status=Status [code=55, message=Drive
replication error]]'
2019-07-18 09:29:13,936+02 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskStartVDSCommand]
(EE-ManagedThreadFactory-engineScheduled-Thread-84)
[c957e011-37e0-43aa-abe7-9bb633c38c5f] HostName = ovvirt07
2019-07-18 09:29:13,937+02 ERROR
[org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskStartVDSCommand]
(EE-ManagedThreadFactory-engineScheduled-Thread-84)
[c957e011-37e0-43aa-abe7-9bb633c38c5f] Command
'VmReplicateDiskStartVDSCommand(HostName = ovvirt07,
VmReplicateDiskParameters:{hostId='3a7bf85c-e92d-4559-908e-5eed2f5608d4',
vmId='3b79d0c0-47e9-47c3-8511-980a8cfe147c',
storagePoolId='00000001-0001-0001-0001-000000000311',
srcStorageDomainId='e54d835a-d8a5-44ae-8e17-fcba1c54e46f',
targetStorageDomainId='4dabb6d6-4be5-458c-811d-6d5e87699640',
imageGroupId='d2964ff9-10f7-4b92-8327-d68f3cfd5b50',
imageId='62656632-8984-4b7e-8be1-fd2547ca0f98'})' execution failed:
VDSGenericException: VDSErrorException: Failed to
VmReplicateDiskStartVDS, error = Drive replication error, code = 55
2019-07-18 09:29:13,937+02 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskStartVDSCommand]
(EE-ManagedThreadFactory-engineScheduled-Thread-84)
[c957e011-37e0-43aa-abe7-9bb633c38c5f] FINISH,
VmReplicateDiskStartVDSCommand, return: , log id: 5b2afb0b
2019-07-18 09:29:13,937+02 ERROR
[org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand]
(EE-ManagedThreadFactory-engineScheduled-Thread-84)
[c957e011-37e0-43aa-abe7-9bb633c38c5f] Failed VmReplicateDiskStart (Disk
'd2964ff9-10f7-4b92-8327-d68f3cfd5b50' , VM
'3b79d0c0-47e9-47c3-8511-980a8cfe147c')
2019-07-18 09:29:13,938+02 ERROR
[org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback]
(EE-ManagedThreadFactory-engineScheduled-Thread-84)
[c957e011-37e0-43aa-abe7-9bb633c38c5f] Command 'LiveMigrateDisk' id:
'8174c74c-8ab0-49fa-abfc-44d8b7c691e0' with children
[03672b60-443b-47ba-834c-ac306d7129d0,
562522fc-6691-47fe-93bf-ef2c45e85676] failed when attempting to perform
the next operation, marking as 'ACTIVE'
2019-07-18 09:29:13,938+02 ERROR
[org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback]
(EE-ManagedThreadFactory-engineScheduled-Thread-84)
[c957e011-37e0-43aa-abe7-9bb633c38c5f] EngineException: Drive
replication error (Failed with error replicaErr and code 55):
org.ovirt.engine.core.common.errors.EngineException: EngineException:
Drive replication error (Failed with error replicaErr and code 55)
at
org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand.replicateDiskStart(LiveMigrateDiskCommand.java:526)
[bll.jar:]
at
org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand.performNextOperation(LiveMigrateDiskCommand.java:233)
[bll.jar:]
at
org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback.childCommandsExecutionEnded(SerialChildCommandsExecutionCallback.java:32)
[bll.jar:]
at
org.ovirt.engine.core.bll.ChildCommandsCallbackBase.doPolling(ChildCommandsCallbackBase.java:77)
[bll.jar:]
at
org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethodsImpl(CommandCallbacksPoller.java:175)
[bll.jar:]
at
org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethods(CommandCallbacksPoller.java:109)
[bll.jar:]
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
[rt.jar:1.8.0_212]
at
java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
[rt.jar:1.8.0_212]
at
org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.access$201(ManagedScheduledThreadPoolExecutor.java:383)
[javax.enterprise.concurrent-1.0.jar:]
at
org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.run(ManagedScheduledThreadPoolExecutor.java:534)
[javax.enterprise.concurrent-1.0.jar:]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[rt.jar:1.8.0_212]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[rt.jar:1.8.0_212]
at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_212]
at
org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250)
[javax.enterprise.concurrent-1.0.jar:]
at
org.jboss.as.ee.concurrent.service.ElytronManagedThreadFactory$ElytronManagedThread.run(ElytronManagedThreadFactory.java:78)
2019-07-18 09:29:13,938+02 INFO
[org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback]
(EE-ManagedThreadFactory-engineScheduled-Thread-84)
[c957e011-37e0-43aa-abe7-9bb633c38c5f] Command 'LiveMigrateDisk' id:
'8174c74c-8ab0-49fa-abfc-44d8b7c691e0' child commands
'[03672b60-443b-47ba-834c-ac306d7129d0,
562522fc-6691-47fe-93bf-ef2c45e85676]' executions were completed, status
'FAILED'
2019-07-18 09:29:15,019+02 ERROR
[org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand]
(EE-ManagedThreadFactory-engineScheduled-Thread-1)
[c957e011-37e0-43aa-abe7-9bb633c38c5f] Ending command
'org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand' with failure.
2019-07-18 09:29:15,019+02 ERROR
[org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand]
(EE-ManagedThreadFactory-engineScheduled-Thread-1)
[c957e011-37e0-43aa-abe7-9bb633c38c5f] Failed during live storage
migration of disk 'd2964ff9-10f7-4b92-8327-d68f3cfd5b50' of vm
'3b79d0c0-47e9-47c3-8511-980a8cfe147c', attempting to end replication
before deleting the target disk
//
Live storage migration on gluster - should that work at all? Has anyone tried it?
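If the replication error turns out to be related to libgfapi, one low-risk first step (a sketch; run on the engine machine, and note that engine-config changes generally require restarting ovirt-engine) is to double-check the current setting:

# Sketch: check the current libgfapi setting on the engine host.
# The key name is from memory; `engine-config -l` lists all available keys.
import subprocess

subprocess.run(["engine-config", "-g", "LibgfApiSupported"], check=True)

# Reverting it would look roughly like the lines below (version-scoped keys
# need --cver); left commented out on purpose, and the engine must be
# restarted for changes to take effect.
# subprocess.run(["engine-config", "-s", "LibgfApiSupported=false", "--cver=4.3"], check=True)
# subprocess.run(["systemctl", "restart", "ovirt-engine"], check=True)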
Greetings!
Christoph Köhler
Re: Storage domain 'Inactive' but still functional
by Martijn Grendelman
Hi,
Thanks for the tips! I didn't know about 'pvmove', thanks.
In the meantime, I managed to get it fixed by restoring the VG metadata on the iSCSI server, i.e. on the underlying zvol directly, rather than via the iSCSI session on the oVirt host. That allowed me to perform the restore without bringing all VMs down, which was important to me, because if I had to shut down VMs, I was sure I wouldn't be able to restart them before the storage domain was back online.
Of course this is more of a Linux problem than an oVirt problem, but oVirt did cause it ;-)
Thanks,
Martijn.
On 19-7-2019 at 19:06, Strahil Nikolov wrote:
Hi Martijn,
First check what went wrong with the VG - as it could be something simple.
vgcfgbackup -f VGname will create a file which you can use to compare current metadata with a previous version.
If you have Linux boxes - you can add disks from another storage and then pvmove the data inside the VM. Of course, you will need to reinstall grub on the new OS disk, or you won't be able to boot afterwards.
If possible, try with a test VM before proceeding with important ones.
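As a rough illustration of that in-guest move (device and VG names are placeholders; this runs inside the guest, after a backup, and grub still has to be reinstalled on the new disk afterwards):

# Sketch of the in-guest disk move described above. Device names are
# placeholders: /dev/vdb is the freshly attached disk from the new storage
# domain, /dev/vda2 the old PV in the root VG.
import subprocess

OLD_PV = "/dev/vda2"   # placeholder: existing PV on the old storage
NEW_DISK = "/dev/vdb"  # placeholder: freshly attached disk
VG = "vg_root"         # placeholder: volume group to evacuate

def run(*cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

run("pvcreate", NEW_DISK)          # turn the new disk into a PV
run("vgextend", VG, NEW_DISK)      # add it to the VG
run("pvmove", OLD_PV, NEW_DISK)    # move all extents off the old PV
run("vgreduce", VG, OLD_PV)        # drop the old PV from the VG
run("pvremove", OLD_PV)            # wipe the old PV label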
Backing up the VMs is very important , because working on LVM metadata is quite risky.
Last time I had such an issue, I was working on clustered LVs which got their PVs "Missing". For me, restore from the VG backup fixed the issue - but that might not always be the case.
Just get the vgcfgbackup's output and compare with diff or vimdiff and check what is different.
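A small sketch of that comparison (the archive file is a placeholder; note that if the on-disk metadata is too corrupt, vgcfgbackup itself may fail with the same checksum error):

# Sketch: dump the current metadata of the problematic VG and diff it against
# an archived copy. The archive path is a placeholder; pick a real file from
# /etc/lvm/archive or /etc/lvm/backup.
import difflib
import subprocess

VG = "875847b6-29a4-4419-be92-9315f4435429"
ARCHIVE = "/etc/lvm/archive/older_version.vg"  # placeholder

subprocess.run(["vgcfgbackup", "-f", "/tmp/current.vg", VG], check=True)

with open("/tmp/current.vg") as f:
    current = f.read().splitlines()
with open(ARCHIVE) as f:
    archived = f.read().splitlines()

for line in difflib.unified_diff(archived, current, fromfile=ARCHIVE,
                                 tofile="/tmp/current.vg", lineterm=""):
    print(line)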
Sadly, I think that this is more of a Linux problem than an oVirt problem.
Best Regards,
Strahil Nikolov
On Thursday, 18 July 2019 at 18:51:32 GMT+3, Martijn Grendelman <Martijn.Grendelman(a)isaac.nl> wrote:
Hi!
Thanks. Like I wrote, I have metadata backups from /etc/lvm/backup and /etc/lvm/archive, and I also have the current metadata as it exists on disk. What I'm most concerned about is the proposed procedure.
I would create a backup of the VG, but I'm not sure what would be the most sensible way to do it. I could make a new iSCSI target and simply 'dd' the whole disk over, but that would take quite some time (it's 2,5 TB) and there are VMs that can't really be down for that long. And I'm not even sure that dd'ing the disk like that is a sensible strategy.
Moving disks out of the domain is currently not possible. oVirt says 'Source Storage Domain is not active'.
Thanks,
Martijn.
On 18-7-2019 at 17:44, Strahil Nikolov wrote:
Can you check /etc/lvm/backup and /etc/lvm/archive on your SPM host (check the other hosts too, just in case you find anything useful)?
Usually LVM makes backup of everything.
I would recommend you to:
1. Create a backup of the problematic VG
2. Compare the backup file and a file from backup/archive folders for the same VG
Check what is different with diff/vimdiff. It might give you a clue.
I had some issues (not related to oVirt) and restoring the VG from an older backup did help me. Still, any operation on block devices should be considered risky and a proper backup is needed.
You could try to move a less important VM's disks out of this storage domain to another one.
If it succeeds - then you can evacuate all VMs before you start "breaking" the storage domain.
Best Regards,
Strahil Nikolov
On Thursday, 18 July 2019 at 16:59:46 GMT+3, Martijn Grendelman <martijn.grendelman(a)isaac.nl> wrote:
Hi,
It appears that O365 has trouble delivering mails to this list, so two
earlier mails of mine are still somewhere in a queue and may yet be delivered.
This mail has all of the content of 3 successive mails. I apologize for this
format.
On 18-7-2019 at 11:20, Martijn Grendelman wrote:
On 18-7-2019 at 10:16, Martijn Grendelman wrote:
Hi,
For the first time in many months I have run into some trouble with oVirt (4.3.4.3) and I need some help.
Yesterday, I noticed one of my iSCSI storage domains was almost full, and tried to move a disk image off of it, to another domain. This failed, and somewhere in the process, the whole storage domain went to status 'Inactive'.
From engine.log:
2019-07-17 16:30:35,319+02 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (EE-ManagedThreadFactory-engine-Thread-1836383) [] starting processDomainRecovery for domain '875847b6-29a4-4419-be92-9315f4435429:HQST0_ISCSI02'.
2019-07-17 16:30:35,337+02 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (EE-ManagedThreadFactory-engine-Thread-1836383) [] Domain '875847b6-29a4-4419-be92-9315f4435429:HQST0_ISCSI02' was reported by all hosts in status UP as problematic. Moving the domain to NonOperational.
2019-07-17 16:30:35,410+02 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-1836383) [5f6fd35e] EVENT_ID: SYSTEM_DEACTIVATED_STORAGE_DOMAIN(970), Storage Domain HQST0_ISCSI02 (Data Center ISAAC01) was deactivated by system because it's not visible by any of the hosts.
The thing is, the domain is still functional on all my hosts. It carries over 50 disks, and all involved VMs are up and running, and don't seem to have any problems. Also, 'iscsiadm' on all hosts seems to indicate that everything is fine with this specific target, and reading from the device with dd, or getting its size with 'blockdev', all works without issue.
When I try to reactivate the domain, these errors are logged:
2019-07-18 09:34:53,631+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-43475) [79e386e] EVENT_ID: IRS_BROKER_COMMAND_FAILURE(10,803), VDSM command ActivateStorageDomainVDS failed: Storage domain does not exist: (u'875847b6-29a4-4419-be92-9315f4435429',)
2019-07-18 09:34:53,631+02 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (EE-ManagedThreadFactory-engine-Thread-43475) [79e386e] IrsBroker::Failed::ActivateStorageDomainVDS: IRSGenericException: IRSErrorException: Failed to ActivateStorageDomainVDS, error = Storage domain does not exist: (u'875847b6-29a4-4419-be92-9315f4435429',), code = 358
2019-07-18 09:34:53,648+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-43475) [79e386e] EVENT_ID: USER_ACTIVATE_STORAGE_DOMAIN_FAILED(967), Failed to activate Storage Domain HQST0_ISCSI02 (Data Center ISAAC01) by martijn@-authz
On the SPM host, there are errors that indicate problems with the LVM volume group:
2019-07-18 09:34:50,462+0200 INFO (jsonrpc/2) [vdsm.api] START activateStorageDomain(sdUUID=u'875847b6-29a4-4419-be92-9315f4435429', spUUID=u'aefd5844-6e01-4070-b3b9-c0d73cc40c78', options=None) from=::ffff:172.17.1.140,56570, flow_id=197dadec, task_id=51107845-d80b-47f4-aed8-345aaa49f0f8 (api:48)
2019-07-18 09:34:50,464+0200 INFO (jsonrpc/2) [storage.StoragePool] sdUUID=875847b6-29a4-4419-be92-9315f4435429 spUUID=aefd5844-6e01-4070-b3b9-c0d73cc40c78 (sp:1125)
2019-07-18 09:34:50,629+0200 WARN (jsonrpc/2) [storage.LVM] Reloading VGs failed (vgs=[u'875847b6-29a4-4419-be92-9315f4435429'] rc=5 out=[] err=[' /dev/mapper/23536316636393463: Checksum error at offset 2748693688832', " Couldn't read volume group metadata from /dev/mapper/23536316636393463.", ' Metadata location on /dev/mapper/23536316636393463 at 2748693688832 has invalid summary for VG.', ' Failed to read metadata summary from /dev/mapper/23536316636393463', ' Failed to scan VG from /dev/mapper/23536316636393463', ' Volume group "875847b6-29a4-4419-be92-9315f4435429" not found', ' Cannot process volume group 875847b6-29a4-4419-be92-9315f4435429']) (lvm:442)
2019-07-18 09:34:50,629+0200 INFO (jsonrpc/2) [vdsm.api] FINISH activateStorageDomain error=Storage domain does not exist: (u'875847b6-29a4-4419-be92-9315f4435429',) from=::ffff:172.17.1.140,56570, flow_id=197dadec, task_id=51107845-d80b-47f4-aed8-345aaa49f0f8 (api:52)
2019-07-18 09:34:50,629+0200 ERROR (jsonrpc/2) [storage.TaskManager.Task] (Task='51107845-d80b-47f4-aed8-345aaa49f0f8') Unexpected error (task:875)
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
return fn(*args, **kargs)
File "<string>", line 2, in activateStorageDomain
File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 50, in method
ret = func(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 1262, in activateStorageDomain
pool.activateSD(sdUUID)
File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
return method(self, *args, **kwargs)
File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 1127, in activateSD
dom = sdCache.produce(sdUUID)
File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 110, in produce
domain.getRealDomain()
File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 51, in getRealDomain
return self._cache._realProduce(self._sdUUID)
File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 134, in _realProduce
domain = self._findDomain(sdUUID)
File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 151, in _findDomain
return findMethod(sdUUID)
File "/usr/lib/python2.7/site-packages/vdsm/storage/blockSD.py", line 1807, in findDomain
return BlockStorageDomain(BlockStorageDomain.findDomainPath(sdUUID))
File "/usr/lib/python2.7/site-packages/vdsm/storage/blockSD.py", line 1665, in findDomainPath
raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: (u'875847b6-29a4-4419-be92-9315f4435429',)
2019-07-18 09:34:50,629+0200 INFO (jsonrpc/2) [storage.TaskManager.Task] (Task='51107845-d80b-47f4-aed8-345aaa49f0f8') aborting: Task is aborted: "Storage domain does not exist: (u'875847b6-29a4-4419-be92-9315f4435429',)" - code 358 (task:1181)
2019-07-18 09:34:50,629+0200 ERROR (jsonrpc/2) [storage.Dispatcher] FINISH activateStorageDomain error=Storage domain does not exist: (u'875847b6-29a4-4419-be92-9315f4435429',) (dispatcher:83)
I need help getting this storage domain back online. Can anyone here help me? If you need any additional information, please let me know!
It appears the VG metadata is corrupt:
/dev/mapper/23536316636393463: Checksum error at offset 2748693688832
Couldn't read volume group metadata from /dev/mapper/23536316636393463.
Metadata location on /dev/mapper/23536316636393463 at 2748693688832 has invalid summary for VG.
Failed to read metadata summary from /dev/mapper/23536316636393463
Failed to scan VG from /dev/mapper/23536316636393463
Is this fixable? If so, how?
So, I have found some information online that suggests that PV metadata can be fixed by recreating the PV label using the correct PVID and a backup of the LVM metadata, like so:
pvcreate -u <pv_uuid> --restorefile <lvm_metadata_backup> /dev/mapper/23536316636393463
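For reference, the generic LVM recovery procedure that command comes from usually continues with a vgcfgrestore and a reactivation. A deliberately non-executing sketch (UUID and restore file are placeholders; this is not a statement that it is safe on a live, in-use storage domain):

# Sketch only: mirrors the generic LVM metadata-recovery procedure
# (pvcreate --restorefile, then vgcfgrestore, then reactivation). It prints
# the commands instead of running them; UUID and restore file are
# placeholders, device and VG name come from this thread.
PV_UUID = "<pv_uuid_from_the_metadata_backup>"
RESTOREFILE = "/etc/lvm/backup/<vg_backup_file>"   # placeholder
DEVICE = "/dev/mapper/23536316636393463"
VG = "875847b6-29a4-4419-be92-9315f4435429"

steps = [
    ["pvcreate", "--uuid", PV_UUID, "--restorefile", RESTOREFILE, DEVICE],
    ["vgcfgrestore", "-f", RESTOREFILE, VG],
    ["vgchange", "-ay", VG],
]

for cmd in steps:
    print(" ".join(cmd))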
Now I have the following two files:
* An LVM metadata backup of yesterday 10:35, about 6 hours before the problem occurred.
* The actual metadata found on the PV at offset 2748693688832 (obtained with hexedit on the block device).
These are largely the same, but there are differences:
* seqno = 1854 in the backup and 1865 in the actual metadata.
* 3 logical volumes that are not present in the backup, but are in the actual metadata. I suspect that these are related to snapshots that were created for live storage migration, but I am not sure. In any case, I did NOT create any new disk images on this domain, so that can't be it.
Now, suppose I wanted to try the 'pvcreate' route:
* what would be the chances of success? Is this procedure at all advisable, or is there an alternative?
* which restore file (1854 or 1865) should I use for the restore?
* can I do this while the VG is in use? I tried running the command without --force, and it said 'Can't open /dev/mapper/23536316636393463 exclusively. Mounted filesystem?'. I didn't dare to try it with '--force'.
I could really use some advice on how to proceed. There are about 36 VMs that have one or more disks on this domain. I could bring them down, although doing so for extended amounts of time would be problematic. I want to be careful, obviously, especially since the actual storage doesn't seem to be impacted at this time. The VMs are all still running without issue, and if I'm about to embark on a dangerous journey that could cause data loss, I need a contingency / recovery plan.
Hoping someone can help...
Best regards,
Martijn Grendelman
--
Met vriendelijke groet / Kind regards,

Martijn Grendelman
Infrastructure Architect
T: +31 (0)40 264 94 44

ISAAC | Marconilaan 16 | 5621 AA Eindhoven | The Netherlands
T: +31 (0)40 290 89 79 | www.isaac.nl

This e-mail message is intended only for the addressee(s). If this message was not meant for you, please notify the sender by returning the message and do not use its contents. No rights can be derived from this message.