Re: Parent checkpoint ID does not match the actual leaf checkpoint
by Nir Soffer
On Sun, Jul 19, 2020 at 5:38 PM Łukasz Kołaciński <l.kolacinski(a)storware.eu>
wrote:
> Hello,
> Thanks to previous answers, I was able to make backups. Unfortunately, we
> had some infrastructure issues and after the host reboots new problems
> appeared. I am not able to do any backup using the commands that worked
> yesterday. I looked through the logs and there is something like this:
>
> 2020-07-17 15:06:30,644+02 ERROR
> [org.ovirt.engine.core.bll.StartVmBackupCommand]
> (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-54)
> [944a1447-4ea5-4a1c-b971-0bc612b6e45e] Failed to execute VM backup
> operation 'StartVmBackup': {}:
> org.ovirt.engine.core.common.errors.EngineException: EngineException:
> org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException:
> VDSGenericException: VDSErrorException: Failed to StartVmBackupVDS, error =
> Checkpoint Error: {'parent_checkpoint_id': None, 'leaf_checkpoint_id':
> 'cd078706-84c0-4370-a6ec-654ccd6a21aa', 'vm_id':
> '116aa6eb-31a1-43db-9b1e-ad6e32fb9260', 'reason': 'Parent checkpoint ID
> does not match the actual leaf checkpoint'}, code = 1610 (Failed with
> error unexpected and code 16)
>
>
It looks like engine sent:
parent_checkpoint_id: None
This issue was fixed in engine a few weeks ago.
Which engine and vdsm versions are you testing?
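For anyone triaging a similar report, here is a minimal Python sketch of how the checkpoint IDs could be pulled out of such an error line to confirm the mismatch. The helper name `parse_checkpoint_error` is hypothetical (it is not part of engine or vdsm), and the sample string is the log line quoted above, unwrapped and without the emphasis markers.

```python
import ast
import re


# Hypothetical helper, not part of engine or vdsm: find the dict that
# follows "Checkpoint Error:" in an error message and parse it safely.
def parse_checkpoint_error(message):
    match = re.search(r"Checkpoint Error: (\{.*?\})", message)
    if match is None:
        return None
    # literal_eval only accepts Python literals, so None and strings are fine.
    return ast.literal_eval(match.group(1))


# Sample built from the log line quoted above (unwrapped, emphasis removed).
sample = (
    "Failed to StartVmBackupVDS, error = Checkpoint Error: "
    "{'parent_checkpoint_id': None, "
    "'leaf_checkpoint_id': 'cd078706-84c0-4370-a6ec-654ccd6a21aa', "
    "'vm_id': '116aa6eb-31a1-43db-9b1e-ad6e32fb9260', "
    "'reason': 'Parent checkpoint ID does not match the actual leaf checkpoint'}"
    ", code = 1610"
)

info = parse_checkpoint_error(sample)
mismatch = info["parent_checkpoint_id"] != info["leaf_checkpoint_id"]
print("parent:", info["parent_checkpoint_id"])
print("leaf:  ", info["leaf_checkpoint_id"])
print("mismatch:", mismatch)
```

Here the parent sent by engine is None while the VM already has a leaf checkpoint, which is the mismatch the error reports.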
> at
> deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.VdsHandler.handleVdsResult(VdsHandler.java:114)
> at
> deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.VDSBrokerFrontendImpl.runVdsCommand(VDSBrokerFrontendImpl.java:33)
> at
> deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.runVdsCommand(CommandBase.java:2114)
> at
> deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.StartVmBackupCommand.performVmBackupOperation(StartVmBackupCommand.java:368)
> at
> deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.StartVmBackupCommand.runVmBackup(StartVmBackupCommand.java:225)
> at
> deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.StartVmBackupCommand.performNextOperation(StartVmBackupCommand.java:199)
> at
> deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback.childCommandsExecutionEnded(SerialChildCommandsExecutionCallback.java:32)
> at
> deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.ChildCommandsCallbackBase.doPolling(ChildCommandsCallbackBase.java:80)
> at
> deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethodsImpl(CommandCallbacksPoller.java:175)
> at
> deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethods(CommandCallbacksPoller.java:109)
> at
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at
> java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
> at
> org.glassfish.javax.enterprise.concurrent//org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.access$201(ManagedScheduledThreadPoolExecutor.java:383)
> at
> org.glassfish.javax.enterprise.concurrent//org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.run(ManagedScheduledThreadPoolExecutor.java:534)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:834)
> at
> org.glassfish.javax.enterprise.concurrent//org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250)
>
>
> And the last error is:
>
> 2020-07-17 15:13:45,835+02 ERROR
> [org.ovirt.engine.core.bll.StartVmBackupCommand]
> (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-14)
> [f553c1f2-1c99-4118-9365-ba6b862da936] Failed to execute VM backup
> operation 'GetVmBackupInfo': {}:
> org.ovirt.engine.core.common.errors.EngineException: EngineException:
> org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException:
> VDSGenericException: VDSErrorException: Failed to GetVmBackupInfoVDS, error
> = No such backup Error: {'vm_id': '116aa6eb-31a1-43db-9b1e-ad6e32fb9260',
> 'backup_id': 'bf1c26f7-c3e5-437c-bb5a-255b8c1b3b73', 'reason': 'VM
> backup not exists: Domain backup job id not found: no domain backup job
> present'}, code = 1601 (Failed with error unexpected and code 16)
>
>
This is likely a result of the first error. If starting the backup fails, the
backup entity is deleted.
> (these errors are from full backup)
>
> Like I said this is very strange because everything was working correctly.
>
>
> Regards
>
> Łukasz Kołaciński
>
> _______________________________________________
> Users mailing list -- users(a)ovirt.org
> To unsubscribe send an email to users-leave(a)ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/S3PLYPOZGT6...
>
3 years, 5 months
Gluster volumes not healing (perhaps after host maintenance?)
by David White
I discovered that the servers I purchased did not come with 10Gbps network cards, like I thought they did. So my storage network has been running on a 1Gbps connection for the past week, since I deployed the servers into the datacenter a little over a week ago. I purchased 10Gbps cards, and put one of my hosts into maintenance mode yesterday, prior to replacing the daughter card. It is now back online running fine on the 10Gbps card.
All VMs seem to be working, even when I migrate them onto cha2, which is the host I did maintenance on yesterday morning.
The other two hosts are still running on the 1Gbps connection, but I plan to do maintenance on them next week.
The oVirt manager shows that all 3 hosts are up, and that all of my volumes - and all of my bricks - are up. However, every time I look at the storage, it appears that the self-heal info for 1 of the volumes is 10 minutes, and the self-heal info for another volume is 50+ minutes.
This morning is the first time in the last couple of days that I've paid close attention to the numbers, but I don't see them going down.
When I log into each of the hosts, I do see everything is connected in gluster.
Interestingly, in this particular case gluster on cha3 identifies the peer 10.1.0.10 by its IP address rather than by its hostname (cha1).
The host that I did the maintenance on is cha2.
[root@cha3-storage dwhite]# gluster peer status
Number of Peers: 2
Hostname: 10.1.0.10
Uuid: 87a4f344-321a-48b9-adfb-e3d2b56b8e7b
State: Peer in Cluster (Connected)
Hostname: cha2-storage.mgt.barredowlweb.com
Uuid: 93e12dee-c37d-43aa-a9e9-f4740b9cab14
State: Peer in Cluster (Connected)
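A minimal Python sketch for spotting peers that gluster lists by IP address instead of by name, as with 10.1.0.10 above. The function name `peers_listed_by_ip` is hypothetical, and the sketch assumes the `gluster peer status` output has been captured as text with one `Key: value` pair per line.

```python
import ipaddress


# Hypothetical helper: return the "Hostname:" values from captured
# `gluster peer status` output that are IP addresses rather than names.
def peers_listed_by_ip(peer_status_text):
    by_ip = []
    for line in peer_status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Hostname":
            value = value.strip()
            try:
                ipaddress.ip_address(value)
            except ValueError:
                continue  # a name, not an IP address
            by_ip.append(value)
    return by_ip


# The peer status output from cha3 shown above.
sample = """\
Number of Peers: 2
Hostname: 10.1.0.10
Uuid: 87a4f344-321a-48b9-adfb-e3d2b56b8e7b
State: Peer in Cluster (Connected)
Hostname: cha2-storage.mgt.barredowlweb.com
Uuid: 93e12dee-c37d-43aa-a9e9-f4740b9cab14
State: Peer in Cluster (Connected)
"""

print(peers_listed_by_ip(sample))
```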
When I run `gluster volume heal data`, I see the following:
[root@cha3-storage dwhite]# gluster volume heal data
Launching heal operation to perform index self heal on volume data has been unsuccessful:
Commit failed on cha2-storage.mgt.barredowlweb.com. Please check log file for details.
I get the same results if I run the command on cha2, for any volume:
[root@cha2-storage dwhite]# gluster volume heal data
Launching heal operation to perform index self heal on volume data has been unsuccessful:
Glusterd Syncop Mgmt brick op 'Heal' failed. Please check glustershd log file for details.
[root@cha2-storage dwhite]# gluster volume heal vmstore
Launching heal operation to perform index self heal on volume vmstore has been unsuccessful:
Glusterd Syncop Mgmt brick op 'Heal' failed. Please check glustershd log file for details.
I see a lot of entries like this in /var/log/glusterfs/glustershd.log on cha2:
[2021-04-24 11:33:01.319888] I [rpc-clnt.c:1975:rpc_clnt_reconfig] 2-engine-client-0: changing port to 49153 (from 0)
[2021-04-24 11:33:01.329463] I [MSGID: 114057] [client-handshake.c:1128:select_server_supported_programs] 2-engine-client-0: Using Program [{Program-name=GlusterFS 4.x v1}, {Num=1298437}, {Version=400}]
[2021-04-24 11:33:01.330075] W [MSGID: 114043] [client-handshake.c:727:client_setvolume_cbk] 2-engine-client-0: failed to set the volume [{errno=2}, {error=No such file or directory}]
[2021-04-24 11:33:01.330116] W [MSGID: 114007] [client-handshake.c:752:client_setvolume_cbk] 2-engine-client-0: failed to get from reply dict [{process-uuid}, {errno=22}, {error=Invalid argument}]
[2021-04-24 11:33:01.330140] E [MSGID: 114044] [client-handshake.c:757:client_setvolume_cbk] 2-engine-client-0: SETVOLUME on remote-host failed [{remote-error=Brick not found}, {errno=2}, {error=No such file or directory}]
[2021-04-24 11:33:01.330155] I [MSGID: 114051] [client-handshake.c:879:client_setvolume_cbk] 2-engine-client-0: sending CHILD_CONNECTING event []
[2021-04-24 11:33:01.640480] I [rpc-clnt.c:1975:rpc_clnt_reconfig] 3-vmstore-client-0: changing port to 49154 (from 0)
The message "W [MSGID: 114007] [client-handshake.c:752:client_setvolume_cbk] 3-vmstore-client-0: failed to get from reply dict [{process-uuid}, {errno=22}, {error=Invalid argument}]" repeated 4 times between [2021-04-24 11:32:49.602164] and [2021-04-24 11:33:01.649850]
[2021-04-24 11:33:01.649867] E [MSGID: 114044] [client-handshake.c:757:client_setvolume_cbk] 3-vmstore-client-0: SETVOLUME on remote-host failed [{remote-error=Brick not found}, {errno=2}, {error=No such file or directory}]
[2021-04-24 11:33:01.649969] I [MSGID: 114051] [client-handshake.c:879:client_setvolume_cbk] 3-vmstore-client-0: sending CHILD_CONNECTING event []
[2021-04-24 11:33:01.650095] I [MSGID: 114018] [client.c:2225:client_rpc_notify] 3-vmstore-client-0: disconnected from client, process will keep trying to connect glusterd until brick's port is available [{conn-name=vmstore-client-0}]
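A minimal Python sketch for summarising which clients log errors in glustershd.log and what remote error they report. It assumes one log entry per line in the layout seen above; the names `ENTRY`, `REMOTE_ERROR` and `error_entries` are hypothetical and not part of any gluster tooling.

```python
import re

# Assumed entry layout, taken from the excerpt above:
# [timestamp] LEVEL [MSGID: n] [file:line:function] client-name: message
ENTRY = re.compile(
    r"\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"(?:\[MSGID: (?P<msgid>\d+)\] )?"
    r"\[[^\]]+\] (?P<client>[\w-]+): (?P<msg>.*)"
)
REMOTE_ERROR = re.compile(r"\{remote-error=([^}]*)\}")


# Hypothetical helper: list (client, remote error) for every "E" level entry.
def error_entries(lines):
    failures = []
    for line in lines:
        m = ENTRY.match(line)
        if not m or m.group("level") != "E":
            continue
        remote = REMOTE_ERROR.search(m.group("msg"))
        failures.append((m.group("client"), remote.group(1) if remote else None))
    return failures


# A few entries copied from the excerpt above.
sample = [
    "[2021-04-24 11:33:01.319888] I [rpc-clnt.c:1975:rpc_clnt_reconfig] 2-engine-client-0: changing port to 49153 (from 0)",
    "[2021-04-24 11:33:01.330075] W [MSGID: 114043] [client-handshake.c:727:client_setvolume_cbk] 2-engine-client-0: failed to set the volume [{errno=2}, {error=No such file or directory}]",
    "[2021-04-24 11:33:01.330140] E [MSGID: 114044] [client-handshake.c:757:client_setvolume_cbk] 2-engine-client-0: SETVOLUME on remote-host failed [{remote-error=Brick not found}, {errno=2}, {error=No such file or directory}]",
    "[2021-04-24 11:33:01.649867] E [MSGID: 114044] [client-handshake.c:757:client_setvolume_cbk] 3-vmstore-client-0: SETVOLUME on remote-host failed [{remote-error=Brick not found}, {errno=2}, {error=No such file or directory}]",
]

for client, remote_error in error_entries(sample):
    print(client, "->", remote_error)
```

For these entries it reports `Brick not found` for both `2-engine-client-0` and `3-vmstore-client-0`.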
How do I further troubleshoot?
3 years, 6 months
poweroff and reboot with ovirt_vm ansible module
by Nathanaël Blanchet
Hello, is there a way to power off or reboot a VM (without going through
the stopped and running states) with the ovirt_vm Ansible module?
3 years, 6 months
[OLVM] Host non responsive after installation
by alan@softdrive.co
I am using Oracle Linux Virtualization Manager, following this guide: https://docs.oracle.com/en/virtualization/oracle-linux-virtualization-man...
After adding a host to the engine, the host becomes non responsive due to network errors:
engine.log
2021-04-27 14:53:02,255Z ERROR [org.ovirt.engine.core.bll.hostdeploy.InstallVdsInternalCommand] (EE-ManagedThreadFactory-engine-Thread-32356) [38586e0e] Host installation failed for host 'c97604b3-5774-4260-92fd-633257aa7498', 'GPU2-2': Network error during communication with the host
Help resolving this would be much appreciated!
3 years, 6 months
Something broke & took down multiple VMs for ~20 minutes
by David White
As the subject suggests, something in oVirt HCI broke. I have no idea what, and it recovered on its own after about 20 minutes or so.
I believe that the issue was limited to a single host (although I don't know that for sure), as we had two VMs go completely unresponsive, but a 3rd VM remained operational. For a while during the outage, I was able to log into the oVirt admin web portal, and I noticed at least 1-2 of my hosts (I have 3 hosts) showed the problematic VMs as being problematic inside of oVirt.
Reviewing the oVirt Events, I see that this basically started right when the ETL Service Started. There were no events before that point since yesterday, but right when the ETL Service started, it seems like all hell broke loose.
oVirt detected "No faulty multipaths" on any of the hosts, but then very quickly started indicating that hosts, vms, and storage targets were unavailable. See my screenshot below.
Around 30 - 35 minutes later, it appears that the Hosted Engine terminated due to a storage issue, and auto recovered on a different host. There's a 2nd screenshot beneath the first.
Everything came back up shortly before 9am, and has been stable since.
In fact, the Volume replication issues that I saw in my environment after I performed maintenance on 1 of my hosts on Friday are no longer present. It appears that the Hosted Engine sees the storage as being perfectly healthy.
How do I even begin to figure out what happened, and try to prevent it from happening again?
[Screenshot from 2021-04-26 16-36-47.png]
[Screenshot from 2021-04-26 16-44-08.png]
3 years, 6 months
pool list vm assign user
by Dominique D
Is there a way to see which user each VM in a pool is assigned to?
In the portal I can see the VMs that have a "logged-in user", but for the other VMs I don't know to whom they are assigned.
3 years, 6 months
oVirt 2021 Spring survey questions
by Sandro Bonazzola
Hi,
it's about the usual time of the year when we ask the community to provide
feedback with a survey.
Any questions you'd like to be asked?
3 years, 6 months
error id 980
by ozmen62@hotmail.com
Hi,
First of all, thanks for this great list and the people who try to help each other.
This time my problem seems to be related to the engine OVF file or the engine storage settings.
I came to this idea because every 2-3 hours the engine tries to migrate to the other host (from host1 to host2):
"Invalid status on Data Center B300. Setting status to Non Responsive."
When I check the host logs, I see these:
ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm ERROR Failed extracting VM OVF from the OVF_STORE volume, falling back to initial vm.conf
ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM stopped on localhost
ovirt-ha-agent ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore ERROR Unable to extract HEVM OVF
I believe there is some kind of FC storage path or config problem. My storage settings are good; there are no errors about them.
Has anyone experienced this and knows how to solve it?
Thanks
3 years, 6 months
Cluster level 4.5
by Don Dupuis
What version of libvirt is required for a host to be put in this cluster
level? I am using CentOS 8.3 and the CPU is Cascade Lake Server. It says that
my host is only compatible with cluster versions 4.2, 4.3 and 4.4. I am doing
a new install of oVirt 4.4.5. I have tried to update the libvirt version but
have run into issues. The currently installed libvirt
is libvirt-6.0.0-28.module_el8.3.0+555+a55c8938.x86_64.
Don
3 years, 6 months