certs expired, now engine won't start
I seem to have gotten myself into a bit of a pickle. My ovirt host certs (4-node 4.4.1 cluster) expired recently and I didn't notice until one of my hosts got fenced on Friday. I was finally able to get it back online by rebooting, telling the ovirt-engine the host had been rebooted (so that it didn't think VMs were running on it), and then going into maintenance mode and 'enroll certificates' to renew the certs. This worked with the second host as well. But I was unable to put the third host into maintenance mode because the engine claimed "Another power management action is already in progress". Sure enough, the task list showed the engine trying to take action against this host. It was stuck in that state for over an hour, so I finally poked the ovirt-engine database directly to fail these tasks:
psql engine -c "UPDATE job SET status = 'FAILED', end_time = NOW() WHERE status NOT IN ('FINISHED', 'FAILED');"
psql engine -c "UPDATE step SET status = 'FAILED', end_time = NOW() WHERE status NOT IN ('FINISHED', 'FAILED');"
This got the tasks to clear, but the engine UI still claimed that the stuck host had "Another power management action is already in progress". Then I made the fatal mistake of rebooting the ovirt-engine (which was still running on the fourth host with expired certs).
Now my engine won't start, even when I try to start it on one of the hosts that had its certs renewed. The libvirt logs indicate that it's because the vnc cert is expired:
qemu-kvm: -object tls-creds-x509,id=vnc-tls-creds0,dir=/etc/pki/vdsm/libvirt-vnc,endpoint=server,verify-peer=no:
The server certificate /etc/pki/vdsm/libvirt-vnc/server-cert.pem has expired
Since I don't care about vnc access to the engine, is there a way to get the engine running again without vnc?
Alternately, is there a way that I can manually get some of the more critical VMs running without an engine running?
FWIW, I tried searching the list archives, but I get a 500 server error when trying to go to lists.ovirt.org.
--Mike
I went through an issue like that on an older version. I documented what I did to fix it at https://docs.google.com/document/d/1_sBabK1zTucY_oFdquT2qjE46-iNVfN09PyLXFLI.... I put that together after a lot of back and forth, so I hope it is accurate and that it helps.
Christopher Cross
Linux Systems Network Administrator, Colorado Public Radio
On Mon, Nov 17, 2025 at 2:27 AM Michael Thomas <wart@caltech.edu> wrote:
I seem to have gotten myself into a bit of a pickle. My ovirt host certs (4-node 4.4.1 cluster) expired recently and I didn't notice until one of my hosts got fenced on Friday. I was finally able to get it back online by rebooting, telling the ovirt-engine the host had been rebooted (so that it didn't think VMs were running on it), and then going into maintenance mode and 'enroll certificates' to renew the certs. This worked with the second host as well. But I was unable to put the third host into maintenance mode because the engine claimed "Another power management action is already in progress". Sure enough, the task list showed the engine trying to take action against this host. It was stuck in that state for over an hour, so I finally poked the ovirt-engine database directly to fail these tasks:
psql engine -c "UPDATE job SET status = 'FAILED', end_time = NOW() WHERE status NOT IN ('FINISHED', 'FAILED');"
psql engine -c "UPDATE step SET status = 'FAILED', end_time = NOW() WHERE status NOT IN ('FINISHED', 'FAILED');"
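Before running UPDATE statements like the two above, it may be worth counting how many rows they would touch. The following is only a sketch: the table names (job, step), the status values, and the database name (engine) are taken from the commands above, while the preview_sql helper is a hypothetical name introduced for illustration. It only reads from the database.

```shell
#!/bin/sh
# preview_sql (hypothetical helper): build a read-only query that counts
# the rows the UPDATE above would change in the given table.
preview_sql() {
    printf "SELECT count(*) FROM %s WHERE status NOT IN ('FINISHED', 'FAILED');" "$1"
}

# Run the preview only where psql is installed; the database name "engine"
# is the one used in the original commands.
if command -v psql >/dev/null 2>&1; then
    for table in job step; do
        psql engine -c "$(preview_sql "$table")" \
            || echo "preview query against $table failed" >&2
    done
fi
```

Taking a database backup first would be a sensible extra precaution, since the UPDATE statements cannot be undone once committed.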
This got the tasks to clear, but the engine UI still claimed that the stuck host had "Another power management action is already in progress". Then I made the fatal mistake of rebooting the ovirt-engine (which was still running on the fourth host with expired certs).
Now my engine won't start, even when I try to start it on one of the hosts that had its certs renewed. The libvirt logs indicate that it's because the vnc cert is expired:
qemu-kvm: -object tls-creds-x509,id=vnc-tls-creds0,dir=/etc/pki/vdsm/libvirt-vnc,endpoint=server,verify-peer=no:
The server certificate /etc/pki/vdsm/libvirt-vnc/server-cert.pem has expired
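The expiry reported in the error above can be confirmed directly on a host. This is a minimal sketch, assuming openssl is installed there; the certificate path is the one from the qemu-kvm message, and the cert_status helper is a hypothetical name introduced for illustration. It relies on `openssl x509 -checkend 0`, which exits non-zero when the certificate has already expired.

```shell
#!/bin/sh
# cert_status (hypothetical helper): print "missing", "valid", or
# "expired-or-invalid" for the PEM certificate at the given path.
cert_status() {
    cert_path="$1"
    if [ ! -r "$cert_path" ]; then
        echo "missing"
        return 2
    fi
    # -checkend 0 exits 0 only if the certificate is still valid right now;
    # a file that cannot be parsed as a certificate also lands in the else branch.
    if openssl x509 -in "$cert_path" -noout -checkend 0 >/dev/null 2>&1; then
        echo "valid"
    else
        echo "expired-or-invalid"
        return 1
    fi
}

# Path taken from the qemu-kvm error; add other certificate paths as needed.
for cert in /etc/pki/vdsm/libvirt-vnc/server-cert.pem; do
    printf '%s: %s\n' "$cert" "$(cert_status "$cert")"
done
```

Running the same check on each host would show which ones still carry expired certificates after the 'enroll certificates' step.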
Since I don't care about vnc access to the engine, is there a way to get the engine running again without vnc?
Alternately, is there a way that I can manually get some of the more critical VMs running without an engine running?
FWIW, I tried searching the list archives, but I get a 500 server error when trying to go to lists.ovirt.org.
--Mike
Participants (2): Christopher Cross, Michael Thomas