I seem to keep striking out on recent oVirt updates. Today (22 June 2019) I saw that an
update was available for my Enterprise Linux Host (main oVirt engine, CentOS 7 x64) and
attempted to install it. All previous updates had installed successfully, but this one
failed. I logged in via SSH and saw that the update had been halted by "yum versionlock."
I cleared versionlock and proceeded with the update, which appeared to complete
successfully. I rebooted the system, and it came back up fine. However, I can no longer
reach the Administration Portal page; the browser just hangs. I see the following:
- When I enter the server's IP address in the browser, I get the
https://<ip address>/ovirt-engine/sso/oauth/authorize page, which tells me:
"The FQDN used to access the system is not a valid engine FQDN. You must access the
system using the engine FQDN or one of the engine alternate FQDNs.
Click here to continue."
When I click the provided link, I get the same hanging behavior, and the login page
never loads.
- I was able to connect via Cockpit at https://<IP Address>:9090 and log in
successfully as root after SSH'ing in and restarting the engine. No major issues are
displayed, and I was able to create a Diagnostic Report. Under Hostname > oVirt
Machines, Cockpit redirects me to the login page at
https://fqdn/ovirt-engine/sso/login.html. The page loads after the redirect, but when I
enter my admin@internal credentials it just hangs and spins.
When I return to "Virtual Machines" in Cockpit, I have Host/Cluster/Templates/VDSM
options, but I see "oVirt login in progress" with a continually spinning circle, and it
never actually authenticates.
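The SSO message in the first item suggests that the host name in the URL (here the bare IP address) is compared against the engine FQDN and its alternate FQDNs. A minimal Python sketch of that kind of check follows. It is not oVirt's actual code; the function name, its parameters, and the example host names are all hypothetical.

```python
def is_known_engine_fqdn(requested_host, engine_fqdn, alternate_fqdns=()):
    """Return True if requested_host matches the engine FQDN or an alternate.

    Hypothetical illustration only: names are compared case-insensitively
    after stripping whitespace and any trailing dot.
    """
    def normalize(name):
        return name.strip().rstrip(".").lower()

    allowed = {normalize(engine_fqdn)}
    allowed.update(normalize(name) for name in alternate_fqdns)
    return normalize(requested_host) in allowed


# Example values are placeholders, not taken from the affected system.
print(is_known_engine_fqdn("192.0.2.10", "engine.example.com"))
print(is_known_engine_fqdn("Engine.Example.com", "engine.example.com"))
print(is_known_engine_fqdn("192.0.2.10", "engine.example.com", ["192.0.2.10"]))
```

Under these assumptions, a request by IP address is only accepted once that address is listed as an alternate name, which would be consistent with the message shown when browsing to the IP directly.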
In /var/log/ovirt-engine/ui.log, I see:
2019-06-22 19:08:02,533-04 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (default task-4) [] Permutation name: C92E6928986552EDD0E1C99CDC0CC8AB
2019-06-22 19:08:02,533-04 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (default task-4) [] Uncaught exception: com.google.gwt.core.client.JavaScriptException: (TypeError) : Cannot read property 'kh' of null
    at org.ovirt.engine.ui.uicommonweb.dataprovider.AsyncDataProvider.$lambda$4(AsyncDataProvider.java:387)
    at org.ovirt.engine.ui.uicommonweb.dataprovider.AsyncDataProvider$lambda$4$Type.executed(AsyncDataProvider.java:387)
    at org.ovirt.engine.ui.frontend.Frontend$2.$onFailure(Frontend.java:329) [frontend.jar:]
    at org.ovirt.engine.ui.frontend.Frontend$2.onFailure(Frontend.java:329) [frontend.jar:]
    at org.ovirt.engine.ui.frontend.communication.OperationProcessor$2.$onFailure(OperationProcessor.java:184) [frontend.jar:]
    at org.ovirt.engine.ui.frontend.communication.OperationProcessor$2.onFailure(OperationProcessor.java:184) [frontend.jar:]
    at org.ovirt.engine.ui.frontend.communication.GWTRPCCommunicationProvider.$handleMultipleQueriesFailure(GWTRPCCommunicationProvider.java:305) [frontend.jar:]
    at org.ovirt.engine.ui.frontend.communication.GWTRPCCommunicationProvider$5$1.onFailure(GWTRPCCommunicationProvider.java:263) [frontend.jar:]
    at com.google.gwt.user.client.rpc.impl.RequestCallbackAdapter.onResponseReceived(RequestCallbackAdapter.java:198) [gwt-servlet.jar:]
    at com.google.gwt.http.client.Request.$fireOnResponseReceived(Request.java:233) [gwt-servlet.jar:]
    at com.google.gwt.http.client.RequestBuilder$1.onReadyStateChange(RequestBuilder.java:409) [gwt-servlet.jar:]
    at Unknown.eval(webadmin-0.js)
    at com.google.gwt.core.client.impl.Impl.apply(Impl.java:236) [gwt-servlet.jar:]
    at com.google.gwt.core.client.impl.Impl.entry0(Impl.java:275) [gwt-servlet.jar:]
    at Unknown.eval(webadmin-0.js)
In engine.log, I see some update-related errors:
2019-06-22 18:46:54,053-04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-22) [] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM ovirt2. command HSMGetAllTasksStatusesVDS failed: Not SPM
2019-06-22 18:46:54,054-04 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-22) [] Command 'HSMGetAllTasksStatusesVDSCommand(HostName = ovirt2., VdsIdVDSCommandParametersBase:{hostId='2f1da67f-66f7-40d6-81c4-dde82d6a6dd6'})' execution failed: IRSGenericException: IRSErrorException: IRSNonOperationalException: Not SPM
2019-06-22 19:01:18,875-04 ERROR [org.ovirt.engine.core.bll.host.HostUpgradeManager] (EE-ManagedThreadFactory-hostUpdatesChecker-Thread-2) [] Failed to run check-update of host 'ovirt1.'. Error: fatal: [ovirt1.]: FAILED! => {"changed": false, "msg": "yum lockfile is held by another process"}
2019-06-22 19:01:18,875-04 ERROR [org.ovirt.engine.core.bll.hostdeploy.HostUpdatesChecker] (EE-ManagedThreadFactory-hostUpdatesChecker-Thread-2) [] Failed to check if updates are available for host 'ovirt1.' with error message 'Failed to run check-update of host 'ovirt1.'. Error: fatal: [ovirt1.]: FAILED! => {"changed": false, "msg": "yum lockfile is held by another process"}
2019-06-22 19:01:18,881-04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-hostUpdatesChecker-Thread-2) [] EVENT_ID: HOST_AVAILABLE_UPDATES_FAILED(839), Failed to check for available updates on host ovirt1. with message 'Failed to run check-update of host 'ovirt1.'. Error: fatal: [ovirt1.]: FAILED! => {"changed": false, "msg": "yum lockfile is held by another process"}
2019-06-22 19:08:04,848-04 ERROR [org.ovirt.engine.core.bll.SshHostRebootCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-2) [6a9c283] Command 'org.ovirt.engine.core.bll.SshHostRebootCommand' failed: ManagedThreadFactory is stopped
2019-06-22 19:08:04,848-04 ERROR [org.ovirt.engine.core.bll.SshHostRebootCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-2) [6a9c283] Exception: java.lang.IllegalStateException: ManagedThreadFactory is stopped
2019-06-22 19:08:04,857-04 ERROR [org.ovirt.engine.core.bll.SshHostRebootCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-2) [6a9c283] Error during log command: org.ovirt.engine.core.bll.SshHostRebootCommand. Exception PreparedStatementCallback; uncategorized SQLException for SQL [select * from getclusterbyclusterid(?, ?, ?)]; SQL state [null]; error code [0]; IJ031013: Interrupted attempting lock: org.jboss.jca.adapters.jdbc.local.LocalManagedConnection@7b3be2cd; nested exception is java.sql.SQLException: IJ031013: Interrupted attempting lock: org.jboss.jca.adapters.jdbc.local.LocalManagedConnection@7b3be2cd
2019-06-22 19:08:04,858-04 ERROR [org.ovirt.engine.core.bll.job.ExecutionHandler] (EE-ManagedThreadFactory-commandCoordinator-Thread-2) [6a9c283] Exception: org.springframework.jdbc.UncategorizedSQLException: PreparedStatementCallback; uncategorized SQLException for SQL [select * from checkifjobhastasks(?)]; SQL state [null]; error code [0]; IJ031013: Interrupted attempting lock: org.jboss.jca.adapters.jdbc.local.LocalManagedConnection@7b3be2cd; nested exception is java.sql.SQLException: IJ031013: Interrupted attempting lock: org.jboss.jca.adapters.jdbc.local.LocalManagedConnection@7b3be2cd
2019-06-22 19:08:04,869-04 ERROR [org.ovirt.engine.core.bll.job.ExecutionHandler] (EE-ManagedThreadFactory-commandCoordinator-Thread-2) [6a9c283] Failed to end Job '376636f1-29cb-4bb7-adc4-1c8470ec1070', 'SshHostReboot': Failed to obtain JDBC Connection; nested exception is java.sql.SQLException: javax.resource.ResourceException: IJ000457: Unchecked throwable in managedConnectionReconnected() cl=org.jboss.jca.core.connectionmanager.listener.TxConnectionListener@4a38ab13[state=NORMAL managed connection=org.jboss.jca.adapters.jdbc.local.LocalManagedConnection@7b3be2cd connection handles=0 lastReturned=1561244884865 lastValidated=1561244613838 lastCheckedOut=1561244884865 trackByTx=false pool=org.jboss.jca.core.connectionmanager.pool.strategy.OnePool@250efd73 mcp=SemaphoreConcurrentLinkedQueueManagedConnectionPool@440e7f35[pool=ENGINEDataSource] xaResource=LocalXAResourceImpl@41bc50e8[connectionListener=4a38ab13 connectionManager=49ec2180 warned=false currentXid=null productName=PostgreSQL productVersion=10.6 jndiName=java:/ENGINEDataSource] txSync=null]
2019-06-22 19:08:05,220-04 ERROR [org.ovirt.engine.core.bll.hostdeploy.UpgradeHostInternalCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-2) [6a9c283] Engine failed to restart via ssh host 'ovirt1.' ('33138c5f-9e48-4426-b9e7-ae148ae16387') after upgrade
2019-06-22 19:08:06,245-04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-commandCoordinator-Thread-2) [49770eb6-43f1-4048-af2c-a7ebadf282bb] EVENT_ID: HOST_UPGRADE_FAILED(841), Failed to upgrade Host ovirt1. (User: admin@internal-authz).
2019-06-22 19:10:58,333-04 ERROR [org.ovirt.engine.core.bll.Backend] (ServerService Thread Pool -- 52) [] Error during initialization: javax.ejb.EJBException: java.lang.IllegalStateException: WFLYEE0042: Failed to construct component instance
2019-06-22 19:24:36,687-04 ERROR [org.ovirt.engine.core.bll.Backend] (ServerService Thread Pool -- 42) [] Error during initialization: javax.ejb.EJBException: java.lang.IllegalStateException: WFLYEE0042: Failed to construct component instance
2019-06-22 21:14:46,279-04 ERROR [org.ovirt.engine.core.bll.Backend] (ServerService Thread Pool -- 55) [] Error during initialization: javax.ejb.EJBException: java.lang.IllegalStateException: WFLYEE0042: Failed to construct component instance
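To see which components account for most of the errors, the ERROR entries in engine.log can be tallied by logger class. The following is a minimal Python sketch, assuming that in the file on disk each entry begins on its own line with a timestamp, a level, and the logger name in square brackets. The function name and regular expression are illustrative and not part of any oVirt tooling.

```python
import re
from collections import Counter

# Start of an entry, e.g.:
# 2019-06-22 19:10:58,333-04 ERROR [org.ovirt.engine.core.bll.Backend] ...
ENTRY_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}[+-]\d{2})\s+(\w+)\s+\[([^\]]+)\]"
)


def count_errors_by_logger(lines):
    """Count ERROR entries per logger, keyed by the short class name."""
    counts = Counter()
    for line in lines:
        match = ENTRY_RE.match(line)
        if match and match.group(2) == "ERROR":
            short_name = match.group(3).rsplit(".", 1)[-1]
            counts[short_name] += 1
    return counts


# Sample lines copied from the excerpt above; a real run would pass an
# open file object for engine.log instead of this list.
sample = [
    "2019-06-22 19:08:04,848-04 ERROR [org.ovirt.engine.core.bll.SshHostRebootCommand] "
    "(EE-ManagedThreadFactory-commandCoordinator-Thread-2) [6a9c283] Exception: "
    "java.lang.IllegalStateException: ManagedThreadFactory is stopped",
    "2019-06-22 19:10:58,333-04 ERROR [org.ovirt.engine.core.bll.Backend] "
    "(ServerService Thread Pool -- 52) [] Error during initialization: "
    "javax.ejb.EJBException: java.lang.IllegalStateException: WFLYEE0042: "
    "Failed to construct component instance",
    "2019-06-22 19:24:36,687-04 ERROR [org.ovirt.engine.core.bll.Backend] "
    "(ServerService Thread Pool -- 42) [] Error during initialization: "
    "javax.ejb.EJBException: java.lang.IllegalStateException: WFLYEE0042: "
    "Failed to construct component instance",
]

for name, count in count_errors_by_logger(sample).most_common():
    print(name, count)
```

Continuation lines such as stack-trace frames do not start with a timestamp, so they are skipped rather than counted as separate entries.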
Most of the system still appears to be running, but I cannot reach the Administration
Portal. Is there a way to roll back the update, or to troubleshoot this further?