On Monday, March 20, 2017 9:14:51 AM EDT Logan Kuhn wrote:
Starting at 1:09am on Saturday the Hosted Engine has been rebooting because
it failed its liveliness check. This is due to the webadmin not loading.
Nothing changed as far as I can tell on the engine since its last
successful reboot on Friday afternoon.
The engine, dwhd and httpd are all up and do not seem to be reporting
anything unusual in their respective logs. The engine can talk to the
database, as I can log in using the credentials in
/etc/ovirt-engine/engine.conf.d/10-setup-database.conf and the logs on the
postgres server are showing activity.
I tried to run engine-setup but it says it's not in global maintenance even
though the hosted engine hosts agree that it is. We are on version 4.0.6.3.
Server, engine and agent logs are attached.
Regards,
Logan
Looking at your logs, it appears that on Friday one of your hosts ran out of
disk space in its logs or temp directory, at which point connectivity started
to be spotty. I see a bunch of attempts to migrate VMs away from that host
(ovirttest1). All of them fail. That repeats many times. I skipped ahead to
Saturday, where it appears you had a bunch of stale locks, which also repeats
many times until the engine VM gets restarted.
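
To rule out a repeat of the disk-space problem on that host, something like
the following could be run there. This is only a rough sketch: the paths
/var/log and /tmp are my assumption about where the logs and temp files
live, and free_percent is a made-up helper name, not an oVirt tool.

```python
import shutil


def free_percent(path):
    """Return the percentage of free space on the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


if __name__ == "__main__":
    # Assumed locations; adjust to wherever the host keeps logs and temp files.
    for path in ("/var/log", "/tmp"):
        try:
            print(f"{path}: {free_percent(path):.1f}% free")
        except OSError as exc:
            print(f"{path}: cannot check ({exc})")
```
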
Then I see nothing but restarts of the engine and no apparent errors in the
engine log.
The server log does however reveal this:
2017-03-20 07:04:27,282 ERROR [org.quartz.core.ErrorLogger] (QuartzOvirtDBScheduler_QuartzSchedulerThread) An error occurred while scanning for the next triggers to fire.: org.quartz.JobPersistenceException: Failed to obtain DB connection from data source 'NMEngineDS': java.sql.SQLException: Could not retrieve datasource via JNDI url 'java:/ENGINEDataSourceNoJTA' java.sql.SQLException: javax.resource.ResourceException: IJ000470: You are trying to use a connection factory that has been shut down: java:/ENGINEDataSourceNoJTA [See nested exception: java.sql.SQLException: Could not retrieve datasource via JNDI url 'java:/ENGINEDataSourceNoJTA' java.sql.SQLException: javax.resource.ResourceException: IJ000470: You are trying to use a connection factory that has been shut down: java:/ENGINEDataSourceNoJTA]
    at org.quartz.impl.jdbcjobstore.JobStoreCMT.getNonManagedTXConnection(JobStoreCMT.java:168) [quartz.jar:]
    at org.quartz.impl.jdbcjobstore.JobStoreSupport.executeInNonManagedTXLock(JobStoreSupport.java:3807) [quartz.jar:]
    at org.quartz.impl.jdbcjobstore.JobStoreSupport.acquireNextTriggers(JobStoreSupport.java:2751) [quartz.jar:]
    at org.quartz.core.QuartzSchedulerThread.run(QuartzSchedulerThread.java:264) [quartz.jar:]
Caused by: java.sql.SQLException: Could not retrieve datasource via JNDI url 'java:/ENGINEDataSourceNoJTA' java.sql.SQLException: javax.resource.ResourceException: IJ000470: You are trying to use a connection factory that has been shut down: java:/ENGINEDataSourceNoJTA
    at org.quartz.utils.JNDIConnectionProvider.getConnection(JNDIConnectionProvider.java:163) [quartz.jar:]
    at org.quartz.utils.DBConnectionManager.getConnection(DBConnectionManager.java:108) [quartz.jar:]
    at org.quartz.impl.jdbcjobstore.JobStoreCMT.getNonManagedTXConnection(JobStoreCMT.java:165) [quartz.jar:]
    ... 3 more
Is your postgresql service running? That is the most likely source of the
engine not coming up.
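
If it helps, here is a quick way to check from the engine VM whether the
database port is reachable at all. Treat it as a sketch under assumptions:
I am assuming the setup file uses KEY="value" lines with ENGINE_DB_HOST and
ENGINE_DB_PORT entries, and parse_conf and can_connect are names I made up
for this example. A successful TCP connection only shows that something is
listening on the port, not that PostgreSQL is healthy.

```python
import socket


def parse_conf(text):
    """Parse KEY="value" lines into a dict, skipping blanks and comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    conf_path = "/etc/ovirt-engine/engine.conf.d/10-setup-database.conf"
    with open(conf_path) as fh:
        conf = parse_conf(fh.read())
    # Key names and fallback values below are assumptions; check your file.
    host = conf.get("ENGINE_DB_HOST", "localhost")
    port = int(conf.get("ENGINE_DB_PORT", "5432"))
    state = "reachable" if can_connect(host, port) else "NOT reachable"
    print(f"{host}:{port} is {state}")
```
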