[ovirt-users] WebAdmin is down, HostedEngine is up

Alexander Wels awels at redhat.com
Mon Mar 20 15:39:10 UTC 2017


On Monday, March 20, 2017 11:34:49 AM EDT Simone Tiraboschi wrote:
> On Mon, Mar 20, 2017 at 4:28 PM, Logan Kuhn <support at jac-properties.com> wrote:
> > Yup, ovirttest1 ran out of disk space on Friday, we recovered it and
> > everything seemed completely normal.
> > 
> > The postgres service is down on the HEVM, but that is because the database
> > is on our postgresql cluster, and has been for weeks.  I can connect to its
> > database from within the HEVM using the credentials stored at
> > /etc/ovirt-engine/engine.conf.d/10-setup-database.conf.  I can tail the logs
> > on the postgres master, and ovirt can and does connect to it.
> > 
> > However, trying from ovirttest1 I cannot connect to the engine database
> > using those same credentials. Should I be able to?  It'd make sense to be
> > able to connect to it....
> 
> It could depend on how you configured pg_hba.conf for your DBMS
> instance.
> Only the engine and dwh have to connect to the engine DB; a direct DB
> connection from the hosts is not required.
> 
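
To illustrate the pg_hba.conf point: below is a rough Python sketch (not an
oVirt or PostgreSQL tool) that approximates how the server picks the first
matching host record for a client. It only understands host-type records with
a CIDR address and plain or "all" database/user fields. The sample record and
the 192.0.2.x addresses are made up for illustration, so substitute whatever
your cluster actually has.

```python
import ipaddress


def first_matching_rule(pg_hba_text, client_ip, database, user):
    """Return the first pg_hba.conf-style line matching the client, or None.

    Simplified on purpose: keywords such as samehost/samenet, hostnames,
    and records with a separate netmask column are skipped.
    """
    ip = ipaddress.ip_address(client_ip)
    for raw in pg_hba_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        fields = line.split()
        if len(fields) < 5 or not fields[0].startswith("host"):
            continue
        _type, dbs, users, address, _method = fields[:5]
        if dbs != "all" and database not in dbs.split(","):
            continue
        if users != "all" and user not in users.split(","):
            continue
        try:
            network = ipaddress.ip_network(address, strict=False)
        except ValueError:
            continue  # not a CIDR address; out of scope for this sketch
        if ip.version == network.version and ip in network:
            return line
    return None


# Hypothetical record: only one address (standing in for the HEVM) is allowed.
SAMPLE = """
# TYPE  DATABASE  USER    ADDRESS          METHOD
host    engine    engine  192.0.2.10/32    md5
"""

if __name__ == "__main__":
    # Stand-in for the HEVM address: a record matches.
    print(first_matching_rule(SAMPLE, "192.0.2.10", "engine", "engine") is not None)  # True
    # Stand-in for a host address: no record matches.
    print(first_matching_rule(SAMPLE, "192.0.2.21", "engine", "engine") is not None)  # False
```

If no record matches, PostgreSQL rejects the connection with a "no pg_hba.conf
entry for host" error, which would be consistent with the HEVM being able to
connect while ovirttest1 cannot.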

I looked a little closer at the server.log. This in particular stood out:

2017-03-20 07:04:27,282 ERROR [org.quartz.core.ErrorLogger]
(QuartzOvirtDBScheduler_QuartzSchedulerThread) An error occurred while
scanning for the next triggers to fire.: org.quartz.JobPersistenceException:
Failed to obtain DB connection from data source 'NMEngineDS':
java.sql.SQLException: Could not retrieve datasource via JNDI url
'java:/ENGINEDataSourceNoJTA' java.sql.SQLException:
javax.resource.ResourceException: IJ000470: You are trying to use a connection
factory that has been shut down: java:/ENGINEDataSourceNoJTA [See nested
exception: java.sql.SQLException: Could not retrieve datasource via JNDI url
'java:/ENGINEDataSourceNoJTA' java.sql.SQLException:
javax.resource.ResourceException: IJ000470: You are trying to use a connection
factory that has been shut down: java:/ENGINEDataSourceNoJTA]

Grepping the code, it looks like NMEngineDS has something to do with the
scheduler, which is completely outside my realm of knowledge, so I can't help
there.
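
In case it helps narrow down when this started, here is a small Python sketch
that counts the IJ000470 "shut down" messages per connection factory in a log.
The helper name and the regular expression are my own, based only on the
message format quoted above, and the log path in the usage note is an
assumption about where your server.log lives.

```python
import re
from collections import Counter

# Matches the message format seen in the excerpt above and captures the
# JNDI name that follows "shut down: ".
SHUT_DOWN = re.compile(
    r"IJ000470: .*?factory that has been shut down: ([^\s\]]+)"
)


def count_shut_down_factories(lines):
    """Count IJ000470 occurrences per connection factory name."""
    counts = Counter()
    for line in lines:
        for name in SHUT_DOWN.findall(line):
            counts[name] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_ij000470.py /path/to/server.log
    with open(sys.argv[1], errors="replace") as log:
        for name, hits in count_shut_down_factories(log).most_common():
            print(hits, name)
```

I would expect to point it at something like
/var/log/ovirt-engine/server.log on the engine VM, but adjust the path to your
setup. It only tells you how often the factory was reported as shut down, not
why.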

> > Logan
> > 
> > On Mon, Mar 20, 2017 at 10:14 AM, Alexander Wels <awels at redhat.com> wrote:
> >> On Monday, March 20, 2017 9:14:51 AM EDT Logan Kuhn wrote:
> >> > Starting at 1:09am on Saturday the Hosted Engine has been rebooting
> >> > because it failed its liveliness check.  This is due to the webadmin not
> >> > loading.  Nothing changed as far as I can tell on the engine since its
> >> > last successful reboot on Friday afternoon.
> >> > 
> >> > The engine, dwhd and httpd are all up and do not seem to be reporting
> >> > anything unusual in their respective logs.  The engine can talk to the
> >> > database, as I can log in using the credentials in
> >> > /etc/ovirt-engine/engine.conf.d/10-setup-database.conf and the logs on
> >> > the postgres server are showing activity.
> >> > 
> >> > I tried to run engine-setup but it says it's not in global maintenance
> >> > even though the hosted engine hosts agree that it is.  We are on version
> >> > 4.0.6.3.
> >> > 
> >> > Server, engine and agent logs are attached.
> >> > 
> >> > Regards,
> >> > Logan
> >> 
> >> Looking at your logs, it appears that on Friday one of your hosts ran out
> >> of disk space in its logs or temp directory, at which point connectivity
> >> started to be spotty. I see a bunch of attempts to migrate VMs away from
> >> that host (ovirttest1). All of them fail. That repeats a ton of times. I
> >> skipped forward to Saturday, where it appears you had a bunch of stale
> >> locks, which also repeats a bunch of times until the engine VM gets
> >> restarted.
> >> 
> >> Then I see nothing but restarts of the engine and no apparent errors in
> >> the engine log.
> >> 
> >> The server log does however reveal this:
> >> 2017-03-20 07:04:27,282 ERROR [org.quartz.core.ErrorLogger]
> >> (QuartzOvirtDBScheduler_QuartzSchedulerThread) An error occurred while
> >> scanning for the next triggers to fire.: org.quartz.JobPersistenceException:
> >> Failed to obtain DB connection from data source 'NMEngineDS':
> >> java.sql.SQLException: Could not retrieve datasource via JNDI url
> >> 'java:/ENGINEDataSourceNoJTA' java.sql.SQLException:
> >> javax.resource.ResourceException: IJ000470: You are trying to use a
> >> connection factory that has been shut down: java:/ENGINEDataSourceNoJTA
> >> [See nested exception: java.sql.SQLException: Could not retrieve datasource
> >> via JNDI url 'java:/ENGINEDataSourceNoJTA' java.sql.SQLException:
> >> javax.resource.ResourceException: IJ000470: You are trying to use a
> >> connection factory that has been shut down: java:/ENGINEDataSourceNoJTA]
> >>         at org.quartz.impl.jdbcjobstore.JobStoreCMT.getNonManagedTXConnection(JobStoreCMT.java:168) [quartz.jar:]
> >>         at org.quartz.impl.jdbcjobstore.JobStoreSupport.executeInNonManagedTXLock(JobStoreSupport.java:3807) [quartz.jar:]
> >>         at org.quartz.impl.jdbcjobstore.JobStoreSupport.acquireNextTriggers(JobStoreSupport.java:2751) [quartz.jar:]
> >>         at org.quartz.core.QuartzSchedulerThread.run(QuartzSchedulerThread.java:264) [quartz.jar:]
> >> Caused by: java.sql.SQLException: Could not retrieve datasource via JNDI url
> >> 'java:/ENGINEDataSourceNoJTA' java.sql.SQLException:
> >> javax.resource.ResourceException: IJ000470: You are trying to use a
> >> connection factory that has been shut down: java:/ENGINEDataSourceNoJTA
> >>         at org.quartz.utils.JNDIConnectionProvider.getConnection(JNDIConnectionProvider.java:163) [quartz.jar:]
> >>         at org.quartz.utils.DBConnectionManager.getConnection(DBConnectionManager.java:108) [quartz.jar:]
> >>         at org.quartz.impl.jdbcjobstore.JobStoreCMT.getNonManagedTXConnection(JobStoreCMT.java:165) [quartz.jar:]
> >>         ... 3 more
> >> 
> >> Is your postgresql service running? That is the most likely source of the
> >> engine not coming up.
> > 