[ovirt-users] WebAdmin is down, HostedEngine is up

Logan Kuhn support at jac-properties.com
Mon Mar 20 15:39:17 UTC 2017


So that sounds like the host isn't able to communicate properly with the
HEVM.  The cluster is still in global maintenance, but the HEVM still
thinks that it isn't because the database says it isn't in global
maintenance.
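
For anyone checking the same thing, here is a minimal sketch for confirming which database the engine is actually configured to use, since that is where it reads the maintenance state from. It assumes /etc/ovirt-engine/engine.conf.d/10-setup-database.conf consists of KEY="value" lines with ENGINE_DB_* keys; those key names and the helper function names are assumptions for illustration, not taken from this thread. It only builds a psql command line and does not read or pass the password.

```python
import shlex


def parse_engine_db_conf(text):
    """Parse KEY="value" lines into a dict, skipping comments and blanks.

    The file layout is an assumption; adjust if your file differs.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # shlex strips the surrounding shell-style quotes from the value.
        parts = shlex.split(value)
        settings[key.strip()] = parts[0] if parts else ""
    return settings


def psql_command(settings):
    """Build a psql invocation that runs a trivial query on the engine DB.

    The ENGINE_DB_* key names are assumed, not confirmed by this thread.
    """
    return [
        "psql",
        "-h", settings["ENGINE_DB_HOST"],
        "-p", settings["ENGINE_DB_PORT"],
        "-U", settings["ENGINE_DB_USER"],
        "-d", settings["ENGINE_DB_DATABASE"],
        "-c", "select 1;",
    ]
```

On the HEVM, something like `psql_command(parse_engine_db_conf(open(path).read()))` gives a command to run by hand (psql will prompt for the password). If the host and port it shows are not the PostgreSQL cluster you expect, that would explain the engine and the hosts disagreeing.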

Logan

On Mon, Mar 20, 2017 at 10:34 AM, Simone Tiraboschi <stirabos at redhat.com>
wrote:

>
>
> On Mon, Mar 20, 2017 at 4:28 PM, Logan Kuhn <support at jac-properties.com>
> wrote:
>
>> Yup, ovirttest1 ran out of disk space on Friday. We recovered it and
>> everything seemed completely normal.
>>
>> The postgres service is down on the HEVM, but that is because the database
>> lives on our PostgreSQL cluster and has for weeks.  I can connect to its
>> database from within the HEVM using the credentials stored at
>> /etc/ovirt-engine/engine.conf.d/10-setup-database.conf.  I can tail the
>> logs on the postgres master, and ovirt can and does connect to it.
>>
>> However, from ovirttest1 I cannot connect to the engine database using
>> those same credentials. Should I be able to?  It'd make sense to be able
>> to connect to it....
>>
>
> It could depend on how you configured your pg_hba.conf for your DBMS
> instance.
> Only the engine and dwh have to connect to the engine DB; a direct DB
> connection from the hosts is not required.
>
>
>>
>> Logan
>>
>> On Mon, Mar 20, 2017 at 10:14 AM, Alexander Wels <awels at redhat.com>
>> wrote:
>>
>>> On Monday, March 20, 2017 9:14:51 AM EDT Logan Kuhn wrote:
>>> > Starting at 1:09am on Saturday the Hosted Engine has been rebooting
>>> > because it failed its liveliness check.  This is due to the webadmin
>>> > not loading.  Nothing changed as far as I can tell on the engine since
>>> > its last successful reboot on Friday afternoon.
>>> >
>>> > The engine, dwhd and httpd are all up and do not seem to be reporting
>>> > anything unusual in their respective logs.  The engine can talk to the
>>> > database, as I can log in using the credentials in
>>> > /etc/ovirt-engine/engine.conf.d/10-setup-database.conf and the logs on
>>> > the postgres server are showing activity.
>>> >
>>> > I tried to run engine-setup but it says it's not in global maintenance
>>> > even though the hosted engine hosts agree that it is.  We are on
>>> > version 4.0.6.3.
>>> >
>>> > Server, engine and agent logs are attached
>>> >
>>> > Regards,
>>> > Logan
>>>
>>> Looking at your logs, it appears that on Friday one of your hosts
>>> (ovirttest1) ran out of disk space in its logs or temp directory, at
>>> which point connectivity started to be spotty. I see a bunch of attempts
>>> to migrate VMs away from that host, and all of them fail. That repeats
>>> many times. I skipped forward to Saturday, where it appears you had a
>>> bunch of stale locks, which also repeats many times until the engine VM
>>> gets restarted.
>>>
>>> Then I see nothing but restarts of the engine and no apparent errors in
>>> the engine log.
>>>
>>> The server log does however reveal this:
>>> 2017-03-20 07:04:27,282 ERROR [org.quartz.core.ErrorLogger] (QuartzOvirtDBScheduler_QuartzSchedulerThread) An error occurred while scanning for the next triggers to fire.: org.quartz.JobPersistenceException: Failed to obtain DB connection from data source 'NMEngineDS': java.sql.SQLException: Could not retrieve datasource via JNDI url 'java:/ENGINEDataSourceNoJTA' java.sql.SQLException: javax.resource.ResourceException: IJ000470: You are trying to use a connection factory that has been shut down: java:/ENGINEDataSourceNoJTA [See nested exception: java.sql.SQLException: Could not retrieve datasource via JNDI url 'java:/ENGINEDataSourceNoJTA' java.sql.SQLException: javax.resource.ResourceException: IJ000470: You are trying to use a connection factory that has been shut down: java:/ENGINEDataSourceNoJTA]
>>>         at org.quartz.impl.jdbcjobstore.JobStoreCMT.getNonManagedTXConnection(JobStoreCMT.java:168) [quartz.jar:]
>>>         at org.quartz.impl.jdbcjobstore.JobStoreSupport.executeInNonManagedTXLock(JobStoreSupport.java:3807) [quartz.jar:]
>>>         at org.quartz.impl.jdbcjobstore.JobStoreSupport.acquireNextTriggers(JobStoreSupport.java:2751) [quartz.jar:]
>>>         at org.quartz.core.QuartzSchedulerThread.run(QuartzSchedulerThread.java:264) [quartz.jar:]
>>> Caused by: java.sql.SQLException: Could not retrieve datasource via JNDI url 'java:/ENGINEDataSourceNoJTA' java.sql.SQLException: javax.resource.ResourceException: IJ000470: You are trying to use a connection factory that has been shut down: java:/ENGINEDataSourceNoJTA
>>>         at org.quartz.utils.JNDIConnectionProvider.getConnection(JNDIConnectionProvider.java:163) [quartz.jar:]
>>>         at org.quartz.utils.DBConnectionManager.getConnection(DBConnectionManager.java:108) [quartz.jar:]
>>>         at org.quartz.impl.jdbcjobstore.JobStoreCMT.getNonManagedTXConnection(JobStoreCMT.java:165) [quartz.jar:]
>>>         ... 3 more
>>>
>>> Is your postgresql service running? That is the most likely source of the
>>> engine not coming up.
>>>