[ovirt-users] WebAdmin is down, HostedEngine is up
Simone Tiraboschi
stirabos at redhat.com
Mon Mar 20 16:02:01 UTC 2017
On Mon, Mar 20, 2017 at 4:39 PM, Logan Kuhn <support at jac-properties.com>
wrote:
> So that sounds like the host isn't able to communicate properly with the
> HEVM. The cluster is still in global maintenance, but the HEVM still
> thinks that it isn't because the database says it isn't in global
> maintenance.
>
No, it simply means that the engine is not able to start and, if the
engine doesn't start, nothing else will update the host status in your DB.
>
> Logan
>
>
> On Mon, Mar 20, 2017 at 10:34 AM, Simone Tiraboschi <stirabos at redhat.com>
> wrote:
>
>>
>>
>> On Mon, Mar 20, 2017 at 4:28 PM, Logan Kuhn <support at jac-properties.com>
>> wrote:
>>
>>> Yup, ovirttest1 ran out of disk space on Friday; we recovered it and
>>> everything seemed completely normal.
>>>
>>> The postgres service is down on the HEVM, but that is because the engine
>>> database is on our PostgreSQL cluster, and has been for weeks. I can connect
>>> to its database from within the HEVM using the credentials stored in
>>> /etc/ovirt-engine/engine.conf.d/10-setup-database.conf. I can tail the logs
>>> on the postgres master, and oVirt can and does connect to it.
>>>
>>> However, when trying from ovirttest1, I cannot connect to the engine
>>> database using those same credentials. Should I be able to? It would make
>>> sense to be able to connect to it.
>>>
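As a minimal sketch of the check described above, the Python below reads the shell-style `KEY="value"` file that engine-setup writes (assumed here to contain `ENGINE_DB_HOST` and `ENGINE_DB_PORT`; check your own copy) and tests whether the database port is reachable over plain TCP from the machine it runs on. It does not authenticate: a refused or timed-out connection points at routing or firewalling, while a successful TCP connect followed by a login failure points at server-side access rules such as pg_hba.conf.

```python
import shlex
import socket
import sys


def parse_setup_conf(text: str) -> dict:
    """Parse shell-style KEY="value" lines, skipping blanks and comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw_value = line.partition("=")
        # shlex removes the surrounding quotes the way a POSIX shell would.
        parts = shlex.split(raw_value)
        settings[key.strip()] = parts[0] if parts else ""
    return settings


def db_port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def main(path: str) -> None:
    with open(path) as f:
        conf = parse_setup_conf(f.read())
    # Assumption: key names as generated by engine-setup; adjust if yours differ.
    host = conf["ENGINE_DB_HOST"]
    port = int(conf.get("ENGINE_DB_PORT", "5432"))
    print(f"{host}:{port} reachable: {db_port_reachable(host, port)}")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Run it with the file path as its only argument, once on the engine VM and once on a host such as ovirttest1, and compare the two results.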
>>
>> It could depend on how you configured your pg_hba.conf for your DBMS
>> instance.
>> Only the engine and DWH need to connect to the engine database; a direct
>> DB connection from the hosts is not required.
>>
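To make the pg_hba.conf point concrete: PostgreSQL reads pg_hba.conf top to bottom and applies the first record whose connection type, database, user and client address all match; if no record matches, the connection is rejected. The sketch below models only that first-match rule for `host` records with CIDR addresses (it ignores hostnames, `samenet`, group and file references, `hostssl`, and so on). The `HbaRule` type and the addresses in the example are hypothetical, not this cluster's configuration.

```python
import ipaddress
from typing import List, NamedTuple, Optional


class HbaRule(NamedTuple):
    conn_type: str  # only "host" records are modelled here
    database: str   # database name(s), comma-separated, or "all"
    user: str       # role name(s), comma-separated, or "all"
    address: str    # CIDR such as "192.0.2.0/24"
    method: str     # e.g. "md5" or "reject"


def _field_matches(rule_value: str, requested: str) -> bool:
    # Comma-separated lists and the "all" keyword, as in pg_hba.conf.
    values = rule_value.split(",")
    return "all" in values or requested in values


def match_hba(rules: List[HbaRule], database: str, user: str,
              client_ip: str) -> Optional[str]:
    """Return the method of the first matching rule, or None (no match: rejected)."""
    ip = ipaddress.ip_address(client_ip)
    for rule in rules:
        if rule.conn_type != "host":
            continue
        if not _field_matches(rule.database, database):
            continue
        if not _field_matches(rule.user, user):
            continue
        # An IPv4 address is never "in" an IPv6 network, and vice versa.
        if ip not in ipaddress.ip_network(rule.address, strict=False):
            continue
        return rule.method  # first match wins; later rules are not consulted
    return None
```

With a rule that admits only the engine VM followed by a catch-all reject, `match_hba(rules, "engine", "engine", "192.0.2.10")` yields `"md5"` for the engine VM while any other host address yields `"reject"`. That would produce the behaviour described above, and it is harmless because the hosts never need a direct DB connection.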
>>
>>>
>>> Logan
>>>
>>> On Mon, Mar 20, 2017 at 10:14 AM, Alexander Wels <awels at redhat.com>
>>> wrote:
>>>
>>>> On Monday, March 20, 2017 9:14:51 AM EDT Logan Kuhn wrote:
>>>> > Starting at 1:09am on Saturday, the Hosted Engine has been rebooting
>>>> > because it fails its liveliness check. This is due to the webadmin not
>>>> > loading. As far as I can tell, nothing changed on the engine since its
>>>> > last successful reboot on Friday afternoon.
>>>> >
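The liveliness check is the hosted-engine HA agent probing the engine over HTTP. The sketch below mimics such a probe under two stated assumptions: that the page polled is `/ovirt-engine/services/health` and that a healthy engine answers 200 with a body starting `DB Up`. Both should be verified against your 4.0.x installation before relying on them. If the health page does report on the database, as assumed here, a probe like this would fail whenever the engine cannot reach its database.

```python
import urllib.request

# Assumption: the health page polled by the HA agent; verify for your version.
HEALTH_PATH = "/ovirt-engine/services/health"


def health_url(engine_fqdn: str) -> str:
    """Build the (assumed) health-check URL for an engine FQDN."""
    return f"http://{engine_fqdn}{HEALTH_PATH}"


def looks_healthy(status: int, body: str) -> bool:
    # Assumption: a healthy engine answers 200 with a body starting "DB Up".
    return status == 200 and body.startswith("DB Up")


def check_engine(engine_fqdn: str, timeout: float = 5.0) -> bool:
    """Fetch the health page; any HTTP error, refusal or timeout counts as unhealthy."""
    try:
        with urllib.request.urlopen(health_url(engine_fqdn), timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return looks_healthy(resp.status, body)
    except OSError:  # URLError and HTTPError are OSError subclasses
        return False
```

For example, `check_engine("engine.example.com")` (a placeholder FQDN) would return False for as long as the webadmin is not loading.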
>>>> > The engine, dwhd and httpd are all up and do not seem to be reporting
>>>> > anything unusual in their respective logs. The engine can talk to the
>>>> > database: I can log in using the credentials in
>>>> > /etc/ovirt-engine/engine.conf.d/10-setup-database.conf, and the logs on
>>>> > the postgres server are showing activity.
>>>> >
>>>> > I tried to run engine-setup, but it says the engine is not in global
>>>> > maintenance, even though the hosted engine hosts agree that it is. We are
>>>> > on version 4.0.6.3.
>>>> >
>>>> > Server, engine and agent logs are attached
>>>> >
>>>> > Regards,
>>>> > Logan
>>>>
>>>> Looking at your logs, it appears that on Friday one of your hosts ran out
>>>> of disk space in its logs or temp directory, at which point connectivity
>>>> started to be spotty. I see a bunch of attempts to migrate VMs away from
>>>> that host (ovirttest1), all of which fail. That repeats many times.
>>>> Skipping ahead to Saturday, it appears you had a bunch of stale locks,
>>>> which also repeat many times until the engine VM gets restarted.
>>>>
>>>> Then I see nothing but restarts of the engine and no apparent errors in
>>>> the engine log.
>>>>
>>>> The server log does however reveal this:
>>>> 2017-03-20 07:04:27,282 ERROR [org.quartz.core.ErrorLogger] (QuartzOvirtDBScheduler_QuartzSchedulerThread) An error occurred while scanning for the next triggers to fire.: org.quartz.JobPersistenceException: Failed to obtain DB connection from data source 'NMEngineDS': java.sql.SQLException: Could not retrieve datasource via JNDI url 'java:/ENGINEDataSourceNoJTA' java.sql.SQLException: javax.resource.ResourceException: IJ000470: You are trying to use a connection factory that has been shut down: java:/ENGINEDataSourceNoJTA [See nested exception: java.sql.SQLException: Could not retrieve datasource via JNDI url 'java:/ENGINEDataSourceNoJTA' java.sql.SQLException: javax.resource.ResourceException: IJ000470: You are trying to use a connection factory that has been shut down: java:/ENGINEDataSourceNoJTA]
>>>>     at org.quartz.impl.jdbcjobstore.JobStoreCMT.getNonManagedTXConnection(JobStoreCMT.java:168) [quartz.jar:]
>>>>     at org.quartz.impl.jdbcjobstore.JobStoreSupport.executeInNonManagedTXLock(JobStoreSupport.java:3807) [quartz.jar:]
>>>>     at org.quartz.impl.jdbcjobstore.JobStoreSupport.acquireNextTriggers(JobStoreSupport.java:2751) [quartz.jar:]
>>>>     at org.quartz.core.QuartzSchedulerThread.run(QuartzSchedulerThread.java:264) [quartz.jar:]
>>>> Caused by: java.sql.SQLException: Could not retrieve datasource via JNDI url 'java:/ENGINEDataSourceNoJTA' java.sql.SQLException: javax.resource.ResourceException: IJ000470: You are trying to use a connection factory that has been shut down: java:/ENGINEDataSourceNoJTA
>>>>     at org.quartz.utils.JNDIConnectionProvider.getConnection(JNDIConnectionProvider.java:163) [quartz.jar:]
>>>>     at org.quartz.utils.DBConnectionManager.getConnection(DBConnectionManager.java:108) [quartz.jar:]
>>>>     at org.quartz.impl.jdbcjobstore.JobStoreCMT.getNonManagedTXConnection(JobStoreCMT.java:165) [quartz.jar:]
>>>>     ... 3 more
>>>>
>>>> Is your postgresql service running? That is the most likely source of the
>>>> engine not coming up.
>>>>
>>>
>>>
>>> _______________________________________________
>>> Users mailing list
>>> Users at ovirt.org
>>> http://lists.ovirt.org/mailman/listinfo/users
>>>
>>>
>>
>