[ovirt-users] WebAdmin is down, HostedEngine is up

Simone Tiraboschi stirabos at redhat.com
Mon Mar 20 16:31:54 UTC 2017


On Mon, Mar 20, 2017 at 5:16 PM, Logan Kuhn <support at jac-properties.com>
wrote:

> That is another odd aspect of this.  The ovirt-engine service is up, as
> are the httpd and ovirt-engine-dwhd services.
>
> Any ideas on how to fix it?
>

Please start by running
  curl --insecure https://<your.vm.address>/ovirt-engine/services/health
from your hosts and from the engine VM itself, and check the output.
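
The health check above can be wrapped in a small shell helper so the same test is easy to repeat from each host and from the engine VM. This is only a minimal sketch, not an official oVirt tool: the function names classify_health and check_engine_health are made up for illustration, and the assumption that a healthy engine's response contains the text "DB Up!" should be verified against your own version.

```shell
# Sketch: query the engine health servlet and classify the response.

# classify_health BODY -> prints "healthy" or "unhealthy"
# Assumption: a healthy engine answers with text containing "DB Up!".
classify_health() {
  case "$1" in
    *"DB Up!"*) echo "healthy" ;;
    *)          echo "unhealthy" ;;
  esac
}

# check_engine_health HOST -> prints "healthy", "unhealthy" or "unreachable"
check_engine_health() {
  local host="$1" body
  # --insecure skips certificate validation, as in the command above;
  # --max-time keeps the check from hanging if the engine never answers.
  body="$(curl --insecure --silent --max-time 10 \
    "https://${host}/ovirt-engine/services/health")" || {
    echo "unreachable"
    return 1
  }
  classify_health "$body"
}
```

For example, `check_engine_health engine.example.com` (a placeholder address) prints one of the three states. An "unreachable" result from a host but "healthy" from the engine VM itself would point at the network path rather than at the engine.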


>
> Logan
>
> On Mon, Mar 20, 2017 at 11:02 AM, Simone Tiraboschi <stirabos at redhat.com>
> wrote:
>
>>
>>
>> On Mon, Mar 20, 2017 at 4:39 PM, Logan Kuhn <support at jac-properties.com>
>> wrote:
>>
>>> So that sounds like the host isn't able to communicate properly with the
>>> HEVM.  The cluster is still in global maintenance, but the HEVM still
>>> thinks that it isn't because the database says it isn't in global
>>> maintenance.
>>>
>>
>> No, it simply means that the engine is not able to start and, if the
>> engine doesn't start, nothing else will update the host status in your DB.
>>
>>
>>
>>>
>>> Logan
>>>
>>>
>>> On Mon, Mar 20, 2017 at 10:34 AM, Simone Tiraboschi <stirabos at redhat.com
>>> > wrote:
>>>
>>>>
>>>>
>>>> On Mon, Mar 20, 2017 at 4:28 PM, Logan Kuhn <support at jac-properties.com
>>>> > wrote:
>>>>
>>>>> Yup, ovirttest1 ran out of disk space on Friday; we recovered it and
>>>>> everything seemed completely normal.
>>>>>
>>>>> The postgres service is down on the HEVM, but that is because the
>>>>> database is on our postgresql cluster and has been for weeks.  I can
>>>>> connect to its database from within the HEVM using the credentials
>>>>> stored at /etc/ovirt-engine/engine.conf.d/10-setup-database.conf.  I can
>>>>> tail the logs on the postgres master and ovirt can and does connect to it.
>>>>>
>>>>> However, from ovirttest1 I cannot connect to the engine database using
>>>>> those same credentials.  Should I be able to?  It'd make sense to be
>>>>> able to connect to it....
>>>>>
>>>>
>>>> It could depend on how you configured your pg_hba.conf for your DBMS
>>>> instance.
>>>> Only the engine and dwh have to connect to the engine VM; a direct DB
>>>> connection from the hosts is not required.
>>>>
>>>>
>>>>>
>>>>> Logan
>>>>>
>>>>> On Mon, Mar 20, 2017 at 10:14 AM, Alexander Wels <awels at redhat.com>
>>>>> wrote:
>>>>>
>>>>>> On Monday, March 20, 2017 9:14:51 AM EDT Logan Kuhn wrote:
>>>>>> > Starting at 1:09am on Saturday the Hosted Engine has been rebooting
>>>>>> > because it failed its liveliness check.  This is due to the webadmin
>>>>>> > not loading.  Nothing changed as far as I can tell on the engine since
>>>>>> > its last successful reboot on Friday afternoon.
>>>>>> >
>>>>>> > The engine, dwhd and httpd are all up and do not seem to be reporting
>>>>>> > anything unusual in their respective logs.  The engine can talk to the
>>>>>> > database, as I can log in using the credentials in
>>>>>> > /etc/ovirt-engine/engine.conf.d/10-setup-database.conf and the logs on
>>>>>> > the postgres server are showing activity.
>>>>>> >
>>>>>> > I tried to run engine-setup but it says it's not in global maintenance
>>>>>> > even though the hosted engine hosts agree that it is.  We are on
>>>>>> > version 4.0.6.3.
>>>>>> >
>>>>>> > Server, engine and agent logs are attached
>>>>>> >
>>>>>> > Regards,
>>>>>> > Logan
>>>>>>
>>>>>> Looking at your logs, it appears that on Friday one of your hosts ran
>>>>>> out of disk space in its logs or temp directory, at which point
>>>>>> connectivity started to be spotty. I see a bunch of attempts to migrate
>>>>>> VMs away from that host (ovirttest1). All of them fail. That repeats a
>>>>>> ton of times. I forwarded to Saturday, where it appears you had a bunch
>>>>>> of stale locks, which also repeat a bunch of times until the engine VM
>>>>>> gets restarted.
>>>>>>
>>>>>> Then I see nothing but restarts of the engine and no apparent errors in
>>>>>> the engine log.
>>>>>>
>>>>>> The server log does however reveal this:
>>>>>> 2017-03-20 07:04:27,282 ERROR [org.quartz.core.ErrorLogger] (QuartzOvirtDBScheduler_QuartzSchedulerThread) An error occurred while scanning for the next triggers to fire.: org.quartz.JobPersistenceException: Failed to obtain DB connection from data source 'NMEngineDS': java.sql.SQLException: Could not retrieve datasource via JNDI url 'java:/ENGINEDataSourceNoJTA' java.sql.SQLException: javax.resource.ResourceException: IJ000470: You are trying to use a connection factory that has been shut down: java:/ENGINEDataSourceNoJTA [See nested exception: java.sql.SQLException: Could not retrieve datasource via JNDI url 'java:/ENGINEDataSourceNoJTA' java.sql.SQLException: javax.resource.ResourceException: IJ000470: You are trying to use a connection factory that has been shut down: java:/ENGINEDataSourceNoJTA]
>>>>>>         at org.quartz.impl.jdbcjobstore.JobStoreCMT.getNonManagedTXConnection(JobStoreCMT.java:168) [quartz.jar:]
>>>>>>         at org.quartz.impl.jdbcjobstore.JobStoreSupport.executeInNonManagedTXLock(JobStoreSupport.java:3807) [quartz.jar:]
>>>>>>         at org.quartz.impl.jdbcjobstore.JobStoreSupport.acquireNextTriggers(JobStoreSupport.java:2751) [quartz.jar:]
>>>>>>         at org.quartz.core.QuartzSchedulerThread.run(QuartzSchedulerThread.java:264) [quartz.jar:]
>>>>>> Caused by: java.sql.SQLException: Could not retrieve datasource via JNDI url 'java:/ENGINEDataSourceNoJTA' java.sql.SQLException: javax.resource.ResourceException: IJ000470: You are trying to use a connection factory that has been shut down: java:/ENGINEDataSourceNoJTA
>>>>>>         at org.quartz.utils.JNDIConnectionProvider.getConnection(JNDIConnectionProvider.java:163) [quartz.jar:]
>>>>>>         at org.quartz.utils.DBConnectionManager.getConnection(DBConnectionManager.java:108) [quartz.jar:]
>>>>>>         at org.quartz.impl.jdbcjobstore.JobStoreCMT.getNonManagedTXConnection(JobStoreCMT.java:165) [quartz.jar:]
>>>>>>         ... 3 more
>>>>>>
>>>>>> Is your postgresql service running? That is the most likely source of
>>>>>> the engine not coming up.
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
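
To see how often, and how recently, the datasource-shutdown error from the quoted server log occurred, the log can be scanned for its IJ000470 code. This is a minimal sketch under stated assumptions: both function names are hypothetical, and the timestamp pattern assumes log entries begin with "YYYY-MM-DD HH:MM:SS,mmm" as in the excerpt quoted in this thread.

```shell
# Sketch: summarize IJ000470 ("connection factory has been shut down")
# occurrences in an engine server log.

# count_datasource_shutdowns FILE -> number of lines mentioning IJ000470
count_datasource_shutdowns() {
  # grep -c exits non-zero when nothing matches but still prints 0,
  # so "|| true" keeps the function's exit status clean.
  grep -c 'IJ000470' "$1" || true
}

# last_datasource_shutdown FILE -> timestamp of the newest matching entry
last_datasource_shutdown() {
  # Only lines that start with a timestamp are considered, which skips
  # the "Caused by:" continuation lines of the stack trace.
  grep -E '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9:,]+ .*IJ000470' "$1" \
    | tail -n 1 | cut -c1-23
}
```

On the engine VM this could be run as `count_datasource_shutdowns /var/log/ovirt-engine/server.log`; that path is the usual default location and is an assumption here. If the last timestamp keeps advancing after each engine restart, the datasource is still being shut down and the database connection settings are worth rechecking.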