On Mon, Mar 20, 2017 at 5:16 PM, Logan Kuhn <support(a)jac-properties.com>
wrote:
> That is another odd aspect of this. The ovirt-engine service is up, as
> are the httpd and ovirt-engine-dwhd services.
> Any ideas on how to fix it?
Please start by running
curl --insecure https://<your.vm.address>/ovirt-engine/services/health
from your hosts and from the engine VM itself, and check the output.
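
For anyone who wants to script the same check, here is a minimal Python sketch of it. It assumes only the health URL quoted above; the function names are made up for illustration, and certificate verification is switched off to mirror curl's --insecure flag. Connection failures (refused, timed out) are left to raise, since that is useful information in itself.

```python
import ssl
import sys
import urllib.error
import urllib.request
from typing import Tuple


def health_url(host: str) -> str:
    """Build the health URL mentioned in the thread for a given engine address."""
    return f"https://{host}/ovirt-engine/services/health"


def fetch_health(host: str, timeout: float = 10.0) -> Tuple[int, str]:
    """Return (HTTP status, response body) from the health URL.

    Certificate checks are disabled on purpose, like `curl --insecure`.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False          # must be cleared before verify_mode
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(health_url(host), timeout=timeout, context=ctx) as resp:
            return resp.status, resp.read().decode("utf-8", "replace").strip()
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status; report it instead of raising.
        return err.code, err.read().decode("utf-8", "replace").strip()


if __name__ == "__main__":
    # Usage: python check_health.py <your.vm.address>
    status, body = fetch_health(sys.argv[1])
    print(status, body)
```

Running it once from each host and once on the engine VM gives the same comparison as the curl command.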
> Logan
On Mon, Mar 20, 2017 at 11:02 AM, Simone Tiraboschi <stirabos(a)redhat.com>
wrote:
>
>
> On Mon, Mar 20, 2017 at 4:39 PM, Logan Kuhn <support(a)jac-properties.com>
> wrote:
>
>> So that sounds like the host isn't able to communicate properly with the
>> HEVM. The cluster is still in global maintenance, but the HEVM still
>> thinks that it isn't because the database says it isn't in global
>> maintenance.
>>
>
> No, it simply means that the engine is not able to start and, if the
> engine doesn't start, nothing else will update the host status in your DB.
>
>
>
>>
>> Logan
>>
>>
>> On Mon, Mar 20, 2017 at 10:34 AM, Simone Tiraboschi <stirabos(a)redhat.com>
>> wrote:
>>
>>>
>>>
>>> On Mon, Mar 20, 2017 at 4:28 PM, Logan Kuhn <support(a)jac-properties.com>
>>> wrote:
>>>
>>>> Yup, ovirttest1 ran out of disk space on Friday. We recovered it and
>>>> everything seemed completely normal.
>>>>
>>>> The postgres service is down on the HEVM, but that is because the
>>>> database is on our postgresql cluster and has been for weeks. I can
>>>> connect to its database from within the HEVM using the credentials
>>>> stored at /etc/ovirt-engine/engine.conf.d/10-setup-database.conf. I can
>>>> tail the logs on the postgres master, and ovirt can and does connect to it.
>>>>
>>>> However, from ovirttest1 I cannot connect to the engine database
>>>> using those same credentials. Should I be able to? It would make
>>>> sense to be able to connect to it.
>>>>
>>>
>>> It could depend on how you configured pg_hba.conf for your DBMS
>>> instance.
>>> Only the engine and dwh, both running on the engine VM, have to connect
>>> to the DB; a direct DB connection from the hosts is not required.
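
To illustrate why the same credentials can work from one machine and fail from another, here is a rough Python sketch of how pg_hba.conf rules are matched against a client address. It is a simplification, not PostgreSQL's actual matching code: it only understands host-type rules with the address in CIDR form (or the keyword all), and it ignores local rules, hostnames, samehost/samenet, separate netmask columns and special database keywords. The sample rules and addresses in the usage part are invented.

```python
import ipaddress
from typing import List, Optional


def first_matching_rule(hba_lines: List[str], client_ip: str,
                        database: str, user: str) -> Optional[str]:
    """Return the first host-type rule matching the client, or None.

    PostgreSQL uses the first matching rule, so the returned line also
    shows which auth method (md5, reject, ...) would apply.
    """
    ip = ipaddress.ip_address(client_ip)
    for raw in hba_lines:
        line = raw.split("#", 1)[0].strip()     # drop comments
        fields = line.split()
        if len(fields) < 5 or not fields[0].startswith("host"):
            continue
        _type, dbs, users, address, _method = fields[:5]
        if dbs != "all" and database not in dbs.split(","):
            continue
        if users != "all" and user not in users.split(","):
            continue
        if address == "all":
            return line
        try:
            network = ipaddress.ip_network(address, strict=False)
        except ValueError:
            continue    # hostname or another form this sketch does not handle
        if ip.version == network.version and ip in network:
            return line
    return None


if __name__ == "__main__":
    # Invented example: only the engine VM's address is allowed in.
    rules = ["host engine engine 192.0.2.10/32 md5"]
    print(first_matching_rule(rules, "192.0.2.10", "engine", "engine"))  # engine VM
    print(first_matching_rule(rules, "192.0.2.20", "engine", "engine"))  # a host
```

With a rule set like the invented one above, a connection from a host would find no matching rule and be refused even with valid credentials, which is consistent with what was described.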
>>>
>>>
>>>>
>>>> Logan
>>>>
>>>> On Mon, Mar 20, 2017 at 10:14 AM, Alexander Wels <awels(a)redhat.com>
>>>> wrote:
>>>>
>>>>> On Monday, March 20, 2017 9:14:51 AM EDT Logan Kuhn wrote:
>>>>> > Starting at 1:09am on Saturday the Hosted Engine has been rebooting
>>>>> > because it failed its liveliness check. This is due to the webadmin
>>>>> > not loading. Nothing changed as far as I can tell on the engine since
>>>>> > its last successful reboot on Friday afternoon.
>>>>> >
>>>>> > The engine, dwhd and httpd are all up and do not seem to be reporting
>>>>> > anything unusual in their respective logs. The engine can talk to the
>>>>> > database, as I can log in using the credentials in
>>>>> > /etc/ovirt-engine/engine.conf.d/10-setup-database.conf and the logs
>>>>> > on the postgres server are showing activity.
>>>>> >
>>>>> > I tried to run engine-setup but it says it's not in global maintenance,
>>>>> > even though the hosted engine hosts agree that it is. We are on version
>>>>> > 4.0.6.3.
>>>>> >
>>>>> > Server, engine and agent logs are attached
>>>>> >
>>>>> > Regards,
>>>>> > Logan
>>>>>
>>>>> Looking at your logs, it appears that on Friday one of your hosts
>>>>> (ovirttest1) ran out of disk space in its logs or temp directory, at
>>>>> which point connectivity started to be spotty. I see a bunch of attempts
>>>>> to migrate VMs away from that host. All of them fail. That repeats many
>>>>> times. I skipped forward to Saturday, where it appears you had a bunch
>>>>> of stale locks, which also repeats a number of times until the engine VM
>>>>> gets restarted.
>>>>>
>>>>> Then I see nothing but restarts of the engine and no apparent errors in
>>>>> the engine log.
>>>>>
>>>>> The server log does however reveal this:
>>>>> 2017-03-20 07:04:27,282 ERROR [org.quartz.core.ErrorLogger] (QuartzOvirtDBScheduler_QuartzSchedulerThread) An error occurred while scanning for the next triggers to fire.: org.quartz.JobPersistenceException: Failed to obtain DB connection from data source 'NMEngineDS': java.sql.SQLException: Could not retrieve datasource via JNDI url 'java:/ENGINEDataSourceNoJTA' java.sql.SQLException: javax.resource.ResourceException: IJ000470: You are trying to use a connection factory that has been shut down: java:/ENGINEDataSourceNoJTA [See nested exception: java.sql.SQLException: Could not retrieve datasource via JNDI url 'java:/ENGINEDataSourceNoJTA' java.sql.SQLException: javax.resource.ResourceException: IJ000470: You are trying to use a connection factory that has been shut down: java:/ENGINEDataSourceNoJTA]
>>>>>     at org.quartz.impl.jdbcjobstore.JobStoreCMT.getNonManagedTXConnection(JobStoreCMT.java:168) [quartz.jar:]
>>>>>     at org.quartz.impl.jdbcjobstore.JobStoreSupport.executeInNonManagedTXLock(JobStoreSupport.java:3807) [quartz.jar:]
>>>>>     at org.quartz.impl.jdbcjobstore.JobStoreSupport.acquireNextTriggers(JobStoreSupport.java:2751) [quartz.jar:]
>>>>>     at org.quartz.core.QuartzSchedulerThread.run(QuartzSchedulerThread.java:264) [quartz.jar:]
>>>>> Caused by: java.sql.SQLException: Could not retrieve datasource via JNDI url 'java:/ENGINEDataSourceNoJTA' java.sql.SQLException: javax.resource.ResourceException: IJ000470: You are trying to use a connection factory that has been shut down: java:/ENGINEDataSourceNoJTA
>>>>>     at org.quartz.utils.JNDIConnectionProvider.getConnection(JNDIConnectionProvider.java:163) [quartz.jar:]
>>>>>     at org.quartz.utils.DBConnectionManager.getConnection(DBConnectionManager.java:108) [quartz.jar:]
>>>>>     at org.quartz.impl.jdbcjobstore.JobStoreCMT.getNonManagedTXConnection(JobStoreCMT.java:165) [quartz.jar:]
>>>>>     ... 3 more
>>>>>
>>>>> Is your postgresql service running? That is the most likely reason for
>>>>> the engine not coming up.
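
When the database lives on a separate server, as it does here, a quick way to approach that question from the engine VM is to read the DB host and port from the setup file mentioned earlier in the thread and try a plain TCP connection. The Python sketch below assumes the file uses KEY="value" lines and that the keys are named ENGINE_DB_HOST and ENGINE_DB_PORT; both key names are assumptions, so check them against your own file. An open port only shows that something is listening, not that the engine's datasource is healthy.

```python
import shlex
import socket
from typing import Dict

# Path taken from the thread; the key names used below are assumptions.
CONF_PATH = "/etc/ovirt-engine/engine.conf.d/10-setup-database.conf"


def parse_conf(text: str) -> Dict[str, str]:
    """Parse KEY="value" lines, skipping blank lines and comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        parts = shlex.split(value)          # strips the surrounding quotes
        values[key.strip()] = parts[0] if parts else ""
    return values


def port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    with open(CONF_PATH) as handle:
        conf = parse_conf(handle.read())
    host = conf.get("ENGINE_DB_HOST", "localhost")      # assumed key name
    port = int(conf.get("ENGINE_DB_PORT", "5432"))      # assumed key name
    print(f"{host}:{port} reachable: {port_open(host, port)}")
```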
>>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Users mailing list
>>>> Users(a)ovirt.org
>>>> http://lists.ovirt.org/mailman/listinfo/users
>>>>
>>>>
>>>
>>
>