<div dir="ltr">It hangs with no output.<div><br></div><div>We got it working by stopping the engine and dwhd services, dumping the database, and then manually setting global maintenance in the database.</div><div><br></div><div>One more detail that may or may not be useful: this was being written to the postgres logs quite often:</div><div><br></div><div><span style="font-family:monospace"><span style="color:rgb(0,0,0)">2017-03-20 11:48:51 CDT [8246]: [9910-1] user=engine,db=engine LOG: duration: 34.999 ms execute S_10: select * from getdisksvmguid($1, $2, $3, $4)
</span><br>2017-03-20 11:48:51 CDT [8246]: [9911-1] user=engine,db=engine DETAIL: parameters: $1 = '32116ed2-da80-4e03-8f39-6435339ba674', $2 = 'f', $3 = NULL, $4 = 'f'
<br>2017-03-20 11:48:51 CDT [8246]: [9912-1] user=engine,db=engine LOG: duration: 35.894 ms execute S_10: select * from getdisksvmguid($1, $2, $3, $4)
<br>2017-03-20 11:48:51 CDT [8246]: [9913-1] user=engine,db=engine DETAIL: parameters: $1 = '32116ed2-da80-4e03-8f39-6435339ba674', $2 = 'f', $3 = NULL, $4 = 'f'
<br>2017-03-20 11:48:51 CDT [8246]: [9914-1] user=engine,db=engine LOG: duration: 35.051 ms execute S_10: select * from getdisksvmguid($1, $2, $3, $4)
<br>2017-03-20 11:48:51 CDT [8246]: [9915-1] user=engine,db=engine DETAIL: parameters: $1 = '32116ed2-da80-4e03-8f39-6435339ba674', $2 = 'f', $3 = NULL, $4 = 'f'
<br>2017-03-20 11:48:51 CDT [8246]: [9916-1] user=engine,db=engine LOG: duration: 34.951 ms execute S_10: select * from getdisksvmguid($1, $2, $3, $4)
<br>2017-03-20 11:48:51 CDT [8246]: [9917-1] user=engine,db=engine DETAIL: parameters: $1 = '32116ed2-da80-4e03-8f39-6435339ba674', $2 = 'f', $3 = NULL, $4 = 'f'
<br>2017-03-20 11:48:51 CDT [8246]: [9918-1] user=engine,db=engine LOG: duration: 36.855 ms execute S_10: select * from getdisksvmguid($1, $2, $3, $4)
<br>2017-03-20 11:48:51 CDT [8246]: [9919-1] user=engine,db=engine DETAIL: parameters: $1 = '32116ed2-da80-4e03-8f39-6435339ba674', $2 = 'f', $3 = NULL, $4 = 'f'
<br>2017-03-20 11:48:51 CDT [8246]: [9920-1] user=engine,db=engine LOG: duration: 36.687 ms execute S_10: select * from getdisksvmguid($1, $2, $3, $4)<br></span><div><br></div><div>The problem is resolved, but we are intensely curious to know why it happened and if there is a better solution going forward.</div><div><br></div><div>Logan<br><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Mar 20, 2017 at 11:31 AM, Simone Tiraboschi <span dir="ltr"><<a href="mailto:stirabos@redhat.com" target="_blank">stirabos@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote"><span class="gmail-">On Mon, Mar 20, 2017 at 5:16 PM, Logan Kuhn <span dir="ltr"><<a href="mailto:support@jac-properties.com" target="_blank">support@jac-properties.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><span class="gmail-m_2824865906232317548gmail-"><span style="color:rgb(0,0,0);font-family:arial,helvetica,sans-serif;font-size:12.8px">That is another odd aspect about this. 
The ovirt-engine service is up, as are the httpd and ovirt-engine-dwhd services.</span><div><font color="#000000" face="arial, helvetica, sans-serif"><span style="font-size:12.8px"><br></span></font></div></span><div><font color="#000000" face="arial, helvetica, sans-serif"><span style="font-size:12.8px">Any ideas on how to fix it?</span></font></div></div></blockquote><div><br></div></span><div>Please start by running</div><div> curl --insecure https://<your.vm.address>/ovirt-engine/services/health<br></div><div>from your hosts and from the engine VM itself, and check the output.</div><div><div class="gmail-h5"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><span class="gmail-m_2824865906232317548gmail-HOEnZb"><font color="#888888"><div><font color="#000000" face="arial, helvetica, sans-serif"><span style="font-size:12.8px"><br></span></font></div></font></span><div><span class="gmail-m_2824865906232317548gmail-HOEnZb"><font color="#888888"><font color="#000000" face="arial, helvetica, sans-serif"><span style="font-size:12.8px">Logan<br></span></font></font></span><div><div class="gmail-m_2824865906232317548gmail-h5"><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Mar 20, 2017 at 11:02 AM, Simone Tiraboschi <span dir="ltr"><<a href="mailto:stirabos@redhat.com" target="_blank">stirabos@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote"><span>On Mon, Mar 20, 2017 at 4:39 PM, Logan Kuhn <span dir="ltr"><<a href="mailto:support@jac-properties.com" target="_blank">support@jac-properties.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">So that sounds like the host isn't 
able to communicate properly with the HEVM. The cluster is still in global maintenance, but the HEVM still thinks that it isn't, because the database says it isn't in global maintenance.</div></blockquote><div><br></div></span><div>No, it simply means that the engine is not able to start and, if the engine doesn't start, nothing else will update the host status in your DB.</div><div><div class="gmail-m_2824865906232317548gmail-m_-4550469166058328721h5"><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><span class="gmail-m_2824865906232317548gmail-m_-4550469166058328721m_-8185034328628676291HOEnZb"><font color="#888888"><div><br></div></font></span><div><span class="gmail-m_2824865906232317548gmail-m_-4550469166058328721m_-8185034328628676291HOEnZb"><font color="#888888">Logan</font></span><div><div class="gmail-m_2824865906232317548gmail-m_-4550469166058328721m_-8185034328628676291h5"><br><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Mar 20, 2017 at 10:34 AM, Simone Tiraboschi <span dir="ltr"><<a href="mailto:stirabos@redhat.com" target="_blank">stirabos@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote"><span>On Mon, Mar 20, 2017 at 4:28 PM, Logan Kuhn <span dir="ltr"><<a href="mailto:support@jac-properties.com" target="_blank">support@jac-properties.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><span style="font-size:12.8px">Yup, ovirttest1 ran out of disk space on Friday; we recovered it and everything seemed completely normal.</span><div style="font-size:12.8px"><font face="arial, helvetica, sans-serif"><br></font></div><div 
style="font-size:12.8px"><font face="arial, helvetica, sans-serif">The postgres service is down on the HEVM, but that is because the database is on our postgresql cluster and has been for weeks. I can connect to its database from within the HEVM using the credentials stored at <span style="color:rgb(0,0,0)">/etc/ovirt-engine/engine.conf.d/10-setup-database.conf. I can tail the logs on the postgres master, and ovirt can and does connect to it.</span></font></div><div style="font-size:12.8px"><font face="arial, helvetica, sans-serif"><span style="color:rgb(0,0,0)"><br></span></font></div><div style="font-size:12.8px"><font face="arial, helvetica, sans-serif"><span style="color:rgb(0,0,0)">However, trying from ovirttest1, I cannot connect to the engine database using those same credentials. Should I be able to? It'd make sense to be able to connect to it....</span></font></div></div></blockquote><div><br></div></span><div>It could depend on how you configured your pg_hba.conf for your DBMS instance.</div><div>Only the engine and dwh have to connect to the engine VM; a direct DB connection from the hosts is not required.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div class="gmail-m_2824865906232317548gmail-m_-4550469166058328721m_-8185034328628676291m_3566985714270118786h5"><div dir="ltr"><span class="gmail-m_2824865906232317548gmail-m_-4550469166058328721m_-8185034328628676291m_3566985714270118786m_-6910422670289814746gmail-HOEnZb"><font color="#888888"><div style="font-size:12.8px"><font face="arial, helvetica, sans-serif"><span style="color:rgb(0,0,0)"><br></span></font></div><div style="font-size:12.8px"><font face="arial, helvetica, sans-serif"><span style="color:rgb(0,0,0)">Logan</span></font></div></font></span><div><div 
class="gmail-m_2824865906232317548gmail-m_-4550469166058328721m_-8185034328628676291m_3566985714270118786m_-6910422670289814746gmail-h5"><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Mar 20, 2017 at 10:14 AM, Alexander Wels <span dir="ltr"><<a href="mailto:awels@redhat.com" target="_blank">awels@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="gmail-m_2824865906232317548gmail-m_-4550469166058328721m_-8185034328628676291m_3566985714270118786m_-6910422670289814746gmail-m_7749797448961125126gmail-HOEnZb"><div class="gmail-m_2824865906232317548gmail-m_-4550469166058328721m_-8185034328628676291m_3566985714270118786m_-6910422670289814746gmail-m_7749797448961125126gmail-h5">On Monday, March 20, 2017 9:14:51 AM EDT Logan Kuhn wrote:<br>
> Starting at 1:09am on Saturday the Hosted Engine has been rebooting because<br>
> it failed its liveliness check. This is due to the webadmin not loading.<br>
> Nothing changed as far as I can tell on the engine since its last<br>
> successful reboot on Friday afternoon.<br>
><br>
> The engine, dwhd and httpd are all up and do not seem to be reporting<br>
> anything unusual in their respective logs. The engine can talk to the<br>
> database as I can log in using the credentials in<br>
> /etc/ovirt-engine/engine.conf.d/10-setup-database.conf and the logs on the postgres server are showing<br>
> activity.<br>
><br>
> I tried to run engine-setup but it says it's not in global maintenance even<br>
> though the hosted engine hosts agree that it is. We are on version 4.0.6.3.<br>
><br>
> Server, engine and agent logs are attached<br>
><br>
> Regards,<br>
> Logan<br>
<br>
</div></div>Looking at our logs, it appears that on Friday one of your hosts ran out of<br>
disk space in its logs or temp directory, at which point connectivity started<br>
to be spotty. I see a bunch of attempts to migrate VMs away from that host<br>
(ovirttest1). All of them fail. That repeats many times. I skipped forward to<br>
Saturday, where it appears you had a bunch of stale locks, which also repeats a<br>
number of times until the engine VM gets restarted.<br>
<br>
Then I see nothing but restarts of the engine and no apparent errors in the<br>
engine log.<br>
<br>
The server log does however reveal this:<br>
2017-03-20 07:04:27,282 ERROR [org.quartz.core.ErrorLogger]<br>
(QuartzOvirtDBScheduler_QuartzSchedulerThread) An error occurred while<br>
scanning for the next triggers to fire.: org.quartz.JobPersistenceException:<br>
Failed to obtain DB connection from data source 'NMEngineDS':<br>
java.sql.SQLException: Could not retrieve datasource via JNDI url<br>
'java:/ENGINEDataSourceNoJTA' java.sql.SQLException:<br>
javax.resource.ResourceException: IJ000470: You are trying to use a connection<br>
factory that has been shut down: java:/ENGINEDataSourceNoJTA [See nested<br>
exception: java.sql.SQLException: Could not retrieve datasource via JNDI url<br>
'java:/ENGINEDataSourceNoJTA' java.sql.SQLException:<br>
javax.resource.ResourceException: IJ000470: You are trying to use a connection<br>
factory that has been shut down: java:/ENGINEDataSourceNoJTA]<br>
at org.quartz.impl.jdbcjobstore.JobStoreCMT.getNonManagedTXConnection(JobStoreCMT.java:168) [quartz.jar:]<br>
at org.quartz.impl.jdbcjobstore.JobStoreSupport.executeInNonManagedTXLock(JobStoreSupport.java:3807) [quartz.jar:]<br>
at org.quartz.impl.jdbcjobstore.JobStoreSupport.acquireNextTriggers(JobStoreSupport.java:2751) [quartz.jar:]<br>
at org.quartz.core.QuartzSchedulerThread.run(QuartzSchedulerThread.java:264) [quartz.jar:]<br>
Caused by: java.sql.SQLException: Could not retrieve datasource via JNDI url<br>
'java:/ENGINEDataSourceNoJTA' java.sql.SQLException:<br>
javax.resource.ResourceException: IJ000470: You are trying to use a connection<br>
factory that has been shut down: java:/ENGINEDataSourceNoJTA<br>
at org.quartz.utils.JNDIConnectionProvider.getConnection(JNDIConnectionProvider.java:163) [quartz.jar:]<br>
at org.quartz.utils.DBConnectionManager.getConnection(DBConnectionManager.java:108) [quartz.jar:]<br>
at org.quartz.impl.jdbcjobstore.JobStoreCMT.getNonManagedTXConnection(JobStoreCMT.java:165) [quartz.jar:]<br>
... 3 more<br>
<br>
Is your postgresql service running? That is the most likely source of the<br>
engine not coming up.<br>
</blockquote></div><br></div></div></div></div>
<br></div></div><span>______________________________<wbr>_________________<br>
Users mailing list<br>
<a href="mailto:Users@ovirt.org" target="_blank">Users@ovirt.org</a><br>
<a href="http://lists.ovirt.org/mailman/listinfo/users" rel="noreferrer" target="_blank">http://lists.ovirt.org/mailman<wbr>/listinfo/users</a><br>
<br></span></blockquote></div><br></div></div>
</blockquote></div><br></div></div></div></div></div>
</blockquote></div></div></div><br></div></div>
</blockquote></div><br></div></div></div></div></div>
</blockquote></div></div></div><br></div></div>
</blockquote></div><br></div></div></div></div>