[ovirt-users] oVirt 4.0.x - hosted-engine was not starting properly
Gervais de Montbrun
gervais at demontbrun.com
Tue Sep 27 17:23:39 UTC 2016
Since updating to 4.0.x of oVirt, I have had an issue with my hosted engine. After some poking around, I think I have figured out my issue and thought I would share to see what others think.
The issue has existed with 4.0, 4.0.1, 4.0.2, 4.0.3, and still exists in 4.0.4.
When my hosted engine starts it reports that it is in a degraded state with 7 or 8 services still not started when I run systemctl status. It takes about 6 or 7 minutes to eventually start all the services and come online. If I don't set my cluster to Global-Maintenance mode it eventually thinks that my hosted-engine needs to be rebooted and restarts it before it can start everything.
I realized that Apache was the culprit and found that the proxy to the ovirt-engine in /etc/httpd/conf.d/z-ovirt-engine-proxy.conf has a super long timeout with many retries. I changed the settings and now everything works for me.
-> Before change:
ProxyPassMatch ajp://127.0.0.1:8702 timeout=3600 retry=5
-> After change:
ProxyPassMatch ajp://127.0.0.1:8702 timeout=5 retry=2
If I read the timeout settings correctly, Apache will wait up to 60 minutes (timeout=3600 is in seconds) for a response from the engine. As far as I can tell from the mod_proxy documentation, retry=5 is the number of seconds before a failed worker is tried again, not a retry count. An hour is way too long for my little server to hold onto all those Apache processes. The change I made allows for there to be an error, and also releases Apache's hold on the process. Once everything is ready, Apache is ready to serve requests and everything/everyone is happy. Before making the change, I just got a white screen in my browser and then nothing worked until I restarted Apache (or I ended up in an endless loop of the ovirt-ha services restarting my hosted-engine).
I noticed that this setting reverts to the original value, so oVirt must be writing this file. Perhaps these numbers can be changed in oVirt? If not, I will just set up an Ansible play to reapply the working values and restart Apache on my engine.
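
Until there is a supported way to change these values, reapplying them could look roughly like the shell sketch below. This is only a minimal sketch under a few assumptions: the function name fix_proxy_timeouts is made up for illustration, GNU sed is available (as on an EL-based engine VM), and the values 5 and 2 are simply the ones that worked for me above.

```shell
#!/bin/sh
# Hypothetical helper (not part of oVirt): rewrite the timeout= and retry=
# values on every ProxyPassMatch line of the given Apache config file.
# Assumes GNU sed (-i for in-place editing, -E for extended regexes).
fix_proxy_timeouts() {
    conf="$1"
    sed -i -E '/^[[:space:]]*ProxyPassMatch[[:space:]]/ {
        s/timeout=[0-9]+/timeout=5/
        s/retry=[0-9]+/retry=2/
    }' "$conf"
}
```

On the engine this would be run as root, followed by an Apache restart, for example: fix_proxy_timeouts /etc/httpd/conf.d/z-ovirt-engine-proxy.conf && systemctl restart httpd. Since oVirt appears to rewrite the file, it would need to be rerun after each update.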