Hi Martin,

The entropy was super low, somewhere around 140. I installed haveged, configured haveged.service to start at boot, and reverted my Apache changes. After a reboot, systemctl status still says that there are 7 services queued (note that I erroneously said "degraded" in my previous email; the services are, in fact, queued), but the oVirt GUI comes up almost immediately and everything seems to be great.
Thank you for the tip. You solved my issue.
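For reference, the entropy check that resolved this can be sketched as a small shell script. This is a minimal sketch, not an official oVirt procedure: the threshold of 200 comes from Martin's advice quoted below, the function name classify_entropy is made up for illustration, and the commented install commands assume a yum-based engine VM where a haveged package is available.

```shell
#!/bin/sh
# Threshold from the advice in this thread ("below or around 200").
THRESHOLD=200

# classify_entropy VALUE
# Hypothetical helper: prints "low" if VALUE is at or below THRESHOLD, else "ok".
classify_entropy() {
    if [ "$1" -le "$THRESHOLD" ]; then
        echo "low"
    else
        echo "ok"
    fi
}

# Read the kernel's current estimate, if the file is readable on this system.
ENTROPY_FILE=/proc/sys/kernel/random/entropy_avail
if [ -r "$ENTROPY_FILE" ]; then
    avail=$(cat "$ENTROPY_FILE")
    echo "entropy_avail=$avail ($(classify_entropy "$avail"))"
fi

# If the result is "low", an entropy daemon such as haveged can be installed
# and enabled at boot. Assumed commands (package source may differ per setup):
#   yum install haveged
#   systemctl enable haveged
#   systemctl start haveged
```

With the value reported above (around 140), classify_entropy prints "low", which matches the situation before haveged was installed.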
Cheers,
Gervais

On Sep 29, 2016, at 7:47 AM, Martin Perina <mperina@redhat.com> wrote:

Hi,

please take a look at my inline comments:

On Tue, Sep 27, 2016 at 7:23 PM, Gervais de Montbrun <gervais@demontbrun.com> wrote:

Hey All,

Since updating to 4.0.x of oVirt, I have had an issue with my hosted engine. After some poking around, I think I have figured out my issue and thought I would share to see what others think.

The issue has existed with 4.0, 4.0.1, 4.0.2, 4.0.3, and still exists in 4.0.4.

Description:
When my hosted engine starts, it reports that it is in a degraded state with 7 or 8 services still not started when I run systemctl status. It takes about 6 or 7 minutes to eventually start all the services and come online. If I don't set my cluster to Global-Maintenance mode, it eventually thinks that my hosted-engine needs to be rebooted and restarts it before it can start everything.

Could you please share with us logs gathered by ovirt-log-collector?
It's just a guess, but could you please check whether your HE VM has enough entropy?

cat /proc/sys/kernel/random/entropy_avail

If the value is low (below or around 200), you really need to install and configure some entropy generator such as haveged.

Solution:
I realized that Apache was the culprit and found that the proxy to the ovirt-engine in /etc/httpd/conf.d/z-ovirt-engine-proxy.conf has a super long timeout with many retries. I changed the settings and now everything works for me.

-> Before change:
<LocationMatch ^/(ovirt-engine($|/)|api($|/)|RHEVManagerWeb/|OvirtEngineWeb/|ca.crt$|engine.ssh.key.txt$|rhevm.ssh.key.txt$)>
    ProxyPassMatch ajp://127.0.0.1:8702 timeout=3600 retry=5
    <IfModule deflate_module>
        AddOutputFilterByType DEFLATE text/javascript text/css text/html text/xml text/json application/xml application/json application/x-yaml
    </IfModule>
</LocationMatch>

-> After change:
<LocationMatch ^/ovirt-engine($|/)>
    ProxyPassMatch ajp://127.0.0.1:8702 timeout=5 retry=2
    <IfModule deflate_module>
        AddOutputFilterByType DEFLATE text/javascript text/css text/html text/xml text/json application/xml application/json application/x-yaml
    </IfModule>
</LocationMatch>

This one is correct for 4.0, not sure why it was not updated during upgrade from 3.6. @Simone?

If I read the timeout settings correctly, it will wait 60 minutes with 5 retries. 5 hours is way too long for my little server to hold onto all those Apache processes.

The change I made allows for there to be an error, and also releases Apache's hold on the process. Once everything is ready, Apache is ready to serve requests and everything/everyone is happy. Before making the change, I just get a white screen in my browser and then nothing works until I restart Apache (or I end up in an endless loop of ovirt-ha services restarting my hosted-engine).

Well, if you have an issue with too many Apache processes waiting for engine to respond, then there's some issue in engine. As I wrote above, please share the logs with us and check entropy.

Thanks

Martin Perina
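
A note on the arithmetic above, offered as a hedged reading rather than a correction from the thread's participants: in Apache's mod_proxy, both `timeout` and `retry` on a ProxyPass/ProxyPassMatch line are expressed in seconds. `timeout` is how long httpd waits for data from the backend, and `retry` is how long a worker that is in the error state is left alone before httpd tries it again; it is not a retry count. Under that reading, `timeout=3600 retry=5` means waits of up to 60 minutes, not 5 hours. The sketch below only extracts and converts the two values from the line quoted above; the helper name proxy_param is made up for illustration.

```shell
#!/bin/sh
# proxy_param LINE KEY
# Hypothetical helper: prints the value of KEY=... found in a ProxyPassMatch line.
proxy_param() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n "s/^$2=//p"
}

# The "before change" directive quoted in this thread.
line='ProxyPassMatch ajp://127.0.0.1:8702 timeout=3600 retry=5'

timeout_s=$(proxy_param "$line" timeout)
retry_s=$(proxy_param "$line" retry)

# Both values are seconds; convert the timeout to minutes for readability.
echo "timeout: ${timeout_s}s ($((timeout_s / 60)) minutes waiting on the backend)"
echo "retry:   ${retry_s}s before a failed worker is tried again"
```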

I noticed that this setting reverts to the original setting, so oVirt must be writing this file. Perhaps these numbers can be changed in oVirt? If not, I will just set up an Ansible play to revert the settings with working values and restart Apache on my engine. :-)

Cheers,
Gervais