[ovirt-users] oVirt 4.0.x - hosted-engine was not starting properly

Gervais de Montbrun gervais at demontbrun.com
Thu Sep 29 09:10:23 EDT 2016


Hi Simone,

Yes... I guess it was not clear in my original email. I lowered the timeout and retry values myself. With the values that oVirt sets (timeout=3600 retry=5), things were not working for me.

Cheers,
Gervais



> On Sep 29, 2016, at 10:04 AM, Simone Tiraboschi <stirabos at redhat.com> wrote:
> 
> 
> 
> On Thu, Sep 29, 2016 at 12:47 PM, Martin Perina <mperina at redhat.com> wrote:
> Hi,
> 
> please take a look at my inline comments:
> 
> On Tue, Sep 27, 2016 at 7:23 PM, Gervais de Montbrun <gervais at demontbrun.com> wrote:
> Hey All,
> 
> Since updating to oVirt 4.0.x, I have had an issue with my hosted engine. After some poking around, I think I have figured out the cause and thought I would share it to see what others think.
> The issue has existed with 4.0, 4.0.1, 4.0.2, 4.0.3, and still exists in 4.0.4.
> 
> Description:
> When my hosted engine starts it reports that it is in a degraded state with 7 or 8 services still not started when I run systemctl status. It takes about 6 or 7 minutes to eventually start all the services and come online. If I don't set my cluster to Global-Maintenance mode it eventually thinks that my hosted-engine needs to be rebooted and restarts it before it can start everything.
> 
> Could you please share with us the logs gathered by ovirt-log-collector?
> 
> It's just a guess, but could you please check whether your HE VM has enough entropy?
> 
>   cat /proc/sys/kernel/random/entropy_avail
> 
> If the value is low (below or around 200), you really need to install and configure an entropy generator such as haveged.
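
The entropy check above can also be scripted. The following is only a minimal sketch: the path and the rough threshold of 200 come from the message, while the function names `parse_entropy`, `read_entropy`, and `is_entropy_low` are made up for illustration.

```python
from pathlib import Path

# Path taken from the command quoted above.
ENTROPY_PATH = Path("/proc/sys/kernel/random/entropy_avail")
# "Below or around 200" is the rough figure from the message, not a hard limit.
LOW_ENTROPY_THRESHOLD = 200


def parse_entropy(text: str) -> int:
    """Convert the file's contents (a single integer plus newline) to an int."""
    return int(text.strip())


def read_entropy(path: Path = ENTROPY_PATH) -> int:
    """Read the kernel's current available-entropy estimate."""
    return parse_entropy(path.read_text())


def is_entropy_low(value: int, threshold: int = LOW_ENTROPY_THRESHOLD) -> bool:
    """Return True when the value is at or below the threshold."""
    return value <= threshold
```

On the hosted-engine VM, `is_entropy_low(read_entropy())` returning True would suggest looking at an entropy generator such as haveged, as described above.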
> 
> 
> Solution:
> I realized that Apache was the culprit and found that the proxy to the ovirt-engine in /etc/httpd/conf.d/z-ovirt-engine-proxy.conf has a super long timeout with many retries. I changed the settings and now everything works for me.
> 
> -> Before change:
>     <LocationMatch ^/(ovirt-engine($|/)|api($|/)|RHEVManagerWeb/|OvirtEngineWeb/|ca.crt$|engine.ssh.key.txt$|rhevm.ssh.key.txt$)>
>         ProxyPassMatch ajp://127.0.0.1:8702 timeout=3600 retry=5
> 
>         <IfModule deflate_module>
>             AddOutputFilterByType DEFLATE text/javascript text/css text/html text/xml text/json application/xml application/json application/x-yaml
>         </IfModule>
>     </LocationMatch>
> 
> -> After change:
>     <LocationMatch ^/ovirt-engine($|/)>
>         ProxyPassMatch ajp://127.0.0.1:8702 timeout=5 retry=2
> 
>         <IfModule deflate_module>
>             AddOutputFilterByType DEFLATE text/javascript text/css text/html text/xml text/json application/xml application/json application/x-yaml
>         </IfModule>
>     </LocationMatch>
> 
> This one is correct for 4.0; I'm not sure why it was not updated during the upgrade from 3.6. @Simone?
>> 
> Honestly it's
>     <LocationMatch ^/ovirt-engine($|/)>
>         ProxyPassMatch ajp://127.0.0.1:8702 timeout=3600 retry=5
> 
>         <IfModule deflate_module>
>             AddOutputFilterByType DEFLATE text/javascript text/css text/html text/xml text/json application/xml application/json application/x-yaml
>         </IfModule>
>     </LocationMatch>
> also on a fresh 4.0 engine from our latest engine-appliance.
>  
> 
> If I read the timeout settings correctly, it will wait 60 minutes with 5 retries. 5 hours is way too long for my little server to hold onto all those Apache processes.
> The change I made allows an error to be returned and also releases Apache's hold on the process. Once everything is ready, Apache is ready to serve requests and everything/everyone is happy. Before making the change, I would just get a white screen in my browser and then nothing worked until I restarted Apache (or I ended up in an endless loop of the ovirt-ha services restarting my hosted-engine).
> 
> Well, if you have an issue with too many Apache processes waiting for the engine to respond, then there is some issue in the engine. As I wrote above, please share the logs with us and check the entropy.
> 
> Thanks
> 
> Martin Perina
>> 
> I noticed that this setting reverts to the original value, so oVirt must be writing this file. Perhaps these numbers can be changed in oVirt? If not, I will just set up an Ansible play to restore the working values and restart Apache on my engine.
> :-)
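
As an alternative to an Ansible play, the same re-applying step could be sketched in Python. This is only an illustration under stated assumptions: the file path and the values 5 and 2 come from the message above, while the helper names `set_proxy_timeouts` and `apply_to_file` are hypothetical. Note that the Apache mod_proxy documentation describes both `timeout` and `retry` as values in seconds, with `retry` being how long a worker in the error state is skipped rather than a count of attempts.

```python
import re
from pathlib import Path

# Path taken from the message above.
CONF_PATH = Path("/etc/httpd/conf.d/z-ovirt-engine-proxy.conf")


def set_proxy_timeouts(conf_text: str, timeout: int = 5, retry: int = 2) -> str:
    """Rewrite timeout= and retry= on ProxyPassMatch lines; leave other lines alone."""
    out = []
    for line in conf_text.splitlines(keepends=True):
        if line.lstrip().startswith("ProxyPassMatch"):
            line = re.sub(r"\btimeout=\d+", f"timeout={timeout}", line)
            line = re.sub(r"\bretry=\d+", f"retry={retry}", line)
        out.append(line)
    return "".join(out)


def apply_to_file(path: Path = CONF_PATH) -> bool:
    """Update the file in place; return True only if something changed."""
    original = path.read_text()
    updated = set_proxy_timeouts(original)
    if updated == original:
        return False
    path.write_text(updated)
    return True
```

Because `apply_to_file` reports whether the file changed, a wrapper could restart Apache only when the values had actually been reverted.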
> 
> Cheers,
> Gervais
> 
> 
> 
> 
> _______________________________________________
> Users mailing list
> Users at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
> 
> 
> 
