[ovirt-users] oVirt 4.0.x - hosted-engine was not starting properly

Martin Perina mperina at redhat.com
Thu Sep 29 13:11:07 UTC 2016


On Thu, Sep 29, 2016 at 3:04 PM, Simone Tiraboschi <stirabos at redhat.com>
wrote:

>
>
> On Thu, Sep 29, 2016 at 12:47 PM, Martin Perina <mperina at redhat.com>
> wrote:
>
>> Hi,
>>
>> please take a look at my inline comments:
>>
>> On Tue, Sep 27, 2016 at 7:23 PM, Gervais de Montbrun <
>> gervais at demontbrun.com> wrote:
>>
>>> Hey All,
>>>
>>> Since updating to 4.0.x of oVirt, I have had an issue with my hosted
>>> engine. After some poking around, I think I have figured out my issue and
>>> thought I would share to see what others think.
>>> The issue has existed with 4.0, 4.0.1, 4.0.2, 4.0.3, and still exists in
>>> 4.0.4.
>>>
>>> Description:
>>> When my hosted engine starts it reports that it is in a degraded state
>>> with 7 or 8 services still not started when I run systemctl status. It
>>> takes about 6 or 7 minutes to eventually start all the services and come
>>> online. If I don't set my cluster to Global-Maintenance mode it eventually
>>> thinks that my hosted-engine needs to be rebooted and restarts it before it
>>> can start everything.
>>>
>>
>> Could you please share with us the logs gathered by ovirt-log-collector?
>>
>> It's just a guess, but could you please check whether your HE VM has
>> enough entropy?
>>
>>   cat /proc/sys/kernel/random/entropy_avail
>>
>> If the value is low (around 200 or below), you really need to install
>> and configure an entropy generator such as haveged.
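A minimal Python sketch of the same entropy check, assuming a Linux guest that exposes /proc/sys/kernel/random/entropy_avail (the kernel's entropy estimate, in bits). The 200 threshold is the rough figure given above, treated here as "at or below"; the helper names `read_entropy` and `entropy_is_low` are hypothetical.

```python
from pathlib import Path

# Kernel's current entropy estimate, in bits (Linux only).
ENTROPY_PATH = Path("/proc/sys/kernel/random/entropy_avail")
# Rough "low" figure from the advice above; an assumption, not a hard limit.
LOW_ENTROPY_THRESHOLD = 200


def read_entropy(path: Path = ENTROPY_PATH) -> int:
    """Read the entropy estimate from procfs and return it as an int."""
    return int(path.read_text().strip())


def entropy_is_low(value: int, threshold: int = LOW_ENTROPY_THRESHOLD) -> bool:
    """Treat anything at or below the threshold as too low."""
    return value <= threshold


if __name__ == "__main__":
    if not ENTROPY_PATH.exists():
        print("entropy_avail not found (is this a Linux host?)")
    else:
        avail = read_entropy()
        if entropy_is_low(avail):
            print(f"entropy_avail={avail}: consider an entropy daemon such as haveged")
        else:
            print(f"entropy_avail={avail}: looks fine")
```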
>>
>>
>>> Solution:
>>> I realized that Apache was the culprit and found that the proxy to the
>>> ovirt-engine in /etc/httpd/conf.d/z-ovirt-engine-proxy.conf has a super
>>> long timeout with many retries. I changed the settings and now everything
>>> works for me.
>>>
>>> -> Before change:
>>>
>>>     <LocationMatch ^/(ovirt-engine($|/)|api($|/)|RHEVManagerWeb/|OvirtEngineWeb/|ca.crt$|engine.ssh.key.txt$|rhevm.ssh.key.txt$)>
>>>         ProxyPassMatch ajp://127.0.0.1:8702 timeout=3600 retry=5
>>>
>>>         <IfModule deflate_module>
>>>             AddOutputFilterByType DEFLATE text/javascript text/css text/html text/xml text/json application/xml application/json application/x-yaml
>>>         </IfModule>
>>>     </LocationMatch>
>>>
>>>
>>> -> After change:
>>>
>>>     <LocationMatch ^/ovirt-engine($|/)>
>>>         ProxyPassMatch ajp://127.0.0.1:8702 timeout=5 retry=2
>>>
>>>         <IfModule deflate_module>
>>>             AddOutputFilterByType DEFLATE text/javascript text/css text/html text/xml text/json application/xml application/json application/x-yaml
>>>         </IfModule>
>>>     </LocationMatch>
>>>
>>>
>> This one is correct for 4.0, not sure why it was not updated during the
>> upgrade from 3.6. @Simone?
>>
>
> Honestly, it's
>     <LocationMatch ^/ovirt-engine($|/)>
>         ProxyPassMatch ajp://127.0.0.1:8702 timeout=3600 retry=5
>
>         <IfModule deflate_module>
>             AddOutputFilterByType DEFLATE text/javascript text/css text/html text/xml text/json application/xml application/json application/x-yaml
>         </IfModule>
>     </LocationMatch>
> also on a fresh 4.0 engine from our latest engine-appliance.
>

Right, I missed the timeout/retry option changes. But the important part
is why the old configuration (with a different LocationMatch) was not
overwritten during the upgrade.
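To see what the differing LocationMatch blocks actually cover, the two patterns quoted in this thread can be tried against a few request paths. This is a sketch using Python's re module, assuming these simple patterns behave the same there as under Apache's PCRE matching of the URL path; `proxied` is a hypothetical helper.

```python
import re

# <LocationMatch> regexes copied from the configurations quoted in this thread.
OLD_PATTERN = re.compile(
    r"^/(ovirt-engine($|/)|api($|/)|RHEVManagerWeb/|OvirtEngineWeb/"
    r"|ca.crt$|engine.ssh.key.txt$|rhevm.ssh.key.txt$)"
)
NEW_PATTERN = re.compile(r"^/ovirt-engine($|/)")


def proxied(pattern: re.Pattern, path: str) -> bool:
    """True if a request path falls inside the <LocationMatch> block."""
    return pattern.search(path) is not None


if __name__ == "__main__":
    for path in ("/ovirt-engine/", "/api/", "/ca.crt", "/OvirtEngineWeb/", "/ovirt-engine-x"):
        print(f"{path:18} old={proxied(OLD_PATTERN, path)} new={proxied(NEW_PATTERN, path)}")
```

Under the old pattern, /api/, /ca.crt and the legacy OvirtEngineWeb/RHEVManagerWeb paths are also handed to the AJP backend; the 4.0 pattern only covers /ovirt-engine and paths below it.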


>
>>
>>> If I read the timeout settings correctly, it will wait 60 minutes with 5
>>> retries. 5 hours is way too long for my little server to hold onto all
>>> those apache processes.
>>>
>>> The change I made allows the request to fail with an error, and also
>>> releases Apache's hold on the process. Once everything is ready, Apache is
>>> ready to serve requests and everything/everyone is happy. Before making the
>>> change, I just got a white screen in my browser and then nothing worked
>>> until I restarted Apache (or I ended up in an endless loop of ovirt-ha
>>> services restarting my hosted-engine).
>>>
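One caveat when reading these numbers: in Apache's mod_proxy worker parameters, `timeout` is the number of seconds httpd waits for data from the backend, while `retry` is not a retry count but the number of seconds a worker in an error state is left alone before httpd tries the backend again. So timeout=3600 retry=5 means up to about an hour of waiting per stalled request and a 5-second back-off, not five one-hour attempts. The sketch below, assuming the parameters appear as key=value tokens as in the quoted lines, just pulls them out and restates them; `parse_worker_params` and `describe` are hypothetical helpers.

```python
def parse_worker_params(directive: str) -> dict:
    """Collect integer key=value parameters from a ProxyPass/ProxyPassMatch line."""
    params = {}
    for token in directive.split():
        key, sep, value = token.partition("=")
        # Skip the directive name and the backend URL, which contain no '='.
        if sep and value.isdigit():
            params[key] = int(value)
    return params


def describe(directive: str) -> str:
    """Summarize the timeout/retry values in plain words (both are seconds)."""
    p = parse_worker_params(directive)
    return (f"waits up to {p['timeout']} s for backend data; "
            f"a worker in error state is retried after {p['retry']} s")


if __name__ == "__main__":
    for line in ("ProxyPassMatch ajp://127.0.0.1:8702 timeout=3600 retry=5",
                 "ProxyPassMatch ajp://127.0.0.1:8702 timeout=5 retry=2"):
        print(describe(line))
```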
>>
>> Well, if you have an issue with too many Apache processes waiting for
>> the engine to respond, then there's some issue in the engine. As I wrote
>> above, please share the logs with us and check the entropy.
>>
>> Thanks
>>
>> Martin Perina
>>
>>
>>>
>>> I noticed that this setting reverts to the original value, so oVirt
>>> must be writing this file. Perhaps these numbers can be changed in oVirt? If
>>> not, I will just set up an Ansible play to revert the settings to working
>>> values and restart Apache on my engine.
>>> :-)
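If the file keeps being rewritten, the revert step could look like the following sketch, written in Python rather than as an Ansible play. It only rewrites the timeout=/retry= values in the text of /etc/httpd/conf.d/z-ovirt-engine-proxy.conf; reading and writing the file and restarting Apache are left to the caller. `set_worker_params` is a hypothetical helper, and the values 5 and 2 are simply the ones from the "after" configuration quoted above.

```python
import re

# Matches whole "timeout=<digits>" and "retry=<digits>" tokens.
_PARAM_RE = re.compile(r"\b(timeout|retry)=(\d+)\b")


def set_worker_params(conf_text: str, timeout: int, retry: int) -> str:
    """Return conf_text with every timeout=/retry= value replaced."""
    wanted = {"timeout": timeout, "retry": retry}
    return _PARAM_RE.sub(lambda m: f"{m.group(1)}={wanted[m.group(1)]}", conf_text)


if __name__ == "__main__":
    line = "ProxyPassMatch ajp://127.0.0.1:8702 timeout=3600 retry=5"
    print(set_worker_params(line, timeout=5, retry=2))
    # prints: ProxyPassMatch ajp://127.0.0.1:8702 timeout=5 retry=2
```

Since the file is reported to revert on its own, a patch like this would have to be reapplied each time it is regenerated.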
>>>
>>> Cheers,
>>> Gervais
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Users mailing list
>>> Users at ovirt.org
>>> http://lists.ovirt.org/mailman/listinfo/users
>>>
>>>
>>
>

