[ovirt-users] oVirt 4.0.x - hosted-engine was not starting properly
Simone Tiraboschi
stirabos at redhat.com
Thu Sep 29 09:15:35 EDT 2016
On Thu, Sep 29, 2016 at 3:11 PM, Martin Perina <mperina at redhat.com> wrote:
>
>
> On Thu, Sep 29, 2016 at 3:04 PM, Simone Tiraboschi <stirabos at redhat.com>
> wrote:
>
>>
>>
>> On Thu, Sep 29, 2016 at 12:47 PM, Martin Perina <mperina at redhat.com>
>> wrote:
>>
>>> Hi,
>>>
>>> please take a look at my inline comments:
>>>
>>> On Tue, Sep 27, 2016 at 7:23 PM, Gervais de Montbrun <
>>> gervais at demontbrun.com> wrote:
>>>
>>>> Hey All,
>>>>
>>>> Since updating to 4.0.x of oVirt, I have had an issue with my hosted
>>>> engine. After some poking around, I think I have figured out my issue and
>>>> thought I would share to see what others think.
>>>> The issue has existed with 4.0, 4.0.1, 4.0.2, 4.0.3, and still exists
>>>> in 4.0.4.
>>>>
>>>> Description:
>>>> When my hosted engine starts it reports that it is in a degraded state
>>>> with 7 or 8 services still not started when I run systemctl status. It
>>>> takes about 6 or 7 minutes to eventually start all the services and come
>>>> online. If I don't set my cluster to Global-Maintenance mode it eventually
>>>> thinks that my hosted-engine needs to be rebooted and restarts it before it
>>>> can start everything.
>>>>
>>>
>>> Could you please share with us logs gathered by ovirt-log-collector?
>>>
>>> It's just a guess, but could you please check whether your HE VM has
>>> enough entropy?
>>>
>>> cat /proc/sys/kernel/random/entropy_avail
>>>
>>> If the value is low (below or around 200), you really need to install
>>> and configure some entropy generator such as haveged
>>>
>>>
>>>> Solution:
>>>> I realized that Apache was the culprit and found that the proxy to the
>>>> ovirt-engine in /etc/httpd/conf.d/z-ovirt-engine-proxy.conf has a
>>>> super long timeout with many retries. I changed the settings and now
>>>> everything works for me.
>>>>
>>>> -> Before change:
>>>>
>>>> <LocationMatch ^/(ovirt-engine($|/)|api($|/)|
>>>> RHEVManagerWeb/|OvirtEngineWeb/|ca.crt$|engine.ssh.key.txt$|
>>>> rhevm.ssh.key.txt$)>
>>>> ProxyPassMatch ajp://127.0.0.1:8702 timeout=3600 retry=5
>>>>
>>>> <IfModule deflate_module>
>>>> AddOutputFilterByType DEFLATE text/javascript text/css
>>>> text/html text/xml text/json application/xml application/json
>>>> application/x-yaml
>>>> </IfModule>
>>>> </LocationMatch>
>>>>
>>>>
>>>> -> After change:
>>>>
>>>> <LocationMatch ^/ovirt-engine($|/)>
>>>> ProxyPassMatch ajp://127.0.0.1:8702 timeout=5 retry=2
>>>>
>>>> <IfModule deflate_module>
>>>> AddOutputFilterByType DEFLATE text/javascript text/css
>>>> text/html text/xml text/json application/xml application/json
>>>> application/x-yaml
>>>> </IfModule>
>>>> </LocationMatch>
>>>>
>>>>
>>> This one is correct for 4.0, not sure why it was not updated during
>>> upgrade from 3.6. @Simone?
>>>
>>>
>>
>> Honestly it's
>> <LocationMatch ^/ovirt-engine($|/)>
>> ProxyPassMatch ajp://127.0.0.1:8702 timeout=3600 retry=5
>>
>> <IfModule deflate_module>
>> AddOutputFilterByType DEFLATE text/javascript text/css
>> text/html text/xml text/json application/xml application/json
>> application/x-yaml
>> </IfModule>
>> </LocationMatch>
>> also on a fresh 4.0 engine from our latest engine-appliance.
>>
>
> Right, I missed the timeout/retry option changes. But the important part
> is why the old configuration (with a different LocationMatch) was not
> overwritten during the upgrade.
>
>
I suspect that it could have been overwritten a second time, back to its 3.6
value, by our backup/restore procedure.
Adding Didi here.
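
To tell which variant of the file a given engine actually has, something like
the sketch below could be used. It is only an illustration: the function name
read_proxy_settings is made up here, it assumes the LocationMatch and
ProxyPassMatch directives each sit on a single line as in the snippets quoted
in this thread, and it only parses text without touching Apache. Note that in
mod_proxy both timeout and retry are expressed in seconds, so retry=5 is a
5-second back-off for a worker in the error state rather than a retry count.

```python
import re


def read_proxy_settings(conf_text):
    """Extract the LocationMatch pattern and ProxyPassMatch options.

    Hypothetical helper for illustration only; it expects each directive
    to be on one line and returns None for anything it cannot find.
    """
    settings = {"pattern": None, "backend": None, "timeout": None, "retry": None}

    # The regex given to LocationMatch contains no whitespace, so \S+ is enough.
    location = re.search(r"<LocationMatch\s+(\S+)>", conf_text)
    if location:
        settings["pattern"] = location.group(1)

    # Backend URL followed by space-separated key=value worker parameters.
    proxy = re.search(r"^\s*ProxyPassMatch\s+(\S+)(.*)$", conf_text, re.MULTILINE)
    if proxy:
        settings["backend"] = proxy.group(1)
        options = dict(re.findall(r"(\w+)=(\S+)", proxy.group(2)))
        for key in ("timeout", "retry"):
            if key in options:
                settings[key] = int(options[key])  # both values are in seconds

    return settings


# Example input copied from the 4.0 snippet quoted in this thread.
sample = """<LocationMatch ^/ovirt-engine($|/)>
    ProxyPassMatch ajp://127.0.0.1:8702 timeout=3600 retry=5
</LocationMatch>
"""
print(read_proxy_settings(sample))
```

On a real engine the text would come from
/etc/httpd/conf.d/z-ovirt-engine-proxy.conf; a pattern that still mentions
api, RHEVManagerWeb and so on would point at the 3.6 layout.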
>
>>
>>>
>>>> If I read the timeout settings correctly, it will wait 60 minutes with
>>>> 5 retries. 5 hours is way too long for my little server to hold onto all
>>>> those apache processes.
>>>>
>>>> The change I made allows for there to be an error, and also releases
>>>> apache's hold on the process. Once everything is ready, apache is ready to
>>>> serve requests and everything/everyone is happy. Before making the change,
>>>> I just get a white screen in my browser and then nothing works until I
>>>> restart Apache (or I end up in an endless loop of ovirt-ha services
>>>> restarting my hosted-engine).
>>>>
>>>
>>> Well, if you have an issue with too many apache processes waiting for
>>> engine to respond, then there's some issue in engine. As I wrote above
>>> please share the logs with us and check entropy.
>>>
>>> Thanks
>>>
>>> Martin Perina
>>>
>>>
>>>
>>>>
>>>> I noticed that this setting reverts to the original setting, so oVirt
>>>> must be writing this file. Perhaps these numbers can be changed in oVirt? If
>>>> not, I will just set up an Ansible play to revert the settings to working
>>>> values and restart Apache on my engine.
>>>> :-)
>>>>
>>>> Cheers,
>>>> Gervais
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>
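
The entropy check suggested earlier in the thread could be scripted roughly as
follows. This is a minimal sketch, not something shipped with oVirt: the
helper names are invented for illustration, and the threshold of 200 is simply
the "below or around 200" figure mentioned above.

```python
ENTROPY_PATH = "/proc/sys/kernel/random/entropy_avail"
LOW_ENTROPY_THRESHOLD = 200  # assumption: taken from the figure quoted in this thread


def parse_entropy(raw):
    """Convert the raw content of entropy_avail into an integer."""
    return int(raw.strip())


def entropy_is_low(value, threshold=LOW_ENTROPY_THRESHOLD):
    """Return True when the available entropy is at or below the threshold."""
    return value <= threshold


def read_entropy(path=ENTROPY_PATH):
    """Read the current entropy estimate from the kernel (Linux only)."""
    with open(path) as handle:
        return parse_entropy(handle.read())
```

Calling entropy_is_low(read_entropy()) on the HE VM would then indicate
whether an entropy generator such as haveged is worth installing.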