oVirt 4.0.x - hosted-engine was not starting properly

Hey All,

Since updating to 4.0.x of oVirt, I have had an issue with my hosted engine. After some poking around, I think I have figured out my issue and thought I would share to see what others think.
The issue has existed with 4.0, 4.0.1, 4.0.2, 4.0.3, and still exists in 4.0.4.

Description:
When my hosted engine starts it reports that it is in a degraded state with 7 or 8 services still not started when I run systemctl status. It takes about 6 or 7 minutes to eventually start all the services and come online. If I don't set my cluster to Global-Maintenance mode it eventually thinks that my hosted-engine needs to be rebooted and restarts it before it can start everything.

Solution:
I realized that Apache was the culprit and found that the proxy to the ovirt-engine in /etc/httpd/conf.d/z-ovirt-engine-proxy.conf has a super long timeout with many retries. I changed the settings and now everything works for me.

-> Before change:

    <LocationMatch ^/(ovirt-engine($|/)|api($|/)|RHEVManagerWeb/|OvirtEngineWeb/|ca.crt$|engine.ssh.key.txt$|rhevm.ssh.key.txt$)>
        ProxyPassMatch ajp://127.0.0.1:8702 timeout=3600 retry=5

        <IfModule deflate_module>
            AddOutputFilterByType DEFLATE text/javascript text/css text/html text/xml text/json application/xml application/json application/x-yaml
        </IfModule>
    </LocationMatch>

-> After change:

    <LocationMatch ^/ovirt-engine($|/)>
        ProxyPassMatch ajp://127.0.0.1:8702 timeout=5 retry=2

        <IfModule deflate_module>
            AddOutputFilterByType DEFLATE text/javascript text/css text/html text/xml text/json application/xml application/json application/x-yaml
        </IfModule>
    </LocationMatch>

If I read the timeout settings correctly, it will wait 60 minutes with 5 retries. 5 hours is way too long for my little server to hold onto all those apache processes. The change I made allows for there to be an error, and also releases apache's hold on the process. Once everything is ready, apache is ready to serve requests and everything/everyone is happy. Before making the change, I would just get a white screen in my browser and then nothing would work until I restarted Apache (or I would end up in an endless loop of ovirt-ha services restarting my hosted-engine).

I noticed that this setting reverts to the original setting, so oVirt must be writing this file. Perhaps these numbers can be changed in oVirt? If not, I will just set up an ansible play to revert the settings with working values and restart apache on my engine. :-)

Cheers,
Gervais
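The plan of re-applying the shorter values whenever the file is rewritten can be sketched in plain shell (rather than the Ansible play mentioned in the message). This is only a minimal sketch: the function name `set_proxy_timeouts` is hypothetical, and the values 5 and 2 are simply the ones from the "After change" block.

```shell
#!/bin/sh
# Hypothetical helper (not part of oVirt): rewrite the timeout= and retry=
# parameters on ProxyPassMatch lines only.
# Reads config text on stdin and writes the adjusted text to stdout.
# $1 = new timeout value, $2 = new retry value
set_proxy_timeouts() {
    sed -E "/ProxyPassMatch/ s/timeout=[0-9]+/timeout=$1/; /ProxyPassMatch/ s/retry=[0-9]+/retry=$2/"
}

# Demonstration on a single line taken from the "Before change" block.
printf '%s\n' '    ProxyPassMatch ajp://127.0.0.1:8702 timeout=3600 retry=5' \
    | set_proxy_timeouts 5 2

# On a real engine VM (as root) the same filter could be applied to the file:
#   conf=/etc/httpd/conf.d/z-ovirt-engine-proxy.conf
#   set_proxy_timeouts 5 2 < "$conf" > "$conf.new" && mv "$conf.new" "$conf"
#   systemctl restart httpd
```

One detail worth checking against the Apache mod_proxy documentation: there, `timeout` is the number of seconds to wait for data from the backend, and `retry` is the number of seconds before a backend worker in the error state is tried again, so `retry=5` is an interval in seconds rather than a count of attempts.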

Hi,

please take a look at my inline comments:

On Tue, Sep 27, 2016 at 7:23 PM, Gervais de Montbrun <gervais@demontbrun.com> wrote:
Hey All,
Since updating to 4.0.x of oVirt, I have had an issue with my hosted engine. After some poking around, I think I have figured out my issue and thought I would share to see what others think. The issue has existed with 4.0, 4.0.1, 4.0.2, 4.0.3, and still exists in 4.0.4.
Description: When my hosted engine starts it reports that it is in a degraded state with 7 or 8 services still not started when I run systemctl status. It takes about 6 or 7 minutes to eventually start all the services and come online. If I don't set my cluster to Global-Maintenance mode it eventually thinks that my hosted-engine needs to be rebooted and restarts it before it can start everything.
Could you please share with us the logs gathered by ovirt-log-collector?

It's just a guess, but could you please check whether your HE VM has enough entropy?

    cat /proc/sys/kernel/random/entropy_avail

If the value is low (below or around 200), you really need to install and configure an entropy generator such as haveged.
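The entropy check suggested above can be wrapped in a small script. This is a minimal sketch: the helper name `entropy_status` is hypothetical, and the cut-off of 200 is taken directly from the "below or around 200" guidance in the reply, not from any official oVirt threshold.

```shell
#!/bin/sh
# Hypothetical helper: classify an entropy_avail reading against the
# "below or around 200" rule of thumb from the reply above.
entropy_status() {
    if [ "$1" -le 200 ]; then
        echo "low"
    else
        echo "ok"
    fi
}

# Read the kernel's current estimate if the file is available (Linux only).
entropy_file=/proc/sys/kernel/random/entropy_avail
if [ -r "$entropy_file" ]; then
    value=$(cat "$entropy_file")
    echo "entropy_avail=$value ($(entropy_status "$value"))"
fi
```

Running it on the hosted-engine VM shortly after boot, when the services are still starting, is when a low reading would matter most.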
Solution: I realized that Apache was the culprit and found that the proxy to the ovirt-engine in /etc/httpd/conf.d/z-ovirt-engine-proxy.conf has a super long timeout with many retries. I changed the settings and now everything works for me.
-> Before change:
    <LocationMatch ^/(ovirt-engine($|/)|api($|/)|RHEVManagerWeb/|OvirtEngineWeb/|ca.crt$|engine.ssh.key.txt$|rhevm.ssh.key.txt$)>
        ProxyPassMatch ajp://127.0.0.1:8702 timeout=3600 retry=5

        <IfModule deflate_module>
            AddOutputFilterByType DEFLATE text/javascript text/css text/html text/xml text/json application/xml application/json application/x-yaml
        </IfModule>
    </LocationMatch>
-> After change:
    <LocationMatch ^/ovirt-engine($|/)>
        ProxyPassMatch ajp://127.0.0.1:8702 timeout=5 retry=2

        <IfModule deflate_module>
            AddOutputFilterByType DEFLATE text/javascript text/css text/html text/xml text/json application/xml application/json application/x-yaml
        </IfModule>
    </LocationMatch>
This one is correct for 4.0, not sure why it was not updated during the upgrade from 3.6. @Simone?
If I read the timeout settings correctly, it will wait 60 minutes with 5 retries. 5 hours is way too long for my little server to hold onto all those apache processes.
The change I made allows for there to be an error, and also releases apache's hold on the process. Once everything is ready, apache is ready to serve requests and everything/everyone is happy. Before making the change, I would just get a white screen in my browser and then nothing would work until I restarted Apache (or I would end up in an endless loop of ovirt-ha services restarting my hosted-engine).
Well, if you have an issue with too many Apache processes waiting for the engine to respond, then there is some issue in the engine. As I wrote above, please share the logs with us and check the entropy.

Thanks

Martin Perina
I noticed that this setting reverts to the original setting, so oVirt must be writing this file. Perhaps these numbers can be changed in oVirt? If not, I will just set up an ansible play to revert the settings with working values and restart apache on my engine. :-)
Cheers, Gervais
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users

We have the same configuration in /etc/httpd/conf.d/z-ovirt-engine-proxy.conf for the regular engine under 3.6 and 4.0, so I am not sure it is related to the problem.

Regarding the entropy level, check this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1357246

Best Regards

Hi Martin,

The entropy was super low. Somewhere around 140. I installed and configured haveged.service to start at bootup, reverted my apache changes... After a reboot, my systemctl status still says that there are 7 services queued (note that I erroneously said degraded in my previous email - the services are, in fact, queued), but the oVirt GUI comes up almost immediately and everything seems to be great.

Thank you for the tip. You solved my issue.

Cheers,
Gervais
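The fix described in this message (install haveged and have it start at boot) might look roughly like the following. This is a sketch under assumptions: it assumes a yum-based engine VM where a `haveged` package is reachable from the configured repositories (for example via EPEL), and the `run` and `setup_haveged` helpers are hypothetical names. By default it only prints the commands; set `DRY_RUN=0` to execute them as root.

```shell
#!/bin/sh
# Print each command instead of running it, unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: install haveged, enable it at boot, and start it now.
setup_haveged() {
    run yum install -y haveged    # assumption: package is available to yum
    run systemctl enable haveged  # start at every boot
    run systemctl start haveged   # start immediately
}

setup_haveged
```

After a reboot, `systemctl is-enabled haveged` and a fresh read of /proc/sys/kernel/random/entropy_avail would confirm whether the change took effect.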

On Thu, Sep 29, 2016 at 2:51 PM, Gervais de Montbrun <gervais@demontbrun.com> wrote:
Hi Martin,
The entropy was super low. Somewhere around 140. I installed and configured haveged.service to start at bootup, reverted my apache changes... After a reboot, my systemctl status still says that there are 7 services queued (note that I erroneously said degraded in my previous email - the services are, in fact, queued), but the oVirt GUI comes up almost immediately and everything seems to be great.
Take care: haveged on a VM should not be considered a good source of entropy, and the oVirt PKI is managed by the engine.
http://security.stackexchange.com/questions/34523/is-it-appropriate-to-use-haveged-as-a-source-of-entropy-on-virtual-machines

A better approach is the virtio-rng paravirtualised RNG driver, as per patch https://gerrit.ovirt.org/#/c/62334/
Thank you for the tip. You solved my issue.
Cheers, Gervais
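Whether a guest already has a virtio-rng device, as recommended above, can be checked from inside the VM. This is a minimal sketch: the helper name `has_virtio_rng` is hypothetical, and it assumes the Linux hw_random sysfs interface, where a virtio RNG usually shows up under a name like `virtio_rng.0`.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given list of hardware RNG names
# (the contents of rng_available) includes a virtio RNG device.
has_virtio_rng() {
    case "$1" in
        *virtio_rng*) return 0 ;;
        *) return 1 ;;
    esac
}

# Assumption: the kernel exposes the hw_random sysfs interface at this path.
rng_dir=/sys/class/misc/hw_random
if [ -r "$rng_dir/rng_available" ]; then
    available=$(cat "$rng_dir/rng_available")
    if has_virtio_rng "$available"; then
        echo "virtio-rng device present: $available"
    else
        echo "no virtio-rng device listed: ${available:-none}"
    fi
fi
```

If no device is listed, the RNG device has to be added to the VM on the hypervisor side; nothing inside the guest can create it.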
On Sep 29, 2016, at 7:47 AM, Martin Perina <mperina@redhat.com> wrote:
Hi,
please take a look at my inline comments:
On Tue, Sep 27, 2016 at 7:23 PM, Gervais de Montbrun < gervais@demontbrun.com> wrote:
Hey All,
Since updating to 4.0.x of oVirt, I have had an issue with my hosted engine. After a some poking around, I think I have figured out my issue and thought I would share to see what others think. The issue has existed with 4.0, 4.0.1, 4.0.2, 4.0.3, and still exists in 4.0.4.
Description: When my hosted engine starts it reports that it is in a degraded state with 7 or 8 services still not started when I run systemctl status. It takes about 6 or 7 minutes to eventually start all the services and come online. If I don't set my cluster to Global-Maintenance mode it eventually thinks that my hosted-engine needs to be rebooted and restarts it before it can start everything.
Could you please share with us logs gathered by ovirt-log-collector?
It's just a guess but could you please take a look if you HE VM has enough entropy?
cat /proc/sys/kernel/random/entropy_avail
If the value is low (below or around 200), you really need to install and configure some entropy generator such as haveged
Solution: I realized that Apache was the culprit and found that the proxy to the ovirt-engine in /etc/httpd/conf.d/z-ovirt-engine-proxy.conf has a super long timeout with many retries. I changed the settings and now everything works for me.
-> Before change:
<LocationMatch ^/(ovirt-engine($|/)|api($|/)| RHEVManagerWeb/|OvirtEngineWeb/|ca.crt$|engine.ssh.key.txt$| rhevm.ssh.key.txt$)> ProxyPassMatch ajp://127.0.0.1:8702 timeout=3600 retry=5
<IfModule deflate_module> AddOutputFilterByType DEFLATE text/javascript text/css text/html text/xml text/json application/xml application/json application/x-yaml </IfModule> </LocationMatch>
-> After change:
<LocationMatch ^/ovirt-engine($|/)> ProxyPassMatch ajp://127.0.0.1:8702 timeout=5 retry=2
<IfModule deflate_module> AddOutputFilterByType DEFLATE text/javascript text/css text/html text/xml text/json application/xml application/json application/x-yaml </IfModule> </LocationMatch>
This one is correct for 4.0 , not sure why it was not updated during upgrade from 3.6. @Simone?
If I read the timeout settings correctly, it will wait 60 minutes with 5 retries. 5 hours is way too long for my little server to hold onto all those apache processes.
The change I made allows for there to be an error, and also releases
apache's hold on the process. Once everything is ready, apache is ready to serve requests and everything/everyone is happy. Before making the change, I just get a whitescreen in my browser and then nothing works until I restart Apache (or I end up in an endless loop of ovirt-ha services restarting my hosted-engine.
Well, if you have an issue with too many apache processes waiting for engine to respond, then there's some issue in engine. As I wrote above please share the logs with us and check entropy.
Thanks
Martin Perina
I noticed that this setting reverts to the original setting, so oVirt must be writing this file. Perhaps these number can be changed in oVirt? If not, I will just setup and ansible play to revert the settings with working values and restart apache on my engine. :-)
Cheers, Gervais
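The revert-and-restart idea above could be sketched in plain shell roughly as follows. This is only a sketch under stated assumptions: set_proxy_limits is a hypothetical helper name, and it assumes timeout= and retry= appear as plain numbers on the ProxyPassMatch line, as in the snippets quoted in this thread. Since the file is reported to revert, it would have to be re-run whenever oVirt rewrites it.

```shell
# set_proxy_limits FILE TIMEOUT RETRY
# Hypothetical helper: rewrites timeout= and retry= on ProxyPassMatch lines
# only, leaving every other line of FILE untouched.
set_proxy_limits() {
    conf=$1
    timeout=$2
    retry=$3
    tmp=$(mktemp) || return 1
    # Write to a temp file first, then copy the content back so the original
    # file keeps its owner, mode and SELinux context.
    sed -e "/ProxyPassMatch/ s/timeout=[0-9][0-9]*/timeout=$timeout/" \
        -e "/ProxyPassMatch/ s/retry=[0-9][0-9]*/retry=$retry/" \
        "$conf" > "$tmp" && cat "$tmp" > "$conf"
    rc=$?
    rm -f "$tmp"
    return $rc
}
```

On the engine VM this might be used as `set_proxy_limits /etc/httpd/conf.d/z-ovirt-engine-proxy.conf 5 2 && apachectl configtest && systemctl restart httpd`, with the values 5 and 2 taken from the "After change" snippet above.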
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users

Hi Simone,
Thanks for the info. I'll look at the solution that you suggested.
Cheers, Gervais
On Sep 29, 2016, at 10:01 AM, Simone Tiraboschi <stirabos@redhat.com> wrote:
On Thu, Sep 29, 2016 at 2:51 PM, Gervais de Montbrun <gervais@demontbrun.com> wrote:
Hi Martin,
The entropy was super low. Somewhere around 140. I installed and configured haveged.service to start at bootup, reverted my apache changes... After a reboot, my systemctl status still says that there are 7 services queued (note that I erroneously said degraded in my previous email - the services are, in fact, queued), but the oVirt GUI comes up almost immediately and everything seems to be great.
Take care that using haveged on a VM should not be considered a good source of entropy and the oVirt PKI is managed by the engine.
http://security.stackexchange.com/questions/34523/is-it-appropriate-to-use-haveged-as-a-source-of-entropy-on-virtual-machines
A better approach is the virtio-rng paravirtualised rng driver as for patch https://gerrit.ovirt.org/#/c/62334/
Thank you for the tip. You solved my issue.
Cheers, Gervais
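Whether the engine VM is actually using a virtio-rng device, as suggested in this message, can be checked from inside the guest. The sketch below assumes the kernel exposes the active hardware RNG in /sys/class/misc/hw_random/rng_current and that a virtio device shows up there under a name beginning with virtio_rng (typically virtio_rng.0). The function name uses_virtio_rng is made up for illustration.

```shell
# uses_virtio_rng NAME
# Hypothetical helper: succeeds when NAME (the contents of rng_current)
# looks like a virtio RNG device, fails otherwise.
uses_virtio_rng() {
    case $1 in
        virtio_rng*) return 0 ;;
        *)           return 1 ;;
    esac
}
```

A possible use inside the VM: `uses_virtio_rng "$(cat /sys/class/misc/hw_random/rng_current)" && echo "virtio-rng is the active RNG"`. If the file is missing or names another device, the guest is presumably not being fed entropy by the host this way.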

On Thu, Sep 29, 2016 at 12:47 PM, Martin Perina <mperina@redhat.com> wrote:
Hi,
please take a look at my inline comments:
On Tue, Sep 27, 2016 at 7:23 PM, Gervais de Montbrun <gervais@demontbrun.com> wrote:
Hey All,
Since updating to 4.0.x of oVirt, I have had an issue with my hosted engine. After some poking around, I think I have figured out my issue and thought I would share to see what others think. The issue has existed with 4.0, 4.0.1, 4.0.2, 4.0.3, and still exists in 4.0.4.
Description: When my hosted engine starts it reports that it is in a degraded state with 7 or 8 services still not started when I run systemctl status. It takes about 6 or 7 minutes to eventually start all the services and come online. If I don't set my cluster to Global-Maintenance mode it eventually thinks that my hosted-engine needs to be rebooted and restarts it before it can start everything.
Could you please share with us logs gathered by ovirt-log-collector?
It's just a guess, but could you please check whether your HE VM has enough entropy?
cat /proc/sys/kernel/random/entropy_avail
If the value is low (below or around 200), you really need to install and configure some entropy generator such as haveged
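The check above can be wrapped in a small helper so it is easy to repeat after a reboot. This is a minimal sketch: entropy_state is a hypothetical name, and treating "at or below 200" as low is an assumption based on the "below or around 200" rule of thumb in this message.

```shell
# entropy_state VALUE [THRESHOLD]
# Hypothetical helper: prints "low" when VALUE is at or below THRESHOLD
# (default 200, per the rule of thumb in this thread), otherwise "ok".
entropy_state() {
    value=$1
    threshold=${2:-200}
    if [ "$value" -le "$threshold" ]; then
        echo low
    else
        echo ok
    fi
}
```

On the hosted-engine VM it could be called as `entropy_state "$(cat /proc/sys/kernel/random/entropy_avail)"`. The value of around 140 reported later in the thread would come out as low.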
Solution: I realized that Apache was the culprit and found that the proxy to the ovirt-engine in /etc/httpd/conf.d/z-ovirt-engine-proxy.conf has a super long timeout with many retries. I changed the settings and now everything works for me.
-> Before change:
<LocationMatch ^/(ovirt-engine($|/)|api($|/)|RHEVManagerWeb/|OvirtEngineWeb/|ca.crt$|engine.ssh.key.txt$|rhevm.ssh.key.txt$)>
    ProxyPassMatch ajp://127.0.0.1:8702 timeout=3600 retry=5
    <IfModule deflate_module>
        AddOutputFilterByType DEFLATE text/javascript text/css text/html text/xml text/json application/xml application/json application/x-yaml
    </IfModule>
</LocationMatch>
-> After change:
<LocationMatch ^/ovirt-engine($|/)>
    ProxyPassMatch ajp://127.0.0.1:8702 timeout=5 retry=2
    <IfModule deflate_module>
        AddOutputFilterByType DEFLATE text/javascript text/css text/html text/xml text/json application/xml application/json application/x-yaml
    </IfModule>
</LocationMatch>
This one is correct for 4.0, not sure why it was not updated during upgrade from 3.6. @Simone?
Honestly it's
<LocationMatch ^/ovirt-engine($|/)>
    ProxyPassMatch ajp://127.0.0.1:8702 timeout=3600 retry=5
    <IfModule deflate_module>
        AddOutputFilterByType DEFLATE text/javascript text/css text/html text/xml text/json application/xml application/json application/x-yaml
    </IfModule>
</LocationMatch>
also on a fresh 4.0 engine from our latest engine-appliance.

Hi Simone,
Yes... I guess it was not clear in my original email. I changed the numbers myself to lower the timeout and retries. With them set as they were set by ovirt (timeout=3600 retry=5) things were not working for me.
Cheers, Gervais
On Sep 29, 2016, at 10:04 AM, Simone Tiraboschi <stirabos@redhat.com> wrote:
Honestly it's <LocationMatch ^/ovirt-engine($|/)> with ProxyPassMatch ajp://127.0.0.1:8702 timeout=3600 retry=5 also on a fresh 4.0 engine from our latest engine-appliance.
--Apple-Mail=_19C0B956-90E3-4499-AC9F-B50879BD6D0A Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset=utf-8 <html><head><meta http-equiv=3D"Content-Type" content=3D"text/html = charset=3Dutf-8"></head><body style=3D"word-wrap: break-word; = -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" = class=3D"">Hi Simone,<div class=3D""><br class=3D""></div><div = class=3D"">Yes... I guess it was not clear in my original email. I = changed the numbers myself to lower the timeout and retries. With them = set as they were set by ovirt (timeout=3D3600 retry=3D5) things were not = working for me. <br class=3D""><div class=3D""> <div id=3D"signature" class=3D""><br class=3D"">Cheers,<br = class=3D"">Gervais<br class=3D""><br class=3D""><br class=3D""></div> </div> <br class=3D""><div><blockquote type=3D"cite" class=3D""><div = class=3D"">On Sep 29, 2016, at 10:04 AM, Simone Tiraboschi <<a = href=3D"mailto:stirabos@redhat.com" class=3D"">stirabos@redhat.com</a>>= wrote:</div><br class=3D"Apple-interchange-newline"><div class=3D""><div = dir=3D"ltr" class=3D""><br class=3D""><div class=3D"gmail_extra"><br = class=3D""><div class=3D"gmail_quote">On Thu, Sep 29, 2016 at 12:47 PM, = Martin Perina <span dir=3D"ltr" class=3D""><<a = href=3D"mailto:mperina@redhat.com" target=3D"_blank" = class=3D"">mperina@redhat.com</a>></span> wrote:<br = class=3D""><blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px = 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div = dir=3D"ltr" class=3D""><div class=3D"">Hi,<br class=3D""><br = class=3D""></div><div class=3D"">please take a look at my inline = comments:<br class=3D""></div><div class=3D"gmail_extra"><br = class=3D""><div class=3D"gmail_quote"><span class=3D"gmail-">On Tue, Sep = 27, 2016 at 7:23 PM, Gervais de Montbrun <span dir=3D"ltr" = class=3D""><<a href=3D"mailto:gervais@demontbrun.com" target=3D"_blank"= class=3D"">gervais@demontbrun.com</a>></span> wrote:<br = 

On Thu, Sep 29, 2016 at 3:04 PM, Simone Tiraboschi <stirabos@redhat.com> wrote:
On Thu, Sep 29, 2016 at 12:47 PM, Martin Perina <mperina@redhat.com> wrote:
Hi,
please take a look at my inline comments:
On Tue, Sep 27, 2016 at 7:23 PM, Gervais de Montbrun <gervais@demontbrun.com> wrote:
Hey All,
Since updating to 4.0.x of oVirt, I have had an issue with my hosted engine. After some poking around, I think I have figured out my issue and thought I would share to see what others think. The issue has existed with 4.0, 4.0.1, 4.0.2, 4.0.3, and still exists in 4.0.4.
Description: When my hosted engine starts it reports that it is in a degraded state with 7 or 8 services still not started when I run systemctl status. It takes about 6 or 7 minutes to eventually start all the services and come online. If I don't set my cluster to Global-Maintenance mode it eventually thinks that my hosted-engine needs to be rebooted and restarts it before it can start everything.
Could you please share with us logs gathered by ovirt-log-collector?
It's just a guess, but could you please check whether your HE VM has enough entropy?
cat /proc/sys/kernel/random/entropy_avail
If the value is low (below or around 200), you really need to install and configure an entropy generator such as haveged.
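The entropy check suggested above can also be scripted. The following is only a minimal sketch: the function names are made up for illustration, and the threshold of 200 is taken from the advice above rather than from any oVirt tooling.

```python
from pathlib import Path

# Linux-only path, the same file the `cat` command above reads.
ENTROPY_PATH = Path("/proc/sys/kernel/random/entropy_avail")
# "Below or around 200" per the advice above; an assumption, not an oVirt constant.
LOW_ENTROPY_THRESHOLD = 200


def is_entropy_low(value: int, threshold: int = LOW_ENTROPY_THRESHOLD) -> bool:
    """Return True when the available entropy is at or below the threshold."""
    return value <= threshold


def read_entropy(path: Path = ENTROPY_PATH) -> int:
    """Read the kernel's current entropy estimate as an integer."""
    return int(path.read_text().strip())


if __name__ == "__main__":
    if ENTROPY_PATH.exists():
        value = read_entropy()
        state = "low, consider haveged" if is_entropy_low(value) else "looks fine"
        print(f"entropy_avail={value} ({state})")
    else:
        print(f"{ENTROPY_PATH} not found (not a Linux host?)")
```

Run on the hosted-engine VM itself, it prints the current value and whether it falls under the threshold.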
Solution: I realized that Apache was the culprit and found that the proxy to the ovirt-engine in /etc/httpd/conf.d/z-ovirt-engine-proxy.conf has a super long timeout with many retries. I changed the settings and now everything works for me.
-> Before change:
<LocationMatch ^/(ovirt-engine($|/)|api($|/)|RHEVManagerWeb/|OvirtEngineWeb/|ca.crt$|engine.ssh.key.txt$|rhevm.ssh.key.txt$)>
    ProxyPassMatch ajp://127.0.0.1:8702 timeout=3600 retry=5

    <IfModule deflate_module>
        AddOutputFilterByType DEFLATE text/javascript text/css text/html text/xml text/json application/xml application/json application/x-yaml
    </IfModule>
</LocationMatch>
-> After change:
<LocationMatch ^/ovirt-engine($|/)>
    ProxyPassMatch ajp://127.0.0.1:8702 timeout=5 retry=2

    <IfModule deflate_module>
        AddOutputFilterByType DEFLATE text/javascript text/css text/html text/xml text/json application/xml application/json application/x-yaml
    </IfModule>
</LocationMatch>
This one is correct for 4.0; not sure why it was not updated during the upgrade from 3.6. @Simone?
Honestly it's
<LocationMatch ^/ovirt-engine($|/)>
    ProxyPassMatch ajp://127.0.0.1:8702 timeout=3600 retry=5

    <IfModule deflate_module>
        AddOutputFilterByType DEFLATE text/javascript text/css text/html text/xml text/json application/xml application/json application/x-yaml
    </IfModule>
</LocationMatch>
also on a fresh 4.0 engine from our latest engine-appliance.
Right, I missed the timeout/retry option changes. But the important part is why the old configuration (with the different LocationMatch) was not overwritten during the upgrade.
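To make the difference between the two LocationMatch patterns concrete, here is a small sketch that tests a few request paths against both. It uses Python's `re` module as a stand-in for Apache's regex engine, which should behave the same for patterns this simple. The sample paths and the helper name `proxied` are hypothetical.

```python
import re

# Patterns copied from the two <LocationMatch> lines quoted above.
OLD_PATTERN = re.compile(
    r"^/(ovirt-engine($|/)|api($|/)|RHEVManagerWeb/|OvirtEngineWeb/"
    r"|ca.crt$|engine.ssh.key.txt$|rhevm.ssh.key.txt$)"
)
NEW_PATTERN = re.compile(r"^/ovirt-engine($|/)")

# Hypothetical request paths, chosen only to exercise the alternatives.
SAMPLE_PATHS = [
    "/ovirt-engine",
    "/ovirt-engine/webadmin",
    "/api",
    "/api/vms",
    "/ca.crt",
    "/index.html",
]


def proxied(pattern: re.Pattern, path: str) -> bool:
    """True if the path would fall inside the LocationMatch block."""
    return pattern.search(path) is not None


for path in SAMPLE_PATHS:
    old = proxied(OLD_PATTERN, path)
    new = proxied(NEW_PATTERN, path)
    print(f"{path:25} old={old!s:5} new={new!s:5}")
```

Only paths under /ovirt-engine match the newer pattern, while the older one also catches /api, the certificate and the SSH key paths.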
If I read the timeout settings correctly, it will wait 60 minutes with 5 retries. 5 hours is way too long for my little server to hold onto all those apache processes.
The change I made allows for there to be an error, and also releases apache's hold on the process. Once everything is ready, apache is ready to serve requests and everything/everyone is happy. Before making the change, I just get a white screen in my browser and then nothing works until I restart Apache (or I end up in an endless loop of ovirt-ha services restarting my hosted-engine).
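A note on the two parameters: as far as I understand the Apache mod_proxy documentation, `timeout` is the number of seconds httpd waits for data from the backend, and `retry` is the number of seconds a worker that is in the error state is left alone before httpd tries it again, rather than a retry count. The sketch below pulls both values out of the ProxyPassMatch lines quoted above and prints them under that reading. The function names are illustrative only.

```python
def parse_worker_params(directive: str) -> dict[str, int]:
    """Extract key=value worker parameters from a ProxyPassMatch line."""
    params = {}
    for token in directive.split():
        if "=" in token:
            key, _, value = token.partition("=")
            params[key] = int(value)
    return params


def describe(directive: str) -> str:
    """Summarise timeout/retry, both interpreted as seconds."""
    p = parse_worker_params(directive)
    return (
        f"waits up to {p['timeout']} s for backend data; "
        f"an errored worker is retried after {p['retry']} s"
    )


# The two directive lines from the thread.
BEFORE = "ProxyPassMatch ajp://127.0.0.1:8702 timeout=3600 retry=5"
AFTER = "ProxyPassMatch ajp://127.0.0.1:8702 timeout=5 retry=2"

print("before:", describe(BEFORE))
print("after: ", describe(AFTER))
```

Under that reading, each request can still tie up an Apache process for up to an hour with the original setting, so the shorter timeout has the effect described above even though the 5-hour figure may overstate the wait.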
Well, if you have an issue with too many apache processes waiting for the engine to respond, then there's some issue in the engine. As I wrote above, please share the logs with us and check entropy.
Thanks
Martin Perina
I noticed that this setting reverts to the original setting, so oVirt must be writing this file. Perhaps these numbers can be changed in oVirt? If not, I will just set up an Ansible play to revert the settings to working values and restart apache on my engine. :-)
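The "revert the settings" step could be a small text rewrite along these lines. This is only a sketch of the idea, not the Ansible play mentioned above: `apply_worker_params` and `DESIRED` are hypothetical names, and the sample text stands in for the real /etc/httpd/conf.d/z-ovirt-engine-proxy.conf.

```python
import re

# Values from the "After change" block above; adjust as needed.
DESIRED = {"timeout": 5, "retry": 2}


def apply_worker_params(conf_text: str, desired: dict[str, int] = DESIRED) -> str:
    """Rewrite timeout=/retry= values on ProxyPassMatch lines only."""

    def fix_line(line: str) -> str:
        # Leave every line that is not a ProxyPassMatch directive untouched.
        if not line.lstrip().startswith("ProxyPassMatch"):
            return line
        for key, value in desired.items():
            line = re.sub(rf"\b{key}=\d+", f"{key}={value}", line)
        return line

    return "\n".join(fix_line(line) for line in conf_text.split("\n"))


# Stand-in for the contents of the proxy configuration file.
SAMPLE = """<LocationMatch ^/ovirt-engine($|/)>
    ProxyPassMatch ajp://127.0.0.1:8702 timeout=3600 retry=5
</LocationMatch>
"""

print(apply_worker_params(SAMPLE))
```

In practice the function would be applied to the file's contents and Apache restarted afterwards. Since the file reportedly reverts, the rewrite would have to be repeated whenever that happens.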
Cheers, Gervais
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users

On Thu, Sep 29, 2016 at 3:11 PM, Martin Perina <mperina@redhat.com> wrote:
Right, I missed the timeout/retry option changes. But the important part is why the old configuration (with the different LocationMatch) was not overwritten during the upgrade.
I suspect that it could have been overwritten a second time with its 3.6 value by our backup/restore procedure. Adding Didi here.
participants (4): Artyom Lukianov, Gervais de Montbrun, Martin Perina, Simone Tiraboschi