Hi all,
I was working through the installation of ovirt-engine today (after
spending more time than I care to admit struggling with networking and DNS
issues: VPNs, dnsmasq, "classic" network start-up and iptables/firewall
rules can interact with each other in strange and surprising ways).
Anyway, I went through the engine set-up successfully and got the
expected message at the end, "**** Installation completed successfully
******", along with a message telling me to visit the engine web
application to finish set-up.
Unfortunately, when I connected to the server (after resolving the
networking issues), I got a 503 "Service temporarily unavailable" error
from Apache.
In httpd's error.log, I have:
[Fri Sep 21 13:37:03 2012] [error] (111)Connection refused: proxy: AJP: attempt to connect to 127.0.0.1:8009 (localhost) failed
[Fri Sep 21 13:37:03 2012] [error] ap_proxy_connect_backend disabling worker for (localhost)
[Fri Sep 21 13:37:03 2012] [error] proxy: AJP: failed to make connection to backend: localhost
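Error 111 is ECONNREFUSED on Linux, which suggests that nothing was listening on the AJP port httpd proxies to. Below is a minimal Python sketch for confirming that from the engine host. `port_is_listening` is a hypothetical helper, not part of oVirt. It only tests whether a TCP connection succeeds and does not check that the listener actually speaks AJP.

```python
import socket


def port_is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # Raises ConnectionRefusedError (errno 111 on Linux) when nothing is
        # listening, or another OSError on timeout or an unreachable host.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 127.0.0.1:8009 is the AJP backend named in the httpd error log.
    if port_is_listening("127.0.0.1", 8009):
        print("127.0.0.1:8009 accepts connections")
    else:
        print("Nothing accepts connections on 127.0.0.1:8009; "
              "the engine is probably not running")
```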
When I try to restart the ovirt-engine service, I get the following in
journalctl:
Sep 21 13:34:44 clare.neary.home engine-service.py[5172]: The engine PID file "/var/run/ovirt-engine.pid" already exists.
Sep 21 13:34:44 clare.neary.home systemd[1]: PID 1264 read from file /var/run/ovirt-engine.pid does not exist.
Sep 21 13:34:44 clare.neary.home systemd[1]: Unit ovirt-engine.service entered failed state.
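Taken together, these two journal lines point to a stale PID file. engine-service.py refuses to start because the file exists, while systemd finds no process with the PID stored in it. Below is a minimal Python sketch for checking that. It assumes the file holds only a decimal PID. `read_pid` and `pid_is_alive` are hypothetical helpers, and the sketch only reports the state without deleting anything.

```python
import os
from typing import Optional

PID_FILE = "/var/run/ovirt-engine.pid"  # path from the journal messages


def read_pid(path: str) -> Optional[int]:
    """Return the PID in path, or None if missing, unreadable or not a number."""
    # Assumption: the file contains only a decimal PID.
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def pid_is_alive(pid: int) -> bool:
    """Check whether a process with this PID exists."""
    if pid <= 0:
        # 0 and negative values address process groups, not one process.
        return False
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check only
    except ProcessLookupError:
        return False  # no such process: the PID file is stale
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


if __name__ == "__main__":
    pid = read_pid(PID_FILE)
    if pid is None:
        print(f"{PID_FILE} is missing or does not contain a PID")
    elif pid_is_alive(pid):
        print(f"PID {pid} from {PID_FILE} is running")
    else:
        print(f"PID {pid} from {PID_FILE} does not exist; the file looks stale")
```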
I tried to clean up and restart, but engine-cleanup failed:
[root@clare ovirt-engine]# engine-cleanup -u
Stopping JBoss service... [ DONE ]
Error: Couldn't connect to the database server.Check that connection is working and rerun the cleanup utility
Error: Cleanup failed.
please check log at /var/log/ovirt-engine/engine-cleanup_2012_09_21_14_02_37.log
It turns out that /var/log/messages contains these errors:
Sep 21 14:00:59 clare pg_ctl[5298]: FATAL: could not create shared memory segment: Invalid argument
Sep 21 14:00:59 clare pg_ctl[5298]: DETAIL: Failed system call was shmget(key=5432001, size=36519936, 03600).
Sep 21 14:00:59 clare pg_ctl[5298]: HINT: This error usually means that PostgreSQL's request for a shared memory segment exceeded your kernel's SHMMAX parameter. You can either reduce the request size or reconfigure the kernel with larger SHMMAX. To reduce the request size (currently 36519936 bytes), reduce PostgreSQL's shared memory usage, perhaps by reducing shared_buffers or max_connections.
Sep 21 14:00:59 clare pg_ctl[5298]: If the request size is already small, it's possible that it is less than your kernel's SHMMIN parameter, in which case raising the request size or reconfiguring SHMMIN is called for.
Sep 21 14:00:59 clare pg_ctl[5298]: The PostgreSQL documentation contains more information about shared memory configuration.
Sep 21 14:01:03 clare pg_ctl[5298]: pg_ctl: could not start server
Sep 21 14:01:03 clare pg_ctl[5298]: Examine the log output.
Sep 21 14:01:03 clare systemd[1]: postgresql.service: control process exited, code=exited status=1
Sep 21 14:01:03 clare systemd[1]: Unit postgresql.service entered failed state.
I increased the kernel's SHMMAX, and engine-cleanup worked correctly.
Has anyone else experienced this issue?
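The shmget() failure matches the HINT: the 36519936-byte request is larger than the kernel's SHMMAX. For anyone hitting the same thing, below is a minimal Python sketch of that comparison. It assumes a Linux host with /proc/sys/kernel/shmmax. The helper names are hypothetical, and rounding up to a whole MiB is an arbitrary margin, not a PostgreSQL recommendation. The printed line uses sysctl.conf syntax for the kernel.shmmax key.

```python
SHMMAX_PATH = "/proc/sys/kernel/shmmax"
REQUESTED_BYTES = 36519936  # size from the failed shmget() call in the log
MIB = 1024 * 1024


def read_shmmax(path: str = SHMMAX_PATH) -> int:
    """Read the current SHMMAX (in bytes) from procfs."""
    with open(path) as f:
        return int(f.read().split()[0])


def request_fits(requested: int, shmmax: int) -> bool:
    """shmget() fails with EINVAL when the size exceeds SHMMAX."""
    return requested <= shmmax


def round_up_to_mib(n: int) -> int:
    """Round n bytes up to a whole number of MiB (an arbitrary margin)."""
    return -(-n // MIB) * MIB


def sysctl_line(shmmax: int) -> str:
    """Format the setting in sysctl.conf syntax."""
    return f"kernel.shmmax = {shmmax}"


if __name__ == "__main__":
    try:
        current = read_shmmax()
    except OSError:
        print(f"Cannot read {SHMMAX_PATH}; is this a Linux host?")
    else:
        if request_fits(REQUESTED_BYTES, current):
            print(f"SHMMAX {current} is large enough for {REQUESTED_BYTES} bytes")
        else:
            print(f"SHMMAX {current} is too small; a candidate setting is:")
            print(sysctl_line(round_up_to_mib(REQUESTED_BYTES)))
```

On many kernels of that era, SHMMAX defaulted to 32 MiB (33554432 bytes), which is below the 36519936 bytes requested here.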
When I re-ran engine-setup, I also got stuck reconfiguring NFS. When
engine-setup asked whether I wanted to configure the NFS domain, I said
"yes", but it then refused to accept my input of "/mnt/iso" because that
path was already in /etc/exports. Perhaps engine-cleanup should also
remove ISO shares managed by ovirt-engine, or engine-setup should handle
an existing export more gracefully? The only fix I found was to
interrupt and restart the engine set-up.
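For what it's worth, this is the kind of check that could run up front. Below is a minimal Python sketch that lists the directories in /etc/exports and tests whether a path is already among them. The helper names are hypothetical and are not engine-setup's own code. The parser ignores quoted paths containing spaces and backslash line continuations, even though exports(5) allows both.

```python
from typing import List

EXPORTS_PATH = "/etc/exports"


def exported_paths(exports_text: str) -> List[str]:
    """Return the exported directory of each non-comment line."""
    paths = []
    for raw in exports_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # '#' starts a comment
        if line:
            # The first whitespace-separated field is the directory.
            paths.append(line.split()[0])
    return paths


def _normalise(path: str) -> str:
    # Treat "/mnt/iso" and "/mnt/iso/" as the same directory.
    return path.rstrip("/") or "/"


def already_exported(path: str, exports_text: str) -> bool:
    """True if path (ignoring a trailing slash) is already exported."""
    want = _normalise(path)
    return any(_normalise(p) == want for p in exported_paths(exports_text))


if __name__ == "__main__":
    try:
        with open(EXPORTS_PATH) as f:
            text = f.read()
    except OSError:
        text = ""
    print("/mnt/iso already exported:", already_exported("/mnt/iso", text))
```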
Also, I have no idea whether allowing oVirt to manage iptables will
preserve the extra rules I added to the iptables config (specifically
for DNS on UDP port 53), so I didn't risk letting it reconfigure
iptables the second time.
After all that, I got an error when starting the JBoss service:
Starting JBoss Service... [ ERROR ]
Error: Can't start the ovirt-engine service
Please check log file /var/log/ovirt-engine/engine-setup_2012_09_21_14_28_11.log for more information
And when I checked that log file:
2012-09-21 14:30:02::DEBUG::common_utils::790::root:: starting ovirt-engine
2012-09-21 14:30:02::DEBUG::common_utils::835::root:: executing action ovirt-engine on service start
2012-09-21 14:30:02::DEBUG::common_utils::309::root:: Executing command --> '/sbin/service ovirt-engine start'
2012-09-21 14:30:02::DEBUG::common_utils::335::root:: output =
2012-09-21 14:30:02::DEBUG::common_utils::336::root:: stderr = Redirecting to /bin/systemctl start ovirt-engine.service
Job failed. See system journal and 'systemctl status' for details.
2012-09-21 14:30:02::DEBUG::common_utils::337::root:: retcode = 1
2012-09-21 14:30:02::DEBUG::setup_sequences::62::root:: Traceback (most recent call last):
  File "/usr/share/ovirt-engine/scripts/setup_sequences.py", line 60, in run
    function()
  File "/bin/engine-setup", line 1535, in _startJboss
    srv.start(True)
  File "/usr/share/ovirt-engine/scripts/common_utils.py", line 795, in start
    raise Exception(output_messages.ERR_FAILED_START_SERVICE % self.name)
Exception: Error: Can't start the ovirt-engine service
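When the setup log gets long, it can help to pull out only the commands that returned non-zero. Below is a minimal Python sketch. It assumes each record starts with a timestamp followed by `::`-separated level, module, line-number and logger fields, as in the excerpt above, and that continuation lines (multi-line stderr, tracebacks) belong to the preceding record. The helpers are hypothetical and not part of oVirt.

```python
import re
import sys
from typing import Iterable, List, NamedTuple, Optional, Tuple

# A new record starts with "YYYY-MM-DD HH:MM:SS::".
RECORD_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}::")


class Record(NamedTuple):
    timestamp: str
    level: str
    module: str
    lineno: int
    logger: str
    message: str


def parse_records(lines: Iterable[str]) -> List[Record]:
    """Group raw log lines into records, folding in continuation lines."""
    records: List[Record] = []
    for raw in lines:
        line = raw.rstrip("\n")
        parts = line.split("::", 5)
        if RECORD_START.match(line) and len(parts) == 6 and parts[3].isdigit():
            ts, level, module, lineno, logger, msg = parts
            records.append(Record(ts, level, module, int(lineno), logger, msg.strip()))
        elif records:
            # Continuation line (e.g. multi-line stderr or a traceback).
            prev = records[-1]
            records[-1] = prev._replace(message=prev.message + "\n" + line)
    return records


def failed_commands(records: List[Record]) -> List[Tuple[str, int]]:
    """Pair each 'Executing command -->' record with a non-zero retcode."""
    failures: List[Tuple[str, int]] = []
    command: Optional[str] = None
    for rec in records:
        first_line = rec.message.split("\n", 1)[0]
        if first_line.startswith("Executing command -->"):
            command = first_line.split("-->", 1)[1].strip()
        elif first_line.startswith("retcode =") and command is not None:
            code = int(first_line.split("=", 1)[1])
            if code != 0:
                failures.append((command, code))
            command = None
    return failures


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. /var/log/ovirt-engine/engine-setup_2012_09_21_14_28_11.log
    with open(sys.argv[1]) as f:
        for cmd, code in failed_commands(parse_records(f)):
            print(f"{cmd} exited with {code}")
```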
And when I check the system journal, I'm back to the earlier problem:
the service starts, but the PID recorded in the PID file does not exist.
Any pointers on how I might debug this? I haven't found anything
similar on a troubleshooting page, so perhaps it's not a common error?
Cheers,
Dave.
--
Dave Neary
Community Action and Impact
Open Source and Standards, Red Hat
Ph: +33 9 50 71 55 62 / Cell: +33 6 77 01 92 13