[Users] Installation problem

Dave Neary dneary at redhat.com
Fri Sep 21 12:38:19 UTC 2012


Hi all,

I was working through the installation of ovirt-engine today (after 
spending more time than I care to admit struggling with networking & DNS 
issues - VPNs, dnsmasq, "classic" network start-up and iptables/firewall 
rules can interact with each other in strange and surprising ways).

Anyway - I went through the engine set-up successfully, and got the 
expected message at the end: "**** Installation completed successfully 
******", along with a prompt to visit the engine web application to 
finish set-up.

Unfortunately, when I connected (after resolving networking issues) to 
the server in question, I got a "Service temporarily unavailable" error 
(503) from Apache.

In httpd's error_log, I have:
>  [Fri Sep 21 13:37:03 2012] [error] (111)Connection refused: proxy: AJP: attempt to connect to 127.0.0.1:8009 (localhost) failed
>  [Fri Sep 21 13:37:03 2012] [error] ap_proxy_connect_backend disabling worker for (localhost)
>  [Fri Sep 21 13:37:03 2012] [error] proxy: AJP: failed to make connection to backend: localhost
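
So Apache is just proxying to the AJP connector on 127.0.0.1:8009, and 
nothing is answering there. For anyone hitting the same 503, a quick 
sanity check:

    # is anything listening on the AJP port that httpd proxies to?
    netstat -tlnp | grep :8009
    # and what does systemd think of the engine service?
    systemctl status ovirt-engine.service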



When I try to restart the ovirt-engine service, I get the following in 
journalctl:
>  Sep 21 13:34:44 clare.neary.home engine-service.py[5172]: The engine PID file "/var/run/ovirt-engine.pid" already exists.
>  Sep 21 13:34:44 clare.neary.home systemd[1]: PID 1264 read from file /var/run/ovirt-engine.pid does not exist.
>  Sep 21 13:34:44 clare.neary.home systemd[1]: Unit ovirt-engine.service entered failed state.
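
The stale PID file seems to be what blocks the restart, so presumably 
clearing it by hand gets past this particular error (though not 
whatever killed the engine in the first place):

    # remove the stale PID file left behind by the dead engine process
    rm -f /var/run/ovirt-engine.pid
    systemctl restart ovirt-engine.service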



I tried to clean up and restart, but engine-cleanup failed:
> [root@clare ovirt-engine]# engine-cleanup -u
>
> Stopping JBoss service...                                [ DONE ]
>
> Error: Couldn't connect to the database server. Check that connection is working and rerun the cleanup utility
> Error: Cleanup failed.
> please check log at /var/log/ovirt-engine/engine-cleanup_2012_09_21_14_02_37.log
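
That pointed at the database itself, so the obvious next step was to 
poke at PostgreSQL directly:

    # is the database up at all?
    systemctl status postgresql.service
    systemctl restart postgresql.service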



It turns out, in /var/log/messages, that I have these error messages:
> Sep 21 14:00:59 clare pg_ctl[5298]: FATAL:  could not create shared memory segment: Invalid argument
> Sep 21 14:00:59 clare pg_ctl[5298]: DETAIL:  Failed system call was shmget(key=5432001, size=36519936, 03600).
> Sep 21 14:00:59 clare pg_ctl[5298]: HINT:  This error usually means that PostgreSQL's request for a shared memory segment exceeded your kernel's SHMMAX parameter.  You can either reduce the request size or reconfigure the kernel with larger SHMMAX.  To reduce the request size (currently 36519936 bytes), reduce PostgreSQL's shared memory usage, perhaps by reducing shared_buffers or max_connections.
> Sep 21 14:00:59 clare pg_ctl[5298]: If the request size is already small, it's possible that it is less than your kernel's SHMMIN parameter, in which case raising the request size or reconfiguring SHMMIN is called for.
> Sep 21 14:00:59 clare pg_ctl[5298]: The PostgreSQL documentation contains more information about shared memory configuration.
> Sep 21 14:01:03 clare pg_ctl[5298]: pg_ctl: could not start server
> Sep 21 14:01:03 clare pg_ctl[5298]: Examine the log output.
> Sep 21 14:01:03 clare systemd[1]: postgresql.service: control process exited, code=exited status=1
> Sep 21 14:01:03 clare systemd[1]: Unit postgresql.service entered failed state.

I increased the kernel's SHMMAX, and engine-cleanup worked correctly.
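
For reference, the change was along these lines - the exact value 
matters less than being comfortably above the 36519936 bytes PostgreSQL 
asked for (256 MB here is just an example):

    # check the current limit
    sysctl kernel.shmmax
    # raise it for the running kernel
    sysctl -w kernel.shmmax=268435456
    # persist the setting across reboots
    echo "kernel.shmmax = 268435456" >> /etc/sysctl.conf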

Has anyone else experienced this issue?


When I re-ran engine-setup, I also got stuck when reconfiguring NFS - 
when engine-setup asked me if I wanted to configure the NFS domain, I 
said "yes", but then it refused to accept my input of "/mnt/iso" since 
it was already in /etc/exports - perhaps engine-cleanup should also 
remove ISO shares managed by ovirt-engine, or else handle it more 
gracefully when someone enters an existing export? The only fix I found 
was to interrupt and restart the engine set-up.
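
In hindsight, deleting the stale export by hand before re-running 
engine-setup would presumably have worked too, something like:

    # drop the leftover /mnt/iso line that engine-cleanup left behind
    sed -i '\|^/mnt/iso|d' /etc/exports
    # re-sync the kernel's export table with the edited file
    exportfs -ra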

Also, I have no idea whether allowing oVirt to manage iptables will 
preserve the extra rules I added to the iptables config (specifically 
for DNS service on UDP port 53). I didn't take the risk of allowing it 
to reconfigure iptables the second time.
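
To be on the safe side, I'd snapshot the live ruleset before letting 
engine-setup touch it, and re-add the DNS rule afterwards if it gets 
dropped:

    # back up the current rules before engine-setup rewrites them
    iptables-save > /root/iptables.backup
    # re-add the DNS rule if it disappears, then persist it
    iptables -I INPUT -p udp --dport 53 -j ACCEPT
    iptables-save > /etc/sysconfig/iptables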

After all that, I got an error when starting the JBoss service:

> Starting JBoss Service...                             [ ERROR ]
> Error: Can't start the ovirt-engine service
> Please check log file /var/log/ovirt-engine/engine-setup_2012_09_21_14_28_11.log for more information

And when I checked that log file:
> 2012-09-21 14:30:02::DEBUG::common_utils::790::root:: starting ovirt-engine
> 2012-09-21 14:30:02::DEBUG::common_utils::835::root:: executing action ovirt-engine on service start
> 2012-09-21 14:30:02::DEBUG::common_utils::309::root:: Executing command --> '/sbin/service ovirt-engine start'
> 2012-09-21 14:30:02::DEBUG::common_utils::335::root:: output =
> 2012-09-21 14:30:02::DEBUG::common_utils::336::root:: stderr = Redirecting to /bin/systemctl start  ovirt-engine.service
> Job failed. See system journal and 'systemctl status' for details.
>
> 2012-09-21 14:30:02::DEBUG::common_utils::337::root:: retcode = 1
> 2012-09-21 14:30:02::DEBUG::setup_sequences::62::root:: Traceback (most recent call last):
>   File "/usr/share/ovirt-engine/scripts/setup_sequences.py", line 60, in run
>     function()
>   File "/bin/engine-setup", line 1535, in _startJboss
>     srv.start(True)
>   File "/usr/share/ovirt-engine/scripts/common_utils.py", line 795, in start
>     raise Exception(output_messages.ERR_FAILED_START_SERVICE % self.name)
> Exception: Error: Can't start the ovirt-engine service

And when I check the system journal, we're back to the same symptom as 
before: the service starts, but the PID recorded in the PID file does 
not exist.
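
My next step will be to dig through the engine's own logs, since the 
journal snippet above doesn't say why the process dies:

    # systemd's view of the failed unit
    systemctl status ovirt-engine.service
    # find the most recently written log under the engine's log directory
    ls -lt /var/log/ovirt-engine/ | head
    # server.log is a guess at the application server's log name here;
    # substitute whatever file is newest
    tail -n 50 /var/log/ovirt-engine/server.log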

Any pointers on how I might debug this issue? I haven't found anything 
similar on a troubleshooting page, so perhaps it's not a common error?

Cheers,
Dave.




-- 
Dave Neary
Community Action and Impact
Open Source and Standards, Red Hat
Ph: +33 9 50 71 55 62 / Cell: +33 6 77 01 92 13


