[ovirt-users] Adding another host to my cluster

Fri May 13 12:37:08 UTC 2016

Hi Charles,

I think the problem I am having is due to the setup failing and not
something in the vdsm configs, as I have never gotten this server to start up
properly and the bridge Ethernet interface and oVirt routes are not set up.
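
One way to confirm the missing bridge is to look at sysfs and the network-scripts directory. The sketch below is a minimal check, not part of oVirt. It assumes the management bridge is named ovirtmgmt (the name Wee mentions further down the thread) and that VDSM persists it as an ifcfg-<name> file. It relies on the Linux kernel exposing every interface under /sys/class/net/<name> and adding a "bridge" subdirectory only for bridge devices. The helper name check_bridge is hypothetical.

```python
import os
import sys


def check_bridge(name="ovirtmgmt",
                 sysfs_net="/sys/class/net",
                 scripts_dir="/etc/sysconfig/network-scripts"):
    """Report whether a bridge device and its ifcfg file exist.

    Assumption: the oVirt management bridge is called "ovirtmgmt" and
    is persisted as ifcfg-<name> in the network-scripts directory.
    """
    dev_path = os.path.join(sysfs_net, name)
    return {
        # The kernel creates /sys/class/net/<dev> for every interface...
        "device_exists": os.path.isdir(dev_path),
        # ...and a "bridge" subdirectory only for bridge devices.
        "is_bridge": os.path.isdir(os.path.join(dev_path, "bridge")),
        # Persisted configuration, if any (assumed ifcfg-based).
        "ifcfg_exists": os.path.isfile(
            os.path.join(scripts_dir, "ifcfg-" + name)),
    }


if __name__ == "__main__":
    bridge = sys.argv[1] if len(sys.argv) > 1 else "ovirtmgmt"
    for key, value in sorted(check_bridge(bridge).items()):
        print("%-15s %s" % (key, value))
```

On a host where the deploy never got as far as creating the bridge, all three values would be expected to come back False.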

I put the logs here:
https://www.dropbox.com/sh/5ugyykqh1lgru9l/AACXxRYWr3tgd0WbBVFW5twHa?dl=0

hosted-engine--deploy-logs.zip  # Logs from when I tried to deploy and it failed
vdsm.tar.gz                     # /var/log/vdsm
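
An archive like vdsm.tar.gz can be produced with Python's standard tarfile module. The following is a minimal sketch: the default directory list (VDSM logs plus the hosted-engine setup logs) and the output file name are assumptions to adjust, and bundle_logs is a hypothetical helper name.

```python
import os
import tarfile


def bundle_logs(output, directories=("/var/log/vdsm",
                                     "/var/log/ovirt-hosted-engine-setup")):
    """Create a gzip-compressed tar archive of the given log directories.

    Directories that do not exist are skipped. Returns the list of
    directories that were actually added.
    """
    added = []
    with tarfile.open(output, "w:gz") as tar:
        for directory in directories:
            if os.path.isdir(directory):
                # arcname keeps member paths short, e.g. "vdsm/vdsm.log"
                name = os.path.basename(directory.rstrip("/"))
                tar.add(directory, arcname=name)  # recursive by default
                added.append(directory)
    return added


if __name__ == "__main__":
    print(bundle_logs("vdsm-logs.tar.gz"))
```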

Output from running vdsm from the command line:

[root@cultivar2 log]# su -s /bin/bash vdsm
[vdsm@cultivar2 log]$ python /usr/share/vdsm/vdsm
(PID: 6521) I am the actual vdsm 4.17.26-1.el7 cultivar2.grove.silverorange.com (3.10.0-327.el7.x86_64)
VDSM will run with cpu affinity: frozenset([1])
/usr/bin/taskset --all-tasks --pid --cpu-list 1 6521 (cwd None)
SUCCESS: <err> = ''; <rc> = 0
Starting scheduler vdsm.Scheduler
started
Run and protect: registerDomainStateChangeCallback(callbackFunc=<functools.partial object at 0x381b158>)
Run and protect: registerDomainStateChangeCallback, Return response: None
Trying to connect to Super Vdsm
Preparing MOM interface
Using named unix socket /var/run/vdsm/mom-vdsm.sock
Unregistering all secrests
trying to connect libvirt
recovery: started
Setting channels' timeout to 30 seconds.
Starting VM channels listener thread.
Listening at 0.0.0.0:54321
Adding detector <rpc.bindingxmlrpc.XmlDetector instance at 0x3b4ecb0>
recovery: completed in 0s
Adding detector <yajsonrpc.stompreactor.StompDetector instance at 0x382e5a8>
Starting executor
Starting worker jsonrpc.Executor/0
Worker started
Starting worker jsonrpc.Executor/1
Worker started
Starting worker jsonrpc.Executor/2
Worker started
Starting worker jsonrpc.Executor/3
Worker started
Starting worker jsonrpc.Executor/4
Worker started
Starting worker jsonrpc.Executor/5
Worker started
Starting worker jsonrpc.Executor/6
Worker started
Starting worker jsonrpc.Executor/7
Worker started
XMLRPC server running
Starting executor
Starting worker periodic/0
Worker started
Starting worker periodic/1
Worker started
Starting worker periodic/2
Worker started
Starting worker periodic/3
Worker started
trying to connect libvirt
Panic: Connect to supervdsm service failed: [Errno 2] No such file or directory
Traceback (most recent call last):
  File "/usr/share/vdsm/supervdsm.py", line 78, in _connect
    utils.retry(self._manager.connect, Exception, timeout=60, tries=3)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 959, in retry
    return func()
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 500, in connect
    conn = Client(self._address, authkey=self._authkey)
  File "/usr/lib64/python2.7/multiprocessing/connection.py", line 173, in Client
    c = SocketClient(address)
  File "/usr/lib64/python2.7/multiprocessing/connection.py", line 308, in SocketClient
    s.connect(address)
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 2] No such file or directory
Killed
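
The traceback ends in a plain ENOENT from socket.connect(): VDSM is trying to reach supervdsm over a UNIX socket file that does not exist, which most likely means supervdsm is not running or failed at startup. The sketch below repeats that connection attempt outside VDSM so the result can be checked on its own. The socket path /var/run/vdsm/svdsm.sock is an assumption about where VDSM 4.17 places it and should be verified on the host; probe_unix_socket is a hypothetical helper.

```python
import errno
import os
import socket

# Assumption: location of the supervdsm socket for VDSM 4.17 on EL7.
# Check with "ls -l /var/run/vdsm/" on your host before trusting this.
DEFAULT_SVDSM_SOCKET = "/var/run/vdsm/svdsm.sock"


def probe_unix_socket(path=DEFAULT_SVDSM_SOCKET, timeout=2.0):
    """Try to connect to a UNIX stream socket at `path`.

    Returns 0 if the connect succeeds, otherwise the errno of the failure.
    errno.ENOENT matches the "[Errno 2] No such file or directory" in the
    traceback: the socket file itself is missing.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return 0
    except socket.timeout:
        # Timeouts carry no errno, so report ETIMEDOUT explicitly.
        return errno.ETIMEDOUT
    except socket.error as exc:
        return exc.errno
    finally:
        sock.close()


if __name__ == "__main__":
    code = probe_unix_socket()
    if code == 0:
        print("supervdsm socket accepts connections")
    else:
        print("connect failed: %s (%s)"
              % (errno.errorcode.get(code, code), os.strerror(code)))
```

If this reports ENOENT, looking at `systemctl status supervdsmd` is a sensible next step; the unit name is assumed from VDSM's EL7 packaging.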

Thanks for the help. It's really appreciated.

Cheers,
Gervais

On Fri, May 13, 2016 at 12:55 AM, Charles Tassell <ctassell at gmail.com> wrote:

> Hi Gervais,
>
>   Hmm, can you tar up the log files (/var/log/vdsm/* on the host you are
> installing on) and put them somewhere to look at?  Also, I found that
> starting VDSM from the command line is useful as it sometimes spits out
> error messages that don't show up in the logs.  I think the command I used
> was:
> su -s /bin/bash vdsm
> python /usr/share/vdsm/vdsm
>
> My problem was that I had customized the logging settings in
> /etc/vdsm/*.conf to try and tone down the debugging output, and I had
> introduced a syntax error.
>
>
> On 16-05-12 10:24 PM, Gervais de Montbrun wrote:
>
> Hi Charles,
>
> Thanks for the suggestion.
>
> I cleaned up again using the bash script from the
> recoving-from-failed-install link below, then reinstalled (yum install
> ovirt-hosted-engine-setup).
>
> I enabled NetworkManager and firewalld as you suggested. The install stops
> very early on with an error:
> [ ERROR ] Failed to execute stage 'Programs detection': hosted-engine cannot be deployed while NetworkManager is running, please stop and disable it before proceeding
>
> I disabled and stopped NetworkManager and tried again. Same result. :(
>
> Any more guesses?
>
> Cheers,
> Gervais
>
>
>
> On May 12, 2016, at 9:08 PM, Charles Tassell <ctassell at gmail.com> wrote:
>
> Hey Gervais,
>
> Try enabling NetworkManager and firewalld before doing the
> hosted-engine --deploy.  I have run into problems with oVirt trying to
> perform tasks on hosts where firewalld is disabled, so maybe you are running
> into a similar problem.  Also, I think the setup script will disable
> NetworkManager if it needs to.  I know I didn't manually disable it on any
> of the boxes I installed on.
>
> On 16-05-12 04:49 PM, users-request at ovirt.org wrote:
>
> Message: 1
> Date: Thu, 12 May 2016 14:22:12 -0300
> From: Gervais de Montbrun <gervais at demontbrun.com>
> To: Wee Sritippho <wee.s at forest.go.th>
> Cc: users <users at ovirt.org>
> Subject: Re: [ovirt-users] Adding another host to my cluster
> Message-ID: <28B7FC74-5C52-4F60-B9F3-39A36621A7CA at demontbrun.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi Wee
> (and others)
>
> Thanks for the reply. I tried what you suggested, but I am in the exact
> same state. :-(
>
> I don't want to completely remove my hosted engine setup, as it is working
> on the two other hosts in my cluster. I did not run the rm -rf steps listed
> here (https://www.ovirt.org/documentation/how-to/hosted-engine/#recoving-from-failed-install)
> that would wipe my hosted_engine NFS mount. If you know that this is 100%
> necessary, please let me know.
>
> I did:
> hosted-engine --clean-metadata --force-cleanup --host-id=3
> run the bash script to remove all of the ovirt packages and config files
> reinstalled ovirt-hosted-engine-setup
> ran "hosted-engine --deploy"
>
> I'm back exactly where I started. Is there a way to run just the network
> configuration part of the deploy?
>
> Since the last attempt, I did upgrade my hosted engine and my cluster is
> now running oVirt 3.6.5.
>
> Cheers,
> Gervais
>
>
>
> On May 12, 2016, at 11:50 AM, Wee Sritippho <wee.s at forest.go.th> wrote:
>
> Hi,
>
> I used to have a similar problem where one of my hosts couldn't be deployed
> due to the absence of the ovirtmgmt bridge. Simone said it's a bug
> (https://bugzilla.redhat.com/1323465) that would be fixed in 3.6.6.
>
> This is what I've done to solve it:
>
> 1. In the web UI, set the failed host to maintenance.
> 2. Remove it.
> 3. On that host, run the script from
> https://www.ovirt.org/documentation/how-to/hosted-engine/#recoving-from-failed-install
> 4. Install ovirt-hosted-engine-setup again.
> 5. Redeploy.
>
> Hope that helps
>
> On 11 May 2016 at 22:48:58 GMT+07:00, Gervais de Montbrun
> <gervais at demontbrun.com> wrote:
> Hi Folks,
>
> I hate to reply to my own message, but I'm really hoping someone can help
> me with my issue:
> http://lists.ovirt.org/pipermail/users/2016-May/039690.html
>
> Does anyone have a suggestion for me? If there is any more information
> that I can provide that would help you to help me, please advise.
>
> Cheers,
> Gervais
>
>
>
> On May 9, 2016, at 1:42 PM, Gervais de Montbrun <gervais at demontbrun.com> wrote:
>
> Hi All,
>
> I'm trying to add a third host to my oVirt cluster. I have hosted engine
> set up on the first two, but hosted-engine --deploy fails to finish on
> this third host. I wiped the server, did a CentOS 7 minimal install to get
> a clean machine, and ran it again.
>
> My setup:
> CentOS 7 clean install
> yum install -y http://resources.ovirt.org/pub/yum-repo/ovirt-release36.rpm
> yum install -y ovirt-hosted-engine-setup
> yum upgrade -y && reboot
> systemctl disable NetworkManager ; systemctl stop NetworkManager
> systemctl disable firewalld ; systemctl stop firewalld
> hosted-engine --deploy
>
> hosted-engine --deploy always throws an error:
> [ ERROR ] The VDSM host was found in a failed state. Please check engine and bootstrap installation logs.
> [ ERROR ] Unable to add Cultivar2 to the manager
> and then prints:
> [ INFO  ] Waiting for VDSM hardware info
> ...
> [ ERROR ] Failed to execute stage 'Closing up': VDSM did not start within 120 seconds
> [ INFO  ] Stage: Clean up
> [ INFO  ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20160509131103.conf'
> [ INFO  ] Stage: Pre-termination
> [ INFO  ] Stage: Termination
> [ ERROR ] Hosted Engine deployment failed: this system is not reliable, please check the issue, fix and redeploy
>     Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20160509130658-qb8ev0.log
>
> The full output of hosted-engine --deploy is included in the attached zip
> file. I've also included vdsm.log (it covers more than one attempt), as
> well as the ovirt-hosted-engine-setup-20160509130658-qb8ev0.log mentioned
> above.
>
> This is my "test" setup. Cultivar0 is my first host and my NFS server for
> storage. I have two hosts in the setup already and everything is working
> fine. The new host does show up in the oVirt admin, but with the status
> "Install Failed":
> <PastedGraphic-1.png>
>
> Trying to reinstall from within the interface just fails again.
>
> The oVirt bridge interface is not configured, and there are no config files
> in /etc/sysconfig/network-scripts related to oVirt.
>
> OS:
> [root@cultivar2 ovirt-hosted-engine-setup]# cat /etc/redhat-release
> CentOS Linux release 7.2.1511 (Core)
>
> [root@cultivar2 ovirt-hosted-engine-setup]# uname -a
> Linux cultivar2.grove.silverorange.com 3.10.0-327.13.1.el7.x86_64 #1 SMP Thu Mar 31 16:04:38 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
>
> Versions:
> [root@cultivar2 ovirt-hosted-engine-setup]# rpm -qa | grep -i ovirt
> libgovirt-0.3.3-1.el7_2.1.x86_64
> ovirt-hosted-engine-setup-1.3.5.0-1.1.el7.noarch
> ovirt-host-deploy-1.4.1-1.el7.centos.noarch
> ovirt-vmconsole-1.0.0-1.el7.centos.noarch
> ovirt-vmconsole-host-1.0.0-1.el7.centos.noarch
> ovirt-release36-007-1.noarch
> ovirt-engine-sdk-python-3.6.5.0-1.el7.centos.noarch
> ovirt-setup-lib-1.0.1-1.el7.centos.noarch
> ovirt-hosted-engine-ha-1.3.5.3-1.1.el7.noarch
>
> [root@cultivar2 ovirt-hosted-engine-setup]# rpm -qa | grep -i virt
> libvirt-daemon-driver-secret-1.2.17-13.el7_2.4.x86_64
> virt-viewer-2.0-6.el7.x86_64
> libgovirt-0.3.3-1.el7_2.1.x86_64
> libvirt-daemon-kvm-1.2.17-13.el7_2.4.x86_64
> ovirt-hosted-engine-setup-1.3.5.0-1.1.el7.noarch
> fence-virt-0.3.2-2.el7.x86_64
> virt-what-1.13-6.el7.x86_64
> libvirt-python-1.2.17-2.el7.x86_64
> libvirt-daemon-1.2.17-13.el7_2.4.x86_64
> libvirt-daemon-config-nwfilter-1.2.17-13.el7_2.4.x86_64
> libvirt-lock-sanlock-1.2.17-13.el7_2.4.x86_64
> libvirt-daemon-driver-nodedev-1.2.17-13.el7_2.4.x86_64
> libvirt-daemon-driver-network-1.2.17-13.el7_2.4.x86_64
> libvirt-daemon-driver-storage-1.2.17-13.el7_2.4.x86_64
> ovirt-host-deploy-1.4.1-1.el7.centos.noarch
> virt-v2v-1.28.1-1.55.el7.centos.2.x86_64
> ovirt-vmconsole-1.0.0-1.el7.centos.noarch
> ovirt-vmconsole-host-1.0.0-1.el7.centos.noarch
> libvirt-client-1.2.17-13.el7_2.4.x86_64
> libvirt-daemon-driver-nwfilter-1.2.17-13.el7_2.4.x86_64
> ovirt-release36-007-1.noarch
> libvirt-daemon-driver-interface-1.2.17-13.el7_2.4.x86_64
> libvirt-daemon-driver-qemu-1.2.17-13.el7_2.4.x86_64
> ovirt-engine-sdk-python-3.6.5.0-1.el7.centos.noarch
> ovirt-setup-lib-1.0.1-1.el7.centos.noarch
> ovirt-hosted-engine-ha-1.3.5.3-1.1.el7.noarch
>
> I also have a series of stuck tasks that I can't clear related to the host
> that can't be added... This is a secondary issue and I don't want to get
> off track, but they look like this:
> <PastedGraphic-2.png>
>
> I'd appreciate any help that can be offered.
>
> Cheers,
> Gervais
>
>
> Gervais de Montbrun
> Systems Administrator / silverorange Inc.
>
> Phone   +1 902 367 4532 ext. 104
> Mobile  +1 902 978 0009
>
> <hosted-engine--deploy-logs.zip>
>
>
>
> Users mailing list
> Users at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
> --
> Wee
>
>
>
>
>
>
>
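
Charles's point about a syntax error in /etc/vdsm/*conf can be checked mechanically. VDSM's configuration and logging files use INI-style sections, so a strict INI parse should catch most typos of that kind. This is a sketch, not VDSM's own validator: it uses Python 3's configparser with interpolation disabled, assumes the files are plain INI, and only reports parse errors (it does not know which keys VDSM expects). find_ini_errors is a hypothetical name.

```python
import configparser
import glob
import sys


def find_ini_errors(paths):
    """Parse each file as INI and return {path: error text} for failures.

    Files that parse cleanly are simply absent from the result.
    """
    errors = {}
    for path in paths:
        # interpolation=None so logging formats such as %(asctime)s are
        # treated as plain text rather than interpolation references.
        parser = configparser.ConfigParser(interpolation=None)
        try:
            with open(path) as handle:
                parser.read_file(handle, source=path)
        except (configparser.Error, OSError) as exc:
            errors[path] = str(exc)
    return errors


if __name__ == "__main__":
    # Default to the files Charles mentioned; pass explicit paths to override.
    files = sys.argv[1:] or sorted(glob.glob("/etc/vdsm/*.conf"))
    problems = find_ini_errors(files)
    for path in files:
        print("%s: %s" % (path, problems.get(path, "OK")))
    sys.exit(1 if problems else 0)
```

A line with no `=` or `:` separator, or settings placed before any `[section]` header, will show up as a parse error with the file name and line number in the message.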