[ovirt-users] Adding another host to my cluster
Charles Tassell
ctassell at gmail.com
Fri May 13 12:53:19 UTC 2016
Hi Gervais,
Okay, I see two problems: there are some leftover directories causing
issues, and for some reason VDSM seems to be trying to bind to a port
that something else is already listening on (probably an older instance
of VDSM).
First, remove the leftover directories: rmdir
/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286 and
/rhev/data-center/mnt. If they aren't empty, don't rm -rf them, because
they might be mounted from your production servers; just mv -i them to
/root or somewhere.
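For example, something like this (just a sketch -- check the mount
table first so you don't move live NFS data):

  # make sure nothing under these paths is still mounted
  mount | grep -e /var/run/vdsm/storage -e /rhev/data-center/mnt
  # empty leftovers can simply be removed
  rmdir /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286
  # anything non-empty gets parked in /root instead of deleted
  mv -i /rhev/data-center/mnt /root/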
Next, shut down the vdsm service ("systemctl stop vdsmd" -- I think
that's the unit name on CentOS 7, but I don't use CentOS much, so
double-check) and kill any running vdsm processes (ps ax | grep vdsm).
The error that I saw was:
MainThread::ERROR::2016-05-13 08:58:38,262::clientIF::128::vds::(__init__) failed to init clientIF, shutting down storage dispatcher
MainThread::ERROR::2016-05-13 08:58:38,289::vdsm::171::vds::(run) Exception raised
Traceback (most recent call last):
  File "/usr/share/vdsm/vdsm", line 169, in run
    serve_clients(log)
  File "/usr/share/vdsm/vdsm", line 102, in serve_clients
    cif = clientIF.getInstance(irs, log, scheduler)
  File "/usr/share/vdsm/clientIF.py", line 193, in getInstance
    cls._instance = clientIF(irs, log, scheduler)
  File "/usr/share/vdsm/clientIF.py", line 123, in __init__
    self._createAcceptor(host, port)
  File "/usr/share/vdsm/clientIF.py", line 201, in _createAcceptor
    port, sslctx)
  File "/usr/share/vdsm/protocoldetector.py", line 170, in __init__
    sock = _create_socket(host, port)
  File "/usr/share/vdsm/protocoldetector.py", line 40, in _create_socket
    server_socket.bind(addr[0][4])
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 98] Address already in use
If you get the same error, run netstat -lnp and compare the output to
the same command on a working box to see if something else is sitting
on the VDSM port.
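For instance (assuming the default VDSM port of 54321, which is the
one your log shows it listening on):

  # run this on the broken host and on a working host, then compare
  netstat -lnp | grep 54321
  # whatever non-vdsm process holds the port on the broken host is
  # your culprit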
On 2016-05-13 09:37 AM, Gervais de Montbrun wrote:
> Hi Charles,
>
> I think the problem I am having is due to the setup failing, and not
> something in the vdsm configs, as I have never gotten this server to
> start up properly and the bridge ethernet interface + ovirt routes are
> not set up.
>
> I put the logs here:
> https://www.dropbox.com/sh/5ugyykqh1lgru9l/AACXxRYWr3tgd0WbBVFW5twHa?dl=0
>
> hosted-engine--deploy-logs.zip  # Logs from when I tried to deploy and it failed
> vdsm.tar.gz  # /var/log/vdsm
>
> Output from running vdsm from the command line:
>
> [root@cultivar2 log]# su -s /bin/bash vdsm
> [vdsm@cultivar2 log]$ python /usr/share/vdsm/vdsm
> (PID: 6521) I am the actual vdsm 4.17.26-1.el7
> cultivar2.grove.silverorange.com (3.10.0-327.el7.x86_64)
> VDSM will run with cpu affinity: frozenset([1])
> /usr/bin/taskset --all-tasks --pid --cpu-list 1 6521 (cwd None)
> SUCCESS: <err> = ''; <rc> = 0
> Starting scheduler vdsm.Scheduler
> started
> Run and protect: registerDomainStateChangeCallback(callbackFunc=<functools.partial object at 0x381b158>)
> Run and protect: registerDomainStateChangeCallback, Return response: None
> Trying to connect to Super Vdsm
> Preparing MOM interface
> Using named unix socket /var/run/vdsm/mom-vdsm.sock
> Unregistering all secrets
> trying to connect libvirt
> recovery: started
> Setting channels' timeout to 30 seconds.
> Starting VM channels listener thread.
> Listening at 0.0.0.0:54321
> Adding detector <rpc.bindingxmlrpc.XmlDetector instance at 0x3b4ecb0>
> recovery: completed in 0s
> Adding detector <yajsonrpc.stompreactor.StompDetector instance at 0x382e5a8>
> Starting executor
> Starting worker jsonrpc.Executor/0
> Worker started
> Starting worker jsonrpc.Executor/1
> Worker started
> Starting worker jsonrpc.Executor/2
> Worker started
> Starting worker jsonrpc.Executor/3
> Worker started
> Starting worker jsonrpc.Executor/4
> Worker started
> Starting worker jsonrpc.Executor/5
> Worker started
> Starting worker jsonrpc.Executor/6
> Worker started
> Starting worker jsonrpc.Executor/7
> Worker started
> XMLRPC server running
> Starting executor
> Starting worker periodic/0
> Worker started
> Starting worker periodic/1
> Worker started
> Starting worker periodic/2
> Worker started
> Starting worker periodic/3
> Worker started
> trying to connect libvirt
> Panic: Connect to supervdsm service failed: [Errno 2] No such file or directory
> Traceback (most recent call last):
>   File "/usr/share/vdsm/supervdsm.py", line 78, in _connect
>     utils.retry(self._manager.connect, Exception, timeout=60, tries=3)
>   File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 959, in retry
>     return func()
>   File "/usr/lib64/python2.7/multiprocessing/managers.py", line 500, in connect
>     conn = Client(self._address, authkey=self._authkey)
>   File "/usr/lib64/python2.7/multiprocessing/connection.py", line 173, in Client
>     c = SocketClient(address)
>   File "/usr/lib64/python2.7/multiprocessing/connection.py", line 308, in SocketClient
>     s.connect(address)
>   File "/usr/lib64/python2.7/socket.py", line 224, in meth
>     return getattr(self._sock,name)(*args)
> error: [Errno 2] No such file or directory
> Killed
>
>
> Thanks for the help. It's really appreciated.
>
> Cheers,
> Gervais
>
> On Fri, May 13, 2016 at 12:55 AM, Charles Tassell <ctassell at gmail.com> wrote:
>
> Hi Gervais,
>
> Hmm, can you tar up the logfiles (/var/log/vdsm/* on the host
> you are installing on) and put them somewhere I can look at them?
> Also, I found that starting VDSM from the command line is useful, as
> it sometimes spits out error messages that don't show up in the
> logs. I think the command I used was:
> su -s /bin/bash vdsm
> python /usr/share/vdsm/vdsm
>
> My problem was that I had customized the logging settings in
> /etc/vdsm/*conf to try to tone down the debugging output and had
> introduced a syntax error.
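> A quick way to check for that kind of mistake (a rough sketch -- this
> assumes the logging settings live in /etc/vdsm/logger.conf; adjust the
> path to whichever file you edited) is to feed the file to Python's
> config loader and see if it raises:
>
>   # raises an exception if the logging config file is malformed
>   python -c "import logging.config; logging.config.fileConfig('/etc/vdsm/logger.conf')"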
>
>
> On 16-05-12 10:24 PM, Gervais de Montbrun wrote:
>> Hi Charles,
>>
>> Thanks for the suggestion.
>>
>> I cleaned up again using the bash script from the
>> recoving-from-failed-install link below, then reinstalled (yum
>> install ovirt-hosted-engine-setup).
>>
>> I enabled NetworkManager and firewalld as you suggested. The
>> install stops very early on with an error:
>> [ ERROR ] Failed to execute stage 'Programs detection':
>> hosted-engine cannot be deployed while NetworkManager is running,
>> please stop and disable it before proceeding
>>
>> I disabled and stopped NetworkManager and tried again. Same
>> result. :(
>>
>> Any more guesses?
>>
>> Cheers,
>> Gervais
>>
>>
>>
>>> On May 12, 2016, at 9:08 PM, Charles Tassell <ctassell at gmail.com> wrote:
>>>
>>> Hey Gervais,
>>>
>>> Try enabling NetworkManager and firewalld before doing the
>>> hosted-engine --deploy. I have run into problems with oVirt
>>> trying to perform tasks on hosts where firewalld is disabled, so
>>> maybe you are running into a similar problem. Also, I think the
>>> setup script will disable NetworkManager if it needs to. I know
>>> I didn't manually disable it on any of the boxes I installed on.
>>>
>>> On 16-05-12 04:49 PM, users-request at ovirt.org wrote:
>>>> Message: 1
>>>> Date: Thu, 12 May 2016 14:22:12 -0300
>>>> From: Gervais de Montbrun <gervais at demontbrun.com>
>>>> To: Wee Sritippho <wee.s at forest.go.th>
>>>> Cc: users <users at ovirt.org>
>>>> Subject: Re: [ovirt-users] Adding another host to my cluster
>>>> Message-ID: <28B7FC74-5C52-4F60-B9F3-39A36621A7CA at demontbrun.com>
>>>> Content-Type: text/plain; charset="utf-8"
>>>>
>>>> Hi Wee
>>>> (and others)
>>>>
>>>> Thanks for the reply. I tried what you suggested, but I am in
>>>> the exact same state. :-(
>>>>
>>>> I don't want to completely remove my hosted engine setup, as it
>>>> is working on the two other hosts in my cluster. I did not run
>>>> the rm -rf steps listed here
>>>> (https://www.ovirt.org/documentation/how-to/hosted-engine/#recoving-from-failed-install)
>>>> that would wipe my hosted_engine nfs mount. If you know that
>>>> this is 100% necessary, please let me know.
>>>>
>>>> I did:
>>>> 1. ran "hosted-engine --clean-metadata --force-cleanup --host-id=3"
>>>> 2. ran the bash script to remove all of the ovirt packages and
>>>> config files
>>>> 3. reinstalled ovirt-hosted-engine-setup
>>>> 4. ran "hosted-engine --deploy"
>>>>
>>>> I'm back exactly where I started. Is there a way to run just
>>>> the network configuration part of the deploy?
>>>>
>>>> Since the last attempt, I did upgrade my hosted engine and my
>>>> cluster is now running oVirt 3.6.5.
>>>>
>>>> Cheers,
>>>> Gervais
>>>>
>>>>
>>>>
>>>>> On May 12, 2016, at 11:50 AM, Wee Sritippho
>>>>> <wee.s at forest.go.th> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I used to have a similar problem where one of my hosts couldn't
>>>>> be deployed due to the absence of the ovirtmgmt bridge. Simone
>>>>> said it's a bug (https://bugzilla.redhat.com/1323465) which should
>>>>> be fixed in 3.6.6.
>>>>>
>>>>> This is what I've done to solve it:
>>>>>
>>>>> 1. In the web UI, set the failed host to maintenance.
>>>>> 2. Remove it.
>>>>> 3. On that host, run the script from
>>>>> https://www.ovirt.org/documentation/how-to/hosted-engine/#recoving-from-failed-install
>>>>> 4. Install ovirt-hosted-engine-setup again.
>>>>> 5. Redeploy again.
>>>>>
>>>>> Hope that helps
>>>>>
>>>>> On May 11, 2016, 22:48:58 GMT+07:00, Gervais de Montbrun
>>>>> <gervais at demontbrun.com> wrote:
>>>>> Hi Folks,
>>>>>
>>>>> I hate to reply to my own message, but I'm really hoping
>>>>> someone can help me with my issue:
>>>>> http://lists.ovirt.org/pipermail/users/2016-May/039690.html
>>>>>
>>>>> Does anyone have a suggestion for me? If there is any more
>>>>> information that I can provide that would help you to help me,
>>>>> please advise.
>>>>>
>>>>> Cheers,
>>>>> Gervais
>>>>>
>>>>>
>>>>>
>>>>>> On May 9, 2016, at 1:42 PM, Gervais de Montbrun
>>>>>> <gervais at demontbrun.com> wrote:
>>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> I'm trying to add a third host into my oVirt cluster. I have
>>>>>> hosted engine setup on the first two. It's failing to finish
>>>>>> the hosted-engine --deploy on this third host. I wiped the
>>>>>> server, did a CentOS 7 minimal install, and ran it again to
>>>>>> start from a clean machine.
>>>>>>
>>>>>> My setup:
>>>>>> CentOS 7 clean install
>>>>>> yum install -y
>>>>>> http://resources.ovirt.org/pub/yum-repo/ovirt-release36.rpm
>>>>>> yum install -y ovirt-hosted-engine-setup
>>>>>> yum upgrade -y && reboot
>>>>>> systemctl disable NetworkManager ; systemctl stop
>>>>>> NetworkManager ; systemctl disable firewalld ; systemctl stop
>>>>>> firewalld
>>>>>> hosted-engine --deploy
>>>>>>
>>>>>> hosted-engine --deploy always throws an error:
>>>>>> [ ERROR ] The VDSM host was found in a failed state. Please
>>>>>> check engine and bootstrap installation logs.
>>>>>> [ ERROR ] Unable to add Cultivar2 to the manager
>>>>>> and then echoes:
>>>>>> [ INFO ] Waiting for VDSM hardware info
>>>>>> ...
>>>>>> [ ERROR ] Failed to execute stage 'Closing up': VDSM did not
>>>>>> start within 120 seconds
>>>>>> [ INFO ] Stage: Clean up
>>>>>> [ INFO ] Generating answer file
>>>>>> '/var/lib/ovirt-hosted-engine-setup/answers/answers-20160509131103.conf'
>>>>>> [ INFO ] Stage: Pre-termination
>>>>>> [ INFO ] Stage: Termination
>>>>>> [ ERROR ] Hosted Engine deployment failed: this system is not
>>>>>> reliable, please check the issue, fix and redeploy
>>>>>> Log file is located at
>>>>>> /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20160509130658-qb8ev0.log
>>>>>>
>>>>>> Full output of hosted-engine --deploy included in the
>>>>>> attached zip file.
>>>>>> I've also included vdsm.log (there is more than one attempt's
>>>>>> worth of logs in there).
>>>>>> You'll also find the
>>>>>> ovirt-hosted-engine-setup-20160509130658-qb8ev0.log listed above.
>>>>>>
>>>>>> This is my "test" setup. Cultivar0 is my first host and my
>>>>>> nfs server for storage. I have two hosts in the setup already
>>>>>> and everything is working fine. The host does show up in the
>>>>>> oVirt admin, but shows "Install Failed"
>>>>>> <PastedGraphic-1.png>
>>>>>>
>>>>>> Trying to reinstall from within the interface just fails again.
>>>>>>
>>>>>> The ovirt bridge interface is not configured and there are no
>>>>>> config files in /etc/sysconfig/network-scripts related to ovirt.
>>>>>>
>>>>>> OS:
>>>>>> [root@cultivar2 ovirt-hosted-engine-setup]# cat /etc/redhat-release
>>>>>> CentOS Linux release 7.2.1511 (Core)
>>>>>>
>>>>>> [root@cultivar2 ovirt-hosted-engine-setup]# uname -a
>>>>>> Linux cultivar2.grove.silverorange.com
>>>>>> 3.10.0-327.13.1.el7.x86_64 #1 SMP Thu Mar 31 16:04:38 UTC
>>>>>> 2016 x86_64 x86_64 x86_64 GNU/Linux
>>>>>>
>>>>>> Versions:
>>>>>> [root@cultivar2 ovirt-hosted-engine-setup]# rpm -qa | grep -i ovirt
>>>>>> libgovirt-0.3.3-1.el7_2.1.x86_64
>>>>>> ovirt-hosted-engine-setup-1.3.5.0-1.1.el7.noarch
>>>>>> ovirt-host-deploy-1.4.1-1.el7.centos.noarch
>>>>>> ovirt-vmconsole-1.0.0-1.el7.centos.noarch
>>>>>> ovirt-vmconsole-host-1.0.0-1.el7.centos.noarch
>>>>>> ovirt-release36-007-1.noarch
>>>>>> ovirt-engine-sdk-python-3.6.5.0-1.el7.centos.noarch
>>>>>> ovirt-setup-lib-1.0.1-1.el7.centos.noarch
>>>>>> ovirt-hosted-engine-ha-1.3.5.3-1.1.el7.noarch
>>>>>> [root@cultivar2 ovirt-hosted-engine-setup]# rpm -qa | grep -i virt
>>>>>> libvirt-daemon-driver-secret-1.2.17-13.el7_2.4.x86_64
>>>>>> virt-viewer-2.0-6.el7.x86_64
>>>>>> libgovirt-0.3.3-1.el7_2.1.x86_64
>>>>>> libvirt-daemon-kvm-1.2.17-13.el7_2.4.x86_64
>>>>>> ovirt-hosted-engine-setup-1.3.5.0-1.1.el7.noarch
>>>>>> fence-virt-0.3.2-2.el7.x86_64
>>>>>> virt-what-1.13-6.el7.x86_64
>>>>>> libvirt-python-1.2.17-2.el7.x86_64
>>>>>> libvirt-daemon-1.2.17-13.el7_2.4.x86_64
>>>>>> libvirt-daemon-config-nwfilter-1.2.17-13.el7_2.4.x86_64
>>>>>> libvirt-lock-sanlock-1.2.17-13.el7_2.4.x86_64
>>>>>> libvirt-daemon-driver-nodedev-1.2.17-13.el7_2.4.x86_64
>>>>>> libvirt-daemon-driver-network-1.2.17-13.el7_2.4.x86_64
>>>>>> libvirt-daemon-driver-storage-1.2.17-13.el7_2.4.x86_64
>>>>>> ovirt-host-deploy-1.4.1-1.el7.centos.noarch
>>>>>> virt-v2v-1.28.1-1.55.el7.centos.2.x86_64
>>>>>> ovirt-vmconsole-1.0.0-1.el7.centos.noarch
>>>>>> ovirt-vmconsole-host-1.0.0-1.el7.centos.noarch
>>>>>> libvirt-client-1.2.17-13.el7_2.4.x86_64
>>>>>> libvirt-daemon-driver-nwfilter-1.2.17-13.el7_2.4.x86_64
>>>>>> ovirt-release36-007-1.noarch
>>>>>> libvirt-daemon-driver-interface-1.2.17-13.el7_2.4.x86_64
>>>>>> libvirt-daemon-driver-qemu-1.2.17-13.el7_2.4.x86_64
>>>>>> ovirt-engine-sdk-python-3.6.5.0-1.el7.centos.noarch
>>>>>> ovirt-setup-lib-1.0.1-1.el7.centos.noarch
>>>>>> ovirt-hosted-engine-ha-1.3.5.3-1.1.el7.noarch
>>>>>>
>>>>>> I also have a series of stuck tasks, related to the host that
>>>>>> can't be added, which I can't clear... This is a secondary
>>>>>> issue and I don't want to get off track, but they look like
>>>>>> this:
>>>>>> <PastedGraphic-2.png>
>>>>>>
>>>>>> I'd appreciate any help that can be offered.
>>>>>>
>>>>>> Cheers,
>>>>>> Gervais
>>>>>>
>>>>>>
>>>>>> Gervais de Montbrun
>>>>>> Systems Administrator / silverorange Inc.
>>>>>>
>>>>>> Phone +1 902 367 4532 ext. 104
>>>>>> Mobile +1 902 978 0009
>>>>>>
>>>>>> <hosted-engine--deploy-logs.zip>
>>>>>
>>>>>
>>>>> Users mailing list
>>>>> Users at ovirt.org
>>>>> http://lists.ovirt.org/mailman/listinfo/users
>>>>>
>>>>> --
>>>>> Wee
>>>>
>>>
>>> _______________________________________________
>>> Users mailing list
>>> Users at ovirt.org
>>> http://lists.ovirt.org/mailman/listinfo/users
>>
>
>