[ovirt-users] Adding another host to my cluster
Charles Tassell
ctassell at gmail.com
Fri May 13 12:53:19 UTC 2016
Hi Gervais,
Okay, I see two problems: there are some leftover directories causing
issues, and for some reason VDSM seems to be trying to bind to a port
that something else is already listening on (probably an older instance
of VDSM).
First, remove the leftover directories: rmdir
/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286 and
/rhev/data-center/mnt. If they aren't empty, don't rm -rf them, because
they might be mounted from your production servers; just mv -i them to
/root or somewhere.
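For example, something like this (just a sketch -- check the mount
table first so you don't move live NFS data):

  # make sure nothing under these paths is still mounted
  mount | grep -e /var/run/vdsm/storage -e /rhev/data-center/mnt
  # empty leftovers can simply be removed
  rmdir /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286
  # anything non-empty gets parked in /root instead of deleted
  mv -i /rhev/data-center/mnt /root/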
Next, shut down the vdsm service ("systemctl stop vdsmd" -- I think
that's the unit name on CentOS 7, but I don't use CentOS much, so
double-check) and kill any running vdsm processes (ps ax | grep vdsm).
The error that I saw was:
MainThread::ERROR::2016-05-13 08:58:38,262::clientIF::128::vds::(__init__) failed to init clientIF, shutting down storage dispatcher
MainThread::ERROR::2016-05-13 08:58:38,289::vdsm::171::vds::(run) Exception raised
Traceback (most recent call last):
  File "/usr/share/vdsm/vdsm", line 169, in run
    serve_clients(log)
  File "/usr/share/vdsm/vdsm", line 102, in serve_clients
    cif = clientIF.getInstance(irs, log, scheduler)
  File "/usr/share/vdsm/clientIF.py", line 193, in getInstance
    cls._instance = clientIF(irs, log, scheduler)
  File "/usr/share/vdsm/clientIF.py", line 123, in __init__
    self._createAcceptor(host, port)
  File "/usr/share/vdsm/clientIF.py", line 201, in _createAcceptor
    port, sslctx)
  File "/usr/share/vdsm/protocoldetector.py", line 170, in __init__
    sock = _create_socket(host, port)
  File "/usr/share/vdsm/protocoldetector.py", line 40, in _create_socket
    server_socket.bind(addr[0][4])
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 98] Address already in use
If you get the same error, run netstat -lnp and compare the output to
the same command on a working box to see if something else is sitting
on the VDSM port.
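For instance (assuming the default VDSM port of 54321, which is the
one your log shows it listening on):

  # run this on the broken host and on a working host, then compare
  netstat -lnp | grep 54321
  # whatever non-vdsm process holds the port on the broken host is
  # your culprit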
On 2016-05-13 09:37 AM, Gervais de Montbrun wrote:
> Hi Charles,
>
> I think the problem I am having is due to the setup failing, and not
> something in the vdsm configs, as I have never gotten this server to
> start up properly and the bridge ethernet interface + ovirt routes are
> not set up.
>
> I put the logs here:
> https://www.dropbox.com/sh/5ugyykqh1lgru9l/AACXxRYWr3tgd0WbBVFW5twHa?dl=0
>
> hosted-engine--deploy-logs.zip  # Logs from when I tried to deploy and it failed
> vdsm.tar.gz  # /var/log/vdsm
>
> Output from running vdsm from the command line:
>
> [root@cultivar2 log]# su -s /bin/bash vdsm
> [vdsm@cultivar2 log]$ python /usr/share/vdsm/vdsm
> (PID: 6521) I am the actual vdsm 4.17.26-1.el7
> cultivar2.grove.silverorange.com (3.10.0-327.el7.x86_64)
> VDSM will run with cpu affinity: frozenset([1])
> /usr/bin/taskset --all-tasks --pid --cpu-list 1 6521 (cwd None)
> SUCCESS: <err> = ''; <rc> = 0
> Starting scheduler vdsm.Scheduler
> started
> Run and protect: registerDomainStateChangeCallback(callbackFunc=<functools.partial object at 0x381b158>)
> Run and protect: registerDomainStateChangeCallback, Return response: None
> Trying to connect to Super Vdsm
> Preparing MOM interface
> Using named unix socket /var/run/vdsm/mom-vdsm.sock
> Unregistering all secrets
> trying to connect libvirt
> recovery: started
> Setting channels' timeout to 30 seconds.
> Starting VM channels listener thread.
> Listening at 0.0.0.0:54321
> Adding detector <rpc.bindingxmlrpc.XmlDetector instance at 0x3b4ecb0>
> recovery: completed in 0s
> Adding detector <yajsonrpc.stompreactor.StompDetector instance at 0x382e5a8>
> Starting executor
> Starting worker jsonrpc.Executor/0
> Worker started
> Starting worker jsonrpc.Executor/1
> Worker started
> Starting worker jsonrpc.Executor/2
> Worker started
> Starting worker jsonrpc.Executor/3
> Worker started
> Starting worker jsonrpc.Executor/4
> Worker started
> Starting worker jsonrpc.Executor/5
> Worker started
> Starting worker jsonrpc.Executor/6
> Worker started
> Starting worker jsonrpc.Executor/7
> Worker started
> XMLRPC server running
> Starting executor
> Starting worker periodic/0
> Worker started
> Starting worker periodic/1
> Worker started
> Starting worker periodic/2
> Worker started
> Starting worker periodic/3
> Worker started
> trying to connect libvirt
> Panic: Connect to supervdsm service failed: [Errno 2] No such file or directory
> Traceback (most recent call last):
>   File "/usr/share/vdsm/supervdsm.py", line 78, in _connect
>     utils.retry(self._manager.connect, Exception, timeout=60, tries=3)
>   File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 959, in retry
>     return func()
>   File "/usr/lib64/python2.7/multiprocessing/managers.py", line 500, in connect
>     conn = Client(self._address, authkey=self._authkey)
>   File "/usr/lib64/python2.7/multiprocessing/connection.py", line 173, in Client
>     c = SocketClient(address)
>   File "/usr/lib64/python2.7/multiprocessing/connection.py", line 308, in SocketClient
>     s.connect(address)
>   File "/usr/lib64/python2.7/socket.py", line 224, in meth
>     return getattr(self._sock,name)(*args)
> error: [Errno 2] No such file or directory
> Killed
>
>
> Thanks for the help. It's really appreciated.
>
> Cheers,
> Gervais
>
> On Fri, May 13, 2016 at 12:55 AM, Charles Tassell <ctassell at gmail.com> wrote:
>
> Hi Gervais,
>
> Hmm, can you tar up the logfiles (/var/log/vdsm/* on the host
> you are installing on) and put them somewhere I can look at them?
> Also, I found that starting VDSM from the command line is useful, as
> it sometimes spits out error messages that don't show up in the
> logs. I think the command I used was:
> su -s /bin/bash vdsm
> python /usr/share/vdsm/vdsm
>
> My problem was that I had customized the logging settings in
> /etc/vdsm/*conf to try to tone down the debugging output and had
> introduced a syntax error.
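> A quick way to check for that kind of mistake (a rough sketch -- this
> assumes the logging settings live in /etc/vdsm/logger.conf; adjust the
> path to whichever file you edited) is to feed the file to Python's
> config loader and see if it raises:
>
>   # raises an exception if the logging config file is malformed
>   python -c "import logging.config; logging.config.fileConfig('/etc/vdsm/logger.conf')"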
>
>
> On 16-05-12 10:24 PM, Gervais de Montbrun wrote:
>> Hi Charles,
>>
>> Thanks for the suggestion.
>>
>> I cleaned up again using the bash script from the
>> recoving-from-failed-install link below, then reinstalled (yum
>> install ovirt-hosted-engine-setup).
>>
>> I enabled NetworkManager and firewalld as you suggested. The
>> install stops very early on with an error:
>> [ ERROR ] Failed to execute stage 'Programs detection':
>> hosted-engine cannot be deployed while NetworkManager is running,
>> please stop and disable it before proceeding
>>
>> I disabled and stopped NetworkManager and tried again. Same
>> result. :(
>>
>> Any more guesses?
>>
>> Cheers,
>> Gervais
>>
>>
>>
>>> On May 12, 2016, at 9:08 PM, Charles Tassell <ctassell at gmail.com> wrote:
>>>
>>> Hey Gervais,
>>>
>>> Try enabling NetworkManager and firewalld before doing the
>>> hosted-engine --deploy. I have run into problems with oVirt
>>> trying to perform tasks on hosts where firewalld is disabled, so
>>> maybe you are running into a similar problem. Also, I think the
>>> setup script will disable NetworkManager if it needs to. I know
>>> I didn't manually disable it on any of the boxes I installed on.
>>>
>>> On 16-05-12 04:49 PM, users-request at ovirt.org wrote:
>>>> Message: 1
>>>> Date: Thu, 12 May 2016 14:22:12 -0300
>>>> From: Gervais de Montbrun <gervais at demontbrun.com>
>>>> To: Wee Sritippho <wee.s at forest.go.th>
>>>> Cc: users <users at ovirt.org>
>>>> Subject: Re: [ovirt-users] Adding another host to my cluster
>>>> Message-ID: <28B7FC74-5C52-4F60-B9F3-39A36621A7CA at demontbrun.com>
>>>> Content-Type: text/plain; charset="utf-8"
>>>>
>>>> Hi Wee
>>>> (and others)
>>>>
>>>> Thanks for the reply. I tried what you suggested, but I am in
>>>> the exact same state. :-(
>>>>
>>>> I don't want to completely remove my hosted engine setup, as it
>>>> is working on the two other hosts in my cluster. I did not run
>>>> the rm -rf steps listed here
>>>> (https://www.ovirt.org/documentation/how-to/hosted-engine/#recoving-from-failed-install)
>>>> that would wipe my hosted_engine nfs mount. If you know that
>>>> this is 100% necessary, please let me know.
>>>>
>>>> I did:
>>>> 1. ran "hosted-engine --clean-metadata --force-cleanup --host-id=3"
>>>> 2. ran the bash script to remove all of the ovirt packages and
>>>> config files
>>>> 3. reinstalled ovirt-hosted-engine-setup
>>>> 4. ran "hosted-engine --deploy"
>>>>
>>>> I'm back exactly where I started. Is there a way to run just
>>>> the network configuration part of the deploy?
>>>>
>>>> Since the last attempt, I did upgrade my hosted engine and my
>>>> cluster is now running oVirt 3.6.5.
>>>>
>>>> Cheers,
>>>> Gervais
>>>>
>>>>
>>>>
>>>>> On May 12, 2016, at 11:50 AM, Wee Sritippho
>>>>> <wee.s at forest.go.th> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I used to have a similar problem where one of my hosts couldn't
>>>>> be deployed due to the absence of the ovirtmgmt bridge. Simone
>>>>> said it's a bug (https://bugzilla.redhat.com/1323465) which should
>>>>> be fixed in 3.6.6.
>>>>>
>>>>> This is what I've done to solve it:
>>>>>
>>>>> 1. In the web UI, set the failed host to maintenance.
>>>>> 2. Remove it.
>>>>> 3. On that host, run the script from
>>>>> https://www.ovirt.org/documentation/how-to/hosted-engine/#recoving-from-failed-install
>>>>> 4. Install ovirt-hosted-engine-setup again.
>>>>> 5. Redeploy again.
>>>>>
>>>>> Hope that helps
>>>>>
>>>>> On May 11, 2016, 22:48:58 GMT+07:00, Gervais de Montbrun
>>>>> <gervais at demontbrun.com> wrote:
>>>>> Hi Folks,
>>>>>
>>>>> I hate to reply to my own message, but I'm really hoping
>>>>> someone can help me with my issue:
>>>>> http://lists.ovirt.org/pipermail/users/2016-May/039690.html
>>>>>
>>>>> Does anyone have a suggestion for me? If there is any more
>>>>> information that I can provide that would help you to help me,
>>>>> please advise.
>>>>>
>>>>> Cheers,
>>>>> Gervais
>>>>>
>>>>>
>>>>>
>>>>>> On May 9, 2016, at 1:42 PM, Gervais de Montbrun
>>>>>> <gervais at demontbrun.com> wrote:
>>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> I'm trying to add a third host into my oVirt cluster. I have
>>>>>> hosted engine setup on the first two. It's failing to finish
>>>>>> the hosted-engine --deploy on this third host. I wiped the
>>>>>> server, did a CentOS 7 minimal install, and ran it again to
>>>>>> start from a clean machine.
>>>>>>
>>>>>> My setup:
>>>>>> CentOS 7 clean install
>>>>>> yum install -y
>>>>>> http://resources.ovirt.org/pub/yum-repo/ovirt-release36.rpm
>>>>>> yum install -y ovirt-hosted-engine-setup
>>>>>> yum upgrade -y && reboot
>>>>>> systemctl disable NetworkManager ; systemctl stop
>>>>>> NetworkManager ; systemctl disable firewalld ; systemctl stop
>>>>>> firewalld
>>>>>> hosted-engine --deploy
>>>>>>
>>>>>> hosted-engine --deploy always throws an error:
>>>>>> [ ERROR ] The VDSM host was found in a failed state. Please
>>>>>> check engine and bootstrap installation logs.
>>>>>> [ ERROR ] Unable to add Cultivar2 to the manager
>>>>>> and then echoes:
>>>>>> [ INFO ] Waiting for VDSM hardware info
>>>>>> ...
>>>>>> [ ERROR ] Failed to execute stage 'Closing up': VDSM did not
>>>>>> start within 120 seconds
>>>>>> [ INFO ] Stage: Clean up
>>>>>> [ INFO ] Generating answer file
>>>>>> '/var/lib/ovirt-hosted-engine-setup/answers/answers-20160509131103.conf'
>>>>>> [ INFO ] Stage: Pre-termination
>>>>>> [ INFO ] Stage: Termination
>>>>>> [ ERROR ] Hosted Engine deployment failed: this system is not
>>>>>> reliable, please check the issue, fix and redeploy
>>>>>> Log file is located at
>>>>>> /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20160509130658-qb8ev0.log
>>>>>>
>>>>>> Full output of hosted-engine --deploy included in the
>>>>>> attached zip file.
>>>>>> I've also included vdsm.log (there is more than one attempt's
>>>>>> worth of logs in there).
>>>>>> You'll also find the
>>>>>> ovirt-hosted-engine-setup-20160509130658-qb8ev0.log listed above.
>>>>>>
>>>>>> This is my "test" setup. Cultivar0 is my first host and my
>>>>>> nfs server for storage. I have two hosts in the setup already
>>>>>> and everything is working fine. The host does show up in the
>>>>>> oVirt admin, but shows "Install Failed"
>>>>>> <PastedGraphic-1.png>
>>>>>>
>>>>>> Trying to reinstall from within the interface just fails again.
>>>>>>
>>>>>> The ovirt bridge interface is not configured and there are no
>>>>>> config files in /etc/sysconfig/network-scripts related to ovirt.
>>>>>>
>>>>>> OS:
>>>>>> [root@cultivar2 ovirt-hosted-engine-setup]# cat /etc/redhat-release
>>>>>> CentOS Linux release 7.2.1511 (Core)
>>>>>>
>>>>>> [root@cultivar2 ovirt-hosted-engine-setup]# uname -a
>>>>>> Linux cultivar2.grove.silverorange.com
>>>>>> 3.10.0-327.13.1.el7.x86_64 #1 SMP Thu Mar 31 16:04:38 UTC
>>>>>> 2016 x86_64 x86_64 x86_64 GNU/Linux
>>>>>>
>>>>>> Versions:
>>>>>> [root@cultivar2 ovirt-hosted-engine-setup]# rpm -qa | grep -i ovirt
>>>>>> libgovirt-0.3.3-1.el7_2.1.x86_64
>>>>>> ovirt-hosted-engine-setup-1.3.5.0-1.1.el7.noarch
>>>>>> ovirt-host-deploy-1.4.1-1.el7.centos.noarch
>>>>>> ovirt-vmconsole-1.0.0-1.el7.centos.noarch
>>>>>> ovirt-vmconsole-host-1.0.0-1.el7.centos.noarch
>>>>>> ovirt-release36-007-1.noarch
>>>>>> ovirt-engine-sdk-python-3.6.5.0-1.el7.centos.noarch
>>>>>> ovirt-setup-lib-1.0.1-1.el7.centos.noarch
>>>>>> ovirt-hosted-engine-ha-1.3.5.3-1.1.el7.noarch
>>>>>> [root@cultivar2 ovirt-hosted-engine-setup]# rpm -qa | grep -i virt
>>>>>> libvirt-daemon-driver-secret-1.2.17-13.el7_2.4.x86_64
>>>>>> virt-viewer-2.0-6.el7.x86_64
>>>>>> libgovirt-0.3.3-1.el7_2.1.x86_64
>>>>>> libvirt-daemon-kvm-1.2.17-13.el7_2.4.x86_64
>>>>>> ovirt-hosted-engine-setup-1.3.5.0-1.1.el7.noarch
>>>>>> fence-virt-0.3.2-2.el7.x86_64
>>>>>> virt-what-1.13-6.el7.x86_64
>>>>>> libvirt-python-1.2.17-2.el7.x86_64
>>>>>> libvirt-daemon-1.2.17-13.el7_2.4.x86_64
>>>>>> libvirt-daemon-config-nwfilter-1.2.17-13.el7_2.4.x86_64
>>>>>> libvirt-lock-sanlock-1.2.17-13.el7_2.4.x86_64
>>>>>> libvirt-daemon-driver-nodedev-1.2.17-13.el7_2.4.x86_64
>>>>>> libvirt-daemon-driver-network-1.2.17-13.el7_2.4.x86_64
>>>>>> libvirt-daemon-driver-storage-1.2.17-13.el7_2.4.x86_64
>>>>>> ovirt-host-deploy-1.4.1-1.el7.centos.noarch
>>>>>> virt-v2v-1.28.1-1.55.el7.centos.2.x86_64
>>>>>> ovirt-vmconsole-1.0.0-1.el7.centos.noarch
>>>>>> ovirt-vmconsole-host-1.0.0-1.el7.centos.noarch
>>>>>> libvirt-client-1.2.17-13.el7_2.4.x86_64
>>>>>> libvirt-daemon-driver-nwfilter-1.2.17-13.el7_2.4.x86_64
>>>>>> ovirt-release36-007-1.noarch
>>>>>> libvirt-daemon-driver-interface-1.2.17-13.el7_2.4.x86_64
>>>>>> libvirt-daemon-driver-qemu-1.2.17-13.el7_2.4.x86_64
>>>>>> ovirt-engine-sdk-python-3.6.5.0-1.el7.centos.noarch
>>>>>> ovirt-setup-lib-1.0.1-1.el7.centos.noarch
>>>>>> ovirt-hosted-engine-ha-1.3.5.3-1.1.el7.noarch
>>>>>>
>>>>>> I also have a series of stuck tasks, related to the host that
>>>>>> can't be added, which I can't clear... This is a secondary
>>>>>> issue and I don't want to get off track, but they look like
>>>>>> this:
>>>>>> <PastedGraphic-2.png>
>>>>>>
>>>>>> I'd appreciate any help that can be offered.
>>>>>>
>>>>>> Cheers,
>>>>>> Gervais
>>>>>>
>>>>>>
>>>>>> Gervais de Montbrun
>>>>>> Systems Administrator / silverorange Inc.
>>>>>>
>>>>>> Phone +1 902 367 4532 ext. 104
>>>>>> Mobile +1 902 978 0009
>>>>>>
>>>>>> <hosted-engine--deploy-logs.zip>
>>>>>
>>>>>
>>>>> Users mailing list
>>>>> Users at ovirt.org
>>>>> http://lists.ovirt.org/mailman/listinfo/users
>>>>>
>>>>> --
>>>>> Wee
>>>>
>>>
>>> _______________________________________________
>>> Users mailing list
>>> Users at ovirt.org
>>> http://lists.ovirt.org/mailman/listinfo/users
>>
>
>