[ovirt-users] Unable to add Hosts to Cluster

Mark Steele msteele at telvue.com
Thu Feb 22 13:14:45 UTC 2018


We have found a resolution to this issue. It is a bit convoluted, and it
still does not explain why the problem started in the first place. Once we
have prepped a HV server to be added (all the NICs are ready; SELinux,
NetworkManager, firewalld, etc. have been disabled), we have to do the
following:

yum install librbd1
rm /etc/yum.repos.d/CentOS-Base.repo (yes... delete the base repo...)
vi /etc/yum.repos.d/CentOS-Vault.repo
add the following:
 [vault]
 name=CentOS-$releasever - Extras
 #mirrorlist=http://vault.centos.org/?release=$releasever&arch=$basearch&repo=extras
 baseurl=http://vault.centos.org/centos/7.0.1406/os/x86_64/
 gpgcheck=0
 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7

vi /etc/yum.repos.d/ovirt-3.5-dependencies.repo
 edit the baseurls as shown below:
 [ovirt-3.5-glusterfs-epel]
 name=GlusterFS is a clustered file-system capable of scaling to several petabytes.
 baseurl=https://download.gluster.org/pub/gluster/glusterfs/old-releases/3.6/3.6.1/RHEL/epel-7/x86_64
 enabled=1
 skip_if_unavailable=1
 gpgcheck=0
 [ovirt-3.5-glusterfs-noarch-epel]
 name=GlusterFS is a clustered file-system capable of scaling to several petabytes.
 baseurl=http://download.gluster.org/pub/gluster/glusterfs/old-releases/3.6/3.6.1/RHEL/epel-7/noarch
 #baseurl=http://download.gluster.org/pub/gluster/glusterfs/LATEST/EPEL.repo/epel-$releasever/noarch
 enabled=1
 skip_if_unavailable=1
 gpgcheck=0
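
For anyone who would rather script the repo change than edit it in vi, the
[vault] stanza above could be appended non-interactively. This is only a
minimal sketch: the function name write_vault_repo and its directory argument
are hypothetical, and the stanza itself is copied unchanged from the steps
above.

```shell
#!/bin/sh
# Sketch only: append the [vault] stanza shown above to CentOS-Vault.repo.
# write_vault_repo and its optional directory argument are hypothetical
# helpers, not part of yum or oVirt.
write_vault_repo() {
    repo_dir="${1:-/etc/yum.repos.d}"
    # The quoted 'EOF' keeps $releasever and $basearch literal, so yum
    # expands them later instead of the shell expanding them now.
    cat >> "$repo_dir/CentOS-Vault.repo" <<'EOF'
[vault]
name=CentOS-$releasever - Extras
#mirrorlist=http://vault.centos.org/?release=$releasever&arch=$basearch&repo=extras
baseurl=http://vault.centos.org/centos/7.0.1406/os/x86_64/
gpgcheck=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
EOF
}
```

Calling write_vault_repo with no argument targets /etc/yum.repos.d. Because it
appends, running it twice would duplicate the stanza, so it is meant to be run
once on a freshly prepped host.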

yum remove ovirt-release35
yum remove vdsm
yum remove libvirt

yum install ovirt-release35-002-1
yum install libvirt-1.1.1-29.el7
yum install vdsm-4.16.7-1.gitdb83943.el7

We can then successfully add the HV into the cluster.
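
The remove/install sequence above can be replayed from a small script. The
following is a sketch under stated assumptions, not the exact procedure we
ran: the DRY_RUN variable and the run and reinstall_pinned helpers are
hypothetical, and by default the commands are only printed so the sequence can
be reviewed before anything is changed.

```shell
#!/bin/sh
# Sketch only: replay the package removal and pinned reinstall listed above.
# DRY_RUN, run and reinstall_pinned are hypothetical names for illustration.

# Print the command when DRY_RUN is 1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reinstall_pinned() {
    # Remove the packages, in the same order as the manual steps.
    for pkg in ovirt-release35 vdsm libvirt; do
        run yum remove "$pkg"
    done
    # Install the exact versions named in the manual steps, in the same order.
    for pkg in ovirt-release35-002-1 libvirt-1.1.1-29.el7 \
               vdsm-4.16.7-1.gitdb83943.el7; do
        run yum install "$pkg"
    done
}
```

Running reinstall_pinned as-is prints six yum commands. Setting DRY_RUN=0
executes them, and yum will still prompt for confirmation on each one.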

This is using CentOS 7.0.1406 (Core) as the host OS and oVirt Engine
3.5.0.1-1.el6.
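
Before adding the host, it may be worth confirming that the pinned versions
are the ones actually installed. A minimal sketch, assuming rpm is available
on the host; query_nvr and report_mismatch are hypothetical helper names.

```shell
#!/bin/sh
# Sketch only: compare an installed package against the version pinned above.
# query_nvr and report_mismatch are hypothetical helpers.

# Print name-version-release for an installed package (needs rpm on the host).
query_nvr() {
    rpm -q --qf '%{NAME}-%{VERSION}-%{RELEASE}\n' "$1"
}

# $1 = expected name-version-release, $2 = what the host reports.
# A prefix match is used, so pass the full expected string to avoid
# vdsm-4.16.7 also matching something like vdsm-4.16.70.
report_mismatch() {
    case "$2" in
        "$1"*) echo "ok: $2" ;;
        *)     echo "MISMATCH: expected $1, found $2"; return 1 ;;
    esac
}
```

On a prepped host this could be used as
report_mismatch libvirt-1.1.1-29.el7 "$(query_nvr libvirt)", and likewise for
vdsm-4.16.7-1.gitdb83943.el7.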

If anyone has any questions please feel free to ask.


***
*Mark Steele*
CIO / VP Technical Operations | TelVue Corporation
TelVue - We Share Your Vision
16000 Horizon Way, Suite 100 | Mt. Laurel, NJ 08054
800.885.8886 x128 | msteele at telvue.com | http://www.telvue.com
twitter: http://twitter.com/telvue | facebook:
https://www.facebook.com/telvue

On Tue, Feb 20, 2018 at 11:24 AM, Yaniv Kaul <ykaul at redhat.com> wrote:

>
>
> On Tue, Feb 20, 2018 at 12:52 PM, Mark Steele <msteele at telvue.com> wrote:
>
>> Is it possible that the HostedEngine became corrupted somehow and that
>> is preventing us from adding hosts?
>>
>
> I doubt that.
> I still suspect the libvirt auth. issue.
> Nevertheless, as commented more than once, you are running on somewhat old
> version with a recent CentOS version. Not sure this combination is tested
> or anyone's running it.
>
>
>>
>> Is creating a new hosted engine an option?
>>
>
> You could backup and restore to a new HE.
> Y.
>
>
>>
>> On Mon, Feb 19, 2018 at 9:55 AM, Mark Steele <msteele at telvue.com> wrote:
>>
>>> At this point I'm wondering if there is anyone in the community that
>>> freelances and would be willing to provide remote support to resolve this
>>> issue?
>>>
>>> We are running with 1/2 our normal hosts, and not being able to add
>>> anymore back into the cluster is a serious problem.
>>>
>>> Best regards,
>>>
>>> Mark Steele
>>>
>>> On Sat, Feb 17, 2018 at 12:53 PM, Mark Steele <msteele at telvue.com>
>>> wrote:
>>>
>>>> Yaniv,
>>>>
>>>> I have one of my developers assisting me and we are continuing to run
>>>> into issues. This is a note from him:
>>>>
>>>> Hi, I'm trying to add a host to ovirt, but I'm running into package
>>>> dependency problems. I have existing hosts that are working and integrated
>>>> properly, and inspecting those, I am able to match the packages between the
>>>> new host and the existing, but when I then try to add the new host to
>>>> ovirt, it fails on reinstall because it's trying to install packages that
>>>> are later versions. Does the installation run list from
>>>> ovirt-release35-002-1 have unspecified versions? The working hosts use
>>>> libvirt-1.1.1-29 and vdsm-4.16.7, but it's trying to install vdsm-4.16.30,
>>>> which requires a higher version of libvirt, at which point the
>>>> installation fails. Is there some way I can specify which package versions
>>>> the ovirt install procedure uses? Or better yet, skip the package
>>>> management step entirely?
>>>>
>>>>
>>>> On Sat, Feb 17, 2018 at 2:32 AM, Yaniv Kaul <ykaul at redhat.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Fri, Feb 16, 2018 at 11:14 PM, Mark Steele <msteele at telvue.com>
>>>>> wrote:
>>>>>
>>>>>> We are using CentOS Linux release 7.0.1406 (Core) and  oVirt Engine
>>>>>> Version: 3.5.0.1-1.el6
>>>>>>
>>>>>
>>>>> You are seeing https://bugzilla.redhat.com/show_bug.cgi?id=1444426 ,
>>>>> which is a result of a default change of libvirt and was fixed in later
>>>>> versions of oVirt than the one you are using.
>>>>> See patch https://gerrit.ovirt.org/#/c/76934/ for how it was fixed,
>>>>> you can probably configure it manually.
>>>>> Y.
>>>>>
>>>>>
>>>>>>
>>>>>> We have four other hosts that are running this same configuration
>>>>>> already. I took one host out of the cluster (forcefully) that was working
>>>>>> and now it will not add back in either - throwing the same SASL error.
>>>>>>
>>>>>> We are looking at downgrading libvirt as I've seen that somewhere
>>>>>> else - is there another version of RH I should be trying? I have a host I
>>>>>> can put it on.
>>>>>>
>>>>>>
>>>>>> On Fri, Feb 16, 2018 at 3:31 PM, Yaniv Kaul <ykaul at redhat.com> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Feb 16, 2018 6:47 PM, "Mark Steele" <msteele at telvue.com> wrote:
>>>>>>>
>>>>>>> Hello all,
>>>>>>>
>>>>>>> We recently had a network event where we lost access to our storage
>>>>>>> for a period of time. The Cluster basically shut down all our VM's and in
>>>>>>> the process we had three HV's that went offline and would not communicate
>>>>>>> properly with the cluster.
>>>>>>>
>>>>>>> We have since completely reinstalled CentOS on the hosts and
>>>>>>> attempted to install them into the cluster with no joy. We've gotten to the
>>>>>>> point where we generally get an error message in the web gui:
>>>>>>>
>>>>>>>
>>>>>>> Which EL release and which oVirt release are you using? My guess
>>>>>>> would be latest EL, with an older oVirt?
>>>>>>> Y.
>>>>>>>
>>>>>>>
>>>>>>> Stage: Misc Configuration
>>>>>>> Host hv-ausa-02 installation failed. Command returned failure code 1
>>>>>>> during SSH session 'root at 10.1.90.154'.
>>>>>>>
>>>>>>> the following is what we are seeing in the messages log:
>>>>>>>
>>>>>>> Feb 16 11:39:53 hv-ausa-02 vdsm-tool: libvirt: XML-RPC error : authentication failed: authentication failed
>>>>>>> Feb 16 11:39:53 hv-ausa-02 libvirtd: 2018-02-16 16:39:53.761+0000: 15231: error : virNetSASLSessionListMechanisms:390 : internal error: cannot list SASL mechanisms -4 (SASL(-4): no mechanism available: Internal Error -4 in server.c near line 1757)
>>>>>>> Feb 16 11:39:53 hv-ausa-02 libvirtd: 2018-02-16 16:39:53.761+0000: 15231: error : remoteDispatchAuthSaslInit:3411 : authentication failed: authentication failed
>>>>>>> Feb 16 11:39:53 hv-ausa-02 libvirtd: 2018-02-16 16:39:53.761+0000: 15226: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error
>>>>>>> Feb 16 11:39:53 hv-ausa-02 vdsm-tool: libvirt: XML-RPC error : authentication failed: authentication failed
>>>>>>> Feb 16 11:39:53 hv-ausa-02 libvirtd: 2018-02-16 16:39:53.962+0000: 15233: error : virNetSASLSessionListMechanisms:390 : internal error: cannot list SASL mechanisms -4 (SASL(-4): no mechanism available: Internal Error -4 in server.c near line 1757)
>>>>>>> Feb 16 11:39:53 hv-ausa-02 libvirtd: 2018-02-16 16:39:53.963+0000: 15233: error : remoteDispatchAuthSaslInit:3411 : authentication failed: authentication failed
>>>>>>> Feb 16 11:39:53 hv-ausa-02 libvirtd: 2018-02-16 16:39:53.963+0000: 15226: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error
>>>>>>> Feb 16 11:39:53 hv-ausa-02 vdsm-tool: libvirt: XML-RPC error : authentication failed: authentication failed
>>>>>>> Feb 16 11:39:53 hv-ausa-02 vdsm-tool: Traceback (most recent call last):
>>>>>>> Feb 16 11:39:53 hv-ausa-02 vdsm-tool: File "/usr/bin/vdsm-tool", line 219, in main
>>>>>>> Feb 16 11:39:53 hv-ausa-02 vdsm-tool: return tool_command[cmd]["command"](*args)
>>>>>>> Feb 16 11:39:53 hv-ausa-02 vdsm-tool: File "/usr/lib/python2.7/site-packages/vdsm/tool/upgrade_300_networks.py", line 83, in upgrade_networks
>>>>>>> Feb 16 11:39:53 hv-ausa-02 vdsm-tool: networks = netinfo.networks()
>>>>>>> Feb 16 11:39:53 hv-ausa-02 vdsm-tool: File "/usr/lib/python2.7/site-packages/vdsm/netinfo.py", line 112, in networks
>>>>>>> Feb 16 11:39:53 hv-ausa-02 vdsm-tool: conn = libvirtconnection.get()
>>>>>>> Feb 16 11:39:53 hv-ausa-02 vdsm-tool: File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 159, in get
>>>>>>> Feb 16 11:39:53 hv-ausa-02 vdsm-tool: conn = _open_qemu_connection()
>>>>>>> Feb 16 11:39:53 hv-ausa-02 vdsm-tool: File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 95, in _open_qemu_connection
>>>>>>> Feb 16 11:39:53 hv-ausa-02 vdsm-tool: return utils.retry(libvirtOpen, timeout=10, sleep=0.2)
>>>>>>> Feb 16 11:39:53 hv-ausa-02 vdsm-tool: File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 1108, in retry
>>>>>>> Feb 16 11:39:53 hv-ausa-02 vdsm-tool: return func()
>>>>>>> Feb 16 11:39:53 hv-ausa-02 vdsm-tool: File "/usr/lib64/python2.7/site-packages/libvirt.py", line 105, in openAuth
>>>>>>> Feb 16 11:39:53 hv-ausa-02 vdsm-tool: if ret is None:raise libvirtError('virConnectOpenAuth() failed')
>>>>>>> Feb 16 11:39:53 hv-ausa-02 vdsm-tool: libvirtError: authentication failed: authentication failed
>>>>>>> Feb 16 11:39:53 hv-ausa-02 systemd: vdsm-network.service: control process exited, code=exited status=1
>>>>>>> Feb 16 11:39:53 hv-ausa-02 systemd: Failed to start Virtual Desktop Server Manager network restoration.
>>>>>>> Feb 16 11:39:53 hv-ausa-02 systemd: Dependency failed for Virtual Desktop Server Manager.
>>>>>>> Feb 16 11:39:53 hv-ausa-02 systemd: Job vdsmd.service/start failed with result 'dependency'.
>>>>>>> Feb 16 11:39:53 hv-ausa-02 systemd: Unit vdsm-network.service entered failed state.
>>>>>>> Feb 16 11:39:53 hv-ausa-02 systemd: vdsm-network.service failed.
>>>>>>> Feb 16 11:40:01 hv-ausa-02 systemd: Started Session 10 of user root.
>>>>>>> Feb 16 11:40:01 hv-ausa-02 systemd: Starting Session 10 of user root.
>>>>>>> Feb 16 11:40:01 hv-ausa-02 systemd: Started Session 11 of user root.
>>>>>>> Feb 16 11:40:01 hv-ausa-02 systemd: Starting Session 11 of user root.
>>>>>>>
>>>>>>> Can someone point me in the right direction to resolve this - it
>>>>>>> seems to be a SASL issue perhaps?
>>>>>>>