Set up oVirt 4.2 hyperconverged with GlusterFS "storage network" over InfiniBand

Hi all!

I'm trying to set up the latest oVirt version from the ovirt-release42-pre.rpm repository (because of bugs such as https://bugzilla.redhat.com/show_bug.cgi?id=1637468, which is not fixed in the release42 stable repository). From that repository I installed these RPMs (plus many dependencies):

ovirt-hosted-engine-setup-2.2.30-1.el7.noarch
ovirt-engine-appliance-4.2-20181026.1.el7.noarch
vdsm-4.20.43-1.el7.x86_64
vdsm-gluster-4.20.43-1.el7.x86_64
vdsm-network-4.20.43-1.el7.x86_64

I used the web UI installer to create the GlusterFS volumes (the default suggested engine, data and vmstore) and then installed the hosted engine on the "engine" volume.

In my case, however, I'm trying to set up an additional "storage network" (for example, as described here: https://ovirt.org/develop/release-management/features/gluster/select-network... ). Those screenshots are quite old and, as far as I can see, the 4.2 UI has changed, but the idea is the same.

Each host has two interfaces: one Ethernet (enp59s0f0, with an address from 172.16.10.0/24 and the default gateway) and one "InfiniBand" (no default gateway, used only between cluster nodes, no routing, no external access). In reality it is an Intel Omni-Path fabric:

-----------
[root@ovirtnode1 log]# hfi1_control -i
Driver Version: 10.8-0
Opa Version: 10.8.0.0.204
0: BoardId: Intel Corporation Omni-Path HFI Silicon 100 Series [integrated]
0,1: Status: 5: LinkUp 4: ACTIVE
-------------

It appears as an IP-over-IB interface:

6: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc pfifo_fast state UP group default qlen 256
    link/infiniband 80:00:00:02:fe:80:00:00:00:00:00:00:00:11:75:09:01:1a:ee:ea brd 00:ff:ff:ff:ff:12:40:1b:80:01:00:00:00:00:00:00:ff:ff:ff:ff
    inet 172.16.100.1/24 brd 172.16.100.255 scope global noprefixroute ib0
       valid_lft forever preferred_lft forever

It has these properties in the ifcfg-ib0 file:

CONNECTED_MODE=yes
MTU=65520

All IPs on this pair of interfaces have records in external DNS: the Ethernet (management network) addresses have ovirtnode{N} names and the InfiniBand (storage network) addresses have ovirtstor{N} names.
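The CONNECTED_MODE/MTU dependency described above can be checked mechanically. The following is a minimal Python sketch, not an oVirt or VDSM tool: it parses ifcfg-style KEY=VALUE text and flags a file whose MTU exceeds an assumed IPoIB datagram-mode limit while CONNECTED_MODE is not "yes". The 2044-byte default (a 2048-byte IB link MTU minus the 4-byte IPoIB header) is an assumption; Omni-Path fabrics may allow a different datagram MTU, so treat it as a parameter. The function names are hypothetical.

```python
"""Sketch: flag an IPoIB ifcfg file whose MTU would need connected mode.

Assumption: datagram-mode IPoIB is limited to the IB link MTU minus the
4-byte IPoIB header (2044 for a 2048-byte link MTU), while connected mode
allows MTUs up to 65520. Adjust DATAGRAM_MTU_LIMIT for your fabric.
"""

DATAGRAM_MTU_LIMIT = 2044  # assumed default, fabric-dependent


def parse_ifcfg(text):
    """Parse ifcfg-style KEY=VALUE lines into a dict, skipping comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be quoted in ifcfg files; strip surrounding quotes.
        settings[key.strip()] = value.strip().strip("\"'")
    return settings


def mtu_needs_connected_mode(settings, limit=DATAGRAM_MTU_LIMIT):
    """True if MTU exceeds the datagram limit but CONNECTED_MODE is not yes."""
    mtu = int(settings.get("MTU", "0") or "0")
    connected = settings.get("CONNECTED_MODE", "no").lower() == "yes"
    return mtu > limit and not connected


if __name__ == "__main__":
    # The hand-written file from this post.
    hand_written = "CONNECTED_MODE=yes\nMTU=65520\n"
    # The keys VDSM wrote when the network was attached (quoted in the post).
    vdsm_generated = (
        "DEVICE=ib0\nONBOOT=yes\nIPADDR=172.16.100.5\nNETMASK=255.255.255.0\n"
        "BOOTPROTO=none\nMTU=65520\nDEFROUTE=no\nNM_CONTROLLED=no\nIPV6INIT=no\n"
    )
    for name, text in [("hand-written", hand_written), ("VDSM", vdsm_generated)]:
        problem = mtu_needs_connected_mode(parse_ifcfg(text))
        print(f"{name}: {'MTU needs CONNECTED_MODE=yes' if problem else 'ok'}")
```

Run against the hand-written file and the VDSM-generated file quoted under question 1, it reports the hand-written one as ok and flags the VDSM-generated one, which matches the "out-of-sync" MTU symptom described in the post.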
During the web UI GlusterFS setup I used the ovirtstor host names, and the trusted pool was created:

5a9a0a5f-12f4-48b1-bfbe-24c172adc65c ovirtstor5.miac Connected
41350da9-c944-41c5-afdc-46ff51ab93f6 ovirtstor6.miac Connected
0f50175e-7e47-4839-99c7-c7ced21f090c localhost Connected

Then I logged in to the web administration console and added the two other hosts by their names:

Name        Hostname/IP  Cluster  Data Center  Status  SPM
ovirtnode1  ovirtnode1   Default  Default      Up      SPM
ovirtnode5  ovirtnode5   Default  Default      Up      Normal

For this setup I have some questions:

1. Where in the web UI can I configure that I want to use a "storage network"? I tried to create a second network (Network -> Networks -> New), but VDSM overwrites the ifcfg-ib0 file without those properties, as if it were an Ethernet-like interface:

Generated by VDSM version 4.20.43-1.el7
DEVICE=ib0
ONBOOT=yes
IPADDR=172.16.100.5
NETMASK=255.255.255.0
BOOTPROTO=none
MTU=65520
DEFROUTE=no
NM_CONTROLLED=no
IPV6INIT=no

I entered the MTU by hand in the General -> MTU -> Custom field, but it cannot be applied without the "CONNECTED_MODE=yes" property, and now Networks -> "storage" -> Hosts always shows it as "out-of-sync". "Custom properties" are greyed out and unavailable.

2. If I check the "VM network" checkbox when creating the network and then use "Setup Host Networks" to attach it to the ib0 interface, the whole engine hangs. I think this is because it tries to bridge the InfiniBand interface, which cannot be done (I see only "1 task running", which never finishes, and no other interface can show any details).

3. oVirt also tries to start sending LLDP TLVs on interface ib0, but this fails:

Nov 6 17:30:01 ovirtnode5 systemd: Starting Link Layer Discovery Protocol Agent Daemon....
Nov 6 17:30:01 ovirtnode5 kernel: bnx2x: [bnx2x_dcbnl_set_dcbx:2383(enp59s0f0)]Requested DCBX mode 5 is beyond advertised capabilities
Nov 6 17:30:02 ovirtnode5 systemd: Started /sbin/ifup ib0.
Nov 6 17:30:02 ovirtnode5 systemd: Starting /sbin/ifup ib0.
Nov 6 17:30:02 ovirtnode5 kernel: IPv6: ADDRCONF(NETDEV_UP): ib0: link is not ready
Nov 6 17:30:02 ovirtnode5 NetworkManager[1650]: <info> [1541511002.9642] device (ib0): carrier: link connected
Nov 6 17:30:02 ovirtnode5 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready
Nov 6 17:30:02 ovirtnode5 lldpad: setsockopt nearest_bridge: Invalid argument
Nov 6 17:30:41 ovirtnode5 vdsm[127585]: ERROR Internal server error
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 606, in _handle_request
    res = method(**params)
  File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 193, in _dynamicMethod
    result = fn(*methodArgs)
  File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 1561, in getLldp
    info=supervdsm.getProxy().get_lldp_info(filter))
  File "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 55, in __call__
    return callMethod()
  File "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 53, in <lambda>
    **kwargs)
  File "<string>", line 2, in get_lldp_info
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 773, in _callmethod
    raise convert_to_error(kind, result)
TlvReportLldpError: (1, 'Agent instance for device not found \n', '', 'ib0')

4. The Gluster volumes are in a strange state. I used "Import Domain" (Storage -> Domains) for the volumes that the web UI installer created in the first step, but the "Host to Use" drop-down list contains only the host names as they were added to the cluster, i.e. ovirtnode1, ovirtnode{N}. There is also a red message: "For data integrity make sure that the server is configured with Quorum (both client and server Quorum)" (as far as I can see, the Cockpit web UI installer configured this). However, the imported volumes are shown with "1 Up 2 Down" bricks, and on the host only the "localhost" bricks are shown as "Online". The logs contain these messages:

Nov 6 10:24:24 ovirtnode5 systemd: Started GlusterFS, a clustered file-system server.
Nov 6 10:24:24 ovirtnode5 glusterd[229325]: [2018-11-06 06:24:24.404149] C [MSGID: 106003] [glusterd-server-quorum.c:354:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume data. Starting local bricks.
Nov 6 10:24:24 ovirtnode5 glusterd[229325]: [2018-11-06 06:24:24.450356] C [MSGID: 106003] [glusterd-server-quorum.c:354:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume engine. Starting local bricks.
Nov 6 10:24:24 ovirtnode5 glusterd[229325]: [2018-11-06 06:24:24.503677] C [MSGID: 106003] [glusterd-server-quorum.c:354:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume vmstore. Starting local bricks.

What does this mean? If quorum was "regained", shouldn't all bricks be started?

When I try to create a volume (Storage -> Volumes -> New) and press "Add Bricks", there is a similar "Bricks Host" drop-down containing only the "ovirtnode" names, not the "ovirtstor" InfiniBand interfaces. If I use one of them, the operation fails with an error like "This host is not in the trusted pool". That is true: the trusted pool contains the names of the other interface.

What is the right way to configure this? Is there some kind of getting-started guide with configuration steps for this case? I found https://www.ovirt.org/documentation/quickstart/quickstart-guide/ but it is also out of date: for example, it does not describe GlusterFS storage domains, has old screenshots (from the previous UI version), etc.

I cannot find any relevant documentation in https://www.ovirt.org/documentation/admin-guide/ either. For example, the section "Explanation of Settings in the Manage Networks Window" in https://www.ovirt.org/documentation/admin-guide/chap-Logical_Networks/ does not mention the "Gluster network" role at all...

Many links point to non-existent pages. For example, "For more information on these parameters, see Explanation of bridge opts Parameters." links to https://www.ovirt.org/documentation/admin-guide/chap-Logical_Networks/Explan...
which returns "404 Not found :( Sorry, but the page you were trying to view does not exist." (This is just one example; there are MANY such 404 links/pages.)

--
Mike
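Regarding question 3 above: lldpad fails on ib0 ("setsockopt nearest_bridge: Invalid argument") and VDSM's getLldp raises TlvReportLldpError for that device. LLDP is an IEEE 802 link-layer protocol, and an IPoIB interface is not an Ethernet link, so one hedged diagnostic is to list each interface's link-layer type. This sketch assumes Linux sysfs, where /sys/class/net/<iface>/type holds the kernel's ARPHRD value (1 for Ethernet, 32 for InfiniBand, 772 for loopback). The helper names are hypothetical, and the script only reads sysfs; it does not query or reconfigure lldpad or VDSM.

```python
"""Sketch: classify network interfaces by link-layer type via sysfs."""
import os

# ARPHRD values from <linux/if_arp.h>; anything else is reported as "other".
ARPHRD_NAMES = {1: "ethernet", 32: "infiniband", 772: "loopback"}


def link_types(sysfs_net="/sys/class/net"):
    """Map each interface name to a link-type label read from <iface>/type."""
    result = {}
    if not os.path.isdir(sysfs_net):
        return result  # not Linux, or sysfs not mounted
    for iface in sorted(os.listdir(sysfs_net)):
        type_path = os.path.join(sysfs_net, iface, "type")
        try:
            with open(type_path) as fh:
                arphrd = int(fh.read().strip())
        except (OSError, ValueError):
            continue  # e.g. bonding_masters is a file, not an interface dir
        result[iface] = ARPHRD_NAMES.get(arphrd, f"other({arphrd})")
    return result


def lldp_candidates(types):
    """Interfaces on which an Ethernet-based protocol such as LLDP applies."""
    return [iface for iface, kind in types.items() if kind == "ethernet"]


if __name__ == "__main__":
    types = link_types()
    for iface, kind in types.items():
        print(f"{iface}: {kind}")
    print("LLDP candidates:", ", ".join(lldp_candidates(types)) or "none")
```

On the hosts described here one would expect enp59s0f0 to be reported as ethernet and ib0 as infiniband, i.e. ib0 would not be an LLDP candidate.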

07.11.2018 12:27, Mike Lykov wrote:
> 4. The Gluster volumes are in a strange state:
> When I try to create a volume (Storage -> Volumes -> New) and press "Add Bricks", there is a similar "Bricks Host" drop-down containing only the "ovirtnode" names, not the "ovirtstor" InfiniBand interfaces. If I use one of them, the operation fails with an error like "This host is not in the trusted pool". That is true: the trusted pool contains the names of the other interface.
> What is the right way to configure this?
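As the quoted text itself notes, the "not in the trusted pool" error reflects a name mismatch: the Bricks Host drop-down offers the management names (ovirtnode{N}), while the pool was formed with the storage names (ovirtstor{N}.miac). The sketch below makes that check explicit, assuming the text of `gluster pool list` as input (UUID, Hostname, State columns, as in the pool listing in the first message). The ovirtnode{N} -> ovirtstor{N}.miac mapping is a hypothetical helper derived from this setup's DNS naming, not something oVirt or Gluster provides.

```python
"""Sketch: check whether a brick host name is in the Gluster trusted pool."""
import re


def parse_pool_list(text):
    """Return {hostname: state} from `gluster pool list` output."""
    peers = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] == "UUID":
            continue  # skip the header and blank lines
        hostname, state = fields[1], " ".join(fields[2:])
        peers[hostname] = state
    return peers


def storage_name_for(mgmt_host, domain="miac"):
    """Hypothetical mapping for this setup: ovirtnode5 -> ovirtstor5.miac."""
    match = re.fullmatch(r"ovirtnode(\d+)", mgmt_host)
    if not match:
        return None
    return f"ovirtstor{match.group(1)}.{domain}"


def in_trusted_pool(host, peers):
    """True if `host` is listed as a connected peer.

    The node the command ran on appears as "localhost", so its own
    storage name cannot be confirmed from this output alone.
    """
    return peers.get(host) == "Connected"


if __name__ == "__main__":
    # Pool listing as quoted in the first message.
    pool_text = (
        "UUID Hostname State\n"
        "5a9a0a5f-12f4-48b1-bfbe-24c172adc65c ovirtstor5.miac Connected\n"
        "41350da9-c944-41c5-afdc-46ff51ab93f6 ovirtstor6.miac Connected\n"
        "0f50175e-7e47-4839-99c7-c7ced21f090c localhost Connected\n"
    )
    peers = parse_pool_list(pool_text)
    for ui_host in ["ovirtnode1", "ovirtnode5"]:
        stor = storage_name_for(ui_host)
        print(f"{ui_host}: in pool={in_trusted_pool(ui_host, peers)}; "
              f"{stor}: in pool={in_trusted_pool(stor, peers)}")
```

With that listing, neither management name is found; ovirtstor5.miac is found, while ovirtstor1.miac is not, presumably because the listing was taken on that node and it appears as localhost.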
Update: I created a network "storage", unchecked "required" and checked the "migrate" and "gluster" roles. Then I attached this network to interface ib0 on the hosts in "Setup Host Networks" (but its state is out-of-sync because the MTU differs between the DC configuration and the real host, see my previous post). However, engine.log shows this message:

2018-11-08 17:19:23,406+04 WARN [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn] (DefaultQuartzScheduler7) [70a2eb8a] Could not associate brick 'ovirtstor1.miac:/gluster_bricks/engine/engine' of volume '77d6bcb1-244d-4319-b3f0-e4eb73a9206c' with correct network as no gluster network found in cluster 'ea3c5a62-de76-11e8-9238-00163e062063'

Why "no gluster network found in cluster"? Is it because the network is out-of-sync? The network is shown as UP in the web UI...
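To see which bricks are affected by the "no gluster network found in cluster" warning, the engine log can be scanned for that message. This sketch assumes the exact wording of the WARN line quoted above (other engine versions may phrase it differently) and groups the affected bricks by cluster ID. It only reads log text; it does not explain why the engine considers the cluster to have no gluster network.

```python
"""Sketch: list bricks the engine could not associate with a gluster network."""
import re
from collections import defaultdict

# Pattern built from the WARN message quoted in this post (assumed wording).
UNASSOCIATED_BRICK = re.compile(
    r"Could not associate brick '(?P<host>[^:']+):(?P<path>[^']+)' "
    r"of volume '(?P<volume>[0-9a-f-]+)' with correct network as no gluster "
    r"network found in cluster '(?P<cluster>[0-9a-f-]+)'"
)


def unassociated_bricks(lines):
    """Group (host, path, volume) tuples by cluster id; duplicates dropped."""
    by_cluster = defaultdict(set)
    for line in lines:
        match = UNASSOCIATED_BRICK.search(line)
        if match:
            by_cluster[match["cluster"]].add(
                (match["host"], match["path"], match["volume"])
            )
    # Sorted lists give stable output across runs.
    return {cluster: sorted(bricks) for cluster, bricks in by_cluster.items()}
```

Usage, assuming the default engine log location: `with open("/var/log/ovirt-engine/engine.log", errors="replace") as log: print(unassociated_bricks(log))`. Since the warning repeats on every polling cycle, de-duplicating per cluster keeps the result readable.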