On Fri, May 3, 2019 at 8:14 PM Todd Barton <tcbarton@ipvoicedatasystems.com> wrote:
Simone,

It appears routing to 192.168.122.13 breaks during the final stage of deployment.  After the final stage fails, I can restart the hosted-engine VM from the cockpit and ping 192.168.122.13 from the Host again.  If I retry the final stage of deployment, 192.168.122.13 stops being reachable from the Host again during that process.  Below are two ping commands...the first is after the deploy failure (screenshot in my previous email) and the second is after force-restarting the hosted-engine VM.

[root@ovirt-dr-standalone ~]# ping 192.168.122.13
PING 192.168.122.13 (192.168.122.13) 56(84) bytes of data.
From 192.168.122.1 icmp_seq=21 Destination Host Unreachable
From 192.168.122.1 icmp_seq=22 Destination Host Unreachable
From 192.168.122.1 icmp_seq=23 Destination Host Unreachable
From 192.168.122.1 icmp_seq=24 Destination Host Unreachable
From 192.168.122.1 icmp_seq=25 Destination Host Unreachable
From 192.168.122.1 icmp_seq=26 Destination Host Unreachable
From 192.168.122.1 icmp_seq=27 Destination Host Unreachable
^C
--- 192.168.122.13 ping statistics ---
40 packets transmitted, 0 received, +7 errors, 100% packet loss, time 39041ms
pipe 4

[root@ovirt-dr-standalone ~]# ping 192.168.122.13
PING 192.168.122.13 (192.168.122.13) 56(84) bytes of data.
64 bytes from 192.168.122.13: icmp_seq=2 ttl=64 time=0.560 ms
64 bytes from 192.168.122.13: icmp_seq=3 ttl=64 time=0.592 ms
64 bytes from 192.168.122.13: icmp_seq=4 ttl=64 time=0.345 ms
64 bytes from 192.168.122.13: icmp_seq=5 ttl=64 time=0.265 ms
64 bytes from 192.168.122.13: icmp_seq=6 ttl=64 time=0.374 ms
64 bytes from 192.168.122.13: icmp_seq=7 ttl=64 time=0.390 ms
64 bytes from 192.168.122.13: icmp_seq=8 ttl=64 time=0.635 ms
64 bytes from 192.168.122.13: icmp_seq=9 ttl=64 time=0.466 ms
64 bytes from 192.168.122.13: icmp_seq=10 ttl=64 time=0.376 ms
64 bytes from 192.168.122.13: icmp_seq=11 ttl=64 time=0.435 ms
64 bytes from 192.168.122.13: icmp_seq=12 ttl=64 time=0.567 ms
64 bytes from 192.168.122.13: icmp_seq=13 ttl=64 time=0.442 ms
64 bytes from 192.168.122.13: icmp_seq=14 ttl=64 time=0.402 ms
^C
--- 192.168.122.13 ping statistics ---
14 packets transmitted, 13 received, 7% packet loss, time 13000ms
rtt min/avg/max/mdev = 0.265/0.449/0.635/0.108 ms


This appears to be roughly the same issue as during the Prepare VM stage...the network setup goes haywire while the deployment is making changes.  Maybe this is something caused by the virtualization setup, but I've read about others running oVirt hosts in VMs (e.g. https://www.ovirt.org/blog/2018/02/up-and-running-with-ovirt-4-2-and-gluster-storage.html).

I tried reproducing this with oVirt Node nested over KVM and everything worked as expected for me.
Honestly, I'm not a Hyper-V expert, but I'd suggest trying to change something (I'm not sure exactly what) about the network definition on the Hyper-V side.
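
If it helps to narrow down where the traffic dies, something like the following on the host while the final stage is running should show whether ARP/ICMP for 192.168.122.13 ever gets answered on the libvirt bridge. This is only a sketch: it assumes tcpdump is available on the node and that the bootstrap VM's tap device is vnet0 (you can confirm the device name with "virsh -r domiflist HostedEngineLocal").

# host side of the default libvirt network
tcpdump -ni virbr0 arp or icmp
# VM side of the bridge (tap device name is an assumption)
tcpdump -ni vnet0 arp or icmp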

 

Any suggestions?  I'm getting to the point where I may need to throw in the towel on this setup, but it would be greatly advantageous to have a VM lab so I can test changes/upgrades.  I would love to find a way to make this work.
 
Todd Barton



---- On Fri, 03 May 2019 12:04:47 -0400 Simone Tiraboschi <stirabos@redhat.com> wrote ----



On Fri, May 3, 2019 at 5:27 PM Todd Barton <tcbarton@ipvoicedatasystems.com> wrote:

Simone/Dominik,

Double reply below and more info with latest attempt.

---------------

Simone...answers to your questions, using CAPS to make my responses easier to see/read.

"If I correctly understood you environment you have:
- A pfsense software firewall/router on 10.1.1.1
- Your host on 10.1.1.61
- You are accessing cockpit from a browser running on a machine on 10.1.1.101 on the same subnet"

YES

"And the issue is that once the engine created the management bridge, your client machine on 10.1.1.101 wasn't able anymore to reach your host on 10.1.1.61. Am I right?"

YES

"In this case the default gateway or other routes should't be an issue since your client is inside the same subnet."

CORRECT, IMO

"Do you think we are loosing some piece of your network configuration creating the management bridge such as a custom MTU or a VLAN id or something like that?"
 
NO CUSTOM SETUP HERE...RUNNING A PLAIN/BASIC NETWORK

"Do you think pfsense can start blocking/dropping the traffic for any reason?"

NO, ONLY USING PFSENSE TO PROVIDE DHCP AND DNS IN TEST/LAB ENVIRONMENT.

"Todd, another hint: how did you exposed cockpit over firewalld?
Maybe something in firewalld zone configuration?"

I'M USING THE OVIRT NODE MINIMAL/UTILITY INSTALL AND I DIDN'T DO ANY CUSTOMIZATION OUTSIDE OF THE INSTALL/SETUP.  IF IT'S FIREWALLD, THEN IT'S NOT SOMETHING THE NODE SETUP DID PROPERLY.  I CAN SEND CONFIG INFO IF YOU WOULD LIKE TO SEE IT, BUT THERE ARE MORE DEVELOPMENTS/INFO FURTHER DOWN IN THE EMAIL FROM ADDITIONAL EFFORTS.
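
FOR REFERENCE, THIS IS ROUGHLY WHAT I CAN RUN TO CAPTURE THE FIREWALLD STATE IF YOU WANT TO SEE IT (JUST A SKETCH, ASSUMING THE STOCK ZONE LAYOUT FROM THE NODE INSTALL):

firewall-cmd --get-active-zones
firewall-cmd --list-all                               # services/ports in the default zone; cockpit should be listed
firewall-cmd --list-all-zones > firewalld-zones.txt   # full dump to attach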

---------------

Dominik,

It is a virtualized setup, as I'm testing the install/setup of this version of oVirt/Node mainly as a lab/testing environment, but I was hoping to use this environment as a temporary data center to keep critical VMs running, if necessary, while performing the reinstall/rebuild of my physical system.  I'm running this in Hyper-V with the virtual switch/network set up as a private network with no restrictions from what I can see.  The VMs are set up to allow MAC spoofing and the MAC addresses are all unique.

The virtualization could be the culprit, but I have no idea how/why it would be causing a problem at this specific point of the install.

See new info below and I'm curious about your thoughts on this issue.

---------------

**** Additional info from "Redeploy" option ****

To see what would happen, I attempted the "Redeploy" option in the cockpit after the reboot described in my previous email.  Upon redeploying with the same settings, it made it through the "Prepare VM" stage.  I didn't manually run a hosted-engine cleanup command, but it looked like the deploy script cleaned up everything before attempting the redeployment.  I've repeated this behavior twice, so for some reason the redeployment works after the first failure.
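
For reference, the manual cleanup I skipped would presumably have been the following (assuming the tool is still shipped under this name on the 4.3 node image):

/usr/sbin/ovirt-hosted-engine-cleanup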

Continuing on to specify the "storage" settings to finalize deployment, it failed at "Obtain SSO token using username/password credentials" because it couldn't connect to the engine.  HostedEngineLocal is running on the Host with 192.168.122.13 as its IP (the temp IP).  Trying to ping that address from the Host gets a "Destination Host Unreachable" from 192.168.122.1.  Logging into the HostedEngineLocal console from the cockpit, I can't ping 192.168.122.13 either, getting the same unreachable message from within the hosted engine,

This looks really strange to me.
I'd suggest focusing on this.
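
A few things worth checking on the host while the bootstrap VM is up, just as a starting point (the VM name and device names below are what I'd expect, please adjust to what you actually have):

virsh -r net-list --all               # the 'default' libvirt network should be active
virsh -r net-dumpxml default          # should show 192.168.122.1/24 on virbr0
virsh -r domiflist HostedEngineLocal  # which tap device/bridge the VM NIC is plugged into
ip addr show virbr0                   # host side of the default libvirt network
ip neigh show dev virbr0              # does an entry for 192.168.122.13 ever appear?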
 
but I can ping the Host address 10.1.1.61 from within the hosted-engine.

I'm assuming the hosted-engine should still be on this temp/private IP until after completion of the HE deployment. 

Until about 85% of the process.
Then it will shut down the local bootstrap VM (running on the default NATted libvirt network) and transfer the content of its disk over the disk created by the engine on the shared storage.
The last step is simply activating ovirt-ha-agent to let it start the engine VM from the shared storage as expected.
The final engine VM will be attached to the management bridge created by the deployment process.
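
If you get that far, you can follow the handover from the host with something like this (a sketch; it only makes sense once ovirt-ha-agent has actually been activated):

hosted-engine --vm-status
journalctl -u ovirt-ha-agent -u ovirt-ha-broker -f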
 
It seems like if I could make 192.168.122.13 routable from the Host, I could get this to work.  Any advice on how to fix this?...I don't understand why I can't ping 192.168.122.13 from anywhere.

The default libvirt network has its own bridge; the host has address 192.168.122.1 there and should be able to communicate with the VM running on 192.168.122.13.
Other machines are not required to communicate with the engine during the deployment, so we are not routing 192.168.122.0/24 nor masquerading it for NAT traversal.
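
As a sanity check, the host itself should still have a directly connected route to that subnet; roughly what I'd expect to see (the exact line is an assumption based on the default libvirt network):

ip route show | grep 192.168.122
# expected something like: 192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1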
 


I've attached logs and the output of the ip info commands from the Host, as well as screenshots from the cockpit, including the storage/final deployment error and basic hosted-engine networking info.

Thanks,

Todd B.









---- On Thu, 02 May 2019 04:51:34 -0400 Dominik Holler <dholler@redhat.com> wrote ----

On Thu, 2 May 2019 09:57:08 +0200
Simone Tiraboschi <stirabos@redhat.com> wrote:

> On Thu, May 2, 2019 at 5:22 AM Todd Barton <tcbarton@ipvoicedatasystems.com>
> wrote:
>
> > Didi,
> >
> > I was able to carve out some time to attempt the original basic setup
> > again this evening. The result was similar to my original post. During HE
> > deployment, in the process of waiting for the host to come up (cockpit
> > message), the networking is disrupted while building the bridged network
> > and the host becomes unreachable.
> >
> > In this state, I can't ping the host from an external machine and
> > ping/nslookup is non-functional from within the host. Nslookup returns
> > "connection timed out; no servers could be reached". The networking appears
> > to be completely down although various commands make it appear operational.
> >
> > Upon rebooting the Host (the host locked up on the reboot attempt and needed
> > to be reset), the message "libvirt-guests is configured not to
> > start any guests on boot" appears. After the reboot, the cockpit becomes
> > responsive again and logging in displays "This system is already
> > registered ovirt-dr-he-standalone.ipvoicedatasystems.lan!" with a
> > "Redeploy" button. Looking at the networking setup in cockpit, it appears
> > the "ovirtmgmt" network is set up, but the hosted engine did not complete
> > deployment and startup. The /etc/hosts file still contains the temporary IP
> > address used in deployment and a HostedEngineLocal is listed under virtual
> > machines, but it is not running.
> >
> > Please advise with any help/input on why this is happening. *Your help
> > is much appreciated.*
> >
> >
> > Here are the settings and diagnostic info/logs.
> >
> > This is a single-host hyper-converged setup for lab testing.
> >
> > - Host behind pfsense firewall with gateway IP address 10.1.1.1/24. The
> > Host machine and the machine accessing the cockpit from IP address
> > 10.1.1.101 are the only devices on the subnet (other than the router). It
> > really can't get any simpler.
> >
> > - Host setup with single nic eth0
> > - Hostname is setup as fully FQDN on Host
> > - Static IP setup on Host with gateway and DNS server set to 10.1.1.1
> > - FQDNs confirmed resolvable on subnet via dns server at 10.1.1.1 in
> > pfsense
> > Host = ovirt-dr-standalone.ipvoicedatasystems.lan , IP = 10.1.1.61
> > Hosted Engine VM = ovirt-dr-he-standalone.ipvoicedatasystems.lan ,
> > IP = 10.1.1.60
> >
> > - Gluster portion of cockpit setup installed as expected without problems
> >
> >
> Everything defined here looks OK to me.
>
>
> > - Hosted-Engine cockpit deployment executed with settings in attached
> > screen shots.
> > - Hosted engine setup and vdsm logs are attached in zip before the reboot.
> > - Other network info captured in text files included in zip.
> > - Screen shot of post reboot network setup in cockpit.
> >
> >
> According to VDSM logs
>
> setupNetworks got executed here:
>
> 2019-05-01 20:22:14,656-0400 INFO (jsonrpc/0) [api.network] START
> setupNetworks(networks={u'ovirtmgmt': {u'ipv6autoconf': True, u'nic':
> u'eth0', u'ipaddr': u'10.1.1.61', u'switch': u'legacy', u'mtu': 1500,
> u'netmask': u'255.255.255.0', u'dhcpv6': False, u'STP': u'no', u'bridged':
> u'true', u'gateway': u'10.1.1.1', u'defaultRoute': True}}, bondings={},
> options={u'connectivityCheck': u'true', u'connectivityTimeout': 120,
> u'commitOnSuccess': False}) from=::ffff:192.168.122.13,47544,
> flow_id=2e7d10f2 (api:48)
>
> and it successfully completed at:
> 2019-05-01 20:22:22,904-0400 INFO (jsonrpc/6) [jsonrpc.JsonRpcServer] RPC
> call Host.confirmConnectivity succeeded in 0.00 seconds (__init__:312)
> 2019-05-01 20:22:22,916-0400 INFO (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC
> call Host.confirmConnectivity succeeded in 0.00 seconds (__init__:312)
> 2019-05-01 20:22:22,917-0400 INFO (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC
> call Host.confirmConnectivity succeeded in 0.00 seconds (__init__:312)
> 2019-05-01 20:22:23,469-0400 INFO (jsonrpc/2) [jsonrpc.JsonRpcServer] RPC
> call Host.confirmConnectivity succeeded in 0.00 seconds (__init__:312)
> 2019-05-01 20:22:23,583-0400 INFO (jsonrpc/0) [api.network] FINISH
> setupNetworks return={'status': {'message': 'Done', 'code': 0}}
> from=::ffff:192.168.122.13,47544, flow_id=2e7d10f2 (api:54)
> 2019-05-01 20:22:23,583-0400 INFO (jsonrpc/0) [jsonrpc.JsonRpcServer] RPC
> call Host.setupNetworks succeeded in 8.93 seconds (__init__:312)
> 2019-05-01 20:22:24,033-0400 INFO (jsonrpc/3) [jsonrpc.JsonRpcServer] RPC
> call Host.confirmConnectivity succeeded in 0.01 seconds (__init__:312)
>
> Host.confirmConnectivity means that, after setupNetworks, the bootstrap
> engine VM was still able to reach the host as expected.
>
>
> and indeed after that we see:
>
> 2019-05-01 20:22:24,051-0400 INFO (jsonrpc/4) [api.host] START
> getCapabilities() from=::ffff:192.168.122.13,47544, flow_id=2e7d10f2
> (api:48)
> ...
> 2019-05-01 20:22:25,557-0400 INFO (jsonrpc/4) [api.host] FINISH
> getCapabilities return={'status': {'message': 'Done', 'code': 0}, 'info':
> {u'HBAInventory': {u'iSCSI': [{u'InitiatorName':
> u'iqn.1994-05.com.redhat:79982989d81e'}], u'FC': []}, u'packages2':
> {u'kernel': {u'release': u'957.10.1.el7.x86_64', u'version': u'3.10.0'},
> u'glusterfs-rdma': {u'release': u'2.el7', u'version': u'5.3'},
> u'glusterfs-fuse': {u'release': u'2.el7', u'version': u'5.3'},
> u'spice-server': {u'release': u'6.el7_6.1', u'version': u'0.14.0'},
> u'librbd1': {u'release': u'4.el7', u'version': u'10.2.5'}, u'vdsm':
> {u'release': u'1.el7', u'version': u'4.30.11'}, u'qemu-kvm': {u'release':
> u'18.el7_6.3.1', u'version': u'2.12.0'}, u'openvswitch': {u'release':
> u'3.el7', u'version': u'2.10.1'}, u'libvirt': {u'release': u'10.el7_6.4',
> u'version': u'4.5.0'}, u'ovirt-hosted-engine-ha': {u'release': u'1.el7',
> u'version': u'2.3.1'}, u'qemu-img': {u'release': u'18.el7_6.3.1',
> u'version': u'2.12.0'}, u'mom': {u'release': u'1.el7.centos', u'version':
> u'0.5.12'}, u'glusterfs': {u'release': u'2.el7', u'version': u'5.3'},
> u'glusterfs-cli': {u'release': u'2.el7', u'version': u'5.3'},
> u'glusterfs-server': {u'release': u'2.el7', u'version': u'5.3'},
> u'glusterfs-geo-replication': {u'release': u'2.el7', u'version': u'5.3'}},
> u'numaNodeDistance': {u'0': [10]}, u'cpuModel': u'Intel(R) Xeon(R) CPU
> X5675 @ 3.07GHz', u'nestedVirtualization': False, u'liveMerge':
> u'true', u'hooks': {u'after_vm_start': {u'openstacknet_utils.py': {u'md5':
> u'1ed38ddf30f8a9c7574589e77e2c0b1f'}, u'50_openstacknet': {u'md5':
> u'ea0a5a715da8c1badbcda28e8b8fa00e'}}, u'after_network_setup':
> {u'30_ethtool_options': {u'md5': u'ce1fbad7aa0389e3b06231219140bf0d'}},
> u'after_vm_destroy': {u'delete_vhostuserclient_hook': {u'md5':
> u'c2f279cc9483a3f842f6c29df13994c1'}, u'50_vhostmd': {u'md5':
> u'bdf4802c0521cf1bae08f2b90a9559cf'}}, u'before_vm_start':
> {u'50_hostedengine': {u'md5': u'95c810cdcfe4195302a59574a5148289'},
> u'50_vhostmd': {u'md5': u'9206bc390bcbf208b06a8e899581be2d'}},
> u'after_device_migrate_destination': {u'openstacknet_utils.py': {u'md5':
> u'1ed38ddf30f8a9c7574589e77e2c0b1f'}, u'50_openstacknet': {u'md5':
> u'6226fbc4d1602994828a3904fc1b875d'}},
> u'before_device_migrate_destination': {u'50_vmfex': {u'md5':
> u'49caba1a5faadd8efacef966f79bc30a'}}, u'after_get_caps':
> {u'openstacknet_utils.py': {u'md5': u'1ed38ddf30f8a9c7574589e77e2c0b1f'},
> u'50_openstacknet': {u'md5': u'5c3a9ab6e06e039bdd220c0216e45809'},
> u'ovirt_provider_ovn_hook': {u'md5': u'4c4b1d2d5460e6a65114ae36cb775df6'}},
> u'after_nic_hotplug': {u'openstacknet_utils.py': {u'md5':
> u'1ed38ddf30f8a9c7574589e77e2c0b1f'}, u'50_openstacknet': {u'md5':
> u'6226fbc4d1602994828a3904fc1b875d'}}, u'before_vm_migrate_destination':
> {u'50_vhostmd': {u'md5': u'9206bc390bcbf208b06a8e899581be2d'}},
> u'after_device_create': {u'openstacknet_utils.py': {u'md5':
> u'1ed38ddf30f8a9c7574589e77e2c0b1f'}, u'50_openstacknet': {u'md5':
> u'6226fbc4d1602994828a3904fc1b875d'}}, u'before_vm_dehibernate':
> {u'50_vhostmd': {u'md5': u'9206bc390bcbf208b06a8e899581be2d'}},
> u'before_nic_hotplug': {u'50_vmfex': {u'md5':
> u'49caba1a5faadd8efacef966f79bc30a'}, u'openstacknet_utils.py': {u'md5':
> u'1ed38ddf30f8a9c7574589e77e2c0b1f'}, u'50_openstacknet': {u'md5':
> u'ec8d5d7ba063a109f749cd63c7a2b58d'},
> u'20_ovirt_provider_ovn_vhostuser_hook': {u'md5':
> u'a8af653b7386c138b2e6e9738bd6b62c'}, u'10_ovirt_provider_ovn_hook':
> {u'md5': u'73822988042847bab1ea832a6b9fa837'}}, u'before_network_setup':
> {u'50_fcoe': {u'md5': u'28c352339c8beef1e1b05c67d106d062'}},
> u'before_device_create': {u'50_vmfex': {u'md5':
> u'49caba1a5faadd8efacef966f79bc30a'}, u'openstacknet_utils.py': {u'md5':
> u'1ed38ddf30f8a9c7574589e77e2c0b1f'}, u'50_openstacknet': {u'md5':
> u'ec8d5d7ba063a109f749cd63c7a2b58d'},
> u'20_ovirt_provider_ovn_vhostuser_hook': {u'md5':
> u'a8af653b7386c138b2e6e9738bd6b62c'}, u'10_ovirt_provider_ovn_hook':
> {u'md5': u'73822988042847bab1ea832a6b9fa837'}}}, u'supportsIPv6': True,
> u'realtimeKernel': False, u'vmTypes': [u'kvm'], u'liveSnapshot': u'true',
> u'cpuThreads': u'12', u'kdumpStatus': 0, u'networks': {u'ovirtmgmt':
> {u'iface': u'ovirtmgmt', u'ipv6autoconf': True, u'addr': u'10.1.1.61',
> u'dhcpv6': False, u'ipv6addrs': [], u'switch': u'legacy', u'bridged': True,
> u'southbound': u'eth0', u'dhcpv4': False, u'netmask': u'255.255.255.0',
> u'ipv4defaultroute': True, u'stp': u'off', u'ipv4addrs': [u'10.1.1.61/24'],
> u'mtu': u'1500', u'ipv6gateway': u'::', u'gateway': u'10.1.1.1', u'ports':
> [u'eth0']}}, u'kernelArgs':
> u'BOOT_IMAGE=/ovirt-node-ng-4.3.2-0.20190319.0+1/vmlinuz-3.10.0-957.10.1.el7.x86_64
> root=/dev/onn/ovirt-node-ng-4.3.2-0.20190319.0+1 ro crashkernel=auto
> rd.lvm.lv=onn/ovirt-node-ng-4.3.2-0.20190319.0+1 rd.lvm.lv=onn/swap rhgb
> quiet LANG=en_US.UTF-8 img.bootid=ovirt-node-ng-4.3.2-0.20190319.0+1',
> u'domain_versions': [0, 2, 3, 4, 5], u'bridges': {u'ovirtmgmt':
> {u'ipv6autoconf': True, u'addr': u'10.1.1.61', u'dhcpv6': False,
> u'ipv6addrs': [], u'mtu': u'1500', u'dhcpv4': False, u'netmask':
> u'255.255.255.0', u'ipv4defaultroute': True, u'stp': u'off', u'ipv4addrs':
> [u'10.1.1.61/24'], u'ipv6gateway': u'::', u'gateway': u'10.1.1.1', u'opts':
> {u'multicast_last_member_count': u'2', u'vlan_protocol': u'0x8100',
> u'hash_elasticity': u'4', u'multicast_query_response_interval': u'1000',
> u'group_fwd_mask': u'0x0', u'multicast_snooping': u'1',
> u'multicast_startup_query_interval': u'3125', u'hello_timer': u'0',
> u'multicast_querier_interval': u'25500', u'max_age': u'2000', u'hash_max':
> u'512', u'stp_state': u'0', u'topology_change_detected': u'0', u'priority':
> u'32768', u'multicast_igmp_version': u'2',
> u'multicast_membership_interval': u'26000', u'root_path_cost': u'0',
> u'root_port': u'0', u'multicast_stats_enabled': u'0',
> u'multicast_startup_query_count': u'2', u'nf_call_iptables': u'0',
> u'vlan_stats_enabled': u'0', u'hello_time': u'200', u'topology_change':
> u'0', u'bridge_id': u'8000.00155d380110', u'topology_change_timer': u'0',
> u'ageing_time': u'30000', u'nf_call_ip6tables': u'0',
> u'multicast_mld_version': u'1', u'gc_timer': u'29395', u'root_id':
> u'8000.00155d380110', u'nf_call_arptables': u'0', u'group_addr':
> u'1:80:c2:0:0:0', u'multicast_last_member_interval': u'100',
> u'default_pvid': u'1', u'multicast_query_interval': u'12500',
> u'multicast_query_use_ifaddr': u'0', u'tcn_timer': u'0',
> u'multicast_router': u'1', u'vlan_filtering': u'0', u'multicast_querier':
> u'0', u'forward_delay': u'0'}, u'ports': [u'eth0']}, u'virbr0':
> {u'ipv6autoconf': False, u'addr': u'192.168.122.1', u'dhcpv6': False,
> u'ipv6addrs': [], u'mtu': u'1500', u'dhcpv4': False, u'netmask':
> u'255.255.255.0', u'ipv4defaultroute': False, u'stp': u'on', u'ipv4addrs':
> [u'192.168.122.1/24'], u'ipv6gateway': u'::', u'gateway': u'', u'opts':
> {u'multicast_last_member_count': u'2', u'vlan_protocol': u'0x8100',
> u'hash_elasticity': u'4', u'multicast_query_response_interval': u'1000',
> u'group_fwd_mask': u'0x0', u'multicast_snooping': u'1',
> u'multicast_startup_query_interval': u'3125', u'hello_timer': u'138',
> u'multicast_querier_interval': u'25500', u'max_age': u'2000', u'hash_max':
> u'512', u'stp_state': u'1', u'topology_change_detected': u'0', u'priority':
> u'32768', u'multicast_igmp_version': u'2',
> u'multicast_membership_interval': u'26000', u'root_path_cost': u'0',
> u'root_port': u'0', u'multicast_stats_enabled': u'0',
> u'multicast_startup_query_count': u'2', u'nf_call_iptables': u'0',
> u'vlan_stats_enabled': u'0', u'hello_time': u'200', u'topology_change':
> u'0', u'bridge_id': u'8000.5254008ac0fb', u'topology_change_timer': u'0',
> u'ageing_time': u'30000', u'nf_call_ip6tables': u'0',
> u'multicast_mld_version': u'1', u'gc_timer': u'4000', u'root_id':
> u'8000.5254008ac0fb', u'nf_call_arptables': u'0', u'group_addr':
> u'1:80:c2:0:0:0', u'multicast_last_member_interval': u'100',
> u'default_pvid': u'1', u'multicast_query_interval': u'12500',
> u'multicast_query_use_ifaddr': u'0', u'tcn_timer': u'0',
> u'multicast_router': u'1', u'vlan_filtering': u'0', u'multicast_querier':
> u'0', u'forward_delay': u'200'}, u'ports': [u'vnet0', u'virbr0-nic']}},
> u'uuid': u'cb4aee34-27aa-064d-aaf1-2c27871125bc', u'onlineCpus':
> u'0,1,2,3,4,5,6,7,8,9,10,11', u'nameservers': [u'10.1.1.1'], u'nics':
> {u'eth0': {u'ipv6autoconf': False, u'addr': u'', u'speed': 10000,
> u'dhcpv6': False, u'ipv6addrs': [], u'mtu': u'1500', u'dhcpv4': False,
> u'netmask': u'', u'ipv4defaultroute': False, u'ipv4addrs': [], u'hwaddr':
> u'00:15:5d:38:01:10', u'ipv6gateway': u'::', u'gateway': u''}},
> u'software_revision': u'1', u'hostdevPassthrough': u'false',
> u'clusterLevels': [u'4.1', u'4.2', u'4.3'], u'cpuFlags':
> u'fpu,vme,de,pse,tsc,msr,pae,mce,cx8,apic,sep,mtrr,pge,mca,cmov,pat,pse36,clflush,mmx,fxsr,sse,sse2,ss,ht,syscall,nx,pdpe1gb,rdtscp,lm,constant_tsc,rep_good,nopl,xtopology,eagerfpu,pni,pclmulqdq,vmx,ssse3,cx16,sse4_1,sse4_2,popcnt,aes,hypervisor,lahf_lm,ibrs,ibpb,stibp,tpr_shadow,vnmi,ept,vpid,spec_ctrl,intel_stibp,arch_capabilities,model_Opteron_G2,model_kvm32,model_coreduo,model_Conroe,model_Nehalem,model_Westmere-IBRS,model_Opteron_G1,model_core2duo,model_Nehalem-IBRS,model_qemu32,model_Penryn,model_pentium2,model_pentium3,model_qemu64,model_Westmere,model_kvm64,model_pentium,model_486',
> u'kernelFeatures': {u'RETP': 1, u'IBRS': 0, u'PTI': 1},
> u'ISCSIInitiatorName': u'iqn.1994-05.com.redhat:79982989d81e',
> u'netConfigDirty': u'True', u'selinux': {u'mode': u'1'},
> u'autoNumaBalancing': 0, u'reservedMem': u'321', u'bondings': {},
> u'software_version': u'4.30.11', u'supportedENGINEs': [u'4.1', u'4.2',
> u'4.3'], u'vncEncrypted': False, u'backupEnabled': False, u'cpuSpeed':
> u'3063.656', u'numaNodes': {u'0': {u'totalMemory': u'64248', u'cpus': [0,
> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]}}, u'cpuSockets': u'1', u'vlans': {},
> u'version_name': u'Snow Man', 'lastClientIface': 'ovirtmgmt', u'cpuCores':
> u'6', u'hostedEngineDeployed': False, u'hugepages': [1048576, 2048],
> u'guestOverhead': u'65', u'additionalFeatures': [u'libgfapi_supported',
> u'GLUSTER_SNAPSHOT', u'GLUSTER_GEO_REPLICATION',
> u'GLUSTER_BRICK_MANAGEMENT'], u'openstack_binding_host_ids':
> {u'OPENSTACK_OVN': u'ovirt-dr-standalone.ipvoicedatasystems.lan',
> u'OPEN_VSWITCH': u'ovirt-dr-standalone.ipvoicedatasystems.lan',
> u'OVIRT_PROVIDER_OVN': u'a86b72aa-c9d2-488d-b04d-1ccf4bb010e7'},
> u'kvmEnabled': u'true', u'memSize': u'64248', u'emulatedMachines':
> [u'pc-i440fx-rhel7.1.0', u'pc-q35-rhel7.3.0', u'rhel6.3.0',
> u'pc-i440fx-rhel7.5.0', u'pc-i440fx-rhel7.0.0', u'rhel6.1.0',
> u'pc-q35-rhel7.6.0', u'pc-i440fx-rhel7.4.0', u'rhel6.6.0',
> u'pc-q35-rhel7.5.0', u'rhel6.2.0', u'pc', u'pc-i440fx-rhel7.3.0', u'q35',
> u'pc-i440fx-rhel7.2.0', u'rhel6.4.0', u'pc-q35-rhel7.4.0',
> u'pc-i440fx-rhel7.6.0', u'rhel6.0.0', u'rhel6.5.0'], u'rngSources':
> [u'hwrng', u'random'], u'operatingSystem': {u'release':
> u'6.1810.2.el7.centos', u'pretty_name': u'oVirt Node 4.3.2', u'version':
> u'7', u'name': u'RHEL'}}} from=::ffff:192.168.122.13,47544,
> flow_id=2e7d10f2 (api:54)
> 2019-05-01 20:22:25,579-0400 INFO (jsonrpc/4) [jsonrpc.JsonRpcServer] RPC
> call Host.getCapabilities succeeded in 1.53 seconds (__init__:312)
> 2019-05-01 20:22:25,743-0400 INFO (jsonrpc/5) [jsonrpc.JsonRpcServer] RPC
> call Host.setSafeNetworkConfig succeeded in 0.02 seconds (__init__:312)
>
> where Host.setSafeNetworkConfig means that the engine committed the new
> network configuration since everything was fine from its point of view.
>
> And after that we also have:
> 2019-05-01 20:22:29,748-0400 INFO (jsonrpc/7) [api.host] START
> getHardwareInfo() from=::ffff:192.168.122.13,47544 (api:48)
> 2019-05-01 20:22:29,805-0400 INFO (jsonrpc/7) [api.host] FINISH
> getHardwareInfo return={'status': {'message': 'Done', 'code': 0}, 'info':
> {'systemProductName': 'Virtual Machine', 'systemUUID':
> 'cb4aee34-27aa-064d-aaf1-2c27871125bc', 'systemSerialNumber':
> '9448-7597-9700-7577-0920-4186-69', 'systemVersion': '7.0',
> 'systemManufacturer': 'Microsoft Corporation'}}
> from=::ffff:192.168.122.13,47544 (api:54)
>
>
> So I'm pretty sure that the engine VM was correctly able to talk with the
> host after the network configuration change, so, in my opinion, we should
> focus somewhere else.
>
> If I correctly understood your environment, you have:
> - A pfsense software firewall/router on 10.1.1.1
> - Your host on 10.1.1.61
> - You are accessing cockpit from a browser running on a machine on
> 10.1.1.101 on the same subnet
>
> And the issue is that once the engine created the management bridge, your
> client machine on 10.1.1.101 was no longer able to reach your host on
> 10.1.1.61. Am I right?
>
> In this case the default gateway or other routes shouldn't be an issue since
> your client is inside the same subnet.
>
> Do you think we are losing some piece of your network configuration when
> creating the management bridge, such as a custom MTU or a VLAN ID or
> something like that?
>
> Do you think pfsense can start blocking/dropping the traffic for any reason?
>
> Dominik, any hint from you?
>

Layer 3 looks good, so let's check layer 2:
I understood that the oVirt host is a VM.
Does the network interface of this VM have some kind of MAC spoofing
protection or any other kind of filtering?

Are the MAC addresses of all involved interfaces, including the ones
from the router, unique?
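
A quick way to collect them for comparison, purely as an example, is to run this on the oVirt host and inside the HostedEngineLocal console and compare the results (the router's MACs would have to be checked on the pfsense side):

ip -br link | awk '{print $1, $3}'   # interface name and MAC address, one per line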

>
>
> >
> > Regards,
> > *Todd Barton*
> >
> >
> > ---- On Wed, 01 May 2019 11:29:39 -0400 *Todd Barton
> > wrote ----
> >
> > Thanks again...I've done all the detail work back in the 3.x days and I
> > thought (and was hoping) the node/cockpit setup would make this easier to
> > get everything lined up for the HE deploy, but it is not working as
> > expected. I've followed best practices/recommendations, but realize there
> > are not absolute specifics in these recommendations...there are a lot of
> > either/or statements...which is why I was asking for recommendations. I've
> > reviewed many articles, including the "up and running" one.
> > In everything I've looked at there isn't anything new or different vs what
> > I've already done or attempted.
> >
> > I was very methodical in my initial attempts, as I was in my initial
> > install of v3.3 years ago, which took many attempts and methodical
> > configuration to get it up and set up the way I wanted. What I'm trying to
> > understand is why it's not coming up in a lab setting with what I would
> > consider to be a pretty basic setup.
> >
> > I'll get back to a basic setup and run through the process again today or
> > tomorrow and post logs of the failure.
> >
> > Regards,
> >
> > *Todd Barton*
> >
> >
> > ---- On Wed, 01 May 2019 01:50:49 -0400 *Yedidyah Bar David
> > <didi@redhat.com <didi@redhat.com>>* wrote ----
> >
> >
> >
> >
> >
> > On Tue, Apr 30, 2019 at 4:09 PM Todd Barton
> > >
> > > Thanks a bunch for the reply Didi and Simone. I will admit this last
> > setup was a bit of a wild attempt to see if I could get it working somehow, so
> > maybe it wasn't the best example to submit...and yeah, those should have been /24
> > subnets. Initially I tried the single nic setup, but the outcome seemed to
> > be the same scenario.
> > >
> > > Honestly I've run through this setup so many times in the last week it's
> > all a blur. I started messing with multiple nics in my latest attempts to see if
> > this was something specific I should do in a cockpit setup, as one of the
> > articles I read suggested multiple interfaces to separate traffic.
> > >
> > > My "production" 4.0 environment (currently a failed upgrade with a down
> > host that I can't seem to get back online) is 3 host gluster on 4 bonded
> > 1Gbps links. With the exception of the upgrade issue/failure, it has been
> > rock-solid with good performance and I've only restarted hosts on upgrades
> > in 4+ years. There are a few networking changes i would like to make in a
> > rebuild, but I wanted to test various options before implementing. Getting
> > a single nic environment was the initial goal to get started.
> > >
> > > I'm doing this testing in a virtualized setup with pfsense as the
> > firewall/router and I can set up hosts/nics however I want. I will start
> > over again with a more straightforward setup and get more data on the failure.
> > Considering I can set up the environment how I want, what would be your
> > recommended config for a single nic (or single bond) setup using cockpit?
> > Static IPs with host file resolution, DHCP with MAC-specific IPs, etc.?
> >
> > Many of these decisions are a matter of personal preference,
> > acquaintance with the relevant technologies and the tooling you have
> > around them, local needs/policies/mandates, existing infrastructure,
> > etc.
> >
> > If you search the net, e.g. for "ovirt best practices" or "RHV best
> > practices", you can find various articles etc. that can provide some
> > good guidelines/ideas.
> >
> > I suggest reading around a bit, then spending some good time on planning,
> > then carefully and systematically implementing your design, verifying
> > each step right after doing it. When you run into problems, tell us
> > :-). Ideally, IMO, you should not give up on your design due to such
> > problems and resort to workarounds, inferior (in your eyes) solutions, etc.,
> > unless you manage to find existing open bugs that describe your
> > problem and you decide you can't wait until they are solved. Instead,
> > try to fix problems, perhaps with the list members' help.
> >
> > I realize spending a week on what is in your perception a simple,
> > straightforward task, does not leave you in the best mood for such a
> > methodical next attempt. Perhaps first take a break and do something
> > else :-), then start from a clean and fresh hardware/software
> > environment and mind.
> >
> > Good luck and best regards,
> >
> > >
> > > Thank you,
> > >
> > > Todd Barton
> > >
> > >
> > >
> > >
> > > ---- On Tue, 30 Apr 2019 05:20:04 -0400 Simone Tiraboschi <
> > stirabos@redhat.com> wrote ----
> > >
> > >
> > >
> > > On Tue, Apr 30, 2019 at 9:50 AM Yedidyah Bar David <didi@redhat.com>
> > wrote:
> > >
> > > On Tue, Apr 30, 2019 at 5:09 AM Todd Barton
> > > >
> > > > I'm having to rebuild an environment that started back in the early
> > 3.x days. A lot has changed and I'm attempting to use the oVirt Node based
> > setup to build a new environment, but I can't get through the hosted engine
> > deployment process via the cockpit (I've done command line as well). I've
> > tried static DHCP addresses and static IPs, as well as confirmed I have
> > resolvable host names. This is a test environment so I can work through any
> > issues in deployment.
> > > >
> > > > When the cockpit is displaying the waiting for host to come up task,
> > the cockpit gets disconnected. It appears to happen when the bridge
> > network is set up. At that point, the deployment is messed up and I can't
> > return to the cockpit. I've tried this with one or two nic/interfaces and
> > tried every permutation of static and dynamic ip addresses. I've spent a
> > week trying different setups and I've got to be doing something stupid.
> > > >
> > > > Attached is a screen capture of the resulting IP info after my latest
> > try failing. I used two nics, one for the gluster and bridge network and
> > the other for the ovirt cockpit access. I can't access cockpit on either ip
> > address after the failure.
> > > >
> > > > I've attempted this setup as both a single host hyper-converged setup
> > and a three host hyper-converged environment...same issue in both.
> > > >
> > > > Can someone please help me or give me some thoughts on what is wrong?
> > >
> > > There are two parts here: 1. Fix it so that you can continue (and so
> > > that if it happens to you on production, you know what to do) 2. Fix
> > > the code so that it does not happen again. They are not necessarily
> > > identical (or even very similar).
> > >
> > > At the point in time of taking the screen capture:
> > >
> > > 1. Did the ovirtmgmt bridge get the IP address of the intended nic?
> > Which one?
> > >
> > > 2. Did you check routing? Default gateway, or perhaps you had/have
> > > specific other routes?
> > >
> > > 3. What nics are in the bridge? Can you check/share output of 'brctl
> > show'?
> > >
> > > 4. Probably not related, just noting: You have there (currently on
> > > eth0 and on ovirtmgmt, perhaps you tried other combinations):
> > > 10.1.2.61/16 and 10.1.1.61/16 . It seems like you wanted two different
> > > subnets, but are actually using a single one. Perhaps you intended to
> > > use 10.1.2.61/24 and 10.1.1.61/24.
> > >
> > >
> > > Good catch: the issue comes exactly from here!
> > > Please see:
> > >
> > > The issue happens when the user has two interfaces configured on the
> > same IP subnet, the default gateway is configured to be reached from one of
> > the two interfaces and the user chooses to create the management bridge on
> > the other one.
> > > When the engine, adding the host, creates the management bridge, it also
> > tries to configure the default gateway on the bridge, and for some reason
> > this disrupts the external connectivity on the host and the user is
> > going to lose it.
> > >
> > > If you intend to use one interface for gluster and the other for the
> > management network, I'd strongly suggest using two distinct subnets, with
> > the default gateway on the subnet you are going to use for the management
> > network.
> > >
> > > If you want to use two interfaces for reliability reasons, I'd strongly
> > suggest creating a bond of the two instead.
> > >
> > > Please also notice that deploying a three host hyper-converged
> > environment over a single 1 Gbps interface will be really penalizing in
> > terms of storage performance.
> > > Each write has to go to the host itself and to the two remote
> > ones, so you are going to have 1000 Mbps / 2 (external replicas) / 8
> > (bits/byte) = a max of 62.5 MB/s sustained throughput shared between all
> > the VMs, and this ignores all the overheads.
> > > In practice it will be much less, ending in a barely usable environment.
> > >
> > > I'd strongly suggest moving to a 10 Gbps environment if possible, or
> > bonding a few 1 Gbps nics for gluster.
> > >
> > >
> > > 5. Can you ping from/to these two addresses from/to some other machine
> > > on the network? Your laptop? The storage?
> > >
> > > 6. If possible, please check/share relevant logs, including (from the
> > > host) /var/log/vdsm/* and /var/log/ovirt-hosted-engine-setup/*.
> > >
> > > Thanks and best regards,
> > > --
> > > Didi
> > >
> > >
> > >
> >
> >
> > --
> > Didi
> >
> >
> >
> >












--

Simone Tiraboschi

He / Him / His

Principal Software Engineer

Red Hat

stirabos@redhat.com   
