On Thu, Aug 2, 2018 at 1:47 PM, <stuartk@alleninstitute.org> wrote:
OK, I've spent time capturing traffic from the Hosts in Cluster B back to Data Center A.  I don't believe most of the traffic matters:  syslog, snmp, icmp, influxd (grafana), ssh, cfengine

After filtering out all that, I'm left with TCP 54321 -- netstat tells me that a Python process owns this port -- I'm guessing that this daemon (vdsmd) is talking with ovirt-engine down in Data Center A.

No sign of Gluster-oriented traffic, e.g. TCP/UDP ports 24007/24008 or 49152+  (that's what I expected to see ... some sort of storage dependency between the two)
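For what it's worth, the reachability of these ports can be double-checked with a quick TCP probe rather than a full packet capture. This is a minimal sketch; `host-b1.example.org` is a hypothetical stand-in for a Cluster B host, and the Gluster brick range is capped at 100 ports for brevity:

```python
import socket

def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Ports of interest: 54321 (VDSM <-> ovirt-engine), 24007/24008
# (Gluster management), 49152+ (Gluster brick ports).
PORTS = [54321, 24007, 24008] + list(range(49152, 49252))

if __name__ == "__main__":
    host = "host-b1.example.org"  # hypothetical Cluster B host
    for port in PORTS:
        if tcp_port_open(host, port):
            print(f"{host}:{port} open")
```

Run from the engine host (or vice versa) this shows which of the expected management/storage ports actually answer across the DC link.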

So I'm back to wondering what happens when the conversation between ovirt-engine and the KVM instances is disrupted?  Does it sound plausible that bad things happen?  Or would you say that this seems unlikely ... that management functions may be disrupted, but operational functions would be unaffected?

--sk
 
https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.2/html/installation_guide/networking-requirements#host-firewall-requirements_RHV_install

Port 54321 is used by the Manager and the hosts to communicate with one another:

VDSM communications with the Manager and other virtualization hosts.

I think that if the engine in site A is not able to connect to vdsmd on a host in site B (did this happen? You only talk about disruption inside site A, but the kind of disruption isn't clear...), it should mark that host as not responsive and eventually fence it, so that it can release the VM resources (if VMs are running on it) and storage (if it is the SPM) it is carrying, and restart the VMs on other hosts.
But if all cluster B hosts become unresponsive from the engine's point of view, I don't know what the default action would be: perhaps freeze everything until something comes back?

Did you configure fencing in your clusters? If so, when you stop communication inside site A, could it affect your fencing configuration towards hosts in site B?

Gianluca