On Thu, Aug 2, 2018 at 1:47 PM, <stuartk(a)alleninstitute.org> wrote:
OK, I've spent time capturing traffic from the Hosts in Cluster B back to
Data Center A. I don't believe most of the traffic matters: syslog, snmp,
icmp, influxd (grafana), ssh, cfengine.
After filtering out all that, I'm left with TCP 54321 -- netstat tells me
that the Python interpreter owns this port -- I'm guessing that this daemon
is talking with ovirt-engine down in Data Center A.
No sign of gluster-oriented traffic, e.g. TCP/UDP ports 24007/24008 or
49152+ (that's what I expected to see ... some sort of storage dependency
between the two).
So I'm back to wondering what happens when the conversation between
ovirt-engine and the KVM instances is disrupted. Does it sound plausible that
bad things happen? Or would you say that this seems unlikely ... that
management functions may be disrupted, but operational functions would be
unaffected?
--sk
https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.2/...
Port 54321 is used by the Manager and the hosts to communicate with one
another:
VDSM communications with the Manager and other virtualization hosts.
I think that if the engine in site A is not able to connect to vdsmd on a
host in site B (did this happen? you only talk about disruption inside site
A, but the kind of disruption is not clear...), it should mark that host
as non-responsive and eventually fence it, so that it can release the VM
resources (if VMs are running on it) and the storage (if it is the SPM) the
host is holding, and restart the VMs on other hosts.
But if all cluster B hosts become unresponsive from the engine's point of
view, I don't know what the default action would be: perhaps freeze
everything until something comes back?
Did you configure fencing in your clusters? If so, when you stop
communication inside site A, could it affect your fencing configuration
towards hosts in site B?
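If it helps, one way to see what the engine currently thinks of the site B
hosts (their status, and whether power management / fencing is enabled on
them) is the engine's REST API. This is only a sketch: the engine FQDN and
the credentials are placeholders, not your real values.

```shell
# Sketch: ask the engine for each host's status and power-management state.
# engine.example.com and PASSWORD are placeholders.
curl -s -k -u 'admin@internal:PASSWORD' \
  -H 'Accept: application/json' \
  'https://engine.example.com/ovirt-engine/api/hosts' \
  | python3 -m json.tool
# In the output, check each host's "status" (e.g. "up", "non_responsive")
# and whether "power_management" -> "enabled" is true.
```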
Gianluca