
In my case, I'm wondering whether some kind of odd network issue is happening. The node that has shown up most over the last day or two is one of the two nodes running the hosted-engine HA, and it is _not_ currently hosting the engine. It seems that whenever the engine has trouble communicating with that node, the hosted-engine HA running on that node also has trouble seeing the engine.

I still can't find any actual network problem. From another physical system, I ran fping against all the nodes and the engine at a 0.2-second interval, and it showed no problem (I kept it running until an instance of the engine->node communication error had occurred as well). I'm now watching ARP traffic to see whether something is sending bad answers.

I'm pretty stumped about what to look at next.

-- 
Chris Adams <cma@cmadams.net>
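
For the ARP check, what counts as a "bad answer" is an assumption here: the sketch below treats it as one IP address being answered from more than one MAC address. It is only a minimal Python sketch. The function name and the sample addresses are hypothetical, and the (ip, mac) pairs would have to be pulled out of a packet capture separately.

```python
from collections import defaultdict


def find_conflicting_arp(replies):
    """Return {ip: sorted MACs} for every IP claimed by more than one MAC.

    `replies` is an iterable of (ip, mac) pairs taken from observed ARP
    replies. MAC addresses are compared case-insensitively.
    """
    macs_by_ip = defaultdict(set)
    for ip, mac in replies:
        macs_by_ip[ip].add(mac.lower())
    # Keep only the IPs that were answered from more than one MAC.
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}


# Made-up observations for illustration only (not from a real capture).
sample = [
    ("192.0.2.10", "52:54:00:aa:bb:01"),
    ("192.0.2.11", "52:54:00:aa:bb:02"),
    ("192.0.2.10", "52:54:00:AA:BB:01"),  # same MAC, different case
    ("192.0.2.11", "52:54:00:cc:dd:99"),  # second MAC claiming the same IP
]
print(find_conflicting_arp(sample))
```

A hit from this kind of check is only a lead, not proof of a fault, since an address can legitimately move between MACs.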