Network instability after upgrade 3.6.0 -> 3.6.1

Hello, I have one test host (only one host) with a self-hosted engine and 2 VMs (one Linux and one Windows).
After upgrading oVirt from 3.6.0 to 3.6.1, the network connection works intermittently. Every 10 minutes the HA agent restarts the hosted engine VM because it is reported as down. But the machine is up; only the network stops working for a few minutes. I activated global maintenance mode to prevent the engine from rebooting. If I ssh to the hosted engine, sometimes the connection works and sometimes it doesn't. Using a VNC connection to the engine, I see that sometimes the VM reaches the external network and sometimes it doesn't. If I run tcpdump on the physical Ethernet interface, I don't see any packets while the VM's network is down.
The same thing happens for the other two VMs.
Before the upgrade I never had network problems.
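To put numbers on "sometimes the connection works and sometimes it doesn't", one option is to probe the engine's SSH port at a fixed interval and record when it is unreachable, so the outages can be compared with the roughly 10-minute HA agent restart cycle. This is a minimal sketch, not part of oVirt: `probe`, `outage_windows` and `monitor` are hypothetical helpers, and the port (22), timeout and interval are assumptions.

```python
import socket
import time
from typing import List, Tuple

def probe(host: str, port: int = 22, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or name resolution failure
        return False

def outage_windows(samples: List[Tuple[float, bool]]) -> List[Tuple[float, float]]:
    """Collapse (timestamp, reachable) samples into (first_failed, last_failed) windows."""
    windows = []
    start = last = None
    for ts, ok in samples:
        if not ok:
            if start is None:
                start = ts  # a new outage begins
            last = ts
        elif start is not None:
            windows.append((start, last))  # the outage ended at the previous sample
            start = last = None
    if start is not None:
        windows.append((start, last))  # still down at the last sample
    return windows

def monitor(host: str, interval: float = 10.0, count: int = 360) -> List[Tuple[float, float]]:
    """Probe `host` every `interval` seconds, `count` times, and return the outage windows."""
    samples = []
    for _ in range(count):
        samples.append((time.time(), probe(host)))
        time.sleep(interval)
    return outage_windows(samples)
```

For example, `monitor("engine.example.local")` (a placeholder address) would sample for about an hour; regularly spaced windows would point at something periodic rather than random packet loss.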

I partially solved the problem.
My host machine has 2 network interfaces in a bond. The bond was configured with mode=4 (802.3ad) and the switch was configured the same way. If I remove one network cable the network becomes stable. With both cables attached the network is unstable.
I removed the link aggregation configuration from the switch and changed the bond to mode=2 (balance-xor). Now the network is stable. The strange thing is that the previous configuration worked fine for a year... until the last upgrade.
Now ha-agent doesn't reboot the hosted engine anymore, but I receive two emails from the broker every 2 to 5 minutes: first a mail with "ovirt-hosted-engine state transition StartState-ReinitializeFSM" and then "ovirt-hosted-engine state transition ReinitializeFSM-EngineStarting".
On 17/12/2015 10:51, Stefano Danzi wrote:
[...]
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
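With the bond in mode=4, one quick check on the host is the per-slave state that the Linux bonding driver reports in /proc/net/bonding/<bond>. A common symptom of LACP negotiating badly with the switch is that the slaves end up in different aggregators; in the Linux implementation only the slaves in the active aggregator carry traffic. The sketch below assumes the bond is named bond0 and that the file contains the usual "Active Aggregator Info:", "Slave Interface:" and "Aggregator ID:" lines; the helper names are hypothetical.

```python
from typing import Dict, List, Optional

def slave_aggregators(proc_text: str) -> Dict[str, str]:
    """Map each slave interface to the Aggregator ID reported in its section."""
    result = {}
    current = None
    for raw in proc_text.splitlines():
        line = raw.strip()
        if line.startswith("Slave Interface:"):
            current = line.split(":", 1)[1].strip()
        elif line.startswith("Aggregator ID:") and current is not None:
            # Aggregator IDs before the first slave section belong to the bond itself.
            result[current] = line.split(":", 1)[1].strip()
    return result

def active_aggregator(proc_text: str) -> Optional[str]:
    """Return the ID listed under 'Active Aggregator Info:', or None if absent."""
    lines = [raw.strip() for raw in proc_text.splitlines()]
    for i, line in enumerate(lines):
        if line.startswith("Active Aggregator Info:"):
            for nxt in lines[i + 1:]:
                if nxt.startswith("Aggregator ID:"):
                    return nxt.split(":", 1)[1].strip()
    return None

def slaves_outside_active(proc_text: str) -> List[str]:
    """List slaves whose aggregator differs from the active one (empty if not 802.3ad)."""
    active = active_aggregator(proc_text)
    if active is None:
        return []
    return sorted(s for s, agg in slave_aggregators(proc_text).items() if agg != active)
```

On the host this could be called as `slaves_outside_active(open("/proc/net/bonding/bond0").read())`. An empty list means every slave reported the active aggregator's ID; a non-empty one would be worth comparing with the switch-side LACP status.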

I found this in vdsm.log and I think it could be the problem:

Thread-3771::ERROR::2015-12-18 16:18:58,597::brokerlink::279::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(_communicate) Connection closed: Connection closed
Thread-3771::ERROR::2015-12-18 16:18:58,597::API::1847::vds::(_getHaInfo) failed to retrieve Hosted Engine HA info
Traceback (most recent call last):
  File "/usr/share/vdsm/API.py", line 1827, in _getHaInfo
    stats = instance.get_all_stats()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 103, in get_all_stats
    self._configure_broker_conn(broker)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 180, in _configure_broker_conn
    dom_type=dom_type)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 176, in set_storage_domain
    .format(sd_type, options, e))
RequestError: Failed to set storage domain FilesystemBackend, options {'dom_type': 'nfs3', 'sd_uuid': '46f55a31-f35f-465c-b3e2-df45c05e06a7'}: Connection closed

On 17/12/2015 18:51, Stefano Danzi wrote:
[...]
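To see whether these broker errors line up in time with the network drops, the ERROR records can be pulled out of vdsm.log together with their timestamps. This sketch assumes the `::`-separated layout visible in the log lines above (thread, level, timestamp, module, line number, logger, message); that layout is inferred from this excerpt only, and `parse_line` and `errors` are hypothetical helpers.

```python
from datetime import datetime
from typing import Iterable, List, NamedTuple, Optional

class VdsmRecord(NamedTuple):
    thread: str
    level: str
    when: datetime
    module: str
    lineno: int
    logger: str
    message: str

def parse_line(line: str) -> Optional[VdsmRecord]:
    """Parse one '::'-separated record; return None for other lines (e.g. traceback frames)."""
    parts = line.rstrip("\n").split("::", 6)  # the message itself may contain '::'
    if len(parts) != 7:
        return None
    thread, level, ts, module, lineno, logger, message = parts
    try:
        when = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S,%f")
        return VdsmRecord(thread, level, when, module, int(lineno), logger, message)
    except ValueError:  # timestamp or line number did not match the assumed layout
        return None

def errors(lines: Iterable[str]) -> List[VdsmRecord]:
    """Keep only the records logged at ERROR level."""
    return [r for r in map(parse_line, lines) if r is not None and r.level == "ERROR"]
```

Calling `errors(open(path_to_vdsm_log))` returns the error records in file order; their `when` values can then be set against the times at which the network was observed to be down.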

On Fri, Dec 18, 2015 at 5:31 PM, Stefano Danzi <s.danzi@hawai.it> wrote:
[...]
My guess is that this is a consequence of your networking problems. Adding Dan.
-- Didi

Stefano,
I am currently experiencing the same issue: a 2x NIC LACP configuration on the switch and a mode 4 bond on the server, with no connectivity. Interestingly, I am able to ping the switch itself.
I haven't had time to investigate thoroughly, but my first thought is an update somewhere.
Did you ever resolve it and get back to mode=4?
Jon
On 17 December 2015 17:51:50 GMT+00:00, Stefano Danzi <s.danzi@hawai.it> wrote:
[...]
participants (3)
- Jon Archer
- Stefano Danzi
- Yedidyah Bar David