
On Fri, Jul 22, 2016 at 9:58 AM, Simone Tiraboschi <stirabos@redhat.com> wrote:
On Fri, Jul 22, 2016 at 4:11 AM, Robert Story <rstory@tislabs.com> wrote:
On Thu, 21 Jul 2016 16:04:41 -0400 Robert wrote:
RS> Thread-1::config::278::ovirt_hosted_engine_ha.broker.notifications.Notifications.config
RS> ::(refresh_local_conf_file) local conf file was correctly written
RS>
RS> And then .... nothing. It just hangs. Nothing more is logged Thread-1.
So I started digging around the Python source, starting from refresh_local_conf_file. I ended up in ./broker/notifications.py, in send_email. I added some logging:
def send_email(cfg, email_body):
    """Send email."""

    logger = logging.getLogger("%s.Notifications" % __name__)

    try:
        logger.debug("#### setting up smtp 1")
        server = smtplib.SMTP(cfg["smtp-server"], port=cfg["smtp-port"])
        logger.debug("#### setting up smtp 2")
        ...
Now the final messages are:
Thread-1::DEBUG::2016-07-21 21:35:05,280::config::278::ovirt_hosted_engine_ha.broker.notifications.Notifications.config::(refresh_local_conf_file) local conf file was correctly written
Thread-1::DEBUG::2016-07-21 21:35:05,282::notifications::27::ovirt_hosted_engine_ha.broker.notifications.Notifications::(send_email) #### setting up smtp 1
So the culprit is:
server = smtplib.SMTP(cfg["smtp-server"], port=cfg["smtp-port"])
Note that this does actually send the email - 2 minutes later.
Thanks for your time and effort, Robert! In general, the agent shouldn't get stuck if the broker is unable to send a notification email within a certain amount of time. I'm opening a bug to track this. Adding Martin here.
https://bugzilla.redhat.com/1359059
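For reference, here is a minimal sketch of how such a blocking call could be bounded. It assumes only Python's standard smtplib, whose SMTP constructor accepts a timeout argument. The function name try_send, the 10-second default and the logging messages are illustrative and are not taken from the ovirt-hosted-engine-ha source.

```python
import logging
import smtplib

logger = logging.getLogger(__name__)


def try_send(host, port, sender, recipient, message, timeout=10.0):
    """Try to send one message; return True on success, False on any failure.

    `timeout` (seconds) is applied to each blocking socket operation, so a
    server that never answers makes the call fail instead of hanging.
    """
    try:
        # Hypothetical replacement for the unbounded smtplib.SMTP(...) call.
        server = smtplib.SMTP(host, port=port, timeout=timeout)
    except (OSError, smtplib.SMTPException) as exc:
        logger.warning("SMTP connection to %s:%s failed: %s", host, port, exc)
        return False

    try:
        server.sendmail(sender, recipient, message)
        return True
    except (OSError, smtplib.SMTPException) as exc:
        logger.warning("sending mail via %s:%s failed: %s", host, port, exc)
        return False
    finally:
        # close() only drops the socket, so it cannot block on a dead server.
        server.close()
```

Note that the timeout applies per socket operation and per address tried, not to the call as a whole, and it does not bound name resolution. A caller that must never stall would still want to run this outside its main loop.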
So I tried:
$ telnet localhost 25
Trying ::1...
which hung, and a little bell went off in my brain...
After changing /etc/hosts from:
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
to
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost6 localhost6.localdomain6
localhost resolves to 127.0.0.1, the delay is gone, and everything is fine.
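A quick way to check which address a Python client will try first is to ask the resolver directly. smtplib connects through socket.create_connection, which walks the socket.getaddrinfo results in order. The sketch below uses only the standard socket module; the helper names resolution_order and ipv4_only are made up for illustration.

```python
import socket


def resolution_order(host, port):
    """Return (family name, address) pairs in the order the resolver offers them."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return [(socket.AddressFamily(family).name, sockaddr[0])
            for family, _type, _proto, _canon, sockaddr in infos]


def ipv4_only(host, port):
    """Return only the IPv4 addresses for host, skipping any IPv6 entries."""
    infos = socket.getaddrinfo(host, port,
                               family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    return [sockaddr[0] for _family, _type, _proto, _canon, sockaddr in infos]
```

Calling resolution_order("localhost", 25) on a host with the original /etc/hosts would presumably list an AF_INET6 entry for ::1 ahead of 127.0.0.1, matching what telnet tried first. The exact output depends on the local resolver configuration.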
We are also seeing similar reports of IPv4/IPv6 issues when migrating on 4.0. See also http://lists.ovirt.org/pipermail/users/2016-June/040578.html and https://bugzilla.redhat.com/show_bug.cgi?id=1358530
Adding Oved here.
I don't want to update /etc/hosts on each host. Is there somewhere I can edit the broker config for mail?
The shortest option is to edit broker.conf inside the configuration volume on the hosted-engine storage domain, but it's a bit tricky and potentially dangerous if not done well. We have an RFE about letting you reconfigure it from the engine; for now, if you are brave enough, please try something like this:
dir=`mktemp -d` && cd $dir
mnt_point=/rhev/data-center/mnt/192.168.1.115:_Virtual_ext35u36 # replace with your local mount point
systemctl stop ovirt-ha-broker # on all the hosts!
sdUUID_line=$(grep sdUUID /etc/ovirt-hosted-engine/hosted-engine.conf)
sdUUID=${sdUUID_line:7:36}
conf_volume_UUID_line=$(grep conf_volume_UUID /etc/ovirt-hosted-engine/hosted-engine.conf)
conf_volume_UUID=${conf_volume_UUID_line:17:36}
conf_image_UUID_line=$(grep conf_image_UUID /etc/ovirt-hosted-engine/hosted-engine.conf)
conf_image_UUID=${conf_image_UUID_line:16:36}
sudo -u vdsm dd if=$mnt_point/$sdUUID/images/$conf_image_UUID/$conf_volume_UUID 2>/dev/null| tar -xvf -
# here you have to edit the locally extracted broker.conf
tar -cO * | sudo -u vdsm dd of=$mnt_point/$sdUUID/images/$conf_image_UUID/$conf_volume_UUID
systemctl restart ovirt-ha-agent # on all the hosts
I strongly advise taking a backup before editing.
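The fixed offsets in the script (7, 17 and 16) are simply the lengths of "sdUUID=", "conf_volume_UUID=" and "conf_image_UUID=", and 36 is the length of a UUID. As a rough sketch of the same lookup without counting characters, assuming hosted-engine.conf holds plain key=value lines as the grep calls suggest, the values could be read like this. The helper names are hypothetical.

```python
import posixpath


def parse_conf(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _sep, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def conf_volume_path(mnt_point, conf):
    """Build the configuration volume path the same way the shell script does:
    $mnt_point/$sdUUID/images/$conf_image_UUID/$conf_volume_UUID
    """
    return posixpath.join(mnt_point,
                          conf["sdUUID"],
                          "images",
                          conf["conf_image_UUID"],
                          conf["conf_volume_UUID"])
```

This only locates the volume; reading it with dd and unpacking the tar archive would still happen as the vdsm user, as in the script.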
Robert
-- Senior Software Engineer @ Parsons