Hosted Engine Spamming Transition E-Mails

Hi Folks,

I'm having a strange issue with my 3.6 setup. For the past few days the system has been spamming me with "The state machine changed state" E-Mails. First I get the "EngineUnexpectedlyDown-EngineDown", then "EngineDown-EngineStart", then "EngineStart-EngineStarting". I checked the hosted engine VM and it's been up for 16 days, and the system looks to be running fine, so I'm wondering what's going on.

And when I say spamming, I mean I'm getting 200+ E-Mails a day. It seems to trigger every 20 minutes or so, with about 2 minutes between each of the 3 messages in the set.

On Sat, Jun 18, 2016 at 6:04 PM, Charles Tassell <ctassell@gmail.com> wrote:
For the past few days the system has been spamming me with "The state machine changed state" E-Mails.

Please check/share /var/log/ovirt-hosted-engine-ha/agent.log on your hosts.

Thanks,
-- Didi
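
When going through agent.log, it can help to count how often each transition shows up before reading the surrounding lines. The following is only a rough sketch. It assumes the transition names from the notification E-Mails (e.g. "EngineDown-EngineStart") appear verbatim somewhere in the log lines, which is an assumption and not a documented log format. The function names count_transitions and count_transitions_in_file are made up for illustration.

```python
import re
from collections import Counter

# Matches tokens such as "EngineDown-EngineStart". That these names appear
# verbatim in agent.log is an assumption, not a documented format.
TRANSITION_RE = re.compile(r"\bEngine[A-Za-z]+-Engine[A-Za-z]+\b")


def count_transitions(lines):
    """Count every Engine*-Engine* token found in an iterable of text lines."""
    counts = Counter()
    for line in lines:
        counts.update(TRANSITION_RE.findall(line))
    return counts


def count_transitions_in_file(path):
    """Run count_transitions over a log file, tolerating odd bytes."""
    with open(path, errors="replace") as log_file:
        return count_transitions(log_file)
```

Calling count_transitions_in_file("/var/log/ovirt-hosted-engine-ha/agent.log") on each host and comparing the counts should show which host is producing the transitions, provided the assumption about the log contents holds.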

Hi Didi,

Okay, I looked at the logs as you suggested and found one of the hosts was showing that it couldn't connect to the local VDSM. I restarted the host and then tried running VDSM through the console so that I could see the debugging output, and it crashed my storage domain, taking down all the VMs in the cluster. So that wasn't good... ;-)

I'm going to leave that host down and reinstall it. I think the issue was that when I originally installed it there was a problem with attaching to the FC storage. I rebooted it and it seemed to be working fine, joined the cluster et al., but when I ran VDSM manually it looked like it wanted to reinitialize the storage or something.

On 2016-06-19 04:54 AM, Yedidyah Bar David wrote:
Please check/share /var/log/ovirt-hosted-engine-ha/agent.log on your hosts.

On Tue, Jun 21, 2016 at 7:53 PM, Charles Tassell <ctassell@gmail.com> wrote:
I'm going to leave that host down and reinstall it.

Any chance you can open a bug about this and attach all the logs you can get from this host before reinstalling it? Thanks!

-- Didi
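
For anyone in the same spot, a minimal sketch of saving the logs off a host before it gets reinstalled might look like the following. Only the /var/log/ovirt-hosted-engine-ha path is named in this thread. /var/log/vdsm is an assumption about where a typical host keeps its VDSM logs, and the function name archive_logs is hypothetical.

```python
import tarfile
from pathlib import Path

# Directories assumed to hold the relevant logs. Only the
# ovirt-hosted-engine-ha path comes from this thread; /var/log/vdsm is an
# assumption about a typical host.
DEFAULT_LOG_DIRS = ["/var/log/ovirt-hosted-engine-ha", "/var/log/vdsm"]


def archive_logs(output_path, log_dirs=DEFAULT_LOG_DIRS):
    """Pack the directories that exist into a gzip'd tarball.

    Returns the list of directories that were actually added, so missing
    ones are easy to spot.
    """
    added = []
    with tarfile.open(output_path, "w:gz") as archive:
        for directory in log_dirs:
            path = Path(directory)
            if path.is_dir():
                # arcname keeps the original layout without the leading slash
                archive.add(str(path), arcname=str(path).lstrip("/"))
                added.append(str(path))
    return added
```

The tarball should be written somewhere that survives the reinstall (another machine or a network share), and then attached to the bug report.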

Hi Didi,

Unfortunately the box got reinstalled before I could grab the logs. I think I have an idea as to what caused the issue, though. When I was originally registering the oVirt node, one of the Fibre Channel volumes wouldn't detect, so it just kept waiting to register with the hosted engine, saying it couldn't connect to the storage pool. I rebooted the host, which caused the FC volume to show up, and it looked like everything was fine. We were able to host VMs on the box, but then it started spitting out the error E-Mails.

I migrated the VMs off to other hosts (2 went fine, 2 went into a paused state citing a storage error) and then rebooted the host. When the host rebooted I shut down the VDSM service and started it manually from the shell prompt. I'm not sure if it was right when the box came up or when I started VDSM manually, but that messed up our storage pool and crashed all the VMs in the cluster, putting them into a paused state.

Sorry I couldn't grab the logs; one of the other techs had some spare time, so he reinstalled the OS the same afternoon.

On 2016-06-21 04:34 PM, Yedidyah Bar David wrote:
Any chance you can open a bug about this and attach all the logs you can get from this host before reinstalling it? Thanks!

participants (2)
- Charles Tassell
- Yedidyah Bar David