Hi Didi,
Unfortunately the box got reinstalled before I could grab the logs.
I think I have an idea of what caused the issue, though. When I was
originally registering the oVirt node, one of the Fibre Channel volumes
wasn't detected, so the node just kept waiting to register with the hosted
engine, saying it couldn't connect to the storage pool. I rebooted the
host, which caused the FC volume to show up, and everything looked
fine. We were able to host VMs on the box, but then it started
spitting out the error E-Mails. I migrated the VMs off to other hosts
(2 went fine, 2 went into a paused state citing a storage error) and then
rebooted the host. When the host came back up, I shut down the VDSM service
and started it manually from the shell prompt. I'm not sure whether it
happened right when the box came up or when I started VDSM manually, but
something messed up our storage pool and crashed all the VMs in the
cluster, putting them into a paused state.
Sorry I couldn't grab the logs; one of the other techs had some spare
time, so he reinstalled the OS the same afternoon.
On 2016-06-21 04:34 PM, Yedidyah Bar David wrote:
On Tue, Jun 21, 2016 at 7:53 PM, Charles Tassell <ctassell(a)gmail.com> wrote:
> Hi Didi,
>
> Okay, I looked at the logs as you suggested and found one of the hosts was
> showing that it couldn't connect to the local VDSM. I restarted the host
> and then tried running VDSM through the console so that I could see the
> debugging output, and it crashed my storage domain taking down all the VMs
> in the cluster. So that wasn't good... ;-)
>
> I'm going to leave that host down and reinstall it. I think the issue was
> that when I originally installed it there was a problem with attaching to
> the FC storage. I rebooted it and it seemed to be working fine, joined the
> cluster et al., but when I ran VDSM manually it looked like it wanted to
> reinitialize the storage or something.
Any chance you can open a bug about this and attach all the logs you can
get from this host before reinstalling it? Thanks!
>
> On 2016-06-19 04:54 AM, Yedidyah Bar David wrote:
>> On Sat, Jun 18, 2016 at 6:04 PM, Charles Tassell <ctassell(a)gmail.com>
>> wrote:
>>> Hi Folks,
>>>
>>> I'm having a strange issue with my 3.6 setup. For the past few days the
>>> system has been spamming me with "The state machine changed state"
>>> E-Mails. First I get "EngineUnexpectedlyDown-EngineDown", then
>>> "EngineDown-EngineStart", then "EngineStart-EngineStarting". I checked
>>> the hosted engine VM and it's been up for 16 days and the system looks
>>> to be running fine, so I'm wondering what's going on.
>>>
>>> And when I say spamming, I mean I'm getting 200+ E-Mails a day. It seems
>>> to trigger every 20 minutes or so, with about 2 minutes between each of
>>> the 3 messages in the set.
>> Please check/share /var/log/ovirt-hosted-engine-ha/agent.log on your
>> hosts.
>>
>> Thanks,
>