[ovirt-users] Suddenly all VM's down including HostedEngine & NFSshares (except HE) unmounted

Matt . yamakasi.014 at gmail.com
Sun Aug 21 20:34:05 UTC 2016


The strange thing is that there are no duplicate IPs in the oVirt
environment, the storage, or anything else the VMs depend on.

What does happen, though, is that the statuses of all agents change,
and I have no idea why.

There is really nothing in the logs that shows this behaviour.

Restarting the broker and agent, or rebooting the hosts, doesn't help.
The only host where I can start the HostedEngine now is Host-4, whereas
before I was able to start it on the other hosts in their current states
as well.

Something is wobbling in the communication between the agents, if
you ask me. This has been happening since 4.0.1.

--== Host 1 status ==--

Status up-to-date                  : False
Hostname                           : host-01.mydomain.tld
Host ID                            : 1
Engine status                      : unknown stale-data
Score                              : 0
stopped                            : True
Local maintenance                  : False
crc32                              : 6b73a02e
Host timestamp                     : 2710
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=2710 (Sun Aug 21 21:52:56 2016)
        host-id=1
        score=0
        maintenance=False
        state=AgentStopped
        stopped=True


--== Host 2 status ==--

Status up-to-date                  : False
Hostname                           : host-02.mydomain.tld
Host ID                            : 2
Engine status                      : unknown stale-data
Score                              : 0
stopped                            : True
Local maintenance                  : False
crc32                              : 8e647fca
Host timestamp                     : 509
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=509 (Sun Aug 21 21:53:00 2016)
        host-id=2
        score=0
        maintenance=False
        state=AgentStopped
        stopped=True


--== Host 3 status ==--

Status up-to-date                  : False
Hostname                           : host-01.mydomain.tld
Host ID                            : 3
Engine status                      : unknown stale-data
Score                              : 0
stopped                            : True
Local maintenance                  : False
crc32                              : 73748f9f
Host timestamp                     : 2888
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=2888 (Sun Aug 21 00:16:12 2016)
        host-id=3
        score=0
        maintenance=False
        state=AgentStopped
        stopped=True


--== Host 4 status ==--

Status up-to-date                  : False
Hostname                           : host-02.mydomain.tld
Host ID                            : 4
Engine status                      : unknown stale-data
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 86ef0447
Host timestamp                     : 67879
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=67879 (Sun Aug 21 18:30:38 2016)
        host-id=4
        score=3400
        maintenance=False
        state=GlobalMaintenance
        stopped=False
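
To compare the hosts at a glance, the status output above can be reduced
to a few fields per host. This is only a rough sketch, not part of the
oVirt tooling: the function names parse_vm_status and summarize are made
up for illustration, and it assumes the text has exactly the layout
pasted above (a "--== Host N status ==--" header, "Key : Value" lines,
then indented key=value metadata lines).

```python
import re

# Matches the per-host header line as it appears in the pasted output.
HEADER = re.compile(r"^--== Host (\d+) status ==--$")


def parse_vm_status(text):
    """Split pasted status output into one dict per host.

    Assumes the layout shown in this mail; other layouts are not handled.
    """
    hosts = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if HEADER.match(line):
            current = {"fields": {}, "metadata": {}}
            hosts.append(current)
            continue
        if current is None or not line:
            continue
        if raw.startswith((" ", "\t")) and "=" in line:
            # Indented key=value lines under "Extra metadata".
            key, _, value = line.partition("=")
            current["metadata"][key] = value
        elif ":" in line:
            # Top-level "Key   : Value" lines.
            key, _, value = line.partition(":")
            current["fields"][key.strip()] = value.strip()
    return hosts


def summarize(hosts):
    """Return (host id, hostname, up-to-date, score, agent state) per host."""
    rows = []
    for host in hosts:
        fields, meta = host["fields"], host["metadata"]
        rows.append((
            fields.get("Host ID"),
            fields.get("Hostname"),
            fields.get("Status up-to-date"),
            fields.get("Score"),
            meta.get("state"),
        ))
    return rows
```

Feeding the output above through summarize(parse_vm_status(text)) gives
one tuple per host, which makes it easy to see that only host 4 reports
a non-zero score and that none of the hosts is up to date.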



2016-08-21 22:09 GMT+02:00 Charles Kozler <ckozleriii at gmail.com>:
> This usually happens when the SPM falls off or the master storage domain
> was unreachable for a brief period of time in some capacity. Your logs
> should say something about an underlying storage problem, so oVirt offlined
> or paused the VMs to avoid problems. I'd check the pathway to your master
> storage domain. You're probably right that something had a conflicting
> IP. This happened to me one time when someone brought up a system on an IP
> that matched my SPM.
>
>
> On Aug 21, 2016 3:33 PM, "Matt ." <yamakasi.014 at gmail.com> wrote:
>>
>> Hi All,
>>
>> I'm trying to tackle an issue on 4.0.2 where suddenly all VMs,
>> including the HostedEngine, go down at once.
>>
>> I have also seen that all NFS shares are unmounted except the
>> HostedEngine Storage, which is on the same NFS device as well.
>>
>> I have checked the logs and see nothing strange there, but as I run a
>> VRRP setup and also do some tests, I wonder: if a duplicate IP is
>> brought up, could this cause the whole system to go down and the
>> Engine or VDSM to unmount the NFS shares? My switches don't complain.
>>
>> It's strange that only the HE share is still available after it happens.
>>
>> If so, this would be quite fragile and we should tackle where it goes
>> wrong.
>>
>> Has anyone seen this behaviour?
>>
>> Thanks,
>>
>> Matt


