On Sat, Sep 5, 2020 at 1:49 AM Gillingham, Eric J (US 393D)
<eric.j.gillingham(a)jpl.nasa.gov> wrote:
On 9/4/20, 2:26 PM, "Nir Soffer" <nsoffer(a)redhat.com> wrote:
On Fri, Sep 4, 2020 at 5:43 PM Gillingham, Eric J (US 393D) via Users
<users(a)ovirt.org> wrote:
>
> On 9/4/20, 4:50 AM, "Vojtech Juranek" <vjuranek(a)redhat.com> wrote:
>
> On Thursday, 3 September 2020 22:49:17 CEST Gillingham, Eric J (US 393D) via Users wrote:
>
> How did you remove the first host? Did you put it into maintenance first? I
> wonder how this situation (two lockspaces with conflicting names) can occur.
>
> You can try to re-initialize the lockspace directly using the sanlock command
> (see man sanlock), but it would be good to understand the situation first.
>
>
> Just as you said: I put it into maintenance mode, shut it down, and removed it
> via the engine UI.
Eric, is it possible that you shut down the host too quickly, before it actually
disconnected from the lockspace?
When engine moves a host to maintenance, it does not wait until the host actually
moves into maintenance. This is actually a bug, so it would be a good idea to file
a bug about this.
That is a possibility. In the UI it usually takes a while for the host to show as being in
maintenance, so I assumed it was an accurate representation of the state. Unfortunately,
all hosts have since been completely wiped and re-installed. This issue brought down the
entire cluster for over a day, so I needed to get everything up again ASAP.
I did not archive or back up the sanlock logs beforehand, so I can't check for the
sanlock events David mentioned. When I cleared sanlock there were no s or r entries
listed in sanlock client status, and there were no other running hosts that could hold
other locks. I don't fully grok sanlock, though, so I can't say whether there was some lock
that existed only on the iSCSI space, separate from any current or past hosts.
Looks like we lost all evidence. If this happens again, please file a bug and
attach the logs.
Nir