Hi,

Did anyone have any luck tracking this down? I rebooted one of our servers and hit this issue again. Conveniently, the Dell remote access card has borked as well, so it's a 50-minute trip to the DC.


On Thu, Jun 19, 2014 at 10:10 AM, Bob Doolittle <bobddroid@gmail.com> wrote:
Specifically, if I do the following:
  • Enter global maintenance (hosted-engine --set-maintenance --mode=global)
  • init 0 the engine
  • systemctl stop ovirt-ha-agent ovirt-ha-broker libvirtd vdsmd

and then run "sanlock client status" I see:

# sanlock client status
daemon c715b5de-fd98-4146-a0b1-e9801179c768.xion2.smar
p -1 helper
p -1 listener
p -1 status
s 003510e8-966a-47e6-a5eb-3b5c8a6070a9:1:/rhev/data-center/mnt/xion2.smartcity.net\:_export_VM__NewDataDomain/003510e8-966a-47e6-a5eb-3b5c8a6070a9/dom_md/ids:0
s 18eeab54-e482-497f-b096-11f8a43f94f4:1:/rhev/data-center/mnt/xion2\:_export_vm_he1/18eeab54-e482-497f-b096-11f8a43f94f4/dom_md/ids:0
s hosted-engine:1:/rhev/data-center/mnt/xion2\:_export_vm_he1/18eeab54-e482-497f-b096-11f8a43f94f4/ha_agent/hosted-engine.lockspace:0

Waiting a few minutes does not change this state.
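
For anyone comparing runs, a small parser makes it easier to see which lockspaces are still held. The sketch below is illustrative only: `Lockspace` and `parse_lockspaces` are hypothetical names, and it assumes each "s" line has the form name:host_id:path:offset with colons inside the path escaped as "\:", as in the output above.

```python
import re
from typing import List, NamedTuple


class Lockspace(NamedTuple):
    name: str
    host_id: int
    path: str    # kept exactly as printed, including any "\:" escapes
    offset: int


# Split on ":" only when it is not preceded by a backslash.
_UNESCAPED_COLON = re.compile(r"(?<!\\):")


def parse_lockspaces(status_output: str) -> List[Lockspace]:
    """Return the lockspaces listed on the "s ..." lines of the status text."""
    lockspaces = []
    for line in status_output.splitlines():
        line = line.strip()
        if not line.startswith("s "):
            continue  # skip the daemon and "p" (process) lines
        fields = _UNESCAPED_COLON.split(line[2:])
        if len(fields) != 4:
            continue  # not the assumed name:host_id:path:offset layout
        name, host_id, path, offset = fields
        lockspaces.append(Lockspace(name, int(host_id), path, int(offset)))
    return lockspaces


# Sample text copied from the output quoted in this mail.
SAMPLE = r"""daemon c715b5de-fd98-4146-a0b1-e9801179c768.xion2.smar
p -1 helper
p -1 listener
p -1 status
s 003510e8-966a-47e6-a5eb-3b5c8a6070a9:1:/rhev/data-center/mnt/xion2.smartcity.net\:_export_VM__NewDataDomain/003510e8-966a-47e6-a5eb-3b5c8a6070a9/dom_md/ids:0
s 18eeab54-e482-497f-b096-11f8a43f94f4:1:/rhev/data-center/mnt/xion2\:_export_vm_he1/18eeab54-e482-497f-b096-11f8a43f94f4/dom_md/ids:0
s hosted-engine:1:/rhev/data-center/mnt/xion2\:_export_vm_he1/18eeab54-e482-497f-b096-11f8a43f94f4/ha_agent/hosted-engine.lockspace:0
"""

for ls in parse_lockspaces(SAMPLE):
    print(ls.name, ls.host_id, ls.offset)
```

On a live host the same function could be fed the text captured from "sanlock client status" at intervals, to check whether the list ever shrinks.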

The earlier data I shared, which showed HostedEngine, was from a different test scenario.

-Bob


On 06/18/2014 07:53 AM, Bob Doolittle wrote:

I see I have a very unfortunate typo in my previous mail. As supported by the vm-status output I attached, I had set --mode=global (not none) in step 1.

I am not the only one experiencing this. I can reproduce it easily. It appears that shutting down vdsm causes the HA services to incorrectly think the system has come out of Global Maintenance and restart the engine.

-Bob

On Jun 18, 2014 5:06 AM, "Federico Simoncelli" <fsimonce@redhat.com> wrote:
----- Original Message -----
> From: "Bob Doolittle" <bob@doolittle.us.com>
> To: "Doron Fediuck" <dfediuck@redhat.com>, "Andrew Lau" <andrew@andrewklau.com>
> Cc: "users" <users@ovirt.org>, "Federico Simoncelli" <fsimonce@redhat.com>
> Sent: Saturday, June 14, 2014 1:29:54 AM
> Subject: Re: [ovirt-users] Can HA Agent control NFS Mount?
>
>
> But there may be more going on. Even if I stop vdsmd, the HA services,
> and libvirtd, and sleep 60 seconds, I still see a lock held on the
> Engine VM storage:
>
> daemon 6f3af037-d05e-4ad8-a53c-61627e0c2464.xion2.smar
> p -1 helper
> p -1 listener
> p -1 status
> s 003510e8-966a-47e6-a5eb-3b5c8a6070a9:1:/rhev/data-center/mnt/xion2.smartcity.net\:_export_VM__NewDataDomain/003510e8-966a-47e6-a5eb-3b5c8a6070a9/dom_md/ids:0
> s hosted-engine:1:/rhev/data-center/mnt/xion2\:_export_vm_he1/18eeab54-e482-497f-b096-11f8a43f94f4/ha_agent/hosted-engine.lockspace:0

This output shows that the lockspaces are still acquired. When you put hosted-engine
in maintenance they must be released: the hosted-engine one directly, using
rem_lockspace, and the other one through stopMonitoringDomain.
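
As a rough illustration of the first of these, the sketch below builds (but does not run) a "sanlock client rem_lockspace -s" command line for the hosted-engine lockspace. The helper name `rem_lockspace_argv` is hypothetical, the name:host_id:path:offset values are copied from the status output quoted above, and it is an assumption that sanlock accepts the path with its "\:" escapes exactly as it printed them. It is not what the HA agent does today and it does not cover the storage domain lockspace.

```python
import shlex
from typing import List


def rem_lockspace_argv(name: str, host_id: int, path: str, offset: int = 0) -> List[str]:
    """Build the argument list for removing one lockspace (dry run only)."""
    # sanlock describes a lockspace as name:host_id:path:offset.
    spec = f"{name}:{host_id}:{path}:{offset}"
    return ["sanlock", "client", "rem_lockspace", "-s", spec]


# Path copied from the quoted status output, escapes left untouched.
HE_PATH = (
    r"/rhev/data-center/mnt/xion2\:_export_vm_he1/"
    r"18eeab54-e482-497f-b096-11f8a43f94f4/ha_agent/hosted-engine.lockspace"
)

argv = rem_lockspace_argv("hosted-engine", 1, HE_PATH)

# Print a shell-quoted form for review instead of executing it.
print(" ".join(shlex.quote(arg) for arg in argv))
```

Printing rather than executing is deliberate: removing a lockspace by hand on a host that still has the engine VM running could be disruptive, so the command is only meant to be inspected.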

I quickly looked at the ovirt-hosted-engine* projects and I haven't found anything
related to that.

--
Federico