On Sun, Jul 8, 2018 at 1:16 PM, Martin Perina <mperina(a)redhat.com> wrote:
> Hi, thanks for your previous answers
> - UserRole ok
> Back in April 2017, with version 4.1.1, I had problems and it seems I had
> to set superuser privileges for the "fencing user".
> See thread here
>
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/FS5YFU5ZXIYDC5SWQY4MZS65UDKSX7JS/
>
> Now instead I only set UserRole for the defined "fenceuser" on the
> virtual cluster VMs and it works ok
>
> - for a VM with permissions already set up:
> [root@cl1 ~]# fence_rhevm -a 10.4.192.49 -l "fenceuser@internal" \
>     -S /usr/local/bin/pwd_ovmgr01.sh -z --ssl-insecure -o status \
>     --shell-timeout=20 --power-wait=10 -n cl1
> Status: ON
>
> - for a VM still without permissions
> [root@cl1 ~]# fence_rhevm -a 10.4.192.49 -l "fenceuser@internal" \
>     -S /usr/local/bin/pwd_ovmgr01.sh -z --ssl-insecure -o status \
>     --shell-timeout=20 --power-wait=10 -n cl2
> Failed: Unable to obtain correct plug status or plug is not available
>
> - for a VM with permissions already set up:
> I'm able to power the VM off and on
>
> Eli, please take a look at the above, that might be the issue you saw
> with fence_rhevm.
Perhaps I didn't explain clearly enough.
My note was to confirm that in my current tests it is sufficient to create a
user in the internal domain and assign it the UserRole permission on the
cluster VMs in order to use that user for fencing the VMs, whereas back in
4.1.1 I was forced to create a user with superuser rights.
Or did you mean something else when asking for Eli's input?
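For reference, creating such a user in the internal domain can be done on the
engine host roughly like this (just a sketch; the attribute values and the
password validity date below are only placeholders):

ovirt-aaa-jdbc-tool user add fenceuser \
  --attribute=firstName=Fence --attribute=lastName=User
ovirt-aaa-jdbc-tool user password-reset fenceuser \
  --password-valid-to="2025-01-01 00:00:00Z"
# then assign the UserRole permission to fenceuser@internal on each of
# the cluster VMs (cl1..cl4), e.g. from the Admin Portal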
In my case, to have fencing work in my 4-node virtual CentOS 7.4 cluster,
I created the fence device this way:
pcs stonith create vmfence fence_rhevm \
  pcmk_host_map="intracl1:cl1;intracl2:cl2;intracl3:cl3;intracl4:cl4" \
  ipaddr=10.4.192.49 ssl=1 ssl_insecure=1 login="fenceuser@internal" \
  passwd_script="/usr/local/bin/pwd_ovmgr01.sh" \
  shell_timeout=10 power_wait=10
So that now I have:
[root@cl1 ~]# pcs stonith show vmfence
 Resource: vmfence (class=stonith type=fence_rhevm)
  Attributes: ipaddr=10.4.192.49 login=fenceuser@internal
              passwd_script=/usr/local/bin/pwd_ovmgr01.sh
              pcmk_host_map=intracl1:cl1;intracl2:cl2;intracl3:cl3;intracl4:cl4
              power_wait=10 shell_timeout=10 ssl=1 ssl_insecure=1
  Operations: monitor interval=60s (vmfence-monitor-interval-60s)
[root@cl1 ~]#
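For completeness, the passwd_script option points to a small helper that
simply writes the password on stdout when fence_rhevm invokes it; a minimal
sketch of such a script (the actual content of /usr/local/bin/pwd_ovmgr01.sh
is not shown here, and the password below is obviously a placeholder):

#!/bin/bash
# /usr/local/bin/pwd_ovmgr01.sh
# fence_rhevm runs this helper and reads the password of
# fenceuser@internal from its standard output
echo 'NotTheRealPassword'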
I forced a kernel panic on a node that was running a cluster resource group,
and it was correctly fenced (powered off / powered on), with the service
relocated to another node after the power-off action completed. I enabled
sysrq and triggered the crash with:
[root@cl1 ~]# echo 1 > /proc/sys/kernel/sysrq
[root@cl1 ~]# echo c > /proc/sysrq-trigger
This is the chain of events I see on the engine side in the meantime. The VM
cl1 is indeed powered off and then on, the service is relocated to cl2 in
this case, and finally cl1 rejoins the cluster once the power-on phase
finishes. I don't know whether the error messages I see every time fencing
takes action are due to more than one node trying to fence at the same time
and conflicting, or something else...
Jul 8, 2018, 3:23:02 PM User fenceuser@internal-authz connecting from
'10.4.4.69' using session
'FC9UfrgCj9BM/CW5o6iymyMhqBXUDnNJWD20QGEqLccCMC/qYlsv4vC0SBRSlbNrtfdRCx2QmoipOdNk0UrsHQ=='
logged in.
Jul 8, 2018, 3:22:37 PM VM cl1 started on Host ov200
Jul 8, 2018, 3:22:21 PM User fenceuser@internal-authz connecting from
'10.4.4.63' using session
'6nMvcZNYs+aRBxifeA2aBaMsWVMCehCx1LLQdV5AEyM/Zrx/YihERxfLPc2KZPrvivy86rS+ml1Ic6BqnIKBNw=='
logged in.
Jul 8, 2018, 3:22:11 PM VM cl1 was started by fenceuser@internal-authz
(Host: ov200).
Jul 8, 2018, 3:22:10 PM VM cl1 is down. Exit message: Admin shut down from
the engine
Jul 8, 2018, 3:22:10 PM User fenceuser@internal-authz connecting from
'10.4.4.63' using session
'YCvbpVTy9fWAl+UB2g6hlJgqECvCWZYT0cvMxlgBTzcO2LosBh8oGPxsXBP/Y8TN0x7tYSfjxKr4al1g246nnA=='
logged in.
Jul 8, 2018, 3:22:10 PM Failed to power off VM cl1 (Host: ov200, User:
fenceuser@internal-authz).
Jul 8, 2018, 3:22:10 PM Failed to power off VM cl1 (Host: ov200, User:
fenceuser@internal-authz).
Jul 8, 2018, 3:22:10 PM VDSM ov200 command DestroyVDS failed: Virtual
machine does not exist: {'vmId': u'ff45a524-3363-4266-89fe-dfdbb87a8256'}
Jul 8, 2018, 3:22:10 PM VM cl1 powered off by fenceuser@internal-authz
(Host: ov200).
Jul 8, 2018, 3:21:59 PM User fenceuser@internal-authz connecting from
'10.4.4.63' using session
'tVmcRoYhzWOMgh/1smPidV0wFme2f5jWx3wdWDlpG7xHeUhTu4QJJk+l7u8zW2S73LK3lzai/kPtreSanBmAIA=='
logged in.
Jul 8, 2018, 3:21:48 PM User fenceuser@internal-authz connecting from
'10.4.4.69' using session 'iFBOa5vdfJBu4aqVdqJJbwm
- BTW: the monitor action every 60 seconds generates many lines in the
events list. Is there perhaps a way to avoid this?
Gianluca