On Sun, Jul 8, 2018 at 1:16 PM, Martin Perina <mperina@redhat.com> wrote:

Hi, thanks for your previous answers

- UserRole ok
Back in April 2017, with version 4.1.1, I had problems and it seemed I had to grant superuser privileges to the "fencing user".
See thread here

Now, instead, I only assign UserRole to the defined "fenceuser" on the virtual cluster VMs, and it works fine.
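
For reference, the "fenceuser" in the internal domain can be created on the engine host with ovirt-aaa-jdbc-tool; a minimal sketch, assuming the standard aaa-jdbc internal profile, would be:

# on the oVirt engine host
ovirt-aaa-jdbc-tool user add fenceuser
ovirt-aaa-jdbc-tool user password-reset fenceuser
# then assign UserRole to fenceuser@internal on each cluster VM
# (Permissions sub-tab of the VM in the Administration Portal)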

- for a VM with permissions already set up:
[root@cl1 ~]# fence_rhevm -a 10.4.192.49 -l "fenceuser@internal" -S /usr/local/bin/pwd_ovmgr01.sh -z  --ssl-insecure -o status --shell-timeout=20 --power-wait=10 -n cl1
Status: ON
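
As a side note, the script passed with -S is only a helper whose standard output the fence agent uses as the password; a minimal sketch of what /usr/local/bin/pwd_ovmgr01.sh could contain (the real password is obviously site-specific):

#!/bin/bash
# called by fence_rhevm via -S / passwd_script; stdout is read as the password
echo 'fenceuser_password_here'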

- for a VM still without permissions
[root@cl1 ~]# fence_rhevm -a 10.4.192.49 -l "fenceuser@internal" -S /usr/local/bin/pwd_ovmgr01.sh -z  --ssl-insecure -o status --shell-timeout=20 --power-wait=10 -n cl2
Failed: Unable to obtain correct plug status or plug is not available

- for a VM with permissions already set up:
I'm able to power the VM off and on.
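
The off/on tests were done with the same command line as the status check above, just changing the -o action, something like:

fence_rhevm -a 10.4.192.49 -l "fenceuser@internal" -S /usr/local/bin/pwd_ovmgr01.sh -z --ssl-insecure --shell-timeout=20 --power-wait=10 -n cl1 -o off
fence_rhevm -a 10.4.192.49 -l "fenceuser@internal" -S /usr/local/bin/pwd_ovmgr01.sh -z --ssl-insecure --shell-timeout=20 --power-wait=10 -n cl1 -o on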

Eli, please take a look at the above, that might be the issue you saw with fence_rhevm


Perhaps I didn't explain clearly enough.
My note was to confirm that, in my current tests, it is sufficient to create a user in the internal domain and assign it the UserRole permission on the cluster VMs in order to use that user for fencing the VMs.
Back in 4.1.1, by contrast, I was forced to create a user with superuser rights.
Or did you mean something else when asking for Eli's input?
In my case, to get the fencing agent working in my 4-node (all virtual) CentOS 7.4 cluster, I created the stonith resource this way:

pcs stonith create vmfence fence_rhevm pcmk_host_map="intracl1:cl1;intracl2:cl2;intracl3:cl3;intracl4:cl4" \
ipaddr=10.4.192.49 ssl=1 ssl_insecure=1 login="fenceuser@internal" passwd_script="/usr/local/bin/pwd_ovmgr01.sh" \
shell_timeout=10 power_wait=10

So that now I have:

[root@cl1 ~]# pcs stonith show vmfence
 Resource: vmfence (class=stonith type=fence_rhevm)
  Attributes: ipaddr=10.4.192.49 login=fenceuser@internal passwd_script=/usr/local/bin/pwd_ovmgr01.sh pcmk_host_map=intracl1:cl1;intracl2:cl2;intracl3:cl3;intracl4:cl4 power_wait=10 shell_timeout=10 ssl=1 ssl_insecure=1
  Operations: monitor interval=60s (vmfence-monitor-interval-60s)
[root@cl1 ~]#
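
As a quick sanity check, fencing can also be triggered manually through pacemaker before any real failure test, for example (picking intracl2 as the target):

pcs stonith fence intracl2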

I forced a panic on a node that was running a cluster resource group, and it was correctly fenced (powered off / powered on), with the service relocated to another node after the power-off action completed:

[root@cl1 ~]# echo 1 > /proc/sys/kernel/sysrq
[root@cl1 ~]# echo c > /proc/sysrq-trigger

and this is the chain of events I see in the meantime: the VM cl1 is indeed powered off and then on again, the service is relocated to cl2 in this case, and finally cl1 rejoins the cluster once the power-on phase completes.
I don't know whether the error messages I see every time fencing takes action are related to multiple nodes trying to fence at the same time and conflicting, or to something else...


Jul 8, 2018, 3:23:02 PM User fenceuser@internal-authz connecting from '10.4.4.69' using session 'FC9UfrgCj9BM/CW5o6iymyMhqBXUDnNJWD20QGEqLccCMC/qYlsv4vC0SBRSlbNrtfdRCx2QmoipOdNk0UrsHQ==' logged in.
Jul 8, 2018, 3:22:37 PM VM cl1 started on Host ov200
Jul 8, 2018, 3:22:21 PM User fenceuser@internal-authz connecting from '10.4.4.63' using session '6nMvcZNYs+aRBxifeA2aBaMsWVMCehCx1LLQdV5AEyM/Zrx/YihERxfLPc2KZPrvivy86rS+ml1Ic6BqnIKBNw==' logged in.
Jul 8, 2018, 3:22:11 PM VM cl1 was started by fenceuser@internal-authz (Host: ov200).
Jul 8, 2018, 3:22:10 PM VM cl1 is down. Exit message: Admin shut down from the engine
Jul 8, 2018, 3:22:10 PM User fenceuser@internal-authz connecting from '10.4.4.63' using session 'YCvbpVTy9fWAl+UB2g6hlJgqECvCWZYT0cvMxlgBTzcO2LosBh8oGPxsXBP/Y8TN0x7tYSfjxKr4al1g246nnA==' logged in.
Jul 8, 2018, 3:22:10 PM Failed to power off VM cl1 (Host: ov200, User: fenceuser@internal-authz).
Jul 8, 2018, 3:22:10 PM Failed to power off VM cl1 (Host: ov200, User: fenceuser@internal-authz).
Jul 8, 2018, 3:22:10 PM VDSM ov200 command DestroyVDS failed: Virtual machine does not exist: {'vmId': u'ff45a524-3363-4266-89fe-dfdbb87a8256'}
Jul 8, 2018, 3:22:10 PM VM cl1 powered off by fenceuser@internal-authz (Host: ov200).
Jul 8, 2018, 3:21:59 PM User fenceuser@internal-authz connecting from '10.4.4.63' using session 'tVmcRoYhzWOMgh/1smPidV0wFme2f5jWx3wdWDlpG7xHeUhTu4QJJk+l7u8zW2S73LK3lzai/kPtreSanBmAIA==' logged in.
Jul 8, 2018, 3:21:48 PM User fenceuser@internal-authz connecting from '10.4.4.69' using session 'iFBOa5vdfJBu4aqVdqJJbwm


- BTW: the monitoring action every 60 seconds generates many lines in the events list. Is there a way to avoid this, by any chance?
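
I suppose lengthening the monitor interval would at least make them less frequent, something along these lines (if I'm reading the pcs syntax correctly), even if it doesn't really suppress the login events:

pcs resource update vmfence op monitor interval=300s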

Gianluca