[ovirt-users] hosted-engine Debug Help
RabidCicada
rabidcicada at gmail.com
Tue Apr 3 19:46:53 UTC 2018
Indeed that is the problem. I am sshing as root
The only possible explanations I have for this working in the ovirt
autmated testing setup are:
- Either the ovirt automated tests must be using a host inventory file
with become: true?
- The inventory file ssh credentials happen to be to vdsm user? (And
become is just not doing anything)
- Some other janky thing that allows the access
It also happens for the immediately following task `Initialize metadata
volume` and many more following that task.
I will correct and then open a pull request to fix with your input that you
believe it's a problem.
~Kyle
On Tue, Apr 3, 2018 at 3:08 PM, RabidCicada <rabidcicada at gmail.com> wrote:
> Alright. By differential comparing to successfull postgres-command
> sudo-becomes. It looks like the salient difference is that `Copy
> configuration archive to storage` is missing become:true. Testing now.
>
> The similar postgres commands have become:true while this one does not.
>
> ~Kyle
>
> On Tue, Apr 3, 2018 at 1:19 PM, Simone Tiraboschi <stirabos at redhat.com>
> wrote:
>
>>
>>
>> On Tue, Apr 3, 2018 at 5:21 PM, RabidCicada <rabidcicada at gmail.com>
>> wrote:
>>
>>> I've attached a full debug packet below. I include the log file from
>>> ovirt-hosted-engine-setup. I include relevant cmd line info. I include
>>> info from the command line where epdb has a breakpoint in playbook.py from
>>> ansible itself. I also include info from commands I ran after it failed.
>>> I also include attached the ferried over script in /root/.ansible/tmp that
>>> is run.
>>>
>>>
>>> *Output on cmd line:*
>>> [ INFO ] TASK [Copy configuration archive to storage]
>>> [ ERROR ] fatal: [localhost]: FAILED! => {
>>> "changed": true,
>>> "cmd": [
>>> "dd",
>>> "bs=20480",
>>> "count=1",
>>> "oflag=direct",
>>> "if=/var/tmp/localvmbCDQIR/5e
>>> f881f5-c992-48d2-b969-a0b6156bdf7c",
>>> "of=/rhev/data-center/mnt/nod
>>> e.local:_srv_data/81292f3f-11d3-4e38-9afa-62e133aa8017/image
>>> s/c5510e77-1ee0-479c-b6cf-24c179313a45/5ef881f5-c992-48d2-b9
>>> 69-a0b6156bdf7c"
>>> ],
>>> "delta": "0:00:00.004336",
>>> "end": "2018-04-03 15:01:55.581823",
>>> "invocation": {
>>> "module_args": {
>>> "_raw_params": "dd bs=20480 count=1 oflag=direct
>>> if=\"/var/tmp/localvmbCDQIR/5ef881f5-c992-48d2-b969-a0b6156bdf7c\"
>>> of=\"/rhev/data-center/mnt/node.local:_srv_data/81292f3f-11d
>>> 3-4e38-9afa-62e133aa8017/images/c5510e77-1ee0-479c-b6cf-24c1
>>> 79313a45/5ef881f5-c992-48d2-b969-a0b6156bdf7c\"",
>>> "_uses_shell": false,
>>> "chdir": null,
>>> "creates": null,
>>> "executable": null,
>>> "removes": null,
>>> "stdin": null,
>>> "warn": true
>>> }
>>> },
>>> "msg": "non-zero return code",
>>> "rc": 1,
>>> "start": "2018-04-03 15:01:55.577487",
>>> "stderr": "dd: failed to open ‘/rhev/data-center/mnt/node.lo
>>> cal:_srv_data/81292f3f-11d3-4e38-9afa-62e133aa8017/images/c5
>>> 510e77-1ee0-479c-b6cf-24c179313a45/5ef881f5-c992-48d2-b969-a0b6156bdf7c’:
>>> Permission denied",
>>> "stderr_lines": [
>>> "dd: failed to open ‘/rhev/data-center/mnt/node.lo
>>> cal:_srv_data/81292f3f-11d3-4e38-9afa-62e133aa8017/images/c5
>>> 510e77-1ee0-479c-b6cf-24c179313a45/5ef881f5-c992-48d2-b969-a0b6156bdf7c’:
>>> Permission denied"
>>> ],
>>> "stdout": "",
>>> "stdout_lines": []
>>> }
>>> [ ERROR ] Failed to execute stage 'Closing up': Failed executing
>>> ansible-playbook
>>>
>>
>> In the playbook we have on that task:
>> become_user: vdsm
>> become_method: sudo
>>
>> but I fear it got somehow ignored.
>> I'll investigate it.
>>
>>
>>>
>>> *Output from ansible epdb tracepoint:*
>>>
>>> Using module file /usr/lib/python2.7/site-packag
>>> es/ansible/modules/commands/command.py
>>> <localhost> ESTABLISH LOCAL CONNECTION FOR USER: root
>>> <localhost> EXEC /bin/sh -c 'echo ~ && sleep 0'
>>> <localhost> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo
>>> /root/.ansible/tmp/ansible-tmp-1522767715.36-81496549401055 `" && echo
>>> ansible-tmp-1522767715.36-81496549401055="` echo
>>> /root/.ansible/tmp/ansible-tmp-1522767715.36-81496549401055 `" ) &&
>>> sleep 0'
>>> <localhost> PUT /tmp/tmpGMGdjh TO /root/.ansible/tmp/ansible-tmp
>>> -1522767715.36-81496549401055/command.py
>>> <localhost> EXEC /bin/sh -c 'chmod u+x /root/.ansible/tmp/ansible-tmp-1522767715.36-81496549401055/
>>> /root/.ansible/tmp/ansible-tmp-1522767715.36-81496549401055/command.py
>>> && sleep 0'
>>> <localhost> EXEC /bin/sh -c '/usr/bin/python
>>> /root/.ansible/tmp/ansible-tmp-1522767715.36-81496549401055/command.py
>>> && sleep 0'
>>> to retry, use: --limit @/usr/share/ovirt-hosted-engin
>>> e-setup/ansible/create_target_vm.retry
>>>
>>> The above command.py is the on I have attached as problematic_command.py
>>>
>>> *Investigation After Failure:*
>>> [root at node ~]# ls -al '/rhev/data-center/mnt/node.lo
>>> cal:_srv_data/81292f3f-11d3-4e38-9afa-62e133aa8017/images/c5
>>> 510e77-1ee0-479c-b6cf-24c179313a45/5ef881f5-c992-48d2-b969-a0b6156bdf7c'
>>> -rw-rw----. 1 vdsm kvm 20480 Apr 3 15:01 /rhev/data-center/mnt/node.loc
>>> al:_srv_data/81292f3f-11d3-4e38-9afa-62e133aa8017/images/c55
>>> 10e77-1ee0-479c-b6cf-24c179313a45/5ef881f5-c992-48d2-b969-a0b6156bdf7c
>>>
>>> sudo -u vdsm dd bs=20480 count=1 oflag=direct
>>> if="/var/tmp/localvmbCDQIR/5ef881f5-c992-48d2-b969-a0b6156bdf7c"
>>> of="/rhev/data-center/mnt/node.local:_srv_data/81292f3f-11d3
>>> -4e38-9afa-62e133aa8017/images/c5510e77-1ee0-479c-b6cf-24c17
>>> 9313a45/5ef881f5-c992-48d2-b969-a0b6156bdf7c
>>> 1+0 records in
>>> 1+0 records out
>>>
>>>
>>> It seems to me that somehow it is not getting the right permissions even
>>> though the playbook has:
>>> - name: Copy configuration archive to storage
>>> command: dd bs=20480 count=1 oflag=direct if="{{ LOCAL_VM_DIR }}/{{
>>> he_conf_disk_details.disk.image_id }}" of="{{ he_conf_disk_path }}"
>>> become_user: vdsm
>>> become_method: sudo
>>> changed_when: True
>>> tags: [ 'skip_ansible_lint' ]
>>>
>>>
>>> On Tue, Apr 3, 2018 at 8:51 AM, RabidCicada <rabidcicada at gmail.com>
>>> wrote:
>>>
>>>> I am now also running with:
>>>>
>>>> export ANSIBLE_VERBOSITY=5
>>>>
>>>> export ANSIBLE_FORKS=1
>>>>
>>>> export ANSIBLE_KEEP_REMOTE_FILES=1
>>>>
>>>>
>>>> On Tue, Apr 3, 2018 at 8:49 AM, RabidCicada <rabidcicada at gmail.com>
>>>> wrote:
>>>>
>>>>> Here's the log.
>>>>>
>>>>> So the command that it says it ran is:
>>>>> dd bs=20480 count=1 oflag=direct if=/var/tmp/localvmHaWb6G/1cce
>>>>> 8df2-1810-4063-b4e2-e19a2c5b1909 of=/rhev/data-center/mnt/node.
>>>>> local:_srv_data/3c7485ea-14e3-40c1-b627-f89a819ed1d6/images/
>>>>> 2c1f7c2f-b8f7-46d4-ac66-8ff1e9649e29/1cce8df2-1810-4063-b4e2
>>>>> -e19a2c5b1909
>>>>>
>>>>> But we all know that was done with:
>>>>>
>>>>> - name: Copy configuration archive to storage
>>>>> command: dd bs=20480 count=1 oflag=direct if="{{ LOCAL_VM_DIR
>>>>> }}/{{ he_conf_disk_details.disk.image_id }}" of="{{ he_conf_disk_path
>>>>> }}"
>>>>> become_user: vdsm
>>>>> become_method: sudo
>>>>> changed_when: True
>>>>> tags: [ 'skip_ansible_lint' ]
>>>>>
>>>>> So I manually replicated with `sudo vdsm dd bs=20480 count=1
>>>>> if=/var/tmp/localvmHaWb6G/1cce8df2-1810-4063-b4e2-e19a2c5b1909
>>>>> of=/rhev/data-center/mnt/node.local:_srv_data/3c7485ea-14e3-
>>>>> 40c1-b627-f89a819ed1d6/images/2c1f7c2f-b8f7-46d4-ac66-8ff1e9
>>>>> 649e29/1cce8df2-1810-4063-b4e2-e19a2c5b1909`
>>>>>
>>>>> And it works when I manually do it. Though I think I didn't use the
>>>>> oflag=direct (Just realised this)
>>>>>
>>>>> I eventually put a pause task directly preceeding it with debug output
>>>>> that showed the file paths. I manually ran the command and it worked.
>>>>> Then let it do it....failed. I checked all the permissions e.g.
>>>>> vdsm:kvm. All looks good from a filesystem point of view. I'm beginning
>>>>> (naively) to suspect a race condition for the file access problem...but
>>>>> have come nowhere close to solving it.
>>>>>
>>>>> *Can you suggest a good way to continue executing the install process
>>>>> from the create_target_vm.yml playbook (with proper variables and context
>>>>> from otopi etc)?* I currently have to restart the entire process
>>>>> over again and wait quite a while for it to circle back around.
>>>>>
>>>>> I have since discovered epdb and I've set breakpoints directly in
>>>>> playbook.py of ansible just to see better log output. I insert epdb.serve
>>>>> and use netcat to connect since epdb on python 2.7.5 and up seems to have
>>>>> problems using epdb.connect() itself.
>>>>>
>>>>> ~Kyle
>>>>>
>>>>> On Tue, Apr 3, 2018 at 4:06 AM, Simone Tiraboschi <stirabos at redhat.com
>>>>> > wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Apr 2, 2018 at 4:52 PM, RabidCicada <rabidcicada at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Heyo everyone. I'm trying to debug hosted-engine --deploy. It is
>>>>>>> failing in `Copy configuration archive to storage` in
>>>>>>> `create_target_vm.yml` from `hosted-engine --deploy`. My general
>>>>>>> and most important query here is how to get good debug output from ansible
>>>>>>> through hosted-engine. I'm running hosted-engine through an ssh session.
>>>>>>>
>>>>>>> I can't figure out how to get good debug output from ansible within
>>>>>>> that workflow. I see it's running through otopi, I tried setting
>>>>>>> typical `debugger: on_failed` hooks etc and tried many incantations on the
>>>>>>> command line and config files to get ansible to help me out. The
>>>>>>> debugger: directive and other debugger related ansible config file stuff
>>>>>>> wouldn't result in any debugger popping up. I also can't seem to pass
>>>>>>> normal -vvvv flags to hosted-engine either and get it to ansible.
>>>>>>> Ultimately I tried to use a `pause` directive and it complained that it was
>>>>>>> in a non-interactive shell. I figured it might be the result of my ssh
>>>>>>> session so I enabled tty allocation with -t -t. It did not resolve the
>>>>>>> issue.
>>>>>>>
>>>>>>> I eventually wrote-my-own/stole a callback_plugin that checks an
>>>>>>> environmental variable and enables `display.verbosity = int(v)` since I
>>>>>>> can't seem to pass typical -vvvv stuff to ansible through `hosted-engine
>>>>>>> --deploy`. It give me the best info that I have so far. But it wont give
>>>>>>> me enough to debug issues around Gathering Facts or what looks like a
>>>>>>> sudo/permission problem in `Copy configuration archive to storage`
>>>>>>> in `create_target_vm.yml`. I took and used the exact command that they use
>>>>>>> manually and it works when I run it manually (But I can't get debug output
>>>>>>> to show me the exact sudo command being executed), hence my interest in
>>>>>>> passing -vvvv or equivalent to ansible through `hosted-engine`. I
>>>>>>> intentionally disabled the VM_directory cleanup so that I could execute the
>>>>>>> same stuff.
>>>>>>>
>>>>>>> So....after all that...what is a good way to get deep debug info
>>>>>>> from hosted-engine ansible stuff?
>>>>>>>
>>>>>>
>>>>>> You should already find all the relevant log entries in a file
>>>>>> called /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engin
>>>>>> e-setup-ansible-create_target_vm-{timestamp}-{hash}.log
>>>>>>
>>>>>> Can you please share it?
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Or does anyone have intuition for the possible sudo problem?
>>>>>>> ~Kyle
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Users mailing list
>>>>>>> Users at ovirt.org
>>>>>>> http://lists.ovirt.org/mailman/listinfo/users
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ovirt.org/pipermail/users/attachments/20180403/634c3a19/attachment.html>
More information about the Users
mailing list