On Tue, Apr 3, 2018 at 5:21 PM, RabidCicada <rabidcicada(a)gmail.com> wrote:
I've attached a full debug packet below. It includes the log file from
ovirt-hosted-engine-setup, the relevant command-line output, output captured
where epdb has a breakpoint in ansible's own playbook.py, and the output of
commands I ran after the failure. I've also attached the script that gets
ferried over to /root/.ansible/tmp and run.
*Output on cmd line:*
[ INFO ] TASK [Copy configuration archive to storage]
[ ERROR ] fatal: [localhost]: FAILED! => {
"changed": true,
"cmd": [
"dd",
"bs=20480",
"count=1",
"oflag=direct",
"if=/var/tmp/localvmbCDQIR/5e
f881f5-c992-48d2-b969-a0b6156bdf7c",
"of=/rhev/data-center/mnt/nod
e.local:_srv_data/81292f3f-11d3-4e38-9afa-62e133aa8017/image
s/c5510e77-1ee0-479c-b6cf-24c179313a45/5ef881f5-c992-
48d2-b969-a0b6156bdf7c"
],
"delta": "0:00:00.004336",
"end": "2018-04-03 15:01:55.581823",
"invocation": {
"module_args": {
"_raw_params": "dd bs=20480 count=1 oflag=direct
if=\"/var/tmp/localvmbCDQIR/5ef881f5-c992-48d2-b969-a0b6156bdf7c\"
of=\"/rhev/data-center/mnt/node.local:_srv_data/81292f3f-11d
3-4e38-9afa-62e133aa8017/images/c5510e77-1ee0-479c-b6cf-
24c179313a45/5ef881f5-c992-48d2-b969-a0b6156bdf7c\"",
"_uses_shell": false,
"chdir": null,
"creates": null,
"executable": null,
"removes": null,
"stdin": null,
"warn": true
}
},
"msg": "non-zero return code",
"rc": 1,
"start": "2018-04-03 15:01:55.577487",
"stderr": "dd: failed to open ‘/rhev/data-center/mnt/node.lo
cal:_srv_data/81292f3f-11d3-4e38-9afa-62e133aa8017/images/c5
510e77-1ee0-479c-b6cf-24c179313a45/5ef881f5-c992-48d2-b969-a0b6156bdf7c’:
Permission denied",
"stderr_lines": [
"dd: failed to open ‘/rhev/data-center/mnt/node.lo
cal:_srv_data/81292f3f-11d3-4e38-9afa-62e133aa8017/images/c5
510e77-1ee0-479c-b6cf-24c179313a45/5ef881f5-c992-48d2-b969-a0b6156bdf7c’:
Permission denied"
],
"stdout": "",
"stdout_lines": []
}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing
ansible-playbook
In the playbook, that task has:
  become_user: vdsm
  become_method: sudo
but I fear it somehow got ignored.
I'll investigate it.
*Output from ansible epdb tracepoint:*
Using module file /usr/lib/python2.7/site-packages/ansible/modules/commands/command.py
<localhost> ESTABLISH LOCAL CONNECTION FOR USER: root
<localhost> EXEC /bin/sh -c 'echo ~ && sleep 0'
<localhost> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /root/.ansible/tmp/ansible-tmp-1522767715.36-81496549401055 `" && echo ansible-tmp-1522767715.36-81496549401055="` echo /root/.ansible/tmp/ansible-tmp-1522767715.36-81496549401055 `" ) && sleep 0'
<localhost> PUT /tmp/tmpGMGdjh TO /root/.ansible/tmp/ansible-tmp-1522767715.36-81496549401055/command.py
<localhost> EXEC /bin/sh -c 'chmod u+x /root/.ansible/tmp/ansible-tmp-1522767715.36-81496549401055/ /root/.ansible/tmp/ansible-tmp-1522767715.36-81496549401055/command.py && sleep 0'
<localhost> EXEC /bin/sh -c '/usr/bin/python /root/.ansible/tmp/ansible-tmp-1522767715.36-81496549401055/command.py && sleep 0'
to retry, use: --limit @/usr/share/ovirt-hosted-engine-setup/ansible/create_target_vm.retry
The above command.py is the one I have attached as problematic_command.py.
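Since ANSIBLE_KEEP_REMOTE_FILES=1 leaves that file in place, the copied module can also be replayed by hand to separate "the module itself fails" from "ansible invokes it wrong". A minimal sketch, assuming the kept command.py is the usual self-contained wrapper with the original arguments baked in (the tmp directory name is the one from this run, and note each replay re-runs the dd, so only do this on a throwaway deployment):

  # replay the kept module exactly as the EXEC line above does (i.e. as root)
  /usr/bin/python /root/.ansible/tmp/ansible-tmp-1522767715.36-81496549401055/command.py

  # and once more as vdsm, after copying it somewhere vdsm can read
  cp /root/.ansible/tmp/ansible-tmp-1522767715.36-81496549401055/command.py /var/tmp/
  sudo -u vdsm /usr/bin/python /var/tmp/command.py

If the root replay fails with the same Permission denied and the vdsm replay succeeds, that points back at become rather than at the command itself.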
*Investigation After Failure:*
[root@node ~]# ls -al '/rhev/data-center/mnt/node.local:_srv_data/81292f3f-11d3-4e38-9afa-62e133aa8017/images/c5510e77-1ee0-479c-b6cf-24c179313a45/5ef881f5-c992-48d2-b969-a0b6156bdf7c'
-rw-rw----. 1 vdsm kvm 20480 Apr 3 15:01 /rhev/data-center/mnt/node.local:_srv_data/81292f3f-11d3-4e38-9afa-62e133aa8017/images/c5510e77-1ee0-479c-b6cf-24c179313a45/5ef881f5-c992-48d2-b969-a0b6156bdf7c
sudo -u vdsm dd bs=20480 count=1 oflag=direct if="/var/tmp/localvmbCDQIR/5ef881f5-c992-48d2-b969-a0b6156bdf7c" of="/rhev/data-center/mnt/node.local:_srv_data/81292f3f-11d3-4e38-9afa-62e133aa8017/images/c5510e77-1ee0-479c-b6cf-24c179313a45/5ef881f5-c992-48d2-b969-a0b6156bdf7c"
1+0 records in
1+0 records out
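One more data point that might be worth collecting: the EXEC lines above show the module being launched for user root with no visible sudo wrapper, so it could be informative to run the exact same dd as root. If that reproduces the Permission denied, it would support the theory that become_user is being ignored (root is commonly squashed on NFS exports, so root can get EACCES on a file that vdsm:kvm can write). A sketch, same paths as above:

  # hypothetical check: run the dd as root, the way ansible appears to run the module
  dd bs=20480 count=1 oflag=direct \
    if=/var/tmp/localvmbCDQIR/5ef881f5-c992-48d2-b969-a0b6156bdf7c \
    of=/rhev/data-center/mnt/node.local:_srv_data/81292f3f-11d3-4e38-9afa-62e133aa8017/images/c5510e77-1ee0-479c-b6cf-24c179313a45/5ef881f5-c992-48d2-b969-a0b6156bdf7c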
It seems to me that somehow it is not getting the right permissions even
though the playbook has:
- name: Copy configuration archive to storage
  command: dd bs=20480 count=1 oflag=direct if="{{ LOCAL_VM_DIR }}/{{ he_conf_disk_details.disk.image_id }}" of="{{ he_conf_disk_path }}"
  become_user: vdsm
  become_method: sudo
  changed_when: True
  tags: [ 'skip_ansible_lint' ]
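As a quick sanity check on whether become to vdsm works at all outside the otopi-driven run, an ad-hoc call through the same become machinery might help isolate it. A sketch using only standard ansible CLI options, nothing specific to hosted-engine-setup:

  # does ansible's sudo become to vdsm work over a plain local connection?
  ansible localhost -c local -m command -a 'whoami' \
    -b --become-user=vdsm --become-method=sudo -vvvv

If that prints vdsm, become itself is fine and the problem is in how this particular run builds the task; if it prints root, become really is being dropped.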
On Tue, Apr 3, 2018 at 8:51 AM, RabidCicada <rabidcicada(a)gmail.com> wrote:
> I am now also running with:
>
> export ANSIBLE_VERBOSITY=5
>
> export ANSIBLE_FORKS=1
>
> export ANSIBLE_KEEP_REMOTE_FILES=1
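> (For reference, the same knobs can also be set in an ansible.cfg, which may be handy if the environment variables don't survive into whatever otopi execs. A sketch, assuming the launched ansible-playbook reads the global config and that any existing file is backed up first:)
>
>   cat > /etc/ansible/ansible.cfg <<'EOF'
>   [defaults]
>   verbosity = 5
>   forks = 1
>   keep_remote_files = True
>   EOF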
>
>
> On Tue, Apr 3, 2018 at 8:49 AM, RabidCicada <rabidcicada(a)gmail.com>
> wrote:
>
>> Here's the log.
>>
>> So the command that it says it ran is:
>> dd bs=20480 count=1 oflag=direct if=/var/tmp/localvmHaWb6G/1cce8df2-1810-4063-b4e2-e19a2c5b1909 of=/rhev/data-center/mnt/node.local:_srv_data/3c7485ea-14e3-40c1-b627-f89a819ed1d6/images/2c1f7c2f-b8f7-46d4-ac66-8ff1e9649e29/1cce8df2-1810-4063-b4e2-e19a2c5b1909
>>
>> But we all know that was done with:
>>
>> - name: Copy configuration archive to storage
>>   command: dd bs=20480 count=1 oflag=direct if="{{ LOCAL_VM_DIR }}/{{ he_conf_disk_details.disk.image_id }}" of="{{ he_conf_disk_path }}"
>>   become_user: vdsm
>>   become_method: sudo
>>   changed_when: True
>>   tags: [ 'skip_ansible_lint' ]
>>
>> So I manually replicated it with `sudo -u vdsm dd bs=20480 count=1
>> if=/var/tmp/localvmHaWb6G/1cce8df2-1810-4063-b4e2-e19a2c5b1909 of=/rhev/data-center/mnt/node.local:_srv_data/3c7485ea-14e3-40c1-b627-f89a819ed1d6/images/2c1f7c2f-b8f7-46d4-ac66-8ff1e9649e29/1cce8df2-1810-4063-b4e2-e19a2c5b1909`
>>
>> And it works when I do it manually, though I think I didn't use the
>> oflag=direct flag (just realised this).
>>
>> I eventually put a pause task directly preceding it, with debug output
>> that showed the file paths. I manually ran the command and it worked.
>> Then I let the playbook run it... it failed. I checked all the
>> permissions (e.g. vdsm:kvm ownership) and everything looks good from a
>> filesystem point of view. I'm beginning (naively) to suspect a race
>> condition for the file access problem... but have come nowhere close to
>> solving it.
>>
>> *Can you suggest a good way to resume the install process from the
>> create_target_vm.yml playbook (with the proper variables and context
>> from otopi etc.)?* I currently have to restart the entire process and
>> wait quite a while for it to circle back around.
>>
>> I have since discovered epdb and have set breakpoints directly in
>> ansible's playbook.py just to get better log output. I insert
>> epdb.serve() and use netcat to connect, since epdb on python 2.7.5 and
>> up seems to have problems with epdb.connect() itself.
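>> (For anyone following along, the pattern is roughly: drop
>> `import epdb; epdb.serve()` at the point of interest in playbook.py,
>> then attach from another shell on the same host with netcat. A sketch,
>> using what I understand to be epdb's default port:)
>>
>>   # attach to the waiting epdb session started by epdb.serve()
>>   nc 127.0.0.1 8080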
>>
>> ~Kyle
>>
>> On Tue, Apr 3, 2018 at 4:06 AM, Simone Tiraboschi <stirabos(a)redhat.com>
>> wrote:
>>
>>>
>>>
>>> On Mon, Apr 2, 2018 at 4:52 PM, RabidCicada <rabidcicada(a)gmail.com>
>>> wrote:
>>>
>>>> Heyo everyone. I'm trying to debug `hosted-engine --deploy`. It is
>>>> failing in `Copy configuration archive to storage` in
>>>> `create_target_vm.yml`. My general and most important question here is
>>>> how to get good debug output from ansible through hosted-engine. I'm
>>>> running hosted-engine through an ssh session.
>>>>
>>>> I can't figure out how to get good debug output from ansible within
>>>> that workflow. I see it's running through otopi. I tried setting the
>>>> typical `debugger: on_failed` hooks etc. and tried many incantations on
>>>> the command line and in config files to get ansible to help me out.
>>>> The `debugger:` directive and other debugger-related ansible config
>>>> options never resulted in any debugger popping up. I also can't seem
>>>> to pass the normal -vvvv flags through hosted-engine to ansible.
>>>> Ultimately I tried a `pause` task, and it complained that it was in a
>>>> non-interactive shell. I figured that might be the result of my ssh
>>>> session, so I enabled tty allocation with -t -t. It did not resolve
>>>> the issue.
>>>>
>>>> I eventually wrote my own (partly borrowed) callback_plugin that
>>>> checks an environment variable and sets `display.verbosity = int(v)`,
>>>> since I can't seem to pass the typical -vvvv options to ansible
>>>> through `hosted-engine --deploy`. It gives me the best info that I
>>>> have so far, but it won't give me enough to debug issues around
>>>> Gathering Facts or what looks like a sudo/permission problem in
>>>> `Copy configuration archive to storage` in `create_target_vm.yml`. I
>>>> took the exact command that the task runs and it works when I run it
>>>> manually (but I can't get debug output to show me the exact sudo
>>>> command being executed), hence my interest in passing -vvvv or
>>>> equivalent to ansible through `hosted-engine`. I intentionally
>>>> disabled the VM_directory cleanup so that I could execute the same
>>>> commands.
>>>>
>>>> So....after all that...what is a good way to get deep debug info from
>>>> hosted-engine ansible stuff?
>>>>
>>>
>>> You should already find all the relevant log entries in a file called
>>> /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-ansible-create_target_vm-{timestamp}-{hash}.log
>>>
>>> Can you please share it?
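>>> (If the file is large, a quick first look at just the failing task can
>>> be had with something like this — a sketch; the glob matches the
>>> timestamped file name:)
>>>
>>>   grep -n -A 30 'Copy configuration archive to storage' \
>>>     /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-ansible-create_target_vm-*.log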
>>>
>>>
>>>>
>>>> Or does anyone have intuition for the possible sudo problem?
>>>> ~Kyle
>>>>
>>>
>>
>