On Tue, Apr 3, 2018 at 5:21 PM, RabidCicada <rabidcicada@gmail.com> wrote:

I've attached a full debug packet below. I include the log file from ovirt-hosted-engine-setup. I include relevant cmd line info. I include info from the command line where epdb has a breakpoint in playbook.py from ansible itself. I also include info from commands I ran after it failed. I have also attached the ferried-over script in /root/.ansible/tmp that is run.

Output on cmd line:

    [ INFO ] TASK [Copy configuration archive to storage]
    [ ERROR ] fatal: [localhost]: FAILED! => {
        "changed": true,
        "cmd": ["dd", "bs=20480", "count=1", "oflag=direct",
                "if=/var/tmp/localvmbCDQIR/5ef881f5-c992-48d2-b969-a0b6156bdf7c",
                "of=/rhev/data-center/mnt/node.local:_srv_data/81292f3f-11d3-4e38-9afa-62e133aa8017/images/c5510e77-1ee0-479c-b6cf-24c179313a45/5ef881f5-c992-48d2-b969-a0b6156bdf7c"],
        "delta": "0:00:00.004336",
        "end": "2018-04-03 15:01:55.581823",
        "invocation": {"module_args": {
            "_raw_params": "dd bs=20480 count=1 oflag=direct if=\"/var/tmp/localvmbCDQIR/5ef881f5-c992-48d2-b969-a0b6156bdf7c\" of=\"/rhev/data-center/mnt/node.local:_srv_data/81292f3f-11d3-4e38-9afa-62e133aa8017/images/c5510e77-1ee0-479c-b6cf-24c179313a45/5ef881f5-c992-48d2-b969-a0b6156bdf7c\"",
            "_uses_shell": false, "chdir": null, "creates": null, "executable": null,
            "removes": null, "stdin": null, "warn": true}},
        "msg": "non-zero return code",
        "rc": 1,
        "start": "2018-04-03 15:01:55.577487",
        "stderr": "dd: failed to open ‘/rhev/data-center/mnt/node.local:_srv_data/81292f3f-11d3-4e38-9afa-62e133aa8017/images/c5510e77-1ee0-479c-b6cf-24c179313a45/5ef881f5-c992-48d2-b969-a0b6156bdf7c’: Permission denied",
        "stderr_lines": ["dd: failed to open ‘/rhev/data-center/mnt/node.local:_srv_data/81292f3f-11d3-4e38-9afa-62e133aa8017/images/c5510e77-1ee0-479c-b6cf-24c179313a45/5ef881f5-c992-48d2-b969-a0b6156bdf7c’: Permission denied"],
        "stdout": "",
        "stdout_lines": []}
    [ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook

In the playbook we have on that task:

    become_user: vdsm
    become_method: sudo

but I fear it got somehow ignored.
I'll investigate it.

Output from ansible epdb tracepoint:

    Using module file /usr/lib/python2.7/site-packages/ansible/modules/commands/command.py
    <localhost> ESTABLISH LOCAL CONNECTION FOR USER: root
    <localhost> EXEC /bin/sh -c 'echo ~ && sleep 0'
    <localhost> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /root/.ansible/tmp/ansible-tmp-1522767715.36-81496549401055 `" && echo ansible-tmp-1522767715.36-81496549401055="` echo /root/.ansible/tmp/ansible-tmp-1522767715.36-81496549401055 `" ) && sleep 0'
    <localhost> PUT /tmp/tmpGMGdjh TO /root/.ansible/tmp/ansible-tmp-1522767715.36-81496549401055/command.py
    <localhost> EXEC /bin/sh -c 'chmod u+x /root/.ansible/tmp/ansible-tmp-1522767715.36-81496549401055/ /root/.ansible/tmp/ansible-tmp-1522767715.36-81496549401055/command.py && sleep 0'
    <localhost> EXEC /bin/sh -c '/usr/bin/python /root/.ansible/tmp/ansible-tmp-1522767715.36-81496549401055/command.py && sleep 0'
        to retry, use: --limit @/usr/share/ovirt-hosted-engine-setup/ansible/create_target_vm.retry

The above command.py is the one I have attached as problematic_command.py

Investigation After Failure:

    [root@node ~]# ls -al '/rhev/data-center/mnt/node.local:_srv_data/81292f3f-11d3-4e38-9afa-62e133aa8017/images/c5510e77-1ee0-479c-b6cf-24c179313a45/5ef881f5-c992-48d2-b969-a0b6156bdf7c'
    -rw-rw----. 1 vdsm kvm 20480 Apr 3 15:01 /rhev/data-center/mnt/node.local:_srv_data/81292f3f-11d3-4e38-9afa-62e133aa8017/images/c5510e77-1ee0-479c-b6cf-24c179313a45/5ef881f5-c992-48d2-b969-a0b6156bdf7c

    sudo -u vdsm dd bs=20480 count=1 oflag=direct if="/var/tmp/localvmbCDQIR/5ef881f5-c992-48d2-b969-a0b6156bdf7c" of="/rhev/data-center/mnt/node.local:_srv_data/81292f3f-11d3-4e38-9afa-62e133aa8017/images/c5510e77-1ee0-479c-b6cf-24c179313a45/5ef881f5-c992-48d2-b969-a0b6156bdf7c"
    1+0 records in
    1+0 records out

It seems to me that somehow it is not getting the right permissions even though the playbook has:

    - name: Copy configuration archive to storage
      command: dd bs=20480 count=1 oflag=direct if="{{ LOCAL_VM_DIR }}/{{ he_conf_disk_details.disk.image_id }}" of="{{ he_conf_disk_path }}"
      become_user: vdsm
      become_method: sudo
      changed_when: True
      tags: [ 'skip_ansible_lint' ]

On Tue, Apr 3, 2018 at 8:51 AM, RabidCicada <rabidcicada@gmail.com> wrote:

I am now also running with:

    export ANSIBLE_VERBOSITY=5
    export ANSIBLE_FORKS=1
    export ANSIBLE_KEEP_REMOTE_FILES=1

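The `<localhost> EXEC ...` lines in the epdb output above can be checked mechanically for whether the module was launched through sudo. The following is only a rough sketch: the function names are made up for illustration, and it assumes that a sudo-based become shows up in the EXEC line as a `sudo ... -u <user>` wrapper around the command. The sample lines are copied from the output quoted in this thread.

```python
import re

# Matches verbose lines of the form "<host> EXEC <command>".
EXEC_RE = re.compile(r"^<(?P<host>[^>]+)> EXEC (?P<cmd>.*)$")

# Three lines copied from the verbose output quoted earlier in the thread.
SAMPLE_LOG = """\
<localhost> ESTABLISH LOCAL CONNECTION FOR USER: root
<localhost> EXEC /bin/sh -c 'echo ~ && sleep 0'
<localhost> EXEC /bin/sh -c '/usr/bin/python /root/.ansible/tmp/ansible-tmp-1522767715.36-81496549401055/command.py && sleep 0'
"""


def find_exec_commands(log_text):
    """Return (host, command) pairs for every '<host> EXEC ...' line."""
    found = []
    for line in log_text.splitlines():
        match = EXEC_RE.match(line.strip())
        if match:
            found.append((match.group("host"), match.group("cmd")))
    return found


def ran_via_sudo(command, user):
    """True if the command contains 'sudo ... -u <user>'.

    Assumption (not confirmed by the thread): this is how a sudo-based
    become appears in the EXEC line.
    """
    pattern = r"\bsudo\b.*\s-u\s+" + re.escape(user) + r"\b"
    return re.search(pattern, command) is not None


def module_runs_without_become(log_text, user="vdsm"):
    """List EXEC commands that start a Python module without sudo -u <user>."""
    return [
        cmd
        for _host, cmd in find_exec_commands(log_text)
        if "/usr/bin/python" in cmd and not ran_via_sudo(cmd, user)
    ]


if __name__ == "__main__":
    for cmd in module_runs_without_become(SAMPLE_LOG):
        print("no sudo wrapper:", cmd)
```

If the list comes back non-empty, the module was started directly as the connection user (root in the output above), which would be consistent with `become_user` not taking effect. One thing worth checking in that case is whether `become: true` is set on the task or the play, since Ansible documents that `become_user` on its own does not enable privilege escalation.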
On Tue, Apr 3, 2018 at 8:49 AM, RabidCicada <rabidcicada@gmail.com> wrote:

Here's the log.

So the command that it says it ran is:

    dd bs=20480 count=1 oflag=direct if=/var/tmp/localvmHaWb6G/1cce8df2-1810-4063-b4e2-e19a2c5b1909 of=/rhev/data-center/mnt/node.local:_srv_data/3c7485ea-14e3-40c1-b627-f89a819ed1d6/images/2c1f7c2f-b8f7-46d4-ac66-8ff1e9649e29/1cce8df2-1810-4063-b4e2-e19a2c5b1909

But we all know that was done with:

    - name: Copy configuration archive to storage
      command: dd bs=20480 count=1 oflag=direct if="{{ LOCAL_VM_DIR }}/{{ he_conf_disk_details.disk.image_id }}" of="{{ he_conf_disk_path }}"
      become_user: vdsm
      become_method: sudo
      changed_when: True
      tags: [ 'skip_ansible_lint' ]

So I manually replicated with `sudo -u vdsm dd bs=20480 count=1 if=/var/tmp/localvmHaWb6G/1cce8df2-1810-4063-b4e2-e19a2c5b1909 of=/rhev/data-center/mnt/node.local:_srv_data/3c7485ea-14e3-40c1-b627-f89a819ed1d6/images/2c1f7c2f-b8f7-46d4-ac66-8ff1e9649e29/1cce8df2-1810-4063-b4e2-e19a2c5b1909`

And it works when I manually do it. Though I think I didn't use the oflag=direct (just realised this).

I eventually put a pause task directly preceding it with debug output that showed the file paths. I manually ran the command and it worked. Then let it do it....failed. I checked all the permissions, e.g. vdsm:kvm. All looks good from a filesystem point of view. I'm beginning (naively) to suspect a race condition for the file access problem...but have come nowhere close to solving it.

Can you suggest a good way to continue executing the install process from the create_target_vm.yml playbook (with proper variables and context from otopi etc)? I currently have to restart the entire process over again and wait quite a while for it to circle back around.

I have since discovered epdb and I've set breakpoints directly in playbook.py of ansible just to see better log output. I insert epdb.serve and use netcat to connect, since epdb on python 2.7.5 and up seems to have problems using epdb.connect() itself.

~Kyle

On Tue, Apr 3, 2018 at 4:06 AM, Simone Tiraboschi <stirabos@redhat.com> wrote:

On Mon, Apr 2, 2018 at 4:52 PM, RabidCicada <rabidcicada@gmail.com> wrote:

Heyo everyone. I'm trying to debug hosted-engine --deploy. It is failing in `Copy configuration archive to storage` in `create_target_vm.yml` from `hosted-engine --deploy`. My general and most important query here is how to get good debug output from ansible through hosted-engine. I'm running hosted-engine through an ssh session.

I can't figure out how to get good debug output from ansible within that workflow. I see it's running through otopi. I tried setting typical `debugger: on_failed` hooks etc and tried many incantations on the command line and in config files to get ansible to help me out. The debugger: directive and other debugger-related ansible config file stuff wouldn't result in any debugger popping up. I also can't seem to pass normal -vvvv flags to hosted-engine and get them through to ansible. Ultimately I tried to use a `pause` directive and it complained that it was in a non-interactive shell. I figured it might be the result of my ssh session so I enabled tty allocation with -t -t. It did not resolve the issue.

I eventually wrote-my-own/stole a callback_plugin that checks an environment variable and enables `display.verbosity = int(v)`, since I can't seem to pass typical -vvvv stuff to ansible through `hosted-engine --deploy`. It gives me the best info that I have so far. But it won't give me enough to debug issues around Gathering Facts or what looks like a sudo/permission problem in `Copy configuration archive to storage` in `create_target_vm.yml`.
I took and used the exact command that they use manually and it works when I run it manually (but I can't get debug output to show me the exact sudo command being executed), hence my interest in passing -vvvv or equivalent to ansible through `hosted-engine`. I intentionally disabled the VM_directory cleanup so that I could execute the same stuff.

So....after all that...what is a good way to get deep debug info from hosted-engine ansible stuff?

You should already find all the relevant log entries in a file called /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-ansible-create_target_vm-{timestamp}-{hash}.log

Can you please share it?

Or does anyone have intuition for the possible sudo problem?

~Kyle
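
For readers wondering what the environment-variable callback plugin mentioned in the Apr 2 message might look like, here is a minimal sketch. It is not the plugin Kyle actually used (that was not posted). The variable name `HE_ANSIBLE_VERBOSITY` and the plugin name `env_verbosity` are hypothetical, and the guarded import exists only so the helper function can be exercised on a machine without Ansible installed.

```python
import os

try:
    from ansible.plugins.callback import CallbackBase
except ImportError:
    # Fallback so verbosity_from_env() can be tried without Ansible present.
    # A real plugin needs the real CallbackBase.
    CallbackBase = object

# Hypothetical variable name chosen for this sketch; not from the thread.
ENV_VAR = "HE_ANSIBLE_VERBOSITY"


def verbosity_from_env(environ, default=0):
    """Read the verbosity level from a mapping such as os.environ.

    Missing or non-integer values fall back to `default`;
    negative values are clamped to 0.
    """
    raw = environ.get(ENV_VAR)
    if raw is None:
        return default
    try:
        level = int(raw)
    except ValueError:
        return default
    return max(0, level)


class CallbackModule(CallbackBase):
    """Raise Ansible's display verbosity from an environment variable."""

    CALLBACK_VERSION = 2.0
    CALLBACK_TYPE = "aggregate"
    CALLBACK_NAME = "env_verbosity"  # hypothetical plugin name

    def v2_playbook_on_start(self, playbook):
        # self._display is the Display object that CallbackBase sets up;
        # changing its verbosity has the same effect as passing -v flags.
        self._display.verbosity = verbosity_from_env(
            os.environ, default=self._display.verbosity
        )
```

Dropped into a directory Ansible searches for callback plugins, something like this would let `HE_ANSIBLE_VERBOSITY=4 hosted-engine --deploy` raise the verbosity without any command-line flag, provided the variable reaches the ansible-playbook process. The `ANSIBLE_VERBOSITY` variable used in the later messages of this thread appears to serve the same purpose without a custom plugin.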
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users