On Thu, Apr 14, 2016 at 6:53 PM, Richard Neuboeck <hawk(a)tbi.univie.ac.at> wrote:
On 14.04.16 18:46, Simone Tiraboschi wrote:
> On Thu, Apr 14, 2016 at 4:04 PM, Richard Neuboeck <hawk(a)tbi.univie.ac.at> wrote:
>> On 04/14/2016 02:14 PM, Simone Tiraboschi wrote:
>>> On Thu, Apr 14, 2016 at 12:51 PM, Richard Neuboeck
>>> <hawk(a)tbi.univie.ac.at> wrote:
>>>> On 04/13/2016 10:00 AM, Simone Tiraboschi wrote:
>>>>> On Wed, Apr 13, 2016 at 9:38 AM, Richard Neuboeck
>>>>> <hawk(a)tbi.univie.ac.at> wrote:
>>>>>> The answers file shows the setup time of both machines.
>>>>>>
>>>>>> On both machines hosted-engine.conf got rotated right before I wrote
>>>>>> this mail. Is it possible that I managed to interrupt the rotation with
>>>>>> the reboot so the backup was accurate but the update not yet written to
>>>>>> hosted-engine.conf?
>>>>>
>>>>> AFAIK we don't have any rotation mechanism for that file; something
>>>>> else you have in place on that host?
>>>>
>>>> Those machines are all CentOS 7.2 minimal installs. The only
>>>> adaptation I do is installing vim, removing postfix and installing
>>>> exim, removing firewalld and installing iptables-services. Then I add
>>>> the oVirt repos (3.6 and 3.6-snapshot) and deploy the host.
>>>>
>>>> But checking lsof shows that 'ovirt-ha-agent --no-daemon' has access
>>>> to the config file (and the one ending with ~):
>>>>
>>>> # lsof | grep 'hosted-engine.conf~'
>>>> ovirt-ha- 193446 vdsm 351u REG 253,0 1021 135070683 /etc/ovirt-hosted-engine/hosted-engine.conf~
>>>
>>> This is not very relevant if the file was renamed after
>>> ovirt-ha-agent opened it.
>>> Try this:
>>>
>>> [root@c72he20160405h1 ovirt-hosted-engine-setup]# tail -n1 -f /etc/ovirt-hosted-engine/hosted-engine.conf &
>>> [1] 28866
>>> [root@c72he20160405h1 ovirt-hosted-engine-setup]# port=
>>>
>>> [root@c72he20160405h1 ovirt-hosted-engine-setup]# lsof | grep hosted-engine.conf
>>> tail 28866 root 3r REG 253,0 1014 1595898 /etc/ovirt-hosted-engine/hosted-engine.conf
>>> [root@c72he20160405h1 ovirt-hosted-engine-setup]# mv /etc/ovirt-hosted-engine/hosted-engine.conf /etc/ovirt-hosted-engine/hosted-engine.conf_123
>>> [root@c72he20160405h1 ovirt-hosted-engine-setup]# lsof | grep hosted-engine.conf
>>> tail 28866 root 3r REG 253,0 1014 1595898 /etc/ovirt-hosted-engine/hosted-engine.conf_123
>>> [root@c72he20160405h1 ovirt-hosted-engine-setup]#
>>>
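The point of the tail/lsof/mv transcript above is that an open file descriptor stays attached to the inode, not to the name, so lsof reports whatever name the inode has now. A minimal sketch of the same effect in Python, assuming a Linux host; the function name inode_survives_rename and the temporary directory are only for illustration and touch nothing under /etc:

```python
import os
import tempfile


def inode_survives_rename():
    """Open a file, rename it, and compare the inode seen through the open fd."""
    workdir = tempfile.mkdtemp()
    original = os.path.join(workdir, "hosted-engine.conf")
    renamed = os.path.join(workdir, "hosted-engine.conf_123")

    with open(original, "w") as handle:
        handle.write("key=value\n")

    with open(original) as reader:
        # Inode as seen through the already-open descriptor.
        before = os.fstat(reader.fileno()).st_ino
        os.rename(original, renamed)
        # Same descriptor after the rename: it still refers to the same inode.
        after = os.fstat(reader.fileno()).st_ino
        still_readable = reader.read() == "key=value\n"

    return before, after, still_readable
```

So seeing a process hold hosted-engine.conf~ open does not by itself tell you that this process created the ~ file; it may simply have opened the file before something else renamed it.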
>>
>> I've issued the commands you suggested but I don't know how that
>> helps to find the process accessing the config files.
>>
>> After moving the hosted-engine.conf file the HA agent crashed
>> logging the information that the config file is not available.
>>
>> Here is the output from every command:
>>
>> # tail -n1 -f /etc/ovirt-hosted-engine/hosted-engine.conf &
>> [1] 167865
>> [root@cube-two ~]# port=
>> # lsof | grep hosted-engine.conf
>> ovirt-ha- 166609 vdsm 5u REG 253,0 1021 134433491 /etc/ovirt-hosted-engine/hosted-engine.conf (deleted)
>> ovirt-ha- 166609 vdsm 7u REG 253,0 1021 134433453 /etc/ovirt-hosted-engine/hosted-engine.conf (deleted)
>> ovirt-ha- 166609 vdsm 8u REG 253,0 1021 134433489 /etc/ovirt-hosted-engine/hosted-engine.conf (deleted)
>> ovirt-ha- 166609 vdsm 9u REG 253,0 1021 134433493 /etc/ovirt-hosted-engine/hosted-engine.conf~
>> ovirt-ha- 166609 vdsm 10u REG 253,0 1021 134433495 /etc/ovirt-hosted-engine/hosted-engine.conf
>> tail 167865 root 3r REG 253,0 1021 134433493 /etc/ovirt-hosted-engine/hosted-engine.conf~
>> # mv /etc/ovirt-hosted-engine/hosted-engine.conf /etc/ovirt-hosted-engine/hosted-engine.conf_123
>> # lsof | grep hosted-engine.conf
>> ovirt-ha- 166609 vdsm 5u REG 253,0 1021 134433491 /etc/ovirt-hosted-engine/hosted-engine.conf (deleted)
>> ovirt-ha- 166609 vdsm 7u REG 253,0 1021 134433453 /etc/ovirt-hosted-engine/hosted-engine.conf (deleted)
>> ovirt-ha- 166609 vdsm 8u REG 253,0 1021 134433489 /etc/ovirt-hosted-engine/hosted-engine.conf (deleted)
>> ovirt-ha- 166609 vdsm 9u REG 253,0 1021 134433493 /etc/ovirt-hosted-engine/hosted-engine.conf (deleted)
>> ovirt-ha- 166609 vdsm 10u REG 253,0 1021 134433495 /etc/ovirt-hosted-engine/hosted-engine.conf (deleted)
>> ovirt-ha- 166609 vdsm 12u REG 253,0 1021 134433498 /etc/ovirt-hosted-engine/hosted-engine.conf~
>> ovirt-ha- 166609 vdsm 13u REG 253,0 1021 134433499 /etc/ovirt-hosted-engine/hosted-engine.conf_123
>> tail 167865 root 3r REG 253,0 1021 134433493 /etc/ovirt-hosted-engine/hosted-engine.conf (deleted)
>>
>>
>>> The issue is understanding who renames that file on your host.
>>
>> From what I've seen so far it looks like a child of vdsm accesses
>> /etc/ovirt-hosted-engine/hosted-engine.conf periodically but is not
>> responsible for the ~ file.
>>
>> # auditctl -w /etc/ovirt-hosted-engine/hosted-engine.conf
>> and
>> # auditctl -w /etc/ovirt-hosted-engine/hosted-engine.conf~
>>
>> auditd.log shows this:
>>
>> type=SYSCALL msg=audit(1460639783.613:482590): arch=c000003e
>> syscall=2 success=yes exit=75 a0=7f29b400f0b0 a1=0 a2=1b6 a3=24
>> items=1 ppid=1 pid=3701 auid=4294967295 uid=36 gid=36 euid=36
>> suid=36 fsuid=36 egid=36 sgid=36 fsgid=36 tty=(none) ses=4294967295
>> comm="jsonrpc.Executo" exe="/usr/bin/python2.7"
>> subj=system_u:system_r:virtd_t:s0-s0:c0.c1023 key=(null)
>> type=CWD msg=audit(1460639783.613:482590): cwd="/"
>> type=PATH msg=audit(1460639783.613:482590): item=0
>> name="/etc/ovirt-hosted-engine/hosted-engine.conf" inode=134433499
>> dev=fd:00 mode=0100644 ouid=0 ogid=0 rdev=00:00
>> obj=system_u:object_r:etc_t:s0 objtype=NORMAL
>>
>>
>> Now that the HA agent is dead I'm removing the ~ file and starting
>> the HA agent again. The ~ file immediately appears again.
>>
>> # rm hosted-engine.conf~
>> rm: remove regular file ‘hosted-engine.conf~’? y
>> [root@cube-two ovirt-hosted-engine]# ls -l
>> total 6800
>> -rw-r--r--. 1 root root 3252 Apr 8 10:35 answers.conf
>> -rw-r--r--. 1 root root 6948582 Apr 14 14:48 ha-trace.log
>> -rw-r--r--. 1 root root 1021 Apr 14 15:07 hosted-engine.conf
>> -rw-r--r--. 1 root root 413 Apr 8 10:35 iptables.example
>> [root@cube-two ovirt-hosted-engine]# systemctl start ovirt-ha-agent
>> [root@cube-two ovirt-hosted-engine]# ls -l
>> total 6804
>> -rw-r--r--. 1 root root 3252 Apr 8 10:35 answers.conf
>> -rw-r--r--. 1 root root 6948582 Apr 14 14:48 ha-trace.log
>> -rw-r--r--. 1 root root 1021 Apr 14 15:18 hosted-engine.conf
>> -rw-r--r--. 1 root root 1021 Apr 14 15:07 hosted-engine.conf~
>> -rw-r--r--. 1 root root 413 Apr 8 10:35 iptables.example
>>
>> The auditd.log shows that ~ file is moved into place but not what
>> issued the mv:
>>
>> type=CONFIG_CHANGE msg=audit(1460639919.277:482750): auid=4294967295
>> ses=4294967295 op="updated_rules"
>> path="/etc/ovirt-hosted-engine/hosted-engine.conf~" key=(null)
>> list=4 res=1
>> type=SYSCALL msg=audit(1460639919.277:482751): arch=c000003e
>> syscall=82 success=yes exit=0 a0=7ffe4b3c0e90 a1=7ffe4b3bf920
>> a2=7f68083a2778 a3=7ffe4b3bf680 items=5 ppid=170233 pid=170234
>> auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0
>> sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="mv"
>> exe="/usr/bin/mv" subj=system_u:system_r:unconfined_service_t:s0
>> key=(null)
>> type=CWD msg=audit(1460639919.277:482751): cwd="/"
>> type=PATH msg=audit(1460639919.277:482751): item=0
>> name="/etc/ovirt-hosted-engine/" inode=69555 dev=fd:00 mode=040755
>> ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:etc_t:s0 objtype=PARENT
>> type=PATH msg=audit(1460639919.277:482751): item=1
>> name="/etc/ovirt-hosted-engine/" inode=69555 dev=fd:00 mode=040755
>> ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:etc_t:s0 objtype=PARENT
>> type=PATH msg=audit(1460639919.277:482751): item=2
>> name="/etc/ovirt-hosted-engine/hosted-engine.conf" inode=134433453
>> dev=fd:00 mode=0100644 ouid=0 ogid=0 rdev=00:00
>> obj=system_u:object_r:etc_t:s0 objtype=DELETE
>> type=PATH msg=audit(1460639919.277:482751): item=3
>> name="/etc/ovirt-hosted-engine/hosted-engine.conf~" inode=134433499
>> dev=fd:00 mode=0100644 ouid=0 ogid=0 rdev=00:00
>> obj=system_u:object_r:etc_t:s0 objtype=DELETE
>> type=PATH msg=audit(1460639919.277:482751): item=4
>> name="/etc/ovirt-hosted-engine/hosted-engine.conf~" inode=134433453
>> dev=fd:00 mode=0100644 ouid=0 ogid=0 rdev=00:00
>> obj=system_u:object_r:etc_t:s0 objtype=CREATE
>>
>>
>>> As a rule of thumb, if a file name ends with a tilde (~), it only
>>> means that it is a backup created by a text editor or similar program.
>>
>> If anyone besides me had access to these systems I would
>> guess the same. But since I'm not editing anything in
>> /etc/ovirt-hosted-engine there must be another reason. And there is.
>>
>> Aside from auditd I tried to strace the whole thing just to make
>> sure it comes from the HA agent.
>>
>> [root@cube-two ~]# strace -o ha-trace.log -f
>> /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon
>>
>> Looking at the trace log I found this:
>>
>> 183409 statfs("/etc/ovirt-hosted-engine/.", {f_type=0x58465342,
>> f_bsize=4096, f_blocks=13100800, f_bfree=12523576,
>> f_bavail=12523576, f_files=52428800, f_ffree=52379892,
>> f_fsid={64768, 0}, f_namelen=255, f_frsize=4096}) = 0
>> 183409 rename("/etc/ovirt-hosted-engine/hosted-engine.conf",
>> "/etc/ovirt-hosted-engine/hosted-engine.conf~") = 0
>> 183409 rename("/var/lib/ovirt-hosted-engine-ha/tmpNjTElr",
>> "/etc/ovirt-hosted-engine/hosted-engine.conf") = 0
>> 183409 newfstatat(AT_FDCWD,
>> "/etc/ovirt-hosted-engine/hosted-engine.conf",
>> {st_mode=S_IFREG|0600, st_size=1021, ...}, AT_SYMLINK_NOFOLLOW) = 0
>> 183409 open("/etc/ovirt-hosted-engine/hosted-engine.conf",
>> O_RDONLY|O_NOFOLLOW) = 3
>>
>>
>> Putting it all together I started reading the HA agent sources and
>> found the function _wrote_updated_conf_file in
>> /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/upgrade.py
>> which issues a mv -b which creates the ~ file.
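The two rename() calls in the strace output above can be reproduced in a few lines. This is only a sketch of a backup-then-replace pattern, not the code in upgrade.py; the function name replace_with_backup and the choice to create the temporary file next to the target (so both renames stay on one filesystem) are assumptions made for illustration:

```python
import os
import tempfile


def replace_with_backup(target, new_content):
    """Install new_content at target, keeping the previous file as target + '~'."""
    directory = os.path.dirname(target) or "."

    # Write the new content to a temporary file first.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as tmp:
        tmp.write(new_content)

    # First rename: the current file becomes the '~' backup.
    if os.path.exists(target):
        os.rename(target, target + "~")

    # Second rename: the temporary file takes over the target name.
    os.rename(tmp_path, target)
```

Because the first rename runs whenever the target exists, a fresh ~ backup appears on every call, even when the new content is identical to the old one. Note also that between the two renames the target name briefly does not exist.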
>
> This should only trigger during a 3.5 to 3.6 upgrade, but your hosts are new.
> Can you please attach /var/log/ovirt-hosted-engine-ha/agent.log from
> one of them?
The agent.log of host cube-two is attached to this mail.
Yes, you are right:
it's looping trying to fix a path in the config file (on 3.5 we didn't
check if an NFS path was ending with a '/', while for other reasons that
wasn't working on 3.6, so we need to fix it), but that doesn't seem to be
your case, hence the strange loop.
Now I need to understand why it enters there.
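To make the kind of path fix concrete, here is a minimal sketch, assuming the fix amounts to dropping a trailing '/' from the storage path. The names normalize_storage_path and needs_rewrite and the example export in the comment are hypothetical and not taken from the agent sources:

```python
def normalize_storage_path(path):
    """Strip trailing slashes from an NFS-style path, leaving a bare '/' alone."""
    stripped = path.rstrip("/")
    return stripped if stripped else path


def needs_rewrite(path):
    """True only when normalizing would actually change the stored value."""
    return normalize_storage_path(path) != path


# Hypothetical example: "nfs.example.org:/export/he/" needs a rewrite once;
# the normalized result "nfs.example.org:/export/he" does not.
```

A check like this is idempotent: once the path has been normalized, a second pass finds nothing to change, so the config file should not be rewritten again.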
Can you please execute
tree /rhev/data-center/
and post me the output?
Thanks again
>> The question now is why this is done so frequently, especially
>> considering that there are no modifications to the file. Is this
>> behavior normal?
>>
>> [root@cube-two ~]# diff /etc/ovirt-hosted-engine/hosted-engine.conf*
>> [root@cube-two ~]#
>>
>>
>>>>>> [root@cube-two ~]# ls -l /etc/ovirt-hosted-engine
>>>>>> total 16
>>>>>> -rw-r--r--. 1 root root 3252 Apr 8 10:35 answers.conf
>>>>>> -rw-r--r--. 1 root root 1021 Apr 13 09:31 hosted-engine.conf
>>>>>> -rw-r--r--. 1 root root 1021 Apr 13 09:30 hosted-engine.conf~
>>>>>>
>>>>>> [root@cube-three ~]# ls -l /etc/ovirt-hosted-engine
>>>>>> total 16
>>>>>> -rw-r--r--. 1 root root 3233 Apr 11 08:02 answers.conf
>>>>>> -rw-r--r--. 1 root root 1002 Apr 13 09:31 hosted-engine.conf
>>>>>> -rw-r--r--. 1 root root 1002 Apr 13 09:31 hosted-engine.conf~
>>>>>>
>>>>>> On 12.04.16 16:01, Simone Tiraboschi wrote:
>>>>>>> Everything seems fine here,
>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf seems to be correctly
>>>>>>> created with the right name.
>>>>>>> Can you please check the latest modification time of your
>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf~ and compare it with the
>>>>>>> setup time?
>>>>>>>
>>>>>>> On Tue, Apr 12, 2016 at 2:34 PM, Richard Neuboeck
>>>>>>> <hawk(a)tbi.univie.ac.at> wrote:
>>>>>>>> On 04/12/2016 11:32 AM, Simone Tiraboschi wrote:
>>>>>>>>> On Mon, Apr 11, 2016 at 8:11 AM, Richard Neuboeck
>>>>>>>>> <hawk(a)tbi.univie.ac.at> wrote:
>>>>>>>>>> Hi oVirt Group,
>>>>>>>>>>
>>>>>>>>>> in my attempts to get all aspects of oVirt 3.6 up and running I
>>>>>>>>>> stumbled upon something I'm not sure how to fix:
>>>>>>>>>>
>>>>>>>>>> Initially I installed a hosted engine setup. After that I added
>>>>>>>>>> another HA host (with hosted-engine --deploy). The host was
>>>>>>>>>> registered in the Engine correctly and HA agent came up as expected.
>>>>>>>>>>
>>>>>>>>>> However if I reboot the second host (through the Engine UI or
>>>>>>>>>> manually) HA agent fails to start. The reason seems to be that
>>>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf is empty. The backup
>>>>>>>>>> file ending with ~ exists though.
>>>>>>>>>
>>>>>>>>> Can you please attach hosted-engine-setup logs from your additional hosts?
>>>>>>>>> AFAIK our code will never take a ~ ending backup of that file.
>>>>>>>>
>>>>>>>> ovirt-hosted-engine-setup logs from both additional hosts are
>>>>>>>> attached to this mail.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Here are the log messages from the journal:
>>>>>>>>>> Apr 11 07:29:39 cube-two.tbi.univie.ac.at systemd[1]: Starting oVirt Hosted Engine High Availability Monitoring Agent...
>>>>>>>>>> Apr 11 07:29:39 cube-two.tbi.univie.ac.at ovirt-ha-agent[3747]: INFO:ovirt_hosted_engine_ha.agent.agent.Agent:ovirt-hosted-engine-ha agent 1.3.5.3-0.0.master started
>>>>>>>>>> Apr 11 07:29:39 cube-two.tbi.univie.ac.at ovirt-ha-agent[3747]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Found certificate common name: cube-two.tbi.univie.ac.at
>>>>>>>>>> Apr 11 07:29:39 cube-two.tbi.univie.ac.at ovirt-ha-agent[3747]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Hosted Engine is not configured. Shutting down.
>>>>>>>>>> Apr 11 07:29:39 cube-two.tbi.univie.ac.at ovirt-ha-agent[3747]: ERROR:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Hosted Engine is not configured. Shutting down.
>>>>>>>>>> Apr 11 07:29:39 cube-two.tbi.univie.ac.at ovirt-ha-agent[3747]: INFO:ovirt_hosted_engine_ha.agent.agent.Agent:Agent shutting down
>>>>>>>>>> Apr 11 07:29:39 cube-two.tbi.univie.ac.at systemd[1]: ovirt-ha-agent.service: main process exited, code=exited, status=255/n/a
>>>>>>>>>>
>>>>>>>>>> If I restore the configuration from the backup file and manually
>>>>>>>>>> restart the HA agent it's working properly.
>>>>>>>>>>
>>>>>>>>>> For testing purposes I added a third HA host which turned out to
>>>>>>>>>> behave exactly the same.
>>>>>>>>>>
>>>>>>>>>> Any help would be appreciated!
>>>>>>>>>> Thanks
>>>>>>>>>> Cheers
>>>>>>>>>> Richard
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> /dev/null
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Users mailing list
>>>>>>>>>> Users(a)ovirt.org
>>>>>>>>>> http://lists.ovirt.org/mailman/listinfo/users
>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> /dev/null
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>>> --
>>>> /dev/null
>>>>
>>
>>
>> --
>> /dev/null
>>