On Thu, Apr 14, 2016 at 10:38 PM, Simone Tiraboschi <stirabos(a)redhat.com> wrote:
On Thu, Apr 14, 2016 at 6:53 PM, Richard Neuboeck
<hawk(a)tbi.univie.ac.at> wrote:
> On 14.04.16 18:46, Simone Tiraboschi wrote:
>> On Thu, Apr 14, 2016 at 4:04 PM, Richard Neuboeck <hawk(a)tbi.univie.ac.at>
wrote:
>>> On 04/14/2016 02:14 PM, Simone Tiraboschi wrote:
>>>> On Thu, Apr 14, 2016 at 12:51 PM, Richard Neuboeck
>>>> <hawk(a)tbi.univie.ac.at> wrote:
>>>>> On 04/13/2016 10:00 AM, Simone Tiraboschi wrote:
>>>>>> On Wed, Apr 13, 2016 at 9:38 AM, Richard Neuboeck
<hawk(a)tbi.univie.ac.at> wrote:
>>>>>>> The answers file shows the setup time of both machines.
>>>>>>>
>>>>>>> On both machines hosted-engine.conf got rotated right before
I wrote
>>>>>>> this mail. Is it possible that I managed to interrupt the
rotation with
>>>>>>> the reboot so the backup was accurate but the update not yet
written to
>>>>>>> hosted-engine.conf?
>>>>>>
>>>>>> AFAIK we don't have any rotation mechanism for that file;
something
>>>>>> else you have in place on that host?
>>>>>
>>>>> Those machines are all CentOS 7.2 minimal installs. The only
>>>>> adaptation I do is installing vim, removing postfix and installing
>>>>> exim, removing firewalld and installing iptables-service. Then I add
>>>>> the oVirt repos (3.6 and 3.6-snapshot) and deploy the host.
>>>>>
>>>>> But checking lsof shows that 'ovirt-ha-agent --no-daemon' has
access
>>>>> to the config file (and the one ending with ~):
>>>>>
>>>>> # lsof | grep 'hosted-engine.conf~'
>>>>> ovirt-ha- 193446 vdsm 351u REG
>>>>> 253,0 1021 135070683
>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf~
>>>>
>>>> This is not that much relevant if the file was renamed after
>>>> ovirt-ha-agent opened it.
>>>> Try this:
>>>>
>>>> [root@c72he20160405h1 ovirt-hosted-engine-setup]# tail -n1 -f
>>>> /etc/ovirt-hosted-engine/hosted-engine.conf &
>>>> [1] 28866
>>>> [root@c72he20160405h1 ovirt-hosted-engine-setup]# port=
>>>>
>>>> [root@c72he20160405h1 ovirt-hosted-engine-setup]# lsof | grep
hosted-engine.conf
>>>> tail 28866 root 3r REG
>>>> 253,0 1014 1595898 /etc/ovirt-hosted-engine/hosted-engine.conf
>>>> [root@c72he20160405h1 ovirt-hosted-engine-setup]# mv
>>>> /etc/ovirt-hosted-engine/hosted-engine.conf
>>>> /etc/ovirt-hosted-engine/hosted-engine.conf_123
>>>> [root@c72he20160405h1 ovirt-hosted-engine-setup]# lsof | grep
hosted-engine.conf
>>>> tail 28866 root 3r REG
>>>> 253,0 1014 1595898
>>>> /etc/ovirt-hosted-engine/hosted-engine.conf_123
>>>> [root@c72he20160405h1 ovirt-hosted-engine-setup]#
>>>>
>>>
>>> I've issued the commands you suggested but I don't know how that
>>> helps to find the process accessing the config files.
>>>
>>> After moving the hosted-engine.conf file the HA agent crashed
>>> logging the information that the config file is not available.
>>>
>>> Here is the output from every command:
>>>
>>> # tail -n1 -f /etc/ovirt-hosted-engine/hosted-engine.conf &
>>> [1] 167865
>>> [root@cube-two ~]# port=
>>> # lsof | grep hosted-engine.conf
>>> ovirt-ha- 166609 vdsm 5u REG
>>> 253,0 1021 134433491
>>> /etc/ovirt-hosted-engine/hosted-engine.conf (deleted)
>>> ovirt-ha- 166609 vdsm 7u REG
>>> 253,0 1021 134433453
>>> /etc/ovirt-hosted-engine/hosted-engine.conf (deleted)
>>> ovirt-ha- 166609 vdsm 8u REG
>>> 253,0 1021 134433489
>>> /etc/ovirt-hosted-engine/hosted-engine.conf (deleted)
>>> ovirt-ha- 166609 vdsm 9u REG
>>> 253,0 1021 134433493
>>> /etc/ovirt-hosted-engine/hosted-engine.conf~
>>> ovirt-ha- 166609 vdsm 10u REG
>>> 253,0 1021 134433495
>>> /etc/ovirt-hosted-engine/hosted-engine.conf
>>> tail 167865 root 3r REG
>>> 253,0 1021 134433493
>>> /etc/ovirt-hosted-engine/hosted-engine.conf~
>>> # mv /etc/ovirt-hosted-engine/hosted-engine.conf
>>> /etc/ovirt-hosted-engine/hosted-engine.conf_123
>>> # lsof | grep hosted-engine.conf
>>> ovirt-ha- 166609 vdsm 5u REG
>>> 253,0 1021 134433491
>>> /etc/ovirt-hosted-engine/hosted-engine.conf (deleted)
>>> ovirt-ha- 166609 vdsm 7u REG
>>> 253,0 1021 134433453
>>> /etc/ovirt-hosted-engine/hosted-engine.conf (deleted)
>>> ovirt-ha- 166609 vdsm 8u REG
>>> 253,0 1021 134433489
>>> /etc/ovirt-hosted-engine/hosted-engine.conf (deleted)
>>> ovirt-ha- 166609 vdsm 9u REG
>>> 253,0 1021 134433493
>>> /etc/ovirt-hosted-engine/hosted-engine.conf (deleted)
>>> ovirt-ha- 166609 vdsm 10u REG
>>> 253,0 1021 134433495
>>> /etc/ovirt-hosted-engine/hosted-engine.conf (deleted)
>>> ovirt-ha- 166609 vdsm 12u REG
>>> 253,0 1021 134433498
>>> /etc/ovirt-hosted-engine/hosted-engine.conf~
>>> ovirt-ha- 166609 vdsm 13u REG
>>> 253,0 1021 134433499
>>> /etc/ovirt-hosted-engine/hosted-engine.conf_123
>>> tail 167865 root 3r REG
>>> 253,0 1021 134433493
>>> /etc/ovirt-hosted-engine/hosted-engine.conf (deleted)
>>>
>>>
>>>> The issue is understanding who renames that file on your host.
>>>
>>> From what I've seen so far it looks like a child of vdsm accesses
>>> /etc/ovirt-hosted-engine/hosted-engine.conf periodically but is not
>>> responsible for the ~ file.
>>>
>>> # auditctl -w /etc/ovirt-hosted-engine/hosted-engine.conf
>>> and
>>> # auditctl -w /etc/ovirt-hosted-engine/hosted-engine.conf~
>>>
>>> auditd.log shows this:
>>>
>>> type=SYSCALL msg=audit(1460639783.613:482590): arch=c000003e
>>> syscall=2 success=yes exit=75 a0=7f29b400f0b0 a1=0 a2=1b6 a3=24
>>> items=1 ppid=1 pid=3701 auid=4294967295 uid=36 gid=36 euid=36
>>> suid=36 fsuid=36 egid=36 sgid=36 fsgid=36 tty=(none) ses=4294967295
>>> comm="jsonrpc.Executo" exe="/usr/bin/python2.7"
>>> subj=system_u:system_r:virtd_t:s0-s0:c0.c1023 key=(null)
>>> type=CWD msg=audit(1460639783.613:482590): cwd="/"
>>> type=PATH msg=audit(1460639783.613:482590): item=0
>>> name="/etc/ovirt-hosted-engine/hosted-engine.conf" inode=134433499
>>> dev=fd:00 mode=0100644 ouid=0 ogid=0 rdev=00:00
>>> obj=system_u:object_r:etc_t:s0 objtype=NORMAL
>>>
>>>
>>> Now that the HA agent is dead I'm removing the ~ file and starting
>>> the HA agent again. The ~ file immediately appears again.
>>>
>>> # rm hosted-engine.conf~
>>> rm: remove regular file ‘hosted-engine.conf~’? y
>>> [root@cube-two ovirt-hosted-engine]# ls -l
>>> total 6800
>>> -rw-r--r--. 1 root root 3252 Apr 8 10:35 answers.conf
>>> -rw-r--r--. 1 root root 6948582 Apr 14 14:48 ha-trace.log
>>> -rw-r--r--. 1 root root 1021 Apr 14 15:07 hosted-engine.conf
>>> -rw-r--r--. 1 root root 413 Apr 8 10:35 iptables.example
>>> [root@cube-two ovirt-hosted-engine]# systemctl start ovirt-ha-agent
>>> [root@cube-two ovirt-hosted-engine]# ls -l
>>> total 6804
>>> -rw-r--r--. 1 root root 3252 Apr 8 10:35 answers.conf
>>> -rw-r--r--. 1 root root 6948582 Apr 14 14:48 ha-trace.log
>>> -rw-r--r--. 1 root root 1021 Apr 14 15:18 hosted-engine.conf
>>> -rw-r--r--. 1 root root 1021 Apr 14 15:07 hosted-engine.conf~
>>> -rw-r--r--. 1 root root 413 Apr 8 10:35 iptables.example
>>>
>>> The auditd.log shows that ~ file is moved into place but not what
>>> issued the mv:
>>>
>>> type=CONFIG_CHANGE msg=audit(1460639919.277:482750): auid=4294967295
>>> ses=4294967295 op="updated_rules"
>>> path="/etc/ovirt-hosted-engine/hosted-engine.conf~" key=(null)
>>> list=4 res=1
>>> type=SYSCALL msg=audit(1460639919.277:482751): arch=c000003e
>>> syscall=82 success=yes exit=0 a0=7ffe4b3c0e90 a1=7ffe4b3bf920
>>> a2=7f68083a2778 a3=7ffe4b3bf680 items=5 ppid=170233 pid=170234
>>> auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 eg
>>> id=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="mv"
>>> exe="/usr/bin/mv" subj=system_u:system_r:unconfined_service_t:s0
>>> key=(null)
>>> type=CWD msg=audit(1460639919.277:482751): cwd="/"
>>> type=PATH msg=audit(1460639919.277:482751): item=0
>>> name="/etc/ovirt-hosted-engine/" inode=69555 dev=fd:00 mode=040755
>>> ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:etc_t:s0 objtype=PARENT
>>> type=PATH msg=audit(1460639919.277:482751): item=1
>>> name="/etc/ovirt-hosted-engine/" inode=69555 dev=fd:00 mode=040755
>>> ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:etc_t:s0 objtype=PARENT
>>> type=PATH msg=audit(1460639919.277:482751): item=2
>>> name="/etc/ovirt-hosted-engine/hosted-engine.conf" inode=134433453
>>> dev=fd:00 mode=0100644 ouid=0 ogid=0 rdev=00:00
>>> obj=system_u:object_r:etc_t:s0 objtype=DELETE
>>> type=PATH msg=audit(1460639919.277:482751): item=3
>>> name="/etc/ovirt-hosted-engine/hosted-engine.conf~"
inode=134433499
>>> dev=fd:00 mode=0100644 ouid=0 ogid=0 rdev=00:00
>>> obj=system_u:object_r:etc_t:s0 objtype=DELETE
>>> type=PATH msg=audit(1460639919.277:482751): item=4
>>> name="/etc/ovirt-hosted-engine/hosted-engine.conf~"
inode=134433453
>>> dev=fd:00 mode=0100644 ouid=0 ogid=0 rdev=00:00
>>> obj=system_u:object_r:etc_t:s0 objtype=CREATE
>>>
>>>
>>>> As a thumb rule, if a file name is appended with a tilde~, it only
>>>> means that it is a backup created by a text editor or similar program.
>>>
>>> If anyone except myself would have access to these systems I would
>>> guess the same. But since I'm not editing anything in
>>> /etc/ovirt-hosted-engine there must be another reason. And there is.
>>>
>>> Aside from auditd I tried to strace the whole thing just to make
>>> sure it comes from the HA agent.
>>>
>>> [root@cube-two ~]# strace -o ha-trace.log -f
>>> /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon
>>>
>>> Looking at the trace log I found this:
>>>
>>> 183409 statfs("/etc/ovirt-hosted-engine/.", {f_type=0x58465342,
>>> f_bsize=4096, f_blocks=13100800, f_bfree=12523576,
>>> f_bavail=12523576, f_files=52428800, f_ffree=52379892,
>>> f_fsid={64768, 0}, f_namelen=255, f_frsize=4096}) = 0
>>> 183409 rename("/etc/ovirt-hosted-engine/hosted-engine.conf",
>>> "/etc/ovirt-hosted-engine/hosted-engine.conf~") = 0
>>> 183409 rename("/var/lib/ovirt-hosted-engine-ha/tmpNjTElr",
>>> "/etc/ovirt-hosted-engine/hosted-engine.conf") = 0
>>> 183409 newfstatat(AT_FDCWD,
>>> "/etc/ovirt-hosted-engine/hosted-engine.conf",
>>> {st_mode=S_IFREG|0600, st_size=1021, ...}, AT_SYMLINK_NOFOLLOW) = 0
>>> 183409 open("/etc/ovirt-hosted-engine/hosted-engine.conf",
>>> O_RDONLY|O_NOFOLLOW) = 3
>>>
>>>
>>> Putting it all together I started reading the HA agent sources and
>>> found the function _wrote_updated_conf_file in
>>> /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/upgrade.py
>>> which issues a mv -b which creates the ~ file.
>>
>> This should just trigger during 3.5 to 3.6 upgrade but your host are new.
>> Can you please attach /var/log/ovirt-hosted-engine-ha/agent.log from
>> one of them?
>
> The agent.log of host cube-two is attached to this mail.
Yes, you are right:
it's looping trying to fix a path in the config file (on 3.5 we didn't
check if an NFS path was ending with a '/' while for other reasons it
wasn't working on 3.6 and so we need to fix it) but its doesn't seams
you case and so the strange loop.
Now I need to understand why it enters there.
Can you please execute
tree /rhev/data-center/
and post me the output?
Thanks again
OK, I think it's just a side effect if this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1317699
The local mount point of NFS and GlusterFS storage domain are slightly
different.
The hosted-engine storage domain autoimport procedure didn't probably
recognized the gluster storage domain as a gluster one and so marked
it as nfs in the engine and so it got mounted twice on different paths
(by ovirt-ha-agent at boot time and by engine further on);
ovirt-ha-agent notices it and think that's due to the old trailing
slash issue on NFS paths and so it tries to fix but the there is
nothing to fix in the config file and so the infinite loop.
I'll try to push a patch ASAP.
Can you please still provide the output of
tree /rhev/data-center/
>>> The question now is why is this done so frequently.
Especially
>>> considering since there are no modifications to the file. Is this
>>> behavior normal?
>>>
>>> [root@cube-two ~]# diff /etc/ovirt-hosted-engine/hosted-engine.conf*
>>> [root@cube-two ~]#
>>>
>>>
>>>>>>> [root@cube-two ~]# ls -l /etc/ovirt-hosted-engine
>>>>>>> total 16
>>>>>>> -rw-r--r--. 1 root root 3252 Apr 8 10:35 answers.conf
>>>>>>> -rw-r--r--. 1 root root 1021 Apr 13 09:31 hosted-engine.conf
>>>>>>> -rw-r--r--. 1 root root 1021 Apr 13 09:30
hosted-engine.conf~
>>>>>>>
>>>>>>> [root@cube-three ~]# ls -l /etc/ovirt-hosted-engine
>>>>>>> total 16
>>>>>>> -rw-r--r--. 1 root root 3233 Apr 11 08:02 answers.conf
>>>>>>> -rw-r--r--. 1 root root 1002 Apr 13 09:31 hosted-engine.conf
>>>>>>> -rw-r--r--. 1 root root 1002 Apr 13 09:31
hosted-engine.conf~
>>>>>>>
>>>>>>> On 12.04.16 16:01, Simone Tiraboschi wrote:
>>>>>>>> Everything seams fine here,
>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf seams to be
correctly
>>>>>>>> created with the right name.
>>>>>>>> Can you please check the latest modification time of
your
>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf~ and compare
it with the
>>>>>>>> setup time?
>>>>>>>>
>>>>>>>> On Tue, Apr 12, 2016 at 2:34 PM, Richard Neuboeck
<hawk(a)tbi.univie.ac.at> wrote:
>>>>>>>>> On 04/12/2016 11:32 AM, Simone Tiraboschi wrote:
>>>>>>>>>> On Mon, Apr 11, 2016 at 8:11 AM, Richard Neuboeck
<hawk(a)tbi.univie.ac.at> wrote:
>>>>>>>>>>> Hi oVirt Group,
>>>>>>>>>>>
>>>>>>>>>>> in my attempts to get all aspects of oVirt
3.6 up and running I
>>>>>>>>>>> stumbled upon something I'm not sure how
to fix:
>>>>>>>>>>>
>>>>>>>>>>> Initially I installed a hosted engine setup.
After that I added
>>>>>>>>>>> another HA host (with hosted-engine
--deploy). The host was
>>>>>>>>>>> registered in the Engine correctly and HA
agent came up as expected.
>>>>>>>>>>>
>>>>>>>>>>> However if I reboot the second host (through
the Engine UI or
>>>>>>>>>>> manually) HA agent fails to start. The reason
seems to be that
>>>>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf
is empty. The backup
>>>>>>>>>>> file ending with ~ exists though.
>>>>>>>>>>
>>>>>>>>>> Can you please attach hosted-engine-setup logs
from your additional hosts?
>>>>>>>>>> AFAIK our code will never take a ~ ending backup
of that file.
>>>>>>>>>
>>>>>>>>> ovirt-hosted-engine-setup logs from both additional
hosts are
>>>>>>>>> attached to this mail.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Here are the log messages from the journal:
>>>>>>>>>>> Apr 11 07:29:39 cube-two.tbi.univie.ac.at
systemd[1]: Starting oVirt
>>>>>>>>>>> Hosted Engine High Availability Monitoring
Agent...
>>>>>>>>>>> Apr 11 07:29:39 cube-two.tbi.univie.ac.at
ovirt-ha-agent[3747]:
>>>>>>>>>>>
INFO:ovirt_hosted_engine_ha.agent.agent.Agent:ovirt-hosted-engine-ha
>>>>>>>>>>> agent 1.3.5.3-0.0.master started
>>>>>>>>>>> Apr 11 07:29:39 cube-two.tbi.univie.ac.at
ovirt-ha-agent[3747]:
>>>>>>>>>>>
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Found
>>>>>>>>>>> certificate common name:
cube-two.tbi.univie.ac.at
>>>>>>>>>>> Apr 11 07:29:39 cube-two.tbi.univie.ac.at
ovirt-ha-agent[3747]:
>>>>>>>>>>> ovirt-ha-agent
>>>>>>>>>>>
ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Hosted
>>>>>>>>>>> Engine is not configured. Shutting down.
>>>>>>>>>>> Apr 11 07:29:39 cube-two.tbi.univie.ac.at
ovirt-ha-agent[3747]:
>>>>>>>>>>>
ERROR:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Hosted
>>>>>>>>>>> Engine is not configured. Shutting down.
>>>>>>>>>>> Apr 11 07:29:39 cube-two.tbi.univie.ac.at
ovirt-ha-agent[3747]:
>>>>>>>>>>>
INFO:ovirt_hosted_engine_ha.agent.agent.Agent:Agent shutting down
>>>>>>>>>>> Apr 11 07:29:39 cube-two.tbi.univie.ac.at
systemd[1]:
>>>>>>>>>>> ovirt-ha-agent.service: main process exited,
code=exited, status=255/n/a
>>>>>>>>>>>
>>>>>>>>>>> If I restore the configuration from the
backup file and manually
>>>>>>>>>>> restart the HA agent it's working
properly.
>>>>>>>>>>>
>>>>>>>>>>> For testing purposes I added a third HA host
which turn out to
>>>>>>>>>>> behave exactly the same.
>>>>>>>>>>>
>>>>>>>>>>> Any help would be appreciated!
>>>>>>>>>>> Thanks
>>>>>>>>>>> Cheers
>>>>>>>>>>> Richard
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> /dev/null
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
_______________________________________________
>>>>>>>>>>> Users mailing list
>>>>>>>>>>> Users(a)ovirt.org
>>>>>>>>>>>
http://lists.ovirt.org/mailman/listinfo/users
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> /dev/null
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Users mailing list
>>>>>>> Users(a)ovirt.org
>>>>>>>
http://lists.ovirt.org/mailman/listinfo/users
>>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> /dev/null
>>>>>
>>>
>>>
>>> --
>>> /dev/null
>>>
>