
On Thu, Apr 14, 2016 at 6:53 PM, Richard Neuboeck <hawk@tbi.univie.ac.at> wrote:
On 14.04.16 18:46, Simone Tiraboschi wrote:
On Thu, Apr 14, 2016 at 4:04 PM, Richard Neuboeck <hawk@tbi.univie.ac.at> wrote:
On 04/14/2016 02:14 PM, Simone Tiraboschi wrote:
On Thu, Apr 14, 2016 at 12:51 PM, Richard Neuboeck <hawk@tbi.univie.ac.at> wrote:
On 04/13/2016 10:00 AM, Simone Tiraboschi wrote:
On Wed, Apr 13, 2016 at 9:38 AM, Richard Neuboeck <hawk@tbi.univie.ac.at> wrote:
> The answers file shows the setup time of both machines.
>
> On both machines hosted-engine.conf got rotated right before I wrote
> this mail. Is it possible that I managed to interrupt the rotation with
> the reboot so the backup was accurate but the update not yet written to
> hosted-engine.conf?
AFAIK we don't have any rotation mechanism for that file; do you have something else in place on that host?
Those machines are all CentOS 7.2 minimal installs. The only adaptations I make are installing vim, replacing postfix with exim, and replacing firewalld with iptables-services. Then I add the oVirt repos (3.6 and 3.6-snapshot) and deploy the host.
But checking lsof shows that 'ovirt-ha-agent --no-daemon' has access to the config file (and the one ending with ~):
# lsof | grep 'hosted-engine.conf~'
ovirt-ha- 193446 vdsm 351u REG 253,0 1021 135070683 /etc/ovirt-hosted-engine/hosted-engine.conf~
This is not very relevant if the file was renamed after ovirt-ha-agent opened it. Try this:
[root@c72he20160405h1 ovirt-hosted-engine-setup]# tail -n1 -f /etc/ovirt-hosted-engine/hosted-engine.conf &
[1] 28866
[root@c72he20160405h1 ovirt-hosted-engine-setup]# port=
[root@c72he20160405h1 ovirt-hosted-engine-setup]# lsof | grep hosted-engine.conf
tail 28866 root 3r REG 253,0 1014 1595898 /etc/ovirt-hosted-engine/hosted-engine.conf
[root@c72he20160405h1 ovirt-hosted-engine-setup]# mv /etc/ovirt-hosted-engine/hosted-engine.conf /etc/ovirt-hosted-engine/hosted-engine.conf_123
[root@c72he20160405h1 ovirt-hosted-engine-setup]# lsof | grep hosted-engine.conf
tail 28866 root 3r REG 253,0 1014 1595898 /etc/ovirt-hosted-engine/hosted-engine.conf_123
[root@c72he20160405h1 ovirt-hosted-engine-setup]#
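The point of this test can be illustrated with a small sketch. Assuming a Linux or other POSIX system, the Python snippet below opens a file, renames it, and checks that the already open descriptor still refers to the same inode. The temporary paths and the function name inode_survives_rename are made up for this example and have nothing to do with the agent's code. This behaviour is why lsof reports the new name for a descriptor that was opened before the rename.

```python
import os
import tempfile


def inode_survives_rename():
    """Open a file, rename it, and compare inodes before and after."""
    workdir = tempfile.mkdtemp()
    old_path = os.path.join(workdir, "hosted-engine.conf")      # illustrative name
    new_path = os.path.join(workdir, "hosted-engine.conf_123")  # illustrative name
    with open(old_path, "w") as f:
        f.write("key=value\n")

    fd = os.open(old_path, os.O_RDONLY)
    try:
        inode_before = os.fstat(fd).st_ino
        os.rename(old_path, new_path)
        # The descriptor is bound to the inode, not to the path name.
        inode_after = os.fstat(fd).st_ino
        inode_new_name = os.stat(new_path).st_ino
        still_readable = os.read(fd, 64) == b"key=value\n"
        old_name_gone = not os.path.exists(old_path)
    finally:
        os.close(fd)
    same_inode = inode_before == inode_after == inode_new_name
    return same_inode, still_readable, old_name_gone


result = inode_survives_rename()
```

So an lsof entry only tells you which inode a process holds, not which process performed the rename.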
I've issued the commands you suggested but I don't know how that helps to find the process accessing the config files.
After moving the hosted-engine.conf file, the HA agent crashed, logging that the config file is not available.
Here is the output from every command:
# tail -n1 -f /etc/ovirt-hosted-engine/hosted-engine.conf &
[1] 167865
[root@cube-two ~]# port=
# lsof | grep hosted-engine.conf
ovirt-ha- 166609 vdsm 5u REG 253,0 1021 134433491 /etc/ovirt-hosted-engine/hosted-engine.conf (deleted)
ovirt-ha- 166609 vdsm 7u REG 253,0 1021 134433453 /etc/ovirt-hosted-engine/hosted-engine.conf (deleted)
ovirt-ha- 166609 vdsm 8u REG 253,0 1021 134433489 /etc/ovirt-hosted-engine/hosted-engine.conf (deleted)
ovirt-ha- 166609 vdsm 9u REG 253,0 1021 134433493 /etc/ovirt-hosted-engine/hosted-engine.conf~
ovirt-ha- 166609 vdsm 10u REG 253,0 1021 134433495 /etc/ovirt-hosted-engine/hosted-engine.conf
tail 167865 root 3r REG 253,0 1021 134433493 /etc/ovirt-hosted-engine/hosted-engine.conf~
# mv /etc/ovirt-hosted-engine/hosted-engine.conf /etc/ovirt-hosted-engine/hosted-engine.conf_123
# lsof | grep hosted-engine.conf
ovirt-ha- 166609 vdsm 5u REG 253,0 1021 134433491 /etc/ovirt-hosted-engine/hosted-engine.conf (deleted)
ovirt-ha- 166609 vdsm 7u REG 253,0 1021 134433453 /etc/ovirt-hosted-engine/hosted-engine.conf (deleted)
ovirt-ha- 166609 vdsm 8u REG 253,0 1021 134433489 /etc/ovirt-hosted-engine/hosted-engine.conf (deleted)
ovirt-ha- 166609 vdsm 9u REG 253,0 1021 134433493 /etc/ovirt-hosted-engine/hosted-engine.conf (deleted)
ovirt-ha- 166609 vdsm 10u REG 253,0 1021 134433495 /etc/ovirt-hosted-engine/hosted-engine.conf (deleted)
ovirt-ha- 166609 vdsm 12u REG 253,0 1021 134433498 /etc/ovirt-hosted-engine/hosted-engine.conf~
ovirt-ha- 166609 vdsm 13u REG 253,0 1021 134433499 /etc/ovirt-hosted-engine/hosted-engine.conf_123
tail 167865 root 3r REG 253,0 1021 134433493 /etc/ovirt-hosted-engine/hosted-engine.conf (deleted)
The issue is understanding who renames that file on your host.
From what I've seen so far it looks like a child of vdsm accesses /etc/ovirt-hosted-engine/hosted-engine.conf periodically but is not responsible for the ~ file.
# auditctl -w /etc/ovirt-hosted-engine/hosted-engine.conf
and
# auditctl -w /etc/ovirt-hosted-engine/hosted-engine.conf~
auditd.log shows this:
type=SYSCALL msg=audit(1460639783.613:482590): arch=c000003e syscall=2 success=yes exit=75 a0=7f29b400f0b0 a1=0 a2=1b6 a3=24 items=1 ppid=1 pid=3701 auid=4294967295 uid=36 gid=36 euid=36 suid=36 fsuid=36 egid=36 sgid=36 fsgid=36 tty=(none) ses=4294967295 comm="jsonrpc.Executo" exe="/usr/bin/python2.7" subj=system_u:system_r:virtd_t:s0-s0:c0.c1023 key=(null)
type=CWD msg=audit(1460639783.613:482590): cwd="/"
type=PATH msg=audit(1460639783.613:482590): item=0 name="/etc/ovirt-hosted-engine/hosted-engine.conf" inode=134433499 dev=fd:00 mode=0100644 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:etc_t:s0 objtype=NORMAL
Now that the HA agent is dead I'm removing the ~ file and starting the HA agent again. The ~ file immediately appears again.
# rm hosted-engine.conf~
rm: remove regular file ‘hosted-engine.conf~’? y
[root@cube-two ovirt-hosted-engine]# ls -l
total 6800
-rw-r--r--. 1 root root 3252 Apr 8 10:35 answers.conf
-rw-r--r--. 1 root root 6948582 Apr 14 14:48 ha-trace.log
-rw-r--r--. 1 root root 1021 Apr 14 15:07 hosted-engine.conf
-rw-r--r--. 1 root root 413 Apr 8 10:35 iptables.example
[root@cube-two ovirt-hosted-engine]# systemctl start ovirt-ha-agent
[root@cube-two ovirt-hosted-engine]# ls -l
total 6804
-rw-r--r--. 1 root root 3252 Apr 8 10:35 answers.conf
-rw-r--r--. 1 root root 6948582 Apr 14 14:48 ha-trace.log
-rw-r--r--. 1 root root 1021 Apr 14 15:18 hosted-engine.conf
-rw-r--r--. 1 root root 1021 Apr 14 15:07 hosted-engine.conf~
-rw-r--r--. 1 root root 413 Apr 8 10:35 iptables.example
The auditd.log shows that the ~ file is moved into place but not what issued the mv:
type=CONFIG_CHANGE msg=audit(1460639919.277:482750): auid=4294967295 ses=4294967295 op="updated_rules" path="/etc/ovirt-hosted-engine/hosted-engine.conf~" key=(null) list=4 res=1
type=SYSCALL msg=audit(1460639919.277:482751): arch=c000003e syscall=82 success=yes exit=0 a0=7ffe4b3c0e90 a1=7ffe4b3bf920 a2=7f68083a2778 a3=7ffe4b3bf680 items=5 ppid=170233 pid=170234 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="mv" exe="/usr/bin/mv" subj=system_u:system_r:unconfined_service_t:s0 key=(null)
type=CWD msg=audit(1460639919.277:482751): cwd="/"
type=PATH msg=audit(1460639919.277:482751): item=0 name="/etc/ovirt-hosted-engine/" inode=69555 dev=fd:00 mode=040755 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:etc_t:s0 objtype=PARENT
type=PATH msg=audit(1460639919.277:482751): item=1 name="/etc/ovirt-hosted-engine/" inode=69555 dev=fd:00 mode=040755 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:etc_t:s0 objtype=PARENT
type=PATH msg=audit(1460639919.277:482751): item=2 name="/etc/ovirt-hosted-engine/hosted-engine.conf" inode=134433453 dev=fd:00 mode=0100644 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:etc_t:s0 objtype=DELETE
type=PATH msg=audit(1460639919.277:482751): item=3 name="/etc/ovirt-hosted-engine/hosted-engine.conf~" inode=134433499 dev=fd:00 mode=0100644 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:etc_t:s0 objtype=DELETE
type=PATH msg=audit(1460639919.277:482751): item=4 name="/etc/ovirt-hosted-engine/hosted-engine.conf~" inode=134433453 dev=fd:00 mode=0100644 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:etc_t:s0 objtype=CREATE
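As a reading aid for these audit records, here is a rough Python sketch that pulls the key=value fields out of a SYSCALL record. The sample string is abridged from the log in this mail; the helper name parse_audit_fields and the two-entry syscall table are assumptions made for this example only. On x86_64 (arch=c000003e) syscall 2 is open and syscall 82 is rename. The ppid field is the one that identifies the parent of the mv process.

```python
import re

# Only the two syscall numbers that appear in this thread (x86_64 numbering).
X86_64_SYSCALLS = {2: "open", 82: "rename"}


def parse_audit_fields(record):
    """Return the key=value fields of one audit record as a dict."""
    fields = {}
    # Values are either double-quoted strings or runs of non-whitespace.
    for key, value in re.findall(r'(\w+)=("[^"]*"|\S+)', record):
        fields[key] = value.strip('"')
    return fields


# Abridged from the SYSCALL record quoted in this mail.
sample = ('type=SYSCALL msg=audit(1460639919.277:482751): arch=c000003e '
          'syscall=82 success=yes exit=0 ppid=170233 pid=170234 uid=0 '
          'comm="mv" exe="/usr/bin/mv"')

fields = parse_audit_fields(sample)
summary = (X86_64_SYSCALLS.get(int(fields["syscall"])),
           fields["comm"],
           int(fields["ppid"]))
```

Here summary holds the syscall name, the command, and the parent PID, so the next step would be to find out which process had PID 170233 at that time.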
As a rule of thumb, a file name ending in a tilde (~) usually means that it is a backup created by a text editor or a similar program.
If anyone other than me had access to these systems I would guess the same. But since I'm not editing anything in /etc/ovirt-hosted-engine there must be another reason. And there is.
Aside from auditd I tried to strace the whole thing just to make sure it comes from the HA agent.
[root@cube-two ~]# strace -o ha-trace.log -f /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon
Looking at the trace log I found this:
183409 statfs("/etc/ovirt-hosted-engine/.", {f_type=0x58465342, f_bsize=4096, f_blocks=13100800, f_bfree=12523576, f_bavail=12523576, f_files=52428800, f_ffree=52379892, f_fsid={64768, 0}, f_namelen=255, f_frsize=4096}) = 0
183409 rename("/etc/ovirt-hosted-engine/hosted-engine.conf", "/etc/ovirt-hosted-engine/hosted-engine.conf~") = 0
183409 rename("/var/lib/ovirt-hosted-engine-ha/tmpNjTElr", "/etc/ovirt-hosted-engine/hosted-engine.conf") = 0
183409 newfstatat(AT_FDCWD, "/etc/ovirt-hosted-engine/hosted-engine.conf", {st_mode=S_IFREG|0600, st_size=1021, ...}, AT_SYMLINK_NOFOLLOW) = 0
183409 open("/etc/ovirt-hosted-engine/hosted-engine.conf", O_RDONLY|O_NOFOLLOW) = 3
Putting it all together, I started reading the HA agent sources and found the function _wrote_updated_conf_file in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/upgrade.py, which issues a mv -b. That is what creates the ~ file.
This should only trigger during a 3.5 to 3.6 upgrade, but your hosts are new. Can you please attach /var/log/ovirt-hosted-engine-ha/agent.log from one of them?
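The two rename() calls in the trace match what one would expect from a mv -b of a temporary file onto the config file: the existing target is first renamed to a ~ backup, then the new file is renamed into place. The Python sketch below only mimics that sequence. It is not the agent's code; the function name replace_with_backup and all paths are hypothetical, and the temporary file is created next to the target here (in the trace it lives under /var/lib/ovirt-hosted-engine-ha) so that the example works in any scratch directory.

```python
import os
import tempfile


def replace_with_backup(new_content, target):
    """Mimic the rename sequence seen in the strace output."""
    backup = target + "~"
    # Write the new content to a temporary file first.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(target))
    with os.fdopen(fd, "w") as tmp:
        tmp.write(new_content)
    # Step 1: the current file becomes the ~ backup.
    if os.path.exists(target):
        os.rename(target, backup)
    # Step 2: the temporary file takes over the original name.
    os.rename(tmp_path, target)
    return backup


workdir = tempfile.mkdtemp()
conf = os.path.join(workdir, "hosted-engine.conf")
with open(conf, "w") as f:
    f.write("old\n")

backup = replace_with_backup("new\n", conf)
with open(conf) as f:
    current_content = f.read()
with open(backup) as f:
    backup_content = f.read()
```

Note that between the two renames there is a short moment in which no file exists under the original name, and that every run replaces the previous ~ backup even when the content has not changed.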
The agent.log of host cube-two is attached to this mail.
Yes, you are right: it's looping while trying to fix a path in the config file (on 3.5 we didn't check whether an NFS path ended with a '/', while for other reasons that wasn't working on 3.6, so we need to fix it). But that doesn't seem to be your case, hence the strange loop. Now I need to understand why it enters there. Can you please execute tree /rhev/data-center/ and post me the output? Thanks again
The question now is why this is done so frequently, especially since there are no modifications to the file. Is this behavior normal?
[root@cube-two ~]# diff /etc/ovirt-hosted-engine/hosted-engine.conf*
[root@cube-two ~]#
> [root@cube-two ~]# ls -l /etc/ovirt-hosted-engine
> total 16
> -rw-r--r--. 1 root root 3252 Apr 8 10:35 answers.conf
> -rw-r--r--. 1 root root 1021 Apr 13 09:31 hosted-engine.conf
> -rw-r--r--. 1 root root 1021 Apr 13 09:30 hosted-engine.conf~
>
> [root@cube-three ~]# ls -l /etc/ovirt-hosted-engine
> total 16
> -rw-r--r--. 1 root root 3233 Apr 11 08:02 answers.conf
> -rw-r--r--. 1 root root 1002 Apr 13 09:31 hosted-engine.conf
> -rw-r--r--. 1 root root 1002 Apr 13 09:31 hosted-engine.conf~
>
> On 12.04.16 16:01, Simone Tiraboschi wrote:
>> Everything seems fine here,
>> /etc/ovirt-hosted-engine/hosted-engine.conf seems to be correctly
>> created with the right name.
>> Can you please check the latest modification time of your
>> /etc/ovirt-hosted-engine/hosted-engine.conf~ and compare it with the
>> setup time?
>>
>> On Tue, Apr 12, 2016 at 2:34 PM, Richard Neuboeck <hawk@tbi.univie.ac.at> wrote:
>>> On 04/12/2016 11:32 AM, Simone Tiraboschi wrote:
>>>> On Mon, Apr 11, 2016 at 8:11 AM, Richard Neuboeck <hawk@tbi.univie.ac.at> wrote:
>>>>> Hi oVirt Group,
>>>>>
>>>>> in my attempts to get all aspects of oVirt 3.6 up and running I
>>>>> stumbled upon something I'm not sure how to fix:
>>>>>
>>>>> Initially I installed a hosted engine setup. After that I added
>>>>> another HA host (with hosted-engine --deploy). The host was
>>>>> registered in the Engine correctly and HA agent came up as expected.
>>>>>
>>>>> However if I reboot the second host (through the Engine UI or
>>>>> manually) HA agent fails to start. The reason seems to be that
>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf is empty. The backup
>>>>> file ending with ~ exists though.
>>>>
>>>> Can you please attach hosted-engine-setup logs from your additional hosts?
>>>> AFAIK our code will never take a ~ ending backup of that file.
>>>
>>> ovirt-hosted-engine-setup logs from both additional hosts are
>>> attached to this mail.
>>>
>>>>
>>>>> Here are the log messages from the journal:
>>>>> Apr 11 07:29:39 cube-two.tbi.univie.ac.at systemd[1]: Starting oVirt
>>>>> Hosted Engine High Availability Monitoring Agent...
>>>>> Apr 11 07:29:39 cube-two.tbi.univie.ac.at ovirt-ha-agent[3747]:
>>>>> INFO:ovirt_hosted_engine_ha.agent.agent.Agent:ovirt-hosted-engine-ha
>>>>> agent 1.3.5.3-0.0.master started
>>>>> Apr 11 07:29:39 cube-two.tbi.univie.ac.at ovirt-ha-agent[3747]:
>>>>> INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Found
>>>>> certificate common name: cube-two.tbi.univie.ac.at
>>>>> Apr 11 07:29:39 cube-two.tbi.univie.ac.at ovirt-ha-agent[3747]:
>>>>> ovirt-ha-agent
>>>>> ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Hosted
>>>>> Engine is not configured. Shutting down.
>>>>> Apr 11 07:29:39 cube-two.tbi.univie.ac.at ovirt-ha-agent[3747]:
>>>>> ERROR:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Hosted
>>>>> Engine is not configured. Shutting down.
>>>>> Apr 11 07:29:39 cube-two.tbi.univie.ac.at ovirt-ha-agent[3747]:
>>>>> INFO:ovirt_hosted_engine_ha.agent.agent.Agent:Agent shutting down
>>>>> Apr 11 07:29:39 cube-two.tbi.univie.ac.at systemd[1]:
>>>>> ovirt-ha-agent.service: main process exited, code=exited, status=255/n/a
>>>>>
>>>>> If I restore the configuration from the backup file and manually
>>>>> restart the HA agent it's working properly.
>>>>>
>>>>> For testing purposes I added a third HA host which turned out to
>>>>> behave exactly the same.
>>>>>
>>>>> Any help would be appreciated!
>>>>> Thanks
>>>>> Cheers
>>>>> Richard
>>>>>
>>>>> --
>>>>> /dev/null
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Users mailing list
>>>>> Users@ovirt.org
>>>>> http://lists.ovirt.org/mailman/listinfo/users
>>>>>
>>>
>>>
>>> --
>>> /dev/null
-- /dev/null