Hi,
On Sat, November 26, 2016 2:11 pm, Nir Soffer wrote:
On Sat, Nov 26, 2016 at 7:02 PM, Derek Atkins <derek(a)ihtfp.com>
wrote:
> Hi,
> Last night at approximately 22:03 (give or take a few minutes) my
> single-server ovirt system rebooted. I saw a root login (presumably
> from
> the engine) and then the host rebooted.
>
> I don't see anything specific in the logs that would indicate WHY it
> rebooted.
>
> Any suggestions for where to look or what to look for? I had updated
> the
> host yesterday to 4.0.5 and did reboot the system myself several times.
> The
> good news is that I succeeded in getting my scripts working so all my
> VMs
> came back. But before I go and deploy anything mission critical I'd
> like to
> determine why the system decided it should reboot.
>
> Suggestions?
One reason for reboot is loosing access to storage on the spm host.
sanlock
would terminate vdsm, and if it could not terminate vdsm it will
reboot the host.
To check for this you can look into /var/log/sanlock.log
Nothing there except the restart (which happened between 22:03 and 22:06):
2016-11-25 15:37:45-0500 730 [5654]: s3:r3 resource
efc01877-2748-4f50-8165-976ed152feea:SDM:/rhev/data-center/mnt/ovirt-0.dom.ain:_ovirt_data_data/efc01877-2748-4f50-8165-976ed152feea/dom_md/leases:1048576
for 2,15,3578
2016-11-25 22:06:16-0500 28 [1556]: sanlock daemon started 3.2.4 host
c24b508e-8808-405b-bebd-0ba9e28cdf37.ovirt-0.ih
Another reason may be engine trying to do soft-fencing because it
cannot
restart vdsm. Looking in vdsm log you can find if vdsm was restarted, and
in engine log you should find logs about this.
I did not see anything in the engine log, but then again I'm not sure what
to look for. I looked for "reboot", "restart", and "kill",
but the only
stuff that appeared in the engine logs are the times earlier in the day
when I actually rebooted it myself. Here's the exact log entries across
the reboot time:
2016-11-25 21:38:52,237 INFO
[org.ovirt.engine.core.bll.storage.domain.IsoDomainListSyncronizer]
(org.ovirt.thread.pool-6-thread-31) [e463ce4] Finished automatic refresh
process for 'ISO' file type with success, for storage domain id
'fd2a26a5-2851-4495-891f-cb7a2f069411'.
2016-11-25 22:13:56,891 INFO
[org.ovirt.engine.core.uutils.config.ShellLikeConfd] (ServerService Thread
Pool -- 42) [] Loaded file
'/usr/share/ovirt-engine/services/ovirt-engine/ovirt-engine.conf'.
The reboot at 22:03 was spurious (I know it wasn't me -- I'd been in bed
for an hour). I've attached the VDSM logs for the hour before/after the
reboot occurred in case there is something in there that might help. I do
see one instance of "reboot" in the vdsm log:
<on_poweroff>destroy</on_poweroff><on_reboot>destroy</on_reboot><on_crash>destroy</on_crash></domain>
but this is from after the reboot (c.f. log15) not before.
Are you sure you don't have any cron job or some other code that
is
configured reboot the host?
Absolutely! I never set up anything to reboot the system. I didn't see
anything in /var/log/messages or /var/log/secure that would indicate
anything, either. Here's the /var/log/messages for across the reboot:
Nov 25 22:03:07 ovirt-0 ovirt-ha-broker: INFO:mem_free.MemFree:memFree:238982
Nov 25 22:03:11 ovirt-0 ovirt-ha-broker:
INFO:ovirt_hosted_engine_ha.broker.listener.ConnectionHandler:Connection
established
Nov 25 22:03:11 ovirt-0 ovirt-ha-broker:
INFO:ovirt_hosted_engine_ha.broker.listener.ConnectionHandler:Connection
closed
Nov 25 22:03:11 ovirt-0 ovirt-ha-broker:
INFO:ovirt_hosted_engine_ha.broker.listener.ConnectionHandler:Connection
established
Nov 25 22:03:11 ovirt-0 ovirt-ha-broker:
INFO:ovirt_hosted_engine_ha.broker.listener.ConnectionHandler:Connection
closed
Nov 25 22:03:13 ovirt-0 ovirt-ha-agent:
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Initializing
VDSM
Nov 25 22:03:15 ovirt-0 journal: vdsm vds.dispatcher ERROR SSL error
during reading data: unexpected eof
Nov 25 22:03:15 ovirt-0 ovirt-ha-agent:
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Connecting
the storage
Nov 25 22:03:15 ovirt-0 ovirt-ha-agent:
INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Connecting
storage server
Nov 25 22:03:17 ovirt-0 ovirt-ha-broker:
INFO:ovirt_hosted_engine_ha.broker.listener.ConnectionHandler:Connection
established
Nov 25 22:03:17 ovirt-0 ovirt-ha-broker:
INFO:ovirt_hosted_engine_ha.broker.listener.ConnectionHandler:Connection
closed
Nov 25 22:03:17 ovirt-0 ovirt-ha-broker:
INFO:ovirt_hosted_engine_ha.broker.listener.ConnectionHandler:Connection
established
Nov 25 22:03:17 ovirt-0 ovirt-ha-broker:
INFO:ovirt_hosted_engine_ha.broker.listener.ConnectionHandler:Connection
closed
Nov 25 22:03:17 ovirt-0 ovirt-ha-broker: INFO:mem_free.MemFree:memFree:
238979
Nov 25 22:03:19 ovirt-0 journal: vdsm vds.dispatcher ERROR SSL error
during reading data: unexpected eof
Nov 25 22:03:19 ovirt-0 ovirt-ha-agent:
INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Connecting
storage server
Nov 25 22:03:20 ovirt-0 ovirt-ha-agent:
INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Refreshing
the storage domain
Nov 25 22:03:20 ovirt-0 journal: vdsm vds.dispatcher ERROR SSL error
during reading data: unexpected eof
Nov 25 22:03:20 ovirt-0 ovirt-ha-agent:
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Preparing
images
Nov 25 22:03:20 ovirt-0 ovirt-ha-agent:
INFO:ovirt_hosted_engine_ha.lib.image.Image:Preparing images
Nov 25 22:06:16 ovirt-0 rsyslogd: [origin software="rsyslogd"
swVersion="7.4.7" x-pid="1509"
x-info="http://www.rsyslog.com"] start
Nov 25 22:06:06 ovirt-0 journal: Runtime journal is using 8.0M (max
allowed 4.0G, trying to leave 4.0G free of 125.8G available → current
limit 4.0G).
Nov 25 22:06:06 ovirt-0 kernel: Initializing cgroup subsys cpuset
Nov 25 22:06:06 ovirt-0 kernel: Initializing cgroup subsys cpu
Nov 25 22:06:06 ovirt-0 kernel: Initializing cgroup subsys cpuacct
Any other suggestions?
Thanks,
Nir
-derek
--
Derek Atkins 617-623-3745
derek(a)ihtfp.com
www.ihtfp.com
Computer and Internet Security Consultant