[ovirt-users] Spurious reboot?
Derek Atkins
derek at ihtfp.com
Sun Nov 27 01:17:06 UTC 2016
Hi,
On Sat, November 26, 2016 2:11 pm, Nir Soffer wrote:
> On Sat, Nov 26, 2016 at 7:02 PM, Derek Atkins <derek at ihtfp.com> wrote:
>> Hi,
>> Last night at approximately 22:03 (give or take a few minutes) my
>> single-server ovirt system rebooted. I saw a root login (presumably
>> from
>> the engine) and then the host rebooted.
>>
>> I don't see anything specific in the logs that would indicate WHY it
>> rebooted.
>>
>> Any suggestions for where to look or what to look for? I had updated
>> the
>> host yesterday to 4.0.5 and did reboot the system myself several times.
>> The
>> good news is that I succeeded in getting my scripts working so all my
>> VMs
>> came back. But before I go and deploy anything mission critical I'd
>> like to
>> determine why the system decided it should reboot.
>>
>> Suggestions?
>
> One reason for reboot is loosing access to storage on the spm host.
> sanlock
> would terminate vdsm, and if it could not terminate vdsm it will
> reboot the host.
>
> To check for this you can look into /var/log/sanlock.log
Nothing there except the restart (which happened between 22:03 and 22:06):
2016-11-25 15:37:45-0500 730 [5654]: s3:r3 resource
efc01877-2748-4f50-8165-976ed152feea:SDM:/rhev/data-center/mnt/ovirt-0.dom.ain:_ovirt_data_data/efc01877-2748-4f50-8165-976ed152feea/dom_md/leases:1048576
for 2,15,3578
2016-11-25 22:06:16-0500 28 [1556]: sanlock daemon started 3.2.4 host
c24b508e-8808-405b-bebd-0ba9e28cdf37.ovirt-0.ih
> Another reason may be engine trying to do soft-fencing because it cannot
> restart vdsm. Looking in vdsm log you can find if vdsm was restarted, and
> in engine log you should find logs about this.
I did not see anything in the engine log, but then again I'm not sure what
to look for. I looked for "reboot", "restart", and "kill", but the only
stuff that appeared in the engine logs are the times earlier in the day
when I actually rebooted it myself. Here's the exact log entries across
the reboot time:
2016-11-25 21:38:52,237 INFO
[org.ovirt.engine.core.bll.storage.domain.IsoDomainListSyncronizer]
(org.ovirt.thread.pool-6-thread-31) [e463ce4] Finished automatic refresh
process for 'ISO' file type with success, for storage domain id
'fd2a26a5-2851-4495-891f-cb7a2f069411'.
2016-11-25 22:13:56,891 INFO
[org.ovirt.engine.core.uutils.config.ShellLikeConfd] (ServerService Thread
Pool -- 42) [] Loaded file
'/usr/share/ovirt-engine/services/ovirt-engine/ovirt-engine.conf'.
The reboot at 22:03 was spurious (I know it wasn't me -- I'd been in bed
for an hour). I've attached the VDSM logs for the hour before/after the
reboot occurred in case there is something in there that might help. I do
see one instance of "reboot" in the vdsm log:
<on_poweroff>destroy</on_poweroff><on_reboot>destroy</on_reboot><on_crash>destroy</on_crash></domain>
but this is from after the reboot (c.f. log15) not before.
> Are you sure you don't have any cron job or some other code that is
> configured reboot the host?
Absolutely! I never set up anything to reboot the system. I didn't see
anything in /var/log/messages or /var/log/secure that would indicate
anything, either. Here's the /var/log/messages for across the reboot:
Nov 25 22:03:07 ovirt-0 ovirt-ha-broker: INFO:mem_free.MemFree:memFree:238982
Nov 25 22:03:11 ovirt-0 ovirt-ha-broker:
INFO:ovirt_hosted_engine_ha.broker.listener.ConnectionHandler:Connection
established
Nov 25 22:03:11 ovirt-0 ovirt-ha-broker:
INFO:ovirt_hosted_engine_ha.broker.listener.ConnectionHandler:Connection
closed
Nov 25 22:03:11 ovirt-0 ovirt-ha-broker:
INFO:ovirt_hosted_engine_ha.broker.listener.ConnectionHandler:Connection
established
Nov 25 22:03:11 ovirt-0 ovirt-ha-broker:
INFO:ovirt_hosted_engine_ha.broker.listener.ConnectionHandler:Connection
closed
Nov 25 22:03:13 ovirt-0 ovirt-ha-agent:
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Initializing
VDSM
Nov 25 22:03:15 ovirt-0 journal: vdsm vds.dispatcher ERROR SSL error
during reading data: unexpected eof
Nov 25 22:03:15 ovirt-0 ovirt-ha-agent:
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Connecting
the storage
Nov 25 22:03:15 ovirt-0 ovirt-ha-agent:
INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Connecting
storage server
Nov 25 22:03:17 ovirt-0 ovirt-ha-broker:
INFO:ovirt_hosted_engine_ha.broker.listener.ConnectionHandler:Connection
established
Nov 25 22:03:17 ovirt-0 ovirt-ha-broker:
INFO:ovirt_hosted_engine_ha.broker.listener.ConnectionHandler:Connection
closed
Nov 25 22:03:17 ovirt-0 ovirt-ha-broker:
INFO:ovirt_hosted_engine_ha.broker.listener.ConnectionHandler:Connection
established
Nov 25 22:03:17 ovirt-0 ovirt-ha-broker:
INFO:ovirt_hosted_engine_ha.broker.listener.ConnectionHandler:Connection
closed
Nov 25 22:03:17 ovirt-0 ovirt-ha-broker: INFO:mem_free.MemFree:memFree:
238979
Nov 25 22:03:19 ovirt-0 journal: vdsm vds.dispatcher ERROR SSL error
during reading data: unexpected eof
Nov 25 22:03:19 ovirt-0 ovirt-ha-agent:
INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Connecting
storage server
Nov 25 22:03:20 ovirt-0 ovirt-ha-agent:
INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Refreshing
the storage domain
Nov 25 22:03:20 ovirt-0 journal: vdsm vds.dispatcher ERROR SSL error
during reading data: unexpected eof
Nov 25 22:03:20 ovirt-0 ovirt-ha-agent:
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Preparing
images
Nov 25 22:03:20 ovirt-0 ovirt-ha-agent:
INFO:ovirt_hosted_engine_ha.lib.image.Image:Preparing images
Nov 25 22:06:16 ovirt-0 rsyslogd: [origin software="rsyslogd"
swVersion="7.4.7" x-pid="1509" x-info="http://www.rsyslog.com"] start
Nov 25 22:06:06 ovirt-0 journal: Runtime journal is using 8.0M (max
allowed 4.0G, trying to leave 4.0G free of 125.8G available → current
limit 4.0G).
Nov 25 22:06:06 ovirt-0 kernel: Initializing cgroup subsys cpuset
Nov 25 22:06:06 ovirt-0 kernel: Initializing cgroup subsys cpu
Nov 25 22:06:06 ovirt-0 kernel: Initializing cgroup subsys cpuacct
Any other suggestions?
Thanks,
> Nir
-derek
--
Derek Atkins 617-623-3745
derek at ihtfp.com www.ihtfp.com
Computer and Internet Security Consultant
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vdsm.log.15.xz
Type: application/x-xz
Size: 1600128 bytes
Desc: not available
URL: <http://lists.ovirt.org/pipermail/users/attachments/20161126/1e938a06/attachment-0002.xz>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vdsm.log.16.xz
Type: application/x-xz
Size: 1798380 bytes
Desc: not available
URL: <http://lists.ovirt.org/pipermail/users/attachments/20161126/1e938a06/attachment-0003.xz>
More information about the Users
mailing list