
Hi, On Sat, November 26, 2016 2:11 pm, Nir Soffer wrote:
On Sat, Nov 26, 2016 at 7:02 PM, Derek Atkins <derek@ihtfp.com> wrote:
Hi, Last night at approximately 22:03 (give or take a few minutes) my single-server ovirt system rebooted. I saw a root login (presumably from the engine) and then the host rebooted.
I don't see anything specific in the logs that would indicate WHY it rebooted.
Any suggestions for where to look or what to look for? I had updated the host yesterday to 4.0.5 and did reboot the system myself several times. The good news is that I succeeded in getting my scripts working so all my VMs came back. But before I go and deploy anything mission critical I'd like to determine why the system decided it should reboot.
Suggestions?
One reason for reboot is loosing access to storage on the spm host. sanlock would terminate vdsm, and if it could not terminate vdsm it will reboot the host.
To check for this you can look into /var/log/sanlock.log
Nothing there except the restart (which happened between 22:03 and 22:06): 2016-11-25 15:37:45-0500 730 [5654]: s3:r3 resource efc01877-2748-4f50-8165-976ed152feea:SDM:/rhev/data-center/mnt/ovirt-0.dom.ain:_ovirt_data_data/efc01877-2748-4f50-8165-976ed152feea/dom_md/leases:1048576 for 2,15,3578 2016-11-25 22:06:16-0500 28 [1556]: sanlock daemon started 3.2.4 host c24b508e-8808-405b-bebd-0ba9e28cdf37.ovirt-0.ih
Another reason may be engine trying to do soft-fencing because it cannot restart vdsm. Looking in vdsm log you can find if vdsm was restarted, and in engine log you should find logs about this.
I did not see anything in the engine log, but then again I'm not sure what to look for. I looked for "reboot", "restart", and "kill", but the only stuff that appeared in the engine logs are the times earlier in the day when I actually rebooted it myself. Here's the exact log entries across the reboot time: 2016-11-25 21:38:52,237 INFO [org.ovirt.engine.core.bll.storage.domain.IsoDomainListSyncronizer] (org.ovirt.thread.pool-6-thread-31) [e463ce4] Finished automatic refresh process for 'ISO' file type with success, for storage domain id 'fd2a26a5-2851-4495-891f-cb7a2f069411'. 2016-11-25 22:13:56,891 INFO [org.ovirt.engine.core.uutils.config.ShellLikeConfd] (ServerService Thread Pool -- 42) [] Loaded file '/usr/share/ovirt-engine/services/ovirt-engine/ovirt-engine.conf'. The reboot at 22:03 was spurious (I know it wasn't me -- I'd been in bed for an hour). I've attached the VDSM logs for the hour before/after the reboot occurred in case there is something in there that might help. I do see one instance of "reboot" in the vdsm log: <on_poweroff>destroy</on_poweroff><on_reboot>destroy</on_reboot><on_crash>destroy</on_crash></domain> but this is from after the reboot (c.f. log15) not before.
Are you sure you don't have any cron job or some other code that is configured reboot the host?
Absolutely! I never set up anything to reboot the system. I didn't see anything in /var/log/messages or /var/log/secure that would indicate anything, either. Here's the /var/log/messages for across the reboot: Nov 25 22:03:07 ovirt-0 ovirt-ha-broker: INFO:mem_free.MemFree:memFree:238982 Nov 25 22:03:11 ovirt-0 ovirt-ha-broker: INFO:ovirt_hosted_engine_ha.broker.listener.ConnectionHandler:Connection established Nov 25 22:03:11 ovirt-0 ovirt-ha-broker: INFO:ovirt_hosted_engine_ha.broker.listener.ConnectionHandler:Connection closed Nov 25 22:03:11 ovirt-0 ovirt-ha-broker: INFO:ovirt_hosted_engine_ha.broker.listener.ConnectionHandler:Connection established Nov 25 22:03:11 ovirt-0 ovirt-ha-broker: INFO:ovirt_hosted_engine_ha.broker.listener.ConnectionHandler:Connection closed Nov 25 22:03:13 ovirt-0 ovirt-ha-agent: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Initializing VDSM Nov 25 22:03:15 ovirt-0 journal: vdsm vds.dispatcher ERROR SSL error during reading data: unexpected eof Nov 25 22:03:15 ovirt-0 ovirt-ha-agent: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Connecting the storage Nov 25 22:03:15 ovirt-0 ovirt-ha-agent: INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Connecting storage server Nov 25 22:03:17 ovirt-0 ovirt-ha-broker: INFO:ovirt_hosted_engine_ha.broker.listener.ConnectionHandler:Connection established Nov 25 22:03:17 ovirt-0 ovirt-ha-broker: INFO:ovirt_hosted_engine_ha.broker.listener.ConnectionHandler:Connection closed Nov 25 22:03:17 ovirt-0 ovirt-ha-broker: INFO:ovirt_hosted_engine_ha.broker.listener.ConnectionHandler:Connection established Nov 25 22:03:17 ovirt-0 ovirt-ha-broker: INFO:ovirt_hosted_engine_ha.broker.listener.ConnectionHandler:Connection closed Nov 25 22:03:17 ovirt-0 ovirt-ha-broker: INFO:mem_free.MemFree:memFree: 238979 Nov 25 22:03:19 ovirt-0 journal: vdsm vds.dispatcher ERROR SSL error during reading data: unexpected eof Nov 25 22:03:19 ovirt-0 ovirt-ha-agent: INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Connecting storage server Nov 25 22:03:20 ovirt-0 ovirt-ha-agent: INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Refreshing the storage domain Nov 25 22:03:20 ovirt-0 journal: vdsm vds.dispatcher ERROR SSL error during reading data: unexpected eof Nov 25 22:03:20 ovirt-0 ovirt-ha-agent: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Preparing images Nov 25 22:03:20 ovirt-0 ovirt-ha-agent: INFO:ovirt_hosted_engine_ha.lib.image.Image:Preparing images Nov 25 22:06:16 ovirt-0 rsyslogd: [origin software="rsyslogd" swVersion="7.4.7" x-pid="1509" x-info="http://www.rsyslog.com"] start Nov 25 22:06:06 ovirt-0 journal: Runtime journal is using 8.0M (max allowed 4.0G, trying to leave 4.0G free of 125.8G available → current limit 4.0G). Nov 25 22:06:06 ovirt-0 kernel: Initializing cgroup subsys cpuset Nov 25 22:06:06 ovirt-0 kernel: Initializing cgroup subsys cpu Nov 25 22:06:06 ovirt-0 kernel: Initializing cgroup subsys cpuacct Any other suggestions? Thanks,
Nir
-derek -- Derek Atkins 617-623-3745 derek@ihtfp.com www.ihtfp.com Computer and Internet Security Consultant