

On Sat, Nov 26, 2016 at 7:02 PM, Derek Atkins <derek@ihtfp.com> wrote:
Hi, Last night at approximately 22:03 (give or take a few minutes) my single-server ovirt system rebooted. I saw a root login (presumably from the engine) and then the host rebooted.
I don't see anything specific in the logs that would indicate WHY it rebooted.
Any suggestions for where to look or what to look for? I had updated the host yesterday to 4.0.5 and did reboot the system myself several times. The good news is that I succeeded in getting my scripts working so all my VMs came back. But before I go and deploy anything mission critical I'd like to determine why the system decided it should reboot.
Suggestions?
One reason for a reboot is losing access to storage on the SPM host: sanlock would terminate vdsm, and if it could not terminate vdsm it will reboot the host. To check for this you can look into /var/log/sanlock.log.

Another reason may be the engine trying to do soft-fencing because it cannot restart vdsm. Looking in the vdsm log you can find whether vdsm was restarted, and in the engine log you should find entries about this.

Are you sure you don't have any cron job or some other code that is configured to reboot the host?

Nir
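[Editor's note: the checks suggested above can be sketched as a small shell helper. This is a hedged sketch, not an official oVirt tool: `check_reboot_cause` is a name invented here, the grep patterns are illustrative, and /var/log/sanlock.log is the stock sanlock log location — adjust for your install.]

```shell
# check_reboot_cause: print likely evidence of why the host rebooted.
# Takes the sanlock log path as an optional argument (default is the
# stock location); this makes the function easy to test on a sample log.
check_reboot_cause() {
  log="${1:-/var/log/sanlock.log}"
  # sanlock fencing typically leaves "kill" or watchdog traces just
  # before the reboot; show the most recent matches, if any
  if [ -r "$log" ]; then
    grep -E 'kill|dead|wdmd|watchdog' "$log" | tail -n 20
  fi
  # wtmp keeps reboot/shutdown records: a reboot record with no matching
  # shutdown entry suggests a hard reset (watchdog, panic, power loss)
  command -v last >/dev/null 2>&1 && last -x shutdown reboot | head -n 10
  return 0
}

check_reboot_cause
```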

Hi,

On Sat, November 26, 2016 2:11 pm, Nir Soffer wrote:
On Sat, Nov 26, 2016 at 7:02 PM, Derek Atkins <derek@ihtfp.com> wrote:
Hi, Last night at approximately 22:03 (give or take a few minutes) my single-server ovirt system rebooted. I saw a root login (presumably from the engine) and then the host rebooted.
I don't see anything specific in the logs that would indicate WHY it rebooted.
Any suggestions for where to look or what to look for? I had updated the host yesterday to 4.0.5 and did reboot the system myself several times. The good news is that I succeeded in getting my scripts working so all my VMs came back. But before I go and deploy anything mission critical I'd like to determine why the system decided it should reboot.
Suggestions?
One reason for reboot is loosing access to storage on the spm host. sanlock would terminate vdsm, and if it could not terminate vdsm it will reboot the host.
To check for this you can look into /var/log/sanlock.log
Nothing there except the restart (which happened between 22:03 and 22:06):

2016-11-25 15:37:45-0500 730 [5654]: s3:r3 resource efc01877-2748-4f50-8165-976ed152feea:SDM:/rhev/data-center/mnt/ovirt-0.dom.ain:_ovirt_data_data/efc01877-2748-4f50-8165-976ed152feea/dom_md/leases:1048576 for 2,15,3578
2016-11-25 22:06:16-0500 28 [1556]: sanlock daemon started 3.2.4 host c24b508e-8808-405b-bebd-0ba9e28cdf37.ovirt-0.ih
Another reason may be engine trying to do soft-fencing because it cannot restart vdsm. Looking in vdsm log you can find if vdsm was restarted, and in engine log you should find logs about this.
I did not see anything in the engine log, but then again I'm not sure what to look for. I looked for "reboot", "restart", and "kill", but the only entries that appeared in the engine logs are from the times earlier in the day when I actually rebooted it myself. Here are the exact log entries across the reboot time:

2016-11-25 21:38:52,237 INFO [org.ovirt.engine.core.bll.storage.domain.IsoDomainListSyncronizer] (org.ovirt.thread.pool-6-thread-31) [e463ce4] Finished automatic refresh process for 'ISO' file type with success, for storage domain id 'fd2a26a5-2851-4495-891f-cb7a2f069411'.
2016-11-25 22:13:56,891 INFO [org.ovirt.engine.core.uutils.config.ShellLikeConfd] (ServerService Thread Pool -- 42) [] Loaded file '/usr/share/ovirt-engine/services/ovirt-engine/ovirt-engine.conf'.

The reboot at 22:03 was spurious (I know it wasn't me -- I'd been in bed for an hour). I've attached the VDSM logs for the hour before/after the reboot occurred in case there is something in there that might help. I do see one instance of "reboot" in the vdsm log:

<on_poweroff>destroy</on_poweroff><on_reboot>destroy</on_reboot><on_crash>destroy</on_crash></domain>

but this is from after the reboot (cf. log15), not before.
Are you sure you don't have any cron job or some other code that is configured reboot the host?
Absolutely! I never set up anything to reboot the system. I didn't see anything in /var/log/messages or /var/log/secure that would indicate anything, either. Here is /var/log/messages from across the reboot:

Nov 25 22:03:07 ovirt-0 ovirt-ha-broker: INFO:mem_free.MemFree:memFree:238982
Nov 25 22:03:11 ovirt-0 ovirt-ha-broker: INFO:ovirt_hosted_engine_ha.broker.listener.ConnectionHandler:Connection established
Nov 25 22:03:11 ovirt-0 ovirt-ha-broker: INFO:ovirt_hosted_engine_ha.broker.listener.ConnectionHandler:Connection closed
Nov 25 22:03:11 ovirt-0 ovirt-ha-broker: INFO:ovirt_hosted_engine_ha.broker.listener.ConnectionHandler:Connection established
Nov 25 22:03:11 ovirt-0 ovirt-ha-broker: INFO:ovirt_hosted_engine_ha.broker.listener.ConnectionHandler:Connection closed
Nov 25 22:03:13 ovirt-0 ovirt-ha-agent: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Initializing VDSM
Nov 25 22:03:15 ovirt-0 journal: vdsm vds.dispatcher ERROR SSL error during reading data: unexpected eof
Nov 25 22:03:15 ovirt-0 ovirt-ha-agent: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Connecting the storage
Nov 25 22:03:15 ovirt-0 ovirt-ha-agent: INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Connecting storage server
Nov 25 22:03:17 ovirt-0 ovirt-ha-broker: INFO:ovirt_hosted_engine_ha.broker.listener.ConnectionHandler:Connection established
Nov 25 22:03:17 ovirt-0 ovirt-ha-broker: INFO:ovirt_hosted_engine_ha.broker.listener.ConnectionHandler:Connection closed
Nov 25 22:03:17 ovirt-0 ovirt-ha-broker: INFO:ovirt_hosted_engine_ha.broker.listener.ConnectionHandler:Connection established
Nov 25 22:03:17 ovirt-0 ovirt-ha-broker: INFO:ovirt_hosted_engine_ha.broker.listener.ConnectionHandler:Connection closed
Nov 25 22:03:17 ovirt-0 ovirt-ha-broker: INFO:mem_free.MemFree:memFree: 238979
Nov 25 22:03:19 ovirt-0 journal: vdsm vds.dispatcher ERROR SSL error during reading data: unexpected eof
Nov 25 22:03:19 ovirt-0 ovirt-ha-agent: INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Connecting storage server
Nov 25 22:03:20 ovirt-0 ovirt-ha-agent: INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Refreshing the storage domain
Nov 25 22:03:20 ovirt-0 journal: vdsm vds.dispatcher ERROR SSL error during reading data: unexpected eof
Nov 25 22:03:20 ovirt-0 ovirt-ha-agent: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Preparing images
Nov 25 22:03:20 ovirt-0 ovirt-ha-agent: INFO:ovirt_hosted_engine_ha.lib.image.Image:Preparing images
Nov 25 22:06:16 ovirt-0 rsyslogd: [origin software="rsyslogd" swVersion="7.4.7" x-pid="1509" x-info="http://www.rsyslog.com"] start
Nov 25 22:06:06 ovirt-0 journal: Runtime journal is using 8.0M (max allowed 4.0G, trying to leave 4.0G free of 125.8G available → current limit 4.0G).
Nov 25 22:06:06 ovirt-0 kernel: Initializing cgroup subsys cpuset
Nov 25 22:06:06 ovirt-0 kernel: Initializing cgroup subsys cpu
Nov 25 22:06:06 ovirt-0 kernel: Initializing cgroup subsys cpuacct

Any other suggestions?

Thanks,
-derek -- Derek Atkins 617-623-3745 derek@ihtfp.com www.ihtfp.com Computer and Internet Security Consultant
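[Editor's note: to rule out scheduled jobs more systematically than a manual log search, the cron and timer locations can be swept with a small helper. A hedged sketch, not part of the thread: `find_scheduled_reboots` is a name invented here, and the directory list covers only the common cron drop-in locations.]

```shell
# find_scheduled_reboots: scan the usual cron locations under a given
# root (default /etc) for anything that might reboot the host on a
# schedule, plus systemd timers if systemctl is available.
find_scheduled_reboots() {
  root="${1:-/etc}"
  for f in "$root"/crontab "$root"/cron.d "$root"/cron.hourly \
           "$root"/cron.daily "$root"/cron.weekly; do
    # -r recurses into cron.d-style directories; plain files work too
    [ -e "$f" ] && grep -rn -e reboot -e shutdown "$f" 2>/dev/null
  done
  # the invoking user's crontab, if any
  crontab -l 2>/dev/null | grep -e reboot -e shutdown
  # systemd timers can also fire scheduled jobs
  command -v systemctl >/dev/null 2>&1 && systemctl list-timers --all --no-pager 2>/dev/null
  return 0
}

find_scheduled_reboots
```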