I have oVirt 3.1 with two nodes. One of them runs fine without
restarting; the other keeps restarting. The longest uptime that server
reaches is about 10 days before it restarts. I suspect the problem is
related to the disks.
FYI, the storage is two 1 TB disks in RAID-0, giving 2 TB in total.
# /var/log/messages
Dec 21 05:28:44 hypervisor01a ntpd[945]: 0.0.0.0 c61c 0c clock_step +17997.588918 s
Dec 21 05:28:44 hypervisor01a ntpd[945]: 0.0.0.0 c614 04 freq_mode
Dec 21 05:28:45 hypervisor01a kdump: No crashkernel parameter specified for running kernel
Dec 21 05:28:45 hypervisor01a kdumpctl[1366]: Starting kdump:
Dec 21 05:28:45 hypervisor01a kdump: failed to start up
Dec 21 05:28:45 hypervisor01a systemd[1]: kdump.service: main process exited, code=exited, status=1
Dec 21 05:28:45 hypervisor01a systemd[1]: Unit kdump.service entered failed state.
Dec 21 05:28:45 hypervisor01a systemd[1]: Startup finished in 888ms 157us (kernel) + 2s 521ms 289us (initrd) + 15s 577ms 672us (userspace) = 18s 987ms 118us.
Dec 21 05:28:45 hypervisor01a ntpd[945]: 0.0.0.0 c618 08 no_sys_peer
Dec 21 05:29:04 hypervisor01a vdsm TaskManager.Task ERROR Task=`5f51ff52-f9a4-4854-a41d-d5d33c872458`::Unexpected error
Dec 21 05:29:04 hypervisor01a vdsm Storage.Dispatcher.Protect ERROR {'status': {'message': "Unknown pool id, pool not connected: ('dbb49db6-9a24-4395-a8bd-c9f222eaecab',)", 'code': 309}}
Dec 21 05:29:04 hypervisor01a vdsm TaskManager.Task ERROR Task=`7b0cf3b0-6d26-4421-a221-29f2ecaaeb1f`::Unexpected error
Dec 21 05:29:04 hypervisor01a vdsm Storage.Dispatcher.Protect ERROR {'status': {'message': "Unknown pool id, pool not connected: ('dbb49db6-9a24-4395-a8bd-c9f222eaecab',)", 'code': 309}}
Dec 21 05:29:04 hypervisor01a kernel: [ 37.944421] ata1: hard resetting link
Dec 21 05:29:04 hypervisor01a kernel: [ 38.247979] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec 21 05:29:04 hypervisor01a kernel: [ 38.248802] ata1.00: configured for UDMA/133
Dec 21 05:29:04 hypervisor01a kernel: [ 38.248807] ata1: EH complete
Dec 21 05:29:04 hypervisor01a kernel: [ 38.249013] ata2: hard resetting link
Dec 21 05:29:04 hypervisor01a kernel: [ 38.553112] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec 21 05:29:04 hypervisor01a kernel: [ 38.553881] ata2.00: configured for UDMA/133
Dec 21 05:29:04 hypervisor01a kernel: [ 38.553886] ata2: EH complete
Dec 21 05:29:04 hypervisor01a kernel: [ 38.554064] ata3: hard resetting link
Dec 21 05:29:05 hypervisor01a kernel: [ 38.858275] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec 21 05:29:05 hypervisor01a kernel: [ 38.861154] ata3.00: configured for UDMA/133
Dec 21 05:29:05 hypervisor01a kernel: [ 38.861159] ata3: EH complete
Dec 21 05:29:05 hypervisor01a kernel: [ 38.861352] ata4: hard resetting link
Dec 21 05:29:05 hypervisor01a kernel: [ 39.165397] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec 21 05:29:05 hypervisor01a kernel: [ 39.168223] ata4.00: configured for UDMA/133
Dec 21 05:29:05 hypervisor01a kernel: [ 39.168229] ata4: EH complete
Dec 21 05:29:05 hypervisor01a kernel: [ 39.168421] ata5: hard resetting link
Dec 21 05:29:05 hypervisor01a kernel: [ 39.472459] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Dec 21 05:29:05 hypervisor01a kernel: [ 39.480040] ata5.00: configured for UDMA/100
Dec 21 05:29:05 hypervisor01a kernel: [ 39.485478] ata5: EH complete
Dec 21 05:29:05 hypervisor01a kernel: [ 39.485642] ata6: limiting SATA link speed to 1.5 Gbps
Dec 21 05:29:05 hypervisor01a kernel: [ 39.485647] ata6: hard resetting link
Dec 21 05:29:06 hypervisor01a kernel: [ 39.790610] ata6: SATA link down (SStatus 0 SControl 310)
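To compare the two nodes, the ata messages in this log can be tallied
per port with a small script. This is only a sketch: the function name
summarize_ata and the regular expression are my own assumptions, based
purely on the message format shown in this log, and it expects each log
entry on a single line, as in /var/log/messages on the host.

```python
import re
from collections import defaultdict

# Assumed format, taken from the kernel lines in this log, e.g.
# "... kernel: [ 38.247979] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)"
ATA_RE = re.compile(r"kernel: \[\s*\d+\.\d+\] (ata\d+)(?:\.\d+)?: (.*)")
LINK_RE = re.compile(r"SATA link (up [\d.]+ Gbps|down)")


def summarize_ata(lines):
    """Count hard resets and record the last reported link state per ATA port."""
    summary = defaultdict(lambda: {"resets": 0, "link": None})
    for line in lines:
        match = ATA_RE.search(line)
        if not match:
            continue  # not a kernel ATA message
        port, message = match.groups()
        if message.startswith("hard resetting link"):
            summary[port]["resets"] += 1
        else:
            link = LINK_RE.match(message)
            if link:
                summary[port]["link"] = link.group(1)
    return dict(summary)


if __name__ == "__main__":
    # Path as used in this post; reading it usually requires root.
    with open("/var/log/messages", errors="replace") as log:
        for port, info in sorted(summarize_ata(log).items()):
            print(port, info)
```

Running the same script on both nodes would show whether the healthy
node reports the same resets at boot. If it does, these lines are
probably normal link probing rather than the cause of the restarts.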
# RAID-0
mdadm --detail /dev/md127
/dev/md127:
Version : 1.2
Creation Time : Sun Nov 18 14:47:15 2012
Raid Level : raid0
Array Size : 1953524736 (1863.03 GiB 2000.41 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent
Update Time : Sun Nov 18 14:47:15 2012
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Chunk Size : 512K
Name : hypervisor01-a:0 (local to host hypervisor01-a)
UUID : 9eb1324d:57eed46d:c23ae815:0666e238
Events : 0
    Number   Major   Minor   RaidDevice State
       0     253        2        0      active sync   /dev/dm-2
       1     253        3        1      active sync   /dev/dm-3