On Wed, Jan 11, 2017 at 7:35 PM, Mark Greenall
<m.greenall(a)iontrading.com> wrote:
Hi Ovirt Champions,
I am pulling my hair out and in need of advice / help.
Host server: Dell PowerEdge R815 (40 cores and 768GB memory)
Storage: Dell Equallogic (Firmware V8.1.4)
OS: CentOS 7.3 (although the same thing happens on 7.2)
Ovirt: 4.0.6.3-1 (although also happens on 4.0.5)
I can’t exactly pinpoint when this started happening but it’s certainly been
happening with Ovirt 4.0.5 and CentOS 7.2. Today I updated Hosted Engine and
one host to 4.0.6 and CentOS 7.3 but we still see the same problem. Our
hosts are connected to Dell iSCSI Equallogic storage. We have one storage
domain defined per VM guest, so we do have quite a few LUNs presented to the
cluster (around 45 in total).
Problem Description:
1) Reboot a host.
2) Activate a host in Ovirt Admin Gui.
3) A few minutes later host is shown as activated.
4) Approx 10-15 mins later host goes offline complaining that it can’t
connect to storage.
5) The host then loops constantly (activating, non operational,
connecting, initialising) and ends up with a high CPU load and a
large number of lvm commands in the process tree.
6) Multipath and iscsi show all storage is available and logged in.
7) Equallogic shows host connected and no errors.
8) Admin GUI ends up saying the host can’t connect to storage
‘UNKNOWN’.
The strange thing is that every now and again step 5 doesn’t happen and the
host actually activates again and then stays up. However, step 4 still
takes the host offline first.
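To put a number on the lvm pile-up in step 5, a minimal sketch along these lines could be run on the host while it is looping. It assumes Python 3.7 or later and that the LVM commands appear in the process list under the command name "lvm"; commands started through a symlink such as vgs would show a different name and would not be counted. The function name count_lvm_processes is only illustrative.

```python
import subprocess


def count_lvm_processes(ps_output: str) -> int:
    """Count lines of `ps -eo comm=` output whose command name is exactly 'lvm'."""
    return sum(1 for line in ps_output.splitlines() if line.strip() == "lvm")


if __name__ == "__main__":
    # `comm=` prints only the command name, with the header suppressed.
    result = subprocess.run(
        ["ps", "-eo", "comm="],
        capture_output=True,
        text=True,
        check=True,
    )
    print(count_lvm_processes(result.stdout))
```

Running it every few seconds during the activation loop would show whether the count keeps growing or levels off.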
Expected Behaviour:
1) Reboot a host.
2) Activate a host in Ovirt Admin Gui.
3) A few minutes later host is shown as activated.
4) Begin using host with confidence.
I’ve attached the engine.log from Hosted Engine and vdsm.log from the host.
The following is a timeline of the latest event.
Host Activation : 15:07
Host Up: 15:10
Non-Operational: 15:17
Seriously hoping someone can spot something obvious as this is making the
clusters somewhat unstable and unreliable.
Can you share /var/log/messages and /var/log/sanlock.log?
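If the full logs are too large to attach, a rough sketch like the following could trim them to the window around the timeline above. It assumes each log line carries an HH:MM:SS clock time and it ignores the date, so it only makes sense on logs from the day in question. The helper name lines_in_window and the 15:07 to 15:18 window are illustrative choices, not anything oVirt provides.

```python
import re
from datetime import time

# First HH:MM:SS-looking token on a line is treated as its timestamp.
_CLOCK = re.compile(r"\b(\d{2}):(\d{2}):(\d{2})\b")


def lines_in_window(lines, start, end):
    """Yield lines whose first HH:MM:SS timestamp falls within [start, end]."""
    for line in lines:
        match = _CLOCK.search(line)
        if match is None:
            continue  # no recognisable clock time on this line
        hour, minute, second = map(int, match.groups())
        if hour > 23 or minute > 59 or second > 59:
            continue  # digits that only look like a time
        if start <= time(hour, minute, second) <= end:
            yield line


if __name__ == "__main__":
    # Window chosen to cover Host Activation (15:07) through Non-Operational (15:17).
    for path in ("/var/log/messages", "/var/log/sanlock.log"):
        print("==== " + path)
        with open(path, errors="replace") as handle:
            for line in lines_in_window(handle, time(15, 7), time(15, 18)):
                print(line, end="")
```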
Nir