[ovirt-users] Ovirt host activation and lvm looping with high CPU load trying to mount iSCSI storage
Nir Soffer
nsoffer at redhat.com
Wed Jan 11 19:39:16 UTC 2017
On Wed, Jan 11, 2017 at 9:23 PM, Nir Soffer <nsoffer at redhat.com> wrote:
> On Wed, Jan 11, 2017 at 7:35 PM, Mark Greenall
> <m.greenall at iontrading.com> wrote:
>> Hi oVirt Champions,
>>
>> I am pulling my hair out and in need of advice / help.
>>
>> Host server: Dell PowerEdge R815 (40 cores and 768GB memory)
>>
>> Storage: Dell EqualLogic (Firmware V8.1.4)
>>
>> OS: CentOS 7.3 (although the same thing happens on 7.2)
>>
>> oVirt: 4.0.6.3-1 (although it also happens on 4.0.5)
>>
>> I can’t pinpoint exactly when this started, but it has certainly been
>> happening with oVirt 4.0.5 and CentOS 7.2. Today I updated the Hosted Engine
>> and one host to oVirt 4.0.6 and CentOS 7.3, but we still see the same problem.
>> Our hosts are connected to Dell EqualLogic iSCSI storage. We have one storage
>> domain defined per VM guest, so we have quite a few LUNs presented to the
>> cluster (around 45 in total).
>>
>> Problem Description:
>>
>> 1) Reboot a host.
>> 2) Activate the host in the oVirt Admin GUI.
>> 3) A few minutes later, the host is shown as activated.
>> 4) Approx. 10-15 minutes later, the host goes offline, complaining that it
>>    can’t connect to storage.
>> 5) It then loops constantly (activating, non-operational, connecting,
>>    initialising), and the host ends up with a high CPU load and a large
>>    number of lvm commands in the process tree.
>> 6) Multipath and iscsi show all storage as available and logged in.
>> 7) EqualLogic shows the host as connected, with no errors.
>> 8) The Admin GUI ends up saying the host can’t connect to storage
>>    ‘UNKNOWN’.
>>
>> The strange thing is that every now and again step 5 doesn’t happen and the
>> host actually activates again and stays up. However, step 4 still happens
>> first and takes the host offline.
>>
>> Expected Behaviour:
>>
>> 1) Reboot a host.
>> 2) Activate the host in the oVirt Admin GUI.
>> 3) A few minutes later, the host is shown as activated.
>> 4) Begin using the host with confidence.
>>
>> I’ve attached the engine.log from the Hosted Engine and the vdsm.log from the
>> host. The following is a timeline of the latest event.
>>
>> Host activation: 15:07
>> Host up: 15:10
>> Non-operational: 15:17
>>
>> I'm seriously hoping someone can spot something obvious, as this is making
>> the clusters somewhat unstable and unreliable.
>
> Can you share /var/log/messages and /var/log/sanlock.log?
And /etc/multipath.conf
>
> Nir
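As a minimal sketch, assuming Python 3 is available on the host and the files live at their default CentOS 7 paths, the three files requested above (/var/log/messages, /var/log/sanlock.log and /etc/multipath.conf) could be bundled into a single archive for attaching to a reply. The function name `collect_files`, the `REQUESTED_FILES` list and the archive name are hypothetical illustrations, not part of oVirt, vdsm or sanlock.

```python
import os
import tarfile
from typing import Iterable, List, Tuple

# Hypothetical list: default locations of the files requested in this thread.
REQUESTED_FILES = [
    "/var/log/messages",
    "/var/log/sanlock.log",
    "/etc/multipath.conf",
]


def collect_files(paths: Iterable[str], archive_path: str) -> Tuple[List[str], List[str]]:
    """Bundle readable regular files from `paths` into a gzip-compressed tarball.

    Returns (included, skipped); `skipped` lists paths that do not exist,
    are not regular files, or are not readable by the current user.
    """
    included: List[str] = []
    skipped: List[str] = []
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in paths:
            if os.path.isfile(path) and os.access(path, os.R_OK):
                # Store paths relative to "/" so unpacking the archive does not
                # overwrite the real /var/log or /etc of whoever extracts it.
                tar.add(path, arcname=path.lstrip("/"))
                included.append(path)
            else:
                skipped.append(path)
    return included, skipped
```

On the affected host, calling `collect_files(REQUESTED_FILES, "host-storage-logs.tar.gz")` as root (reading /var/log/messages normally requires it) would produce one archive, and the returned `skipped` list shows anything that could not be included, so it can be mentioned in the reply rather than silently missing.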