[ovirt-users] Ovirt host activation and lvm looping with high CPU load trying to mount iSCSI storage

Yaniv Kaul ykaul at redhat.com
Wed Jan 11 19:16:24 UTC 2017


On Wed, Jan 11, 2017 at 7:35 PM, Mark Greenall <m.greenall at iontrading.com>
wrote:

> Hi Ovirt Champions,
>
>
>
> I am pulling my hair out and in need of advice / help.
>
>
>
> Host server: Dell PowerEdge R815 (40 cores and 768GB memory)
>
> Storage: Dell EqualLogic (Firmware V8.1.4)
>
> OS: CentOS 7.3 (although the same thing happens on 7.2)
>
> oVirt: 4.0.6.3-1 (although it also happens on 4.0.5)
>
>
>
> I can’t pinpoint exactly when this started happening, but it has certainly
> been happening with oVirt 4.0.5 and CentOS 7.2. Today I updated Hosted
> Engine and one host to 4.0.6 and CentOS 7.3, but we still see the same
> problem. Our hosts are connected to Dell EqualLogic iSCSI storage. We have
> one storage domain defined per VM guest, so we have quite a few LUNs
> presented to the cluster (around 45 in total).
>

Why do you have one storage domain (SD) per VM?

Can you try disabling (masking) the lvmetad service on the hosts and see if
it improves matters?
Also /var/log/messages from the host may give us some clues.
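
For reference, masking lvmetad could look roughly like the sketch below. It
assumes the stock CentOS 7 unit names lvm2-lvmetad.service and
lvm2-lvmetad.socket; the helper names and the dry_run flag are illustrative
only. By default it just prints the commands. Setting use_lvmetad = 0 in
/etc/lvm/lvm.conf is usually done alongside the mask.

```python
import subprocess

# Assumed unit names for lvmetad on CentOS 7; verify with
# `systemctl list-unit-files | grep lvmetad` before relying on them.
LVMETAD_UNITS = ["lvm2-lvmetad.service", "lvm2-lvmetad.socket"]


def lvmetad_mask_commands(units=LVMETAD_UNITS):
    """Return the systemctl commands that stop and then mask the given units."""
    return [
        ["systemctl", "stop", *units],
        ["systemctl", "mask", *units],
    ]


def mask_lvmetad(dry_run=True):
    """Print each command; run it only when dry_run is False (needs root)."""
    for cmd in lvmetad_mask_commands():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    mask_lvmetad(dry_run=True)
```

Run it with dry_run=False as root on one host first, then reboot and retry the
activation to see whether the behaviour changes.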
TIA,
Y.


>
>
> Problem Description:
>
> 1)      Reboot a host.
>
> 2)      Activate the host in the oVirt Admin GUI.
>
> 3)      A few minutes later host is shown as activated.
>
> 4)      Approx 10-15 mins later host goes offline complaining that it
> can’t connect to storage.
>
> 5)      The host then loops constantly through the states activating,
> non-operational, connecting and initialising, and ends up with a high CPU
> load and a large number of lvm commands in the process tree.
>
> 6)      Multipath and iSCSI show that all storage is available and logged in.
>
> 7)      EqualLogic shows the host connected and no errors.
>
> 8)      Admin GUI ends up saying the host can’t connect to storage
> ‘UNKNOWN’.
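
To put a number on the "large number of lvm commands" in step 5, a small
sketch like the following could be run on the host while it is looping. It
counts process names from `ps -eo comm=`; the set of command names and the
function names are assumptions for illustration, not anything vdsm provides.

```python
import subprocess
from collections import Counter

# Illustrative set of LVM command names to look for; extend as needed.
LVM_COMMANDS = {"lvm", "lvs", "vgs", "pvs", "vgck", "lvchange", "vgchange"}


def count_lvm_processes(ps_output):
    """Count LVM commands in `ps -eo comm=` output (one process name per line)."""
    names = (line.strip() for line in ps_output.splitlines())
    return Counter(name for name in names if name in LVM_COMMANDS)


def snapshot():
    """Take one sample of the process table on the running host."""
    out = subprocess.run(
        ["ps", "-eo", "comm="],
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    ).stdout
    return count_lvm_processes(out)
```

Calling snapshot() every few seconds during the activation loop, and noting
the time of each sample, gives figures that can be lined up against vdsm.log
and /var/log/messages.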
>
>
>
> The strange thing is that every now and again step 5 doesn’t happen and
> the host activates again and then stays up. Even then, though, the host
> still goes offline first, as in step 4.
>
>
>
> Expected Behaviour:
>
> 1)      Reboot a host.
>
> 2)      Activate the host in the oVirt Admin GUI.
>
> 3)      A few minutes later host is shown as activated.
>
> 4)      Begin using host with confidence.
>
>
>
> I’ve attached the engine.log from Hosted Engine and vdsm.log from the
> host. The following is a timeline of the latest event.
>
>
>
> Host Activation : 15:07
>
> Host Up: 15:10
>
> Non-Operational: 15:17
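
The gaps in the timeline above can be worked out with a short sketch like this
one, which helps when deciding which window of the logs to read. The times are
the ones given in the message; the function names are illustrative.

```python
from datetime import datetime

# Times reported in the message (same day, host local time).
EVENTS = [
    ("Host Activation", "15:07"),
    ("Host Up", "15:10"),
    ("Non-Operational", "15:17"),
]


def minutes_between(start, end):
    """Whole minutes from start to end, both given as HH:MM on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


def intervals(events):
    """Return (from, to, minutes) for each consecutive pair of events."""
    return [
        (a[0], b[0], minutes_between(a[1], b[1]))
        for a, b in zip(events, events[1:])
    ]


if __name__ == "__main__":
    for start, end, minutes in intervals(EVENTS):
        print(f"{start} -> {end}: {minutes} min")
```

For this event the host was up 3 minutes after activation and went
non-operational 7 minutes after that, so the log window of interest is roughly
15:07 to 15:17.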
>
>
>
> Seriously hoping someone can spot something obvious as this is making the
> clusters somewhat unstable and unreliable.
>
>
>
> Many Thanks,
>
> Mark
>