[ovirt-users] MoM is failing!!!

Piotr Kliczewski piotr.kliczewski at gmail.com
Mon Oct 16 14:13:53 UTC 2017


Erekle,

In the logs you provided I see:

IOError: [Errno 5] _handleRequests._checkForMail - Could not read
mailbox: /rhev/data-center/6d52512e-1c02-4509-880a-bf57cbad4bdf/mastersd/dom_md/inbox

and

StorageDomainMasterError: Error validating master storage domain: ('MD
read error',)

which seem to be the reason vdsm was killed by sanlock, which in turn
caused the connection reset by peer.

After the vdsm restart, storage looks good.
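
To see how often these two errors show up, something like the following could be run over the vdsm log. It is only a minimal sketch: the patterns come from the two messages quoted above, while the labels and the function name count_signatures are made up for illustration and are not part of vdsm.

```python
import re
from collections import Counter

# Patterns taken from the two errors quoted above; the labels are made up
# for this sketch and are not vdsm identifiers.
SIGNATURES = {
    "mailbox_read_error": re.compile(r"Could not read mailbox"),
    "master_domain_error": re.compile(r"StorageDomainMasterError"),
}


def count_signatures(lines):
    """Return a Counter of how many lines match each signature."""
    counts = Counter()
    for line in lines:
        for label, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[label] += 1
    return counts


# Tiny in-memory sample; a real run would iterate over the log file instead.
sample = [
    "IOError: [Errno 5] _handleRequests._checkForMail - Could not read mailbox",
    "StorageDomainMasterError: Error validating master storage domain",
    "some unrelated line",
]
for label, n in sorted(count_signatures(sample).items()):
    print(label, n)
```

Passing an open log file instead of the sample list gives a count per signature. A non-zero count would be consistent with a storage read problem rather than a MOM problem, but it does not prove it.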

@Nir can you take a look?

Thanks,
Piotr

On Mon, Oct 16, 2017 at 3:59 PM, Erekle Magradze
<erekle.magradze at recogizer.de> wrote:
> Hi,
>
> The issue is the following: after installing oVirt 4.1 on three nodes
> with GlusterFS as storage, the oVirt engine reported failed events with
> the following message:
>
> VDSM hostname command GetStatsVDS failed: Connection reset by peer
>
> After that, oVirt tried to fence the affected host and it was excluded
> from production. Luckily, I am not running any VMs on it yet.
>
> The logs are attached; don't be surprised by the hostnames :)
>
> Thanks in advance
>
> Cheers
>
> Erekle
>
>
> On 10/16/2017 03:37 PM, Dafna Ron wrote:
>
> Hi,
>
> Can you please tell us what issue you are actually facing? :) It
> would be easier to debug an issue than an error message that can be caused
> by several things.
>
> Also, can you provide the engine and the vdsm logs?
>
> thank you,
> Dafna
>
>
> On 10/16/2017 02:30 PM, Erekle Magradze wrote:
>
> It was a typo in the failure message;
>
> that's what I was getting:
>
> VDSM hostname command GetStatsVDS failed: Connection reset by peer
>
>
> On 10/16/2017 03:21 PM, Erekle Magradze wrote:
>
> Hi,
>
> It's getting clearer now; indeed, the momd service is inactive:
>
> ● momd.service - Memory Overcommitment Manager Daemon
>    Loaded: loaded (/usr/lib/systemd/system/momd.service; static; vendor preset: disabled)
>    Active: inactive (dead)
>
> mom-vdsm is enabled and running:
>
> ● mom-vdsm.service - MOM instance configured for VDSM purposes
>    Loaded: loaded (/usr/lib/systemd/system/mom-vdsm.service; enabled; vendor preset: enabled)
>    Active: active (running) since Mon 2017-10-16 15:14:35 CEST; 1min 3s ago
>  Main PID: 27638 (python)
>    CGroup: /system.slice/mom-vdsm.service
>            └─27638 python /usr/sbin/momd -c /etc/vdsm/mom.conf
>
> The reason I started digging into MOM problems is the following
> error:
>
>
> VDSM hostname command GetStatsVDSThanks failed: Connection reset by peer
>
> This is causing fencing of the node where the failure happens. What
> could be the reason for the GetStatsVDS failure?
>
> Best Regards
> Erekle
>
>
> On 10/16/2017 03:11 PM, Martin Sivak wrote:
>
> Hi,
>
> How do you start MOM? MOM is supposed to talk to vdsm; we do not talk
> to libvirt directly. The line you posted comes from vdsm, and vdsm is
> telling you it can't talk to MOM.
>
> Which MOM service is enabled? There are two, momd and mom-vdsm, and
> the second one is the one that should be enabled.
>
> Best regards
>
> Martin Sivak
>
>
> On Mon, Oct 16, 2017 at 3:04 PM, Erekle Magradze
> <erekle.magradze at recogizer.de> wrote:
>
> Hi Martin,
>
> Thanks for the answer. Unfortunately, this warning message persists. Does it
> mean that MOM cannot communicate with libvirt? How critical is it?
>
> Best
>
> Erekle
>
>
>
> On 10/16/2017 03:03 PM, Martin Sivak wrote:
>
> Hi,
>
> it is just a warning, there is nothing you have to solve unless it
> does not resolve itself within a minute or so. If it happens only once
> or twice after vdsm or mom restart then you are fine.
>
> Best regards
>
> --
> Martin Sivak
> SLA / oVirt
>
> On Mon, Oct 16, 2017 at 2:44 PM, Erekle Magradze
> <erekle.magradze at recogizer.de> wrote:
>
> Hi,
>
> After running systemctl status vdsm, I see that it is running, with these
> messages at the end:
>
> Oct 16 14:26:52 hostname vdsmd[2392]: vdsm throttled WARN MOM not available.
> Oct 16 14:26:52 hostname vdsmd[2392]: vdsm throttled WARN MOM not available, KSM stats will be missing.
> Oct 16 14:26:57 hostname vdsmd[2392]: vdsm root WARN ping was deprecated in favor of ping2 and confirmConnectivity
>
> How critical is it, and how can I resolve that warning?
>
> I am using libvirt.
>
> Cheers
>
> _______________________________________________
> Users mailing list
> Users at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
> --
> Recogizer Group GmbH
>
> Dr.rer.nat. Erekle Magradze
> Lead Big Data Engineering & DevOps
> Rheinwerkallee 2, 53227 Bonn
> Tel: +49 228 29974555
>
> E-Mail erekle.magradze at recogizer.de
> Web: www.recogizer.com
>
> Recogizer on LinkedIn https://www.linkedin.com/company-beta/10039182/
> Follow us on Twitter https://twitter.com/recogizer
>
> -----------------------------------------------------------------
> Recogizer Group GmbH
> Managing Directors: Oliver Habisch, Carsten Kreutze
> Commercial register: Bonn District Court, HRB 20724
> Registered office: Bonn; VAT ID: DE294195993
>
> This e-mail contains confidential and/or legally protected
> information.
> If you are not the intended recipient or have received this e-mail
> in error,
> please notify the sender immediately and delete this e-mail.
> Unauthorized copying and unauthorized forwarding of this e-mail and the
> information it contains are not permitted.

