> But even if vdsm did this, mom still needs a retry mechanism.
No it doesn't. Systemd handles that just fine (I might increase the
retry count in the service file though).
I rather like the supervised way of handling errors in "let it crash"
projects. It simplifies the code tremendously. MOM has almost no
state, so a plain restart is the right thing to do. You might
be surprised how nice this approach can be (Erlang is built around
supervisors and crashing on error, and is used to achieve a couple of
nines in telco).
Martin
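
A minimal Python sketch of the "let it crash" approach described above, for illustration only. It is not MOM's actual code: the address, `connect_once` and `main` are hypothetical names, and it assumes the service file uses something like Restart=on-failure so that systemd restarts the process after a non-zero exit.

```python
import socket
import sys

# Placeholder address (an assumption); the real endpoint is not given in this thread.
VDSM_ADDR = ("127.0.0.1", 54321)


def connect_once(addr, timeout=5.0):
    """Open one TCP connection; any failure propagates as OSError."""
    return socket.create_connection(addr, timeout=timeout)


def main(addr=VDSM_ADDR, connect=connect_once):
    """Return a process exit code: 0 on clean shutdown, 1 if VDSM is unreachable."""
    try:
        conn = connect(addr)
    except OSError as exc:
        # No retry loop here: report the error and exit non-zero so that
        # the supervisor (assumed: systemd) starts a fresh process.
        print("cannot reach VDSM: %s" % exc, file=sys.stderr)
        return 1
    with conn:
        pass  # the real work would happen here
    return 0
```

A real entry point would end with `sys.exit(main())`. The retry policy (how often, how long to wait) then lives in the service file rather than in the program.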
On Fri, Nov 18, 2016 at 12:14 PM, Nir Soffer <nsoffer(a)redhat.com> wrote:
> On Fri, Nov 18, 2016 at 12:21 PM, Martin Sivak <msivak(a)redhat.com> wrote:
>> What about making vdsm ready to answer connections when it returns to
>> systemd instead? I hate workarounds and this always worked fine.
>
> This is clearly a mom bug. Mom must have a retry mechanism and must not
> expect that vdsm is ready to accept connections when mom starts.
>
> Vdsm can be nicer and notify systemd when vdsm is ready. I already mentioned it
> in the other thread here:
>
> http://lists.ovirt.org/pipermail/devel/2016-November/014104.html
>
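
For reference, a sketch of what "notify systemd when ready" could look like in Python, using only the standard library. This is an illustration, not vdsm's code: `notify_ready` is a hypothetical helper, and it assumes the service is declared with Type=notify, in which case systemd passes a datagram socket path in the NOTIFY_SOCKET environment variable and waits for a READY=1 message before it considers the unit started.

```python
import os
import socket


def notify_ready(env=os.environ):
    """Send READY=1 to the service manager; return False if not supervised."""
    path = env.get("NOTIFY_SOCKET")
    if not path:
        # Not started by systemd with Type=notify: nothing to do.
        return False
    if path.startswith("@"):
        # A leading "@" denotes a Linux abstract-namespace socket.
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.connect(path)
        sock.sendall(b"READY=1")
    return True
```

The call would go right after the listening socket starts accepting connections, so that units ordered after this one are not started too early.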
> But even if vdsm did this, mom still needs a retry mechanism.
>
> Nir
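
A sketch of the kind of retry mechanism asked for above, assuming a simple exponential backoff. The function name, parameters and defaults are hypothetical and are not taken from mom; `connect` stands for whatever callable opens the connection to vdsm.

```python
import time


def connect_with_retry(connect, attempts=5, delay=1.0, backoff=2.0,
                       sleep=time.sleep):
    """Call connect() until it succeeds or the attempts run out."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except OSError:
            if attempt == attempts:
                # Out of attempts: re-raise and let the caller decide.
                raise
            sleep(delay)
            delay *= backoff  # wait longer before each further attempt
```

With the defaults this waits 1, 2, 4 and 8 seconds between the five attempts, then gives up by re-raising the last error. The `sleep` parameter exists only so the waiting can be replaced in tests.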
>
>>
>> Martin
>>
>> On Fri, Nov 18, 2016 at 11:13 AM, Oved Ourfali <oourfali(a)redhat.com> wrote:
>>> Seems like a race regardless of the protocol.
>>> Should you add a retry?
>>>
>>>
>>> On Nov 18, 2016 11:52 AM, "Martin Sivak" <msivak(a)redhat.com> wrote:
>>>>
>>>> Yes, because VDSM is supposed to be up (there is a systemd dependency).
>>>> This always worked fine with XML-RPC.
>>>>
>>>> Martin
>>>>
>>>> On Fri, Nov 18, 2016 at 10:14 AM, Nir Soffer <nsoffer(a)redhat.com> wrote:
>>>> > On Fri, Nov 18, 2016 at 10:45 AM, Martin Sivak <msivak(a)redhat.com> wrote:
>>>> >> This happens because MOM can't connect to VDSM and so it quits.
>>>> >
>>>> > So mom tries once to connect and if the connection fails it quits?
>>>> >
>>>> >> We discussed it on the mailing list:
>>>> >>
>>>> >> https://lists.fedoraproject.org/archives/list/vdsm-devel@lists.fedorahost...
>>>> >> http://lists.ovirt.org/pipermail/devel/2016-November/014101.html
>>>> >>
>>>> >> This issue never happened with XML-RPC.
>>>> >>
>>>> >> Shira reported it as
>>>> >>
>>>> >> https://bugzilla.redhat.com/show_bug.cgi?id=1393012
>>>> >>
>>>> >> Martin
>>>> >>
>>>> >> On Thu, Nov 17, 2016 at 7:42 PM, Yaniv Kaul <ykaul(a)redhat.com> wrote:
>>>> >>> I've recently seen, including now on Master, the following warnings:
>>>> >>> Nov 17 13:33:25 lago-basic-suite-master-host0 systemd[1]: Started MOM instance configured for VDSM purposes.
>>>> >>> Nov 17 13:33:25 lago-basic-suite-master-host0 systemd[1]: Starting MOM instance configured for VDSM purposes...
>>>> >>> Nov 17 13:33:35 lago-basic-suite-master-host0 vdsm[2012]: vdsm MOM WARN MOM not available, Policy could not be set.
>>>> >>> Nov 17 13:33:39 lago-basic-suite-master-host0 vdsm[2012]: vdsm MOM WARN MOM not available.
>>>> >>> Nov 17 13:33:39 lago-basic-suite-master-host0 vdsm[2012]: vdsm MOM WARN MOM not available, KSM stats will be missing.
>>>> >>> Nov 17 13:33:55 lago-basic-suite-master-host0 vdsm[2012]: vdsm MOM WARN MOM not available.
>>>> >>> Nov 17 13:33:55 lago-basic-suite-master-host0 vdsm[2012]: vdsm MOM WARN MOM not available, KSM stats will be missing.
>>>> >>> Nov 17 13:34:10 lago-basic-suite-master-host0 vdsm[2012]: vdsm MOM WARN MOM not available.
>>>> >>> Nov 17 13:34:10 lago-basic-suite-master-host0 vdsm[2012]: vdsm MOM WARN MOM not available, KSM stats will be missing.
>>>> >>> Nov 17 13:34:26 lago-basic-suite-master-host0 vdsm[2012]: vdsm MOM WARN MOM not available.
>>>> >>> Nov 17 13:34:26 lago-basic-suite-master-host0 vdsm[2012]: vdsm MOM WARN MOM not available, KSM stats will be missing.
>>>> >>> Nov 17 13:34:42 lago-basic-suite-master-host0 vdsm[2012]: vdsm MOM WARN MOM not available.
>>>> >>> Nov 17 13:34:42 lago-basic-suite-master-host0 vdsm[2012]: vdsm MOM WARN MOM not available, KSM stats will be missing.
>>>> >>> Nov 17 13:34:57 lago-basic-suite-master-host0 vdsm[2012]: vdsm MOM WARN MOM not available.
>>>> >>> Nov 17 13:34:57 lago-basic-suite-master-host0 vdsm[2012]: vdsm MOM WARN MOM not available, KSM stats will be missing.
>>>> >>> Nov 17 13:35:12 lago-basic-suite-master-host0 vdsm[2012]: vdsm MOM WARN MOM not available.
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>> Any ideas what this is and why?
>>>> >>>
>>>> >>> _______________________________________________
>>>> >>> Devel mailing list
>>>> >>> Devel(a)ovirt.org
>>>> >>>
>>>> >>> http://lists.ovirt.org/mailman/listinfo/devel
>>>>
>>>>
>>>