> then why don’t you handle the connection state as well? isn’t that a
> simple fix?
VDSM socket availability during startup is probably the most important
requirement for MOM, and the whole service is built around that
assumption. We could handle it differently, but letting the service
crash saves us tons of code (a refused connection should not happen in
the first place). We simply do not need code that decides between a
permanent and a temporary failure during startup.
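
For illustration only, the retry suggested further down the thread could be as small as the sketch below. This is not MOM's actual code: `connect_with_retry`, its parameters, and the `connect` callable are hypothetical names, and it assumes Python 3, where a refused connection raises `ConnectionRefusedError`. It treats a refused connection as temporary for a bounded number of attempts and then lets the error propagate, so the service still crashes on a permanent failure.

```python
import time


def connect_with_retry(connect, attempts=5, delay=1.0, sleep=time.sleep):
    """Call connect() until it succeeds or the attempts run out.

    connect is any zero-argument callable that opens the connection
    (hypothetical here; it stands in for whatever creates the client).
    Only ConnectionRefusedError is retried; every other error, and the
    last refusal, propagates so the service still fails loudly.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionRefusedError:
            if attempt == attempts:
                raise
            # Wait before the next attempt; sleep is injectable for tests.
            sleep(delay)
```

Bounding the attempts keeps the "crash if VDSM is really gone" behaviour and only papers over the short window in which the socket is not yet listening.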
XML-RPC was easy because it was stateless (a new request for every call).
JSON-RPC is a bit harder because the client keeps the socket internally.
Does the client reconnect by itself, btw? What happens when we use it
after a socket error?
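
To make the difference concrete, here is a minimal sketch of a wrapper that gives a stateful client the per-call recovery that XML-RPC had for free. It does not answer what the real JSON-RPC client does after a socket error; `ReconnectingClient`, `factory`, and `call` are hypothetical names, not part of VDSM or MOM. It assumes the underlying client raises `OSError` (the base class of socket errors in Python 3) when its socket is broken.

```python
class ReconnectingClient:
    """Wrap a stateful client and rebuild it after a socket error."""

    def __init__(self, factory):
        # factory: zero-argument callable returning a connected client
        # (hypothetical; stands in for the real client constructor).
        self._factory = factory
        self._client = None

    def call(self, method, *args):
        # Connect lazily, so a dropped client is recreated on demand.
        if self._client is None:
            self._client = self._factory()
        try:
            return getattr(self._client, method)(*args)
        except OSError:
            # Drop the broken connection so the next call reconnects,
            # then let the caller see the error and decide what to do.
            self._client = None
            raise
```

The failed call is not retried here on purpose: whether a request is safe to repeat depends on the method, so that decision stays with the caller.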
Martin
On Fri, Nov 18, 2016 at 12:55 PM, Michal Skrivanek
<michal.skrivanek(a)redhat.com> wrote:
>
>> On 18 Nov 2016, at 12:35, Martin Sivak <msivak(a)redhat.com> wrote:
>>
>>> I don't think it is related to version X or Y. It is a race, so might be
>>> related to other factors.
>>
>> It never (seriously: NEVER) happened with xml-rpc before 4.0.5.
>
> that is surprising
> but we also didn’t have lago before;-)
>
>>
>>> likely because json-rpc is initialized after xml-rpc….or indeed whatever
>>> else;-)
>>
>> But this is not about jsonrpc. The socket itself is shared according
>> to what Piotr said.
>
> it is
>
>>
>>> btw you likely still want to have a retry in mom once it
>>> starts responding due to delayed vdsm async recovery taking potentially
>>> minutes
>>
>> We handle this already. The only issue is the connection refused state.
>
> then why don’t you handle the connection state as well? isn’t that a
> simple fix?
>
>>
>>
>> Martin
>>
>>
>> On Fri, Nov 18, 2016 at 12:19 PM, Michal Skrivanek
>> <michal.skrivanek(a)redhat.com> wrote:
>>>
>>> On 18 Nov 2016, at 12:12, Oved Ourfali <oourfali(a)redhat.com> wrote:
>>>
>>> I don't think it is related to version X or Y. It is a race, so might be
>>> related to other factors.
>>>
>>>
>>> likely because json-rpc is initialized after xml-rpc….or indeed whatever
>>> else;-)
>>>
>>> either way it needs to be solved. Either by improving the systemd service
>>> file or mom retry (btw you likely still want to have a retry in mom once it
>>> starts responding due to delayed vdsm async recovery taking potentially
>>> minutes)
>>>
>>>
>>> On Nov 18, 2016 12:59 PM, "Martin Sivak" <msivak(a)redhat.com> wrote:
>>>>
>>>>> Are we / can we use systemd socket activation there?
>>>>
>>>> That actually requires systemd specific code iirc (to take over the
>>>> standing by socket). I am actually wondering why the xml-rpc in 4.0.4
>>>> was fine and json-rpc in 4.0.6 is too slow.
>>>>
>>>> Martin
>>>>
>>>> On Fri, Nov 18, 2016 at 11:53 AM, Anton Marchukov <amarchuk(a)redhat.com>
>>>> wrote:
>>>>> Hello All.
>>>>>
>>>>> Are we / can we use systemd socket activation there?
>>>>>
>>>>> Anton.
>>>>>
>>>>> On Fri, Nov 18, 2016 at 11:21 AM, Martin Sivak <msivak(a)redhat.com>
>>>>> wrote:
>>>>>>
>>>>>> What about making vdsm ready to answer connections when it returns to
>>>>>> systemd instead? I hate workarounds and this always worked fine.
>>>>>>
>>>>>> Martin
>>>>>>
>>>>>> On Fri, Nov 18, 2016 at 11:13 AM, Oved Ourfali <oourfali(a)redhat.com>
>>>>>> wrote:
>>>>>>> Seems like a race regardless of the protocol.
>>>>>>> Should you add a retry?
>>>>>>>
>>>>>>>
>>>>>>> On Nov 18, 2016 11:52 AM, "Martin Sivak" <msivak(a)redhat.com> wrote:
>>>>>>>>
>>>>>>>> Yes, because VDSM is supposed to be up (there is systemd
>>>>>>>> dependency).
>>>>>>>> This always worked fine with xml-rpc.
>>>>>>>>
>>>>>>>> Martin
>>>>>>>>
>>>>>>>> On Fri, Nov 18, 2016 at 10:14 AM, Nir Soffer <nsoffer(a)redhat.com>
>>>>>>>> wrote:
>>>>>>>>> On Fri, Nov 18, 2016 at 10:45 AM, Martin Sivak <msivak(a)redhat.com>
>>>>>>>>> wrote:
>>>>>>>>>> This happens because MOM can't connect to VDSM and so it quits.
>>>>>>>>>
>>>>>>>>> So MOM tries to connect once, and if the connection fails it quits?
>>>>>>>>>
>>>>>>>>>> We discussed it on the mailing list:
>>>>>>>>>>
>>>>>>>>>> https://lists.fedoraproject.org/archives/list/vdsm-devel@lists.fedorahost...
>>>>>>>>>> http://lists.ovirt.org/pipermail/devel/2016-November/014101.html
>>>>>>>>>>
>>>>>>>>>> This issue never happened with XML-RPC.
>>>>>>>>>>
>>>>>>>>>> Shira reported it as
>>>>>>>>>>
>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1393012
>>>>>>>>>>
>>>>>>>>>> Martin
>>>>>>>>>>
>>>>>>>>>> On Thu, Nov 17, 2016 at 7:42 PM, Yaniv Kaul <ykaul(a)redhat.com>
>>>>>>>>>> wrote:
>>>>>>>>>>> I've recently seen, including now on Master, the following
>>>>>>>>>>> warnings:
>>>>>>>>>>> Nov 17 13:33:25 lago-basic-suite-master-host0 systemd[1]: Started MOM instance configured for VDSM purposes.
>>>>>>>>>>> Nov 17 13:33:25 lago-basic-suite-master-host0 systemd[1]: Starting MOM instance configured for VDSM purposes...
>>>>>>>>>>> Nov 17 13:33:35 lago-basic-suite-master-host0 vdsm[2012]: vdsm MOM WARN MOM not available, Policy could not be set.
>>>>>>>>>>> Nov 17 13:33:39 lago-basic-suite-master-host0 vdsm[2012]: vdsm MOM WARN MOM not available.
>>>>>>>>>>> Nov 17 13:33:39 lago-basic-suite-master-host0 vdsm[2012]: vdsm MOM WARN MOM not available, KSM stats will be missing.
>>>>>>>>>>> Nov 17 13:33:55 lago-basic-suite-master-host0 vdsm[2012]: vdsm MOM WARN MOM not available.
>>>>>>>>>>> Nov 17 13:33:55 lago-basic-suite-master-host0 vdsm[2012]: vdsm MOM WARN MOM not available, KSM stats will be missing.
>>>>>>>>>>> Nov 17 13:34:10 lago-basic-suite-master-host0 vdsm[2012]: vdsm MOM WARN MOM not available.
>>>>>>>>>>> Nov 17 13:34:10 lago-basic-suite-master-host0 vdsm[2012]: vdsm MOM WARN MOM not available, KSM stats will be missing.
>>>>>>>>>>> Nov 17 13:34:26 lago-basic-suite-master-host0 vdsm[2012]: vdsm MOM WARN MOM not available.
>>>>>>>>>>> Nov 17 13:34:26 lago-basic-suite-master-host0 vdsm[2012]: vdsm MOM WARN MOM not available, KSM stats will be missing.
>>>>>>>>>>> Nov 17 13:34:42 lago-basic-suite-master-host0 vdsm[2012]: vdsm MOM WARN MOM not available.
>>>>>>>>>>> Nov 17 13:34:42 lago-basic-suite-master-host0 vdsm[2012]: vdsm MOM WARN MOM not available, KSM stats will be missing.
>>>>>>>>>>> Nov 17 13:34:57 lago-basic-suite-master-host0 vdsm[2012]: vdsm MOM WARN MOM not available.
>>>>>>>>>>> Nov 17 13:34:57 lago-basic-suite-master-host0 vdsm[2012]: vdsm MOM WARN MOM not available, KSM stats will be missing.
>>>>>>>>>>> Nov 17 13:35:12 lago-basic-suite-master-host0 vdsm[2012]: vdsm MOM WARN MOM not available.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Any ideas what this is and why?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Devel mailing list
>>>>>>>>>>> Devel(a)ovirt.org
>>>>>>>>>>> http://lists.ovirt.org/mailman/listinfo/devel
>>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Anton Marchukov
>>>>> Senior Software Engineer - RHEV CI - Red Hat
>>>>>
>>>
>>>
>>>
>