On 29/01/2014 13:36, Itamar Heim wrote:
On 01/29/2014 02:35 PM, Nicolas Ecarnot wrote:
> On 29/01/2014 13:29, Maor Lipchuk wrote:
>> Hi Nicolas,
>>
>> Can you please attach the VDSM logs of the problematic nodes and valid
>> nodes, the engine log and also the sanlock log?
>>
>> You wrote that many nodes suddenly began to become unresponsive.
>> Do you mean that the hosts switched to non-responsive status in the
>> engine?
>> I'm asking because non-responsive status indicates that the engine
>> could not communicate with the hosts. It could be related to sanlock:
>> if a host encounters a problem writing to the master domain, sanlock
>> restarts VDSM, which makes the host non-responsive.
non-responsive, for the engine, depends on whether vdsm is up/responsive.
run locally:
# vdsClient -s 0 getVdsCaps
to check that vdsm is ok
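For anyone wanting to script that check, here is a minimal sketch. It assumes vdsClient exits non-zero when it cannot reach vdsm (not confirmed in this thread); check_vdsm and the VDS_CLIENT variable are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: report whether the local vdsm answers getVdsCaps.
# Assumption: vdsClient returns a non-zero exit status when vdsm is unreachable.
# VDS_CLIENT is an illustrative override, not an oVirt setting.
check_vdsm() {
    client="${VDS_CLIENT:-vdsClient}"
    # Same call as quoted above; output is discarded, only the status is used.
    if "$client" -s 0 getVdsCaps >/dev/null 2>&1; then
        echo "vdsm responsive"
        return 0
    else
        echo "vdsm NOT responsive"
        return 1
    fi
}
```

Calling check_vdsm on a host prints one line and returns a status that a cron job or monitoring wrapper could act on.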
When I find the time for it, I'll reproduce the crash and run this
command and let you know.
I must admit this was scary.
--
Nicolas Ecarnot