[Users] hosted engine issues

René Koch rkoch at linuxland.at
Mon Mar 3 16:17:49 UTC 2014


On 03/03/2014 03:25 PM, Yedidyah Bar David wrote:
> ----- Original Message -----
>> From: "René Koch" <rkoch at linuxland.at>
>> To: "Yedidyah Bar David" <didi at redhat.com>, "Martin Sivak" <msivak at redhat.com>
>> Cc: users at ovirt.org
>> Sent: Monday, March 3, 2014 4:10:51 PM
>> Subject: Re: [Users] hosted engine issues
>>
>> On 03/03/2014 02:13 PM, Yedidyah Bar David wrote:
>>>> Me neither. Is everything read-write? A read-only FS might also report
>>>> "no space left" in some cases. Other than that, I do not know.
>>>
>>> Perhaps some IPC resource? Semaphores?
>>>
>>> Please check:
>>>
>>> ipcs
>>>
>>> cat /proc/sys/kernel/sem
>>>
>>> I know nothing about libvirt, that's just a wild guess.
>>
>> # ipcs
>>
>> ------ Shared Memory Segments --------
>> key        shmid      owner      perms      bytes      nattch     status
>> 0x00000000 0          root       644        80         2
>> 0x00000000 32769      root       644        16384      2
>> 0x00000000 65538      root       644        280        2
>>
>> ------ Semaphore Arrays --------
>> key        semid      owner      perms      nsems
>> 0x00000000 0          root       600        1
>> 0x00000000 65537      root       600        1
>> 0x000000a7 163842     root       600        1
>
> This means you have 3 semaphore sets, of one semaphore each.
>
>>
>> ------ Message Queues --------
>> key        msqid      owner      perms      used-bytes   messages
>>
>
> The rest also shows only moderate usage.
>
>> # cat /proc/sys/kernel/sem
>> 250	32000	32	128
>
> So you are far from the maxima (250 per set, 32000 total, 128 sets).
>
>>
>>
>> Do you see anything in this output?
>> I have no clue how to interpret this...
>
> See e.g. http://man7.org/linux/man-pages/man5/proc.5.html
>
> Is the above from a node or from the engine? Are both nodes similar? If so,
> that's not the reason for the "no space left on device" error.

Same on both hosts.
These are the CentOS 6.5 hosts that run the hosted engine.
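For anyone repeating this check, the interpretation above can be scripted. This is a minimal POSIX sh sketch; the helper names sem_limits and sem_usage are made up for illustration. sem_limits labels the four fields of /proc/sys/kernel/sem in the order proc(5) documents them (SEMMSL, SEMMNS, SEMOPM, SEMMNI), and sem_usage counts the semaphore sets and semaphores listed by `ipcs -s`, assuming its usual layout where data rows start with the hex key and nsems is the fifth column.

```shell
# Hypothetical helpers (names made up for this sketch), POSIX sh.

# sem_limits FILE: label the four fields of /proc/sys/kernel/sem.
# Field order per proc(5): SEMMSL SEMMNS SEMOPM SEMMNI.
sem_limits() {
    read -r semmsl semmns semopm semmni < "$1"
    echo "SEMMSL=$semmsl (max semaphores per set)"
    echo "SEMMNS=$semmns (max semaphores system-wide)"
    echo "SEMOPM=$semopm (max operations per semop call)"
    echo "SEMMNI=$semmni (max semaphore sets system-wide)"
}

# sem_usage: read `ipcs -s` output on stdin and print "<sets> <semaphores>".
# ASSUMPTION: data rows start with the hex key (0x...), nsems is column 5.
sem_usage() {
    awk '/^0x/ { sets++; sems += $5 } END { print sets + 0, sems + 0 }'
}

# Usage:
#   sem_limits /proc/sys/kernel/sem
#   ipcs -s | sem_usage
```

Fed the semaphore section quoted above, `sem_usage` prints `3 3`: three sets holding one semaphore each, far below the limits.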

>
> If this error is reproducible, you can try to find the process it happens
> in (perhaps libvirtd, vdsmd, or the hosted-engine HA daemon) and run:
> strace -f -o /tmp/trace1 -tt -s 512 -p PID
> where PID is the pid of that process, then search /tmp/trace1 for 'no space
> left on device' to see the exact call that failed.

Thanks a lot for the troubleshooting tips.
I found the following:

strace of libvirtd:

3296  17:10:05.396192 write(4, "2014-03-03 16:10:05.396+0000: 3296: error : virLockManagerSanlockAcquire:974 : Failed to acquire lock: No space left on device\n", 127 <unfinished ...>
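The matching line above is libvirtd write()-ing its own error message, so it contains the text but is not itself a failed call. strace reports a syscall that actually failed with a line ending in `= -1 ENOSPC (No space left on device)` (on the `<... resumed>` line for calls that were interrupted), so searching for the errno symbol separates the two. A minimal sketch; find_enospc is a hypothetical helper name:

```shell
# find_enospc TRACE: list syscalls in an strace log that returned ENOSPC.
# strace prints such failures as "= -1 ENOSPC (No space left on device)",
# so lines that merely contain the message text (for example the write()
# of a log entry) are not matched. Output is "lineno:line" from grep -n.
find_enospc() {
    grep -n '= -1 ENOSPC' "$1"
}

# Usage:
#   find_enospc /tmp/trace1
```

If it finds nothing in libvirtd's trace, that would suggest the error code was handed to libvirtd by another process rather than returned by one of libvirtd's own syscalls.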

Then I checked sanlock.log, where I found the following error message
(which could be the reason for "No space left on device"):
2014-03-03 17:10:05+0100 25094 [3105]: r6 cmd_acquire 2,9,11852 invalid lockspace found -1 failed 0 name 2851af27-8744-445d-9fb1-a0d083c8dc82
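To check whether a lockspace with that name is currently joined on a host, `sanlock client status` lists the lockspaces and resources the sanlock daemon holds. The sketch below assumes lockspace lines in that output have the form `s NAME:HOST_ID:PATH:OFFSET`; has_lockspace is a hypothetical helper, not part of sanlock, and the format assumption should be checked against the sanlock man page for the installed version.

```shell
# has_lockspace NAME: read `sanlock client status` output on stdin and exit 0
# if a lockspace line names NAME, 1 otherwise.
# ASSUMPTION: lockspace lines look like "s NAME:HOST_ID:PATH:OFFSET".
has_lockspace() {
    awk -v want="$1" '
        $1 == "s" { split($2, f, ":"); if (f[1] == want) found = 1 }
        END { exit (found ? 0 : 1) }'
}

# Usage:
#   sanlock client status |
#       has_lockspace 2851af27-8744-445d-9fb1-a0d083c8dc82 &&
#       echo joined || echo "not joined"
```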

So my question now is whether I can remove the lockspace file (it should be
hosted-engine.lockspace, located in
/rhev/data-center/mnt/ovirt-host01\:_engine/2851af27-8744-445d-9fb1-a0d083c8dc82/ha_agent/,
right?) and have it recreated. I fear the GlusterFS split-brain destroyed it,
since this file was among those affected.


Thanks,
René




