[Users] hosted engine issues
René Koch
rkoch at linuxland.at
Mon Mar 3 11:17:49 EST 2014
On 03/03/2014 03:25 PM, Yedidyah Bar David wrote:
> ----- Original Message -----
>> From: "René Koch" <rkoch at linuxland.at>
>> To: "Yedidyah Bar David" <didi at redhat.com>, "Martin Sivak" <msivak at redhat.com>
>> Cc: users at ovirt.org
>> Sent: Monday, March 3, 2014 4:10:51 PM
>> Subject: Re: [Users] hosted engine issues
>>
>> On 03/03/2014 02:13 PM, Yedidyah Bar David wrote:
>>>> Me neither. Is everything read-write? A read-only FS might report "no space
>>>> left" as well in some cases. Other than that, I do not know.
>>>
>>> Perhaps some IPC resource? Semaphores?
>>>
>>> Please check:
>>>
>>> ipcs
>>>
>>> cat /proc/sys/kernel/sem
>>>
>>> I know nothing about libvirt; that's just a wild guess.
>>
>> # ipcs
>>
>> ------ Shared Memory Segments --------
>> key        shmid      owner      perms      bytes      nattch     status
>> 0x00000000 0          root       644        80         2
>> 0x00000000 32769      root       644        16384      2
>> 0x00000000 65538      root       644        280        2
>>
>> ------ Semaphore Arrays --------
>> key        semid      owner      perms      nsems
>> 0x00000000 0          root       600        1
>> 0x00000000 65537      root       600        1
>> 0x000000a7 163842     root       600        1
>
> This means you have 3 semaphore sets, of one semaphore each.
>
>>
>> ------ Message Queues --------
>> key        msqid      owner      perms      used-bytes   messages
>>
>
> The rest also shows only moderate usage.
>
>> # cat /proc/sys/kernel/sem
>> 250 32000 32 128
>
> So you are far from the maxima (250 semaphores per set, 32000 in total,
> 128 sets; 32 is the maximum number of operations per semop() call).
>
>>
>>
>> Do you see anything in this output?
>> I have no clue how to interpret this...
>
> See e.g. http://man7.org/linux/man-pages/man5/proc.5.html
>
> Is the above from a node or from the engine? Are both nodes similar? If so,
> that's not the reason for the "no space left on device" error.
Same on both hosts.
These are CentOS 6.5 hosts, which run the hosted engine.
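For reference, proc(5) documents the four fields of /proc/sys/kernel/sem as SEMMSL, SEMMNS, SEMOPM and SEMMNI, and semget(2) fails with ENOSPC ("No space left on device") when creating a new set would exceed SEMMNI or SEMMNS, which is why semaphores were a plausible suspect. A minimal Python 3 sketch of that check; the helper names parse_sem_limits and enospc_risks are hypothetical, and the check assumes one new single-semaphore set is about to be created:

```python
from typing import Dict, List

# Field order of /proc/sys/kernel/sem as documented in proc(5).
SEM_FIELDS = ("SEMMSL", "SEMMNS", "SEMOPM", "SEMMNI")


def parse_sem_limits(text: str) -> Dict[str, int]:
    """Label the four whitespace-separated values of /proc/sys/kernel/sem."""
    values = [int(v) for v in text.split()]
    if len(values) != len(SEM_FIELDS):
        raise ValueError(f"expected {len(SEM_FIELDS)} values, got {len(values)}")
    return dict(zip(SEM_FIELDS, values))


def enospc_risks(limits: Dict[str, int], nsems_per_set: List[int]) -> List[str]:
    """Return the limits that would make semget() fail with ENOSPC when
    creating one more single-semaphore set, given the sets already in use
    (one entry per set, as in the nsems column of ipcs)."""
    risks = []
    if len(nsems_per_set) + 1 > limits["SEMMNI"]:
        risks.append("SEMMNI (max number of semaphore sets)")
    if sum(nsems_per_set) + 1 > limits["SEMMNS"]:
        risks.append("SEMMNS (max semaphores system-wide)")
    return risks


if __name__ == "__main__":
    limits = parse_sem_limits("250 32000 32 128")
    print(limits)
    # Three sets of one semaphore each, as in the ipcs output above.
    print(enospc_risks(limits, [1, 1, 1]) or "well below both limits")
```

With the values from these hosts it reports no risk, which agrees with the conclusion that semaphores are not the cause here.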
>
> If this error is reproducible, you can try to find the process in which it
> happens (perhaps libvirtd, vdsmd, or the hosted-engine HA daemon) and run:
> strace -f -o /tmp/trace1 -tt -s 512 -p PID
> where PID is the pid of that process, then search /tmp/trace1 for 'no space
> left on device' (case-insensitively, since strace prints it as 'No space left
> on device') and see the exact call that failed.
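Since the phrase can show up either as the result of a failed system call or merely inside a buffer that a daemon writes to its log (as in the libvirtd write() call quoted below), it helps to separate the two when searching the trace. A minimal Python 3 sketch, assuming the strace options suggested above; scan_trace is a hypothetical helper, and the regular expression assumes strace's usual "= -1 ENOSPC (...)" notation for failed calls:

```python
import os
import re
import sys
from typing import Iterable, List, Tuple

# Assumed strace notation for a failed call: "... = -1 ENOSPC (No space left on device)".
# With -f, a call may be split into "<unfinished ...>" and "<... resumed>" lines;
# the return value then appears on the resumed line, which this still matches.
ENOSPC_RESULT = re.compile(r"=\s*-1\s+ENOSPC\b")
PHRASE = "no space left on device"


def scan_trace(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split matching lines into syscalls that returned ENOSPC and lines that
    merely contain the phrase (e.g. a daemon writing an error to its log)."""
    failures: List[str] = []
    mentions: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if ENOSPC_RESULT.search(line):
            failures.append(line)
        elif PHRASE in line.lower():  # case-insensitive text match
            mentions.append(line)
    return failures, mentions


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/tmp/trace1"
    if not os.path.isfile(path):
        print(f"{path} not found; run strace first")
    else:
        with open(path, errors="replace") as fh:
            failures, mentions = scan_trace(fh)
        print("Syscalls that returned ENOSPC:")
        for line in failures:
            print("  " + line)
        print("Lines that only mention the message:")
        for line in mentions:
            print("  " + line)
```

A hit in the second list points at the component that reported the error rather than the call that produced it, so the next place to look is that component's own log.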
Thanks a lot for the troubleshooting tips.
I found the following in the strace output of libvirtd:
3296 17:10:05.396192 write(4, "2014-03-03 16:10:05.396+0000: 3296: error : virLockManagerSanlockAcquire:974 : Failed to acquire lock: No space left on device\n", 127 <unfinished ...>
Then I checked sanlock.log, where I found the following error message
(which could be the reason for "No space left on device"):
2014-03-03 17:10:05+0100 25094 [3105]: r6 cmd_acquire 2,9,11852 invalid lockspace found -1 failed 0 name 2851af27-8744-445d-9fb1-a0d083c8dc82
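To see how often this acquire error recurs and which lockspace it names (here the same UUID that appears in the storage path below), the sanlock log can be filtered. A minimal Python 3 sketch; the regular expression is a hypothetical pattern fitted to the single log line above and may not match other sanlock versions, and /var/log/sanlock.log is an assumed default location:

```python
import os
import re
import sys
from typing import Iterable, List, NamedTuple

# Hypothetical pattern, fitted to the one "invalid lockspace" line quoted above;
# other sanlock versions may word or order these fields differently.
INVALID_LOCKSPACE = re.compile(
    r"^(?P<timestamp>\S+ \S+) \d+ \[\d+\]: (?P<resource>r\d+) cmd_acquire \S+ "
    r"invalid lockspace found (?P<found>-?\d+) failed (?P<failed>-?\d+) "
    r"name (?P<name>\S+)"
)


class InvalidLockspace(NamedTuple):
    timestamp: str
    resource: str
    lockspace: str


def find_invalid_lockspaces(lines: Iterable[str]) -> List[InvalidLockspace]:
    """Collect 'invalid lockspace' acquire errors from sanlock.log lines."""
    hits = []
    for line in lines:
        m = INVALID_LOCKSPACE.search(line)
        if m:
            hits.append(InvalidLockspace(m.group("timestamp"), m.group("resource"), m.group("name")))
    return hits


if __name__ == "__main__":
    # Assumed default log location; pass another path as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/sanlock.log"
    if os.path.isfile(path):
        with open(path, errors="replace") as fh:
            for hit in find_invalid_lockspaces(fh):
                print(hit.timestamp, hit.resource, hit.lockspace)
    else:
        print(f"{path} not found")
```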
So my question now is whether I can remove the lockspace file (it should be
hosted-engine.lockspace, located in
/rhev/data-center/mnt/ovirt-host01\:_engine/2851af27-8744-445d-9fb1-a0d083c8dc82/ha_agent/,
right?) and have it created again. I fear the GlusterFS split-brain
situation damaged it, as this file was affected.
Thanks,
René
>