Increasing the network ping timeout and lowering the number of I/O
threads helped. The disk image gets created, but during that time the
nodes are pretty much unresponsive. I should've expected that on my setup...
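For reference, this is the kind of tuning I mean. Just a rough sketch;
the volume name "data" and the values are examples, not recommendations,
so adjust them for your own setup:

# raise the client ping timeout so short stalls under heavy write load
# are not treated as a dead connection
gluster volume set data network.ping-timeout 60

# lower the number of io threads to reduce pressure during the zeroing
gluster volume set data performance.io-thread-count 8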
In any case, I hope this helps...
Ivan
On 01/19/2016 06:43 PM, combuster wrote:
OK, setting up gluster on a dedicated network is easier this time
around, mostly a point-and-click adventure (setting everything up from
scratch):
- 4 NICs, 2 bonds, one for ovirtmgmt and the other for gluster
- Tagged gluster network for gluster traffic
- configured IP addresses without gateways on gluster dedicated bonds
on both nodes
- allowed_replica_counts=1,2,3 in the gluster section of
/etc/vdsm/vdsm.conf to allow replica 2 (see the snippets after this list)
- added transport.socket.bind-address to /etc/glusterfs/glusterd.vol
to force glusterd to listen only on the gluster-dedicated IP address
- modified /etc/hosts so that the nodes can resolve each other by
gluster dedicated hostnames (optional)
- probed the peers by their gluster hostnames
- created the volume in the same fashion (I also tried creating another
one from the oVirt webadmin and that works as well)
- oVirt picked it up and I was able to create a gluster storage domain
on this volume (+ optimized the volume for virt store)
- tcpdump and iftop show that replication is going through the
gluster-dedicated interfaces
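Roughly, the manual bits looked like the sketch below. Treat it as an
outline only; the IP addresses, hostnames, brick paths and volume name
are placeholders, not my real values, so adapt them to your setup:

# /etc/vdsm/vdsm.conf (both nodes)
[gluster]
allowed_replica_counts = 1,2,3

# /etc/glusterfs/glusterd.vol (each node, inside the existing
# "volume management" block, using that node's own gluster IP)
option transport.socket.bind-address 10.10.10.1

# /etc/hosts (both nodes)
10.10.10.1  node1-gluster
10.10.10.2  node2-gluster

# peer probe and volume creation over the dedicated hostnames
gluster peer probe node2-gluster
gluster volume create data replica 2 \
    node1-gluster:/bricks/data/brick1 node2-gluster:/bricks/data/brick1
gluster volume start data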
One problem so far: creating preallocated disk images fails. It broke
after zeroing out some 37 GB of 40 GB in total, but it's an intermittent
issue (sometimes it fails earlier), so I'm still poking around to find
the culprit. Thin provisioning works. Bricks and the volume are fine, as
are the gluster services. It looks bandwidth-related from what I can see
(a large amount of network traffic during flushes,
rpc_clnt_ping_timer_expired followed by sanlock renewal errors), but
I'll report it as soon as I can confirm it's not a hardware or
configuration issue.
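If anyone wants to poke at this outside of oVirt, the step that fails is
basically a large dd zeroing run onto the gluster mount. I don't know
the exact flags vdsm passes to dd, but something along these lines
against the storage domain mount point (the path below is a placeholder)
should exercise the same write path:

# write a 40 GB zeroed file straight onto the gluster-backed domain
dd if=/dev/zero of=/path/to/glusterSD/mount/test.img bs=1M count=40960 conv=fsync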
vdsm.log:
> bf482d82-d8f9-442d-ba93-da5ec225c8c3::DEBUG::2016-01-19
> 18:03:20,782::utils::716::Storage.Misc.excCmd::(watchCmd) FAILED:
> <err> = ["/usr/bin/dd: error writing
> '/rhev/data-center/90758579-cae7-4fdf-97e5-e8415db68c54/9cbc0f15-119e-4fe7-94ef-8bc84e0c8254/images/283ddfaa-7fc2-4bea-9acc-c8ff601110de/e3d135b2-a7c0-43d4-b3a5-04991cce73ae':
> Transport endpoint is not connected", "/usr/bin/dd: closing output
> file
> '/rhev/data-center/90758579-cae7-4fdf-97e5-e8415db68c54/9cbc0f15-119e-4fe7-94ef-8bc84e0c8254/images/283ddfaa-7fc2-4bea-9acc-c8ff601110de/e3d135b2-a7c0-43d4-b3a5-04991cce73ae':
> Transport endpoint is not connected"]; <rc> = 1
> bf482d82-d8f9-442d-ba93-da5ec225c8c3::ERROR::2016-01-19
> 18:03:20,783::fileVolume::133::Storage.Volume::(_create) Unexpected
> error
> Traceback (most recent call last):
> File "/usr/share/vdsm/storage/fileVolume.py", line 129, in _create
> vars.task.aborting, sizeBytes)
> File "/usr/share/vdsm/storage/misc.py", line 350, in ddWatchCopy
> raise se.MiscBlockWriteException(dst, offset, size)
> MiscBlockWriteException: Internal block device write failure:
> u'name=/rhev/data-center/90758579-cae7-4fdf-97e5-e8415db68c54/9cbc0f15-119e-4fe7-94ef-8bc84e0c8254/images/283ddfaa-7fc2-4bea-9acc-c8ff601110de/e3d135b2-a7c0-43d4-b3a5-04991cce73ae,
> offset=0, size=42949672960'
> jsonrpc.Executor/7::DEBUG::2016-01-19
> 18:03:20,784::__init__::533::jsonrpc.JsonRpcServer::(_serveRequest)
> Return 'GlusterTask.list' in bridge with {'tasks': {}}
> bf482d82-d8f9-442d-ba93-da5ec225c8c3::ERROR::2016-01-19
> 18:03:20,790::volume::515::Storage.Volume::(create) Unexpected error
> Traceback (most recent call last):
> File "/usr/share/vdsm/storage/volume.py", line 476, in create
> initialSize=initialSize)
> File "/usr/share/vdsm/storage/fileVolume.py", line 134, in _create
> raise se.VolumesZeroingError(volPath)
> VolumesZeroingError: Cannot zero out volume:
> (u'/rhev/data-center/90758579-cae7-4fdf-97e5-e8415db68c54/9cbc0f15-119e-4fe7-94ef-8bc84e0c8254/images/283ddfaa-7fc2-4bea-9acc-c8ff601110de/e3d135b2-a7c0-43d4-b3a5-04991cce73ae',)
> bf482d82-d8f9-442d-ba93-da5ec225c8c3::DEBUG::2016-01-19
> 18:03:20,795::resourceManager::616::Storage.ResourceManager::(releaseResource)
> Trying to release resource
> '9cbc0f15-119e-4fe7-94ef-8bc84e0c8254_imageNS.283ddfaa-7fc2-4bea-9acc-c8ff601110de'
> bf482d82-d8f9-442d-ba93-da5ec225c8c3::DEBUG::2016-01-19
> 18:03:20,796::resourceManager::635::Storage.ResourceManager::(releaseResource)
> Released resource
> '9cbc0f15-119e-4fe7-94ef-8bc84e0c8254_imageNS.283ddfaa-7fc2-4bea-9acc-c8ff601110de'
> (0 active users)
> bf482d82-d8f9-442d-ba93-da5ec225c8c3::DEBUG::2016-01-19
> 18:03:20,796::resourceManager::641::Storage.ResourceManager::(releaseResource)
> Resource
> '9cbc0f15-119e-4fe7-94ef-8bc84e0c8254_imageNS.283ddfaa-7fc2-4bea-9acc-c8ff601110de'
> is free, finding out if anyone is waiting for it.
> bf482d82-d8f9-442d-ba93-da5ec225c8c3::DEBUG::2016-01-19
> 18:03:20,796::resourceManager::649::Storage.ResourceManager::(releaseResource)
> No one is waiting for resource
> '9cbc0f15-119e-4fe7-94ef-8bc84e0c8254_imageNS.283ddfaa-7fc2-4bea-9acc-c8ff601110de',
> Clearing records.
> bf482d82-d8f9-442d-ba93-da5ec225c8c3::ERROR::2016-01-19
> 18:03:20,797::task::866::Storage.TaskManager.Task::(_setError)
> Task=`bf482d82-d8f9-442d-ba93-da5ec225c8c3`::Unexpected error
> Traceback (most recent call last):
> File "/usr/share/vdsm/storage/task.py", line 873, in _run
> return fn(*args, **kargs)
> File "/usr/share/vdsm/storage/task.py", line 332, in run
> return self.cmd(*self.argslist, **self.argsdict)
> File "/usr/share/vdsm/storage/securable.py", line 77, in wrapper
> return method(self, *args, **kwargs)
> File "/usr/share/vdsm/storage/sp.py", line 1886, in createVolume
> initialSize=initialSize)
> File "/usr/share/vdsm/storage/sd.py", line 488, in createVolume
> initialSize=initialSize)
> File "/usr/share/vdsm/storage/volume.py", line 476, in create
> initialSize=initialSize)
> File "/usr/share/vdsm/storage/fileVolume.py", line 134, in _create
> raise se.VolumesZeroingError(volPath)
> VolumesZeroingError: Cannot zero out volume:
> (u'/rhev/data-center/90758579-cae7-4fdf-97e5-e8415db68c54/9cbc0f15-119e-4fe7-94ef-8bc84e0c8254/images/283ddfaa-7fc2-4bea-9acc-c8ff601110de/e3d135b2-a7c0-43d4-b3a5-04991cce73ae',)
> bf482d82-d8f9-442d-ba93-da5ec225c8c3::DEBUG::2016-01-19
> 18:03:20,798::task::885::Storage.TaskManager.Task::(_run)
> Task=`bf482d82-d8f9-442d-ba93-da5ec225c8c3`::Task._run:
> bf482d82-d8f9-442d-ba93-da5ec225c8c3 () {} failed - stopping task
> bf482d82-d8f9-442d-ba93-da5ec225c8c3::DEBUG::2016-01-19
> 18:03:20,798::task::1246::Storage.TaskManager.Task::(stop)
> Task=`bf482d82-d8f9-442d-ba93-da5ec225c8c3`::stopping in state
> running (force False)
> bf482d82-d8f9-442d-ba93-da5ec225c8c3::DEBUG::2016-01-19
> 18:03:20,798::task::993::Storage.TaskManager.Task::(_decref)
> Task=`bf482d82-d8f9-442d-ba93-da5ec225c8c3`::ref 1 aborting True
> bf482d82-d8f9-442d-ba93-da5ec225c8c3::DEBUG::2016-01-19
> 18:03:20,799::task::919::Storage.TaskManager.Task::(_runJobs)
> Task=`bf482d82-d8f9-442d-ba93-da5ec225c8c3`::aborting: Task is
> aborted: 'Cannot zero out volume' - code 374
On 01/18/2016 04:00 PM, combuster wrote:
> oVirt is still managing the cluster via the ovirtmgmt network. The
> same rule applies for tagging networks as VM networks, live-migration
> networks, etc. Gluster is no different, except that it involved a
> couple of manual steps for us to configure.
>
> On 01/18/2016 03:53 PM, Fil Di Noto wrote:
>> Thanks, I will try this. I am running ovirt-engine 3.6.1.3-1.el7.centos
>>
>> In the configuration described, is oVirt able to manage gluster? I am
>> confused because if oVirt knows the nodes by their ovirtmgmt network
>> IP/hostname, aren't all the VDSM commands going to fail?
>>
>>
>>
>> On Mon, Jan 18, 2016 at 6:39 AM, combuster <combuster(a)gmail.com> wrote:
>>> Hi Fil,
>>>
>>> this worked for me a couple of months back:
>>>
>>> http://lists.ovirt.org/pipermail/users/2015-November/036235.html
>>>
>>> I'll try to set this up again and see if there are any issues.
>>> Which oVirt release are you running?
>>>
>>> Ivan
>>>
>>> On 01/18/2016 02:56 PM, Fil Di Noto wrote:
>>>> I'm having trouble setting up a dedicated storage network.
>>>>
>>>> I have a separate VLAN designated for storage, and I configured
>>>> separate IP addresses for each host in that subnet. I have tested
>>>> this subnet extensively and it is working as expected.
>>>>
>>>> Prior to adding the hosts, I configured a storage network and
>>>> configured the cluster to use that network for storage rather than
>>>> the ovirtmgmt network. I was hoping that this would be recognized
>>>> when the hosts were added, but it was not. I had to actually
>>>> reconfigure the storage VLAN interface via oVirt "manage host
>>>> networks" just to bring the host networks into compliance. The IP is
>>>> configured directly on the bond0.<vlanid>, not on a bridge
>>>> interface, which I assume is correct since it is not a "VM" network.
>>>>
>>>> In this setup I was not able to activate any of the hosts due to
>>>> VDSM gluster errors; I think it was because VDSM was trying to use
>>>> the hostname/IP of the ovirtmgmt network. I manually set up the
>>>> peers using "gluster peer probe" and I was able to activate the
>>>> hosts, but they were not using the storage network (tcpdump). I also
>>>> tried adding DNS records for the storage network interfaces using
>>>> different hostnames, but gluster still seemed to consider the
>>>> ovirtmgmt interface the primary one.
>>>>
>>>> With the hosts active, I couldn't create/activate any volumes until
>>>> I changed the cluster network settings to use the ovirtmgmt network
>>>> for storage. I ended up abandoning the dedicated storage subnet for
>>>> the time being, and I'm starting to wonder if running virtualization
>>>> and gluster on the same hosts is intended to work this way.
>>>>
>>>> Assuming that it should work, what is the correct way to configure
>>>> it? I can't find any docs that go into detail about storage
>>>> networks. Is reverse DNS a factor? If I had a better understanding
>>>> of what oVirt is expecting to see, that would be helpful.
>>>
>