OK, setting up gluster on a dedicated network was easier this time
around, mostly a point-and-click adventure (I set everything up from
scratch):
- 4 NICs, 2 bonds: one for ovirtmgmt and the other for gluster
- Tagged gluster network for gluster traffic
- configured IP addresses without gateways on gluster dedicated bonds on
both nodes
- allowed_replica_counts=1,2,3 in gluster section within
/etc/vdsm/vdsm.conf to allow replica 2
- added transport.socket.bind-address to /etc/glusterfs/glusterd.vol to
force glusterd to listen only on the gluster dedicated IP address
- modified /etc/hosts so that the nodes can resolve each other by
gluster dedicated hostnames (optional)
- probed the peers by their gluster hostnames
- created the volume in the same fashion (I've tried creating another
one from oVirt webadmin and it works also)
- oVirt picked it up and I was able to create gluster storage domain on
this volume (+ optimized the volume for virt store)
- tcpdump and iftop show that replication is going through the gluster
dedicated interfaces
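
The two config edits in the list above can be sanity-checked with a short script. This is only a sketch: the function names, the sample file contents and the 10.10.10.1 address are made up for illustration, and it assumes glusterd.vol uses the usual "option <key> <value>" lines.

```python
import configparser
import re


def replica_counts_allowed(vdsm_conf_text):
    """Return the replica counts listed under [gluster] in vdsm.conf text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(vdsm_conf_text)
    raw = parser.get("gluster", "allowed_replica_counts", fallback="")
    return {int(tok.strip()) for tok in raw.split(",") if tok.strip()}


def glusterd_bind_address(glusterd_vol_text):
    """Return the transport.socket.bind-address value, or None if unset."""
    match = re.search(
        r"^\s*option\s+transport\.socket\.bind-address\s+(\S+)",
        glusterd_vol_text,
        re.MULTILINE,
    )
    return match.group(1) if match else None


# Hypothetical sample contents, not copied from a real host.
SAMPLE_VDSM_CONF = """
[gluster]
allowed_replica_counts = 1,2,3
"""

SAMPLE_GLUSTERD_VOL = """
volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option transport.socket.bind-address 10.10.10.1
end-volume
"""

if __name__ == "__main__":
    print("replica 2 allowed:", 2 in replica_counts_allowed(SAMPLE_VDSM_CONF))
    print("glusterd bound to:", glusterd_bind_address(SAMPLE_GLUSTERD_VOL))
```

On a real node you would read /etc/vdsm/vdsm.conf and /etc/glusterfs/glusterd.vol and pass their contents in instead of the samples.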
One problem so far: creating preallocated disk images fails. It broke
after zeroing out some 37GB of the 40GB total, but the issue is
intermittent (sometimes it fails earlier), and I'm still poking around
to find the culprit. Thin provisioning works. Bricks and volume are
fine, as are the gluster services. It looks bandwidth related from what
I can see (a large amount of network traffic during flushes, then
rpc_clnt_ping_timer_expired followed by sanlock renewal errors), but
I'll report it as soon as I can confirm it's not a hardware/configuration
related issue.
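
To reproduce the zeroing outside of vdsm, something like the following could be pointed at a scratch file on the gluster mount. It is a rough sketch, not what vdsm does internally (vdsm calls dd, as the log shows); the function name zero_fill and the chunk size are my own choices, and the demo only writes to a temporary directory.

```python
import os
import tempfile


def zero_fill(path, total_bytes, chunk_bytes=1024 * 1024):
    """Write total_bytes of zeros to path.

    Returns (bytes_written, error), where error is None on success or the
    OSError that interrupted the write (e.g. ENOTCONN on a dropped mount).
    """
    written = 0
    chunk = b"\0" * chunk_bytes
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o660)
    try:
        while written < total_bytes:
            step = min(chunk_bytes, total_bytes - written)
            try:
                # os.write may write fewer bytes than asked; the loop copes.
                written += os.write(fd, chunk[:step])
            except OSError as exc:
                return written, exc
        try:
            os.fsync(fd)  # force the flush that seems to trigger the issue
        except OSError as exc:
            return written, exc
    finally:
        try:
            os.close(fd)
        except OSError:
            pass
    return written, None


if __name__ == "__main__":
    # Harmless demo: 4 MiB into a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        done, err = zero_fill(os.path.join(tmp, "zero.img"), 4 * 1024 * 1024)
        print("wrote", done, "bytes, error:", err)
```

The offset at which it stops gives a number to compare between runs, which should help tell an intermittent network drop from a fixed limit.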
vdsm.log:
bf482d82-d8f9-442d-ba93-da5ec225c8c3::DEBUG::2016-01-19
18:03:20,782::utils::716::Storage.Misc.excCmd::(watchCmd) FAILED:
<err> = ["/usr/bin/dd: error writing
'/rhev/data-center/90758579-cae7-4fdf-97e5-e8415db68c54/9cbc0f15-119e-4fe7-94ef-8bc84e0c8254/images/283ddfaa-7fc2-4bea-9acc-c8ff601110de/e3d135b2-a7c0-43d4-b3a5-04991cce73ae':
Transport endpoint is not connected", "/usr/bin/dd: closing output
file
'/rhev/data-center/90758579-cae7-4fdf-97e5-e8415db68c54/9cbc0f15-119e-4fe7-94ef-8bc84e0c8254/images/283ddfaa-7fc2-4bea-9acc-c8ff601110de/e3d135b2-a7c0-43d4-b3a5-04991cce73ae':
Transport endpoint is not connected"]; <rc> = 1
bf482d82-d8f9-442d-ba93-da5ec225c8c3::ERROR::2016-01-19
18:03:20,783::fileVolume::133::Storage.Volume::(_create) Unexpected error
Traceback (most recent call last):
File "/usr/share/vdsm/storage/fileVolume.py", line 129, in _create
vars.task.aborting, sizeBytes)
File "/usr/share/vdsm/storage/misc.py", line 350, in ddWatchCopy
raise se.MiscBlockWriteException(dst, offset, size)
MiscBlockWriteException: Internal block device write failure:
u'name=/rhev/data-center/90758579-cae7-4fdf-97e5-e8415db68c54/9cbc0f15-119e-4fe7-94ef-8bc84e0c8254/images/283ddfaa-7fc2-4bea-9acc-c8ff601110de/e3d135b2-a7c0-43d4-b3a5-04991cce73ae,
offset=0, size=42949672960'
jsonrpc.Executor/7::DEBUG::2016-01-19
18:03:20,784::__init__::533::jsonrpc.JsonRpcServer::(_serveRequest)
Return 'GlusterTask.list' in bridge with {'tasks': {}}
bf482d82-d8f9-442d-ba93-da5ec225c8c3::ERROR::2016-01-19
18:03:20,790::volume::515::Storage.Volume::(create) Unexpected error
Traceback (most recent call last):
File "/usr/share/vdsm/storage/volume.py", line 476, in create
initialSize=initialSize)
File "/usr/share/vdsm/storage/fileVolume.py", line 134, in _create
raise se.VolumesZeroingError(volPath)
VolumesZeroingError: Cannot zero out volume:
(u'/rhev/data-center/90758579-cae7-4fdf-97e5-e8415db68c54/9cbc0f15-119e-4fe7-94ef-8bc84e0c8254/images/283ddfaa-7fc2-4bea-9acc-c8ff601110de/e3d135b2-a7c0-43d4-b3a5-04991cce73ae',)
bf482d82-d8f9-442d-ba93-da5ec225c8c3::DEBUG::2016-01-19
18:03:20,795::resourceManager::616::Storage.ResourceManager::(releaseResource)
Trying to release resource
'9cbc0f15-119e-4fe7-94ef-8bc84e0c8254_imageNS.283ddfaa-7fc2-4bea-9acc-c8ff601110de'
bf482d82-d8f9-442d-ba93-da5ec225c8c3::DEBUG::2016-01-19
18:03:20,796::resourceManager::635::Storage.ResourceManager::(releaseResource)
Released resource
'9cbc0f15-119e-4fe7-94ef-8bc84e0c8254_imageNS.283ddfaa-7fc2-4bea-9acc-c8ff601110de'
(0 active users)
bf482d82-d8f9-442d-ba93-da5ec225c8c3::DEBUG::2016-01-19
18:03:20,796::resourceManager::641::Storage.ResourceManager::(releaseResource)
Resource
'9cbc0f15-119e-4fe7-94ef-8bc84e0c8254_imageNS.283ddfaa-7fc2-4bea-9acc-c8ff601110de'
is free, finding out if anyone is waiting for it.
bf482d82-d8f9-442d-ba93-da5ec225c8c3::DEBUG::2016-01-19
18:03:20,796::resourceManager::649::Storage.ResourceManager::(releaseResource)
No one is waiting for resource
'9cbc0f15-119e-4fe7-94ef-8bc84e0c8254_imageNS.283ddfaa-7fc2-4bea-9acc-c8ff601110de',
Clearing records.
bf482d82-d8f9-442d-ba93-da5ec225c8c3::ERROR::2016-01-19
18:03:20,797::task::866::Storage.TaskManager.Task::(_setError)
Task=`bf482d82-d8f9-442d-ba93-da5ec225c8c3`::Unexpected error
Traceback (most recent call last):
File "/usr/share/vdsm/storage/task.py", line 873, in _run
return fn(*args, **kargs)
File "/usr/share/vdsm/storage/task.py", line 332, in run
return self.cmd(*self.argslist, **self.argsdict)
File "/usr/share/vdsm/storage/securable.py", line 77, in wrapper
return method(self, *args, **kwargs)
File "/usr/share/vdsm/storage/sp.py", line 1886, in createVolume
initialSize=initialSize)
File "/usr/share/vdsm/storage/sd.py", line 488, in createVolume
initialSize=initialSize)
File "/usr/share/vdsm/storage/volume.py", line 476, in create
initialSize=initialSize)
File "/usr/share/vdsm/storage/fileVolume.py", line 134, in _create
raise se.VolumesZeroingError(volPath)
VolumesZeroingError: Cannot zero out volume:
(u'/rhev/data-center/90758579-cae7-4fdf-97e5-e8415db68c54/9cbc0f15-119e-4fe7-94ef-8bc84e0c8254/images/283ddfaa-7fc2-4bea-9acc-c8ff601110de/e3d135b2-a7c0-43d4-b3a5-04991cce73ae',)
bf482d82-d8f9-442d-ba93-da5ec225c8c3::DEBUG::2016-01-19
18:03:20,798::task::885::Storage.TaskManager.Task::(_run)
Task=`bf482d82-d8f9-442d-ba93-da5ec225c8c3`::Task._run:
bf482d82-d8f9-442d-ba93-da5ec225c8c3 () {} failed - stopping task
bf482d82-d8f9-442d-ba93-da5ec225c8c3::DEBUG::2016-01-19
18:03:20,798::task::1246::Storage.TaskManager.Task::(stop)
Task=`bf482d82-d8f9-442d-ba93-da5ec225c8c3`::stopping in state running
(force False)
bf482d82-d8f9-442d-ba93-da5ec225c8c3::DEBUG::2016-01-19
18:03:20,798::task::993::Storage.TaskManager.Task::(_decref)
Task=`bf482d82-d8f9-442d-ba93-da5ec225c8c3`::ref 1 aborting True
bf482d82-d8f9-442d-ba93-da5ec225c8c3::DEBUG::2016-01-19
18:03:20,799::task::919::Storage.TaskManager.Task::(_runJobs)
Task=`bf482d82-d8f9-442d-ba93-da5ec225c8c3`::aborting: Task is
aborted: 'Cannot zero out volume' - code 374
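
For digging through a longer vdsm.log, the records can be split into fields and filtered by level. A small sketch, with the field layout inferred only from the lines pasted above (thread::LEVEL::timestamp::module::line::logger::(function) message); the names parse_record and LOG_RE are mine, and other vdsm versions may log differently.

```python
import re

# Layout inferred from the excerpt above; treat it as an assumption.
LOG_RE = re.compile(
    r"(?P<thread>[^:]+)::(?P<level>[A-Z]+)::"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::"
    r"(?P<module>\w+)::(?P<line>\d+)::(?P<logger>[\w.]+)::"
    r"\((?P<func>\w+)\)\s*(?P<msg>.*)"
)


def parse_record(record):
    """Parse one log record into a dict of fields, or return None.

    Whitespace is collapsed first, so records that the mail client wrapped
    across several lines still match.
    """
    flat = " ".join(record.split())
    match = LOG_RE.match(flat)
    return match.groupdict() if match else None


# One record copied from the log above, including the mail line wrap.
SAMPLE = (
    "bf482d82-d8f9-442d-ba93-da5ec225c8c3::ERROR::2016-01-19\n"
    "18:03:20,783::fileVolume::133::Storage.Volume::(_create) Unexpected error"
)

if __name__ == "__main__":
    fields = parse_record(SAMPLE)
    if fields and fields["level"] == "ERROR":
        print(fields["ts"], fields["logger"], fields["func"], fields["msg"])
```

Traceback lines do not start with the record prefix, so they return None and can be attached to the preceding record if needed.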
On 01/18/2016 04:00 PM, combuster wrote:
oVirt is still managing the cluster via ovirtmgmt network. The same
rule applies for tagging networks as VM networks, Live Migration
networks etc. Gluster is no different, except that it involved a
couple of manual steps for us to configure it.
On 01/18/2016 03:53 PM, Fil Di Noto wrote:
> Thanks I will try this. I am running ovirt-engine 3.6.1.3-1.el7.centos
>
> In the configuration described, is oVirt able to manage gluster? I am
> confused because if oVirt knows the nodes by their ovirtmgmt network
> IP/hostname aren't all the VDSM commands going to fail?
>
>
>
> On Mon, Jan 18, 2016 at 6:39 AM, combuster <combuster(a)gmail.com> wrote:
>> Hi Fil,
>>
>> this worked for me a couple of months back:
>>
>>
>> http://lists.ovirt.org/pipermail/users/2015-November/036235.html
>>
>> I'll try to set this up again, and see if there are any issues.
>> Which oVirt release are you running?
>>
>> Ivan
>>
>> On 01/18/2016 02:56 PM, Fil Di Noto wrote:
>>> I'm having trouble setting up a dedicated storage network.
>>>
>>> I have a separate VLAN designated for storage, and configured separate
>>> IP addresses for each host that correspond to that subnet. I have
>>> tested this subnet extensively and it is working as expected.
>>>
>>> Prior to adding the hosts, I configured a storage network and
>>> configured the cluster to use that network for storage and not the
>>> ovirtmgmt network. I was hoping that this would be recognized when
>>> the hosts were added but it was not. I had to actually reconfigure the
>>> storage VLAN interface via oVirt "manage host networks" just to bring
>>> the host networks into compliance. The IP is configured directly on
>>> the bond0.<vlanid>, not on a bridge interface which I assume is
>>> correct since it is not a "VM" network.
>>>
>>> In this setup I was not able to activate any of the hosts due to VDSM
>>> gluster errors, I think it was because VDSM was trying to use the
>>> hostname/IP of the ovirtmgmt network. I manually set up the peers
>>> using "gluster peer probe" and I was able to activate the hosts but
>>> they were not using the storage network (tcpdump). I also tried adding
>>> DNS records for the storage network interfaces using different
>>> hostnames but gluster seemed to still consider the ovirtmgmt interface
>>> as the primary.
>>>
>>> With the hosts active, I couldn't create/activate any volumes until I
>>> changed the cluster network settings to use the ovirtmgmt network for
>>> storage. I ended up abandoning the dedicated storage subnet for the
>>> time being and I'm starting to wonder if running virtualization and
>>> gluster on the same hosts is intended to work this way.
>>>
>>> Assuming that it should work, what is the correct way to configure it?
>>> I can't find any docs that go in detail about storage networks. Is
>>> reverse DNS a factor? If I had a better understanding of what oVirt is
>>> expecting to see that would be helpful.
>>