[ovirt-users] Storage network clarification
combuster
combuster at gmail.com
Tue Jan 19 20:16:30 UTC 2016
Increasing the network ping timeout and lowering the number of I/O
threads helped. The disk image gets created, but during that time the
nodes are pretty much unresponsive. I should've expected that on my
setup...
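
For the record, the tuning boils down to something like this (the
volume name and values are illustrative and depend on the hardware; I
believe the defaults are 42 seconds and 16 threads):

    gluster volume set data network.ping-timeout 50
    gluster volume set data performance.io-thread-count 8
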
In any case, I hope this helps...
Ivan
On 01/19/2016 06:43 PM, combuster wrote:
> OK, setting up gluster on a dedicated network is easier this time
> around, mostly a point-and-click adventure (setting everything up from
> scratch; example snippets for the manual steps follow the list):
>
> - 4 NICs, 2 bonds: one for ovirtmgmt and the other for gluster
> - Tagged gluster network for gluster traffic
> - configured IP addresses without gateways on the gluster-dedicated
> bonds on both nodes
> - set allowed_replica_counts=1,2,3 in the [gluster] section of
> /etc/vdsm/vdsm.conf to allow replica 2
> - added transport.socket.bind-address to /etc/glusterfs/glusterd.vol
> to force glusterd to listen only on the gluster-dedicated IP address
> - modified /etc/hosts so that the nodes can resolve each other by
> their gluster-dedicated hostnames (optional)
> - probed the peers by their gluster hostnames
> - created the volume in the same fashion (I also tried creating
> another one from the oVirt webadmin and that works too)
> - oVirt picked it up and I was able to create a gluster storage
> domain on this volume (+ optimized the volume for virt store)
> - tcpdump and iftop show that replication is going through the
> gluster-dedicated interfaces
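>
> For reference, the config-file pieces boil down to something like the
> following (IP addresses and hostnames are illustrative):
>
> /etc/vdsm/vdsm.conf:
>
>     [gluster]
>     # default is 1,3 -- adding 2 allows replica 2 volumes
>     allowed_replica_counts = 1,2,3
>
> /etc/glusterfs/glusterd.vol (inside the "volume management" block):
>
>     option transport.socket.bind-address 172.16.0.1
>
> /etc/hosts on both nodes:
>
>     172.16.0.1   node1-gluster
>     172.16.0.2   node2-gluster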
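>
> Peering and volume creation then go by the gluster hostnames; the
> volume name and brick paths below are just examples:
>
>     gluster peer probe node2-gluster
>     gluster volume create data replica 2 \
>         node1-gluster:/gluster/brick1 node2-gluster:/gluster/brick1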
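>
> To double-check the traffic path, something like this on the gluster
> bond should show the replication traffic (interface name is
> illustrative; 24007 is glusterd, brick ports start at 49152):
>
>     tcpdump -i bond1 port 24007 or portrange 49152-49251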
>
> One problem so far: creating preallocated disk images fails. It broke
> after zeroing out some 37 GB of 40 GB in total, but the issue is
> intermittent (sometimes it fails earlier); I'm still poking around to
> find the culprit. Thin provisioning works. Bricks and the volume are
> fine, as are the gluster services. From what I can see it's bandwidth
> related (a large amount of network traffic during flushes, then
> rpc_clnt_ping_timer_expired followed by sanlock renewal errors), but
> I'll report it as soon as I can confirm it's not a hardware or
> configuration issue.
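>
> The zeroing step that fails is essentially vdsm dd'ing /dev/zero into
> the image file on the gluster mount, so it can be exercised outside
> of oVirt with something along these lines (path and size are
> illustrative, and I'm approximating vdsm's exact flags):
>
>     dd if=/dev/zero of=/path/on/gluster/mount/test.img \
>         bs=1M count=40960 oflag=direct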
>
> vdsm.log:
>
>> bf482d82-d8f9-442d-ba93-da5ec225c8c3::DEBUG::2016-01-19
>> 18:03:20,782::utils::716::Storage.Misc.excCmd::(watchCmd) FAILED:
>> <err> = ["/usr/bin/dd: error writing
>> '/rhev/data-center/90758579-cae7-4fdf-97e5-e8415db68c54/9cbc0f15-119e-4fe7-94ef-8bc84e0c8254/images/283ddfaa-7fc2-4bea-9acc-c8ff601110de/e3d135b2-a7c0-43d4-b3a5-04991cce73ae':
>> Transport endpoint is not connected", "/usr/bin/dd: closing output
>> file
>> '/rhev/data-center/90758579-cae7-4fdf-97e5-e8415db68c54/9cbc0f15-119e-4fe7-94ef-8bc84e0c8254/images/283ddfaa-7fc2-4bea-9acc-c8ff601110de/e3d135b2-a7c0-43d4-b3a5-04991cce73ae':
>> Transport endpoint is not connected"]; <rc> = 1
>> bf482d82-d8f9-442d-ba93-da5ec225c8c3::ERROR::2016-01-19
>> 18:03:20,783::fileVolume::133::Storage.Volume::(_create) Unexpected
>> error
>> Traceback (most recent call last):
>> File "/usr/share/vdsm/storage/fileVolume.py", line 129, in _create
>> vars.task.aborting, sizeBytes)
>> File "/usr/share/vdsm/storage/misc.py", line 350, in ddWatchCopy
>> raise se.MiscBlockWriteException(dst, offset, size)
>> MiscBlockWriteException: Internal block device write failure:
>> u'name=/rhev/data-center/90758579-cae7-4fdf-97e5-e8415db68c54/9cbc0f15-119e-4fe7-94ef-8bc84e0c8254/images/283ddfaa-7fc2-4bea-9acc-c8ff601110de/e3d135b2-a7c0-43d4-b3a5-04991cce73ae,
>> offset=0, size=42949672960'
>> jsonrpc.Executor/7::DEBUG::2016-01-19
>> 18:03:20,784::__init__::533::jsonrpc.JsonRpcServer::(_serveRequest)
>> Return 'GlusterTask.list' in bridge with {'tasks': {}}
>> bf482d82-d8f9-442d-ba93-da5ec225c8c3::ERROR::2016-01-19
>> 18:03:20,790::volume::515::Storage.Volume::(create) Unexpected error
>> Traceback (most recent call last):
>> File "/usr/share/vdsm/storage/volume.py", line 476, in create
>> initialSize=initialSize)
>> File "/usr/share/vdsm/storage/fileVolume.py", line 134, in _create
>> raise se.VolumesZeroingError(volPath)
>> VolumesZeroingError: Cannot zero out volume:
>> (u'/rhev/data-center/90758579-cae7-4fdf-97e5-e8415db68c54/9cbc0f15-119e-4fe7-94ef-8bc84e0c8254/images/283ddfaa-7fc2-4bea-9acc-c8ff601110de/e3d135b2-a7c0-43d4-b3a5-04991cce73ae',)
>> bf482d82-d8f9-442d-ba93-da5ec225c8c3::DEBUG::2016-01-19
>> 18:03:20,795::resourceManager::616::Storage.ResourceManager::(releaseResource)
>> Trying to release resource
>> '9cbc0f15-119e-4fe7-94ef-8bc84e0c8254_imageNS.283ddfaa-7fc2-4bea-9acc-c8ff601110de'
>> bf482d82-d8f9-442d-ba93-da5ec225c8c3::DEBUG::2016-01-19
>> 18:03:20,796::resourceManager::635::Storage.ResourceManager::(releaseResource)
>> Released resource
>> '9cbc0f15-119e-4fe7-94ef-8bc84e0c8254_imageNS.283ddfaa-7fc2-4bea-9acc-c8ff601110de'
>> (0 active users)
>> bf482d82-d8f9-442d-ba93-da5ec225c8c3::DEBUG::2016-01-19
>> 18:03:20,796::resourceManager::641::Storage.ResourceManager::(releaseResource)
>> Resource
>> '9cbc0f15-119e-4fe7-94ef-8bc84e0c8254_imageNS.283ddfaa-7fc2-4bea-9acc-c8ff601110de'
>> is free, finding out if anyone is waiting for it.
>> bf482d82-d8f9-442d-ba93-da5ec225c8c3::DEBUG::2016-01-19
>> 18:03:20,796::resourceManager::649::Storage.ResourceManager::(releaseResource)
>> No one is waiting for resource
>> '9cbc0f15-119e-4fe7-94ef-8bc84e0c8254_imageNS.283ddfaa-7fc2-4bea-9acc-c8ff601110de',
>> Clearing records.
>> bf482d82-d8f9-442d-ba93-da5ec225c8c3::ERROR::2016-01-19
>> 18:03:20,797::task::866::Storage.TaskManager.Task::(_setError)
>> Task=`bf482d82-d8f9-442d-ba93-da5ec225c8c3`::Unexpected error
>> Traceback (most recent call last):
>> File "/usr/share/vdsm/storage/task.py", line 873, in _run
>> return fn(*args, **kargs)
>> File "/usr/share/vdsm/storage/task.py", line 332, in run
>> return self.cmd(*self.argslist, **self.argsdict)
>> File "/usr/share/vdsm/storage/securable.py", line 77, in wrapper
>> return method(self, *args, **kwargs)
>> File "/usr/share/vdsm/storage/sp.py", line 1886, in createVolume
>> initialSize=initialSize)
>> File "/usr/share/vdsm/storage/sd.py", line 488, in createVolume
>> initialSize=initialSize)
>> File "/usr/share/vdsm/storage/volume.py", line 476, in create
>> initialSize=initialSize)
>> File "/usr/share/vdsm/storage/fileVolume.py", line 134, in _create
>> raise se.VolumesZeroingError(volPath)
>> VolumesZeroingError: Cannot zero out volume:
>> (u'/rhev/data-center/90758579-cae7-4fdf-97e5-e8415db68c54/9cbc0f15-119e-4fe7-94ef-8bc84e0c8254/images/283ddfaa-7fc2-4bea-9acc-c8ff601110de/e3d135b2-a7c0-43d4-b3a5-04991cce73ae',)
>> bf482d82-d8f9-442d-ba93-da5ec225c8c3::DEBUG::2016-01-19
>> 18:03:20,798::task::885::Storage.TaskManager.Task::(_run)
>> Task=`bf482d82-d8f9-442d-ba93-da5ec225c8c3`::Task._run:
>> bf482d82-d8f9-442d-ba93-da5ec225c8c3 () {} failed - stopping task
>> bf482d82-d8f9-442d-ba93-da5ec225c8c3::DEBUG::2016-01-19
>> 18:03:20,798::task::1246::Storage.TaskManager.Task::(stop)
>> Task=`bf482d82-d8f9-442d-ba93-da5ec225c8c3`::stopping in state
>> running (force False)
>> bf482d82-d8f9-442d-ba93-da5ec225c8c3::DEBUG::2016-01-19
>> 18:03:20,798::task::993::Storage.TaskManager.Task::(_decref)
>> Task=`bf482d82-d8f9-442d-ba93-da5ec225c8c3`::ref 1 aborting True
>> bf482d82-d8f9-442d-ba93-da5ec225c8c3::DEBUG::2016-01-19
>> 18:03:20,799::task::919::Storage.TaskManager.Task::(_runJobs)
>> Task=`bf482d82-d8f9-442d-ba93-da5ec225c8c3`::aborting: Task is
>> aborted: 'Cannot zero out volume' - code 374
>
>
> On 01/18/2016 04:00 PM, combuster wrote:
>> oVirt still manages the cluster via the ovirtmgmt network. The same
>> rule applies as when tagging networks as VM networks, live migration
>> networks, etc. Gluster is no different, except that it took a couple
>> of manual steps on our part to configure.
>>
>> On 01/18/2016 03:53 PM, Fil Di Noto wrote:
>>> Thanks I will try this. I am running ovirt-engine 3.6.1.3-1.el7.centos
>>>
>>> In the configuration described, is oVirt able to manage gluster? I am
>>> confused because, if oVirt knows the nodes by their ovirtmgmt network
>>> IP/hostname, aren't all the VDSM commands going to fail?
>>>
>>>
>>>
>>> On Mon, Jan 18, 2016 at 6:39 AM, combuster <combuster at gmail.com> wrote:
>>>> Hi Fil,
>>>>
>>>> this worked for me a couple of months back:
>>>>
>>>> http://lists.ovirt.org/pipermail/users/2015-November/036235.html
>>>>
>>>> I'll try to set this up again and see if there are any issues.
>>>> Which oVirt release are you running?
>>>>
>>>> Ivan
>>>>
>>>> On 01/18/2016 02:56 PM, Fil Di Noto wrote:
>>>>> I'm having trouble setting up a dedicated storage network.
>>>>>
>>>>> I have a separate VLAN designated for storage, and I configured a
>>>>> separate IP address on each host in that subnet. I have tested this
>>>>> subnet extensively and it is working as expected.
>>>>>
>>>>> Prior to adding the hosts, I created a storage network and
>>>>> configured the cluster to use that network for storage instead of
>>>>> the ovirtmgmt network. I was hoping this would be recognized when
>>>>> the hosts were added, but it was not. I actually had to reconfigure
>>>>> the storage VLAN interface via oVirt's "manage host networks" just
>>>>> to bring the host networks into compliance. The IP is configured
>>>>> directly on bond0.<vlanid>, not on a bridge interface, which I
>>>>> assume is correct since it is not a "VM" network.
>>>>>
>>>>> In this setup I was not able to activate any of the hosts due to
>>>>> VDSM gluster errors; I think it was because VDSM was trying to use
>>>>> the hostname/IP of the ovirtmgmt network. I manually set up the
>>>>> peers using "gluster peer probe" and was then able to activate the
>>>>> hosts, but they were not using the storage network (confirmed with
>>>>> tcpdump). I also tried adding DNS records for the storage network
>>>>> interfaces using different hostnames, but gluster still seemed to
>>>>> treat the ovirtmgmt interface as the primary one.
>>>>>
>>>>> With the hosts active, I couldn't create/activate any volumes until
>>>>> I changed the cluster network settings to use the ovirtmgmt network
>>>>> for storage. I ended up abandoning the dedicated storage subnet for
>>>>> the time being, and I'm starting to wonder whether running
>>>>> virtualization and gluster on the same hosts is intended to work
>>>>> this way.
>>>>>
>>>>> Assuming that it should work, what is the correct way to configure
>>>>> it? I can't find any docs that go into detail about storage
>>>>> networks. Is reverse DNS a factor? If I had a better understanding
>>>>> of what oVirt expects to see, that would be helpful.
>>>>> _______________________________________________
>>>>> Users mailing list
>>>>> Users at ovirt.org
>>>>> http://lists.ovirt.org/mailman/listinfo/users
>>>>
>>
>