Re: [ovirt-users] Storage network clarification

19 Jan 2016

OK, setting up gluster on a dedicated network was easier this time
around; it was mostly a point-and-click adventure (I set everything up
from scratch):

- 4 NICs, 2 bonds: one for ovirtmgmt and the other for gluster
- Tagged the gluster network for gluster traffic
- Configured IP addresses without gateways on the gluster-dedicated
bonds on both nodes
- Set allowed_replica_counts=1,2,3 in the gluster section of
/etc/vdsm/vdsm.conf to allow replica 2
- Added transport.socket.bind-address to /etc/glusterfs/glusterd.vol to
force glusterd to listen only on the gluster-dedicated IP address
- Modified /etc/hosts so that the nodes can resolve each other by their
gluster-dedicated hostnames (optional)
- Probed the peers by their gluster hostnames
- Created the volume the same way (I've also tried creating another
one from the oVirt webadmin, and that works too)
- oVirt picked it up and I was able to create a gluster storage domain
on this volume (and optimized the volume for virt store)
- tcpdump and iftop show that replication traffic goes through the
gluster-dedicated interfaces
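The two file edits in the list above (allowed_replica_counts in vdsm.conf and transport.socket.bind-address in glusterd.vol) could be scripted roughly like this. It is only a minimal Python sketch, not how oVirt or Gluster apply these settings. It works on file contents as strings and makes a few assumptions: vdsm.conf is INI-style with a [gluster] section, the bind-address option belongs inside glusterd.vol's "volume management ... end-volume" block, and 192.0.2.11 is a hypothetical storage address. Rewriting vdsm.conf with configparser also drops any comments in the file.

```python
import configparser
import io


def set_allowed_replica_counts(vdsm_conf_text: str, counts: str = "1,2,3") -> str:
    """Return vdsm.conf text with [gluster] allowed_replica_counts set to `counts`.

    Assumes vdsm.conf is INI-style; configparser discards comments on rewrite.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(vdsm_conf_text)
    if not parser.has_section("gluster"):
        parser.add_section("gluster")
    parser.set("gluster", "allowed_replica_counts", counts)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


def set_glusterd_bind_address(glusterd_vol_text: str, address: str) -> str:
    """Return glusterd.vol text whose 'volume management' block binds to `address`.

    Any existing bind-address line inside that block is replaced, so the
    function can be re-run safely.
    """
    out = []
    in_mgmt = False
    for line in glusterd_vol_text.splitlines():
        stripped = line.strip()
        if stripped == "volume management":
            in_mgmt = True
        elif in_mgmt and stripped.startswith("option transport.socket.bind-address"):
            continue  # drop the old value; a fresh line is added before end-volume
        elif in_mgmt and stripped == "end-volume":
            out.append(f"    option transport.socket.bind-address {address}")
            in_mgmt = False
        out.append(line)
    return "\n".join(out) + "\n"


# Abbreviated, hypothetical glusterd.vol contents, for illustration only.
SAMPLE_GLUSTERD_VOL = """volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option transport-type socket,rdma
end-volume
"""

if __name__ == "__main__":
    print(set_allowed_replica_counts("[vars]\nssl = true\n"))
    print(set_glusterd_bind_address(SAMPLE_GLUSTERD_VOL, "192.0.2.11"))
```

Working on strings rather than touching /etc directly keeps the sketch easy to test. A real run would read and write the files and then restart the affected services.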

One problem so far: creating preallocated disk images fails. It broke
after zeroing out about 37GB of 40GB in total, but the issue is
intermittent (sometimes it fails earlier), and I'm still poking around
to find the culprit. Thin provisioning works. The bricks and the volume
are fine, as are the gluster services. From what I can see it's
bandwidth-related (a large amount of network traffic during flushes,
rpc_clnt_ping_timer_expired followed by sanlock renewal errors), but
I'll report it as soon as I can confirm it's not a hardware or
configuration issue.

vdsm.log:
...
bf482d82-d8f9-442d-ba93-da5ec225c8c3::DEBUG::2016-01-19 18:03:20,782::utils::716::Storage.Misc.excCmd::(watchCmd) FAILED: <err> = ["/usr/bin/dd: error writing '/rhev/data-center/90758579-cae7-4fdf-97e5-e8415db68c54/9cbc0f15-119e-4fe7-94ef-8bc84e0c8254/images/283ddfaa-7fc2-4bea-9acc-c8ff601110de/e3d135b2-a7c0-43d4-b3a5-04991cce73ae': Transport endpoint is not connected", "/usr/bin/dd: closing output file '/rhev/data-center/90758579-cae7-4fdf-97e5-e8415db68c54/9cbc0f15-119e-4fe7-94ef-8bc84e0c8254/images/283ddfaa-7fc2-4bea-9acc-c8ff601110de/e3d135b2-a7c0-43d4-b3a5-04991cce73ae': Transport endpoint is not connected"]; <rc> = 1
bf482d82-d8f9-442d-ba93-da5ec225c8c3::ERROR::2016-01-19 18:03:20,783::fileVolume::133::Storage.Volume::(_create) Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/fileVolume.py", line 129, in _create
    vars.task.aborting, sizeBytes)
  File "/usr/share/vdsm/storage/misc.py", line 350, in ddWatchCopy
    raise se.MiscBlockWriteException(dst, offset, size)
MiscBlockWriteException: Internal block device write failure: u'name=/rhev/data-center/90758579-cae7-4fdf-97e5-e8415db68c54/9cbc0f15-119e-4fe7-94ef-8bc84e0c8254/images/283ddfaa-7fc2-4bea-9acc-c8ff601110de/e3d135b2-a7c0-43d4-b3a5-04991cce73ae, offset=0, size=42949672960'
jsonrpc.Executor/7::DEBUG::2016-01-19 18:03:20,784::__init__::533::jsonrpc.JsonRpcServer::(_serveRequest) Return 'GlusterTask.list' in bridge with {'tasks': {}}
bf482d82-d8f9-442d-ba93-da5ec225c8c3::ERROR::2016-01-19 18:03:20,790::volume::515::Storage.Volume::(create) Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/volume.py", line 476, in create
    initialSize=initialSize)
  File "/usr/share/vdsm/storage/fileVolume.py", line 134, in _create
    raise se.VolumesZeroingError(volPath)
VolumesZeroingError: Cannot zero out volume: (u'/rhev/data-center/90758579-cae7-4fdf-97e5-e8415db68c54/9cbc0f15-119e-4fe7-94ef-8bc84e0c8254/images/283ddfaa-7fc2-4bea-9acc-c8ff601110de/e3d135b2-a7c0-43d4-b3a5-04991cce73ae',)
bf482d82-d8f9-442d-ba93-da5ec225c8c3::DEBUG::2016-01-19 18:03:20,795::resourceManager::616::Storage.ResourceManager::(releaseResource) Trying to release resource '9cbc0f15-119e-4fe7-94ef-8bc84e0c8254_imageNS.283ddfaa-7fc2-4bea-9acc-c8ff601110de'
bf482d82-d8f9-442d-ba93-da5ec225c8c3::DEBUG::2016-01-19 18:03:20,796::resourceManager::635::Storage.ResourceManager::(releaseResource) Released resource '9cbc0f15-119e-4fe7-94ef-8bc84e0c8254_imageNS.283ddfaa-7fc2-4bea-9acc-c8ff601110de' (0 active users)
bf482d82-d8f9-442d-ba93-da5ec225c8c3::DEBUG::2016-01-19 18:03:20,796::resourceManager::641::Storage.ResourceManager::(releaseResource) Resource '9cbc0f15-119e-4fe7-94ef-8bc84e0c8254_imageNS.283ddfaa-7fc2-4bea-9acc-c8ff601110de' is free, finding out if anyone is waiting for it.
bf482d82-d8f9-442d-ba93-da5ec225c8c3::DEBUG::2016-01-19 18:03:20,796::resourceManager::649::Storage.ResourceManager::(releaseResource) No one is waiting for resource '9cbc0f15-119e-4fe7-94ef-8bc84e0c8254_imageNS.283ddfaa-7fc2-4bea-9acc-c8ff601110de', Clearing records.
bf482d82-d8f9-442d-ba93-da5ec225c8c3::ERROR::2016-01-19 18:03:20,797::task::866::Storage.TaskManager.Task::(_setError) Task=`bf482d82-d8f9-442d-ba93-da5ec225c8c3`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 332, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/share/vdsm/storage/securable.py", line 77, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1886, in createVolume
    initialSize=initialSize)
  File "/usr/share/vdsm/storage/sd.py", line 488, in createVolume
    initialSize=initialSize)
  File "/usr/share/vdsm/storage/volume.py", line 476, in create
    initialSize=initialSize)
  File "/usr/share/vdsm/storage/fileVolume.py", line 134, in _create
    raise se.VolumesZeroingError(volPath)
VolumesZeroingError: Cannot zero out volume: (u'/rhev/data-center/90758579-cae7-4fdf-97e5-e8415db68c54/9cbc0f15-119e-4fe7-94ef-8bc84e0c8254/images/283ddfaa-7fc2-4bea-9acc-c8ff601110de/e3d135b2-a7c0-43d4-b3a5-04991cce73ae',)
bf482d82-d8f9-442d-ba93-da5ec225c8c3::DEBUG::2016-01-19 18:03:20,798::task::885::Storage.TaskManager.Task::(_run) Task=`bf482d82-d8f9-442d-ba93-da5ec225c8c3`::Task._run: bf482d82-d8f9-442d-ba93-da5ec225c8c3 () {} failed - stopping task
bf482d82-d8f9-442d-ba93-da5ec225c8c3::DEBUG::2016-01-19 18:03:20,798::task::1246::Storage.TaskManager.Task::(stop) Task=`bf482d82-d8f9-442d-ba93-da5ec225c8c3`::stopping in state running (force False)
bf482d82-d8f9-442d-ba93-da5ec225c8c3::DEBUG::2016-01-19 18:03:20,798::task::993::Storage.TaskManager.Task::(_decref) Task=`bf482d82-d8f9-442d-ba93-da5ec225c8c3`::ref 1 aborting True
bf482d82-d8f9-442d-ba93-da5ec225c8c3::DEBUG::2016-01-19 18:03:20,799::task::919::Storage.TaskManager.Task::(_runJobs) Task=`bf482d82-d8f9-442d-ba93-da5ec225c8c3`::aborting: Task is aborted: 'Cannot zero out volume' - code 374
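When chasing an intermittent failure like this, it can help to pull only the ERROR records out of vdsm.log and check the size of the failed write. Below is a minimal Python sketch. Its regular expression is inferred from the records above (thread::LEVEL::timestamp::module::line::logger::(function) message), not taken from vdsm's logging configuration. It also expects one record per line, so any mail-client wrapping must be undone first. The size=42949672960 reported by MiscBlockWriteException works out to exactly 40 GiB, matching the 40GB preallocated disk.

```python
import re

# Record layout inferred from the vdsm.log excerpt in this thread
# (an assumption, not vdsm's documented format).
VDSM_RECORD = re.compile(
    r"^(?P<thread>[^:]+)::(?P<level>[A-Z]+)::"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::"
    r"(?P<module>[^:]+)::(?P<lineno>\d+)::(?P<logger>[^:]+)::"
    r"\((?P<func>[^)]*)\) ?(?P<message>.*)$"
)


def parse_record(line):
    """Return the record's fields as a dict, or None for non-record lines
    such as traceback frames."""
    match = VDSM_RECORD.match(line)
    return match.groupdict() if match else None


def error_records(lines):
    """Keep only records logged at ERROR level."""
    records = (parse_record(line) for line in lines)
    return [rec for rec in records if rec is not None and rec["level"] == "ERROR"]


def write_size_gib(message):
    """Extract 'size=<bytes>' from a block-write failure and convert it to GiB."""
    match = re.search(r"size=(\d+)", message)
    return int(match.group(1)) / 2**30 if match else None


SAMPLE = [
    "jsonrpc.Executor/7::DEBUG::2016-01-19 18:03:20,784::__init__::533::"
    "jsonrpc.JsonRpcServer::(_serveRequest) Return 'GlusterTask.list' in bridge with {'tasks': {}}",
    "bf482d82-d8f9-442d-ba93-da5ec225c8c3::ERROR::2016-01-19 18:03:20,797::task::866::"
    "Storage.TaskManager.Task::(_setError) Task=`bf482d82-d8f9-442d-ba93-da5ec225c8c3`::Unexpected error",
    "Traceback (most recent call last):",
]

if __name__ == "__main__":
    for rec in error_records(SAMPLE):
        print(rec["timestamp"], rec["logger"], rec["func"], rec["message"])
    print(write_size_gib("offset=0, size=42949672960"))  # 40.0
```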
On 01/18/2016 04:00 PM, combuster wrote:
...
oVirt still manages the cluster via the ovirtmgmt network. The same
rule applies to tagging networks as VM networks, Live Migration
networks, etc. Gluster is no different, except that it involved a
couple of manual steps on our side to configure it.
On 01/18/2016 03:53 PM, Fil Di Noto wrote:
...
Thanks, I will try this. I am running ovirt-engine 3.6.1.3-1.el7.centos.
In the configuration described, is oVirt able to manage gluster? I am
confused, because if oVirt knows the nodes by their ovirtmgmt network
IP/hostname, aren't all the VDSM commands going to fail?
On Mon, Jan 18, 2016 at 6:39 AM, combuster <combuster@gmail.com> wrote:
...
Hi Fil,
this worked for me a couple of months back:
http://lists.ovirt.org/pipermail/users/2015-November/036235.html
I'll try to set this up again and see if there are any issues. Which
oVirt release are you running?
Ivan
On 01/18/2016 02:56 PM, Fil Di Noto wrote:
...
I'm having trouble setting up a dedicated storage network.
I have a separate VLAN designated for storage, and configured separate
IP addresses for each host that correspond to that subnet. I have
tested this subnet extensively and it is working as expected.
Prior to adding the hosts, I configured a storage network and set
the cluster to use that network for storage instead of the
ovirtmgmt network. I was hoping that this would be recognized when
the hosts were added, but it was not. I actually had to reconfigure the
storage VLAN interface via oVirt "manage host networks" just to bring
the host networks into compliance. The IP is configured directly on
bond0.<vlanid>, not on a bridge interface, which I assume is
correct since it is not a "VM" network.
In this setup I was not able to activate any of the hosts due to VDSM
gluster errors; I think that was because VDSM was trying to use the
hostname/IP of the ovirtmgmt network. I manually set up the peers
using "gluster peer probe" and was able to activate the hosts, but
tcpdump showed they were not using the storage network. I also tried
adding DNS records for the storage network interfaces using different
hostnames, but gluster still seemed to consider the ovirtmgmt interface
the primary one.
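One way to narrow down a situation like this is to confirm that the hostnames passed to "gluster peer probe" really resolve into the storage subnet rather than the ovirtmgmt one. Below is a minimal Python sketch. It checks only forward IPv4 resolution through the system resolver, so /etc/hosts entries count as well as DNS. It does not show which address glusterd binds to or which name oVirt uses for the host. The subnet and hostnames in the usage note are hypothetical placeholders.

```python
import ipaddress
import socket


def ipv4_addresses(hostname):
    """Resolve `hostname` via the system resolver (DNS, /etc/hosts) to IPv4 addresses."""
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    # info[4] is the sockaddr tuple (address, port) for AF_INET results.
    return sorted({info[4][0] for info in infos})


def resolves_into(hostname, subnet):
    """True if `hostname` resolves, and only to addresses inside `subnet`."""
    network = ipaddress.ip_network(subnet)
    addresses = ipv4_addresses(hostname)
    return bool(addresses) and all(ipaddress.ip_address(a) in network for a in addresses)
```

For example, with a hypothetical storage subnet 192.0.2.0/24 and a gluster hostname node1-gluster, resolves_into("node1-gluster", "192.0.2.0/24") should be true on each node before probing. An unresolvable name raises socket.gaierror.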
With the hosts active, I couldn't create/activate any volumes until I
changed the cluster network settings to use the ovirtmgmt network for
storage. I ended up abandoning the dedicated storage subnet for the
time being and I'm starting to wonder if running virtualization and
gluster on the same hosts is intended to work this way.
Assuming that it should work, what is the correct way to configure it?
I can't find any docs that go into detail about storage networks. Is
reverse DNS a factor? If I had a better understanding of what oVirt is
expecting to see, that would be helpful.
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users