Hi All,
With the help of the Gluster community and the ovirt-china community, my issue
is resolved.
The root causes were the following :-
1. The glob operation takes quite a long time, longer than the ioprocess
default timeout of 60s.
2. python-ioprocess was updated in a way that means changing the
configuration file alone no longer works properly, so the code has to be
patched manually.
Solution (to be applied on all hosts) :-
1. Add the ioprocess timeout value to the /etc/vdsm/vdsm.conf file as
follows :-
------------
[irs]
process_pool_timeout = 180
-------------
2. Check /usr/share/vdsm/storage/outOfProcess.py, line 71, and see whether
it still contains "IOProcess(DEFAULT_TIMEOUT)". If it does, changing
the configuration file has no effect, because timeout is now the third
parameter of IOProcess.__init__(), not the second.
3. Change IOProcess(DEFAULT_TIMEOUT) to
IOProcess(timeout=DEFAULT_TIMEOUT), remove the
/usr/share/vdsm/storage/outOfProcess.pyc file, and restart the vdsm and
supervdsm services on all hosts.
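
For anyone wondering why the positional call silently stops working, here is
a small sketch of the effect described in steps 2 and 3. The two classes are
hypothetical stand-ins, not the real ioprocess code, and the parameter name
"max_threads" is only an assumption for illustration; the only detail taken
from above is that timeout moved from the second to the third parameter of
__init__() (counting self).

```python
# Hypothetical stand-ins: the real ioprocess signatures are not quoted in
# this thread. Only the position of "timeout" follows the description above.
DEFAULT_TIMEOUT = 180  # e.g. the value configured as process_pool_timeout


class OldIOProcess(object):
    """Stand-in for the older API: timeout is the second parameter."""

    def __init__(self, timeout=60):
        self.timeout = timeout


class NewIOProcess(object):
    """Stand-in for the updated API: timeout is the third parameter.

    'max_threads' is an assumed name for whatever parameter was inserted.
    """

    def __init__(self, max_threads=0, timeout=60):
        self.max_threads = max_threads
        self.timeout = timeout


old_call = OldIOProcess(DEFAULT_TIMEOUT)       # timeout is 180, as intended
broken = NewIOProcess(DEFAULT_TIMEOUT)         # 180 binds to max_threads
fixed = NewIOProcess(timeout=DEFAULT_TIMEOUT)  # keyword binds to timeout

print(old_call.timeout, broken.timeout, broken.max_threads, fixed.timeout)
```

With the positional call, the stand-in keeps its default timeout of 60 no
matter what is configured, which matches the symptom above. Passing the value
by keyword makes the call independent of parameter order.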
Thanks,
Punit Dambiwal
On Mon, Mar 23, 2015 at 9:18 AM, Punit Dambiwal <hypunit(a)gmail.com> wrote:
Hi All,
I am still facing the same issue. Please help me resolve it.
Thanks,
punit
On Fri, Mar 20, 2015 at 12:22 AM, Thomas Holkenbrink <
thomas.holkenbrink(a)fibercloud.com> wrote:
> I’ve seen this before. The system thinks the storage system is up and
> running and then attempts to utilize it.
>
> The way I got around it was to put a delay in the startup of the gluster
> Node on the interface that the clients use to communicate.
>
>
>
> I use a bonded link, I then add a LINKDELAY to the interface to get the
> underlying system up and running before the network comes up. This then
> causes Network dependent features to wait for the network to finish.
>
> It adds about 10 seconds to the startup time. In our environment it works
> well; you may not need as long a delay.
>
>
>
> CentOS
>
> [root@gls1 ~]# cat /etc/sysconfig/network-scripts/ifcfg-bond0
>
>
>
> DEVICE=bond0
>
> ONBOOT=yes
>
> BOOTPROTO=static
>
> USERCTL=no
>
> NETMASK=255.255.248.0
>
> IPADDR=10.10.1.17
>
> MTU=9000
>
> IPV6INIT=no
>
> IPV6_AUTOCONF=no
>
> NETWORKING_IPV6=no
>
> NM_CONTROLLED=no
>
> LINKDELAY=10
>
> NAME="System Storage Bond0"
>
>
>
>
>
>
>
>
>
> Hi Michal,
>
>
>
> The storage domain is up and running and mounted on all the host
> nodes. As I mentioned before, it was working perfectly, but since the
> reboot the VMs cannot be powered on.
>
>
>
> [image: Inline image 1]
>
>
>
> [image: Inline image 2]
>
>
>
> [root@cpu01 log]# gluster volume info
>
>
>
> Volume Name: ds01
>
> Type: Distributed-Replicate
>
> Volume ID: 369d3fdc-c8eb-46b7-a33e-0a49f2451ff6
>
> Status: Started
>
> Number of Bricks: 48 x 2 = 96
>
> Transport-type: tcp
>
> Bricks:
>
> Brick1: cpu01:/bricks/1/vol1
>
> Brick2: cpu02:/bricks/1/vol1
>
> Brick3: cpu03:/bricks/1/vol1
>
> Brick4: cpu04:/bricks/1/vol1
>
> Brick5: cpu01:/bricks/2/vol1
>
> Brick6: cpu02:/bricks/2/vol1
>
> Brick7: cpu03:/bricks/2/vol1
>
> Brick8: cpu04:/bricks/2/vol1
>
> Brick9: cpu01:/bricks/3/vol1
>
> Brick10: cpu02:/bricks/3/vol1
>
> Brick11: cpu03:/bricks/3/vol1
>
> Brick12: cpu04:/bricks/3/vol1
>
> Brick13: cpu01:/bricks/4/vol1
>
> Brick14: cpu02:/bricks/4/vol1
>
> Brick15: cpu03:/bricks/4/vol1
>
> Brick16: cpu04:/bricks/4/vol1
>
> Brick17: cpu01:/bricks/5/vol1
>
> Brick18: cpu02:/bricks/5/vol1
>
> Brick19: cpu03:/bricks/5/vol1
>
> Brick20: cpu04:/bricks/5/vol1
>
> Brick21: cpu01:/bricks/6/vol1
>
> Brick22: cpu02:/bricks/6/vol1
>
> Brick23: cpu03:/bricks/6/vol1
>
> Brick24: cpu04:/bricks/6/vol1
>
> Brick25: cpu01:/bricks/7/vol1
>
> Brick26: cpu02:/bricks/7/vol1
>
> Brick27: cpu03:/bricks/7/vol1
>
> Brick28: cpu04:/bricks/7/vol1
>
> Brick29: cpu01:/bricks/8/vol1
>
> Brick30: cpu02:/bricks/8/vol1
>
> Brick31: cpu03:/bricks/8/vol1
>
> Brick32: cpu04:/bricks/8/vol1
>
> Brick33: cpu01:/bricks/9/vol1
>
> Brick34: cpu02:/bricks/9/vol1
>
> Brick35: cpu03:/bricks/9/vol1
>
> Brick36: cpu04:/bricks/9/vol1
>
> Brick37: cpu01:/bricks/10/vol1
>
> Brick38: cpu02:/bricks/10/vol1
>
> Brick39: cpu03:/bricks/10/vol1
>
> Brick40: cpu04:/bricks/10/vol1
>
> Brick41: cpu01:/bricks/11/vol1
>
> Brick42: cpu02:/bricks/11/vol1
>
> Brick43: cpu03:/bricks/11/vol1
>
> Brick44: cpu04:/bricks/11/vol1
>
> Brick45: cpu01:/bricks/12/vol1
>
> Brick46: cpu02:/bricks/12/vol1
>
> Brick47: cpu03:/bricks/12/vol1
>
> Brick48: cpu04:/bricks/12/vol1
>
> Brick49: cpu01:/bricks/13/vol1
>
> Brick50: cpu02:/bricks/13/vol1
>
> Brick51: cpu03:/bricks/13/vol1
>
> Brick52: cpu04:/bricks/13/vol1
>
> Brick53: cpu01:/bricks/14/vol1
>
> Brick54: cpu02:/bricks/14/vol1
>
> Brick55: cpu03:/bricks/14/vol1
>
> Brick56: cpu04:/bricks/14/vol1
>
> Brick57: cpu01:/bricks/15/vol1
>
> Brick58: cpu02:/bricks/15/vol1
>
> Brick59: cpu03:/bricks/15/vol1
>
> Brick60: cpu04:/bricks/15/vol1
>
> Brick61: cpu01:/bricks/16/vol1
>
> Brick62: cpu02:/bricks/16/vol1
>
> Brick63: cpu03:/bricks/16/vol1
>
> Brick64: cpu04:/bricks/16/vol1
>
> Brick65: cpu01:/bricks/17/vol1
>
> Brick66: cpu02:/bricks/17/vol1
>
> Brick67: cpu03:/bricks/17/vol1
>
> Brick68: cpu04:/bricks/17/vol1
>
> Brick69: cpu01:/bricks/18/vol1
>
> Brick70: cpu02:/bricks/18/vol1
>
> Brick71: cpu03:/bricks/18/vol1
>
> Brick72: cpu04:/bricks/18/vol1
>
> Brick73: cpu01:/bricks/19/vol1
>
> Brick74: cpu02:/bricks/19/vol1
>
> Brick75: cpu03:/bricks/19/vol1
>
> Brick76: cpu04:/bricks/19/vol1
>
> Brick77: cpu01:/bricks/20/vol1
>
> Brick78: cpu02:/bricks/20/vol1
>
> Brick79: cpu03:/bricks/20/vol1
>
> Brick80: cpu04:/bricks/20/vol1
>
> Brick81: cpu01:/bricks/21/vol1
>
> Brick82: cpu02:/bricks/21/vol1
>
> Brick83: cpu03:/bricks/21/vol1
>
> Brick84: cpu04:/bricks/21/vol1
>
> Brick85: cpu01:/bricks/22/vol1
>
> Brick86: cpu02:/bricks/22/vol1
>
> Brick87: cpu03:/bricks/22/vol1
>
> Brick88: cpu04:/bricks/22/vol1
>
> Brick89: cpu01:/bricks/23/vol1
>
> Brick90: cpu02:/bricks/23/vol1
>
> Brick91: cpu03:/bricks/23/vol1
>
> Brick92: cpu04:/bricks/23/vol1
>
> Brick93: cpu01:/bricks/24/vol1
>
> Brick94: cpu02:/bricks/24/vol1
>
> Brick95: cpu03:/bricks/24/vol1
>
> Brick96: cpu04:/bricks/24/vol1
>
> Options Reconfigured:
>
> diagnostics.count-fop-hits: on
>
> diagnostics.latency-measurement: on
>
> nfs.disable: on
>
> user.cifs: enable
>
> auth.allow: 10.10.0.*
>
> performance.quick-read: off
>
> performance.read-ahead: off
>
> performance.io-cache: off
>
> performance.stat-prefetch: off
>
> cluster.eager-lock: enable
>
> network.remote-dio: enable
>
> cluster.quorum-type: auto
>
> cluster.server-quorum-type: server
>
> storage.owner-uid: 36
>
> storage.owner-gid: 36
>
> server.allow-insecure: on
>
> network.ping-timeout: 100
>
> [root@cpu01 log]#
>
>
>
> -----------------------------------------
>
>
>
> [root@cpu01 log]# gluster volume status
>
> Status of volume: ds01
>
> Gluster process                 Port    Online  Pid
> ------------------------------------------------------
> Brick cpu01:/bricks/1/vol1      49152   Y       33474
> Brick cpu02:/bricks/1/vol1      49152   Y       40717
> Brick cpu03:/bricks/1/vol1      49152   Y       18080
> Brick cpu04:/bricks/1/vol1      49152   Y       40447
> Brick cpu01:/bricks/2/vol1      49153   Y       33481
> Brick cpu02:/bricks/2/vol1      49153   Y       40724
> Brick cpu03:/bricks/2/vol1      49153   Y       18086
> Brick cpu04:/bricks/2/vol1      49153   Y       40453
> Brick cpu01:/bricks/3/vol1      49154   Y       33489
> Brick cpu02:/bricks/3/vol1      49154   Y       40731
> Brick cpu03:/bricks/3/vol1      49154   Y       18097
> Brick cpu04:/bricks/3/vol1      49154   Y       40460
> Brick cpu01:/bricks/4/vol1      49155   Y       33495
> Brick cpu02:/bricks/4/vol1      49155   Y       40738
> Brick cpu03:/bricks/4/vol1      49155   Y       18103
> Brick cpu04:/bricks/4/vol1      49155   Y       40468
> Brick cpu01:/bricks/5/vol1      49156   Y       33502
> Brick cpu02:/bricks/5/vol1      49156   Y       40745
> Brick cpu03:/bricks/5/vol1      49156   Y       18110
> Brick cpu04:/bricks/5/vol1      49156   Y       40474
> Brick cpu01:/bricks/6/vol1      49157   Y       33509
> Brick cpu02:/bricks/6/vol1      49157   Y       40752
> Brick cpu03:/bricks/6/vol1      49157   Y       18116
> Brick cpu04:/bricks/6/vol1      49157   Y       40481
> Brick cpu01:/bricks/7/vol1      49158   Y       33516
> Brick cpu02:/bricks/7/vol1      49158   Y       40759
> Brick cpu03:/bricks/7/vol1      49158   Y       18122
> Brick cpu04:/bricks/7/vol1      49158   Y       40488
> Brick cpu01:/bricks/8/vol1      49159   Y       33525
> Brick cpu02:/bricks/8/vol1      49159   Y       40766
> Brick cpu03:/bricks/8/vol1      49159   Y       18130
> Brick cpu04:/bricks/8/vol1      49159   Y       40495
> Brick cpu01:/bricks/9/vol1      49160   Y       33530
> Brick cpu02:/bricks/9/vol1      49160   Y       40773
> Brick cpu03:/bricks/9/vol1      49160   Y       18137
> Brick cpu04:/bricks/9/vol1      49160   Y       40502
> Brick cpu01:/bricks/10/vol1     49161   Y       33538
> Brick cpu02:/bricks/10/vol1     49161   Y       40780
> Brick cpu03:/bricks/10/vol1     49161   Y       18143
> Brick cpu04:/bricks/10/vol1     49161   Y       40509
> Brick cpu01:/bricks/11/vol1     49162   Y       33544
> Brick cpu02:/bricks/11/vol1     49162   Y       40787
> Brick cpu03:/bricks/11/vol1     49162   Y       18150
> Brick cpu04:/bricks/11/vol1     49162   Y       40516
> Brick cpu01:/bricks/12/vol1     49163   Y       33551
> Brick cpu02:/bricks/12/vol1     49163   Y       40794
> Brick cpu03:/bricks/12/vol1     49163   Y       18157
> Brick cpu04:/bricks/12/vol1     49163   Y       40692
> Brick cpu01:/bricks/13/vol1     49164   Y       33558
> Brick cpu02:/bricks/13/vol1     49164   Y       40801
> Brick cpu03:/bricks/13/vol1     49164   Y       18165
> Brick cpu04:/bricks/13/vol1     49164   Y       40700
> Brick cpu01:/bricks/14/vol1     49165   Y       33566
> Brick cpu02:/bricks/14/vol1     49165   Y       40809
> Brick cpu03:/bricks/14/vol1     49165   Y       18172
> Brick cpu04:/bricks/14/vol1     49165   Y       40706
> Brick cpu01:/bricks/15/vol1     49166   Y       33572
> Brick cpu02:/bricks/15/vol1     49166   Y       40815
> Brick cpu03:/bricks/15/vol1     49166   Y       18179
> Brick cpu04:/bricks/15/vol1     49166   Y       40714
> Brick cpu01:/bricks/16/vol1     49167   Y       33579
> Brick cpu02:/bricks/16/vol1     49167   Y       40822
> Brick cpu03:/bricks/16/vol1     49167   Y       18185
> Brick cpu04:/bricks/16/vol1     49167   Y       40722
> Brick cpu01:/bricks/17/vol1     49168   Y       33586
> Brick cpu02:/bricks/17/vol1     49168   Y       40829
> Brick cpu03:/bricks/17/vol1     49168   Y       18192
> Brick cpu04:/bricks/17/vol1     49168   Y       40727
> Brick cpu01:/bricks/18/vol1     49169   Y       33593
> Brick cpu02:/bricks/18/vol1     49169   Y       40836
> Brick cpu03:/bricks/18/vol1     49169   Y       18201
> Brick cpu04:/bricks/18/vol1     49169   Y       40735
> Brick cpu01:/bricks/19/vol1     49170   Y       33600
> Brick cpu02:/bricks/19/vol1     49170   Y       40843
> Brick cpu03:/bricks/19/vol1     49170   Y       18207
> Brick cpu04:/bricks/19/vol1     49170   Y       40741
> Brick cpu01:/bricks/20/vol1     49171   Y       33608
> Brick cpu02:/bricks/20/vol1     49171   Y       40850
> Brick cpu03:/bricks/20/vol1     49171   Y       18214
> Brick cpu04:/bricks/20/vol1     49171   Y       40748
> Brick cpu01:/bricks/21/vol1     49172   Y       33614
> Brick cpu02:/bricks/21/vol1     49172   Y       40858
> Brick cpu03:/bricks/21/vol1     49172   Y       18222
> Brick cpu04:/bricks/21/vol1     49172   Y       40756
> Brick cpu01:/bricks/22/vol1     49173   Y       33621
> Brick cpu02:/bricks/22/vol1     49173   Y       40864
> Brick cpu03:/bricks/22/vol1     49173   Y       18227
> Brick cpu04:/bricks/22/vol1     49173   Y       40762
> Brick cpu01:/bricks/23/vol1     49174   Y       33626
> Brick cpu02:/bricks/23/vol1     49174   Y       40869
> Brick cpu03:/bricks/23/vol1     49174   Y       18234
> Brick cpu04:/bricks/23/vol1     49174   Y       40769
> Brick cpu01:/bricks/24/vol1     49175   Y       33631
> Brick cpu02:/bricks/24/vol1     49175   Y       40874
> Brick cpu03:/bricks/24/vol1     49175   Y       18239
> Brick cpu04:/bricks/24/vol1     49175   Y       40774
> Self-heal Daemon on localhost   N/A     Y       33361
> Self-heal Daemon on cpu05       N/A     Y       2353
> Self-heal Daemon on cpu04       N/A     Y       40786
> Self-heal Daemon on cpu02       N/A     Y       32442
> Self-heal Daemon on cpu03       N/A     Y       18664
>
>
>
> Task Status of Volume ds01
>
>
> ------------------------------------------------------------------------------
>
> Task : Rebalance
>
> ID : 5db24b30-4b9f-4b65-8910-a7a0a6d327a4
>
> Status : completed
>
>
>
> [root@cpu01 log]#
>
>
>
> [root@cpu01 log]# gluster pool list
>
> UUID Hostname State
>
> 626c9360-8c09-480f-9707-116e67cc38e6 cpu02 Connected
>
> dc475d62-b035-4ee6-9006-6f03bf68bf24 cpu05 Connected
>
> 41b5b2ff-3671-47b4-b477-227a107e718d cpu03 Connected
>
> c0afe114-dfa7-407d-bad7-5a3f97a6f3fc cpu04 Connected
>
> 9b61b0a5-be78-4ac2-b6c0-2db588da5c35 localhost Connected
>
> [root@cpu01 log]#
>
>
>
> [image: Inline image 3]
>
>
>
> Thanks,
>
> Punit
>
>
>
> On Thu, Mar 19, 2015 at 2:53 PM, Michal Skrivanek <
> michal.skrivanek(a)redhat.com> wrote:
>
>
> On Mar 19, 2015, at 03:18 , Punit Dambiwal <hypunit(a)gmail.com> wrote:
>
> > Hi All,
> >
> > Does anyone have any idea about this problem? It seems to be a bug
> > in either oVirt or GlusterFS, which would explain why no one has an
> > answer. Please correct me if I am wrong.
>
> Hi,
> as I said, storage access times out, so it looks to me like a gluster setup
> problem: the storage domain you have your VMs on is not working…
>
> Thanks,
> michal
>
>
> >
> > Thanks,
> > Punit
> >
> > On Wed, Mar 18, 2015 at 5:05 PM, Punit Dambiwal <hypunit(a)gmail.com>
> wrote:
> > Hi Michal,
> >
> > Would you mind telling me what might have been messed up? I will
> > check and try to resolve it. I am still working with the Gluster
> > community to resolve this issue.
> >
> > But in oVirt the Gluster setup is quite straightforward, so how could it
> > get messed up by a reboot? If a reboot can break it, then it does not
> > seem like a good, stable technology for production storage.
> >
> > Thanks,
> > Punit
> >
> > On Wed, Mar 18, 2015 at 3:51 PM, Michal Skrivanek <
> michal.skrivanek(a)redhat.com> wrote:
> >
> > On Mar 18, 2015, at 03:33 , Punit Dambiwal <hypunit(a)gmail.com> wrote:
> >
> > > Hi,
> > >
> > > Can anyone from the community help me solve this issue?
> > >
> > > Thanks,
> > > Punit
> > >
> > > On Tue, Mar 17, 2015 at 12:52 PM, Punit Dambiwal <hypunit(a)gmail.com>
> wrote:
> > > Hi,
> > >
> > > I am facing a strange issue with oVirt/GlusterFS. I still have not
> > > determined whether it is related to GlusterFS or oVirt.
> > >
> > > Ovirt :- 3.5.1
> > > Glusterfs :- 3.6.1
> > > Host :- 4 Hosts (Compute+ Storage)...each server has 24 bricks
> > > Guest VM :- more than 100
> > >
> > > Issue :- When I first deployed this cluster, it worked well for
> > > me (all the guest VMs were created and ran successfully), but one day
> > > one of my host nodes suddenly rebooted, and now none of the VMs can
> > > boot. They fail with the following error: "Bad Volume Specification"
> > >
> > > VMId :- d877313c18d9783ca09b62acf5588048
> > >
> > > VDSM Logs :- http://ur1.ca/jxabi
> >
> > you've got timeouts while accessing storage…so I guess something got
> > messed up on reboot, it may also be just a gluster misconfiguration…
> >
> > > Engine Logs :- http://ur1.ca/jxabv
> > >
> > > ------------------------
> > > [root@cpu01 ~]# vdsClient -s 0 getVolumeInfo \
> > >     e732a82f-bae9-4368-8b98-dedc1c3814de 00000002-0002-0002-0002-000000000145 \
> > >     6d123509-6867-45cf-83a2-6d679b77d3c5 9030bb43-6bc9-462f-a1b9-f6d5a02fb180
> > > status = OK
> > > domain = e732a82f-bae9-4368-8b98-dedc1c3814de
> > > capacity = 21474836480
> > > voltype = LEAF
> > > description =
> > > parent = 00000000-0000-0000-0000-000000000000
> > > format = RAW
> > > image = 6d123509-6867-45cf-83a2-6d679b77d3c5
> > > uuid = 9030bb43-6bc9-462f-a1b9-f6d5a02fb180
> > > disktype = 2
> > > legality = LEGAL
> > > mtime = 0
> > > apparentsize = 21474836480
> > > truesize = 4562972672
> > > type = SPARSE
> > > children = []
> > > pool =
> > > ctime = 1422676305
> > > ---------------------
> > >
> > > I opened the same thread earlier but did not get an answer that
> > > solved this issue, so I am reopening it:
> > >
> > >
> > > https://www.mail-archive.com/users@ovirt.org/msg25011.html
> > >
> > > Thanks,
> > > Punit
> > >
> > >
> > >
> >
> >
> >
>
>
>