[ovirt-users] [Gluster-users] VM failed to start | Bad volume specification
Thomas Holkenbrink
thomas.holkenbrink at fibercloud.com
Thu Mar 19 16:22:15 UTC 2015
I’ve seen this before. The system thinks the storage is up and running and then attempts to use it.
The way I got around it was to put a delay in the startup of the gluster node on the interface that the clients use to communicate.
I use a bonded link, so I add a LINKDELAY to the interface to give the underlying system time to come up before the network does; network-dependent services then wait for the network to finish starting.
It adds about 10 seconds to the startup time. It works well in our environment, though you may not need as long a delay.
CentOS
[root at gls1 ~]# cat /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
ONBOOT=yes
BOOTPROTO=static
USERCTL=no
NETMASK=255.255.248.0
IPADDR=10.10.1.17
MTU=9000
IPV6INIT=no
IPV6_AUTOCONF=no
NETWORKING_IPV6=no
NM_CONTROLLED=no
LINKDELAY=10
NAME="System Storage Bond0"
Hi Michal,
The storage domain is up and running and mounted on all the host nodes...as I mentioned before, it was working perfectly, but just after the reboot I cannot power the VMs on...
[Inline image 1]
[Inline image 2]
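
For reference, a quick way to double-check from a host that the gluster-backed storage domain is actually reachable might look like the sketch below. The glusterSD mount-point name is an assumption based on the usual oVirt layout (server:_volume), so adjust the server and volume names to your environment; the domain UUID is taken from the vdsClient output quoted later in this thread.

mount | grep glusterfs
df -h /rhev/data-center/mnt/glusterSD/cpu01:_ds01
# dom_md is where oVirt keeps the domain metadata; a hang here points at storage access, not the VM
ls -l /rhev/data-center/mnt/glusterSD/cpu01:_ds01/e732a82f-bae9-4368-8b98-dedc1c3814de/dom_md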
[root at cpu01 log]# gluster volume info
Volume Name: ds01
Type: Distributed-Replicate
Volume ID: 369d3fdc-c8eb-46b7-a33e-0a49f2451ff6
Status: Started
Number of Bricks: 48 x 2 = 96
Transport-type: tcp
Bricks:
Brick1: cpu01:/bricks/1/vol1
Brick2: cpu02:/bricks/1/vol1
Brick3: cpu03:/bricks/1/vol1
Brick4: cpu04:/bricks/1/vol1
Brick5: cpu01:/bricks/2/vol1
Brick6: cpu02:/bricks/2/vol1
Brick7: cpu03:/bricks/2/vol1
Brick8: cpu04:/bricks/2/vol1
Brick9: cpu01:/bricks/3/vol1
Brick10: cpu02:/bricks/3/vol1
Brick11: cpu03:/bricks/3/vol1
Brick12: cpu04:/bricks/3/vol1
Brick13: cpu01:/bricks/4/vol1
Brick14: cpu02:/bricks/4/vol1
Brick15: cpu03:/bricks/4/vol1
Brick16: cpu04:/bricks/4/vol1
Brick17: cpu01:/bricks/5/vol1
Brick18: cpu02:/bricks/5/vol1
Brick19: cpu03:/bricks/5/vol1
Brick20: cpu04:/bricks/5/vol1
Brick21: cpu01:/bricks/6/vol1
Brick22: cpu02:/bricks/6/vol1
Brick23: cpu03:/bricks/6/vol1
Brick24: cpu04:/bricks/6/vol1
Brick25: cpu01:/bricks/7/vol1
Brick26: cpu02:/bricks/7/vol1
Brick27: cpu03:/bricks/7/vol1
Brick28: cpu04:/bricks/7/vol1
Brick29: cpu01:/bricks/8/vol1
Brick30: cpu02:/bricks/8/vol1
Brick31: cpu03:/bricks/8/vol1
Brick32: cpu04:/bricks/8/vol1
Brick33: cpu01:/bricks/9/vol1
Brick34: cpu02:/bricks/9/vol1
Brick35: cpu03:/bricks/9/vol1
Brick36: cpu04:/bricks/9/vol1
Brick37: cpu01:/bricks/10/vol1
Brick38: cpu02:/bricks/10/vol1
Brick39: cpu03:/bricks/10/vol1
Brick40: cpu04:/bricks/10/vol1
Brick41: cpu01:/bricks/11/vol1
Brick42: cpu02:/bricks/11/vol1
Brick43: cpu03:/bricks/11/vol1
Brick44: cpu04:/bricks/11/vol1
Brick45: cpu01:/bricks/12/vol1
Brick46: cpu02:/bricks/12/vol1
Brick47: cpu03:/bricks/12/vol1
Brick48: cpu04:/bricks/12/vol1
Brick49: cpu01:/bricks/13/vol1
Brick50: cpu02:/bricks/13/vol1
Brick51: cpu03:/bricks/13/vol1
Brick52: cpu04:/bricks/13/vol1
Brick53: cpu01:/bricks/14/vol1
Brick54: cpu02:/bricks/14/vol1
Brick55: cpu03:/bricks/14/vol1
Brick56: cpu04:/bricks/14/vol1
Brick57: cpu01:/bricks/15/vol1
Brick58: cpu02:/bricks/15/vol1
Brick59: cpu03:/bricks/15/vol1
Brick60: cpu04:/bricks/15/vol1
Brick61: cpu01:/bricks/16/vol1
Brick62: cpu02:/bricks/16/vol1
Brick63: cpu03:/bricks/16/vol1
Brick64: cpu04:/bricks/16/vol1
Brick65: cpu01:/bricks/17/vol1
Brick66: cpu02:/bricks/17/vol1
Brick67: cpu03:/bricks/17/vol1
Brick68: cpu04:/bricks/17/vol1
Brick69: cpu01:/bricks/18/vol1
Brick70: cpu02:/bricks/18/vol1
Brick71: cpu03:/bricks/18/vol1
Brick72: cpu04:/bricks/18/vol1
Brick73: cpu01:/bricks/19/vol1
Brick74: cpu02:/bricks/19/vol1
Brick75: cpu03:/bricks/19/vol1
Brick76: cpu04:/bricks/19/vol1
Brick77: cpu01:/bricks/20/vol1
Brick78: cpu02:/bricks/20/vol1
Brick79: cpu03:/bricks/20/vol1
Brick80: cpu04:/bricks/20/vol1
Brick81: cpu01:/bricks/21/vol1
Brick82: cpu02:/bricks/21/vol1
Brick83: cpu03:/bricks/21/vol1
Brick84: cpu04:/bricks/21/vol1
Brick85: cpu01:/bricks/22/vol1
Brick86: cpu02:/bricks/22/vol1
Brick87: cpu03:/bricks/22/vol1
Brick88: cpu04:/bricks/22/vol1
Brick89: cpu01:/bricks/23/vol1
Brick90: cpu02:/bricks/23/vol1
Brick91: cpu03:/bricks/23/vol1
Brick92: cpu04:/bricks/23/vol1
Brick93: cpu01:/bricks/24/vol1
Brick94: cpu02:/bricks/24/vol1
Brick95: cpu03:/bricks/24/vol1
Brick96: cpu04:/bricks/24/vol1
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
nfs.disable: on
user.cifs: enable
auth.allow: 10.10.0.*
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
server.allow-insecure: on
network.ping-timeout: 100
[root at cpu01 log]#
-----------------------------------------
[root at cpu01 log]# gluster volume status
Status of volume: ds01
Gluster process Port Online Pid
------------------------------------------------------------------------------
Brick cpu01:/bricks/1/vol1 49152 Y 33474
Brick cpu02:/bricks/1/vol1 49152 Y 40717
Brick cpu03:/bricks/1/vol1 49152 Y 18080
Brick cpu04:/bricks/1/vol1 49152 Y 40447
Brick cpu01:/bricks/2/vol1 49153 Y 33481
Brick cpu02:/bricks/2/vol1 49153 Y 40724
Brick cpu03:/bricks/2/vol1 49153 Y 18086
Brick cpu04:/bricks/2/vol1 49153 Y 40453
Brick cpu01:/bricks/3/vol1 49154 Y 33489
Brick cpu02:/bricks/3/vol1 49154 Y 40731
Brick cpu03:/bricks/3/vol1 49154 Y 18097
Brick cpu04:/bricks/3/vol1 49154 Y 40460
Brick cpu01:/bricks/4/vol1 49155 Y 33495
Brick cpu02:/bricks/4/vol1 49155 Y 40738
Brick cpu03:/bricks/4/vol1 49155 Y 18103
Brick cpu04:/bricks/4/vol1 49155 Y 40468
Brick cpu01:/bricks/5/vol1 49156 Y 33502
Brick cpu02:/bricks/5/vol1 49156 Y 40745
Brick cpu03:/bricks/5/vol1 49156 Y 18110
Brick cpu04:/bricks/5/vol1 49156 Y 40474
Brick cpu01:/bricks/6/vol1 49157 Y 33509
Brick cpu02:/bricks/6/vol1 49157 Y 40752
Brick cpu03:/bricks/6/vol1 49157 Y 18116
Brick cpu04:/bricks/6/vol1 49157 Y 40481
Brick cpu01:/bricks/7/vol1 49158 Y 33516
Brick cpu02:/bricks/7/vol1 49158 Y 40759
Brick cpu03:/bricks/7/vol1 49158 Y 18122
Brick cpu04:/bricks/7/vol1 49158 Y 40488
Brick cpu01:/bricks/8/vol1 49159 Y 33525
Brick cpu02:/bricks/8/vol1 49159 Y 40766
Brick cpu03:/bricks/8/vol1 49159 Y 18130
Brick cpu04:/bricks/8/vol1 49159 Y 40495
Brick cpu01:/bricks/9/vol1 49160 Y 33530
Brick cpu02:/bricks/9/vol1 49160 Y 40773
Brick cpu03:/bricks/9/vol1 49160 Y 18137
Brick cpu04:/bricks/9/vol1 49160 Y 40502
Brick cpu01:/bricks/10/vol1 49161 Y 33538
Brick cpu02:/bricks/10/vol1 49161 Y 40780
Brick cpu03:/bricks/10/vol1 49161 Y 18143
Brick cpu04:/bricks/10/vol1 49161 Y 40509
Brick cpu01:/bricks/11/vol1 49162 Y 33544
Brick cpu02:/bricks/11/vol1 49162 Y 40787
Brick cpu03:/bricks/11/vol1 49162 Y 18150
Brick cpu04:/bricks/11/vol1 49162 Y 40516
Brick cpu01:/bricks/12/vol1 49163 Y 33551
Brick cpu02:/bricks/12/vol1 49163 Y 40794
Brick cpu03:/bricks/12/vol1 49163 Y 18157
Brick cpu04:/bricks/12/vol1 49163 Y 40692
Brick cpu01:/bricks/13/vol1 49164 Y 33558
Brick cpu02:/bricks/13/vol1 49164 Y 40801
Brick cpu03:/bricks/13/vol1 49164 Y 18165
Brick cpu04:/bricks/13/vol1 49164 Y 40700
Brick cpu01:/bricks/14/vol1 49165 Y 33566
Brick cpu02:/bricks/14/vol1 49165 Y 40809
Brick cpu03:/bricks/14/vol1 49165 Y 18172
Brick cpu04:/bricks/14/vol1 49165 Y 40706
Brick cpu01:/bricks/15/vol1 49166 Y 33572
Brick cpu02:/bricks/15/vol1 49166 Y 40815
Brick cpu03:/bricks/15/vol1 49166 Y 18179
Brick cpu04:/bricks/15/vol1 49166 Y 40714
Brick cpu01:/bricks/16/vol1 49167 Y 33579
Brick cpu02:/bricks/16/vol1 49167 Y 40822
Brick cpu03:/bricks/16/vol1 49167 Y 18185
Brick cpu04:/bricks/16/vol1 49167 Y 40722
Brick cpu01:/bricks/17/vol1 49168 Y 33586
Brick cpu02:/bricks/17/vol1 49168 Y 40829
Brick cpu03:/bricks/17/vol1 49168 Y 18192
Brick cpu04:/bricks/17/vol1 49168 Y 40727
Brick cpu01:/bricks/18/vol1 49169 Y 33593
Brick cpu02:/bricks/18/vol1 49169 Y 40836
Brick cpu03:/bricks/18/vol1 49169 Y 18201
Brick cpu04:/bricks/18/vol1 49169 Y 40735
Brick cpu01:/bricks/19/vol1 49170 Y 33600
Brick cpu02:/bricks/19/vol1 49170 Y 40843
Brick cpu03:/bricks/19/vol1 49170 Y 18207
Brick cpu04:/bricks/19/vol1 49170 Y 40741
Brick cpu01:/bricks/20/vol1 49171 Y 33608
Brick cpu02:/bricks/20/vol1 49171 Y 40850
Brick cpu03:/bricks/20/vol1 49171 Y 18214
Brick cpu04:/bricks/20/vol1 49171 Y 40748
Brick cpu01:/bricks/21/vol1 49172 Y 33614
Brick cpu02:/bricks/21/vol1 49172 Y 40858
Brick cpu03:/bricks/21/vol1 49172 Y 18222
Brick cpu04:/bricks/21/vol1 49172 Y 40756
Brick cpu01:/bricks/22/vol1 49173 Y 33621
Brick cpu02:/bricks/22/vol1 49173 Y 40864
Brick cpu03:/bricks/22/vol1 49173 Y 18227
Brick cpu04:/bricks/22/vol1 49173 Y 40762
Brick cpu01:/bricks/23/vol1 49174 Y 33626
Brick cpu02:/bricks/23/vol1 49174 Y 40869
Brick cpu03:/bricks/23/vol1 49174 Y 18234
Brick cpu04:/bricks/23/vol1 49174 Y 40769
Brick cpu01:/bricks/24/vol1 49175 Y 33631
Brick cpu02:/bricks/24/vol1 49175 Y 40874
Brick cpu03:/bricks/24/vol1 49175 Y 18239
Brick cpu04:/bricks/24/vol1 49175 Y 40774
Self-heal Daemon on localhost N/A Y 33361
Self-heal Daemon on cpu05 N/A Y 2353
Self-heal Daemon on cpu04 N/A Y 40786
Self-heal Daemon on cpu02 N/A Y 32442
Self-heal Daemon on cpu03 N/A Y 18664
Task Status of Volume ds01
------------------------------------------------------------------------------
Task : Rebalance
ID : 5db24b30-4b9f-4b65-8910-a7a0a6d327a4
Status : completed
[root at cpu01 log]#
[root at cpu01 log]# gluster pool list
UUID Hostname State
626c9360-8c09-480f-9707-116e67cc38e6 cpu02 Connected
dc475d62-b035-4ee6-9006-6f03bf68bf24 cpu05 Connected
41b5b2ff-3671-47b4-b477-227a107e718d cpu03 Connected
c0afe114-dfa7-407d-bad7-5a3f97a6f3fc cpu04 Connected
9b61b0a5-be78-4ac2-b6c0-2db588da5c35 localhost Connected
[root at cpu01 log]#
[Inline image 3]
Thanks,
Punit
On Thu, Mar 19, 2015 at 2:53 PM, Michal Skrivanek <michal.skrivanek at redhat.com> wrote:
On Mar 19, 2015, at 03:18 , Punit Dambiwal <hypunit at gmail.com> wrote:
> Hi All,
>
> Does anyone have any idea about this problem? It seems it's a bug in either oVirt or GlusterFS...that's why no one has an idea about it...please correct me if I am wrong...
Hi,
as I said, storage access times out, so it looks to me like a gluster setup problem; the storage domain you have your VMs on is not working…
Thanks,
michal
>
> Thanks,
> Punit
>
> On Wed, Mar 18, 2015 at 5:05 PM, Punit Dambiwal <hypunit at gmail.com> wrote:
> Hi Michal,
>
> Would you mind letting me know what could have gotten messed up? I will check and try to resolve it...I am still working with the gluster community to resolve this issue...
>
> But in oVirt the gluster setup is quite straightforward...so how could it get messed up by a reboot?? If it can be messed up by a reboot, then it does not seem like a good, stable technology for production storage...
>
> Thanks,
> Punit
>
> On Wed, Mar 18, 2015 at 3:51 PM, Michal Skrivanek <michal.skrivanek at redhat.com> wrote:
>
> On Mar 18, 2015, at 03:33 , Punit Dambiwal <hypunit at gmail.com> wrote:
>
> > Hi,
> >
> > Is there anyone from the community who can help me solve this issue...??
> >
> > Thanks,
> > Punit
> >
> > On Tue, Mar 17, 2015 at 12:52 PM, Punit Dambiwal <hypunit at gmail.com> wrote:
> > Hi,
> >
> > I am facing one strange issue with oVirt/GlusterFS...I still haven't determined whether this issue is related to GlusterFS or oVirt...
> >
> > Ovirt :- 3.5.1
> > Glusterfs :- 3.6.1
> > Host :- 4 hosts (compute + storage)...each server has 24 bricks
> > Guest VM :- more than 100
> >
> > Issue :- When I deployed this cluster the first time, it worked well for me (all the guest VMs were created and ran successfully)...but suddenly one day one of my host nodes rebooted, and now none of the VMs can boot up; they fail with the error "Bad Volume Specification"
> >
> > VMId :- d877313c18d9783ca09b62acf5588048
> >
> > VDSM Logs :- http://ur1.ca/jxabi
>
> you've got timeouts while accessing storage…so I guess something got messed up on reboot, it may also be just a gluster misconfiguration…
>
> > Engine Logs :- http://ur1.ca/jxabv
> >
> > ------------------------
> > [root at cpu01 ~]# vdsClient -s 0 getVolumeInfo e732a82f-bae9-4368-8b98-dedc1c3814de 00000002-0002-0002-0002-000000000145 6d123509-6867-45cf-83a2-6d679b77d3c5 9030bb43-6bc9-462f-a1b9-f6d5a02fb180
> > status = OK
> > domain = e732a82f-bae9-4368-8b98-dedc1c3814de
> > capacity = 21474836480
> > voltype = LEAF
> > description =
> > parent = 00000000-0000-0000-0000-000000000000
> > format = RAW
> > image = 6d123509-6867-45cf-83a2-6d679b77d3c5
> > uuid = 9030bb43-6bc9-462f-a1b9-f6d5a02fb180
> > disktype = 2
> > legality = LEGAL
> > mtime = 0
> > apparentsize = 21474836480
> > truesize = 4562972672
> > type = SPARSE
> > children = []
> > pool =
> > ctime = 1422676305
> > ---------------------
> >
> > I opened the same thread earlier but didn't get any definitive answers to solve this issue, so I am reopening it...
> >
> > https://www.mail-archive.com/users@ovirt.org/msg25011.html
> >
> > Thanks,
> > Punit
> >
> >
> >
>
>
>
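
A side note on the getVolumeInfo output quoted above: since the volume is reported as LEGAL, RAW and of the expected size, it may be worth confirming that the image file itself is readable through the gluster mount on the rebooted host, given the storage-access timeouts Michal mentions. A hedged sketch, again assuming the standard oVirt glusterSD mount path (adjust the server and volume names to your setup):

# domain / image / volume UUIDs taken from the getVolumeInfo output above
qemu-img info /rhev/data-center/mnt/glusterSD/cpu01:_ds01/e732a82f-bae9-4368-8b98-dedc1c3814de/images/6d123509-6867-45cf-83a2-6d679b77d3c5/9030bb43-6bc9-462f-a1b9-f6d5a02fb180

If that command hangs or times out, it points at the same storage-access problem rather than at the VM definition itself.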
[Attachments scrubbed by the list archive: an HTML copy of this message and three inline images (image002.png, image004.png, image006.png), available under http://lists.ovirt.org/pipermail/users/attachments/20150319/403d232e/]