Adding gluster pool list:

UUID                                  Hostname             State
2c86fa95-67a2-492d-abf0-54da625417f8  vmm12.mydomain.com   Connected
ab099e72-0f56-4d33-a16b-ba67d67bdf9d  vmm13.mydomain.com   Connected
c35ad74d-1f83-4032-a459-079a27175ee4  vmm14.mydomain.com   Connected
aeb7712a-e74e-4492-b6af-9c266d69bfd3  vmm17.mydomain.com   Connected
4476d434-d6ff-480f-b3f1-d976f642df9c  vmm16.mydomain.com   Connected
22ec0c0a-a5fc-431c-9f32-8b17fcd80298  vmm15.mydomain.com   Connected
caf84e9f-3e03-4e6f-b0f8-4c5ecec4bef6  vmm18.mydomain.com   Connected
18385970-aba6-4fd1-85a6-1b13f663e60b  vmm10.mydomain.com   Disconnected   // server that went bad
b152fd82-8213-451f-93c6-353e96aa3be9  vmm102.mydomain.com  Connected      // vmm10, but with a different name
228a9282-c04e-4229-96a6-67cb47629892  localhost            Connected
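
For reference, the stale vmm10.mydomain.com entry above is what would eventually have to be dropped with 'gluster peer detach' once no volume references its bricks any more (the command refuses while bricks still point at that host). A rough sketch only, using the hostname from the listing above:

   gluster peer detach vmm10.mydomain.com
   # if the host is truly gone and detach still refuses, 'force' skips that check
   gluster peer detach vmm10.mydomain.com force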
On Tue, Jun 11, 2019 at 11:24 AM Adrian Quintero <adrianquintero(a)gmail.com>
wrote:
Strahil,
Looking at your suggestions I think I need to provide a bit more info on
my current setup.
1. I have 9 hosts in total.

2. I have 5 storage domains:
   - hosted_storage (Data Master)
   - vmstore1 (Data)
   - data1 (Data)
   - data2 (Data)
   - ISO (NFS)  // had to create this one because oVirt 4.3.3.1 would not
     let me upload disk images to a data domain without an ISO (I think
     this is due to a bug)

3. Each volume is of the type "Distributed Replicate" and each one is
   composed of 9 bricks.
   I started with 3 bricks per volume due to the initial Hyperconverged
   setup, then I expanded the cluster and the gluster cluster by 3 hosts at
   a time until I got to a total of 9 hosts.
Disks, bricks and sizes used per volume:

   /dev/sdb  engine    100GB
   /dev/sdb  vmstore1  2600GB
   /dev/sdc  data1     2600GB
   /dev/sdd  data2     2600GB
   /dev/sde  --------  400GB SSD, used for caching purposes

From the above layout a few questions came up:
1. Using the web UI, how can I create a 100GB brick and a 2600GB brick to
   replace the bad bricks for "engine" and "vmstore1" on the same block
   device (sdb)? And what about /dev/sde (the caching disk)? When I tried
   creating a new brick through the UI I saw that I could use /dev/sde for
   caching, but only for 1 brick (i.e. vmstore1), so if I try to create
   another brick, how would I specify that the same /dev/sde device should
   be used for caching?
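
   For what it's worth, outside of the UI this layout would normally be
   handled at the LVM layer: both bricks live as thin LVs in one volume
   group on /dev/sdb, and /dev/sde is attached as an lvmcache cache pool on
   that VG's thin pool, so the one SSD ends up caching both bricks. A rough
   hand-made sketch only, assuming XFS bricks; the sizes and VG/LV names are
   made up and not necessarily what the oVirt hyperconverged installer uses:

      pvcreate /dev/sdb /dev/sde
      vgcreate gluster_vg_sdb /dev/sdb /dev/sde
      # one thin pool on sdb only (size is an assumption, adjust to the real device)
      lvcreate -L 2600G -T gluster_vg_sdb/pool_sdb /dev/sdb
      # the SSD becomes a cache pool attached to that thin pool (leave room for cache metadata)
      lvcreate --type cache-pool -L 350G -n cache_sde gluster_vg_sdb /dev/sde
      lvconvert --type cache --cachepool gluster_vg_sdb/cache_sde gluster_vg_sdb/pool_sdb
      # two thin LVs carved out of the same pool, one per brick
      lvcreate -V 100G  -T gluster_vg_sdb/pool_sdb -n gluster_lv_engine
      lvcreate -V 2600G -T gluster_vg_sdb/pool_sdb -n gluster_lv_vmstore1
      mkfs.xfs -i size=512 /dev/gluster_vg_sdb/gluster_lv_engine
      mkfs.xfs -i size=512 /dev/gluster_vg_sdb/gluster_lv_vmstore1
      mkdir -p /gluster_bricks/engine /gluster_bricks/vmstore1
      mount /dev/gluster_vg_sdb/gluster_lv_engine   /gluster_bricks/engine
      mount /dev/gluster_vg_sdb/gluster_lv_vmstore1 /gluster_bricks/vmstore1

   The resulting mount points could then be used as the brick paths in a
   replace-brick or reset-brick.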
1. If I want to remove a brick (it being a replica 3), I go to Storage >
   Volumes > select the volume > Bricks; once in there I can select the 3
   servers that compose the replicated brick set and click Remove. This
   gives a pop-up window with the following info:

   Are you sure you want to remove the following Brick(s)?
   - vmm11:/gluster_bricks/vmstore1/vmstore1
   - vmm12.virt.iad3p:/gluster_bricks/vmstore1/vmstore1
   - 192.168.0.100:/gluster-bricks/vmstore1/vmstore1
   - Migrate Data from the bricks?

   If I proceed with this, it means I will have to do it for all 4 volumes,
   which is just not very efficient. If that is the only way, then I am
   hesitant to put this into a real production environment, as there is no
   way I can take that kind of a hit for 500+ VMs :) and I also won't have
   that much spare storage or extra volumes to play with in a real
   scenario. (A CLI sketch of this remove-brick is at the end of this mail.)
2. After modifying /etc/vdsm/vdsm.id yesterday, by following
   https://stijn.tintel.eu/blog/2013/03/02/ovirt-problem-duplicate-uuids, I
   was able to add the server back to the cluster using a new FQDN and a
   new IP, and I tested replacing one of the bricks. This is my mistake: as
   mentioned in #3 above, I used /dev/sdb entirely for 1 brick, because
   through the UI I could not split the block device so it could be used
   for 2 bricks (one for the engine and one for vmstore1). So in the
   "gluster vol info" you might see vmm102.mydomain.com, but in reality it
   is myhost1.mydomain.com. (The vdsm.id step is also sketched at the end
   of this mail.)
3. I am also attaching gluster_peer_status.txt, and in the last 2 entries
   of that file you will see an entry for vmm10.mydomain.com (the old/bad
   entry) and vmm102.mydomain.com (the new entry; the same server as vmm10,
   but renamed to vmm102). Also please find the gluster_vol_info.txt file.
4. I am ready to redeploy this environment if needed, but I am also ready
   to test any other suggestion. If I can get a good understanding of how
   to recover from this, I will be ready to move to production.
5. Wondering if you'd be willing to have a look at my setup through a
   shared screen?
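
CLI sketch for point 1 above (removing a whole replica set with data
migration), as far as I understand the equivalent of that pop-up; the brick
paths are the ones shown in the pop-up, and this kicks off the same data
migration, so it only avoids the UI, not the cost:

   gluster volume remove-brick vmstore1 \
       vmm11:/gluster_bricks/vmstore1/vmstore1 \
       vmm12.virt.iad3p:/gluster_bricks/vmstore1/vmstore1 \
       192.168.0.100:/gluster-bricks/vmstore1/vmstore1 start
   # watch the rebalance until every node reports 'completed'
   gluster volume remove-brick vmstore1 \
       vmm11:/gluster_bricks/vmstore1/vmstore1 \
       vmm12.virt.iad3p:/gluster_bricks/vmstore1/vmstore1 \
       192.168.0.100:/gluster-bricks/vmstore1/vmstore1 status
   # only then commit the removal
   gluster volume remove-brick vmstore1 \
       vmm11:/gluster_bricks/vmstore1/vmstore1 \
       vmm12.virt.iad3p:/gluster_bricks/vmstore1/vmstore1 \
       192.168.0.100:/gluster-bricks/vmstore1/vmstore1 commit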
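
And the vdsm.id step from point 2, i.e. generating a fresh host UUID so oVirt
stops treating the reinstalled box as a duplicate; this is only a sketch of
what the linked blog post describes, so double-check against the post before
running it:

   uuidgen > /etc/vdsm/vdsm.id
   # restart vdsm so the new UUID is picked up
   systemctl restart vdsmd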
Thanks,
Adrian
On Mon, Jun 10, 2019 at 11:41 PM Strahil <hunter86_bg(a)yahoo.com> wrote:
> Hi Adrian,
>
> You have several options:
> A) If you have space on another gluster volume (or volumes) or on
> NFS-based storage, you can migrate all VMs live. Once you do that, the
> simple way will be to stop and remove the storage domain (from the UI) and
> the gluster volume that corresponds to the problematic brick. Once gone, you
> can remove the entry in oVirt for the old host and add the newly built
> one. Then you can recreate your volume and migrate the data back.
>
> B) If you don't have space, you have to use a riskier approach
> (usually it shouldn't be risky, but I had a bad experience with gluster v3):
> - New server has same IP and hostname:
> Use command line and run the 'gluster volume reset-brick VOLNAME
> HOSTNAME:BRICKPATH HOSTNAME:BRICKPATH commit'
> Replace VOLNAME with your volume name.
> A more practical example would be:
> 'gluster volume reset-brick data ovirt3:/gluster_bricks/data/brick
> ovirt3:/gluster_bricks/data/brick commit'
>
> If it refuses, then you have to cleanup '/gluster_bricks/data' (which
> should be empty).
> Also check whether the new peer has been probed via 'gluster peer
> status'. Check that the firewall is allowing gluster communication (you can
> compare it to the firewall on another gluster host).
>
> The automatic healing will kick in within 10 minutes (if it succeeds) and
> will stress the other 2 replicas, so pick your time properly.
> Note: I'm not recommending you to use the 'force' option in the previous
> command ... for now :)
>
> - The new server has a different IP/hostname:
> Instead of 'reset-brick' you can use 'replace-brick':
> It should be like this:
> gluster volume replace-brick data old-server:/path/to/brick
> new-server:/new/path/to/brick commit force
>
> In both cases check the status via:
> gluster volume info VOLNAME
>
> If your cluster is in production, I really recommend the first
> option, as it is less risky and the chance of unplanned downtime will be
> minimal.
>
> The 'reset-brick' error in your previous e-mail shows that one of the servers
> is not connected. Check peer status on all servers; if there are fewer peers
> than there should be, check for network and/or firewall issues.
> On the new node check if glusterd is enabled and running.
>
> In order to debug - you should provide more info like 'gluster volume
> info' and the peer status from each node.
>
> Best Regards,
> Strahil Nikolov
>
> On Jun 10, 2019 20:10, Adrian Quintero <adrianquintero(a)gmail.com> wrote:
>
> >
> > Can you let me know how to fix the gluster and the missing brick?
> > I tried removing it by going to "Storage > Volumes > vmstore > Bricks"
> > and selecting the brick.
> > However it is showing as an unknown status (which is expected because
> > the server was completely wiped), so if I try to "remove", "replace
> > brick" or "reset brick" it won't work.
> > If I do remove brick: Incorrect bricks selected for removal in
> > Distributed Replicate volume. Either all the selected bricks should be
> > from the same sub volume or one brick each for every sub volume!
> > If I try "replace brick" I can't, because I don't have another server
> > with extra bricks/disks.
> > And if I try "reset brick": Error while executing action Start Gluster
> > Volume Reset Brick: Volume reset brick commit force failed: rc=-1 out=()
> > err=['Host myhost1_mydomain_com not connected']
> >
> > Are you suggesting to try and fix the gluster using command line?
> >
> > Note that I can't "peer detach" the server, so if I force the removal
> > of the bricks, would I need to force a downgrade to replica 2 instead
> > of 3? What would happen to oVirt, as it only supports replica 3?
> >
> > thanks again.
> >
> > On Mon, Jun 10, 2019 at 12:52 PM Strahil <hunter86_bg(a)yahoo.com> wrote:
>
> >>
> >> Hi Adrian,
> >> Did you fix the issue with the gluster and the missing brick?
> >> If yes, try to set the 'old' host in maintenance an
>
>
--
Adrian Quintero