Change gluster primary

We are using Gluster as our storage backend, configured as a 2-node replica. The two nodes are named nix and jupiter. At the Ovirt (RHEV really) end we have the gluster path configured as "nix:/gluster-rhev", with a mount option of "backupvolfile-server=jupiter.om.net". We now need to replace nix with a new server, which cannot have the same name. That new server will be the primary, with jupiter remaining the secondary.

We will have all VMs and hypervisors shut down when we make this change.

What is the best and/or easiest way to do this? Should we just disconnect the storage and re-attach it using the new gluster primary? If we do that, will our VMs just work or do we need to take other steps?

An alternative, which I suspect will be somewhat controversial, would be to make a direct edit of the engine database. Would that work any better, or does it add more dangers (assuming the edit is done correctly)?

regards, John
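For reference, the path and mount option described above correspond to a glusterfs FUSE mount along these lines (a hedged sketch; the target directory is assumed from oVirt's usual glusterSD naming layout and is not stated in the thread):

```shell
# Sketch of the mount oVirt effectively performs for this domain.
# Hostnames and volume name are from the thread; the mount-point
# directory is an assumption based on oVirt's glusterSD convention.
mount -t glusterfs \
  -o backupvolfile-server=jupiter.om.net \
  nix:/gluster-rhev \
  /rhev/data-center/mnt/glusterSD/nix:_gluster-rhev
```

The backupvolfile-server option only affects where the client fetches the volume file at mount time; it is not a continuous failover mechanism, which is relevant to the incident described later in the thread.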

Hi John,

There isn't really a primary in gluster. If you're using a glusterfs storage domain, you could turn off "nix" and the VMs would continue to run (although you'd have to disable quorum if you currently have it enabled on the volume, and you'd have to repoint the domain at some later point). If you're using NFS access you would have to repoint your storage to the remaining machine immediately.

The only snag I can see is that you can't detach the master storage domain in Ovirt if any VMs are running. I think you'd have to shut the VMs down, put the storage domain into maintenance, and then edit it.

Cheers
Alex

On 19/01/15 23:44, John Gardeniers wrote:
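The quorum change Alex mentions might look like this on the gluster side (a hedged sketch; the volume is assumed to be named "gluster-rhev", and note that disabling quorum on a 2-node replica keeps the volume writable with one node down at the cost of split-brain risk):

```shell
# Inspect the volume's current options; any quorum settings show
# up under "Options Reconfigured" if they have been set:
gluster volume info gluster-rhev

# Disable client-side and server-side quorum so the volume stays
# writable while one of the two nodes is offline:
gluster volume set gluster-rhev cluster.quorum-type none
gluster volume set gluster-rhev cluster.server-quorum-type none
```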

Hi Alex,

I understand what you're saying and certainly there is no primary from the Gluster perspective. However, things are quite different as far as Ovirt/RHEV is concerned.

We had an incident last week where we had to take nix off-line. A network glitch then caused our RHEV to briefly lose connection to jupiter. This resulted in all VMs crashing because the system was trying to reconnect to nix. It did not try to reconnect to jupiter, despite it being configured as the fail-over server.

In the end I had to bring nix back on line. RHEV still wouldn't connect. Finally, I had to reboot each hypervisor. Even then, two of them still failed to reconnect and could only be brought back by performing a full reinstall (we're using the cut-down dedicated RH hypervisors, not the RHEL+hypervisor that you use). All in all, quite a disastrous situation that lost us a couple of hours.

So yes, there is a primary from the Ovirt/RHEV perspective and I'm really disappointed in how the system completely failed to handle the situation.

regards, John

On 21/01/15 00:20, Alex Crow wrote:

On 20/01/15 20:46, John Gardeniers wrote:
Hi Alex,
I understand what you're saying and certainly there is no primary from the Gluster perspective. However, things are quite different as far as Ovirt/RHEV is concerned.
We had an incident last week where we had to take nix off-line. A network glitch then caused a our RHEV to briefly lose connection to jupiter. This resulted in all VMs crashing because the system was trying to reconnect to nix. It did not try to reconnect to jupiter, despite it being configured as the fail-over server.
Hi,

As for the above: if you had quorum configured on the gluster side (either by applying the relevant recommended options in gluster, or by having created the volume through Ovirt), loss of storage functionality is to be expected. In a two-node cluster, if one node goes down you lose quorum and the volume becomes read-only. In this case Ovirt should really pause the VMs.
In the end I had to bring nix back on line. RHEV still wouldn't connect. Finally, I had to reboot each hypervisor. Even then, two of them still failed to reconnect and could only be brought back by performing a full reinstall (we're using the cut-down dedicated RH hypervisors, not the RHEL+hypervisor that you use). All in all, quite a disastrous situation that lost us a couple of hours. So yes, there is a primary from the Ovirt/RHEV perspective and I'm really disappointed in how the system completely failed to handled the situation.
Looks like there are some bugs there. When we have had storage issues on RHEV we see all our VMs pausing, not crashing. BTW we do use the dedicated hypervisor (like oVirt "node").

Cheers
Alex
regards, John
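On the gluster side, the server swap John is planning is normally done with replace-brick rather than by touching the client configuration first. A hedged sketch, where "newhost" and the brick paths are placeholders not taken from the thread:

```shell
# Run from a surviving cluster member (e.g. jupiter).
# "newhost" and the /bricks/... paths are placeholders.
gluster peer probe newhost
gluster volume replace-brick gluster-rhev \
  nix:/bricks/gluster-rhev newhost:/bricks/gluster-rhev \
  commit force

# Trigger a full self-heal so the new brick syncs from jupiter's
# copy before any VMs are restarted:
gluster volume heal gluster-rhev full
```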
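As for the direct engine-database edit John floats: in the oVirt engine database the domain's connection string lives in the storage_server_connections table, so the change would be a single UPDATE. A hedged sketch only; stop the engine and back up the database first, and "newhost" is a placeholder:

```shell
# With ovirt-engine stopped and the DB backed up, repoint the
# storage connection at the replacement server (placeholder name):
sudo -u postgres psql engine -c \
  "UPDATE storage_server_connections
      SET connection = 'newhost:/gluster-rhev',
          mount_options = 'backupvolfile-server=jupiter.om.net'
    WHERE connection = 'nix:/gluster-rhev';"
```

The safer, supported route remains putting the domain into maintenance and editing the connection through the UI, as Alex suggested earlier in the thread.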
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
participants (2)
- Alex Crow
- John Gardeniers