Adding a GlusterFS Storage Domain
by simon@justconnect.ie
This may seem like a simple question, but clarification from the experts would be good.
When adding a GlusterFS Storage Domain via the engine, do I use the Gluster storage FQDNs for the Path and the backup-volfile-servers mount option,
or
do I use the oVirt management network FQDNs?
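For reference, here is a minimal sketch of how the two fields fit together, whichever set of FQDNs is chosen. The hostnames and volume name are hypothetical. The sketch only formats the Path (server:/volume) and the backup-volfile-servers mount option (a colon-separated list of the remaining servers); it does not answer which network to use.

```python
def gluster_domain_fields(servers, volume):
    """Return (path, mount_options) for a GlusterFS storage domain.

    servers: ordered list of FQDNs (hypothetical names in the example below);
             the first one goes into the Path, the rest become backups.
    volume:  Gluster volume name.
    """
    if not servers:
        raise ValueError("at least one server is required")
    primary, *backups = servers
    path = f"{primary}:/{volume}"
    # The mount option is only needed when there are additional servers.
    mount_options = (
        "backup-volfile-servers=" + ":".join(backups) if backups else ""
    )
    return path, mount_options


# Example with hypothetical storage-network FQDNs:
path, opts = gluster_domain_fields(
    ["gfs1.example.com", "gfs2.example.com", "gfs3.example.com"], "data"
)
print(path)
print(opts)
```

The same helper can be called with the management-network names instead, which makes it easy to compare the two candidate configurations side by side.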
Kind Regards
Simon...
2 years, 3 months
Single Peer in Gluster Cluster Failure caused Storage Domain outage
by simon@justconnect.ie
Hi All,
We have a 3-node HCI cluster with Gluster 2+1 volumes.
The first node had a hardware memory failure, which corrupted files on the engine LV, and the server would only boot into maintenance mode.
For some reason glusterd wouldn't start, and one of the volumes became inaccessible, with the Storage Domain going offline. This caused multiple VMs to go into a paused or shutdown state.
We put the host into maintenance mode and then shut it down in an attempt to allow Gluster to continue across 2 nodes (one being the arbiter). Unfortunately, this didn't work.
The solution was to do the following:
1. Remove the contents of /var/lib/glusterd except for glusterd.info
2. Start glusterd
3. Peer probe one of the other 2 peers
4. Restart glusterd
5. Cross fingers and toes
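The steps above can be sketched as a dry-run helper that only builds the commands and prints them. It assumes glusterd is managed by systemd and that the peer hostname (hypothetical here) is one of the two healthy peers. The first command is destructive, so nothing is executed by this sketch.

```python
import shlex

GLUSTERD_DIR = "/var/lib/glusterd"


def recovery_commands(peer):
    """Build the commands for steps 1-4 as argument lists (nothing is run)."""
    return [
        # Step 1: remove everything directly under /var/lib/glusterd
        # except glusterd.info.
        ["find", GLUSTERD_DIR, "-mindepth", "1", "-maxdepth", "1",
         "!", "-name", "glusterd.info", "-exec", "rm", "-rf", "{}", "+"],
        # Step 2: start glusterd.
        ["systemctl", "start", "glusterd"],
        # Step 3: probe one of the other two peers.
        ["gluster", "peer", "probe", peer],
        # Step 4: restart glusterd.
        ["systemctl", "restart", "glusterd"],
    ]


# Print the commands for review; "node2.example.com" is a placeholder.
for cmd in recovery_commands("node2.example.com"):
    print(shlex.join(cmd))
```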
Although this was a successful outcome, I would like to know why losing 1 Gluster peer caused the outage of a single storage domain and, therefore, outages of the VMs with disks on that storage domain.
Kind Regards
Simon...
2 years, 3 months
Problem with engine deployment
by varekoarfa@gmail.com
Hi everyone, I hope all is well.
OS: CentOS Stream
oVirt 4.5
I'm having problems deploying the hosted engine, both through Cockpit and the CLI.
I have 3 servers on which I managed to configure and deploy GlusterFS through Cockpit without problems, but when I try to deploy the hosted engine, it tells me "No valid network interface has been found".
Each of the 3 servers has 2 NICs. On each server I created a bond with Cockpit, named bond0, in XOR mode.
If someone can help me, I would appreciate it.
Ansible packages installed:
[root@vs05 pre_checks]# rpq -qa | ansi
-bash: ansi: command not found
-bash: rpq: command not found
[root@vs05 pre_checks]# rpq -qa |grep ansi
-bash: rpq: command not found
[root@vs05 pre_checks]# rpm -qa |grep ansi
ansible-collection-ansible-posix-1.3.0-1.2.el8.noarch
ansible-collection-ansible-netcommon-2.2.0-3.2.el8.noarch
ansible-collection-ansible-utils-2.3.0-2.2.el8.noarch
gluster-ansible-maintenance-1.0.1-12.el8.noarch
gluster-ansible-features-1.0.5-15.el8.noarch
ovirt-ansible-collection-2.1.0-1.el8.noarch
gluster-ansible-cluster-1.0-5.el8.noarch
gluster-ansible-repositories-1.0.1-5.el8.noarch
ansible-core-2.12.7-1.el8.x86_64
gluster-ansible-roles-1.0.5-28.el8.noarch
gluster-ansible-infra-1.0.4-22.el8.noarch
2 years, 3 months
Q: "vdsm-client Host getAllTasksInfo" Fails (Removing frozen task)
by Andrei Verovski
Hi,
I have a frozen task (creating a snapshot) on one of my oVirt nodes, and I am trying to kill it.
However,
# sudo vdsm-client Host getAllTasksInfo
vdsm-client: Command Host.getAllTasksInfo with args {} failed:
However, "sudo vdsm-client Host getVMList" and "sudo vdsm-client Host getVMFullList" work fine.
# /usr/share/ovirt-engine/setup/dbutils/taskcleaner.sh -v -Z
select exists (select * from information_schema.tables where table_schema = 'public' and table_name = 'command_entities');
t
SELECT command_id,command_type,root_command_id,command_parameters,command_params_class,created_at,status,return_value,return_value_class,executed FROM GetAllCommandsWithZombieTasks();
So only the letter "t" is printed, and it is not clear what it means.
I can run "/usr/share/ovirt-engine/setup/dbutils/taskcleaner.sh -vR", but I am not sure whether it is too destructive.
What is the problem with "vdsm-client Host getAllTasksInfo"?
Thanks in advance.
Andrei
2 years, 3 months
Host certificate expired
by Rob B
Hi,
We have an oVirt host in an 'Unassigned' state because its certificate has expired.
The oVirt events show:
VDSM host1 command Get Host Capabilities failed: PKIX path validation failed: java.security.cert.CertPathValidatorException: validity check failed
This is the only host in the cluster, and it has local storage, so I have no option to start the single VM elsewhere.
Is there a way to renew the certificate on this host? I cannot put the host into maintenance mode and use 'Enroll Certificate', as it is in the Unassigned state.
The oVirt manager is running version: 4.4.10.7-1.el8
The oVirt host in the bad state is running: ovirt-host-4.4.9-2, vdsm-4.40.100.2-1.
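To confirm the expiry date on the host itself, a minimal sketch like the following could be used. It shells out to "openssl x509 -noout -enddate" and parses the notAfter line. The certificate path is an assumption about the usual VDSM layout and may differ on a given install; the sketch only reports the expiry and does not renew anything.

```python
import subprocess
from datetime import datetime, timezone

# Assumed default location of the VDSM host certificate; adjust if needed.
VDSM_CERT = "/etc/pki/vdsm/certs/vdsmcert.pem"


def parse_not_after(line):
    """Parse a line such as 'notAfter=Aug  2 08:45:00 2022 GMT' into UTC."""
    value = line.strip().split("=", 1)[1]
    if value.endswith(" GMT"):
        value = value[: -len(" GMT")]
    parsed = datetime.strptime(value, "%b %d %H:%M:%S %Y")
    return parsed.replace(tzinfo=timezone.utc)


def cert_not_after(path=VDSM_CERT):
    """Ask openssl for the certificate end date and return it as a datetime."""
    result = subprocess.run(
        ["openssl", "x509", "-in", path, "-noout", "-enddate"],
        check=True, capture_output=True, text=True,
    )
    return parse_not_after(result.stdout)


def is_expired(not_after, now=None):
    """Return True if the certificate end date has passed."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= not_after
```

On the host, calling is_expired(cert_not_after()) would report whether the certificate at the assumed path has passed its end date.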
Please let me know if you need any more info, and thanks in advance.
Rob
2 years, 3 months
HostedEngine Restore woes
by simon@justconnect.ie
Hi All,
I've been asked to test the HE restore process but after taking a look at the documentation I'm afraid I'm none the wiser.
I thought there would be a simple 'restore in situ' option but it appears not.
My environments were built using Ansible with a hosted engine .json answer file.
From what I've read so far, it appears that a new HE VM needs to be built with new engine storage, etc.
2 years, 3 months
Re: Six node HCI
by Strahil Nikolov
It would make more sense to identify the root cause.
Can you check if all oVirt nodes are in the TSP:
gluster pool list
Also, check that the Engine's volume is available from all nodes -> it should be mounted and listable:
ls -l /rhev/data-center/mnt/glusterSD/<host>:_<volume name>
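The mount check can also be scripted. The following is a minimal sketch that builds the mount path from the pattern above and tests whether it is listable; the host and volume names in the example are hypothetical.

```python
import os

GLUSTER_MOUNT_ROOT = "/rhev/data-center/mnt/glusterSD"


def gluster_mount_path(host, volume):
    """Build the mount point path following the <host>:_<volume name> pattern."""
    return f"{GLUSTER_MOUNT_ROOT}/{host}:_{volume}"


def is_listable(path):
    """Return True if the directory exists and its contents can be listed."""
    try:
        os.listdir(path)
        return True
    except OSError:
        return False


# Example with placeholder names; run this on each node.
path = gluster_mount_path("node1.example.com", "engine")
print(path, "listable" if is_listable(path) else "NOT listable")
```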
Best Regards,
Strahil Nikolov
On Tue, Aug 2, 2022 at 8:45, Gaurang Patel <gaurang.patel(a)allotgroup.com> wrote:
Hi,
I am very new to oVirt 6-node Gluster.
We can create a three-node cluster successfully with version 4.5.1, but after expanding the Gluster cluster with 3 additional nodes, the oVirt engine goes into a not-responding state.
If you could help us with a step-by-step document, we would be very grateful.
Thanks in advance.
Gaurang Patel
Users mailing list -- users(a)ovirt.org
To unsubscribe send an email to users-leave(a)ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/S7KBUPG2KPX...
2 years, 3 months
Moving raw/sparse disk from NFS to iSCSI fails on oVirt 4.5.1
by Guillaume Pavese
On a 4.5.1 DC, I have imported a VM and its disk from an old 4.3 DC
(through an export domain, if that's relevant).
The DC/Cluster compat level is 4.7, and the VM was upgraded to it:
"Original custom compatibility version 4.3 of imported VM xxx is not
supported. Changing it to the lowest supported version: 4.7."
The disk is raw and sparse:
<format>raw</format>
<sparse>true</sparse>
I initially put the VM's disks on an NFS storage domain, but I want to move
the disks to an iSCSI one.
However, after copying data for a while, the task fails: "User has failed to
move disk VM-TEMPLATE-COS7_Disk1 to domain iSCSI-STO-FR-301"
In engine.log:
qemu-img: error while writing at byte xxx: No space left on device
2022-07-21 08:58:23,240+02 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetHostJobsVDSCommand]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-48)
[65fed1dc-e33b-471e-bc49-8b9662400e5f] FINISH, GetHostJobsVDSCommand,
return:
{0aa2d519-8130-4e2f-bc4f-892e5f7b5206=HostJobInfo:{id='0aa2d519-8130-4e2f-bc4f-892e5f7b5206',
type='storage', description='copy_data', status='failed', progress='79',
error='VDSError:{code='GeneralException', message='General Exception:
("Command ['/usr/bin/qemu-img', 'convert', '-p', '-t', 'none', '-T',
'none', '-f', 'raw', '-O', 'qcow2', '-o', 'compat=1.1',
'/rhev/data-center/mnt/svc-int-prd-sto-fr-301.hostics.fr:_volume1_ovirt-int-2_data/1ce95c4a-2ec5-47b7-bd24-e540165c6718/images/d3c33cc7-f2c3-4613-84d0-d3c9fa3d5ebd/2c4a0041-b18b-408f-9c0d-971c19a552ea',
'/rhev/data-center/mnt/blockSD/b5dc9c01-3749-4326-99c5-f84f683190bd/images/d3c33cc7-f2c3-4613-84d0-d3c9fa3d5ebd/2c4a0041-b18b-408f-9c0d-971c19a552ea']
failed with rc=1 out=b'' err=bytearray(b'qemu-img: error while writing at
byte 13639873536: No space left on device\\n')",)'}'}}, log id: 73f77495
2022-07-21 08:58:23,241+02 INFO
[org.ovirt.engine.core.bll.StorageJobCallback]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-48)
[65fed1dc-e33b-471e-bc49-8b9662400e5f] Command CopyData id:
'521bdf57-8379-40ce-a682-af859fb0cad7': job
'0aa2d519-8130-4e2f-bc4f-892e5f7b5206' execution was completed with VDSM
job status 'failed'
I do want the conversion from raw/sparse to qcow2/sparse to happen, as I
want to activate incremental backups.
I think that it may fail because the virtual size is bigger than the
initial size, as I think someone has explained on this list earlier. Can
anybody confirm?
It seems to be a pretty common use case to support, though?
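One way to test this hypothesis is to compare the space qemu-img expects the qcow2 output to need with the space available on the destination. The sketch below builds a "qemu-img measure" command and checks the "required" field of its JSON output. How oVirt sizes the destination volume on the block domain is not shown here, so free_bytes is an assumed input that has to be obtained separately.

```python
import json


def measure_command(src_path, src_format="raw", dst_format="qcow2"):
    """Build the qemu-img measure command for the given source image."""
    return [
        "qemu-img", "measure", "--output=json",
        "-f", src_format, "-O", dst_format, src_path,
    ]


def fits(measure_json, free_bytes):
    """Return True if the 'required' size reported by qemu-img measure
    fits into free_bytes (an assumed figure for the destination volume)."""
    info = json.loads(measure_json)
    return info["required"] <= free_bytes


# Example with made-up numbers, only to show the comparison:
sample = '{"required": 100, "fully-allocated": 200}'
print(fits(sample, 150))
```

The command list can be passed to subprocess.run on the host, and its stdout fed to fits() together with the size of the destination volume.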
Guillaume Pavese
Systems and Network Engineer
Interactiv-Group
2 years, 3 months