Gluster network - associate brick
by r greg
hi all,
*** new to oVirt and still learning ***
Sorry for the long thread...
I have a 3x node hyperconverged setup on v4.5.1.
4x 1G NICs
NIC0
> ovirtmgmt (Hosted-Engine VM)
> vmnetwork vlan102 (all VMs are placed on this network)
NIC1
> migration
NIC2 - NIC3 > bond0
> storage
Logical Networks:
ovirtmgmt - role: VM network | management | display | default route
vmnetwork - role: VM network
migrate - role: migration network
storage - role: gluster network
During deployment I overlooked a setting, and on node2 the host was deployed with Name: node2.mydomain.lab --- Hostname/IP: 172.16.20.X/24 (WebUI > Compute > Hosts), i.e. the host was added by what looks like its storage-network IP rather than its management FQDN.
I suspect that because of this I see the following entries in /var/log/ovirt-engine/engine.log (only for node2):
2022-08-04 12:00:15,460Z WARN [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-16) [] Could not associate brick 'node2.mydomain.lab:/gluster_bricks/vmstore/vmstore' of volume '1ca6a01a-9230-4bb1-844e-8064f3eadb53' with correct network as no gluster network found in cluster '1770ade4-0f6f-11ed-b8f6-00163e6faae8'
Is this something I need to be worried about or correct somehow?
From node1
gluster> peer status
Number of Peers: 2
Hostname: node2.mydomain.lab
Uuid: a4468bb0-a3b3-42bc-9070-769da5a13427
State: Peer in Cluster (Connected)
Other names:
172.16.20.X
Hostname: node3.mydomain.lab
Uuid: 2b1273a4-667e-4925-af5e-00904988595a
State: Peer in Cluster (Connected)
Other names:
172.16.20.Z
volume status (the vmstore and engine volumes give the same kind of output, all bricks Online: Y; only the data volume is shown):
Status of volume: data
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick node1.mydomain.lab:/gluster_bricks/data/data    58734    0    Y    31586
Brick node2.mydomain.lab:/gluster_bricks/data/data    55148    0    Y    4317
Brick node3.mydomain.lab:/gluster_bricks/data/data    57021    0    Y    5242
Self-heal Daemon on localhost N/A N/A Y 63170
Self-heal Daemon on node2.mydomain.lab N/A N/A Y 4365
Self-heal Daemon on node3.mydomain.lab N/A N/A Y 5385
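For anyone comparing notes: the warning suggests the engine cannot match the brick's hostname to an address on the network carrying the gluster role. One read-only way to compare the address the engine recorded for each host against the names gluster knows about (run on the engine VM; this is a sketch that assumes the default "engine" database name and the standard schema):
# which address each host was added with, according to the engine
su - postgres -c "psql engine -c \"SELECT vds_name, host_name FROM vds_static;\""
# which names/addresses gluster has for each peer
gluster peer status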
VM snapshot stuck
by Jirka Simon
Hello oVirt folks,
for the last two days we have had trouble with a stuck snapshot task on some VMs; the affected VM is frozen and unresponsive.
The task progress is still null.
I can unlock the task with unlock_entity.sh, but the VM stays frozen and has to be force powered off and started again.
In the engine log I can only see that the master job is waiting for another job.
oVirt Version 4.4.10.7-1.el8
OS Rocky Linux 8.6
The last full update was last week, on Thursday.
Log:
2022-08-05 09:10:05,461+02 INFO
[org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-35)
[8ff04bca-b5a2-4ba3-8925-1037c5a2850e] Command 'CreateSnapshotForVm'
(id: '93f98c9c-2936-4615-b12e-c295b90eace0') waiting on child command
id: '2f94c337-487e-40f7-b8f8-df69225d5c79'
type:'CreateLiveSnapshotForVm' to complete
2022-08-05 09:10:05,462+02 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetHostJobsVDSCommand]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-35)
[8ff04bca-b5a2-4ba3-8925-1037c5a2850e] START,
GetHostJobsVDSCommand(HostName = ovirt3.corp.sldev.cz,
GetHostJobsVDSCommandParameters:{hostId='15d9a919-63d5-4673-95d2-e84662ef0852',
type='virt', jobIds='[6e73cda8-85d0-471c-b090-4c89719a97f4]'}), log id:
515c7e0d
2022-08-05 09:10:05,466+02 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetHostJobsVDSCommand]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-35)
[8ff04bca-b5a2-4ba3-8925-1037c5a2850e] FINISH, GetHostJobsVDSCommand,
return:
{6e73cda8-85d0-471c-b090-4c89719a97f4=HostJobInfo:{id='6e73cda8-85d0-471c-b090-4c89719a97f4',
type='virt', description='snapshot_vm', status='running',
progress='null', error='null'}}, log id: 515c7e0d
2022-08-05 09:10:05,466+02 INFO
[org.ovirt.engine.core.bll.VirtJobCallback]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-35)
[8ff04bca-b5a2-4ba3-8925-1037c5a2850e] Command CreateLiveSnapshotForVm
id: '2f94c337-487e-40f7-b8f8-df69225d5c79': waiting for job
'6e73cda8-85d0-471c-b090-4c89719a97f4' on host 'ovirt3.corp.sldev.cz'
(id: '15d9a919-63d5-4673-95d2-e84662ef0852') to complete
Any idea what I should check?
Thank you for any help.
Jirka
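For reference, the job the engine keeps polling can also be inspected directly on the host; a read-only sketch, with the host and job id taken from the log above and the VM name as a placeholder:
# what vdsm reports for running virt jobs (should list 6e73cda8-...)
vdsm-client Host getJobs job_type=virt
# whether qemu still has a snapshot job in flight, via read-only libvirt
virsh -r list
virsh -r domjobinfo VM-NAME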
Cannot Enable incremental backup
by Jonas
Hello all
I'm trying to create incremental backups for my VMs on a testing cluster, using the functions from
https://gitlab.com/nirs/ovirt-stress/-/blob/master/backup/backup.py. So
far it works well, but on some disks it is not possible to enable
incremental backup even when the VM is powered off (see the screenshot
below). Does anyone know why this might be the case and how to activate
it? I think I already checked the docs and didn't find anything, but feel
free to nudge me in the right direction.
By the way, is there a backup solution that is somewhat endorsed by the
community here?
Thank you and kind regards,
Jonas
[Screenshot of the oVirt disk dialog]
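One detail worth checking: incremental backup is built on qcow2 dirty bitmaps, so it can only be enabled on qcow2 disks; if the affected disks are raw, that would explain the greyed-out checkbox. A sketch for checking a disk's format and backup setting over the REST API (the credentials, engine FQDN, and disk id are placeholders):
# query one disk and look at its <format> and <backup> elements
curl -s -k -u admin@internal:PASSWORD \
    "https://engine.example.com/ovirt-engine/api/disks/DISK-ID" \
    | grep -E '<format>|<backup>'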
Adding a GlusterFS Storage Domain
by simon@justconnect.ie
This may seem like a simple question, but clarification from the experts would be good.
When adding a GlusterFS storage domain via the engine, do I use the Gluster storage FQDNs for the Path and the backup-volfile-servers mount option, or do I use the oVirt management network FQDNs?
Kind Regards
Simon...
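For comparison, when the bricks sit on a dedicated storage network the values usually look like the sketch below (all hostnames are placeholders); the mount vdsm ends up performing is roughly the manual one shown:
# Path: node1-storage.mydomain.lab:/data
# Mount options: backup-volfile-servers=node2-storage.mydomain.lab:node3-storage.mydomain.lab
# roughly equivalent manual mount, for testing from a host:
mount -t glusterfs \
    -o backup-volfile-servers=node2-storage.mydomain.lab:node3-storage.mydomain.lab \
    node1-storage.mydomain.lab:/data /mnt/test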
Single Peer in Gluster Cluster Failure caused Storage Domain outage
by simon@justconnect.ie
Hi All,
We have a 3 node HCI cluster with Gluster 2+1 volumes.
The first node had a hardware memory failure which caused file corruption on the engine LV, and the server would only boot into maintenance mode.
For some reason glusterd wouldn't start, and one of the volumes became inaccessible, with the storage domain going offline. This caused multiple VMs to go into a paused or shut-down state.
We put the host into maintenance mode and then shut it down in an attempt to allow gluster to continue across the remaining 2 nodes (one being the arbiter). Unfortunately this didn't work.
The solution was to do the following (a scripted sketch follows the list):
1. Remove the contents of /var/lib/glusterd except for glusterd.info
2. Start glusterd
3. Peer probe one of the other 2 peers
4. Restart glusterd
5. Cross fingers and toes
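A rough scripting of steps 1-4, assuming EL8 defaults; treat it as a sketch rather than a tested procedure, and note the peer hostname is a placeholder:
# keep only glusterd.info, as in step 1
systemctl stop glusterd
find /var/lib/glusterd -mindepth 1 ! -name glusterd.info -delete
systemctl start glusterd
# probe any healthy peer so the cluster metadata is re-fetched
gluster peer probe node2-storage.mydomain.lab
systemctl restart glusterd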
Although this was a successful outcome, I would like to know why losing 1 gluster peer caused the outage of a single storage domain, and therefore outages of the VMs with disks on that storage domain.
Kind Regards
Simon...
Problem with engine deployment
by varekoarfa@gmail.com
Hi everyone, hope all is good.
OS: CentOS Stream
oVirt: 4.5
I'm having problems deploying the hosted engine, both through Cockpit and the CLI.
I have 3 servers on which, through Cockpit, I have managed to configure and deploy GlusterFS without problems, but when I try to deploy the hosted engine it tells me "No valid network interface has been found".
The 3 servers have 2 NICs each; I have created a bond on each one with Cockpit, named bond0 and in XOR mode.
If someone can help me, please do.
Ansible packages installed:
[root@vs05 pre_checks]# rpm -qa | grep ansi
ansible-collection-ansible-posix-1.3.0-1.2.el8.noarch
ansible-collection-ansible-netcommon-2.2.0-3.2.el8.noarch
ansible-collection-ansible-utils-2.3.0-2.2.el8.noarch
gluster-ansible-maintenance-1.0.1-12.el8.noarch
gluster-ansible-features-1.0.5-15.el8.noarch
ovirt-ansible-collection-2.1.0-1.el8.noarch
gluster-ansible-cluster-1.0-5.el8.noarch
gluster-ansible-repositories-1.0.1-5.el8.noarch
ansible-core-2.12.7-1.el8.x86_64
gluster-ansible-roles-1.0.5-28.el8.noarch
gluster-ansible-infra-1.0.4-22.el8.noarch
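For what it's worth, XOR is bonding mode 2 (balance-xor), and hosted-engine setup only offers interfaces it considers valid, so the first thing to confirm is probably what the host itself reports for bond0; a read-only sketch:
# link state of all interfaces, including bond0 and its slaves
ip -br link show
# the mode the kernel actually applied to the bond
grep 'Bonding Mode' /proc/net/bonding/bond0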
Q: "vdsm-client Host getAllTasksInfo" Fails (Removing frozen task)
by Andrei Verovski
Hi,
I have a frozen task (creating a snapshot) on one of my oVirt nodes, and I'm trying to kill it.
However,
# sudo vdsm-client Host getAllTasksInfo
vdsm-client: Command Host.getAllTasksInfo with args {} failed:
However, "sudo vdsm-client Host getVMList" and "sudo vdsm-client Host getVMFullList" work fine.
# /usr/share/ovirt-engine/setup/dbutils/taskcleaner.sh -v -Z
select exists (select * from information_schema.tables where table_schema = 'public' and table_name = 'command_entities');
t
SELECT command_id,command_type,root_command_id,command_parameters,command_params_class,created_at,status,return_value,return_value_class,executed FROM GetAllCommandsWithZombieTasks();
So only the letter "t" is printed, and it is not clear what it means.
I can run "/usr/share/ovirt-engine/setup/dbutils/taskcleaner.sh -vR", but I'm not sure it's not too destructive.
What is the problem with "vdsm-client Host getAllTasksInfo"?
Thanks in advance.
Andrei
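Two small notes that may help: the lone "t" is just psql printing boolean true for the exists() check, i.e. the command_entities table is present; and vdsm has related task verbs that may respond even when getAllTasksInfo fails, e.g. (a sketch, same syntax as the commands above):
sudo vdsm-client Host getAllTasks
sudo vdsm-client Host getAllTasksStatuses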
Host certificate expired
by Rob B
Hi,
We have an oVirt host in an 'Unassigned' state because its certificate has expired.
The ovirt events show...
VDSM host1 command Get Host Capabilities failed: PKIX path validation failed: java.security.cert.CertPathValidatorException: validity check failed
This is the only host in the cluster, and it has local storage, so I don't have any option to start the single VM elsewhere.
Is there a way to renew the certificate on this host? I have no option to put the host into maintenance mode and use 'Enroll Certificate', as it's in the Unassigned state.
The oVirt manager is running version: 4.4.10.7-1.el8
The oVirt host in the bad state is running: ovirt-host-4.4.9-2, vdsm-4.40.100.2-1.
Please let me know if you need any more info, and thanks in advance.
Rob
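Before any manual re-enrolment it is probably worth confirming exactly which certificate has expired; a read-only check on the host, assuming the standard vdsm PKI paths on EL8:
# expiry of the vdsm server certificate
openssl x509 -noout -enddate -in /etc/pki/vdsm/certs/vdsmcert.pem
# expiry of the libvirt client certificate vdsm uses
openssl x509 -noout -enddate -in /etc/pki/libvirt/clientcert.pem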
HostedEngine Restore woes
by simon@justconnect.ie
Hi All,
I've been asked to test the HE restore process but after taking a look at the documentation I'm afraid I'm none the wiser.
I thought there would be a simple 'restore in situ' option but it appears not.
My environments were built using Ansible with a hosted-engine .json answer file.
From what I've read so far, it appears that a new HE VM needs to be built, with new engine storage etc.
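For what it's worth, the documented flow does appear to be backup-and-redeploy rather than an in-place restore; roughly, with placeholder file names:
# on the old (or backed-up) engine VM:
engine-backup --mode=backup --file=engine-backup.tar.gz --log=backup.log
# then on a host, deploy a fresh hosted engine that restores the backup into new HE storage:
hosted-engine --deploy --restore-from-file=engine-backup.tar.gz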