Gluster problems, cluster performance issues
by Jim Kusznir
Hello:
I've been having some cluster and gluster performance issues lately. I
also found that my cluster was out of date and tried to apply updates
(hoping to fix some of these), only to discover that the oVirt 4.1 repos had
been taken completely offline. So I was forced to begin an upgrade to 4.2.
According to the docs I found/read, I needed only to add the new repo, do a
yum update, reboot, and be good on my hosts (I did the yum update on the
hosts and ran engine-setup on my hosted engine). Things seemed to work
relatively well, except for a gluster sync issue that showed up.
My cluster is a 3-node hyperconverged cluster. I upgraded the hosted
engine first, then host 3. When host 3 came back up, for some reason
one of my gluster volumes would not sync. Here's sample output:
[root@ovirt3 ~]# gluster volume heal data-hdd info
Brick 172.172.1.11:/gluster/brick3/data-hdd
/cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/48d7ecb8-7ac5-4725-bca5-b3519681cf2f/0d6080b0-7018-4fa3-bb82-1dd9ef07d9b9
/cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/647be733-f153-4cdc-85bd-ba72544c2631/b453a300-0602-4be1-8310-8bd5abe00971
/cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/6da854d1-b6be-446b-9bf0-90a0dbbea830/3c93bd1f-b7fa-4aa2-b445-6904e31839ba
/cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/7f647567-d18c-44f1-a58e-9b8865833acb/f9364470-9770-4bb1-a6b9-a54861849625
/cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/f3c8e7aa-6ef2-42a7-93d4-e0a4df6dd2fa/2eb0b1ad-2606-44ef-9cd3-ae59610a504b
/cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/b1ea3f62-0f05-4ded-8c82-9c91c90e0b61/d5d6bf5a-499f-431d-9013-5453db93ed32
/cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/8c8b5147-e9d6-4810-b45b-185e3ed65727/16f08231-93b0-489d-a2fd-687b6bf88eaa
/cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/12924435-b9c2-4aab-ba19-1c1bc31310ef/07b3db69-440e-491e-854c-bbfa18a7cff2
Status: Connected
Number of entries: 8
Brick 172.172.1.12:/gluster/brick3/data-hdd
/cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/48d7ecb8-7ac5-4725-bca5-b3519681cf2f/0d6080b0-7018-4fa3-bb82-1dd9ef07d9b9
/cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/647be733-f153-4cdc-85bd-ba72544c2631/b453a300-0602-4be1-8310-8bd5abe00971
/cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/b1ea3f62-0f05-4ded-8c82-9c91c90e0b61/d5d6bf5a-499f-431d-9013-5453db93ed32
/cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/6da854d1-b6be-446b-9bf0-90a0dbbea830/3c93bd1f-b7fa-4aa2-b445-6904e31839ba
/cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/7f647567-d18c-44f1-a58e-9b8865833acb/f9364470-9770-4bb1-a6b9-a54861849625
/cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/8c8b5147-e9d6-4810-b45b-185e3ed65727/16f08231-93b0-489d-a2fd-687b6bf88eaa
/cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/12924435-b9c2-4aab-ba19-1c1bc31310ef/07b3db69-440e-491e-854c-bbfa18a7cff2
/cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/f3c8e7aa-6ef2-42a7-93d4-e0a4df6dd2fa/2eb0b1ad-2606-44ef-9cd3-ae59610a504b
Status: Connected
Number of entries: 8
Brick 172.172.1.13:/gluster/brick3/data-hdd
/cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/b1ea3f62-0f05-4ded-8c82-9c91c90e0b61/d5d6bf5a-499f-431d-9013-5453db93ed32
/cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/8c8b5147-e9d6-4810-b45b-185e3ed65727/16f08231-93b0-489d-a2fd-687b6bf88eaa
/cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/12924435-b9c2-4aab-ba19-1c1bc31310ef/07b3db69-440e-491e-854c-bbfa18a7cff2
/cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/f3c8e7aa-6ef2-42a7-93d4-e0a4df6dd2fa/2eb0b1ad-2606-44ef-9cd3-ae59610a504b
/cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/647be733-f153-4cdc-85bd-ba72544c2631/b453a300-0602-4be1-8310-8bd5abe00971
/cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/48d7ecb8-7ac5-4725-bca5-b3519681cf2f/0d6080b0-7018-4fa3-bb82-1dd9ef07d9b9
/cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/6da854d1-b6be-446b-9bf0-90a0dbbea830/3c93bd1f-b7fa-4aa2-b445-6904e31839ba
/cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/7f647567-d18c-44f1-a58e-9b8865833acb/f9364470-9770-4bb1-a6b9-a54861849625
Status: Connected
Number of entries: 8
---------
It's been in this state for a couple of days now, and bandwidth monitoring
shows no appreciable data moving. I've tried repeatedly commanding a full
heal from all three nodes in the cluster. It's always the same files that
need healing.
When running gluster volume heal data-hdd statistics, I sometimes see
different information, but always some number of "heal failed" entries. It
shows 0 for split-brain.
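For reference, these are the commands I keep issuing; below is a rough Python
sketch that just wraps them so the output can be captured for comparison
(the volume name is from my setup, the wrapper itself is only illustrative):

import subprocess

VOLUME = "data-hdd"
commands = [
    ["gluster", "volume", "heal", VOLUME, "full"],                 # trigger a full heal
    ["gluster", "volume", "heal", VOLUME, "info"],                 # entries still pending per brick
    ["gluster", "volume", "heal", VOLUME, "statistics"],           # crawl stats, including "heal failed"
    ["gluster", "volume", "heal", VOLUME, "info", "split-brain"],  # should list nothing
]
for cmd in commands:
    print("### " + " ".join(cmd))
    print(subprocess.check_output(cmd).decode())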
I'm not quite sure what to do. I suspect it may be due to nodes 1 and 2
still being on the older ovirt/gluster release, but I'm afraid to upgrade
and reboot them until I have a good gluster sync (I don't want to create a
split-brain issue). How do I proceed with this?
Second issue: I've been experiencing VERY POOR performance on most of my
VMs, to the point that logging into a Windows 10 VM via Remote Desktop can
take 5 minutes, and launching QuickBooks inside said VM can easily take 10
minutes. On some Linux VMs, I get random messages like this:
Message from syslogd@unifi at May 28 20:39:23 ...
kernel:[6171996.308904] NMI watchdog: BUG: soft lockup - CPU#0 stuck for
22s! [mongod:14766]
(the process and PID are often different)
I'm not quite sure what to do about this either. My initial thought was to
upgrade everything to current and see if it's still there, but I cannot move
forward with that until my gluster is healed...
Thanks!
--Jim
logical_name of virtual machine disk not seen in the ovirt engine
by Punaatua PK
Hello,
I used the vm_backup script as provided here: https://github.com/oVirt/ovirt-engine-sdk/blob/master/sdk/examples/vm_bac...
I understand the process to back up a VM. I'm stuck at getting the logical_name of the disk when the snapshot disk is attached to a VM.
I checked the flow like this:
- The disk is attached to the VM.
- The oVirt guest agent detects the new disk, and the mapping is visible in the logs (I set those logs to DEBUG in /etc/ovirt-guest-agent.conf); I also reduced report_disk_usage to 10 to speed up the process.
- VDSM on the host gets the info from the oVirt guest agent, as seen by running the following command: vdsm-client Host getVMFullList
- On the engine, the logical_name is empty, as seen with the following SQL query: select device, type, device_id, logical_name from vm_device where type='disk' and vm_id='XXX';
ovirt-engine-4.2.3.5-1.el7.centos.noarch
vdsm-4.20.27.1-1.el7.centos.x86_64
ovirt-guest-agent-common-1.0.14-1.el7.noarch
Do you have an idea? Is the information requested by the engine from VDSM, or does VDSM report it to the engine? What is the flow that gets logical_name populated in the engine DB and exposed through the Python SDK?
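For what it's worth, this is roughly how I read the value back through the Python SDK (just a sketch; the engine URL, credentials and VM name below are placeholders for my setup):

import ovirtsdk4 as sdk

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',  # placeholder
    username='admin@internal',
    password='***',
    ca_file='ca.pem',
)
vms_service = connection.system_service().vms_service()
vm = vms_service.list(search='name=myvm')[0]  # placeholder VM name
for att in vms_service.vm_service(vm.id).disk_attachments_service().list():
    # logical_name should be filled in from the guest agent mapping (e.g. /dev/vdb),
    # but for the attached snapshot disk it comes back empty here.
    print(att.disk.id, att.logical_name)
connection.close()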
I can provide logs if needed. I just don't know how to enable debug logging on VDSM (I will take a look at this).
Ovirt consulting
by Jonathan Del Campo
Hello,
I am looking for a specialized oVirt partner in France to carry out a
migration of our system from 3.x to 4.x.
Can someone recommend an oVirt service partner to perform this kind of
operation?
Many thanks.
Jonathan
ovirt 4.2.3 console.vv provides IP addr, not FQHN, for host= (spice connection)
by Oliver Riesener
Hi,
I recently upgraded to ovirt 4.2.3-[78] and noticed that the spice
client now gets and uses
host=IPAddr
from the received console.vv file and not
host=FQHN
of the virtualization host.
I need the FQHN because I access the oVirt web UI remotely via SSH with port
forwarding.
I added a "127.0.0.1 FQHN" entry in /etc/hosts on the client side to reach the
forwarded ports.
Could anyone please provide a < 4.2.3 console.vv file?
How can I change the behavior back to FQHN?
How can I re-enable the WebSocket Proxy? It seems to be permanently set to False in engine-setup.
Greetings
Oliver
engine UI add user failed
by dhy336@sina.com
Hi, I want to add a user via the engine UI, but it failed. Do I need another software package, or what else should I do?
how to use the oVirt engine API to retrieve the list of VMs a disk is attached to, given a disk id
by iterjpnic
Hi all,
I use the oVirt engine API v4.2 to implement the Terraform oVirt provider. I want to check whether a Disk has been attached to a VM, so I need to find all VMs that this disk is attached to.
But after checking the GET response data from the "/ovirt-engine/api/disks/<disk-id>" REST URL, there are no disk-attachment or VM-related properties or links. I can work around this with the following steps (sketched below):
1. Get all VMs.
2. Get all disk attachments of each VM.
3. Check whether the given disk id equals the `disk` property of each disk attachment.
4. If it matches, append the VM to the result list.
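Here is that trade-off expressed with the Python SDK, just as a rough sketch (the connection details and disk id are placeholders):

import ovirtsdk4 as sdk

def vms_using_disk(connection, disk_id):
    vms_service = connection.system_service().vms_service()
    result = []
    for vm in vms_service.list():  # 1. get all VMs
        attachments = vms_service.vm_service(vm.id).disk_attachments_service().list()  # 2. attachments per VM
        if any(att.disk.id == disk_id for att in attachments):  # 3. compare disk ids
            result.append(vm)  # 4. collect matching VMs
    return result

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',  # placeholder
    username='admin@internal',
    password='***',
    ca_file='ca.pem',
)
print([vm.name for vm in vms_using_disk(connection, '<disk-id>')])
connection.close()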
Is there any simpler and smarter way to get this? Thanks.