New failure Gluster deploy: Set granual-entry-heal on --> Bricks down
by Charles Lam
Dear friends,
Thanks to Donald and Strahil, my earlier Gluster deploy issue was resolved by disabling multipath on the NVMe drives. The Gluster deployment is now failing on the three-node hyperconverged oVirt v4.3.3 deployment at:
TASK [gluster.features/roles/gluster_hci : Set granual-entry-heal on] **********
task path: /etc/ansible/roles/gluster.features/roles/gluster_hci/tasks/hci_volumes.yml:67
with:
"stdout": "One or more bricks could be down. Please execute the command
again after bringing all bricks online and finishing any pending heals\nVolume heal
failed."
Specifically:
TASK [gluster.features/roles/gluster_hci : Set granual-entry-heal on] **********
task path: /etc/ansible/roles/gluster.features/roles/gluster_hci/tasks/hci_volumes.yml:67
failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'engine',
'brick': '/gluster_bricks/engine/engine', 'arbiter': 0}) =>
{"ansible_loop_var": "item", "changed": true,
"cmd": ["gluster", "volume", "heal",
"engine", "granular-entry-heal", "enable"],
"delta": "0:00:10.112451", "end": "2020-12-18
19:50:22.818741", "item": {"arbiter": 0, "brick":
"/gluster_bricks/engine/engine", "volname": "engine"},
"msg": "non-zero return code", "rc": 107, "start":
"2020-12-18 19:50:12.706290", "stderr": "",
"stderr_lines": [], "stdout": "One or more bricks could be down.
Please execute the command again after bringing all bricks online and finishing any
pending heals\nVolume heal failed.", "stdout_lines": ["One or more
bricks could be down. Please execute the command again after bringing all bricks online
and finishing any pending heals", "Volume heal failed."]}
failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'data', 'brick':
'/gluster_bricks/data/data', 'arbiter': 0}) =>
{"ansible_loop_var": "item", "changed": true,
"cmd": ["gluster", "volume", "heal",
"data", "granular-entry-heal", "enable"], "delta":
"0:00:10.110165", "end": "2020-12-18 19:50:38.260277",
"item": {"arbiter": 0, "brick":
"/gluster_bricks/data/data", "volname": "data"},
"msg": "non-zero return code", "rc": 107, "start":
"2020-12-18 19:50:28.150112", "stderr": "",
"stderr_lines": [], "stdout": "One or more bricks could be down.
Please execute the command again after bringing all bricks online and finishing any
pending heals\nVolume heal failed.", "stdout_lines": ["One or more
bricks could be down. Please execute the command again after bringing all bricks online
and finishing any pending heals", "Volume heal failed."]}
failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'vmstore',
'brick': '/gluster_bricks/vmstore/vmstore', 'arbiter': 0}) =>
{"ansible_loop_var": "item", "changed": true,
"cmd": ["gluster", "volume", "heal",
"vmstore", "granular-entry-heal", "enable"],
"delta": "0:00:10.113203", "end": "2020-12-18
19:50:53.767864", "item": {"arbiter": 0, "brick":
"/gluster_bricks/vmstore/vmstore", "volname": "vmstore"},
"msg": "non-zero return code", "rc": 107, "start":
"2020-12-18 19:50:43.654661", "stderr": "",
"stderr_lines": [], "stdout": "One or more bricks could be down.
Please execute the command again after bringing all bricks online and finishing any
pending heals\nVolume heal failed.", "stdout_lines": ["One or more
bricks could be down. Please execute the command again after bringing all bricks online
and finishing any pending heals", "Volume heal failed."]}
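The rc 107 failures above all relay the same message: not every brick was reachable when granular-entry-heal was enabled, so the first step is to confirm brick state by hand (e.g. `gluster volume status engine` on a node) before re-running the deploy. Below is a minimal sketch of checking that programmatically; the XML layout and the embedded sample are assumptions modeled on typical `gluster volume status <vol> --xml` output, where `<status>1</status>` marks an online brick:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the shape of `gluster volume status engine --xml`;
# on a real node, feed the command's actual stdout in instead.
SAMPLE = """<cliOutput><volStatus><volumes><volume>
<volName>engine</volName>
<node><hostname>fmov1n1</hostname><path>/gluster_bricks/engine/engine</path><status>1</status></node>
<node><hostname>fmov1n2</hostname><path>/gluster_bricks/engine/engine</path><status>0</status></node>
<node><hostname>fmov1n3</hostname><path>/gluster_bricks/engine/engine</path><status>1</status></node>
</volume></volumes></volStatus></cliOutput>"""

def offline_bricks(xml_text):
    """Return (hostname, brick_path) pairs for bricks not reporting status 1."""
    down = []
    for node in ET.fromstring(xml_text).iter("node"):
        if node.findtext("status") != "1":
            down.append((node.findtext("hostname"), node.findtext("path")))
    return down

# Re-run the heal-enable task only once this comes back empty.
print(offline_bricks(SAMPLE))
```

If a brick stays down, its brick log under /var/log/glusterfs/bricks/ on that host is the next place to look.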
Any suggestions regarding troubleshooting, insight, or recommendations for reading are greatly appreciated. I apologize for all the email; I am only creating this as a separate thread because it is a new, presumably unrelated issue. I welcome any recommendations on how I can improve my forum etiquette.
Respectfully,
Charles
3 years, 2 months
Re: VM templates
by Strahil Nikolov
You should create a file like mine, because vdsm manages /etc/multipath.conf:
# cat /etc/multipath/conf.d/blacklist.conf
blacklist {
    devnode "*"
    wwid nvme.1cc1-324a31313230303131343036-414441544120535838323030504e50-00000001
    wwid TOSHIBA-TR200_Z7KB600SK46S
    wwid ST500NM0011_Z1M00LM7
    wwid WDC_WD5003ABYX-01WERA0_WD-WMAYP2303189
    wwid WDC_WD15EADS-00P8B0_WD-WMAVU0885453
    wwid WDC_WD5003ABYZ-011FA0_WD-WMAYP0F35PJ4
}
Keep in mind that 'devnode "*"' is OK only for a gluster-only machine.
Best Regards,
Strahil Nikolov
On Wed, Jan 27, 2021 at 6:02, Robert Tongue<phunyguy(a)neverserio.us> wrote: _______________________________________________
Users mailing list -- users(a)ovirt.org
To unsubscribe send an email to users-leave(a)ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/YD7ROMATPWF...
Re: VM templates
by Robert Tongue
Correction: the issue came back, but I fixed it again. The actual issue was multipathd; I had to set up device filters in /etc/multipath.conf:
blacklist {
    protocol "(scsi:adt|scsi:sbp)"
    devnode "^hd[a-z]"
    devnode "^sd[a-z]$"
    devnode "^sd[a-z]"
    devnode "^nvme0n1"
    devnode "^nvme0n1p$"
}
Probably overkill, but it works.
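Since the devnode entries are regexes, several of the patterns above overlap. A quick way to see which patterns fire for which device names, assuming multipath matches devnode regexes roughly like an unanchored `re.search` (a reasonable approximation here, since every pattern is `^`-anchored):

```python
import re

# The devnode patterns from the posted blacklist.
patterns = ["^hd[a-z]", "^sd[a-z]$", "^sd[a-z]", "^nvme0n1", "^nvme0n1p$"]

def matching_patterns(device):
    """Return the blacklist patterns that match a given device name."""
    return [p for p in patterns if re.search(p, device)]

print(matching_patterns("sda"))        # '^sd[a-z]' subsumes '^sd[a-z]$'
print(matching_patterns("sda3"))       # partitions still caught by '^sd[a-z]'
print(matching_patterns("nvme0n1p1"))  # caught by '^nvme0n1' alone
```

As far as I can tell, '^nvme0n1p$' never matches anything (no device is literally named nvme0n1p), so the list could likely be trimmed to '^hd[a-z]', '^sd[a-z]', and '^nvme0n1' with the same effect.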
________________________________
From: Robert Tongue <phunyguy(a)neverserio.us>
Sent: Tuesday, January 26, 2021 2:24 PM
To: users <users(a)ovirt.org>
Subject: Re: VM templates
I fixed my own issue, and for everyone else that may run into this, the issue was the fact that I created the first oVirt node VM inside VMware, and got it fully configured with all the software/disks/partitioning/settings, then cloned it to two more VMs. Then I ran the hosted-engine deployment and set up the cluster. I think it was because I used clones for each cluster node, and that confused things due to device/system identifiers.
I rebuilt all 3 node VMs from scratch, and everything works perfectly now.
Thanks for listening.
________________________________
From: Robert Tongue
Sent: Monday, January 25, 2021 10:03 AM
To: users <users(a)ovirt.org>
Subject: VM templates
Hello,
Another weird issue over here. I have the latest oVirt running inside VMware vCenter as a proof-of-concept/testing platform. Things are finally working well, for the most part; however, I am noticing strange behavior with templates and with VMs deployed from a template. Let me explain:
I created a basic Ubuntu Server VM, captured that VM as a template, then deployed 4 VMs from that template. The deployment went fine; however, I can only start 3 of the 4 VMs. If I shut down one of the 3 that I started, I can then start the one that refused to start, and then the one I JUST shut down will refuse to start. The error is:
VM test3 is down with error. Exit message: Bad volume specification {'device': 'disk', 'type': 'disk', 'diskType': 'file', 'specParams': {}, 'alias': 'ua-2dc7fbff-da30-485d-891f-03a0ed60fd0a', 'address': {'bus': '0', 'controller': '0', 'unit': '0', 'type': 'drive', 'target': '0'}, 'domainID': '804c6a0c-b246-4ccc-b3ab-dd4ceb819cea', 'imageID': '2dc7fbff-da30-485d-891f-03a0ed60fd0a', 'poolID': '3208bbce-5e04-11eb-9313-00163e281c6d', 'volumeID': 'f514ab22-07ae-40e4-9146-1041d78553fd', 'path': '/rhev/data-center/3208bbce-5e04-11eb-9313-00163e281c6d/804c6a0c-b246-4ccc-b3ab-dd4ceb819cea/images/2dc7fbff-da30-485d-891f-03a0ed60fd0a/f514ab22-07ae-40e4-9146-1041d78553fd', 'discard': True, 'format': 'cow', 'propagateErrors': 'off', 'cache': 'none', 'iface': 'scsi', 'name': 'sda', 'bootOrder': '1', 'serial': '2dc7fbff-da30-485d-891f-03a0ed60fd0a', 'index': 0, 'reqsize': '0', 'truesize': '2882392576', 'apparentsize': '3435134976'}.
The underlying storage is GlusterFS, self-managed outside of oVirt.
I can provide any logs needed, please let me know which. Thanks in advance.
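The "Bad volume specification" exit message embeds the whole disk spec as a Python-literal dict, so the fields worth checking on the gluster mount can be pulled out directly, e.g. whether the file at 'path' exists on every replica and, since 'format' is 'cow' (qcow2), whether `qemu-img info` on it shows an intact backing chain back to the template volume. A small sketch over a shortened copy of the dict above:

```python
import ast

# Shortened copy of the disk spec from the exit message above.
spec_text = """{'device': 'disk', 'format': 'cow',
 'domainID': '804c6a0c-b246-4ccc-b3ab-dd4ceb819cea',
 'imageID': '2dc7fbff-da30-485d-891f-03a0ed60fd0a',
 'volumeID': 'f514ab22-07ae-40e4-9146-1041d78553fd',
 'path': '/rhev/data-center/3208bbce-5e04-11eb-9313-00163e281c6d/804c6a0c-b246-4ccc-b3ab-dd4ceb819cea/images/2dc7fbff-da30-485d-891f-03a0ed60fd0a/f514ab22-07ae-40e4-9146-1041d78553fd'}"""

spec = ast.literal_eval(spec_text)   # the message is a Python literal, not JSON
print(spec["path"])      # file to stat / run qemu-img info against on the storage
print(spec["volumeID"])  # volume whose .meta file is also worth checking
```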
OVN and change of mgmt network
by Gianluca Cecchi
Hello,
I previously had OVN running on the engine (as the OVN provider, with northd and the northbound and southbound DBs) and on the hosts (with ovn-controller). After changing the mgmt IP of the hosts (the engine has instead retained the same IP), I executed this command on them again:
vdsm-tool ovn-config <ip_of_engine> <local_ip_of_host>
Now I think I have to clean up some things, e.g.:
1) On the engine, where I get these lines:
systemctl status ovn-northd.service -l
. . .
Sep 29 14:41:42 ovmgr1 ovsdb-server[940]: ovs|00005|reconnect|ERR|tcp:
10.4.167.40:37272: no response to inactivity probe after 5 seconds,
disconnecting
Oct 03 11:52:00 ovmgr1 ovsdb-server[940]: ovs|00006|reconnect|ERR|tcp:
10.4.167.41:52078: no response to inactivity probe after 5 seconds,
disconnecting
The two IPs are the old ones of two hosts
It seems that a restart of the services has fixed it...
Can anyone confirm if I have to do anything else?
2) On the hosts (there are 3 hosts with OVN, on IPs 10.4.192.32/33/34), where I currently have this output:
[root@ov301 ~]# ovs-vsctl show
3a38c5bb-0abf-493d-a2e6-345af8aedfe3
Bridge br-int
fail_mode: secure
Port "ovn-1dce5b-0"
Interface "ovn-1dce5b-0"
type: geneve
options: {csum="true", key=flow, remote_ip="10.4.192.32"}
Port "ovn-ddecf0-0"
Interface "ovn-ddecf0-0"
type: geneve
options: {csum="true", key=flow, remote_ip="10.4.192.33"}
Port "ovn-fd413b-0"
Interface "ovn-fd413b-0"
type: geneve
options: {csum="true", key=flow, remote_ip="10.4.168.74"}
Port br-int
Interface br-int
type: internal
ovs_version: "2.7.2"
[root@ov301 ~]#
The IPs of the form 10.4.192.x are OK, but there is a leftover from an old host I initially used for tests, corresponding to 10.4.168.74, which doesn't exist anymore.
How can I clean records for 1) and 2)?
Thanks,
Gianluca
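For 2), the geneve tunnel ports are created by ovn-controller from Chassis records in the OVN southbound database, so the cleanup normally happens on the engine: `ovn-sbctl show` to list the chassis, then `ovn-sbctl chassis-del <chassis>` for the entry still pointing at 10.4.168.74 (the same stale chassis records are the likely source of the reconnect errors in 1)). To spot stale endpoints, here is a sketch that scans `ovs-vsctl show`-style text for remote_ips outside the expected host set; the expected set is taken from the post, and the parsing assumes the output format pasted above:

```python
import re

EXPECTED = {"10.4.192.32", "10.4.192.33", "10.4.192.34"}  # current OVN hosts

def stale_remote_ips(ovs_show_output):
    """Return remote_ips of geneve tunnel ports that point at unknown hosts."""
    ips = re.findall(r'remote_ip="([0-9.]+)"', ovs_show_output)
    return sorted(set(ips) - EXPECTED)

sample = '''
options: {csum="true", key=flow, remote_ip="10.4.192.32"}
options: {csum="true", key=flow, remote_ip="10.4.192.33"}
options: {csum="true", key=flow, remote_ip="10.4.168.74"}
'''
print(stale_remote_ips(sample))
```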
Replaced host but ovn complains of duplicate port
by Kevin Doyle
I had to rebuild a host. I first removed it from oVirt, then reinstalled the OS and reused the same 192.xxx.xxx.207 IP as the old host. I then added it back to oVirt, which seemed to go OK. However, I am seeing lots of OVN errors complaining that there was an existing port for the same IP:
The old port is ovn-47cc88-0; the new port is ovn-af7f78-0.
ovs-vsctl show
34c43e58-46f5-4217-8f2e-5801e1f2b9de
Bridge br-int
fail_mode: secure
Port "ovn-24b972-0"
Interface "ovn-24b972-0"
type: geneve
options: {csum="true", key=flow, remote_ip="192.xxx.xxx.204"}
Port "ovn-ec1bbd-0"
Interface "ovn-ec1bbd-0"
type: geneve
options: {csum="true", key=flow, remote_ip="192.xxx.xxx.201"}
Port "ovn-47cc88-0"
Interface "ovn-47cc88-0"
type: geneve
options: {csum="true", key=flow, remote_ip="192.xxx.xxx.207"}
Port "ovn-d1e09d-0"
Interface "ovn-d1e09d-0"
type: geneve
options: {csum="true", key=flow, remote_ip="192.xxx.xxx.203"}
Port "ovn-f5ded7-0"
Interface "ovn-f5ded7-0"
type: geneve
options: {csum="true", key=flow, remote_ip="192.xxx.xxx.202"}
Port "ovn-1cb1b0-0"
Interface "ovn-1cb1b0-0"
type: geneve
options: {csum="true", key=flow, remote_ip="192.xxx.xxx.206"}
Port br-int
Interface br-int
type: internal
Port "ovn-af7f78-0"
Interface "ovn-af7f78-0"
type: geneve
options: {csum="true", key=flow, remote_ip="192.xxx.xxx.207"}
error: "could not add network device ovn-af7f78-0 to ofproto (File exists)"
ovs_version: "2.10.1"
/var/log/ovs-vswitchd.log
tunnel|WARN|ovn-af7f78-0: attempting to add tunnel port with same config as port 'ovn-47cc88-0' (::->192.xxx.xxx.207, key=flow, legacy_l2, dp port=2)
2021-01-26T18:03:38.128Z|69740|ofproto|WARN|br-int: could not add port ovn-af7f78-0 (File exists)
2021-01-26T18:03:38.128Z|69741|bridge|WARN|could not add network device ovn-af7f78-0 to ofproto (File exists)
/var/log/messages
Jan 26 18:03:37 xxxx05 kernel: device genev_sys_6081 entered promiscuous mode
Jan 26 18:03:37 xxxx05 kernel: i40e 0000:3d:00.0 eno3: UDP port 6081 was not found, not deleting
Jan 26 18:03:37 xxxx05 kernel: i40e 0000:3d:00.1 eno4: UDP port 6081 was not found, not deleting
Jan 26 18:03:37 xxxx05 kernel: i40e 0000:3d:00.2 eno5: UDP port 6081 was not found, not deleting
Jan 26 18:03:37 xxxx05 kernel: i40e 0000:3d:00.3 eno6: UDP port 6081 was not found, not deleting
I can bring the host 192.xxx.xxx.207 down and the errors stop on the other hosts, but OVN still has a port defined for 192.xxx.xxx.207. The question I have is: how do I delete the old port definition?
Regards
Kevin
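A rebuilt host registers as a new chassis in the OVN southbound database, but the old Chassis record with the same encap IP stays behind, which is why both ovn-47cc88-0 and ovn-af7f78-0 point at the same address. Deleting the stale record on the OVN central host (typically `ovn-sbctl show` to find it, then `ovn-sbctl chassis-del <old-chassis>`) should make ovn-controller drop the old port everywhere. Below is a sketch that finds remote_ips claimed by more than one tunnel port; the parsing assumes the `ovs-vsctl show` layout above, and the sample uses placeholder 192.0.2.x addresses in place of the masked ones:

```python
import re
from collections import defaultdict

def duplicate_tunnels(ovs_show_output):
    """Map remote_ip -> tunnel port names, keeping only IPs claimed by >1 port."""
    port = None
    by_ip = defaultdict(list)
    for line in ovs_show_output.splitlines():
        m = re.search(r'Port\s+"?([\w.-]+)"?', line)
        if m:
            port = m.group(1)
        m = re.search(r'remote_ip="([\d.]+)"', line)
        if m and port:
            by_ip[m.group(1)].append(port)
    return {ip: ports for ip, ports in by_ip.items() if len(ports) > 1}

sample = '''
        Port "ovn-47cc88-0"
            options: {csum="true", key=flow, remote_ip="192.0.2.207"}
        Port "ovn-af7f78-0"
            options: {csum="true", key=flow, remote_ip="192.0.2.207"}
        Port "ovn-24b972-0"
            options: {csum="true", key=flow, remote_ip="192.0.2.204"}
'''
print(duplicate_tunnels(sample))
```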
Is there a way to reserve a MAC address?
by Michal Mocnak
Hey Guys,
is there a way to reserve a MAC address for VMs created from a pool? I have a workaround via an Ansible playbook which works fine, but I am wondering if there is any other solution to achieve this.
And by the way, if I use a predefined instance type other than Custom, there are no network interfaces in the newly created machines. Is that the right behavior?
Thank you so much !! Cheers
-m