New setup - Failing to Activate storage domain on NFS shared storage
by Matt Snow
I installed oVirt Node 4.4.4 as well as 4.4.5-pre and experience the same problem with both versions. The issue occurs with both the cockpit UI and a tmux'd CLI run of ovirt-hosted-engine-setup. I get past the point where the VM is created and running.
I tried to do some debugging on my own before reaching out to this list. Any help is much appreciated!
oVirt node hardware: NUC-format Jetway w/ Intel N3160 (Braswell, 4 cores/4 threads), 8GB RAM, 64GB SSD. I understand this is underspec'd, but I believe it meets the minimum requirements.
NFS server:
* Ubuntu 19.10 w/ ZFS share w/ 17TB available space.
* NFS share settings are just 'rw=@172.16.1.0/24', but I have also tried 'rw,sec=sys,anon=0' and '@172.16.1.0/24,insecure'.
* The target directory is always empty and chown'd 36:36 with 0755 permissions.
* I have tried using both IP addresses and DNS names. Forward and reverse DNS works from the oVirt host and other systems on the network.
* The NFS share always gets mounted successfully on the ovirt node system.
* I have tried the auto and v3 NFS versions in various other combinations.
* I have also tried setting up an NFS server on a non-ZFS backed storage system that is open to any host and get the same errors as shown below.
* I ran the nfs-check.py script without issue against both NFS servers and followed the other verification steps listed at https://www.ovirt.org/develop/troubleshooting-nfs-storage-issues.html (a sample manual check is sketched below).
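For reference, this is the kind of manual check I mean: a write test on the oVirt node as the vdsm user (uid 36), using the hostname and export path from the setup snippet below and an arbitrary temporary mount point:

  mkdir -p /mnt/nfstest
  mount -t nfs stumpy.mydomain.com:/tanker/ovirt/host_storage /mnt/nfstest
  sudo -u vdsm touch /mnt/nfstest/write_test    # should succeed and leave a file owned by 36:36
  sudo -u vdsm rm /mnt/nfstest/write_test
  umount /mnt/nfstest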
***Snip from ovirt-hosted-engine-setup***
Please specify the storage you would like to use (glusterfs, iscsi, fc, nfs)[nfs]: nfs
Please specify the nfs version you would like to use (auto, v3, v4, v4_0, v4_1, v4_2)[auto]:
Please specify the full shared storage connection path to use (example: host:/path): stumpy.mydomain.com:/tanker/ovirt/host_storage
If needed, specify additional mount options for the connection to the hosted-engine storagedomain (example: rsize=32768,wsize=32768) []: rw
[ INFO ] Creating Storage Domain
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Execute just a specific set of steps]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Force facts gathering]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Wait for the storage interface to be up]
[ INFO ] skipping: [localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Check local VM dir stat]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Enforce local VM dir existence]
[ INFO ] skipping: [localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : include_tasks]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Obtain SSO token using username/password credentials]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Fetch host facts]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Fetch cluster ID]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Fetch cluster facts]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Fetch Datacenter facts]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Fetch Datacenter ID]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Fetch Datacenter name]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Add NFS storage domain]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Add glusterfs storage domain]
[ INFO ] skipping: [localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Add iSCSI storage domain]
[ INFO ] skipping: [localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Add Fibre Channel storage domain]
[ INFO ] skipping: [localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Get storage domain details]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Find the appliance OVF]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Parse OVF]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Get required size]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Remove unsuitable storage domain]
[ INFO ] skipping: [localhost]
[ INFO ] skipping: [localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Check storage domain free space]
[ INFO ] skipping: [localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Activate storage domain]
[ ERROR ] ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is "[]". HTTP response code is 400.
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "Fault reason is \"Operation Failed\". Fault detail is \"[]\". HTTP response code is 400."}
Please specify the storage you would like to use (glusterfs, iscsi, fc, nfs)[nfs]:
***End snippet**
The relevant section from /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-ansible-create_storage_domain-20210116195220-bnfg1w.log:
**Snip***
2021-01-16 19:53:56,010-0700 DEBUG ansible on_any args <ansible.executor.task_result.TaskResult object at 0x7f11a04c8320> kwargs
2021-01-16 19:53:57,219-0700 INFO ansible task start {'status': 'OK', 'ansible_type': 'task', 'ansible_playbook': '/usr/share/ovirt-hosted-engine-setup/ansible/trigger_role.yml', 'ansible_task': 'ovirt.ovirt.hosted_engine_setup : Activate storage domain'}
2021-01-16 19:53:57,220-0700 DEBUG ansible on_any args TASK: ovirt.ovirt.hosted_engine_setup : Activate storage domain kwargs is_conditional:False
2021-01-16 19:53:57,221-0700 DEBUG ansible on_any args localhost TASK: ovirt.ovirt.hosted_engine_setup : Activate storage domain kwargs
2021-01-16 19:54:00,346-0700 DEBUG var changed: host "localhost" var "ansible_play_hosts" type "<class 'list'>" value: "[]"
2021-01-16 19:54:00,347-0700 DEBUG var changed: host "localhost" var "ansible_play_batch" type "<class 'list'>" value: "[]"
2021-01-16 19:54:00,348-0700 DEBUG var changed: host "localhost" var "play_hosts" type "<class 'list'>" value: "[]"
2021-01-16 19:54:00,349-0700 ERROR ansible failed {
"ansible_host": "localhost",
"ansible_playbook": "/usr/share/ovirt-hosted-engine-setup/ansible/trigger_role.yml",
"ansible_result": {
"_ansible_no_log": false,
"changed": false,
"exception": "Traceback (most recent call last):\n File \"/tmp/ansible_ovirt_storage_domain_payload_f38n25ab/ansible_ovirt_storage_domain_payload.zip/ansible_collections/ovirt/ovirt/plugins/modules/ovirt_storage_domain.py\", line 783, in main\n File \"/tmp/ansible_ovirt_storage_domain_payload_f38n25ab/ansible_ovirt_storage_domain_payload.zip/ansible_collections/ovirt/ovirt/plugins/modules/ovirt_storage_domain.py\", line 638, in post_create_check\n File \"/usr/lib64/python3.6/site-packages/ovirtsdk4/services.py\", line 3647, in add\n return self._internal_add(storage_domain, headers, query, wait)\n File \"/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py\", line 232, in _internal_add\n return future.wait() if wait else future\n File \"/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py\", line 55, in wait\n return self._code(response)\n File \"/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py\", line 229, in callback\n self._check_fault(respon
se)\n File \"/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py\", line 132, in _check_fault\n self._raise_error(response, body)\n File \"/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py\", line 118, in _raise_error\n raise error\novirtsdk4.Error: Fault reason is \"Operation Failed\". Fault detail is \"[]\". HTTP response code is 400.\n",
"invocation": {
"module_args": {
"backup": null,
"comment": null,
"critical_space_action_blocker": null,
"data_center": "Default",
"description": null,
"destroy": null,
"discard_after_delete": null,
"domain_function": "data",
"fcp": null,
"fetch_nested": false,
"format": null,
"glusterfs": null,
"host": "brick.mydomain.com",
"id": null,
"iscsi": null,
"localfs": null,
"managed_block_storage": null,
"name": "hosted_storage",
"nested_attributes": [],
"nfs": null,
"poll_interval": 3,
"posixfs": null,
"state": "present",
"timeout": 180,
"wait": true,
"warning_low_space": null,
"wipe_after_delete": null
}
},
"msg": "Fault reason is \"Operation Failed\". Fault detail is \"[]\". HTTP response code is 400."
},
"ansible_task": "Activate storage domain",
"ansible_type": "task",
"status": "FAILED",
"task_duration": 4
}
2021-01-16 19:54:00,350-0700 DEBUG ansible on_any args <ansible.executor.task_result.TaskResult object at 0x7f11a05f2d30> kwargs ignore_errors:None
2021-01-16 19:54:00,358-0700 INFO ansible stats {
"ansible_playbook": "/usr/share/ovirt-hosted-engine-setup/ansible/trigger_role.yml",
"ansible_playbook_duration": "01:35 Minutes",
"ansible_result": "type: <class 'dict'>\nstr: {'localhost': {'ok': 23, 'failures': 1, 'unreachable': 0, 'changed': 0, 'skipped': 7, 'rescued': 0, 'ignored': 0}}",
"ansible_type": "finish",
"status": "FAILED"
}
2021-01-16 19:54:00,359-0700 INFO SUMMARY:
Duration Task Name
-------- --------
[ < 1 sec ] Execute just a specific set of steps
[ 00:05 ] Force facts gathering
[ 00:03 ] Check local VM dir stat
[ 00:04 ] Obtain SSO token using username/password credentials
[ 00:04 ] Fetch host facts
[ 00:02 ] Fetch cluster ID
[ 00:04 ] Fetch cluster facts
[ 00:04 ] Fetch Datacenter facts
[ 00:02 ] Fetch Datacenter ID
[ 00:02 ] Fetch Datacenter name
[ 00:04 ] Add NFS storage domain
[ 00:04 ] Get storage domain details
[ 00:03 ] Find the appliance OVF
[ 00:03 ] Parse OVF
[ 00:02 ] Get required size
[ FAILED ] Activate storage domain
2021-01-16 19:54:00,359-0700 DEBUG ansible on_any args <ansible.executor.stats.AggregateStats object at 0x7f11a2e3b8d0> kwargs
**End snip**
The relevant section from /var/log/vdsm/vdsm.log:
***begin snip***
2021-01-16 19:53:58,439-0700 INFO (vmrecovery) [vdsm.api] START getConnectedStoragePoolsList(options=None) from=internal, task_id=b8b21668-189e-4b68-a7f0-c2d2ebf14546 (api:48)
2021-01-16 19:53:58,439-0700 INFO (vmrecovery) [vdsm.api] FINISH getConnectedStoragePoolsList return={'poollist': []} from=internal, task_id=b8b21668-189e-4b68-a7f0-c2d2ebf14546 (api:54)
2021-01-16 19:53:58,440-0700 INFO (vmrecovery) [vds] recovery: waiting for storage pool to go up (clientIF:726)
2021-01-16 19:53:58,885-0700 INFO (jsonrpc/3) [vdsm.api] START connectStorageServer(domType=1, spUUID='00000000-0000-0000-0000-000000000000', conList=[{'password': '********', 'protocol_version': 'auto', 'port': '', 'iqn': '', 'connection': 'stumpy:/tanker/ovirt/host_storage', 'ipv6_enabled': 'false', 'id': '3ffd1e3b-168e-4248-a2af-b28fbdf49eef', 'user': '', 'tpgt': '1'}], options=None) from=::ffff:192.168.222.53,41192, flow_id=592e278f, task_id=5bd52fa3-f790-4ed3-826d-c1f51e5f2291 (api:48)
2021-01-16 19:53:58,892-0700 INFO (jsonrpc/3) [storage.StorageDomainCache] Invalidating storage domain cache (sdc:74)
2021-01-16 19:53:58,892-0700 INFO (jsonrpc/3) [vdsm.api] FINISH connectStorageServer return={'statuslist': [{'id': '3ffd1e3b-168e-4248-a2af-b28fbdf49eef', 'status': 0}]} from=::ffff:192.168.222.53,41192, flow_id=592e278f, task_id=5bd52fa3-f790-4ed3-826d-c1f51e5f2291 (api:54)
2021-01-16 19:53:58,914-0700 INFO (jsonrpc/6) [vdsm.api] START getStorageDomainInfo(sdUUID='54532dd4-3e5b-4885-b88e-599c81efb146', options=None) from=::ffff:192.168.222.53,41192, flow_id=592e278f, task_id=4d5a352d-6096-45b7-a4ee-b6e08ac02f7b (api:48)
2021-01-16 19:53:58,914-0700 INFO (jsonrpc/6) [storage.StorageDomain] sdUUID=54532dd4-3e5b-4885-b88e-599c81efb146 (fileSD:535)
2021-01-16 19:53:58,918-0700 INFO (jsonrpc/6) [vdsm.api] FINISH getStorageDomainInfo error=Domain is either partially accessible or entirely inaccessible: ('54532dd4-3e5b-4885-b88e-599c81efb146',) from=::ffff:192.168.222.53,41192, flow_id=592e278f, task_id=4d5a352d-6096-45b7-a4ee-b6e08ac02f7b (api:52)
2021-01-16 19:53:58,918-0700 ERROR (jsonrpc/6) [storage.TaskManager.Task] (Task='4d5a352d-6096-45b7-a4ee-b6e08ac02f7b') Unexpected error (task:880)
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 887, in _run
return fn(*args, **kargs)
File "<decorator-gen-131>", line 2, in getStorageDomainInfo
File "/usr/lib/python3.6/site-packages/vdsm/common/api.py", line 50, in method
ret = func(*args, **kwargs)
File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 2796, in getStorageDomainInfo
dom = self.validateSdUUID(sdUUID)
File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 312, in validateSdUUID
sdDom.validate()
File "/usr/lib/python3.6/site-packages/vdsm/storage/fileSD.py", line 538, in validate
raise se.StorageDomainAccessError(self.sdUUID)
vdsm.storage.exception.StorageDomainAccessError: Domain is either partially accessible or entirely inaccessible: ('54532dd4-3e5b-4885-b88e-599c81efb146',)
2021-01-16 19:53:58,918-0700 INFO (jsonrpc/6) [storage.TaskManager.Task] (Task='4d5a352d-6096-45b7-a4ee-b6e08ac02f7b') aborting: Task is aborted: "value=Domain is either partially accessible or entirely inaccessible: ('54532dd4-3e5b-4885-b88e-599c81efb146',) abortedcode=379" (task:1190)
2021-01-16 19:53:58,918-0700 ERROR (jsonrpc/6) [storage.Dispatcher] FINISH getStorageDomainInfo error=Domain is either partially accessible or entirely inaccessible: ('54532dd4-3e5b-4885-b88e-599c81efb146',) (dispatcher:83)
2021-01-16 19:53:58,919-0700 INFO (jsonrpc/6) [jsonrpc.JsonRpcServer] RPC call StorageDomain.getInfo failed (error 379) in 0.00 seconds (__init__:312)
2021-01-16 19:53:58,956-0700 INFO (jsonrpc/1) [vdsm.api] START createStoragePool(poolType=None, spUUID='91fb3f22-5795-11eb-ad9f-00163e3f4683', poolName='Default', masterDom='54532dd4-3e5b-4885-b88e-599c81efb146', domList=['54532dd4-3e5b-4885-b88e-599c81efb146'], masterVersion=7, lockPolicy=None, lockRenewalIntervalSec=5, leaseTimeSec=60, ioOpTimeoutSec=10, leaseRetries=3, options=None) from=::ffff:192.168.222.53,41192, flow_id=592e278f, task_id=a946357f-537a-4c9c-9040-e70a98ee5643 (api:48)
2021-01-16 19:53:58,958-0700 INFO (jsonrpc/1) [storage.StoragePool] updating pool 91fb3f22-5795-11eb-ad9f-00163e3f4683 backend from type NoneType instance 0x7f86e45f89d0 to type StoragePoolDiskBackend instance 0x7f8680738408 (sp:157)
2021-01-16 19:53:58,958-0700 INFO (jsonrpc/1) [storage.StoragePool] spUUID=91fb3f22-5795-11eb-ad9f-00163e3f4683 poolName=Default master_sd=54532dd4-3e5b-4885-b88e-599c81efb146 domList=['54532dd4-3e5b-4885-b88e-599c81efb146'] masterVersion=7 {'LEASERETRIES': 3, 'LEASETIMESEC': 60, 'LOCKRENEWALINTERVALSEC': 5, 'IOOPTIMEOUTSEC': 10} (sp:602)
2021-01-16 19:53:58,958-0700 INFO (jsonrpc/1) [storage.StorageDomain] sdUUID=54532dd4-3e5b-4885-b88e-599c81efb146 (fileSD:535)
2021-01-16 19:53:58,963-0700 ERROR (jsonrpc/1) [storage.StoragePool] Unexpected error (sp:618)
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/vdsm/storage/sp.py", line 613, in create
domain.validate()
File "/usr/lib/python3.6/site-packages/vdsm/storage/fileSD.py", line 538, in validate
raise se.StorageDomainAccessError(self.sdUUID)
vdsm.storage.exception.StorageDomainAccessError: Domain is either partially accessible or entirely inaccessible: ('54532dd4-3e5b-4885-b88e-599c81efb146',)
2021-01-16 19:53:58,963-0700 INFO (jsonrpc/1) [vdsm.api] FINISH createStoragePool error=Domain is either partially accessible or entirely inaccessible: ('54532dd4-3e5b-4885-b88e-599c81efb146',) from=::ffff:192.168.222.53,41192, flow_id=592e278f, task_id=a946357f-537a-4c9c-9040-e70a98ee5643 (api:52)
2021-01-16 19:53:58,963-0700 ERROR (jsonrpc/1) [storage.TaskManager.Task] (Task='a946357f-537a-4c9c-9040-e70a98ee5643') Unexpected error (task:880)
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/vdsm/storage/sp.py", line 613, in create
domain.validate()
File "/usr/lib/python3.6/site-packages/vdsm/storage/fileSD.py", line 538, in validate
raise se.StorageDomainAccessError(self.sdUUID)
vdsm.storage.exception.StorageDomainAccessError: Domain is either partially accessible or entirely inaccessible: ('54532dd4-3e5b-4885-b88e-599c81efb146',)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 887, in _run
return fn(*args, **kargs)
File "<decorator-gen-31>", line 2, in createStoragePool
File "/usr/lib/python3.6/site-packages/vdsm/common/api.py", line 50, in method
ret = func(*args, **kwargs)
File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 1027, in createStoragePool
leaseParams)
File "/usr/lib/python3.6/site-packages/vdsm/storage/sp.py", line 619, in create
raise se.StorageDomainAccessError(sdUUID)
vdsm.storage.exception.StorageDomainAccessError: Domain is either partially accessible or entirely inaccessible: ('54532dd4-3e5b-4885-b88e-599c81efb146',)
2021-01-16 19:53:58,964-0700 INFO (jsonrpc/1) [storage.TaskManager.Task] (Task='a946357f-537a-4c9c-9040-e70a98ee5643') aborting: Task is aborted: "value=Domain is either partially accessible or entirely inaccessible: ('54532dd4-3e5b-4885-b88e-599c81efb146',) abortedcode=379" (task:1190)
2021-01-16 19:53:58,964-0700 ERROR (jsonrpc/1) [storage.Dispatcher] FINISH createStoragePool error=Domain is either partially accessible or entirely inaccessible: ('54532dd4-3e5b-4885-b88e-599c81efb146',) (dispatcher:83)
2021-01-16 19:53:58,965-0700 INFO (jsonrpc/1) [jsonrpc.JsonRpcServer] RPC call StoragePool.create failed (error 379) in 0.01 seconds (__init__:312)
2021-01-16 19:54:01,923-0700 INFO (jsonrpc/5) [api.host] START getStats() from=::ffff:192.168.222.53,41192 (api:48)
2021-01-16 19:54:01,964-0700 INFO (jsonrpc/7) [api.host] START getAllVmStats() from=::ffff:192.168.222.53,41192 (api:48)
2021-01-16 19:54:01,966-0700 INFO (jsonrpc/7) [api.host] FINISH getAllVmStats return={'status': {'code': 0, 'message': 'Done'}, 'statsList': (suppressed)} from=::ffff:192.168.222.53,41192 (api:54)
2021-01-16 19:54:01,977-0700 INFO (jsonrpc/5) [vdsm.api] START repoStats(domains=()) from=::ffff:192.168.222.53,41192, task_id=7a6c8536-3f1e-41b3-9ab7-bb58a5e75fac (api:48)
2021-01-16 19:54:01,978-0700 INFO (jsonrpc/5) [vdsm.api] FINISH repoStats return={} from=::ffff:192.168.222.53,41192, task_id=7a6c8536-3f1e-41b3-9ab7-bb58a5e75fac (api:54)
2021-01-16 19:54:01,980-0700 INFO (jsonrpc/5) [vdsm.api] START multipath_health() from=::ffff:192.168.222.53,41192, task_id=b8af45b5-6488-4940-b7d0-1d11b5f76db4 (api:48)
2021-01-16 19:54:01,981-0700 INFO (jsonrpc/5) [vdsm.api] FINISH multipath_health return={} from=::ffff:192.168.222.53,41192, task_id=b8af45b5-6488-4940-b7d0-1d11b5f76db4 (api:54)
2021-01-16 19:54:01,996-0700 INFO (jsonrpc/5) [api.host] FINISH getStats return={'status': {'code': 0, 'message': 'Done'}, 'info': (suppressed)} from=::ffff:192.168.222.53,41192 (api:54)
***end snip***
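In case it is useful, this is what I plan to check next on the node after the failure. The mount point below is my guess at vdsm's usual NFS mount naming (server:_path, with slashes turned into underscores), and the domain UUID is taken from the vdsm log above:

  ls -ln /rhev/data-center/mnt/stumpy:_tanker_ovirt_host__storage/54532dd4-3e5b-4885-b88e-599c81efb146
  sudo -u vdsm cat /rhev/data-center/mnt/stumpy:_tanker_ovirt_host__storage/54532dd4-3e5b-4885-b88e-599c81efb146/dom_md/metadata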
New failure Gluster deploy: Set granual-entry-heal on --> Bricks down
by Charles Lam
Dear friends,
Thanks to Donald and Strahil, my earlier Gluster deploy issue was resolved by disabling multipath on the NVMe drives. The Gluster deployment is now failing on the three-node hyperconverged oVirt v4.3.3 deployment at:
TASK [gluster.features/roles/gluster_hci : Set granual-entry-heal on] **********
task path: /etc/ansible/roles/gluster.features/roles/gluster_hci/tasks/hci_volumes.yml:67
with:
"stdout": "One or more bricks could be down. Please execute the command
again after bringing all bricks online and finishing any pending heals\nVolume heal
failed."
Specifically:
TASK [gluster.features/roles/gluster_hci : Set granual-entry-heal on] **********
task path: /etc/ansible/roles/gluster.features/roles/gluster_hci/tasks/hci_volumes.yml:67
failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'engine',
'brick': '/gluster_bricks/engine/engine', 'arbiter': 0}) =>
{"ansible_loop_var": "item", "changed": true,
"cmd": ["gluster", "volume", "heal",
"engine", "granular-entry-heal", "enable"],
"delta": "0:00:10.112451", "end": "2020-12-18
19:50:22.818741", "item": {"arbiter": 0, "brick":
"/gluster_bricks/engine/engine", "volname": "engine"},
"msg": "non-zero return code", "rc": 107, "start":
"2020-12-18 19:50:12.706290", "stderr": "",
"stderr_lines": [], "stdout": "One or more bricks could be down.
Please execute the command again after bringing all bricks online and finishing any
pending heals\nVolume heal failed.", "stdout_lines": ["One or more
bricks could be down. Please execute the command again after bringing all bricks online
and finishing any pending heals", "Volume heal failed."]}
failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'data', 'brick':
'/gluster_bricks/data/data', 'arbiter': 0}) =>
{"ansible_loop_var": "item", "changed": true,
"cmd": ["gluster", "volume", "heal",
"data", "granular-entry-heal", "enable"], "delta":
"0:00:10.110165", "end": "2020-12-18 19:50:38.260277",
"item": {"arbiter": 0, "brick":
"/gluster_bricks/data/data", "volname": "data"},
"msg": "non-zero return code", "rc": 107, "start":
"2020-12-18 19:50:28.150112", "stderr": "",
"stderr_lines": [], "stdout": "One or more bricks could be down.
Please execute the command again after bringing all bricks online and finishing any
pending heals\nVolume heal failed.", "stdout_lines": ["One or more
bricks could be down. Please execute the command again after bringing all bricks online
and finishing any pending heals", "Volume heal failed."]}
failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'vmstore',
'brick': '/gluster_bricks/vmstore/vmstore', 'arbiter': 0}) =>
{"ansible_loop_var": "item", "changed": true,
"cmd": ["gluster", "volume", "heal",
"vmstore", "granular-entry-heal", "enable"],
"delta": "0:00:10.113203", "end": "2020-12-18
19:50:53.767864", "item": {"arbiter": 0, "brick":
"/gluster_bricks/vmstore/vmstore", "volname": "vmstore"},
"msg": "non-zero return code", "rc": 107, "start":
"2020-12-18 19:50:43.654661", "stderr": "",
"stderr_lines": [], "stdout": "One or more bricks could be down.
Please execute the command again after bringing all bricks online and finishing any
pending heals\nVolume heal failed.", "stdout_lines": ["One or more
bricks could be down. Please execute the command again after bringing all bricks online
and finishing any pending heals", "Volume heal failed."]}
Any troubleshooting suggestions, insight, or recommended reading are greatly appreciated. I apologize for all the email; I am only creating this as a separate thread because it is a new, presumably unrelated issue. I welcome any recommendations if I can improve my forum etiquette.
Respectfully,
Charles
Re: VM templates
by Strahil Nikolov
You should create a file like mine, because vdsm manages /etc/multipath.conf:
# cat /etc/multipath/conf.d/blacklist.conf
blacklist {
    devnode "*"
    wwid nvme.1cc1-324a31313230303131343036-414441544120535838323030504e50-00000001
    wwid TOSHIBA-TR200_Z7KB600SK46S
    wwid ST500NM0011_Z1M00LM7
    wwid WDC_WD5003ABYX-01WERA0_WD-WMAYP2303189
    wwid WDC_WD15EADS-00P8B0_WD-WMAVU0885453
    wwid WDC_WD5003ABYZ-011FA0_WD-WMAYP0F35PJ4
}
Keep in mind that 'devnode "*"' is OK only for a gluster-only machine.
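After adding the file, reloading multipath should make the blacklist take effect. A minimal sketch, run as root:

  multipath -r    # reload the multipath maps
  multipath -ll   # verify the blacklisted devices no longer show up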
Best Regards,
Strahil Nikolov
Re: VM templates
by Robert Tongue
Correction: the issue came back, but I fixed it again. The actual issue was multipathd; I had to set up device filters in /etc/multipath.conf:
blacklist {
    protocol "(scsi:adt|scsi:sbp)"
    devnode "^hd[a-z]"
    devnode "^sd[a-z]$"
    devnode "^sd[a-z]"
    devnode "^nvme0n1"
    devnode "^nvme0n1p$"
}
Probably overkill, but it works.
________________________________
From: Robert Tongue <phunyguy(a)neverserio.us>
Sent: Tuesday, January 26, 2021 2:24 PM
To: users <users(a)ovirt.org>
Subject: Re: VM templates
I fixed my own issue, and for everyone else that may run into this: the issue was that I created the first oVirt node VM inside VMware, got it fully configured with all the software/disks/partitioning/settings, then cloned it to two more VMs, and then ran the hosted-engine deployment and set up the cluster. I think the problem was that I used clones for each cluster node, and that confused things due to duplicated device/system identifiers.
I rebuilt all 3 node VMs from scratch, and everything works perfectly now.
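(For anyone who would rather keep the clone approach than rebuild: my assumption is that duplicated per-host identifiers are what confused things, e.g. /etc/machine-id and possibly other unique IDs. A rough, untested sketch for the machine-id part, run on each clone before deployment:

  truncate -s 0 /etc/machine-id                   # regenerated on next boot
  rm -f /var/lib/dbus/machine-id
  ln -s /etc/machine-id /var/lib/dbus/machine-id

I have not verified that this alone is sufficient.)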
Thanks for listening.
________________________________
From: Robert Tongue
Sent: Monday, January 25, 2021 10:03 AM
To: users <users(a)ovirt.org>
Subject: VM templates
Hello,
Another weird issue over here. I have the latest oVirt running inside VMware vCenter as a proof-of-concept/testing platform. Things are finally working well, for the most part; however, I am noticing strange behavior with templates and the VMs deployed from them. Let me explain:
I created a basic Ubuntu Server VM, captured that VM as a template, then deployed 4 VMs from that template. The deployment went fine; however, I can only start 3 of the 4 VMs. If I shut down one of the 3 that I started, I can then start the one that refused to start, and the one I JUST shut down will then refuse to start. The error is:
VM test3 is down with error. Exit message: Bad volume specification {'device': 'disk', 'type': 'disk', 'diskType': 'file', 'specParams': {}, 'alias': 'ua-2dc7fbff-da30-485d-891f-03a0ed60fd0a', 'address': {'bus': '0', 'controller': '0', 'unit': '0', 'type': 'drive', 'target': '0'}, 'domainID': '804c6a0c-b246-4ccc-b3ab-dd4ceb819cea', 'imageID': '2dc7fbff-da30-485d-891f-03a0ed60fd0a', 'poolID': '3208bbce-5e04-11eb-9313-00163e281c6d', 'volumeID': 'f514ab22-07ae-40e4-9146-1041d78553fd', 'path': '/rhev/data-center/3208bbce-5e04-11eb-9313-00163e281c6d/804c6a0c-b246-4ccc-b3ab-dd4ceb819cea/images/2dc7fbff-da30-485d-891f-03a0ed60fd0a/f514ab22-07ae-40e4-9146-1041d78553fd', 'discard': True, 'format': 'cow', 'propagateErrors': 'off', 'cache': 'none', 'iface': 'scsi', 'name': 'sda', 'bootOrder': '1', 'serial': '2dc7fbff-da30-485d-891f-03a0ed60fd0a', 'index': 0, 'reqsize': '0', 'truesize': '2882392576', 'apparentsize': '3435134976'}.
The underlying storage is GlusterFS, self-managed outside of oVirt.
I can provide any logs needed, please let me know which. Thanks in advance.
OVN and change of mgmt network
by Gianluca Cecchi
Hello,
I previously had OVN running on the engine (as the OVN provider, with northd and the northbound and southbound DBs) and on the hosts (with ovn-controller).
After changing the mgmt IP of the hosts (the engine instead retained the same IP), I executed the command again on them:
vdsm-tool ovn-config <ip_of_engine> <new_local_ip_of_host>
Now I think I have to clean up some things, e.g.:
1) On the engine, where I get the lines below:
systemctl status ovn-northd.service -l
. . .
Sep 29 14:41:42 ovmgr1 ovsdb-server[940]: ovs|00005|reconnect|ERR|tcp:10.4.167.40:37272: no response to inactivity probe after 5 seconds, disconnecting
Oct 03 11:52:00 ovmgr1 ovsdb-server[940]: ovs|00006|reconnect|ERR|tcp:10.4.167.41:52078: no response to inactivity probe after 5 seconds, disconnecting
The two IPs are the old ones of two of the hosts.
It seems that a restart of the services has fixed it...
Can anyone confirm if I have to do anything else?
2) On the hosts (there are 3 hosts with OVN, on IPs 10.4.192.32/33/34), where I currently have this output:
[root@ov301 ~]# ovs-vsctl show
3a38c5bb-0abf-493d-a2e6-345af8aedfe3
Bridge br-int
fail_mode: secure
Port "ovn-1dce5b-0"
Interface "ovn-1dce5b-0"
type: geneve
options: {csum="true", key=flow, remote_ip="10.4.192.32"}
Port "ovn-ddecf0-0"
Interface "ovn-ddecf0-0"
type: geneve
options: {csum="true", key=flow, remote_ip="10.4.192.33"}
Port "ovn-fd413b-0"
Interface "ovn-fd413b-0"
type: geneve
options: {csum="true", key=flow, remote_ip="10.4.168.74"}
Port br-int
Interface br-int
type: internal
ovs_version: "2.7.2"
[root@ov301 ~]#
The IPs of the form 10.4.192.x are OK, but there is a leftover from an old host I initially used for tests, corresponding to 10.4.168.74, which doesn't exist anymore.
How can I clean up the records for 1) and 2)?
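My guess is that something along these lines on the engine would remove the stale chassis, and with it the leftover geneve port on the hosts, but I have not tried it and would appreciate confirmation first:

  ovn-sbctl show                         # list the chassis and their encap (tunnel) IPs
  ovn-sbctl chassis-del <stale_chassis>  # the chassis whose encap IP is 10.4.168.74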
Thanks,
Gianluca
Replaced host but ovn complains of duplicate port
by Kevin Doyle
I had to rebuild a host. I first removed it from oVirt, then reinstalled the OS and used the same 192.xxx.xxx.207 IP as the old host. I then added it back to oVirt, which seemed to go OK. However, I am seeing lots of OVN errors complaining that there was an existing port for the same IP:
the old port is ovn-47cc88-0, and the new port is ovn-af7f78-0.
ovs-vsctl show
34c43e58-46f5-4217-8f2e-5801e1f2b9de
Bridge br-int
fail_mode: secure
Port "ovn-24b972-0"
Interface "ovn-24b972-0"
type: geneve
options: {csum="true", key=flow, remote_ip="192.xxx.xxx.204"}
Port "ovn-ec1bbd-0"
Interface "ovn-ec1bbd-0"
type: geneve
options: {csum="true", key=flow, remote_ip="192.xxx.xxx.201"}
Port "ovn-47cc88-0"
Interface "ovn-47cc88-0"
type: geneve
options: {csum="true", key=flow, remote_ip="192.xxx.xxx.207"}
Port "ovn-d1e09d-0"
Interface "ovn-d1e09d-0"
type: geneve
options: {csum="true", key=flow, remote_ip="192.xxx.xxx.203"}
Port "ovn-f5ded7-0"
Interface "ovn-f5ded7-0"
type: geneve
options: {csum="true", key=flow, remote_ip="192.xxx.xxx.202"}
Port "ovn-1cb1b0-0"
Interface "ovn-1cb1b0-0"
type: geneve
options: {csum="true", key=flow, remote_ip="192.xxx.xxx.206"}
Port br-int
Interface br-int
type: internal
Port "ovn-af7f78-0"
Interface "ovn-af7f78-0"
type: geneve
options: {csum="true", key=flow, remote_ip="192.xxx.xxx.207"}
error: "could not add network device ovn-af7f78-0 to ofproto (File exists)"
ovs_version: "2.10.1"
/var/log/ovs-vswitchd.log
tunnel|WARN|ovn-af7f78-0: attempting to add tunnel port with same config as port 'ovn-47cc88-0' (::->192.xxx.xxx.207, key=flow, legacy_l2, dp port=2)
2021-01-26T18:03:38.128Z|69740|ofproto|WARN|br-int: could not add port ovn-af7f78-0 (File exists)
2021-01-26T18:03:38.128Z|69741|bridge|WARN|could not add network device ovn-af7f78-0 to ofproto (File exists)
/var/log/messages
Jan 26 18:03:37 xxxx05 kernel: device genev_sys_6081 entered promiscuous mode
Jan 26 18:03:37 xxxx05 kernel: i40e 0000:3d:00.0 eno3: UDP port 6081 was not found, not deleting
Jan 26 18:03:37 xxxx05 kernel: i40e 0000:3d:00.1 eno4: UDP port 6081 was not found, not deleting
Jan 26 18:03:37 xxxx05 kernel: i40e 0000:3d:00.2 eno5: UDP port 6081 was not found, not deleting
Jan 26 18:03:37 xxxx05 kernel: i40e 0000:3d:00.3 eno6: UDP port 6081 was not found, not deleting
I can bring the host 192.xxx.xxx.207 down and the errors stop on the other hosts, but OVN still has a port defined for 192.xxx.xxx.207. The question I have is: how do I delete the old port definition?
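For what it is worth, my guess is one of the following, with names taken from the output above, but I did not want to experiment on a live cluster:

  # on the engine, drop the stale chassis that still advertises 192.xxx.xxx.207
  ovn-sbctl show
  ovn-sbctl chassis-del <old_chassis_name>
  # or, on an affected host, remove just the duplicate tunnel port from br-int
  ovs-vsctl del-port br-int ovn-47cc88-0

Is either of these the right approach?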
Regards
Kevin