Forum available
by Josep Manel Andrés Moscardó
Hi all,
I am just wondering if anyone, like me, would like to have everything that
is posted here available in a forum, with all the benefits that brings (people
would still be able to subscribe and reply through email). Something like
Discourse would be nice, in my opinion.
Best.
5 years, 8 months
Cancel storage migration task?
by Alan G
Hi,
I accidentally triggered a storage migration for a large vdisk that will take some hours to complete. Is there a way to cleanly cancel the task, such that the vdisk will remain on the original domain?
Thanks,
Alan
5 years, 8 months
Hosted Engine I/O scheduler
by Strahil
Dear All,
I have just noticed that my Hosted Engine has a strange I/O scheduler:
Last login: Sun Mar 17 18:14:26 2019 from 192.168.1.43
[root@engine ~]# cat /sys/block/vda/queue/scheduler
[mq-deadline] kyber none
[root@engine ~]#
Based on my experience, anything other than noop/none is useless and degrades performance in a VM.
Is there any reason we have this scheduler?
It is quite pointless to process (and delay) the I/O in the VM and then process (and delay it again) at the host level.
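For reference, this is roughly how I check and switch the scheduler at runtime inside the engine VM (a non-persistent change; making it permanent would mean adding a scheduler option to the kernel command line):
# the bracketed entry is the active scheduler
cat /sys/block/vda/queue/scheduler
# switch to "none" at runtime - does not survive a reboot
echo none > /sys/block/vda/queue/scheduler
cat /sys/block/vda/queue/scheduler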
If there is no reason to keep the deadline, I will open a bug about it.
Best Regards,
Strahil Nikolov
5 years, 8 months
oVirt 4.3.2 RC - ISCSI LUN
by a.e.pool@iway.na
I have deployed the latest oVirt 4.3.2 host and the hosted engine without problems, up to the point of selecting the storage. I configured a brand new FreeNAS 11 box with iSCSI on three 1TB HDDs; the oVirt engine sees the iSCSI target, but when trying to select the LUN it shows that the LUN is already in use. Has anyone come across this issue?
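For reference, these are the checks I plan to run on the host next (the <lun_wwid> below is a placeholder for the actual device under /dev/mapper; the wipefs step destroys data, so it is only for a LUN that is meant to be blank):
# see the multipath device backing the iSCSI LUN
multipath -ll
# check for leftover partition tables, filesystems or LVM signatures
lsblk /dev/mapper/<lun_wwid>
blkid /dev/mapper/<lun_wwid>
# if the LUN should be empty, clear the old signatures
wipefs -a /dev/mapper/<lun_wwid>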
5 years, 8 months
Ovirt 4.3.1 problem with HA agent
by Strahil
Hi guys,
After updating to 4.3.1 I had an issue where the ovirt-ha-broker was complaining that it couldn't ping the gateway.
As I have seen that before - I stopped ovirt-ha-agent, ovirt-ha-broker, vdsmd, supervdsmd and sanlock on the nodes and reinitialized the lockspace.
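For reference, what I did was roughly the following (from memory, so the exact order and options may well be where I went wrong):
# on every node: stop the HA stack and locking daemons
systemctl stop ovirt-ha-agent ovirt-ha-broker vdsmd supervdsmd sanlock
# on one node, with the engine VM down everywhere:
hosted-engine --reinitialize-lockspace --force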
I guess I didn't do it properly, as now I receive:
ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm ERROR Failed extracting VM OVF from the OVF_STORE volume, falling back to initial vm.conf
Any hints on how to fix this? Of course a redeploy is possible, but I would prefer to recover from this.
Best Regards,
Strahil Nikolov
5 years, 8 months
Bandwidth problem
by Leo David
Hello Everyone,
I have 10Gb connections set up for all the hosts in the cluster, for both
management/VM and gluster traffic (separate network cards).
The problem is that I just cannot push more than 1Gb/s of traffic between VMs (even
between VMs running on the same host! - which makes things even
weirder...). Traffic was measured using the iperf tool.
Is there a way I can check what could be the problem? Network card type,
VM drivers, any suggestion? I just do not know where to look for a
possible cause.
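For reference, this is roughly how I measured, plus what I can think of checking inside the guests (assuming iperf3 and a virtio NIC named eth0 - adjust the names for your setup):
# on the "server" VM
iperf3 -s
# on the "client" VM: 4 parallel streams for 30 seconds
iperf3 -c <server_vm_ip> -P 4 -t 30
# inside each VM, confirm the NIC is virtio and not an emulated e1000/rtl8139
ethtool -i eth0 | grep driver
lspci | grep -i ethernet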
Thank you very much!
Leo
--
Best regards, Leo David
5 years, 8 months
Re: self-hosted ovirt-engine down
by Strahil
To make it easier to read, you can rotate the log before starting the engine. It should be something like:
hosted-engine --set-maintenance --mode=global && logrotate -f /etc/vdsm/logrotate/vdsm && hosted-engine --vm-start
Best Regards,
Strahil Nikolov
On Mar 16, 2019 10:52, Simone Tiraboschi <stirabos(a)redhat.com> wrote:
>
>
>
> On Sat, Mar 16, 2019 at 7:44 AM <siovelrm(a)gmail.com> wrote:
>>
>> Hi, I have a big problem with oVirt. I use version 4.2.7 with a self-hosted engine. The problem is that when I try to start the VM of the ovirt-engine with the command hosted-engine --vm-start, the output shows
>> "VM exists and is down, cleaning up and restarting"
>> and when running hosted-engine --vm-status, the following appears:
>> --== Host 1 status ==--
>>
>> conf_on_shared_storage : True
>> Status up-to-date : True
>> Hostname : node1.softel.cu
>> Host ID : 1
>> Engine status : {"reason": "bad vm status", "health": "bad", "vm": "down_unexpected", "detail": "Down"}
>> Score : 0
>> stopped : False
>> Local maintenance : False
>> crc32 : 02c3b5a4
>> local_conf_timestamp : 49529
>> Host timestamp : 49529
>> Extra metadata (valid at timestamp):
>> metadata_parse_version=1
>> metadata_feature_version=1
>> timestamp=49529 (Sat Mar 16 02:39:10 2019)
>> host-id=1
>> score=0
>> vm_conf_refresh_time=49529 (Sat Mar 16 02:39:11 2019)
>> conf_on_shared_storage=True
>> maintenance=False
>> state=EngineUnexpectedlyDown
>> stopped=False
>> timeout=Thu Jan 1 08:49:39 1970
>>
>> in /var/log/messages
>> Mar 16 02:35:34 node1 vdsm[26151]: WARN Attempting to remove a non existing network: ovirtmgmt/0c3e1c08-3928-47f1-96a8-c6a8d0dc3241
>> Mar 16 02:35:34 node1 vdsm[26151]: WARN Attempting to remove a non existing net user: ovirtmgmt/0c3e1c08-3928-47f1-96a8-c6a8d0dc3241
>> Mar 16 02:35:34 node1 vdsm[26151]: WARN Attempting to remove a non existing network: ovirtmgmt/0c3e1c08-3928-47f1-96a8-c6a8d0dc3241
>> Mar 16 02:35:34 node1 vdsm[26151]: WARN Attempting to remove a non existing net user: ovirtmgmt/0c3e1c08-3928-47f1-96a8-c6a8d0dc3241
>> Mar 16 02:35:34 node1 vdsm[26151]: WARN File: /var/lib/libvirt/qemu/channels/0c3e1c08-3928-47f1-96a8-c6a8d0dc3241.org.qemu.guest_agent.0 already removed
>>
>> Please help!!
>
>
> I'd suggest checking in /var/log/vdsm/vdsm.log why it failed to start.
>
>
>>
>> _______________________________________________
>> Users mailing list -- users(a)ovirt.org
>> To unsubscribe send an email to users-leave(a)ovirt.org
>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
>> List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/OTXJFCVCAHM...
5 years, 8 months
Error creating a storage domain (On Cisco UCS Only)
by nico.kruger@darkmatter.ae
Hi Guys,
I am trying to install a new cluster... I currently have one 9-node and two 6-node oVirt clusters (these were installed on 4.1 and upgraded to 4.2).
So I want to build a new cluster, which works fine on the HP notebook I use for testing (using a single-node gluster deployment).
But when I try to install this on my production servers, which are Cisco UCS servers, I keep getting this error:
[ ERROR ] Error: Fault reason is "Operation Failed". Fault detail is "[Error creating a storage domain]". HTTP response code is 400.
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "Fault reason is \"Operation Failed\". Fault detail is \"[Error creating a storage domain]\". HTTP response code is 400."}
This happens during storage creation, after the hosted engine is built and after gluster has been deployed (the error happens for both single and 3-replica deployments).
I just can't see how an install on one type of server is successful but not on the UCS servers (which my other oVirt clusters are running on).
BTW, I don't think the issue is related to the gluster storage creation itself, as I tried using NFS and local storage and got the same error (on the UCS servers only).
I am using the ovirt-node-ng-installer-4.2.0-2019011406.el7.iso install ISO.
Below is a tail from the ovirt-hosted-engine-setup-ansible-create_storage_domain log file:
2019-01-27 11:09:49,754+0400 INFO ansible ok {'status': 'OK', 'ansible_task': u'Fetch Datacenter name', 'ansible_host': u'localhost', 'ansible_playbook': u'/usr/share/ovirt-hosted-engine-setup/ansible/create_storage_domain.yml', 'ansible_type': 'task'}
2019-01-27 11:09:49,754+0400 DEBUG ansible on_any args <ansible.executor.task_result.TaskResult object at 0x7f9f9b7e2d50> kwargs
2019-01-27 11:09:50,478+0400 INFO ansible task start {'status': 'OK', 'ansible_task': u'Add NFS storage domain', 'ansible_playbook': u'/usr/share/ovirt-hosted-engine-setup/ansible/create_storage_domain.yml', 'ansible_type': 'task'}
2019-01-27 11:09:50,479+0400 DEBUG ansible on_any args TASK: Add NFS storage domain kwargs is_conditional:False
2019-01-27 11:09:51,151+0400 DEBUG var changed: host "localhost" var "otopi_storage_domain_details_nfs" type "<type 'dict'>" value: "{
"changed": false,
"skip_reason": "Conditional result was False",
"skipped": true
}"
2019-01-27 11:09:51,151+0400 INFO ansible skipped {'status': 'SKIPPED', 'ansible_task': u'Add NFS storage domain', 'ansible_host': u'localhost', 'ansible_playbook': u'/usr/share/ovirt-hosted-engine-setup/ansible/create_storage_domain.yml', 'ansible_type': 'task'}
2019-01-27 11:09:51,151+0400 DEBUG ansible on_any args <ansible.executor.task_result.TaskResult object at 0x7f9f9b7e2610> kwargs
2019-01-27 11:09:51,820+0400 INFO ansible task start {'status': 'OK', 'ansible_task': u'Add glusterfs storage domain', 'ansible_playbook': u'/usr/share/ovirt-hosted-engine-setup/ansible/create_storage_domain.yml', 'ansible_type': 'task'}
2019-01-27 11:09:51,821+0400 DEBUG ansible on_any args TASK: Add glusterfs storage domain kwargs is_conditional:False
2019-01-27 11:10:02,045+0400 DEBUG var changed: host "localhost" var "otopi_storage_domain_details_gluster" type "<type 'dict'>" value: "{
"changed": false,
"exception": "Traceback (most recent call last):\n File \"/tmp/ansible_ovirt_storage_domain_payload_Xous24/__main__.py\", line 682, in main\n ret = storage_domains_module.create()\n File \"/tmp/ansible_ovirt_storage_domain_payload_Xous24/ansible_ovirt_storage_domain_payload.zip/ansible/module_utils/ovirt.py\", line 587, in create\n **kwargs\n File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/services.py\", line 24225, in add\n return self._internal_add(storage_domain, headers, query, wait)\n File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py\", line 232, in _internal_add\n return future.wait() if wait else future\n File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py\", line 55, in wait\n return self._code(response)\n File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py\", line 229, in callback\n self._check_fault(response)\n File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py\", line 132, in _check_fault\n
self._raise_error(response, body)\n File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py\", line 118, in _raise_error\n raise error\nError: Fault reason is \"Operation Failed\". Fault detail is \"[Error creating a storage domain]\". HTTP response code is 400.\n",
"failed": true,
"msg": "Fault reason is \"Operation Failed\". Fault detail is \"[Error creating a storage domain]\". HTTP response code is 400."
}"
2019-01-27 11:10:02,045+0400 DEBUG var changed: host "localhost" var "ansible_play_hosts" type "<type 'list'>" value: "[]"
2019-01-27 11:10:02,045+0400 DEBUG var changed: host "localhost" var "play_hosts" type "<type 'list'>" value: "[]"
2019-01-27 11:10:02,045+0400 DEBUG var changed: host "localhost" var "ansible_play_batch" type "<type 'list'>" value: "[]"
2019-01-27 11:10:02,046+0400 ERROR ansible failed {'status': 'FAILED', 'ansible_type': 'task', 'ansible_task': u'Add glusterfs storage domain', 'ansible_result': u'type: <type \'dict\'>\nstr: {\'_ansible_parsed\': True, u\'exception\': u\'Traceback (most recent call last):\\n File "/tmp/ansible_ovirt_storage_domain_payload_Xous24/__main__.py", line 682, in main\\n ret = storage_domains_module.create()\\n File "/tmp/ansible_ovirt_storage_domain_payload_Xous24/ansible_ovirt_storage_domain_pay', 'ansible_host': u'localhost', 'ansible_playbook': u'/usr/share/ovirt-hosted-engine-setup/ansible/create_storage_domain.yml'}
2019-01-27 11:10:02,046+0400 DEBUG ansible on_any args <ansible.executor.task_result.TaskResult object at 0x7f9f9b859f50> kwargs ignore_errors:None
2019-01-27 11:10:02,048+0400 INFO ansible stats {'status': 'FAILED', 'ansible_playbook_duration': 39.840701, 'ansible_result': u"type: <type 'dict'>\nstr: {u'localhost': {'unreachable': 0, 'skipped': 2, 'ok': 13, 'changed': 0, 'failures': 1}}", 'ansible_playbook': u'/usr/share/ovirt-hosted-engine-setup/ansible/create_storage_domain.yml', 'ansible_type': 'finish'}
2019-01-27 11:10:02,048+0400 DEBUG ansible on_any args <ansible.executor.stats.AggregateStats object at 0x7f9f9da59a50> kwargs
This issue is driving me crazy... any assistance on what I can do would be greatly appreciated.
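In case anyone wants the lower-level errors, this is roughly how I have been digging for the real cause behind the generic 400 response (standard log paths; the engine log lives inside the engine VM):
# on the host: the vdsm side of the storage domain creation failure
grep -iE 'error|traceback' /var/log/vdsm/vdsm.log | tail -50
# inside the engine VM: the engine side of the same failure
grep -i 'storage domain' /var/log/ovirt-engine/engine.log | tail -50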
Nico
5 years, 8 months
vdsm should decouple from managed glusterfs services
by levin@mydream.com.hk
Hi, I have twice experienced a total outage of a 3-node hyper-converged 4.2.8 oVirt cluster because vdsm reactivated an unresponsive node and thereby restarted multiple glusterfs daemons. As a result, all VMs were paused and some disk images were corrupted.
At the very beginning, one of the oVirt nodes was overloaded (high memory and CPU usage); the hosted engine had trouble collecting status from vdsm, marked the node as unresponsive, and started migrating its workload to a healthy node. However, while the migration was running, the second oVirt node became unresponsive as well, because vdsm tried to reactivate the first unresponsive node and restarted its glusterd. So the gluster domain was re-acquiring quorum and waiting for the timeout.
If the reactivation of the first node had succeeded and every other node had survived the timeout, it would have been the ideal case. Unfortunately, the second node could not pick up the VMs being migrated due to gluster I/O timeouts, so it was in turn marked as unresponsive, and so on... vdsm then restarted glusterd on the second node, which caused a disaster. All nodes were racing on gluster volume self-healing, and I could not put the cluster into maintenance mode either. All I could do was try to resume the paused VMs via virsh and issue a shutdown for each domain, plus a hard shutdown for the un-resumable VMs.
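Concretely, that was roughly the following on each host (vdsm's libvirt is password-protected, so the read-only listing works without credentials, but resume/shutdown needed the vdsm SASL account - details depend on the host setup):
virsh -r list --all        # see which domains are paused
virsh resume <domain>      # resume a paused guest
virsh shutdown <domain>    # ask the guest for a clean shutdown
virsh destroy <domain>     # hard stop for guests that will not resume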
After shutting down a number of VMs and waiting for the gluster healing to complete, the cluster state went back to normal. I then tried to start the VMs I had stopped manually; most of them started normally, but a number of VMs had crashed or were un-startable. I quickly found that the image files of the un-startable VMs were owned by root (I can't explain why), and they could be started again after a chmod. Two of them still cannot start, failing with a "bad volume specification" error. One of them gets as far as the boot loader, but the LVM metadata was lost.
The impact is huge when vdsm restarts glusterd without human intervention.
5 years, 8 months
Enabled Base and Update repos?
by nico.kruger@darkmatter.ae
Hi Guys,
Should the CentOS Base and Updates repos be enabled on oVirt hosts, or only the dependency and oVirt repos (the ones from the release RPM)?
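For context, this is roughly what I use to see what a host is currently pulling from (plain yum, nothing oVirt-specific):
yum repolist enabled     # repos the host will actually install/update from
yum repolist disabled    # repos that are present but turned off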
5 years, 8 months