Re: Single instance scaleup.
by Strahil
Hi Leo,
As you do not have a distributed volume, you can easily switch to replica 2 arbiter 1 or replica 3 volumes.
You can use the following for adding the bricks:
https://access.redhat.com/documentation/en-US/Red_Hat_Storage/2.1/html/Ad...
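As a rough sketch of what the add-brick step could look like for the two volumes in this thread, the helper below only builds the command strings; it does not run anything. The brick paths on 192.168.80.192 and 192.168.80.193 are assumptions (they mirror the paths on the first host), and the helper name is made up for illustration. With "arbiter 1", the last brick listed is expected to become the arbiter, so please check the exact syntax against the documentation for your Gluster version, and probe the new peers before adding bricks.

```python
def add_brick_command(volume, new_bricks, arbiter=False):
    """Build a 'gluster volume add-brick' command that turns a
    single-brick volume into a three-way replica (hypothetical helper)."""
    if len(new_bricks) != 2:
        raise ValueError("going from 1 brick to replica 3 needs exactly 2 new bricks")
    layout = "replica 3 arbiter 1" if arbiter else "replica 3"
    return "gluster volume add-brick {} {} {}".format(
        volume, layout, " ".join(new_bricks))


# Assumed brick paths on the two new hosts (not given in the thread).
engine_cmd = add_brick_command(
    "engine",
    ["192.168.80.192:/gluster_bricks/engine/engine",
     "192.168.80.193:/gluster_bricks/engine/engine"])

# For the arbitrated volume the arbiter brick (host 3) is listed last.
data_cmd = add_brick_command(
    "ssd-samsung",
    ["192.168.80.192:/gluster_bricks/sdc/data",
     "192.168.80.193:/gluster_bricks/sdc/data"],
    arbiter=True)

print(engine_cmd)
print(data_cmd)
```

After the bricks are added, the new copies still have to be populated by self-heal, so it is worth watching the heal status before relying on the replicas.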
Best Regards,
Strahil Nikolov
On May 26, 2019 10:54, Leo David <leoalex(a)gmail.com> wrote:
>
> Hi Strahil,
> Thank you so much for your input!
>
> gluster volume info
>
>
> Volume Name: engine
> Type: Distribute
> Volume ID: d7449fc2-cc35-4f80-a776-68e4a3dbd7e1
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1
> Transport-type: tcp
> Bricks:
> Brick1: 192.168.80.191:/gluster_bricks/engine/engine
> Options Reconfigured:
> nfs.disable: on
> transport.address-family: inet
> storage.owner-uid: 36
> storage.owner-gid: 36
> features.shard: on
> performance.low-prio-threads: 32
> performance.strict-o-direct: off
> network.remote-dio: off
> network.ping-timeout: 30
> user.cifs: off
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> cluster.eager-lock: enable
> Volume Name: ssd-samsung
> Type: Distribute
> Volume ID: 76576cc6-220b-4651-952d-99846178a19e
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1
> Transport-type: tcp
> Bricks:
> Brick1: 192.168.80.191:/gluster_bricks/sdc/data
> Options Reconfigured:
> cluster.eager-lock: enable
> performance.io-cache: off
> performance.read-ahead: off
> performance.quick-read: off
> user.cifs: off
> network.ping-timeout: 30
> network.remote-dio: off
> performance.strict-o-direct: on
> performance.low-prio-threads: 32
> features.shard: on
> storage.owner-gid: 36
> storage.owner-uid: 36
> transport.address-family: inet
> nfs.disable: on
>
> The other two hosts will be 192.168.80.192/193 - this is a gluster-dedicated network over a 10 Gb SFP+ switch.
> - host 2 will have an identical hardware configuration to host 1 ( each disk is actually a raid0 array )
> - host 3 has:
> - 1 ssd for OS
> - 1 ssd - for adding to engine volume in a full replica 3
> - 2 ssd's in a raid 1 array to be added as arbiter for the data volume ( ssd-samsung )
> So the plan is to have "engine" scaled to a full replica 3, and "ssd-samsung" scaled to a replica 3 with arbiter.
>
>
>
>
> On Sun, May 26, 2019 at 10:34 AM Strahil <hunter86_bg(a)yahoo.com> wrote:
>>
>> Hi Leo,
>>
>> Gluster is quite smart, but in order to provide any hints, can you provide the output of 'gluster volume info <glustervol>'?
>> If you have 2 more systems, keep in mind that it is best to mirror the storage on the second replica (2 disks on 1 machine -> 2 disks on the new machine), while for the arbiter this is not necessary.
>>
>> What are your network and NICs? Based on my experience, I can recommend at least 10 Gbit/s interface(s).
>>
>> Best Regards,
>> Strahil Nikolov
>>
>> On May 26, 2019 07:52, Leo David <leoalex(a)gmail.com> wrote:
>>>
>>> Hello Everyone,
>>> Can someone help me to clarify this ?
>>> I have a single-node 4.2.8 installation ( only two gluster storage domains - distributed single drive volumes ). Now I just got two identical servers and I would like to go for a 3-node bundle.
>>> Is it possible ( after joining the new nodes to the cluster ) to expand the existing volumes across the new nodes and change them to replica 3 arbitrated ?
>>> If so, could you share with me what the procedure would be ?
>>> Thank you very much !
>>>
>>> Leo
>
>
>
> --
> Best regards, Leo David
4 years, 5 months
Failed to add storage domain
by thunderlight1@gmail.com
Hi!
I have installed oVirt using the iso ovirt-node-ng-installer-4.3.2-2019031908.el7. I then ran the Hosted Engine deployment through Cockpit.
I got an error when it tried to create the storage domain. It successfully mounted the NFS share on the host. Below is the error I got:
2019-04-14 10:40:38,967+0200 INFO ansible skipped {'status': 'SKIPPED', 'ansible_task': u'Check storage domain free space', 'ansible_host': u'localhost', 'ansible_playbook': u'/usr/share/ovirt-hosted-engine-setup/ansible/trigger_role.yml', 'ansible_type': 'task'}
2019-04-14 10:40:38,967+0200 DEBUG ansible on_any args <ansible.executor.task_result.TaskResult object at 0x7fb6918ad9d0> kwargs
2019-04-14 10:40:39,516+0200 INFO ansible task start {'status': 'OK', 'ansible_task': u'ovirt.hosted_engine_setup : Activate storage domain', 'ansible_playbook': u'/usr/share/ovirt-hosted-engine-setup/ansible/trigger_role.yml', 'ansible_type': 'task'}
2019-04-14 10:40:39,516+0200 DEBUG ansible on_any args TASK: ovirt.hosted_engine_setup : Activate storage domain kwargs is_conditional:False
2019-04-14 10:40:41,923+0200 DEBUG var changed: host "localhost" var "otopi_storage_domain_details" type "<type 'dict'>" value: "{
"changed": false,
"exception": "Traceback (most recent call last):\n File \"/tmp/ansible_ovirt_storage_domain_payload_xSFxOp/__main__.py\", line 664, in main\n storage_domains_module.post_create_check(sd_id)\n File \"/tmp/ansible_ovirt_storage_domain_payload_xSFxOp/__main__.py\", line 526, in post_create_check\n id=storage_domain.id,\n File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/services.py\", line 3053, in add\n return self._internal_add(storage_domain, headers, query, wait)\n File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py\", line 232, in _internal_add\n return future.wait() if wait else future\n File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py\", line 55, in wait\n return self._code(response)\n File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py\", line 229, in callback\n self._check_fault(response)\n File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py\", line 132, in _check_fault\n self._raise_error(response
, body)\n File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py\", line 118, in _raise_error\n raise error\nError: Fault reason is \"Operation Failed\". Fault detail is \"[]\". HTTP response code is 400.\n",
"failed": true,
"msg": "Fault reason is \"Operation Failed\". Fault detail is \"[]\". HTTP response code is 400."
}"
2019-04-14 10:40:41,924+0200 DEBUG var changed: host "localhost" var "ansible_play_hosts" type "<type 'list'>" value: "[]"
2019-04-14 10:40:41,924+0200 DEBUG var changed: host "localhost" var "play_hosts" type "<type 'list'>" value: "[]"
2019-04-14 10:40:41,924+0200 DEBUG var changed: host "localhost" var "ansible_play_batch" type "<type 'list'>" value: "[]"
2019-04-14 10:40:41,924+0200 ERROR ansible failed {'status': 'FAILED', 'ansible_type': 'task', 'ansible_task': u'Activate storage domain', 'ansible_result': u'type: <type \'dict\'>\nstr: {\'_ansible_parsed\': True, u\'exception\': u\'Traceback (most recent call last):\\n File "/tmp/ansible_ovirt_storage_domain_payload_xSFxOp/__main__.py", line 664, in main\\n storage_domains_module.post_create_check(sd_id)\\n File "/tmp/ansible_ovirt_storage_domain_payload_xSFxOp/__main__.py", line 526', 'task_duration': 2, 'ansible_host': u'localhost', 'ansible_playbook': u'/usr/share/ovirt-hosted-engine-setup/ansible/trigger_role.yml'}
2019-04-14 10:40:41,924+0200 DEBUG ansible on_any args <ansible.executor.task_result.TaskResult object at 0x7fb691843190> kwargs ignore_errors:None
2019-04-14 10:40:41,928+0200 INFO ansible stats {
"ansible_playbook": "/usr/share/ovirt-hosted-engine-setup/ansible/trigger_role.yml",
"ansible_playbook_duration": "00:37 Minutes",
"ansible_result": "type: <type 'dict'>\nstr: {u'localhost': {'unreachable': 0, 'skipped': 6, 'ok': 23, 'changed': 1, 'failures': 1}}",
"ansible_type": "finish",
"status": "FAILED"
}
2019-04-14 10:40:41,928+0200 INFO SUMMARY:
Duration Task Name
-------- --------
[ < 1 sec ] Execute just a specific set of steps
[ 00:01 ] Force facts gathering
[ 00:01 ] Check local VM dir stat
[ 00:01 ] Obtain SSO token using username/password credentials
[ 00:01 ] Fetch host facts
[ < 1 sec ] Fetch cluster ID
[ 00:01 ] Fetch cluster facts
[ 00:01 ] Fetch Datacenter facts
[ < 1 sec ] Fetch Datacenter ID
[ < 1 sec ] Fetch Datacenter name
[ 00:02 ] Add NFS storage domain
[ 00:01 ] Get storage domain details
[ 00:01 ] Find the appliance OVF
[ 00:01 ] Parse OVF
[ < 1 sec ] Get required size
[ FAILED ] Activate storage domain
2019-04-14 10:40:41,928+0200 DEBUG ansible on_any args <ansible.executor.stats.AggregateStats object at 0x7fb69404eb90> kwargs
Any suggestions on how to fix this?
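For long deployment logs like the one above, a small script can pull the failing step out of the SUMMARY section. This is only a sketch: the line format is taken from the excerpt in this message, and the function name is made up for illustration.

```python
import re

# Matches summary rows such as "[ 00:02 ] Add NFS storage domain"
# or "[ FAILED ] Activate storage domain".
SUMMARY_LINE = re.compile(r"^\[\s*(?P<duration>[^\]]+?)\s*\]\s+(?P<task>.+)$")


def failed_tasks(summary_text):
    """Return the task names whose duration column reads FAILED."""
    failed = []
    for line in summary_text.splitlines():
        match = SUMMARY_LINE.match(line.strip())
        if match and match.group("duration") == "FAILED":
            failed.append(match.group("task"))
    return failed


sample = """\
[ 00:02 ] Add NFS storage domain
[ 00:01 ] Get storage domain details
[ < 1 sec ] Get required size
[ FAILED ] Activate storage domain
"""
print(failed_tasks(sample))
```

The task name found this way can then be searched for in the engine-side logs around the same timestamp, since the HTTP 400 fault returned here carries an empty detail.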
4 years, 5 months
How to connect to a guest with vGPU ?
by Josep Manel Andrés Moscardó
Hi,
I got vGPU through mdev working but I am wondering how I would connect
to the client and make use of the GPU. So far I try to access the
console through SPICE and at some point in the boot process it switches
to GPU and I cannot see anything else.
Thanks.
--
Josep Manel Andrés Moscardó
Systems Engineer, IT Operations
EMBL Heidelberg
T +49 6221 387-8394
4 years, 5 months
Vm suddenly paused with error "vm has paused due to unknown storage error"
by Jasper Siero
Hi all,
Since we upgraded our oVirt nodes to CentOS 7, a VM (not a specific one, but never more than one at a time) will sometimes pause suddenly with the error "VM ... has paused due to unknown storage error". It has now happened twice in a month.
The oVirt node uses SAN storage for the VMs running on it. When a specific VM pauses with this error, the other VMs keep running without problems.
The VM runs without problems after unpausing it.
Versions:
CentOS Linux release 7.1.1503
vdsm-4.14.17-0
libvirt-daemon-1.2.8-16
vdsm.log:
VM Channels Listener::DEBUG::2015-10-25 07:43:54,382::vmChannels::95::vds::(_handle_timeouts) Timeout on fileno 78.
libvirtEventLoop::INFO::2015-10-25 07:43:56,177::vm::4602::vm.Vm::(_onIOError) vmId=`77f07ae0-cc3e-4ae2-90ec-7fba7b11deeb`::abnormal vm stop device virtio-disk0 error eother
libvirtEventLoop::DEBUG::2015-10-25 07:43:56,178::vm::5204::vm.Vm::(_onLibvirtLifecycleEvent) vmId=`77f07ae0-cc3e-4ae2-90ec-7fba7b11deeb`::event Suspended detail 2 opaque None
libvirtEventLoop::INFO::2015-10-25 07:43:56,178::vm::4602::vm.Vm::(_onIOError) vmId=`77f07ae0-cc3e-4ae2-90ec-7fba7b11deeb`::abnormal vm stop device virtio-disk0 error eother
...........
libvirtEventLoop::INFO::2015-10-25 07:43:56,180::vm::4602::vm.Vm::(_onIOError) vmId=`77f07ae0-cc3e-4ae2-90ec-7fba7b11deeb`::abnormal vm stop device virtio-disk0 error eother
specific error part in libvirt vm log:
block I/O error in device 'drive-virtio-disk0': Unknown error 32758 (32758)
...........
block I/O error in device 'drive-virtio-disk0': Unknown error 32758 (32758)
engine.log:
2015-10-25 07:44:48,945 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-40) [a43dcc8] VM diataal-prod-cas1 77f07ae0-cc3e-4ae2-90ec-7fba7b11deeb moved from Up --> Paused
2015-10-25 07:44:49,003 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-40) [a43dcc8] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM diataal-prod-cas1 has paused due to unknown storage error.
Has anyone experienced the same problem or knows a way to solve this?
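To see how often this happens and which VM and disk are involved each time, the _onIOError entries can be extracted from vdsm.log. The sketch below assumes the log line format shown in the excerpt above; the function name is made up for illustration.

```python
import re
from collections import Counter

# Pattern based on the vdsm.log lines quoted in this message.
IO_ERROR = re.compile(
    r"::(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+::.*\(_onIOError\) "
    r"vmId=`(?P<vm>[0-9a-f-]+)`::abnormal vm stop "
    r"device (?P<dev>\S+) error (?P<err>\S+)"
)


def io_error_events(lines):
    """Return (timestamp, vm_id, device, error) for every I/O error line."""
    events = []
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            events.append((match.group("ts"), match.group("vm"),
                           match.group("dev"), match.group("err")))
    return events


sample = [
    "VM Channels Listener::DEBUG::2015-10-25 07:43:54,382::vmChannels::95::vds::(_handle_timeouts) Timeout on fileno 78.",
    "libvirtEventLoop::INFO::2015-10-25 07:43:56,177::vm::4602::vm.Vm::(_onIOError) vmId=`77f07ae0-cc3e-4ae2-90ec-7fba7b11deeb`::abnormal vm stop device virtio-disk0 error eother",
    "libvirtEventLoop::INFO::2015-10-25 07:43:56,178::vm::4602::vm.Vm::(_onIOError) vmId=`77f07ae0-cc3e-4ae2-90ec-7fba7b11deeb`::abnormal vm stop device virtio-disk0 error eother",
]
events = io_error_events(sample)
per_vm = Counter((vm, dev) for _, vm, dev, _ in events)
print(per_vm)
```

Comparing the timestamps collected this way with the SAN and multipath logs on the host may help narrow down whether the pauses line up with events on the storage side.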
Kind regards,
Jasper
4 years, 9 months
Ovirt-engine-ha cannot to see live status of Hosted Engine
by asm@pioner.kz
Good day to all.
I have some issues with oVirt 4.2.6. Here is the main one:
I have two CentOS 7 nodes with the same config and the latest oVirt 4.2.6, with the HostedEngine disk on NFS storage.
Some virtual machines are also running fine.
When HostedEngine is running on one node (srv02.local), everything is fine.
After migrating it to the other node (srv00.local), I see that the agent cannot check the liveliness of HostedEngine. After a few minutes HostedEngine reboots, and after some time I see the same situation again. After migration to another node (srv00.local) all looks OK.
Output of hosted-engine --vm-status when HostedEngine is on the srv00 node:
--== Host 1 status ==--
conf_on_shared_storage : True
Status up-to-date : True
Hostname : srv02.local
Host ID : 1
Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down_unexpected", "detail": "unknown"}
Score : 0
stopped : False
Local maintenance : False
crc32 : ecc7ad2d
local_conf_timestamp : 78328
Host timestamp : 78328
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=78328 (Tue Sep 18 12:44:18 2018)
host-id=1
score=0
vm_conf_refresh_time=78328 (Tue Sep 18 12:44:18 2018)
conf_on_shared_storage=True
maintenance=False
state=EngineUnexpectedlyDown
stopped=False
timeout=Fri Jan 2 03:49:58 1970
--== Host 2 status ==--
conf_on_shared_storage : True
Status up-to-date : True
Hostname : srv00.local
Host ID : 2
Engine status : {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "Up"}
Score : 3400
stopped : False
Local maintenance : False
crc32 : 1d62b106
local_conf_timestamp : 326288
Host timestamp : 326288
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=326288 (Tue Sep 18 12:44:21 2018)
host-id=2
score=3400
vm_conf_refresh_time=326288 (Tue Sep 18 12:44:21 2018)
conf_on_shared_storage=True
maintenance=False
state=EngineStarting
stopped=False
Log agent.log from srv00.local:
MainThread::INFO::2018-09-18 12:40:51,749::states::779::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) VM is powering up..
MainThread::INFO::2018-09-18 12:40:52,052::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStarting (score: 3400)
MainThread::INFO::2018-09-18 12:41:01,066::states::779::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) VM is powering up..
MainThread::INFO::2018-09-18 12:41:01,374::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStarting (score: 3400)
MainThread::INFO::2018-09-18 12:41:11,393::state_machine::169::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Global metadata: {'maintenance': False}
MainThread::INFO::2018-09-18 12:41:11,393::state_machine::174::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Host srv02.local.pioner.kz (id 1): {'conf_on_shared_storage': True, 'extra': 'metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=78128 (Tue Sep 18 12:40:58 2018)\nhost-id=1\nscore=0\nvm_conf_refresh_time=78128 (Tue Sep 18 12:40:58 2018)\nconf_on_shared_storage=True\nmaintenance=False\nstate=EngineUnexpectedlyDown\nstopped=False\ntimeout=Fri Jan 2 03:49:58 1970\n', 'hostname': 'srv02.local.pioner.kz', 'alive': True, 'host-id': 1, 'engine-status': {'reason': 'vm not running on this host', 'health': 'bad', 'vm': 'down_unexpected', 'detail': 'unknown'}, 'score': 0, 'stopped': False, 'maintenance': False, 'crc32': 'e18e3f22', 'local_conf_timestamp': 78128, 'host-ts': 78128}
MainThread::INFO::2018-09-18 12:41:11,393::state_machine::177::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Local (id 2): {'engine-health': {'reason': 'failed liveliness check', 'health': 'bad', 'vm': 'up', 'detail': 'Up'}, 'bridge': True, 'mem-free': 12763.0, 'maintenance': False, 'cpu-load': 0.0364, 'gateway': 1.0, 'storage-domain': True}
MainThread::INFO::2018-09-18 12:41:11,393::states::779::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) VM is powering up..
MainThread::INFO::2018-09-18 12:41:11,703::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStarting (score: 3400)
MainThread::INFO::2018-09-18 12:41:21,716::states::779::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) VM is powering up..
MainThread::INFO::2018-09-18 12:41:22,020::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStarting (score: 3400)
MainThread::INFO::2018-09-18 12:41:31,033::states::779::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) VM is powering up..
MainThread::INFO::2018-09-18 12:41:31,344::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStarting (score: 3400)
As we can see, the agent thinks that HostedEngine is still powering up. I cannot do anything about it. I have already reinstalled the srv00 node many times without success.
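The combination reported for srv00 ("vm": "up" together with "health": "bad") can be picked out of the status output with a few lines of Python. This is only a sketch that parses the text layout shown earlier in this message; the function names are made up for illustration.

```python
import json


def engine_status_by_host(status_text):
    """Map each hostname to its parsed 'Engine status' dictionary."""
    statuses = {}
    hostname = None
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # section headers such as "--== Host 1 status ==--"
        key, value = key.strip(), value.strip()
        if key == "Hostname":
            hostname = value
        elif key == "Engine status" and hostname is not None:
            statuses[hostname] = json.loads(value)
    return statuses


def stuck_hosts(statuses):
    """Hosts where the VM process is up but the engine health check fails."""
    return [host for host, status in statuses.items()
            if status.get("vm") == "up" and status.get("health") != "good"]


SAMPLE = """\
--== Host 1 status ==--
Hostname : srv02.local
Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down_unexpected", "detail": "unknown"}
--== Host 2 status ==--
Hostname : srv00.local
Engine status : {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "Up"}
"""
statuses = engine_status_by_host(SAMPLE)
print(stuck_hosts(statuses))
```

A host listed here runs the VM but cannot confirm that the engine inside it answers, which usually points at connectivity or name resolution between that host and the engine VM rather than at the VM itself.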
Once I even had to uninstall the ovirt* and vdsm* packages. One more interesting point: after installing just "yum install http://resources.ovirt.org/pub/yum-repo/ovirt-release42.rpm" on this node, I tried to install the node from the engine web interface with the "Deploy" action. The installation was unsuccessful until I installed ovirt-hosted-engine-ha on the node. I do not see in the documentation that this is needed before installing new hosts, but I mention it for information and checking. After installing ovirt-hosted-engine-ha, the node was installed with HostedEngine support. But the main issue has not changed.
Thanks in advance for help.
BR,
Alexandr
4 years, 9 months
Hyperconverged setup - storage architecture - scaling
by Leo David
Hello Everyone,
Reading through the document:
"Red Hat Hyperconverged Infrastructure for Virtualization 1.5
Automating RHHI for Virtualization deployment"
Regarding storage scaling, I see the following statements:
*2.7. SCALING*
*Red Hat Hyperconverged Infrastructure for Virtualization is supported for
one node, and for clusters of 3, 6, 9, and 12 nodes. The initial deployment
is either 1 or 3 nodes. There are two supported methods of horizontally
scaling Red Hat Hyperconverged Infrastructure for Virtualization:*
*1 Add new hyperconverged nodes to the cluster, in sets of three, up to the
maximum of 12 hyperconverged nodes.*
*2 Create new Gluster volumes using new disks on existing hyperconverged
nodes. You cannot create a volume that spans more than 3 nodes, or expand an
existing volume so that it spans across more than 3 nodes at a time*
*2.9.1. Prerequisites for geo-replication*
*Be aware of the following requirements and limitations when configuring
geo-replication:*
*One geo-replicated volume only*
*Red Hat Hyperconverged Infrastructure for Virtualization (RHHI for
Virtualization) supports only one geo-replicated volume. Red Hat recommends
backing up the volume that stores the data of your virtual machines, as this
usually contains the most valuable data.*
------
Also, in the oVirt Engine UI, when I add a brick to an existing volume I get the
following warning:
*"Expanding gluster volume in a hyper-converged setup is not recommended as
it could lead to degraded performance. To expand storage for cluster, it is
advised to add additional gluster volumes." *
These statements raise a couple of questions that may be easy for some of you
to answer, but they leave me a bit confused...
I am also referring to the Red Hat product documentation because I treat
oVirt as being as production-ready as RHHI is.
*1*. Is there any reason not to use distributed-replicated volumes (
ie: spread one volume across 6, 9, or 12 nodes ) ?
- ie: is it recommended that in a 9-node scenario I should have 3 separate
volumes? If so, how should I deal with the following question:
*2.* If only one geo-replicated volume can be configured, how should I
deal with replicating the 2nd and 3rd volumes for disaster recovery?
*3.* If the limit of hosts per datacenter is 250, then ( in theory ) would the
recommended way of reaching this threshold be to create 20 separate
oVirt logical clusters with 12 nodes each ( and the datacenter managed from
one ha-engine ) ?
*4.* At present, I have one 9-node cluster, with all hosts
contributing 2 disks each to a single replica 3 distributed-replicated
volume. They were added to the volume in the following order:
node1 - disk1
node2 - disk1
......
node9 - disk1
node1 - disk2
node2 - disk2
......
node9 - disk2
At the moment, the volume is arbitrated, but I intend to go for full
distributed replica 3.
Is this a bad setup ? Why ?
It obviously breaks the Red Hat recommended rules...
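To make the layout concrete: Gluster builds replica sets from consecutive bricks in the order they were given, so the brick order above decides which nodes hold copies of the same data. The sketch below works that out for the 18 bricks; the brick labels are placeholders, not real brick paths.

```python
def replica_sets(bricks, replica=3):
    """Group bricks into replica sets in the order they were added."""
    if len(bricks) % replica:
        raise ValueError("brick count must be a multiple of the replica count")
    return [bricks[i:i + replica] for i in range(0, len(bricks), replica)]


def nodes_in(replica_set):
    """Node names of the bricks in one replica set (label format node:disk)."""
    return [brick.split(":")[0] for brick in replica_set]


# Same order as in the list above: disk1 on nodes 1..9, then disk2 on nodes 1..9.
bricks = ["node{}:disk{}".format(node, disk)
          for disk in (1, 2) for node in range(1, 10)]
sets = replica_sets(bricks)

for index, replica_set in enumerate(sets, start=1):
    print(index, replica_set)
```

With this order there are six replica sets, each spread over three different nodes, and nodes 1-3, 4-6 and 7-9 form three independent groups. In the current arbitrated layout, the third brick of each set would be the arbiter.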
Would anyone be kind enough to discuss these things ?
Thank you very much !
Leo
--
Best regards, Leo David
4 years, 10 months
disk locked after export as OVA
by adam_xu@adagene.com.cn
Hello, everyone. I tried to export a VM as OVA and got an error in the web UI:
VDSM ovirt1.ntbaobei.com command HSMGetAllTasksStatusesVDS failed: Volume Group not big enough: (u'Not enough free extents for extending LV d9f2378f-92f0-4bc1-96a8-20a0f2c575cb/c515a3f9-0590-4ebe-81c5-9d0993f1fec9 (free=848, needed=872)',)
It seems there is not enough space in the storage domain.
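The numbers in the error can be turned into an approximate amount of missing space. The sketch below assumes a 128 MiB extent size for the volume group; that value is an assumption and should be checked on the host (for example with vgs or vgdisplay) before relying on the result.

```python
import re

# Assumption: extent size of the storage domain's volume group, in MiB.
EXTENT_MIB = 128


def extent_shortfall(message):
    """Return how many extents are missing according to the VDSM error text."""
    match = re.search(r"free=(\d+), needed=(\d+)", message)
    if match is None:
        return None
    free, needed = int(match.group(1)), int(match.group(2))
    return max(needed - free, 0)


error_text = ("Not enough free extents for extending LV "
              "d9f2378f-92f0-4bc1-96a8-20a0f2c575cb/"
              "c515a3f9-0590-4ebe-81c5-9d0993f1fec9 (free=848, needed=872)")
missing = extent_shortfall(error_text)
print(missing, "extents short, about", missing * EXTENT_MIB / 1024, "GiB")
```

Under that assumption the domain is short by 24 extents, or roughly 3 GiB, so freeing or adding a few GiB should be enough for the same export to fit.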
I deleted the VM that I wanted to export as OVA, but a disk with id 8140d2e7-6908-438e-a646-23e58a33913e was left in the storage and its status is locked.
Here is some output from engine.log:
2018-09-17 14:27:03,467+08 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.CopyImageVDSCommand] (EE-ManagedThreadFactory-engine-Thread-107849) [a407f6a9-445d-4ccc-b998-f9dfc6dc67ec] START, CopyImageVDSCommand( CopyImageVDSCommandParameters:{storagePoolId='79432500-ad45-11e8-98f3-00163e188641', ignoreFailoverLimit='false', storageDomainId='d9f2378f-92f0-4bc1-96a8-20a0f2c575cb', imageGroupId='ee55bbe7-8001-4c82-b1a6-0cac7710704b', imageId='701a7472-8c00-4b9f-aa87-4837eb128695', dstImageGroupId='8140d2e7-6908-438e-a646-23e58a33913e', vmId='b3aa623e-3072-430e-87d6-b81a3b07f466', dstImageId='c515a3f9-0590-4ebe-81c5-9d0993f1fec9', imageDescription='', dstStorageDomainId='d9f2378f-92f0-4bc1-96a8-20a0f2c575cb', copyVolumeType='LeafVol', volumeFormat='COW', preallocate='Sparse', postZero='false', discard='false', force='true'}), log id: 164e3046
2018-09-17 14:27:03,467+08 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.CopyImageVDSCommand] (EE-ManagedThreadFactory-engine-Thread-107849) [a407f6a9-445d-4ccc-b998-f9dfc6dc67ec] ++ dstImageGUID=8140d2e7-6908-438e-a646-23e58a33913e
I think maybe some command was not executed after I exported the VM as OVA. I found a similar case here:
https://access.redhat.com/solutions/1298893
I have used the tool unlock_entity.sh, but it cannot find any disk that is locked.
So what should I do to delete the locked disk?
Yours, Adam
5 years, 1 month
ovirt-imageio-proxy upload speed slow
by Dev Ops
I am working on integrating a backup solution for our oVirt environment and am having issues with the time it takes to back up the VMs. The backup solution simply takes a snapshot, makes a clone, and backs the clone up to a backup server.
A 100 GB VM takes 52 minutes to back up. A file-level backup of the same VM with the same product, bypassing its RHV plugin, takes 14 minutes. So the throughput is there, but the ovirt-imageio-proxy process seems to be what manages how images are uploaded, and it appears to be my bottleneck. Load is not high on the engine or KVM hosts.
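Put as throughput, the two timings look like this. The sketch treats "100 gig" as 100 GiB, which is an assumption; the ratio between the two paths does not depend on it.

```python
def throughput_mib_s(size_gib, minutes):
    """Average throughput in MiB/s for a transfer of size_gib in minutes."""
    return size_gib * 1024 / (minutes * 60)


via_proxy = throughput_mib_s(100, 52)    # image backup through imageio-proxy
file_level = throughput_mib_s(100, 14)   # file backup bypassing the plugin

print(round(via_proxy, 1), "MiB/s via the proxy")
print(round(file_level, 1), "MiB/s file-level")
print(round(file_level / via_proxy, 2), "times faster")
```

That is roughly 33 MiB/s through the proxy against roughly 122 MiB/s for the file-level path, a factor of about 3.7.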
I bumped up the upload image chunk size from 100 MB to 10 GB weeks ago, and that didn't seem to help.
[root@blah-lab-engine ~]# engine-config -a |grep Upload
UploadImageChunkSizeKB: 10240000 version: general
[root@bgl-vms-engine ~]# rpm -qa |grep ovirt-image
ovirt-imageio-proxy-1.4.6-1.el7.noarch
ovirt-imageio-common-1.4.6-1.el7.x86_64
ovirt-imageio-proxy-setup-1.4.6-1.el7.noarch
I have seen bugs reported to Red Hat about this, but I am running above the affected releases.
The engine software is 4.2.8.2-1.el7.
Any idea what we can tweak to open up this bottleneck?
5 years, 2 months
Re: Cannot Increase Hosted Engine VM Memory
by Douglas Duckworth
Yes, I do. Gold crown indeed.
It's the "HostedEngine" as seen attached!
Thanks,
Douglas Duckworth, MSc, LFCS
HPC System Administrator
Scientific Computing Unit<https://scu.med.cornell.edu>
Weill Cornell Medicine
1300 York Avenue
New York, NY 10065
E: doug(a)med.cornell.edu<mailto:doug@med.cornell.edu>
O: 212-746-6305
F: 212-746-8690
On Wed, Jan 23, 2019 at 12:02 PM Simone Tiraboschi <stirabos(a)redhat.com> wrote:
On Wed, Jan 23, 2019 at 5:51 PM Douglas Duckworth <dod2014(a)med.cornell.edu> wrote:
Hi Simone
Can I get help with this issue? I still cannot increase memory for the Hosted Engine.
From the logs it seems that the engine is trying to hotplug memory to the engine VM, which should not happen.
The engine should simply update the engine VM configuration in the OVF_STORE and require a reboot of the engine VM.
Quick question: in the VM panel, do you see a gold crown symbol on the Engine VM?
Thanks,
Douglas Duckworth, MSc, LFCS
On Thu, Jan 17, 2019 at 8:08 AM Douglas Duckworth <dod2014(a)med.cornell.edu> wrote:
Sure, they're attached. In "first attempt" the error seems to be:
2019-01-17 07:49:24,795-05 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-29) [680f82b3-7612-4d91-afdc-43937aa298a2] EVENT_ID: FAILED_HOT_SET_MEMORY_NOT_DIVIDABLE(2,048), Failed to hot plug memory to VM HostedEngine. Amount of added memory (4000MiB) is not dividable by 256MiB.
Followed by:
2019-01-17 07:49:24,814-05 WARN [org.ovirt.engine.core.bll.UpdateRngDeviceCommand] (default task-29) [26f5f3ed] Validation of action 'UpdateRngDevice' failed for user admin@internal-authz. Reasons: ACTION_TYPE_FAILED_VM_IS_RUNNING
2019-01-17 07:49:24,815-05 ERROR [org.ovirt.engine.core.bll.UpdateVmCommand] (default task-29) [26f5f3ed] Updating RNG device of VM HostedEngine (adf14389-1563-4b1a-9af6-4b40370a825b) failed. Old RNG device = VmRngDevice:{id='VmDeviceId:{deviceId='6435b2b5-163c-4f0c-934e-7994da60dc89', vmId='adf14389-1563-4b1a-9af6-4b40370a825b'}', device='virtio', type='RNG', specParams='[source=urandom]', address='', managed='true', plugged='true', readOnly='false', deviceAlias='', customProperties='null', snapshotId='null', logicalName='null', hostDevice='null'}. New RNG device = VmRngDevice:{id='VmDeviceId:{deviceId='6435b2b5-163c-4f0c-934e-7994da60dc89', vmId='adf14389-1563-4b1a-9af6-4b40370a825b'}', device='virtio', type='RNG', specParams='[source=urandom]', address='', managed='true', plugged='true', readOnly='false', deviceAlias='', customProperties='null', snapshotId='null', logicalName='null', hostDevice='null'}.
In "second attempt" I used values that are dividable by 256 MiB so that's no longer present. Though same error:
2019-01-17 07:56:59,795-05 INFO [org.ovirt.engine.core.vdsbroker.SetAmountOfMemoryVDSCommand] (default task-22) [7059a48f] START, SetAmountOfMemoryVDSCommand(HostName = ovirt-hv1.med.cornell.edu, Params:{hostId='cdd5ffda-95c7-4ffa-ae40-be66f1d15c30', vmId='adf14389-1563-4b1a-9af6-4b40370a825b', memoryDevice='VmDevice:{id='VmDeviceId:{deviceId='7f7d97cc-c273-4033-af53-bc9033ea3abe', vmId='adf14389-1563-4b1a-9af6-4b40370a825b'}', device='memory', type='MEMORY', specParams='[node=0, size=2048]', address='', managed='true', plugged='true', readOnly='false', deviceAlias='', customProperties='null', snapshotId='null', logicalName='null', hostDevice='null'}', minAllocatedMem='6144'}), log id: 50873daa
2019-01-17 07:56:59,855-05 INFO [org.ovirt.engine.core.vdsbroker.SetAmountOfMemoryVDSCommand] (default task-22) [7059a48f] FINISH, SetAmountOfMemoryVDSCommand, log id: 50873daa
2019-01-17 07:56:59,862-05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-22) [7059a48f] EVENT_ID: HOT_SET_MEMORY(2,039), Hotset memory: changed the amount of memory on VM HostedEngine from 4096 to 4096
2019-01-17 07:56:59,881-05 WARN [org.ovirt.engine.core.bll.UpdateRngDeviceCommand] (default task-22) [28fd4c82] Validation of action 'UpdateRngDevice' failed for user admin@internal-authz. Reasons: ACTION_TYPE_FAILED_VM_IS_RUNNING
2019-01-17 07:56:59,882-05 ERROR [org.ovirt.engine.core.bll.UpdateVmCommand] (default task-22) [28fd4c82] Updating RNG device of VM HostedEngine (adf14389-1563-4b1a-9af6-4b40370a825b) failed. Old RNG device = VmRngDevice:{id='VmDeviceId:{deviceId='6435b2b5-163c-4f0c-934e-7994da60dc89', vmId='adf14389-1563-4b1a-9af6-4b40370a825b'}', device='virtio', type='RNG', specParams='[source=urandom]', address='', managed='true', plugged='true', readOnly='false', deviceAlias='', customProperties='null', snapshotId='null', logicalName='null', hostDevice='null'}. New RNG device = VmRngDevice:{id='VmDeviceId:{deviceId='6435b2b5-163c-4f0c-934e-7994da60dc89', vmId='adf14389-1563-4b1a-9af6-4b40370a825b'}', device='virtio', type='RNG', specParams='[source=urandom]', address='', managed='true', plugged='true', readOnly='false', deviceAlias='', customProperties='null', snapshotId='null', logicalName='null', hostDevice='null'}.
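The FAILED_HOT_SET_MEMORY_NOT_DIVIDABLE event from the first attempt is plain arithmetic: the amount of memory being added has to be a multiple of 256 MiB. A minimal sketch of that check, with helper names made up for illustration:

```python
HOTPLUG_BLOCK_MIB = 256  # block size quoted in the engine.log error above


def is_hotpluggable(added_mib):
    """True when the added amount is a positive multiple of 256 MiB."""
    return added_mib > 0 and added_mib % HOTPLUG_BLOCK_MIB == 0


def round_up_to_block(added_mib):
    """Smallest multiple of 256 MiB that is not below added_mib."""
    blocks = -(-added_mib // HOTPLUG_BLOCK_MIB)  # ceiling division
    return blocks * HOTPLUG_BLOCK_MIB


print(is_hotpluggable(4000))      # the amount rejected in the first attempt
print(round_up_to_block(4000))    # nearest valid amount above it
print(is_hotpluggable(4096))
```

This only explains the first error; the RNG device validation failure appears in both attempts and is unrelated to the alignment.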
This message repeats throughout engine.log:
2019-01-17 07:55:43,270-05 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-89) [] EVENT_ID: VM_MEMORY_UNDER_GUARANTEED_VALUE(148), VM HostedEngine on host ovirt-hv1.med.cornell.edu was guaranteed 8192 MB but currently has 4224 MB
As you can see attached the host has plenty of memory.
Thank you Simone!
Thanks,
Douglas Duckworth, MSc, LFCS
On Thu, Jan 17, 2019 at 5:09 AM Simone Tiraboschi <stirabos(a)redhat.com> wrote:
On Wed, Jan 16, 2019 at 8:22 PM Douglas Duckworth <dod2014(a)med.cornell.edu> wrote:
Sorry for the accidental send.
Anyway, I am trying to increase physical memory; however, it won't go above 4096 MB. The hypervisor has 64 GB.
Do I need to modify this value with Hosted Engine offline?
No, it's not required.
Can you please attach your engine.log for the relevant time frame?
Thanks,
Douglas Duckworth, MSc, LFCS
On Wed, Jan 16, 2019 at 1:58 PM Douglas Duckworth <dod2014(a)med.cornell.edu> wrote:
Hello
I am trying to increase Hosted Engine physical memory above 4 GB.
Thanks,
Douglas Duckworth, MSc, LFCS
_______________________________________________
Users mailing list -- users(a)ovirt.org
To unsubscribe send an email to users-leave(a)ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/WGSXQVVPJJ2...
5 years, 2 months
Hosted-engine inaccessible
by Tau Makgaile
Hi,
I am currently experiencing a problem with my Hosted-engine. The Hosted-engine
disconnected after I increased the / partition. The increase went well, but after
some time the hosted-engine VM disconnected and has since been giving
alerts such as *re-initializingFSM*.
Though the VMs underneath are running, hosted-engine --vm-status reports:
*"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail"*
There is no backup to restore at the moment. I am looking for a way to
bring it up without redeploying the hosted engine.
Thanks in advance for your help.
Kind regards,
Tau
5 years, 2 months