Re: Manual Migration not working and Dashboard broken after 4.3.4 update
by Strahil
I'm not sure, but I always thought that you need an agent for live migrations.
You can always try installing either qemu-guest-agent or ovirt-guest-agent and check if live migration between hosts is possible.
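A minimal sketch for a CentOS/RHEL 7 guest (assuming the agent package comes from the standard repos; adjust for your distro):

# run inside the guest VM
yum install -y qemu-guest-agent
systemctl enable --now qemu-guest-agent
systemctl status qemu-guest-agent   # verify the agent is running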
Have you set the new cluster/DC compatibility version?
Best Regards
Strahil Nikolov

On Jul 9, 2019 17:42, Neil <nwilson123(a)gmail.com> wrote:
>
> I remember seeing the bug earlier, but because it was closed I thought it was unrelated. This appears to be it:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1670701
>
> Perhaps I'm not understanding your question about the VM guest agent, but I don't have any guest agent currently installed on the VM. Not sure if the output of my qemu-kvm process answers this question:
>
> /usr/libexec/qemu-kvm -name guest=Headoffice.cbl-ho.local,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-Headoffice.cbl-ho.lo/master-key.aes -machine pc-i440fx-rhel7.3.0,accel=kvm,usb=off,dump-guest-core=off -cpu Broadwell,vme=on,f16c=on,rdrand=on,hypervisor=on,arat=on,xsaveopt=on,abm=on,rtm=on,hle=on -m 8192 -realtime mlock=off -smp 8,maxcpus=64,sockets=16,cores=4,threads=1 -numa node,nodeid=0,cpus=0-7,mem=8192 -uuid 9a6561b8-5702-43dc-9e92-1dc5dfed4eef -smbios type=1,manufacturer=oVirt,product=oVirt Node,version=7-3.1611.el7.centos,serial=4C4C4544-0034-5810-8033-C2C04F4E4B32,uuid=9a6561b8-5702-43dc-9e92-1dc5dfed4eef -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=31,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2019-07-09T10:26:53,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x5 -drive if=none,id=drive-ide0-1-0,readonly=on -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -drive file=/rhev/data-center/59831b91-00a5-01e4-0294-000000000018/8a607f8a-542a-473c-bb18-25c05fe2a3d4/images/56e8240c-a172-4f52-b0c1-2bddc4f34f93/9f245467-d31d-4f5a-8037-7c5012a4aa84,format=qcow2,if=none,id=drive-virtio-disk0,serial=56e8240c-a172-4f52-b0c1-2bddc4f34f93,werror=stop,rerror=stop,cache=none,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1,write-cache=on -netdev tap,fd=33,id=hostnet0,vhost=on,vhostfd=34 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:16:01:5b,bus=pci.0,addr=0x3 -chardev socket,id=charchannel0,fd=35,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,id=charchannel1,fd=36,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel2,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=com.redhat.spice.0 -spice tls-port=5900,addr=10.0.1.11,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=default,tls-channel=main,tls-channel=display,tls-channel=inputs,tls-channel=cursor,tls-channel=playback,tls-channel=record,tls-channel=smartcard,tls-channel=usbredir,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=8388608,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 -incoming defer -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -object rng-random,id=objrng0,filename=/dev/urandom -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x8 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on
>
> Please shout if you need further info.
>
> Thanks.
>
>
>
>
>
>
> On Tue, Jul 9, 2019 at 4:17 PM Strahil Nikolov <hunter86_bg(a)yahoo.com> wrote:
>>
>> Shouldn't cause that problem.
>>
>> You have to find the bug in Bugzilla and report a regression (if it's not closed), or open a new one and report the regression.
>> As far as I remember, only the dashboard was affected, due to the new features for VDO disk savings.
>>
Leeroy Jenkins cloud plugin for oVirt
by Vrgotic, Marko
Hey oVirt,
How are you all doing?
Has anyone used or had experience with Leeroy Jenkins on oVirt?
We need to move the CI master and its slaves to oVirt, but cannot find a suitable plugin for it. For the builds we run, Jenkins needs to spawn slaves, for which a plugin is required.
Please help if possible.
Kindly awaiting your reply.
— — —
Met vriendelijke groet / Kind regards,
Marko Vrgotic
Trouble initializing new VM in 4.2.8.2-1.el7
by David Johnson
Good evening,
Thanks in advance,
I'm trying to set up a new VM in my cluster, and things appear to have hung
up initializing the 80GB boot partition for a new Windows VM. The system
has been sitting like this for 6 hours.
What am I looking for to resolve this issue? Can I kill this process safely
and start over?
[image: image.png]
[image: image.png]
Regards,
David Johnson
Director of Development, Maxis Technology
844.696.2947 ext 702 (o) | 479.531.3590 (c)
djohnson(a)maxistechnology.com
[image: Maxis Technology] <http://www.maxistechnology.com>
www.maxistechnology.com
*stay connected <http://www.linkedin.com/in/pojoguy>*
Moving data to new storage appliance
by David Johnson
Hi everyone,
I'm sorry to bother y'all with another noob question.
We are in the process of retiring the old storage appliance that backed our
oVirt cluster in favor of a new appliance. I have migrated all of the
active VM storage to the new appliance, but can't see how to migrate the
"base versions".
My understanding is that if I just drop the storage with the base versions
then the derived VMs will cease to be functional.
Please advise.
bond for vm interfaces
by Edoardo Mazza
Hello everyone,
I need to create a bond for VM interfaces, but I don't know what the best solution is. Can you help me?
Thanks
Edoardo
Re: Active-Passive DR: mutual for different storage domains possible?
by Gianluca Cecchi
On Thu, Jul 25, 2019 at 2:21 PM Eyal Shenitzky <eshenitz(a)redhat.com> wrote:
> On Thu, Jul 25, 2019 at 3:02 PM Gianluca Cecchi <gianluca.cecchi(a)gmail.com>
> wrote:
>
>> On Thu, Jul 25, 2019 at 1:54 PM Eyal Shenitzky <eshenitz(a)redhat.com>
>> wrote:
>>
>>>
>>> Please notice that automation Python scripts were created in order to
>>> facilitate the DR process.
>>> You can find them under path/to/your/dr/folder/files.
>>>
>>> You can use those scripts to generate the mapping, test the generated
>>> mapping and start the failover/failback.
>>>
>>> I strongly recommend using them.
>>>
>>>
>> Yes, I have used them to create the disaster_recovery_vars.yml mapping file
>> and then populated it with the secondary site information, thanks.
>> My doubt was about any difference in playbook actions between "failover"
>> (3.3) and "discreet failover test" (B.1), as the executed playbook and
>> tags are the same.
>>
>
> No, the only difference is that you disable the storage replication by
> yourself; this way you can test the failover while the other "primary" site
> is still active.
>
>
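(As a side note, a rough sketch of how those helper scripts under the role's
files/ directory are usually driven — the ovirt-dr sub-commands and the
install path below are taken from the DR guide and may differ in your version:)

# on the machine holding the oVirt.disaster-recovery Ansible role (path assumed)
cd /usr/share/ansible/roles/oVirt.disaster-recovery/files
./ovirt-dr generate    # build the disaster_recovery_vars.yml mapping file
./ovirt-dr validate    # sanity-check the generated mapping against both engines
./ovirt-dr failover    # run the failover playbook against the secondary engine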
The first "discreet failover test" was a success! Great.
The storage domain was attached, templates were imported, and the only VM
defined at the source started correctly (at the source I had configured link
down for the VM, which was inherited at the target, so no collisions).
The elapsed time from the beginning of the oVirt connection until the first
template import was about 6 minutes.
...
Template TOL76 has been successfully imported from the given configuration.
7/25/19 3:26:58 PM
Storage Domain ovsd3910 was attached to Data Center SVIZ3-DR by
admin@internal-authz 7/25/19 3:26:46 PM
Storage Domains were attached to Data Center SVIZ3-DR by
admin@internal-authz 7/25/19 3:26:46 PM
Storage Domain ovsd3910 (Data Center SVIZ3-DR) was activated by
admin@internal-authz 7/25/19 3:26:46 PM
...
Storage Pool Manager runs on Host ovh201. (Address: ovh201.), Data Center
SVIZ3-DR. 7/25/19 3:26:36 PM
Data Center is being initialized, please wait for initialization to
complete. 7/25/19 3:23:53 PM
Storage Domain ovsd3910 was added by admin@internal-authz 7/25/19 3:20:43 PM
Disk Profile ovsd3910 was successfully added (User: admin@internal-authz).
7/25/19 3:20:42 PM
User admin@internal-authz connecting from '10.4.192.43' using session 'xxx'
logged in. 7/25/19 3:20:35 PM
Some notes:
1) iSCSI multipath
My storage domains are iSCSI based and my hosts have two network cards to
reach the storage.
I'm using EQL, which doesn't support bonding and has one portal that all
initiators use.
So in my primary env I configured the "iSCSI Multipathing" tab in the
Compute --> Datacenter --> Datacenter_Name window.
But this tab appears only when you activate the storage.
So during the ansible playbook run the iSCSI connection was activated
through the "default" iscsi interface.
I can then:
- configure "iSCSI Multipathing"
- shutdown VM
- put host into maintenance
- remove the default iSCSI session that was not removed on the host (a
generic version of this cleanup is sketched after this list):
iscsiadm -m session -r 6 -u
- activate host
now I have:
[root@ov201 ~]# iscsiadm -m session
tcp: [10] 10.10.100.8:3260,1
iqn.2001-05.com.equallogic:4-771816-99d82fc59-5bdd77031e05beac-ovsd3910
(non-flash)
tcp: [9] 10.10.100.8:3260,1
iqn.2001-05.com.equallogic:4-771816-99d82fc59-5bdd77031e05beac-ovsd3910
(non-flash)
[root@ov201 ~]#
with
# multipath -l
364817197c52fd899acbe051e0377dd5b dm-29 EQLOGIC ,100E-00
size=1.0T features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=0 status=active
|- 23:0:0:0 sdb 8:16 active undef running
`- 24:0:0:0 sdc 8:32 active undef running
- start vm
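A generic version of that session cleanup, for reference (the session id 6
above is specific to this host, so look up the SID first):

# list sessions and note the SID (in square brackets) of the leftover
# "default"-interface session
iscsiadm -m session
# log out only that session
iscsiadm -m session -r <SID> -u
# after re-activating the host, confirm both paths are back under multipath
multipath -ll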
Then I do a cleanup:
1. Detach the storage domains from the secondary site.
2. Enable storage replication between the primary and secondary storage
domains.
The storage domain remains as "unattached" in the DR environment.
Then I executed the test again, and during the connection I got this error
about 40 seconds after running the playbook:
TASK [oVirt.disaster-recovery : Import iSCSI storage domain]
***************************************
An exception occurred during task execution. To see the full traceback, use
-vvv. The error was: Error: Fault reason is "Operation Failed". Fault
detail is "[]". HTTP response code is 400.
failed: [localhost]
(item=iqn.2001-05.com.equallogic:4-771816-99d82fc59-5bdd77031e05beac-ovsd3910)
=> {"ansible_loop_var": "dr_target", "changed": false, "dr_target":
"iqn.2001-05.com.equallogic:4-771816-99d82fc59-5bdd77031e05beac-ovsd3910",
"msg": "Fault reason is \"Operation Failed\". Fault detail is \"[]\". HTTP
response code is 400."}
In webadmin gui of DR env I see:
VDSM ov201 command CleanStorageDomainMetaDataVDS failed: Cannot obtain
lock: "id=56eadc97-5731-40cf-8409-aff58d8ffd11, rc=-243, out=Cannot acquire
Lease(name='SDM', path='/dev/56eadc97-5731-40cf-8409-aff58d8ffd11/leases',
offset=1048576), err=(-243, 'Sanlock resource not acquired', 'Lease is held
by another host')" 7/25/19 4:50:43 PM
What could be the cause of this?
In vdsm.log:
2019-07-25 16:50:43,196+0200 INFO (jsonrpc/1) [vdsm.api] FINISH
forcedDetachStorageDomain error=Cannot obtain lock:
"id=56eadc97-5731-40cf-8409-aff58d8ffd11, rc=-243, out=Cannot acquire
Lease(name='SDM', path='/dev/56eadc97-5731-40cf-8409-aff58d8ffd11/leases',
offset=1048576), err=(-243, 'Sanlock resource not acquired', 'Lease is held
by another host')" from=::ffff:10.4.192.79,49038, flow_id=4bd330d1,
task_id=c0dfac81-5c58-427d-a7d0-e8c695448d27 (api:52)
2019-07-25 16:50:43,196+0200 ERROR (jsonrpc/1) [storage.TaskManager.Task]
(Task='c0dfac81-5c58-427d-a7d0-e8c695448d27') Unexpected error (task:875)
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882,
in _run
return fn(*args, **kargs)
File "<string>", line 2, in forcedDetachStorageDomain
File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 50, in
method
ret = func(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 856, in
forcedDetachStorageDomain
self._detachStorageDomainFromOldPools(sdUUID)
File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 834, in
_detachStorageDomainFromOldPools
dom.acquireClusterLock(host_id)
File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 910, in
acquireClusterLock
self._manifest.acquireDomainLock(hostID)
File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 527, in
acquireDomainLock
self._domainLock.acquire(hostID, self.getDomainLease())
File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line
419, in acquire
"Cannot acquire %s" % (lease,), str(e))
AcquireLockFailure: Cannot obtain lock:
"id=56eadc97-5731-40cf-8409-aff58d8ffd11, rc=-243, out=Cannot acquire
Lease(name='SDM', path='/dev/56eadc97-5731-40cf-8409-aff58d8ffd11/leases',
offset=1048576), err=(-243, 'Sanlock resource not acquired', 'Lease is held
by another host')"
2019-07-25 16:50:43,196+0200 INFO (jsonrpc/1) [storage.TaskManager.Task]
(Task='c0dfac81-5c58-427d-a7d0-e8c695448d27') aborting: Task is aborted:
'Cannot obtain lock: "id=56eadc97-5731-40cf-8409-aff58d8ffd11, rc=-243,
out=Cannot acquire Lease(name=\'SDM\',
path=\'/dev/56eadc97-5731-40cf-8409-aff58d8ffd11/leases\', offset=1048576),
err=(-243, \'Sanlock resource not acquired\', \'Lease is held by another
host\')"' - code 651 (task:1181)
2019-07-25 16:50:43,197+0200 ERROR (jsonrpc/1) [storage.Dispatcher] FINISH
forcedDetachStorageDomain error=Cannot obtain lock:
"id=56eadc97-5731-40cf-8409-aff58d8ffd11, rc=-243, out=Cannot acquire
Lease(name='SDM', path='/dev/56eadc97-5731-40cf-8409-aff58d8ffd11/leases',
offset=1048576), err=(-243, 'Sanlock resource not acquired', 'Lease is held
by another host')" (dispatcher:83)
2019-07-25 16:50:43,197+0200 INFO (jsonrpc/1) [jsonrpc.JsonRpcServer] RPC
call StorageDomain.detach failed (error 651) in 24.12 seconds (__init__:312)
2019-07-25 16:50:44,180+0200 INFO (jsonrpc/6) [api.host] START getStats()
from=::ffff:10.4.192.79,49038 (api:48)
2019-07-25 16:50:44,222+0200 INFO (jsonrpc/6) [vdsm.api] START
repoStats(domains=()) from=::ffff:10.4.192.79,49038,
task_id=8a7a0302-4ee3-49a8-a3f7-f9636a123765 (api:48)
2019-07-25 16:50:44,222+0200 INFO (jsonrpc/6) [vdsm.api] FINISH repoStats
return={} from=::ffff:10.4.192.79,49038,
task_id=8a7a0302-4ee3-49a8-a3f7-f9636a123765 (api:54)
2019-07-25 16:50:44,223+0200 INFO (jsonrpc/6) [vdsm.api] START
multipath_health() from=::ffff:10.4.192.79,49038,
task_id=fb09923c-0888-4c3f-9b8a-a7750592da22 (api:48)
2019-07-25 16:50:44,223+0200 INFO (jsonrpc/6) [vdsm.api] FINISH
multipath_health return={} from=::ffff:10.4.192.79,49038,
task_id=fb09923c-0888-4c3f-9b8a-a7750592da22 (api:54)
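(Side note: when chasing a "Lease is held by another host" error like this,
it can help to ask sanlock directly which host still owns the lease — a
rough sketch, run as root on the host; the leases path is the one from the
error above:)

# show the lockspaces/resources this host currently holds
sanlock client status
# dump the on-disk leases volume of the domain to see which host id owns the SDM lease
sanlock direct dump /dev/56eadc97-5731-40cf-8409-aff58d8ffd11/leases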
After putting the host into maintenance, rebooting the host, and re-running
the playbook, all went well again.
2) Mac Address pools
I noticed that the imported VM preserved the link state (down in my case)
but not the MAC address.
The MAC address is the one defined in my target engine, which is different
from the one in the source engine to avoid overlap of MAC addresses.
Is this an option I can customize?
In general a VM could have problems when its MAC address changes...
3) Clean up dest DC after "discreet failover test"
The guide says:
1. Detach the storage domains from the secondary site.
2. Enable storage replication between the primary and secondary storage
domains.
Is it better to also restart the DR hosts?
4) VM consistency
Can we say that all the imported VMs will be "crash consistent"?
Thanks,
Gianluca
Re: engine-setup failure on 4.1 -> 4.2
by Strahil
Try to remove the maintenance, wait 10 min and then put it back:
hosted-engine --set-maintenance --mode=none; sleep 600; hosted-engine --set-maintenance --mode=global ; sleep 300
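You can also double-check what the HA agents currently report, on any HA host:

hosted-engine --vm-status
# once the flag is really set the output should include something like
# "!! Cluster is in GLOBAL MAINTENANCE mode !!"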
Best Regards,
Strahil Nikolov

On Jul 26, 2019 20:49, Alex K <rightkicktech(a)gmail.com> wrote:
>
> I repeated the same upgrade steps on another cluster and, although I was receiving the same warnings about the db, the upgrade completed successfully.
>
> Is there a way I can manually inform the engine db about the maintenance status? I was thinking that this way the engine would proceed with the remaining steps.
>
> On Thu, Jul 25, 2019 at 3:55 PM Alex K <rightkicktech(a)gmail.com> wrote:
>>
>> Hi all,
>>
>> I have a self hosted engine setup, with 3 servers.
>> I have successfully upgraded several other installations, from 4.1 to 4.2.
>> On one of them I am encountering an issue with the engine-setup.
>>
>> I get the following warning:
>>
>> Found the following problems in PostgreSQL configuration for the Engine database:
>> It is required to be at least '8192'
>>
>> Please note the following required changes in postgresql.conf on 'localhost':
>> 'work_mem' is currently '1024'. It is required to be at least '8192'.
>> postgresql.conf is usually in /var/lib/pgsql/data, /var/opt/rh/rh-postgresql95/lib/pgsql/data, or somewhere under /etc/postgresql* . You have to restart PostgreSQL after making these changes.
>> The database requires these configurations values to be changed. Setup can fix them for you or abort. Fix automatically? (Yes, No) [Yes]:
>>
>> Then, if I select Yes to proceed, I get:
>>
>> [WARNING] This release requires PostgreSQL server 9.5.14 but the engine database is currently hosted on PostgreSQL server 9.2.24.
>>
>> Then finally:
>> [ ERROR ] It seems that you are running your engine inside of the hosted-engine VM and are not in "Global Maintenance" mode.
>> In that case you should put the system into the "Global Maintenance" mode before running engine-setup, or the hosted-engine HA agent might kill the machine, which might corrupt your data.
>>
>> [ ERROR ] Failed to execute stage 'Setup validation': Hosted Engine setup detected, but Global Maintenance is not set.
>> [ INFO ] Stage: Clean up
>> Log file is located at /var/log/ovirt-engine/setup/ovirt-engine-setup-20190725124653-hvekp2.log
>> [ INFO ] Generating answer file '/var/lib/ovirt-engine/setup/answers/20190725125154-setup.conf'
>> [ INFO ] Stage: Pre-termination
>> [ INFO ] Stage: Termination
>> [ ERROR ] Execution of setup failed
>>
>> I have put the cluster on global maintenance, though the engine thinks it is not.
>>
>> Are there any steps that I may follow to avoid the above?
>> I am also attaching the last full setup log.
>> Thank you!
>>
>> Alex
>>
>>
>>
engine-setup failure on 4.1 -> 4.2
by Alex K
Hi all,
I have a self hosted engine setup, with 3 servers.
I have successfully upgraded several other installations, from 4.1 to 4.2.
On one of them I am encountering an issue with the engine-setup.
I get the following warning:
Found the following problems in PostgreSQL configuration for the Engine
database:
It is required to be at least '8192'
Please note the following required changes in postgresql.conf on
'localhost':
'work_mem' is currently '1024'. It is required to be at least
'8192'.
postgresql.conf is usually in /var/lib/pgsql/data,
/var/opt/rh/rh-postgresql95/lib/pgsql/data, or somewhere under
/etc/postgresql* . You have to restart PostgreSQL after making these
changes.
The database requires these configurations values to be changed.
Setup can fix them for you or abort. Fix automatically? (Yes, No) [Yes]:
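(If setup cannot apply the change itself, a rough sketch of doing it by hand —
the path below is the stock PostgreSQL 9.2 layout mentioned in the warning
above; adjust to wherever your postgresql.conf actually lives:)

# raise work_mem for the engine database and restart PostgreSQL
sed -i 's/^#\?work_mem.*/work_mem = 8192kB/' /var/lib/pgsql/data/postgresql.conf
systemctl restart postgresql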
Then, if I select Yes to proceed, I get:
[WARNING] This release requires PostgreSQL server 9.5.14 but the engine
database is currently hosted on PostgreSQL server 9.2.24.
Then finally:
[ ERROR ] It seems that you are running your engine inside of the
hosted-engine VM and are not in "Global Maintenance" mode.
In that case you should put the system into the "Global
Maintenance" mode before running engine-setup, or the hosted-engine HA
agent might kill the machine, which might corrupt your data.
[ ERROR ] Failed to execute stage 'Setup validation': Hosted Engine setup
detected, but Global Maintenance is not set.
[ INFO ] Stage: Clean up
Log file is located at
/var/log/ovirt-engine/setup/ovirt-engine-setup-20190725124653-hvekp2.log
[ INFO ] Generating answer file
'/var/lib/ovirt-engine/setup/answers/20190725125154-setup.conf'
[ INFO ] Stage: Pre-termination
[ INFO ] Stage: Termination
[ ERROR ] Execution of setup failed
I have put the cluster on global maintenance, though the engine thinks it
is not.
Are there any steps that I may follow to avoid the above?
I am also attaching the last full setup log.
Thank you!
Alex