oVirt 4.4 pending VM changes on HostedEngine VM
by Radoslav Milanov
Hello
I'm not sure how, but I've got pending changes on the HostedEngine VM on
a 3-node 4.4 cluster, and those changes are never applied. Is there a way
to cancel pending VM changes in general?
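For what it's worth, this is how I'm checking whether the engine still reports
a pending (next-run) configuration, via the Python SDK (a rough sketch only;
the connection details are placeholders, and as far as I can tell there is no
API call that simply discards the pending configuration):

import ovirtsdk4 as sdk

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='password',
    insecure=True,
)
vms_service = connection.system_service().vms_service()
vm = vms_service.list(search='name=HostedEngine')[0]
# next_run_configuration_exists is the flag behind the "pending changes" marker.
print('pending changes:', vm.next_run_configuration_exists)
# If I read the SDK right, the pending configuration itself can be inspected with:
#   vms_service.vm_service(vm.id).get(next_run=True)
connection.close()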
Thanks.
Adding extra parameters to qemu-kvm
by Sakari Poussa
Hi,
Before I dive too deeply into vdsm-hook-qemucmdline, I'd like to know if it
is the correct solution for my needs.
I want to add the following (type of) parameters to the VM startup:
-object memory-backend-file,id=mem1,share=on,mem-path=/dev/dax0.0,size=$PMEM_SIZE,align=2M \
-device nvdimm,id=nvdimm1,memdev=mem1,label-size=128k
Is vdsm-hook-qemucmdline the correct tool? Can I also add the parameters
from an Ansible playbook (via the oVirt roles) somehow?
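For context, this is roughly what I was hoping to end up with via the Python
SDK (a sketch only; it assumes the hook reads a JSON-encoded list of extra qemu
arguments from a VM custom property named qemu_cmdline and that the property is
already registered on the engine -- I have not verified either. The connection
details, VM name and PMEM size are placeholders):

import json
import ovirtsdk4 as sdk
import ovirtsdk4.types as types

PMEM_SIZE = '16G'  # placeholder

extra_args = [
    '-object',
    'memory-backend-file,id=mem1,share=on,mem-path=/dev/dax0.0,size=%s,align=2M' % PMEM_SIZE,
    '-device',
    'nvdimm,id=nvdimm1,memdev=mem1,label-size=128k',
]

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='password',
    insecure=True,
)
vms_service = connection.system_service().vms_service()
vm = vms_service.list(search='name=myvm')[0]
# Set the custom property that the qemucmdline hook is supposed to pick up at VM start.
vms_service.vm_service(vm.id).update(
    types.Vm(
        custom_properties=[
            types.CustomProperty(name='qemu_cmdline', value=json.dumps(extra_args)),
        ],
    ),
)
connection.close()

Presumably the same custom_properties field could also be set from the ovirt_vm
Ansible module, but I have not tried that either.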
Thanks, Sakari
Current Architecture of oVirt and Quality Assurance measures
by Juergen Novak
I am currently evaluating oVirt for use in large companies as a base
system for edge computing applications.
I have already tried to find a current architecture overview of the
oVirt modules -- particularly ovirt-node-ng and imgbased -- but did not
manage to find any. There is some architecture documentation, but it is marked
as outdated and many of its links lead to 404s.
So I would like to ask whether somebody could provide a current architecture
overview (not user documentation), particularly of the node module(s).
Another question I have is which quality assurance measures are used in development:
* Tools used (for Unit-Tests, Component-Tests, Integration-Tests)
* Are there any statistics about test coverage?
Any assistance would be appreciated.
Best Regards
/juergen
oVirt 4.4 gluster single host vm paused due to unknown storage error
by Gianluca Cecchi
Hello,
in the environment in the subject, I downloaded from the Glance repository the
CentOS 8 image
CentOS 8 Generic Cloud Image v20200113.3 for x86_64 (5e35c84)
and imported it as a template.
I created a VM based on it (I got the message: "In order to create a VM
from a template with a different chipset, device configuration will be
changed. This may affect functionality of the guest software. Are you sure
you want to proceed?").
When running "dnf update", the VM went into a paused state during the I/O of
the package updates.
VM c8desktop started on Host novirt2.example.net 5/28/20 1:29:04 PM
VM c8desktop has been paused. 5/28/20 1:41:52 PM
VM c8desktop has been paused due to unknown storage error. 5/28/20 1:41:52 PM
VM c8desktop has recovered from paused back to up. 5/28/20 1:43:50 PM
In the messages log of the (nested) host I see:
May 28 13:28:06 novirt2 systemd-machined[1497]: New machine
qemu-7-c8desktop.
May 28 13:28:06 novirt2 systemd[1]: Started Virtual Machine
qemu-7-c8desktop.
May 28 13:28:06 novirt2 kvm[57798]: 2 guests now active
May 28 13:28:07 novirt2 journal[13368]: Guest agent is not responding: QEMU
guest agent is not connected
May 28 13:28:12 novirt2 journal[13368]: Guest agent is not responding: QEMU
guest agent is not connected
May 28 13:28:17 novirt2 journal[13368]: Guest agent is not responding: QEMU
guest agent is not connected
May 28 13:28:22 novirt2 journal[13368]: Guest agent is not responding: QEMU
guest agent is not connected
May 28 13:28:27 novirt2 journal[13368]: Guest agent is not responding: QEMU
guest agent is not connected
May 28 13:28:32 novirt2 journal[13368]: Guest agent is not responding: QEMU
guest agent is not connected
May 28 13:28:37 novirt2 journal[13368]: Domain id=7 name='c8desktop' uuid=63e27cb5-087d-435e-bf61-3fe25e3319d6 is tainted: custom-ga-command
May 28 13:28:37 novirt2 journal[26984]: Cannot open log file: '/var/log/libvirt/qemu/c8desktop.log': Device or resource busy
May 28 13:28:37 novirt2 journal[13368]: Cannot open log file: '/var/log/libvirt/qemu/c8desktop.log': Device or resource busy
May 28 13:28:37 novirt2 journal[13368]: Unable to open domainlog
May 28 13:30:00 novirt2 systemd[1]: Starting system activity accounting
tool...
May 28 13:30:00 novirt2 systemd[1]: Started system activity accounting tool.
May 28 13:37:21 novirt2 python3[62512]: detected unhandled Python exception
in '/usr/lib/python3.6/site-packages/vdsm/gluster/gfapi.py'
May 28 13:37:21 novirt2 abrt-server[62514]: Deleting problem directory
Python3-2020-05-28-13:37:21-62512 (dup of Python3-2020-05-28-10:32:57-29697)
May 28 13:37:21 novirt2 dbus-daemon[1502]: [system] Activating service
name='org.freedesktop.problems' requested by ':1.3111' (uid=0 pid=62522
comm="/usr/libexec/platform-python /usr/bin/abrt-action-"
label="system_u:system_r:abrt_t:s0-s0:c0.c1023") (using servicehelper)
May 28 13:37:21 novirt2 dbus-daemon[1502]: [system] Successfully activated
service 'org.freedesktop.problems'
May 28 13:37:21 novirt2 abrt-server[62514]: /bin/sh:
reporter-systemd-journal: command not found
May 28 13:37:21 novirt2 python3[62550]: detected unhandled Python exception
in '/usr/lib/python3.6/site-packages/vdsm/gluster/gfapi.py'
May 28 13:37:21 novirt2 abrt-server[62552]: Not saving repeating crash in
'/usr/lib/python3.6/site-packages/vdsm/gluster/gfapi.py'
May 28 13:37:22 novirt2 python3[62578]: detected unhandled Python exception
in '/usr/lib/python3.6/site-packages/vdsm/gluster/gfapi.py'
May 28 13:37:22 novirt2 abrt-server[62584]: Not saving repeating crash in
'/usr/lib/python3.6/site-packages/vdsm/gluster/gfapi.py'
May 28 13:40:00 novirt2 systemd[1]: Starting system activity accounting
tool...
May 28 13:40:00 novirt2 systemd[1]: Started system activity accounting tool.
In the log of the related gluster volume where the VM disk is, I have:
[2020-05-28 11:41:33.892074] W [MSGID: 114031]
[client-rpc-fops_v2.c:679:client4_0_writev_cbk] 0-vmstore-client-0: remote
operation failed [Invalid argument]
[2020-05-28 11:41:33.892140] W [fuse-bridge.c:2925:fuse_writev_cbk]
0-glusterfs-fuse: 348168: WRITE => -1
gfid=35ae86e8-0ccd-48b8-9ef2-6ca9a108ccf9 fd=0x7fd1d800cf38 (Invalid
argument)
[2020-05-28 11:41:33.902984] I [MSGID: 133022]
[shard.c:3693:shard_delete_shards] 0-vmstore-shard: Deleted shards of
gfid=35ae86e8-0ccd-48b8-9ef2-6ca9a108ccf9 from backend
[2020-05-28 11:41:52.434362] E [MSGID: 133010]
[shard.c:2339:shard_common_lookup_shards_cbk] 0-vmstore-shard: Lookup on
shard 6 failed. Base file gfid = 3e12e7fe-6a77-41b8-932a-d4f50c41ac00 [No
such file or directory]
[2020-05-28 11:41:52.434423] W [fuse-bridge.c:2925:fuse_writev_cbk]
0-glusterfs-fuse: 353565: WRITE => -1
gfid=3e12e7fe-6a77-41b8-932a-d4f50c41ac00 fd=0x7fd208093fb8 (No such file
or directory)
[2020-05-28 11:46:34.095697] W [MSGID: 114031]
[client-rpc-fops_v2.c:679:client4_0_writev_cbk] 0-vmstore-client-0: remote
operation failed [Invalid argument]
[2020-05-28 11:46:34.095758] W [fuse-bridge.c:2925:fuse_writev_cbk]
0-glusterfs-fuse: 384006: WRITE => -1
gfid=7b804a1a-1734-4bec-b8f4-9ba33ffefe8b fd=0x7fd1d0005fd8 (Invalid
argument)
[2020-05-28 11:46:34.104494] I [MSGID: 133022]
[shard.c:3693:shard_delete_shards] 0-vmstore-shard: Deleted shards of
gfid=7b804a1a-1734-4bec-b8f4-9ba33ffefe8b from backend
These are very similar to the sharding messages I got in 4.3.9 on a single host
with Gluster and "heavy"/sudden I/O operations when using thin-provisioned disks.
I see that in 4.4 Gluster is glusterfs-7.5-1.el8.x86_64.
Could this be a problem that only appears on a single host, since there is indeed
no data travelling over the network to sync nodes, and the sharding feature for
some reason is not able to keep pace when the local disk is very fast?
In 4.3.9 on a single host with Gluster 6.8-1 my only way to solve it was to
disable sharding, and I finally got stability; see here:
https://lists.ovirt.org/archives/list/users@ovirt.org/thread/OIN4R63I6ITO...
I'm still waiting for comments from the Gluster devs on the logs provided at that time.
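For reference, the workaround amounted to switching sharding off on the volume;
a minimal sketch of what I ran, wrapped in Python (the volume name vmstore is an
assumption, and it has to run on one of the gluster nodes):

import subprocess

VOLUME = 'vmstore'  # assumption: the oVirt data volume holding the VM disks

# Show the current value of the shard option, then disable it on the volume.
subprocess.run(['gluster', 'volume', 'get', VOLUME, 'features.shard'], check=True)
subprocess.run(['gluster', 'volume', 'set', VOLUME, 'features.shard', 'off'], check=True)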
As I already wrote, in my opinion the single-host wizard should turn sharding
off automatically, because in that environment it can make thin-provisioned
disks unusable.
In case nodes are added in the future, the setup could run a check and tell
the user that he/she should re-enable sharding.
Just my 0.2 eurocent
Gianluca
oVirt 4.4 HE on Copy local VM disk to shared storage (NFS) failing
by Anton Gonzalez
Deployment via Cockpit fails at the "Check engine VM health" task, after "Copy
local VM disk to shared storage" and "Exit HE maintenance mode" have completed.
The HE VM does not start from shared storage, with hosted-engine --vm-status
returning down_unexpected.
I've also tried with iSCSI, with similar results. I'm using CentOS 8.1 now,
but I also tried with Node 4.4 with the same results. The VDSM log is attached.
If any other logs are needed, please let me know and I'll add them.
I've used the same NFS server to deploy 4.3 without issue, and I ran the
nfs-check.py script to verify that the NFS share was correctly provisioned.
[ INFO ] TASK [ovirt.hosted_engine_setup : Copy local VM disk to shared
storage]
[ INFO ] changed: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Remove temporary entry in
/etc/hosts for the local VM]
[ INFO ] changed: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Start ovirt-ha-broker service on
the host]
[ INFO ] changed: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Initialize lockspace volume]
[ INFO ] TASK [ovirt.hosted_engine_setup : Workaround for ovirt-ha-broker
start failures]
[ INFO ] changed: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Initialize lockspace volume]
[ INFO ] changed: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Start ovirt-ha-agent service on
the host]
[ INFO ] changed: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Exit HE maintenance mode]
[ INFO ] changed: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Check engine VM health]
[ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 180, "changed": true,
"cmd": ["hosted-engine", "--vm-status", "--json"], "delta":
"0:00:00.186082", "end": "2020-05-26 20:05:40.063450", "rc": 0, "start":
"2020-05-26 20:05:39.877368", "stderr": "", "stderr_lines": [], "stdout":
"{\"1\": {\"host-id\": 1, \"host-ts\": 2500, \"score\": 0,
\"engine-status\": {\"vm\": \"down_unexpected\", \"health\": \"bad\",
\"detail\": \"Down\", \"reason\": \"bad vm status\"}, \"hostname\":
\"ovirthost02.po-lite.local\", \"maintenance\": false, \"stopped\": false,
\"crc32\": \"3b77b8b8\", \"conf_on_shared_storage\": true,
\"local_conf_timestamp\": 2500, \"extra\":
\"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=2500
(Tue May 26 20:05:36
2020)\\nhost-id=1\\nscore=0\\nvm_conf_refresh_time=2500 (Tue May 26
20:05:36
2020)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineUnexpectedlyDown\\nstopped=False\\ntimeout=Wed
Dec 31 19:47:59 1969\\n\", \"live-data\": true}, \"global_maintenance\":
false}", "stdout_lines": ["{\"1\": {\"host-id\": 1, \"host-ts\": 2500,
\"score\": 0, \"engine-status\": {\"vm\": \"down_unexpected\", \"health\":
\"bad\", \"detail\": \"Down\", \"reason\": \"bad vm status\"},
\"hostname\": \"ovirthost02.po-lite.local\", \"maintenance\": false,
\"stopped\": false, \"crc32\": \"3b77b8b8\", \"conf_on_shared_storage\":
true, \"local_conf_timestamp\": 2500, \"extra\":
\"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=2500
(Tue May 26 20:05:36
2020)\\nhost-id=1\\nscore=0\\nvm_conf_refresh_time=2500 (Tue May 26
20:05:36
2020)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineUnexpectedlyDown\\nstopped=False\\ntimeout=Wed
Dec 31 19:47:59 1969\\n\", \"live-data\": true}, \"global_maintenance\":
false}"]}
[ INFO ] TASK [ovirt.hosted_engine_setup : Check VM status at virt level]
[ INFO ] TASK [ovirt.hosted_engine_setup : Fail if engine VM is not running]
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "Engine
VM is not running, please check vdsm logs"}
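In the meantime, this is what I have been running on the host to re-check the
health data the playbook was polling (a small sketch that just re-runs the same
command and pulls the engine-status fields out of its JSON output):

import json
import subprocess

# Re-run the command the playbook polls and parse its JSON output.
out = subprocess.run(
    ['hosted-engine', '--vm-status', '--json'],
    stdout=subprocess.PIPE, universal_newlines=True, check=True,
).stdout
status = json.loads(out)
for host_id, data in status.items():
    if not isinstance(data, dict):
        continue  # skip top-level flags such as global_maintenance
    print(host_id, data.get('hostname'), data.get('engine-status'))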
[ANN] oVirt 4.4.1 Second Release Candidate
by Sandro Bonazzola
oVirt 4.4.1 Second Release Candidate is now available for testing
The oVirt Project is pleased to announce the availability of oVirt 4.4.1
Second Release Candidate for testing, as of May 28th, 2020.
This update is the first in a series of stabilization updates to the 4.4
series.
Important notes before you try it
Please note this is a pre-release build.
The oVirt Project makes no guarantees as to its suitability or usefulness.
This pre-release must not be used in production.
Some of the features included in the oVirt 4.4.1 Release Candidate require
content that will be available in CentOS Linux 8.2, but they can’t be tested on
RHEL 8.2 yet due to an incompatibility in the openvswitch package shipped by the
CentOS Virt SIG, which requires rebuilding openvswitch on top of CentOS 8.2.
Installation instructions
For the engine: either use appliance or:
- Install CentOS Linux 8 minimal from
http://centos.mirror.garr.it/centos/8.1.1911/isos/x86_64/CentOS-8.1.1911-...
- dnf install
https://resources.ovirt.org/pub/yum-repo/ovirt-release44-pre.rpm
- dnf update (reboot if needed)
- dnf module enable -y javapackages-tools pki-deps postgresql:12
- dnf install ovirt-engine
- engine-setup
For the nodes:
Either use oVirt Node ISO or:
- Install CentOS Linux 8 from
http://centos.mirror.garr.it/centos/8.1.1911/isos/x86_64/CentOS-8.1.1911-...
; select minimal installation
- dnf install
https://resources.ovirt.org/pub/yum-repo/ovirt-release44-pre.rpm
- dnf update (reboot if needed)
- Attach the host to engine and let it be deployed.
This release is available now on x86_64 architecture for:
* Red Hat Enterprise Linux 8.1 or newer
* CentOS Linux (or similar) 8.1 or newer
This release supports Hypervisor Hosts on x86_64 and ppc64le architectures
for:
* Red Hat Enterprise Linux 8.1 or newer
* CentOS Linux (or similar) 8.1 or newer
* oVirt Node 4.4 based on CentOS Linux 8.1 (available for x86_64 only)
See the release notes [1] for installation instructions and a list of new
features and bugs fixed.
If you manage more than one oVirt instance, OKD or RDO, we also recommend
trying ManageIQ <http://manageiq.org/>.
In that case, please be sure to take the qc2 image and not the ova image.
Notes:
- oVirt Appliance is already available for CentOS Linux 8
- oVirt Node NG is already available for CentOS Linux 8
Additional Resources:
* Read more about the oVirt 4.4.1 release highlights:
http://www.ovirt.org/release/4.4.1/
* Get more oVirt project updates on Twitter: https://twitter.com/ovirt
* Check out the latest project news on the oVirt blog:
http://www.ovirt.org/blog/
[1] http://www.ovirt.org/release/4.4.1/
[2] http://resources.ovirt.org/pub/ovirt-4.4-pre/iso/
--
Sandro Bonazzola
MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
Red Hat EMEA <https://www.redhat.com/>
sbonazzo(a)redhat.com
Our code is open: <https://www.redhat.com/en/our-code-is-open>
Red Hat respects your work-life balance. Therefore there is no need to
answer this email out of your office hours.
Re: AutoStart VMs (was Re: Re: oVirt 4.4.0 Release is now generally available)
by Derek Atkins
Eh, no point in creating a repo for that, so I just put them on the web:
https://www.ihtfp.org/ovirt/
-derek
On Wed, May 27, 2020 11:05 am, Staniforth, Paul wrote:
>
> Thanks Derek,
> GitHub or GitLab probably.
>
> Regards,
> Paul S.
> ________________________________
> From: Derek Atkins <derek(a)ihtfp.com>
> Sent: 27 May 2020 15:50
> To: Gianluca Cecchi <gianluca.cecchi(a)gmail.com>
> Cc: thomas(a)hoberg.net <thomas(a)hoberg.net>; users <users(a)ovirt.org>
> Subject: [ovirt-users] AutoStart VMs (was Re: Re: oVirt 4.4.0 Release is
> now generally available)
>
> Hi,
>
> (Sorry if you get this twice -- looks like it didn't like the python
> script in there so I'm resending without the code)
>
> Gianluca Cecchi <gianluca.cecchi(a)gmail.com> writes:
>
>> Hi Derek,
>> today I played around with Ansible to accomplish, I think, what you
>> currently do in oVirt shell.
>> It was the occasion to learn, as always, something new: since "blocks" in
>> Ansible don't support looping, a workaround to get that.
>> Furthermore, I have a single-host environment where it can turn out useful
>> too...
> [snip]
>
> I found the time to work on this using the Python SDK. It took me longer
> than I wanted, but I think I've got something working now. I just
> haven't done a FULL test yet, but a runtime test on the online system
> works (I commented out the start call).
>
> I still have two files: vm_list.py, which is a config file that
> contains the list of VMs, in order, and then the main program itself
> (start_vms.py), which is based on several of the examples available on
> GitHub.
>
> Unfortunately I can't seem to send the script in email because it's
> getting blocked by the Red Hat server -- so I have no idea of the best
> way to share it.
>
> -derek
>
> --
> Derek Atkins 617-623-3745
> derek(a)ihtfp.com
> https://eur02.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.ihtf...
> Computer and Internet Security Consultant
>
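For the archives, here is a minimal sketch of the approach (not my actual
start_vms.py; it is based on the public ovirtsdk4 examples, and the connection
details and the ordered VM list below are placeholders):

import time

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

# Ordered list of VM names to start; stands in for what vm_list.py defines.
VM_LIST = ['dns01', 'storage01', 'app01']

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='password',
    insecure=True,
)
vms_service = connection.system_service().vms_service()

for name in VM_LIST:
    vm = vms_service.list(search='name=%s' % name)[0]
    vm_service = vms_service.vm_service(vm.id)
    if vm.status != types.VmStatus.UP:
        vm_service.start()
    # Wait for this VM to come up before starting the next one.
    while vm_service.get().status != types.VmStatus.UP:
        time.sleep(5)

connection.close()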
--
Derek Atkins 617-623-3745
derek(a)ihtfp.com www.ihtfp.com
Computer and Internet Security Consultant
ovirt 4.4.0 - Live merge failure with libvirt error "virDomainBlockCommit() failed"
by Marco Fais
Hi,
I have upgraded one of my nodes to oVirt-node 4.4.0 and I am testing the basic functionality in preparation for the full cluster migration.
Unfortunately snapshot deletions are not working at the moment; I get a live merge failure most of the time due to a libvirt error -- virDomainBlockCommit() failed.
I have opened a bug on this as well (https://bugzilla.redhat.com/show_bug.cgi?id=1840414)
See the most relevant portion of the vdsm.log here:
2020-05-26 22:43:39,067+0100 ERROR (jsonrpc/1) [virt.vm] (vmId='baaf6be8-dcf4-4f26-b0f1-435287eeed95') Live merge failed (job: 7b96452b-0e60-46c6-9236-6b8a906b0ed8) (vm:5381)
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 5379, in merge
bandwidth, flags)
File "/usr/lib/python3.6/site-packages/vdsm/virt/virdomain.py", line 101, in f
ret = attr(*args, **kwargs)
File "/usr/lib/python3.6/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper
ret = f(*args, **kwargs)
File "/usr/lib/python3.6/site-packages/vdsm/common/function.py", line 94, in wrapper
return func(inst, *args, **kwargs)
File "/usr/lib64/python3.6/site-packages/libvirt.py", line 728, in blockCommit
if ret == -1: raise libvirtError ('virDomainBlockCommit() failed', dom=self)
libvirt.libvirtError: internal error: qemu block name 'json:{"backing": {"driver": "raw", "file": {"driver": "file", "filename": "/rhev/data-center/mnt/glusterSD/192.168.30.2:_Temp__Storage/246845e5-5546-4f04-948e-66a9532b403f/images/64955343-da61-4b76-9158-8df8363dc5a3/b0558698-bc82-4941-ac52-9fa2385a9f00"}}, "driver": "qcow2", "file": {"driver": "file", "filename": "/rhev/data-center/mnt/glusterSD/192.168.30.2:_Temp__Storage/246845e5-5546-4f04-948e-66a9532b403f/images/64955343-da61-4b76-9158-8df8363dc5a3/be844ebe-10b6-4ac7-89c1-9f8526a52762"}}' doesn't match expected '/rhev/data-center/mnt/glusterSD/192.168.30.2:_Temp__Storage/246845e5-5546-4f04-948e-66a9532b403f/images/64955343-da61-4b76-9158-8df8363dc5a3/be844ebe-10b6-4ac7-89c1-9f8526a52762'
2020-05-26 22:43:39,075+0100 INFO (jsonrpc/1) [api.virt] FINISH merge return={'status': {'code': 52, 'message': 'Merge failed'}} from=::ffff:10.144.138.240,60914, flow_id=dbf9c831-e0cb-4891-a38c-d61136daf029, vmId=baaf6be8-dcf4-4f26-b0f1-435287eeed95 (api:54)
2020-05-26 22:43:39,075+0100 INFO (jsonrpc/1) [jsonrpc.JsonRpcServer] RPC call VM.merge failed (error 52) in 0.18 seconds (__init__:312)
This issue was also reported in https://bugzilla.redhat.com/show_bug.cgi?id=1785939; according to the comments, the fix seems to be in libvirt 6.x.
However, I am not quite sure if / how I can get libvirt 6 in my setup... is there a test image (ideally ovirt-node) I can use?
Any suggestions?
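In the meantime, this is how I am checking which libvirt the node is actually
running (a small sketch using the libvirt Python bindings that are already on a
vdsm host; getLibVersion() returns e.g. 6000000 for libvirt 6.0.0):

import libvirt

# A read-only connection to the local libvirt daemon is enough for a version check.
conn = libvirt.openReadOnly('qemu:///system')
ver = conn.getLibVersion()
major, rest = divmod(ver, 1000000)
minor, release = divmod(rest, 1000)
print('libvirt %d.%d.%d' % (major, minor, release))
conn.close()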
Unrelated: I also had a problem during the hosted-engine deployment because the machine is not connected directly to the internet and could not complete the dnf update steps (even with the proxy configured it was still failing at a later stage, during the engine setup, with the same problem). I might open another thread on it...
Thanks,
Marco
oVirt 4.3.9.4-1:--> VM Has been paused due to storage I/O error
by adrianquintero@gmail.com
Team,
I've been having issues: none of my VMs can be started, the error is "VM has been paused due to storage I/O error".
Any ideas are welcome, as all my VMs are down.
Gluster log (gluster version 6.8):
[2020-05-27 09:10:28.132619] E [MSGID: 113040] [posix-inode-fd-ops.c:1572:posix_readv] 0-vmstore-posix: read failed on gfid=8674ab5f-56b9-4136-9b30-a65ca86be204, fd=0x7f12b807e6d8, offset=0 size=1, buf=0x7f134fbd9000 [Invalid argument]
[2020-05-27 09:10:28.132694] E [MSGID: 115068] [server-rpc-fops_v2.c:1425:server4_readv_cbk] 0-vmstore-server: 3286: READV 3 (8674ab5f-56b9-4136-9b30-a65ca86be204), client: CTX_ID:03354525-5de3-4390-b775-3db7a85c0022-GRAPH_ID:0-PID:29372-HOST:jrz-061-ovirt3.example.com-PC_NAME:vmstore-client-2-RECON_NO:-0, error-xlator: vmstore-posix [Invalid argument]
[2020-05-27 09:10:28.211930] E [MSGID: 113040] [posix-inode-fd-ops.c:1572:posix_readv] 0-vmstore-posix: read failed on gfid=8674ab5f-56b9-4136-9b30-a65ca86be204, fd=0x7f12b825ed18, offset=0 size=1, buf=0x7f134fbd9000 [Invalid argument]
[2020-05-27 09:10:28.211995] E [MSGID: 115068] [server-rpc-fops_v2.c:1425:server4_readv_cbk] 0-vmstore-server: 3306: READV 2 (8674ab5f-56b9-4136-9b30-a65ca86be204), client: CTX_ID:03354525-5de3-4390-b775-3db7a85c0022-GRAPH_ID:0-PID:29372-HOST:jrz-061-ovirt3.example.com-PC_NAME:vmstore-client-2-RECON_NO:-0, error-xlator: vmstore-posix [Invalid argument]
[2020-05-27 09:10:28.226451] E [MSGID: 113040] [posix-inode-fd-ops.c:1572:posix_readv] 0-vmstore-posix: read failed on gfid=755664db-2d04-4ed0-9333-251c6cc3dcb1, fd=0x7f12b807e6d8, offset=0 size=1, buf=0x7f134fbd9000 [Invalid argument]
[2020-05-27 09:10:28.226511] E [MSGID: 115068] [server-rpc-fops_v2.c:1425:server4_readv_cbk] 0-vmstore-server: 3317: READV 2 (755664db-2d04-4ed0-9333-251c6cc3dcb1), client: CTX_ID:03354525-5de3-4390-b775-3db7a85c0022-GRAPH_ID:0-PID:29372-HOST:jrz-061-ovirt3.example.com-PC_NAME:vmstore-client-2-RECON_NO:-0, error-xlator: vmstore-posix [Invalid argument]
[2020-05-27 09:10:28.232122] E [MSGID: 113040] [posix-inode-fd-ops.c:1572:posix_readv] 0-vmstore-posix: read failed on gfid=755664db-2d04-4ed0-9333-251c6cc3dcb1, fd=0x7f12b807e6d8, offset=0 size=1, buf=0x7f134fbd9000 [Invalid argument]
[2020-05-27 09:10:28.232181] E [MSGID: 115068] [server-rpc-fops_v2.c:1425:server4_readv_cbk] 0-vmstore-server: 3319: READV 2 (755664db-2d04-4ed0-9333-251c6cc3dcb1), client: CTX_ID:03354525-5de3-4390-b775-3db7a85c0022-GRAPH_ID:0-PID:29372-HOST:jrz-061-ovirt3.example.com-PC_NAME:vmstore-client-2-RECON_NO:-0, error-xlator: vmstore-posix [Invalid argument]
[2020-05-27 09:10:28.237043] E [MSGID: 113040] [posix-inode-fd-ops.c:1572:posix_readv] 0-vmstore-posix: read failed on gfid=8674ab5f-56b9-4136-9b30-a65ca86be204, fd=0x7f12b82038b8, offset=0 size=1, buf=0x7f134fbd9000 [Invalid argument]
[2020-05-27 09:10:28.237100] E [MSGID: 115068] [server-rpc-fops_v2.c:1425:server4_readv_cbk] 0-vmstore-server: 3321: READV 3 (8674ab5f-56b9-4136-9b30-a65ca86be204), client: CTX_ID:03354525-5de3-4390-b775-3db7a85c0022-GRAPH_ID:0-PID:29372-HOST:jrz-061-ovirt3.example.com-PC_NAME:vmstore-client-2-RECON_NO:-0, error-xlator: vmstore-posix [Invalid argument]
[2020-05-27 09:10:28.242176] E [MSGID: 113040] [posix-inode-fd-ops.c:1572:posix_readv] 0-vmstore-posix: read failed on gfid=755664db-2d04-4ed0-9333-251c6cc3dcb1, fd=0x7f12b807e6d8, offset=0 size=1, buf=0x7f134fbd9000 [Invalid argument]
[2020-05-27 09:10:28.242235] E [MSGID: 115068] [server-rpc-fops_v2.c:1425:server4_readv_cbk] 0-vmstore-server: 3323: READV 2 (755664db-2d04-4ed0-9333-251c6cc3dcb1), client: CTX_ID:03354525-5de3-4390-b775-3db7a85c0022-GRAPH_ID:0-PID:29372-HOST:jrz-061-ovirt3.example.com-PC_NAME:vmstore-client-2-RECON_NO:-0, error-xlator: vmstore-posix [Invalid argument]
[2020-05-27 09:11:18.990877] I [MSGID: 115036] [server.c:499:server_rpc_notify] 0-vmstore-server: disconnecting connection from CTX_ID:87270d80-4310-4795-9f4d-2c1a61d16cee-GRAPH_ID:0-PID:28815-HOST:jrz-059-ovirt1.example.com-PC_NAME:vmstore-client-2-RECON_NO:-0
[2020-05-27 09:11:18.991896] I [MSGID: 101055] [client_t.c:436:gf_client_unref] 0-vmstore-server: Shutting down connection CTX_ID:87270d80-4310-4795-9f4d-2c1a61d16cee-GRAPH_ID:0-PID:28815-HOST:jrz-059-ovirt1.example.com-PC_NAME:vmstore-client-2-RECON_NO:-0
[2020-05-27 09:11:19.329120] I [MSGID: 115036] [server.c:499:server_rpc_notify] 0-vmstore-server: disconnecting connection from CTX_ID:5858e269-923e-4a38-9c6b-62f337a6abac-GRAPH_ID:0-PID:20791-HOST:jrz-060-ovirt2.example.com-PC_NAME:vmstore-client-2-RECON_NO:-0
[2020-05-27 09:11:19.329419] I [MSGID: 101055] [client_t.c:436:gf_client_unref] 0-vmstore-server: Shutting down connection CTX_ID:5858e269-923e-4a38-9c6b-62f337a6abac-GRAPH_ID:0-PID:20791-HOST:jrz-060-ovirt2.example.com-PC_NAME:vmstore-client-2-RECON_NO:-0
[2020-05-27 09:11:21.044186] I [addr.c:54:compare_addr_and_update] 0-/gluster_bricks/vmstore/vmstore: allowed = "*", received addr = "192.168.0.59"
[2020-05-27 09:11:21.044227] I [login.c:110:gf_auth] 0-auth/login: allowed user names: c2ef048a-b354-4f91-8e41-a2aab6e65dbc
[2020-05-27 09:11:21.044265] I [MSGID: 115029] [server-handshake.c:553:server_setvolume] 0-vmstore-server: accepted client from CTX_ID:c2426c68-ad6c-4d3a-b6a4-decce8d18b75-GRAPH_ID:0-PID:1558-HOST:jrz-059-ovirt1.example.com-PC_NAME:vmstore-client-2-RECON_NO:-0 (version: 6.8) with subvol /gluster_bricks/vmstore/vmstore
[2020-05-27 09:11:21.383131] I [addr.c:54:compare_addr_and_update] 0-/gluster_bricks/vmstore/vmstore: allowed = "*", received addr = "192.168.0.60"
[2020-05-27 09:11:21.383168] I [login.c:110:gf_auth] 0-auth/login: allowed user names: c2ef048a-b354-4f91-8e41-a2aab6e65dbc
[2020-05-27 09:11:21.383190] I [MSGID: 115029] [server-handshake.c:553:server_setvolume] 0-vmstore-server: accepted client from CTX_ID:5a363377-0bc7-40eb-a7cb-f4f5757a102b-GRAPH_ID:0-PID:27173-HOST:jrz-060-ovirt2.example.com-PC_NAME:vmstore-client-2-RECON_NO:-0 (version: 6.8) with subvol /gluster_bricks/vmstore/vmstore
Tasks stuck waiting on another after failed storage migration (yet not visible on SPM)
by david.sekne@gmail.com
Hello,
I'm running oVirt version 4.3.9.4-1.el7. After a failed live storage migration, a VM got stuck with a snapshot. Checking the engine logs, I can see that the snapshot removal task is waiting for the Merge to complete, and vice versa.
2020-05-26 18:34:04,826+02 INFO [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommandCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-70) [90f428b0-9c4e-4ac0-8de6-1103fc13da9e] Command 'RemoveSnapshotSingleDiskLive' (id: '60ce36c1-bf74-40a9-9fb0-7fcf7eb95f40') waiting on child command id: 'f7d1de7b-9e87-47ba-9ba0-ee04301ba3b1' type:'Merge' to complete
2020-05-26 18:34:04,827+02 INFO [org.ovirt.engine.core.bll.MergeCommandCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-70) [90f428b0-9c4e-4ac0-8de6-1103fc13da9e] Waiting on merge command to complete (jobId = f694590a-1577-4dce-bf0c-3a8d74adf341)
2020-05-26 18:34:04,845+02 INFO [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-70) [90f428b0-9c4e-4ac0-8de6-1103fc13da9e] Command 'RemoveSnapshot' (id: '47c9a847-5b4b-4256-9264-a760acde8275') waiting on child command id: '60ce36c1-bf74-40a9-9fb0-7fcf7eb95f40' type:'RemoveSnapshotSingleDiskLive' to complete
2020-05-26 18:34:14,277+02 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmJobsMonitoring] (EE-ManagedThreadFactory-engineScheduled-Thread-96) [] VM Job [f694590a-1577-4dce-bf0c-3a8d74adf341]: In progress (no change)
I cannot see any running tasks on the SPM (vdsm-client Host getAllTasksInfo). I also cannot find the task ID in any of the other nodes' logs.
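For what it's worth, this is roughly how I have been listing what the engine
itself reports under /jobs with the Python SDK (a sketch; the connection details
are placeholders, and it only shows engine-side jobs, not SPM tasks):

import ovirtsdk4 as sdk

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='password',
    insecure=True,
)
# List the jobs the engine tracks (these back the Tasks pane in the UI).
jobs_service = connection.system_service().jobs_service()
for job in jobs_service.list():
    print(job.id, job.status, job.description)
connection.close()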
I already tried restarting the engine (it didn't help).
To start with, I'm puzzled about where the engine is getting the task info from.
Any ideas on how I could resolve this?
Thank you.
Regards,
David