Vm suddenly paused with error "vm has paused due to unknown storage error"

Hi all,

Since we upgraded our oVirt nodes to CentOS 7, a VM (not a specific one, but never more than one) will sometimes pause suddenly with the error "VM ... has paused due to unknown storage error". It has now happened twice in a month.
The oVirt node uses SAN storage for the VMs running on it. When a VM pauses with this error, the other VMs keep running without problems.
The VM runs without problems after unpausing it.

Versions:
CentOS Linux release 7.1.1503
vdsm-4.14.17-0
libvirt-daemon-1.2.8-16

vdsm.log:
VM Channels Listener::DEBUG::2015-10-25 07:43:54,382::vmChannels::95::vds::(_handle_timeouts) Timeout on fileno 78.
libvirtEventLoop::INFO::2015-10-25 07:43:56,177::vm::4602::vm.Vm::(_onIOError) vmId=`77f07ae0-cc3e-4ae2-90ec-7fba7b11deeb`::abnormal vm stop device virtio-disk0 error eother
libvirtEventLoop::DEBUG::2015-10-25 07:43:56,178::vm::5204::vm.Vm::(_onLibvirtLifecycleEvent) vmId=`77f07ae0-cc3e-4ae2-90ec-7fba7b11deeb`::event Suspended detail 2 opaque None
libvirtEventLoop::INFO::2015-10-25 07:43:56,178::vm::4602::vm.Vm::(_onIOError) vmId=`77f07ae0-cc3e-4ae2-90ec-7fba7b11deeb`::abnormal vm stop device virtio-disk0 error eother
...........
libvirtEventLoop::INFO::2015-10-25 07:43:56,180::vm::4602::vm.Vm::(_onIOError) vmId=`77f07ae0-cc3e-4ae2-90ec-7fba7b11deeb`::abnormal vm stop device virtio-disk0 error eother

specific error part in libvirt vm log:
block I/O error in device 'drive-virtio-disk0': Unknown error 32758 (32758)
...........
block I/O error in device 'drive-virtio-disk0': Unknown error 32758 (32758)

engine.log:
2015-10-25 07:44:48,945 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-40) [a43dcc8] VM diataal-prod-cas1 77f07ae0-cc3e-4ae2-90ec-7fba7b11deeb moved from Up --> Paused
2015-10-25 07:44:49,003 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-40) [a43dcc8] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM diataal-prod-cas1 has paused due to unknown storage error.

Has anyone experienced the same problem or knows a way to solve this?

Kind regards,

Jasper
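
For anyone comparing their own hosts against this report, here is a minimal Python sketch that pulls the `_onIOError` events out of vdsm.log lines in the format quoted above and checks whether an error number such as 32758 is a defined errno at all. The function names (`parse_io_errors`, `is_known_errno`) and the regular expression are assumptions made for illustration; they are not part of vdsm or libvirt, and the pattern is only matched against the log lines shown in this thread.

```python
import errno
import re
from collections import Counter

# Pattern derived from the vdsm.log lines quoted in the report (an assumption:
# other vdsm versions may format these lines differently).
IO_ERROR_RE = re.compile(
    r"::(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::"
    r".*?\(_onIOError\) vmId=`(?P<vm_id>[0-9a-f-]+)`::"
    r"abnormal vm stop device (?P<device>\S+) error (?P<error>\S+)"
)


def parse_io_errors(lines):
    """Return one dict per _onIOError line: ts, vm_id, device, error."""
    events = []
    for line in lines:
        match = IO_ERROR_RE.search(line)
        if match:
            events.append(match.groupdict())
    return events


def is_known_errno(code):
    """True if the platform defines a symbolic errno name for this number."""
    return code in errno.errorcode


# Sample lines copied from the report above.
SAMPLE = [
    "VM Channels Listener::DEBUG::2015-10-25 07:43:54,382::vmChannels::95::vds::"
    "(_handle_timeouts) Timeout on fileno 78.",
    "libvirtEventLoop::INFO::2015-10-25 07:43:56,177::vm::4602::vm.Vm::(_onIOError) "
    "vmId=`77f07ae0-cc3e-4ae2-90ec-7fba7b11deeb`::abnormal vm stop device "
    "virtio-disk0 error eother",
    "libvirtEventLoop::DEBUG::2015-10-25 07:43:56,178::vm::5204::vm.Vm::"
    "(_onLibvirtLifecycleEvent) vmId=`77f07ae0-cc3e-4ae2-90ec-7fba7b11deeb`::"
    "event Suspended detail 2 opaque None",
    "libvirtEventLoop::INFO::2015-10-25 07:43:56,178::vm::4602::vm.Vm::(_onIOError) "
    "vmId=`77f07ae0-cc3e-4ae2-90ec-7fba7b11deeb`::abnormal vm stop device "
    "virtio-disk0 error eother",
]

events = parse_io_errors(SAMPLE)

# Count I/O error events per (VM, device) to see which disks are affected.
per_disk = Counter((e["vm_id"], e["device"]) for e in events)
for (vm_id, device), count in per_disk.items():
    print(vm_id, device, count)

# 32758 is the number from the libvirt VM log in the report.
print("32758 is a defined errno:", is_known_errno(32758))
```

The errno check is only a hint: a number with no symbolic name suggests the value reported by QEMU may not be a real errno, but it does not by itself identify the cause.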

Hi Jasper,

from time to time we see a similar behaviour. All of a sudden a VM pauses due to some IO error. But it takes 5 months to occur. Our /var/log/libvirt/qemu/<vm>.log gives

qemu-system-x86_64: block.c:2806: bdrv_error_action: Assertion `error >= 0' failed.

Currently we are waiting to capture the next crash. I do not know if your error allows you to force core dumps for in-depth analysis. If yes, you should activate

1) /usr/lib/systemd/system/libvirtd.service
LimitCORE=infinity

2) /etc/security/limits.conf
* soft core unlimited

Markus
________________________________________
From: users-bounces@ovirt.org [users-bounces@ovirt.org] on behalf of Jasper Siero [jasper.siero@target-holding.nl]
Sent: Monday, 26 October 2015 17:39
To: users@ovirt.org
Subject: [ovirt-users] Vm suddenly paused with error "vm has paused due to unknown storage error"
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
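
To confirm that the core limit settings Markus suggests actually reached a running process, one option is to read that process's `/proc/<pid>/limits` file. The sketch below is a minimal Python example under that assumption (Linux procfs); the helper names `parse_core_limit`, `core_limit_for_pid`, and `core_dumps_enabled` are hypothetical, and the sample text only imitates the layout of a limits file.

```python
from pathlib import Path

CORE_LABEL = "Max core file size"


def parse_core_limit(limits_text):
    """Return (soft, hard) core-size limits from /proc/<pid>/limits text.

    Returns None if the text has no core-size row.
    """
    for line in limits_text.splitlines():
        if line.startswith(CORE_LABEL):
            # Columns after the label: soft limit, hard limit, units.
            soft, hard = line[len(CORE_LABEL):].split()[:2]
            return soft, hard
    return None


def core_limit_for_pid(pid):
    """Read the core-size limits of a running process (Linux only)."""
    return parse_core_limit(Path(f"/proc/{pid}/limits").read_text())


def core_dumps_enabled(limit):
    """A soft limit of 0 means the kernel will not write a core file."""
    return limit is not None and limit[0] != "0"


# Illustrative sample in the layout of /proc/<pid>/limits (not real output
# from the hosts discussed in this thread).
SAMPLE_LIMITS = """\
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max core file size        0                    unlimited            bytes
"""

limit = parse_core_limit(SAMPLE_LIMITS)
print("core limit (soft, hard):", limit)
print("core dumps enabled:", core_dumps_enabled(limit))
```

On a host, `core_limit_for_pid(pid)` would be called with the PID of the qemu process of interest. A nonzero soft limit is necessary but not sufficient: where and whether a core file is written also depends on other system settings such as `kernel.core_pattern`.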

Hello Markus,

Thanks for your reply. I think this will not work in our case because the libvirtd/VM process has not crashed, and after unpausing, the VM runs without problems. Is it possible to unpause your VM in this situation, or has the process really crashed?

Jasper
________________________________________
From: Markus Stockhausen [stockhausen@collogia.de]
Sent: Monday, 26 October 2015 19:47
To: Jasper Siero; users@ovirt.org
Subject: Re: [ovirt-users] Vm suddenly paused with error "vm has paused due to unknown storage error"

Did you ever get the pausing VMs resolved and started? I am having the same issue.

participants (3)
- eevans@digitaldatatechs.com
- Jasper Siero
- Markus Stockhausen