ovirt-engine-webadmin-portal-debuginfo package for 4.2?

First, apologies for all the posts to this list lately. I've been having a heck of a time after the 4.2 upgrade and you've been helpful; I appreciate that.
Since the 4.2 upgrade I'm experiencing a few problems that I'm trying to debug.
Current status: the engine and all hosts are upgraded to 4.2, and the cluster and domain are set to 4.2 compatibility. The Hosted Engine VM is running and the UI is accessible, and all VMs on the hosts are running, but there is no HA service. The web UI gives a few errors when checking network interfaces and snapshots on the hosted engine VM only; it doesn't give errors on any of the other VMs that I spot-checked.
1. HA-agent and HA-broker are continually crashing on all three hosts, over and over, every few seconds. I sent an email to the users list with more details on this problem but unfortunately haven't heard anything back yet. The general error in the logs seems to be: VolumeDoesNotExist(leafUUID)#012VolumeDoesNotExist: Volume does not exist: (u'8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8',) -- What? Volume doesn't exist, why not?
2. Error when clicking "network interfaces" in the web GUI for the hosted engine VM.
3. Similar to #2 above, an error is given when clicking "snapshots" in the web GUI for the hosted engine VM.
The errors for #2 and #3 are a generic "cannot read property 'a' of null". I've read previous postings on the oVirt mailing list that suggest you can install the debuginfo package to get a human-readable error, but this package does not seem to be compatible with 4.2; it expects 4.1: Requires: "ovirt-engine-webadmin-portal = 4.1.2.2-1.el7.centos" -- Perhaps this package is no longer required? I do see some additional details in the ui.log that I can post if helpful.
There is obviously something odd going on here with the hosted engine VM. All three errors appear to be related to a problem with it, although it is indeed up and running. I'd really like to get the HA broker and agent back up and running, and fix these GUI errors related to the hosted engine VM. Could all three problems be connected to one common issue?
Thanks in advance!

On Sun, Jan 14, 2018 at 3:37 PM, Jayme <jaymef@gmail.com> wrote:
First, apologies for all the posts to this list lately, I've been having a heck of a time after 4.2 upgrade and you've been helpful, I appreciate that.
Since 4.2 upgrade I'm experiencing a few problems that I'm trying to debug.
Current status is engine and all hosts are upgraded to 4.2, and cluster and domain set to 4.2 compatibility. Hosted Engine VM is running and ui accessible etc, all VMs on hosts are running but no HA service. Web UI is giving a few errors when checking network and snapshots on the hosted engine VM only, it doesn't give errors on any of the others VMs that I spot checked.
1. HA-agent and HA-broker are continually crashing on all three hosts over and over every few seconds. I sent an email to users list with more details on this problem but unfortunately haven't heard anything back yet. The general error in the logs seems to be: VolumeDoesNotExist(leafUUID)#012VolumeDoesNotExist: Volume does not exist: (u'8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8',) -- What? Volume doesn't exist, why not?
If agent/broker logs do not reveal this, the next step is usually checking vdsm logs and/or system logs. Can you please check/share these? Thanks.
2. Error when clicking "network interfaces" in the web gui for the hosted VM engine.
3. Similar to #2 above an error is given when clicking "snapshots" in the web gui for the hosted engine VM.
The errors for #2 and #3 are generic "cannot read property 'a' of null". I've read previous postings on ovirt-mailing list that suggest you can install debug-info package to get a human readable error.. but this package does not seem to be compatible with 4.2, it expects 4.1: Requires: "ovirt-engine-webadmin-portal = 4.1.2.2-1.el7.centos" -- Perhaps this package is no longer required? I do see some additional details in the ui.log that I can post if helpful.
There is obviously something odd going on here with the hosted engine VM. All three errors appear to related to a problem with it, although it is indeed up and running. I'd really like to get HA broker and agent back up and running, and fix these GUI errors related to hosted engine VM. All three problems may be connected to one common issue?
Thanks in advance!
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
-- Didi
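As a rough illustration of the log check suggested in this reply, here is a minimal Python sketch that searches log files for the VolumeDoesNotExist error and expands syslog's "#012" escape (an escaped newline) so the traceback reads as separate lines. The default paths below are assumptions about a typical host layout, not something stated in this thread, and the function names are made up for the example.

```python
import sys
from pathlib import Path

# Assumed default log locations; adjust for your hosts.
DEFAULT_LOGS = [
    "/var/log/vdsm/vdsm.log",
    "/var/log/messages",
    "/var/log/ovirt-hosted-engine-ha/agent.log",
    "/var/log/ovirt-hosted-engine-ha/broker.log",
]


def expand_syslog_newlines(line):
    """Turn syslog's '#012' escapes back into real newlines."""
    return line.replace("#012", "\n")


def find_matches(lines, needle="VolumeDoesNotExist"):
    """Return the lines containing `needle`, with '#012' expanded."""
    return [
        expand_syslog_newlines(line.rstrip("\n"))
        for line in lines
        if needle in line
    ]


def scan(paths, needle="VolumeDoesNotExist"):
    """Yield (path, match) for every matching line in each readable file."""
    for path in map(Path, paths):
        try:
            with path.open(errors="replace") as fh:
                for match in find_matches(fh, needle):
                    yield str(path), match
        except OSError as exc:
            # Missing or unreadable files are reported and skipped.
            print(f"skipping {path}: {exc}", file=sys.stderr)
```

Calling scan(DEFAULT_LOGS) on a host and printing each (path, match) pair gives a quick list of places where the error shows up, which can then be shared on the list.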

On Sun, Jan 14, 2018 at 3:46 PM, Yedidyah Bar David <didi@redhat.com> wrote:
On Sun, Jan 14, 2018 at 3:37 PM, Jayme <jaymef@gmail.com> wrote:
First, apologies for all the posts to this list lately, I've been having a heck of a time after 4.2 upgrade and you've been helpful, I appreciate that.
Since 4.2 upgrade I'm experiencing a few problems that I'm trying to debug.
Current status is engine and all hosts are upgraded to 4.2, and cluster and domain set to 4.2 compatibility. Hosted Engine VM is running and ui accessible etc, all VMs on hosts are running but no HA service. Web UI is giving a few errors when checking network and snapshots on the hosted engine VM only, it doesn't give errors on any of the others VMs that I spot checked.
1. HA-agent and HA-broker are continually crashing on all three hosts over and over every few seconds. I sent an email to users list with more details on this problem but unfortunately haven't heard anything back yet. The general error in the logs seems to be: VolumeDoesNotExist(leafUUID)#012VolumeDoesNotExist: Volume does not exist: (u'8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8',) -- What? Volume doesn't exist, why not?
If agent/broker logs do not reveal this, the next step is usually checking vdsm logs and/or system logs. Can you please check/share these? Thanks.
2. Error when clicking "network interfaces" in the web gui for the hosted VM engine.
3. Similar to #2 above an error is given when clicking "snapshots" in the web gui for the hosted engine VM.
The errors for #2 and #3 are generic "cannot read property 'a' of null". I've read previous postings on ovirt-mailing list that suggest you can install debug-info package to get a human readable error.. but this package does not seem to be compatible with 4.2, it expects 4.1: Requires: "ovirt-engine-webadmin-portal = 4.1.2.2-1.el7.centos" -- Perhaps this package is no longer required?
Indeed: https://bugzilla.redhat.com/show_bug.cgi?id=1431545
I do see some additional details in the ui.log that I can post if helpful.
There is obviously something odd going on here with the hosted engine VM. All three errors appear to related to a problem with it, although it is indeed up and running. I'd really like to get HA broker and agent back up and running, and fix these GUI errors related to hosted engine VM. All three problems may be connected to one common issue?
Thanks in advance!
-- Didi
-- Didi

On Sun, Jan 14, 2018 at 8:50 AM, Yedidyah Bar David <didi@redhat.com> wrote:
On Sun, Jan 14, 2018 at 3:46 PM, Yedidyah Bar David <didi@redhat.com> wrote:
On Sun, Jan 14, 2018 at 3:37 PM, Jayme <jaymef@gmail.com> wrote:
First, apologies for all the posts to this list lately, I've been having a heck of a time after 4.2 upgrade and you've been helpful, I appreciate that.
Since 4.2 upgrade I'm experiencing a few problems that I'm trying to debug.
Current status is engine and all hosts are upgraded to 4.2, and cluster and domain set to 4.2 compatibility. Hosted Engine VM is running and ui accessible etc, all VMs on hosts are running but no HA service. Web UI is giving a few errors when checking network and snapshots on the hosted engine VM only, it doesn't give errors on any of the others VMs that I spot checked.
1. HA-agent and HA-broker are continually crashing on all three hosts over and over every few seconds. I sent an email to users list with more details on this problem but unfortunately haven't heard anything back yet. The general error in the logs seems to be: VolumeDoesNotExist(leafUUID)#012VolumeDoesNotExist: Volume does not exist: (u'8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8',) -- What? Volume doesn't exist, why not?
If agent/broker logs do not reveal this, the next step is usually checking vdsm logs and/or system logs. Can you please check/share these? Thanks.
2. Error when clicking "network interfaces" in the web gui for the hosted VM engine.
3. Similar to #2 above an error is given when clicking "snapshots" in the web gui for the hosted engine VM.
The errors for #2 and #3 are generic "cannot read property 'a' of null". I've read previous postings on ovirt-mailing list that suggest you can install debug-info package to get a human readable error.. but this package does not seem to be compatible with 4.2, it expects 4.1: Requires: "ovirt-engine-webadmin-portal = 4.1.2.2-1.el7.centos" -- Perhaps this package is no longer required?
Right. ui.log will include the same stack trace that it gives in the browser -- just deobfuscated (automatically now).
I do see some additional details in the ui.log that I can post if helpful.
So, yes, please share it.
There is obviously something odd going on here with the hosted engine VM. All three errors appear to related to a problem with it, although it is indeed up and running. I'd really like to get HA broker and agent back up and running, and fix these GUI errors related to hosted engine VM. All three problems may be connected to one common issue?
Thanks in advance!
-- Didi
-- Didi
--
GREG SHEREMETA
SENIOR SOFTWARE ENGINEER - TEAM LEAD - RHV UX
Red Hat NA <https://www.redhat.com/>
gshereme@redhat.com IRC: gshereme <https://red.ht/sig>

I managed to fix the error with HA broker and agent continually crashing. I found that it was not a permissions problem on the path mentioned in the log:

I managed to fix the error with HA broker and agent continually crashing. I found that it was not a permissions problem on the path mentioned in the log:
/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8
but this folder had wrong permissions:
_exports_hosted__engine/248f46f0-d793-4581-9810-c9d965e2f286/images/19f114a1-e1c3-41c5-9fcb-b6099612d012
That seems to have resolved the agent and broker problem. The UI error when clicking network interfaces or snapshots for hosted engine VM still appears to exist. Here is the ui log from that:
Network interfaces tab for hosted engine VM:
2018-01-14 14:12:06,781-04 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (default task-19) [] Permutation name: C1D1FEFE5DCAE683818762C75C501D92
2018-01-14 14:12:06,781-04 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (default task-19) [] Uncaught exception: com.google.gwt.core.client.JavaScriptException: (TypeError) : Cannot read property 'a' of null
    at Unknown.new pvp(webadmin-141.js)
    at org.ovirt.engine.ui.webadmin.section.main.view.tab.virtualMachine.SubTabVirtualMachineNetworkInterfaceView.$createListViewItem(SubTabVirtualMachineNetworkInterfaceView.java:69)
    at org.ovirt.engine.ui.webadmin.section.main.view.tab.virtualMachine.SubTabVirtualMachineNetworkInterfaceView.createListViewItem(SubTabVirtualMachineNetworkInterfaceView.java:69)
    at org.ovirt.engine.ui.common.widget.listgroup.PatternflyListView.$updateInfoPanel(PatternflyListView.java:137)
    at org.ovirt.engine.ui.common.widget.listgroup.PatternflyListView.$lambda$1(PatternflyListView.java:63)
    at org.ovirt.engine.ui.common.widget.listgroup.PatternflyListView$lambda$1$Type.eventRaised(PatternflyListView.java:63)
    at org.ovirt.engine.ui.uicompat.Event.$raise(Event.java:99)
    at org.ovirt.engine.ui.uicommonweb.models.SearchableListModel.$setItems(SearchableListModel.java:739)
    at org.ovirt.engine.ui.uicommonweb.models.vms.VmInterfaceListModel.$lambda$2(VmInterfaceListModel.java:143)
    at org.ovirt.engine.ui.uicommonweb.models.vms.VmInterfaceListModel$lambda$2$Type.executed(VmInterfaceListModel.java:143)
    at org.ovirt.engine.ui.frontend.Frontend$2.$onSuccess(Frontend.java:319) [frontend.jar:]
    at org.ovirt.engine.ui.frontend.Frontend$2.onSuccess(Frontend.java:319) [frontend.jar:]
    at org.ovirt.engine.ui.frontend.communication.OperationProcessor$2.$onSuccess(OperationProcessor.java:170) [frontend.jar:]
    at org.ovirt.engine.ui.frontend.communication.OperationProcessor$2.onSuccess(OperationProcessor.java:170) [frontend.jar:]
    at org.ovirt.engine.ui.frontend.communication.GWTRPCCommunicationProvider$5$1.$onSuccess(GWTRPCCommunicationProvider.java:270) [frontend.jar:]
    at org.ovirt.engine.ui.frontend.communication.GWTRPCCommunicationProvider$5$1.onSuccess(GWTRPCCommunicationProvider.java:270) [frontend.jar:]
    at com.google.gwt.user.client.rpc.impl.RequestCallbackAdapter.onResponseReceived(RequestCallbackAdapter.java:198) [gwt-servlet.jar:]
    at com.google.gwt.http.client.Request.$fireOnResponseReceived(Request.java:233) [gwt-servlet.jar:]
    at com.google.gwt.http.client.RequestBuilder$1.onReadyStateChange(RequestBuilder.java:409) [gwt-servlet.jar:]
    at Unknown.eval(webadmin-0.js)
    at com.google.gwt.core.client.impl.Impl.apply(Impl.java:236) [gwt-servlet.jar:]
    at com.google.gwt.core.client.impl.Impl.entry0(Impl.java:275) [gwt-servlet.jar:]
    at Unknown.eval(webadmin-0.js)
snapshots tab for hosted engine vm:
2018-01-14 14:12:55,628-04 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (default task-14) [] Permutation name: C1D1FEFE5DCAE683818762C75C501D92
2018-01-14 14:12:55,628-04 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (default task-14) [] Uncaught exception: com.google.gwt.core.client.JavaScriptException: (TypeError) : Cannot read property 'a' of null
    at org.ovirt.engine.ui.common.widget.uicommon.vm.VmSnapshotListViewItem.$createNicsItemContainerPanel(VmSnapshotListViewItem.java:90)
    at org.ovirt.engine.ui.common.widget.uicommon.vm.VmSnapshotListViewItem.$updateValues(VmSnapshotListViewItem.java:387)
    at Unknown.new Swp(webadmin-143.js)
    at org.ovirt.engine.ui.webadmin.section.main.view.tab.virtualMachine.SubTabVirtualMachineSnapshotView.$createListViewItem(SubTabVirtualMachineSnapshotView.java:68)
    at org.ovirt.engine.ui.webadmin.section.main.view.tab.virtualMachine.SubTabVirtualMachineSnapshotView.createListViewItem(SubTabVirtualMachineSnapshotView.java:68)
    at org.ovirt.engine.ui.common.widget.listgroup.PatternflyListView.$updateInfoPanel(PatternflyListView.java:137)
    at org.ovirt.engine.ui.common.widget.listgroup.PatternflyListView.$lambda$1(PatternflyListView.java:63)
    at org.ovirt.engine.ui.common.widget.listgroup.PatternflyListView$lambda$1$Type.eventRaised(PatternflyListView.java:63)
    at org.ovirt.engine.ui.uicompat.Event.$raise(Event.java:99)
    at org.ovirt.engine.ui.uicommonweb.models.SearchableListModel.$setItems(SearchableListModel.java:739)
    at org.ovirt.engine.ui.uicommonweb.models.vms.VmSnapshotListModel.$updateItems(VmSnapshotListModel.java:230)
    at org.ovirt.engine.ui.uicommonweb.models.vms.VmSnapshotListModel.setItems(VmSnapshotListModel.java:209)
    at org.ovirt.engine.ui.uicommonweb.models.SearchableListModel$SetItemsAsyncQuery$1.$onSuccess(SearchableListModel.java:902)
    at org.ovirt.engine.ui.uicommonweb.models.SearchableListModel$SetItemsAsyncQuery$1.onSuccess(SearchableListModel.java:902)
    at org.ovirt.engine.ui.frontend.Frontend$1.$onSuccess(Frontend.java:227) [frontend.jar:]
    at org.ovirt.engine.ui.frontend.Frontend$1.onSuccess(Frontend.java:227) [frontend.jar:]
    at org.ovirt.engine.ui.frontend.communication.OperationProcessor$1.$onSuccess(OperationProcessor.java:133) [frontend.jar:]
    at org.ovirt.engine.ui.frontend.communication.OperationProcessor$1.onSuccess(OperationProcessor.java:133) [frontend.jar:]
    at org.ovirt.engine.ui.frontend.communication.GWTRPCCommunicationProvider$5$1.$onSuccess(GWTRPCCommunicationProvider.java:270) [frontend.jar:]
    at org.ovirt.engine.ui.frontend.communication.GWTRPCCommunicationProvider$5$1.onSuccess(GWTRPCCommunicationProvider.java:270) [frontend.jar:]
    at com.google.gwt.user.client.rpc.impl.RequestCallbackAdapter.onResponseReceived(RequestCallbackAdapter.java:198) [gwt-servlet.jar:]
    at com.google.gwt.http.client.Request.$fireOnResponseReceived(Request.java:233) [gwt-servlet.jar:]
    at com.google.gwt.http.client.RequestBuilder$1.onReadyStateChange(RequestBuilder.java:409) [gwt-servlet.jar:]
    at Unknown.eval(webadmin-0.js)
    at com.google.gwt.core.client.impl.Impl.apply(Impl.java:236) [gwt-servlet.jar:]
    at com.google.gwt.core.client.impl.Impl.entry0(Impl.java:275) [gwt-servlet.jar:]
    at Unknown.eval(webadmin-0.js)
2018-01-14 14:12:55,673-04 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (default task-23) [] Permutation name: C1D1FEFE5DCAE683818762C75C501D92
2018-01-14 14:12:55,674-04 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (default task-23) [] Uncaught exception: com.google.gwt.event.shared.UmbrellaException: Exception caught: (TypeError) : Cannot read property 'a' of null
    at java.lang.Throwable.Throwable(Throwable.java:70) [rt.jar:1.8.0_151]
    at java.lang.RuntimeException.RuntimeException(RuntimeException.java:32) [rt.jar:1.8.0_151]
    at com.google.web.bindery.event.shared.UmbrellaException.UmbrellaException(UmbrellaException.java:64) [gwt-servlet.jar:]
    at Unknown.new a0(webadmin-0.js)
    at com.google.gwt.event.shared.HandlerManager.$fireEvent(HandlerManager.java:117) [gwt-servlet.jar:]
    at com.google.gwt.view.client.SelectionChangeEvent.fire(SelectionChangeEvent.java:67) [gwt-servlet.jar:]
    at com.google.gwt.view.client.SingleSelectionModel.$resolveChanges(SingleSelectionModel.java:118) [gwt-servlet.jar:]
    at com.google.gwt.view.client.SingleSelectionModel.fireSelectionChangeEvent(SingleSelectionModel.java:107) [gwt-servlet.jar:]
    at com.google.gwt.view.client.SelectionModel$AbstractSelectionModel$1.execute(SelectionModel.java:128) [gwt-servlet.jar:]
    at com.google.gwt.core.client.impl.SchedulerImpl.runScheduledTasks(SchedulerImpl.java:167) [gwt-servlet.jar:]
    at com.google.gwt.core.client.impl.SchedulerImpl.$flushFinallyCommands(SchedulerImpl.java:272) [gwt-servlet.jar:]
    at com.google.gwt.core.client.impl.Impl.exit(Impl.java:313) [gwt-servlet.jar:]
    at com.google.gwt.core.client.impl.Impl.entry0(Impl.java:275) [gwt-servlet.jar:]
    at Unknown.eval(webadmin-0.js)
Caused by: com.google.gwt.core.client.JavaScriptException: (TypeError) : Cannot read property 'a' of null
    at org.ovirt.engine.ui.common.widget.uicommon.vm.VmSnapshotListViewItem.$createNicsItemContainerPanel(VmSnapshotListViewItem.java:90)
    at org.ovirt.engine.ui.common.widget.uicommon.vm.VmSnapshotListViewItem.$updateValues(VmSnapshotListViewItem.java:387)
    at Unknown.new Swp(webadmin-143.js)
    at org.ovirt.engine.ui.webadmin.section.main.view.tab.virtualMachine.SubTabVirtualMachineSnapshotView.$createListViewItem(SubTabVirtualMachineSnapshotView.java:68)
    at org.ovirt.engine.ui.webadmin.section.main.view.tab.virtualMachine.SubTabVirtualMachineSnapshotView.createListViewItem(SubTabVirtualMachineSnapshotView.java:68)
    at org.ovirt.engine.ui.common.widget.listgroup.PatternflyListView.$updateInfoPanel(PatternflyListView.java:137)
    at org.ovirt.engine.ui.common.widget.listgroup.PatternflyListView.$processSelectionChanged(PatternflyListView.java:126)
    at org.ovirt.engine.ui.common.widget.listgroup.PatternflyListView$lambda$0$Type.onSelectionChange(PatternflyListView.java:59)
    at com.google.gwt.view.client.SelectionChangeEvent.dispatch(SelectionChangeEvent.java:98) [gwt-servlet.jar:]
    at com.google.gwt.event.shared.GwtEvent.dispatch(GwtEvent.java:76) [gwt-servlet.jar:]
    at com.google.web.bindery.event.shared.SimpleEventBus.$doFire(SimpleEventBus.java:173) [gwt-servlet.jar:]
    ... 10 more
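For anyone hitting the same agent/broker symptom, here is a minimal Python sketch of the kind of ownership check that could surface the storage problem described in this message. It assumes storage paths are expected to be owned by uid 36 and gid 36 (vdsm:kvm on a typical oVirt host). The message only says the permissions were wrong, not what they were or what they were changed to, so treat those defaults and the function name find_wrong_owner as illustrative.

```python
import os


def find_wrong_owner(root, uid=36, gid=36):
    """Return paths under `root` (including `root`) not owned by uid:gid.

    The 36:36 default is an assumption (vdsm:kvm on a typical oVirt host).
    Uses lstat, so symlinked files are checked themselves rather than their
    targets; symlinked directories are not descended into.
    """
    wrong = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # Check the directory itself, then every file directly inside it.
        candidates = [dirpath] + [os.path.join(dirpath, f) for f in filenames]
        for path in candidates:
            st = os.lstat(path)
            if st.st_uid != uid or st.st_gid != gid:
                wrong.append(path)
    return wrong
```

Running find_wrong_owner() against the images directory of the hosted engine storage domain and printing the result shows which entries differ from the expected owner, before changing anything.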

They all look like the same bug in 3 different places. Please open a bug at https://bugzilla.redhat.com/enter_bug.cgi?product=ovirt-engine and include your reproducer steps and your logs. Please set the "ovirt team" to UX. Thanks! On Sun, Jan 14, 2018 at 1:13 PM, Jayme <jaymef@gmail.com> wrote:
I managed to fix the error with HA broker and agent continually crashing. I found that it was not a permissions problem on the path mentioned in the log:
/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84- 4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8
but this folder had wrong permissions:
_exports_hosted__engine/248f46f0-d793-4581-9810- c9d965e2f286/images/19f114a1-e1c3-41c5-9fcb-b6099612d012
That seems to have resolved the agent and broker problem. The UI error when clicking network interfaces or snapshots for hosted engine VM still appears to exist. Here is the ui log from that:
Network interfaces tab for hosted engine VM:
2018-01-14 14:12:06,781-04 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (default task-19) [] Permutation name: C1D1FEFE5DCAE683818762C75C501D92 2018-01-14 14:12:06,781-04 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (default task-19) [] Uncaught exception: com.google.gwt.core.client.JavaScriptException: (TypeError) : Cannot read property 'a' of null at Unknown.new pvp(webadmin-141.js) at org.ovirt.engine.ui.webadmin.section.main.view.tab.virtualMachine. SubTabVirtualMachineNetworkInterfaceView.$createListViewItem( SubTabVirtualMachineNetworkInterfaceView.java:69) at org.ovirt.engine.ui.webadmin.section.main.view.tab.virtualMachine. SubTabVirtualMachineNetworkInterfaceView.createListViewItem( SubTabVirtualMachineNetworkInterfaceView.java:69) at org.ovirt.engine.ui.common.widget.listgroup.PatternflyListView.$ updateInfoPanel(PatternflyListView.java:137) at org.ovirt.engine.ui.common.widget.listgroup. PatternflyListView.$lambda$1(PatternflyListView.java:63) at org.ovirt.engine.ui.common.widget.listgroup. PatternflyListView$lambda$1$Type.eventRaised(PatternflyListView.java:63) at org.ovirt.engine.ui.uicompat.Event.$raise(Event.java:99) at org.ovirt.engine.ui.uicommonweb.models.SearchableListModel.$setItems( SearchableListModel.java:739) at org.ovirt.engine.ui.uicommonweb.models.vms. VmInterfaceListModel.$lambda$2(VmInterfaceListModel.java:143) at org.ovirt.engine.ui.uicommonweb.models.vms. VmInterfaceListModel$lambda$2$Type.executed(VmInterfaceListModel.java:143) at org.ovirt.engine.ui.frontend.Frontend$2.$onSuccess(Frontend.java:319) [frontend.jar:] at org.ovirt.engine.ui.frontend.Frontend$2.onSuccess(Frontend.java:319) [frontend.jar:] at org.ovirt.engine.ui.frontend.communication.OperationProcessor$2.$ onSuccess(OperationProcessor.java:170) [frontend.jar:] at org.ovirt.engine.ui.frontend.communication.OperationProcessor$2. 
onSuccess(OperationProcessor.java:170) [frontend.jar:] at org.ovirt.engine.ui.frontend.communication. GWTRPCCommunicationProvider$5$1.$onSuccess(GWTRPCCommunicationProvider.java:270) [frontend.jar:] at org.ovirt.engine.ui.frontend.communication. GWTRPCCommunicationProvider$5$1.onSuccess(GWTRPCCommunicationProvider.java:270) [frontend.jar:] at com.google.gwt.user.client.rpc.impl.RequestCallbackAdapter. onResponseReceived(RequestCallbackAdapter.java:198) [gwt-servlet.jar:] at com.google.gwt.http.client.Request.$fireOnResponseReceived(Request.java:233) [gwt-servlet.jar:] at com.google.gwt.http.client.RequestBuilder$1.onReadyStateChange(RequestBuilder.java:409) [gwt-servlet.jar:] at Unknown.eval(webadmin-0.js) at com.google.gwt.core.client.impl.Impl.apply(Impl.java:236) [gwt-servlet.jar:] at com.google.gwt.core.client.impl.Impl.entry0(Impl.java:275) [gwt-servlet.jar:] at Unknown.eval(webadmin-0.js)
snapshots tab for hosted engine vm:
2018-01-14 14:12:55,628-04 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (default task-14) [] Permutation name: C1D1FEFE5DCAE683818762C75C501D92 2018-01-14 14:12:55,628-04 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (default task-14) [] Uncaught exception: com.google.gwt.core.client.JavaScriptException: (TypeError) : Cannot read property 'a' of null at org.ovirt.engine.ui.common.widget.uicommon.vm.VmSnapshotListViewItem.$ createNicsItemContainerPanel(VmSnapshotListViewItem.java:90) at org.ovirt.engine.ui.common.widget.uicommon.vm.VmSnapshotListViewItem.$ updateValues(VmSnapshotListViewItem.java:387) at Unknown.new Swp(webadmin-143.js) at org.ovirt.engine.ui.webadmin.section.main.view.tab.virtualMachine. SubTabVirtualMachineSnapshotView.$createListViewItem( SubTabVirtualMachineSnapshotView.java:68) at org.ovirt.engine.ui.webadmin.section.main.view.tab.virtualMachine. SubTabVirtualMachineSnapshotView.createListViewItem( SubTabVirtualMachineSnapshotView.java:68) at org.ovirt.engine.ui.common.widget.listgroup.PatternflyListView.$ updateInfoPanel(PatternflyListView.java:137) at org.ovirt.engine.ui.common.widget.listgroup. PatternflyListView.$lambda$1(PatternflyListView.java:63) at org.ovirt.engine.ui.common.widget.listgroup. PatternflyListView$lambda$1$Type.eventRaised(PatternflyListView.java:63) at org.ovirt.engine.ui.uicompat.Event.$raise(Event.java:99) at org.ovirt.engine.ui.uicommonweb.models.SearchableListModel.$setItems( SearchableListModel.java:739) at org.ovirt.engine.ui.uicommonweb.models.vms.VmSnapshotListModel.$ updateItems(VmSnapshotListModel.java:230) at org.ovirt.engine.ui.uicommonweb.models.vms. 
VmSnapshotListModel.setItems(VmSnapshotListModel.java:209) at org.ovirt.engine.ui.uicommonweb.models.SearchableListModel$ SetItemsAsyncQuery$1.$onSuccess(SearchableListModel.java:902) at org.ovirt.engine.ui.uicommonweb.models.SearchableListModel$ SetItemsAsyncQuery$1.onSuccess(SearchableListModel.java:902) at org.ovirt.engine.ui.frontend.Frontend$1.$onSuccess(Frontend.java:227) [frontend.jar:] at org.ovirt.engine.ui.frontend.Frontend$1.onSuccess(Frontend.java:227) [frontend.jar:] at org.ovirt.engine.ui.frontend.communication.OperationProcessor$1.$ onSuccess(OperationProcessor.java:133) [frontend.jar:] at org.ovirt.engine.ui.frontend.communication.OperationProcessor$1. onSuccess(OperationProcessor.java:133) [frontend.jar:] at org.ovirt.engine.ui.frontend.communication. GWTRPCCommunicationProvider$5$1.$onSuccess(GWTRPCCommunicationProvider.java:270) [frontend.jar:] at org.ovirt.engine.ui.frontend.communication. GWTRPCCommunicationProvider$5$1.onSuccess(GWTRPCCommunicationProvider.java:270) [frontend.jar:] at com.google.gwt.user.client.rpc.impl.RequestCallbackAdapter. onResponseReceived(RequestCallbackAdapter.java:198) [gwt-servlet.jar:] at com.google.gwt.http.client.Request.$fireOnResponseReceived(Request.java:233) [gwt-servlet.jar:] at com.google.gwt.http.client.RequestBuilder$1.onReadyStateChange(RequestBuilder.java:409) [gwt-servlet.jar:] at Unknown.eval(webadmin-0.js) at com.google.gwt.core.client.impl.Impl.apply(Impl.java:236) [gwt-servlet.jar:] at com.google.gwt.core.client.impl.Impl.entry0(Impl.java:275) [gwt-servlet.jar:] at Unknown.eval(webadmin-0.js)
2018-01-14 14:12:55,673-04 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (default task-23) [] Permutation name: C1D1FEFE5DCAE683818762C75C501D92
2018-01-14 14:12:55,674-04 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (default task-23) [] Uncaught exception: com.google.gwt.event.shared.UmbrellaException: Exception caught: (TypeError) : Cannot read property 'a' of null
    at java.lang.Throwable.Throwable(Throwable.java:70) [rt.jar:1.8.0_151]
    at java.lang.RuntimeException.RuntimeException(RuntimeException.java:32) [rt.jar:1.8.0_151]
    at com.google.web.bindery.event.shared.UmbrellaException.UmbrellaException(UmbrellaException.java:64) [gwt-servlet.jar:]
    at Unknown.new a0(webadmin-0.js)
    at com.google.gwt.event.shared.HandlerManager.$fireEvent(HandlerManager.java:117) [gwt-servlet.jar:]
    at com.google.gwt.view.client.SelectionChangeEvent.fire(SelectionChangeEvent.java:67) [gwt-servlet.jar:]
    at com.google.gwt.view.client.SingleSelectionModel.$resolveChanges(SingleSelectionModel.java:118) [gwt-servlet.jar:]
    at com.google.gwt.view.client.SingleSelectionModel.fireSelectionChangeEvent(SingleSelectionModel.java:107) [gwt-servlet.jar:]
    at com.google.gwt.view.client.SelectionModel$AbstractSelectionModel$1.execute(SelectionModel.java:128) [gwt-servlet.jar:]
    at com.google.gwt.core.client.impl.SchedulerImpl.runScheduledTasks(SchedulerImpl.java:167) [gwt-servlet.jar:]
    at com.google.gwt.core.client.impl.SchedulerImpl.$flushFinallyCommands(SchedulerImpl.java:272) [gwt-servlet.jar:]
    at com.google.gwt.core.client.impl.Impl.exit(Impl.java:313) [gwt-servlet.jar:]
    at com.google.gwt.core.client.impl.Impl.entry0(Impl.java:275) [gwt-servlet.jar:]
    at Unknown.eval(webadmin-0.js)
Caused by: com.google.gwt.core.client.JavaScriptException: (TypeError) : Cannot read property 'a' of null
    at org.ovirt.engine.ui.common.widget.uicommon.vm.VmSnapshotListViewItem.$createNicsItemContainerPanel(VmSnapshotListViewItem.java:90)
    at org.ovirt.engine.ui.common.widget.uicommon.vm.VmSnapshotListViewItem.$updateValues(VmSnapshotListViewItem.java:387)
    at Unknown.new Swp(webadmin-143.js)
    at org.ovirt.engine.ui.webadmin.section.main.view.tab.virtualMachine.SubTabVirtualMachineSnapshotView.$createListViewItem(SubTabVirtualMachineSnapshotView.java:68)
    at org.ovirt.engine.ui.webadmin.section.main.view.tab.virtualMachine.SubTabVirtualMachineSnapshotView.createListViewItem(SubTabVirtualMachineSnapshotView.java:68)
    at org.ovirt.engine.ui.common.widget.listgroup.PatternflyListView.$updateInfoPanel(PatternflyListView.java:137)
    at org.ovirt.engine.ui.common.widget.listgroup.PatternflyListView.$processSelectionChanged(PatternflyListView.java:126)
    at org.ovirt.engine.ui.common.widget.listgroup.PatternflyListView$lambda$0$Type.onSelectionChange(PatternflyListView.java:59)
    at com.google.gwt.view.client.SelectionChangeEvent.dispatch(SelectionChangeEvent.java:98) [gwt-servlet.jar:]
    at com.google.gwt.event.shared.GwtEvent.dispatch(GwtEvent.java:76) [gwt-servlet.jar:]
    at com.google.web.bindery.event.shared.SimpleEventBus.$doFire(SimpleEventBus.java:173) [gwt-servlet.jar:]
    ... 10 more
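For anyone reading ui.log traces like the two above, here is a minimal sketch of pulling the stack frames out of a pasted trace, including one that a mail client has re-wrapped. The helper names (split_frames, first_frame_in) are made up for illustration and are not part of oVirt. The regular expression is a heuristic that assumes every frame starts with "at" and ends with a parenthesised source location.

```python
import re

# A frame is "at <qualified name>(<source location>)". DOTALL lets a frame
# span a line break that a mail client inserted in the middle of it.
_FRAME = re.compile(r"\bat\s+(.*?\([^()]*\))", re.S)

# Whitespace directly after '.', '$' or '(' is treated as a wrapping artifact.
_WRAP_GAP = re.compile(r"(?<=[.$(])\s+")


def split_frames(trace):
    """Return the stack frames found in `trace`, with wrap gaps removed."""
    return [_WRAP_GAP.sub("", m.group(1)) for m in _FRAME.finditer(trace)]


def first_frame_in(trace, package="org.ovirt."):
    """Return the first frame under `package`, or None if there is none."""
    for frame in split_frames(trace):
        if frame.startswith(package):
            return frame
    return None


if __name__ == "__main__":
    sample = (
        "Uncaught exception: com.google.gwt.core.client.JavaScriptException: "
        "(TypeError) : Cannot read property 'a' of null at "
        "org.ovirt.engine.ui.common.widget.uicommon.vm.VmSnapshotListViewItem.$ "
        "createNicsItemContainerPanel(VmSnapshotListViewItem.java:90) at "
        "Unknown.new Swp(webadmin-143.js)"
    )
    print(first_frame_in(sample))
```

In both traces above the first org.ovirt frame is VmSnapshotListViewItem.$createNicsItemContainerPanel, so the snapshot view is failing while it builds the NICs panel.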
On Sun, Jan 14, 2018 at 2:09 PM, Jayme <jaymef@gmail.com> wrote:
I managed to fix the error with HA broker and agent continually crashing. I found that it was not a permissions problem on the path mentioned in the log:
On Sun, Jan 14, 2018 at 2:07 PM, Greg Sheremeta <gshereme@redhat.com> wrote:
On Sun, Jan 14, 2018 at 8:50 AM, Yedidyah Bar David <didi@redhat.com> wrote:
On Sun, Jan 14, 2018 at 3:46 PM, Yedidyah Bar David <didi@redhat.com> wrote:
On Sun, Jan 14, 2018 at 3:37 PM, Jayme <jaymef@gmail.com> wrote:
First, apologies for all the posts to this list lately, I've been having a heck of a time after 4.2 upgrade and you've been helpful, I appreciate that.
Since 4.2 upgrade I'm experiencing a few problems that I'm trying to debug.
Current status is engine and all hosts are upgraded to 4.2, and cluster and domain set to 4.2 compatibility. Hosted Engine VM is running and ui accessible etc, all VMs on hosts are running but no HA service. Web UI is giving a few errors when checking network and snapshots on the hosted engine VM only, it doesn't give errors on any of the other VMs that I spot checked.
1. HA-agent and HA-broker are continually crashing on all three hosts over and over every few seconds. I sent an email to users list with more details on this problem but unfortunately haven't heard anything back yet. The general error in the logs seems to be: VolumeDoesNotExist(leafUUID)#012VolumeDoesNotExist: Volume does not exist: (u'8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8',) -- What? Volume doesn't exist, why not?
If agent/broker logs do not reveal this, the next step is usually checking vdsm logs and/or system logs. Can you please check/share these? Thanks.
2. Error when clicking "network interfaces" in the web gui for the hosted VM engine.
3. Similar to #2 above an error is given when clicking "snapshots" in the web gui for the hosted engine VM.
The errors for #2 and #3 are generic "cannot read property 'a' of null". I've read previous postings on ovirt-mailing list that suggest you can install debug-info package to get a human readable error.. but this package does not seem to be compatible with 4.2, it expects 4.1: Requires: "ovirt-engine-webadmin-portal = 4.1.2.2-1.el7.centos" -- Perhaps this package is no longer required?
Right. ui.log will include the same stack trace that it gives in the browser -- just deobfuscated (automatically now).
I do see some additional details in the ui.log that I can post if helpful.
So, yes, please share it.
There is obviously something odd going on here with the hosted engine VM.
All three errors appear to be related to a problem with it, although it is indeed up and running. I'd really like to get HA broker and agent back up and running, and fix these GUI errors related to hosted engine VM. All three problems may be connected to one common issue?
Thanks in advance!
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
-- Didi
--
GREG SHEREMETA
SENIOR SOFTWARE ENGINEER - TEAM LEAD - RHV UX
Red Hat NA
gshereme@redhat.com IRC: gshereme <https://red.ht/sig>

Sure not a problem. For the first issue regarding agent and broker crashing. Again the hosted engine VM is up and running at this time, I have no idea why the logs are saying volume doesn't exist and why file /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8 does not exist when the file actually does exist in that path.
I assume this problem is most likely also related or causing my other problems when accessing hosted vm snapshot section of web gui as well.

vdsm log:
jsonrpc/0::ERROR::2018-01-14 09:48:09,302::task::875::storage.TaskManager.Task::(_setError) (Task='37eba553-9c13-4e69-90f7-d0c987cc694c') Unexpected error
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "<string>", line 2, in prepareImage
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 3162, in prepareImage
    raise se.VolumeDoesNotExist(leafUUID)
VolumeDoesNotExist: Volume does not exist: (u'8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8',)
jsonrpc/0::ERROR::2018-01-14 09:48:09,303::dispatcher::82::storage.Dispatcher::(wrapper) FINISH prepareImage error=Volume does not exist: (u'8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8',)

agent log:
MainThread::ERROR::2018-01-14 09:49:26,546::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::ERROR::2018-01-14 09:49:37,782::hosted_engine::538::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors
MainThread::ERROR::2018-01-14 09:49:37,783::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
    return action(he)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
    return he.start_monitoring()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 416, in start_monitoring
    self._initialize_broker()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 535, in _initialize_broker
    m.get('options', {}))
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 83, in start_monitor
    .format(type, options, e))
RequestError: Failed to start monitor ping, options {'addr': '192.168.0.1'}: [Errno 2] No such file or directory
MainThread::ERROR::2018-01-14 09:49:37,783::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent

broker log:
StatusStorageThread::ERROR::2018-01-12 14:03:57,629::status_broker::85::ovirt_hosted_engine_ha.broker.status_broker.StatusBroker.Update::(run) Failed to update state.
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/status_broker.py", line 81, in run
    entry.data
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 212, in put_stats
    .format(str(e)))
RequestError: failed to write metadata: [Errno 2] No such file or directory: '/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8'
StatusStorageThread::ERROR::2018-01-12 14:03:57,629::storage_broker::160::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(get_raw_stats) Failed to read metadata from /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 151, in get_raw_stats
    f = os.open(path, direct_flag | os.O_RDONLY | os.O_SYNC)
OSError: [Errno 2] No such file or directory: '/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8'
StatusStorageThread::ERROR::2018-01-12 14:03:57,630::status_broker::92::ovirt_hosted_engine_ha.broker.status_broker.StatusBroker.Update::(run) Failed to read state.
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/status_broker.py", line 88, in run
    self._storage_broker.get_raw_stats()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 162, in get_raw_stats
    .format(str(e)))
RequestError: failed to read metadata: [Errno 2] No such file or directory: '/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8'

Syslog:
Jan 12 16:52:34 cultivar0 journal: vdsm storage.Dispatcher ERROR FINISH prepareImage error=Volume does not exist: (u'8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8',)
Jan 12 16:52:34 cultivar0 python: detected unhandled Python exception in '/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker'
Jan 12 16:52:34 cultivar0 abrt-server: Not saving repeating crash in '/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker'
Jan 12 16:52:34 cultivar0 systemd: ovirt-ha-broker.service: main process exited, code=exited, status=1/FAILURE
Jan 12 16:52:34 cultivar0 systemd: Unit ovirt-ha-broker.service entered failed state.
Jan 12 16:52:34 cultivar0 systemd: ovirt-ha-broker.service failed.
Jan 12 16:52:34 cultivar0 systemd: ovirt-ha-broker.service holdoff time over, scheduling restart.
Jan 12 16:52:34 cultivar0 systemd: Cannot add dependency job for unit lvm2-lvmetad.socket, ignoring: Unit is masked.
Jan 12 16:52:34 cultivar0 systemd: Started oVirt Hosted Engine High Availability Communications Broker.
Jan 12 16:52:34 cultivar0 systemd: Starting oVirt Hosted Engine High Availability Communications Broker...
Jan 12 16:52:36 cultivar0 journal: vdsm storage.TaskManager.Task ERROR (Task='73141dec-9d8f-4164-9c4e-67c43a102eff') Unexpected error#012Traceback (most recent call last):#012 File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run#012 return fn(*args, **kargs)#012 File "<string>", line 2, in prepareImage#012 File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method#012 ret = func(*args, **kwargs)#012 File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 3162, in prepareImage#012 raise se.VolumeDoesNotExist(leafUUID)#012VolumeDoesNotExist: Volume does not exist: (u'8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8',)
Jan 12 16:52:36 cultivar0 journal: vdsm storage.Dispatcher ERROR FINISH prepareImage error=Volume does not exist: (u'8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8',)
Jan 12 16:52:36 cultivar0 python: detected unhandled Python exception in '/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker'
Jan 12 16:52:36 cultivar0 abrt-server: Not saving repeating crash in '/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker'
Jan 12 16:52:36 cultivar0 systemd: ovirt-ha-broker.service: main process exited, code=exited, status=1/FAILURE
Jan 12 16:52:36 cultivar0 systemd: Unit ovirt-ha-broker.service entered failed state.
Jan 12 16:52:36 cultivar0 systemd: ovirt-ha-broker.service failed.
Jan 12 16:52:36 cultivar0 systemd: ovirt-ha-broker.service holdoff time over, scheduling restart.
Jan 12 16:52:36 cultivar0 systemd: Cannot add dependency job for unit lvm2-lvmetad.socket, ignoring: Unit is masked.
Jan 12 16:52:36 cultivar0 systemd: Started oVirt Hosted Engine High Availability Communications Broker.
Jan 12 16:52:36 cultivar0 systemd: Starting oVirt Hosted Engine High Availability Communications Broker...
Jan 12 16:52:37 cultivar0 journal: vdsm storage.TaskManager.Task ERROR (Task='bc7af1e2-0ab2-4164-ae88-d2bee03500f9') Unexpected error#012Traceback (most recent call last):#012 File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run#012 return fn(*args, **kargs)#012 File "<string>", line 2, in prepareImage#012 File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method#012 ret = func(*args, **kwargs)#012 File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 3162, in prepareImage#012 raise se.VolumeDoesNotExist(leafUUID)#012VolumeDoesNotExist: Volume does not exist: (u'8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8',)
Jan 12 16:52:37 cultivar0 journal: vdsm storage.Dispatcher ERROR FINISH prepareImage error=Volume does not exist: (u'8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8',)
Jan 12 16:52:37 cultivar0 python: detected unhandled Python exception in '/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker'
Jan 12 16:52:38 cultivar0 abrt-server: Not saving repeating crash in '/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker'
Jan 12 16:52:38 cultivar0 systemd: ovirt-ha-broker.service: main process exited, code=exited, status=1/FAILURE
Jan 12 16:52:38 cultivar0 systemd: Unit ovirt-ha-broker.service entered failed state.
Jan 12 16:52:38 cultivar0 systemd: ovirt-ha-broker.service failed.
Jan 12 16:52:38 cultivar0 systemd: ovirt-ha-broker.service holdoff time over, scheduling restart.
Jan 12 16:52:38 cultivar0 systemd: Cannot add dependency job for unit lvm2-lvmetad.socket, ignoring: Unit is masked.
Jan 12 16:52:38 cultivar0 systemd: start request repeated too quickly for ovirt-ha-broker.service
Jan 12 16:52:38 cultivar0 systemd: Failed to start oVirt Hosted Engine High Availability Communications Broker.
Jan 12 16:52:38 cultivar0 systemd: Unit ovirt-ha-broker.service entered failed state.
Jan 12 16:52:38 cultivar0 systemd: ovirt-ha-broker.service failed.
Jan 12 16:52:40 cultivar0 systemd: ovirt-ha-agent.service holdoff time over, scheduling restart.
Jan 12 16:52:40 cultivar0 systemd: Cannot add dependency job for unit lvm2-lvmetad.socket, ignoring: Unit is masked.
Jan 12 16:52:40 cultivar0 systemd: Started oVirt Hosted Engine High Availability Communications Broker.
Jan 12 16:52:40 cultivar0 systemd: Starting oVirt Hosted Engine High Availability Communications Broker...
Jan 12 16:52:40 cultivar0 systemd: Started oVirt Hosted Engine High Availability Monitoring Agent.
Jan 12 16:52:40 cultivar0 systemd: Starting oVirt Hosted Engine High Availability Monitoring Agent...
Jan 12 16:52:41 cultivar0 journal: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Failed to start necessary monitors
Jan 12 16:52:41 cultivar0 journal: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Traceback (most recent call last):#012 File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent#012 return action(he)#012 File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper#012 return he.start_monitoring()#012 File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 416, in start_monitoring#012 self._initialize_broker()#012 File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 535, in _initialize_broker#012 m.get('options', {}))#012 File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 83, in start_monitor#012 .format(type, options, e))#012RequestError: Failed to start monitor ping, options {'addr': '192.168.0.1'}: [Errno 2] No such file or directory
Jan 12 16:52:41 cultivar0 journal: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Trying to restart agent
Jan 12 16:52:42 cultivar0 systemd: ovirt-ha-agent.service: main process exited, code=exited, status=157/n/a
Jan 12 16:52:42 cultivar0 systemd: Unit ovirt-ha-agent.service entered failed state.
Jan 12 16:52:42 cultivar0 systemd: ovirt-ha-agent.service failed.

On Sun, Jan 14, 2018 at 9:46 AM, Yedidyah Bar David <didi@redhat.com> wrote:
On Sun, Jan 14, 2018 at 3:37 PM, Jayme <jaymef@gmail.com> wrote:
First, apologies for all the posts to this list lately, I've been having a heck of a time after 4.2 upgrade and you've been helpful, I appreciate that.
Since 4.2 upgrade I'm experiencing a few problems that I'm trying to debug.
Current status is engine and all hosts are upgraded to 4.2, and cluster and domain set to 4.2 compatibility. Hosted Engine VM is running and ui accessible etc, all VMs on hosts are running but no HA service. Web UI is giving a few errors when checking network and snapshots on the hosted engine VM only, it doesn't give errors on any of the other VMs that I spot checked.
1. HA-agent and HA-broker are continually crashing on all three hosts over and over every few seconds. I sent an email to users list with more details on this problem but unfortunately haven't heard anything back yet. The general error in the logs seems to be: VolumeDoesNotExist(leafUUID)#012VolumeDoesNotExist: Volume does not exist: (u'8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8',) -- What? Volume doesn't exist, why not?
If agent/broker logs do not reveal this, the next step is usually checking vdsm logs and/or system logs. Can you please check/share these? Thanks.
2. Error when clicking "network interfaces" in the web gui for the hosted VM engine.
3. Similar to #2 above an error is given when clicking "snapshots" in the web gui for the hosted engine VM.
The errors for #2 and #3 are generic "cannot read property 'a' of null". I've read previous postings on ovirt-mailing list that suggest you can install debug-info package to get a human readable error.. but this package does not seem to be compatible with 4.2, it expects 4.1: Requires: "ovirt-engine-webadmin-portal = 4.1.2.2-1.el7.centos" -- Perhaps this package is no longer required? I do see some additional details in the ui.log that I can post if helpful.
There is obviously something odd going on here with the hosted engine VM. All three errors appear to be related to a problem with it, although it is indeed up and running. I'd really like to get HA broker and agent back up and running, and fix these GUI errors related to hosted engine VM. All three problems may be connected to one common issue?
Thanks in advance!
-- Didi

On Sun, Jan 14, 2018 at 3:57 PM, Jayme <jaymef@gmail.com> wrote:
Sure not a problem. For the first issue regarding agent and broker crashing. Again the hosted engine VM is up and running at this time, I have no idea why the logs are saying volume doesn't exist and why file /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8 does not exist when the file actually does exist in that path.
Perhaps not enough permissions? Can you try reading it as user 'vdsm'? E.g.
su - vdsm
cp /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8 /dev/null
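A path can show up in a directory listing and still fail to open with "[Errno 2] No such file or directory" when it is a symlink whose target is gone. Whether that is what happened on this host is only an assumption, since the thread does not say. The sketch below uses a hypothetical helper, diagnose_path, to tell that case apart from a plain permissions problem. Run it as the user that needs access (here, vdsm), because os.access answers for the calling user and is not meaningful when run as root.

```python
import os


def diagnose_path(path):
    """Classify why opening `path` might fail for the current user."""
    if not os.path.lexists(path):
        # Nothing exists under this name, not even a symlink.
        return "missing"
    if os.path.islink(path) and not os.path.exists(path):
        # The link itself is present, but what it points to is not.
        return "dangling symlink"
    if not os.access(path, os.R_OK):
        return "not readable by current user"
    return "ok"


if __name__ == "__main__":
    # Path copied from the broker log in this thread.
    print(diagnose_path(
        "/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/"
        "14a20941-1b84-4b82-be8f-ace38d7c037a/"
        "8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8"
    ))
```

A result of "dangling symlink" would point at the link target rather than at permissions, which would match a file that appears to exist but raises ENOENT on open.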
I assume this problem is most likely also related or causing my other problems when accessing hosted vm snapshot section of web gui as well.
vdsm log:
jsonrpc/0::ERROR::2018-01-14 09:48:09,302::task::875::storage.TaskManager.Task::(_setError) (Task='37eba553-9c13-4e69-90f7-d0c987cc694c') Unexpected error Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run return fn(*args, **kargs) File "<string>", line 2, in prepareImage File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method ret = func(*args, **kwargs) File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 3162, in prepareImage raise se.VolumeDoesNotExist(leafUUID) VolumeDoesNotExist: Volume does not exist: (u'8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8',) jsonrpc/0::ERROR::2018-01-14 09:48:09,303::dispatcher::82::storage.Dispatcher::(wrapper) FINISH prepareImage error=Volume does not exist: (u'8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8',)
agent log:
MainThread::ERROR::2018-01-14 09:49:26,546::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent MainThread::ERROR::2018-01-14 09:49:37,782::hosted_engine::538::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors MainThread::ERROR::2018-01-14 09:49:37,783::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent return action(he) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper return he.start_monitoring() File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 416, in start_monitoring self._initialize_broker() File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 535, in _initialize_broker m.get('options', {})) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 83, in start_monitor .format(type, options, e)) RequestError: Failed to start monitor ping, options {'addr': '192.168.0.1'}: [Errno 2] No such file or directory
MainThread::ERROR::2018-01-14 09:49:37,783::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
broker log:
StatusStorageThread::ERROR::2018-01-12 14:03:57,629::status_broker::85::ovirt_hosted_engine_ha.broker.status_broker.StatusBroker.Update::(run) Failed to update state. Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/status_broker.py", line 81, in run entry.data File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 212, in put_stats .format(str(e))) RequestError: failed to write metadata: [Errno 2] No such file or directory: '/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8' StatusStorageThread::ERROR::2018-01-12 14:03:57,629::storage_broker::160::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(get_raw_stats) Failed to read metadata from /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8 Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 151, in get_raw_stats f = os.open(path, direct_flag | os.O_RDONLY | os.O_SYNC) OSError: [Errno 2] No such file or directory: '/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8' StatusStorageThread::ERROR::2018-01-12 14:03:57,630::status_broker::92::ovirt_hosted_engine_ha.broker.status_broker.StatusBroker.Update::(run) Failed to read state. 
Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/status_broker.py", line 88, in run self._storage_broker.get_raw_stats() File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 162, in get_raw_stats .format(str(e))) RequestError: failed to read metadata: [Errno 2] No such file or directory: '/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8'
Syslog:
Jan 12 16:52:34 cultivar0 journal: vdsm storage.Dispatcher ERROR FINISH prepareImage error=Volume does not exist: (u'8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8',) Jan 12 16:52:34 cultivar0 python: detected unhandled Python exception in '/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker' Jan 12 16:52:34 cultivar0 abrt-server: Not saving repeating crash in '/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker' Jan 12 16:52:34 cultivar0 systemd: ovirt-ha-broker.service: main process exited, code=exited, status=1/FAILURE Jan 12 16:52:34 cultivar0 systemd: Unit ovirt-ha-broker.service entered failed state. Jan 12 16:52:34 cultivar0 systemd: ovirt-ha-broker.service failed. Jan 12 16:52:34 cultivar0 systemd: ovirt-ha-broker.service holdoff time over, scheduling restart. Jan 12 16:52:34 cultivar0 systemd: Cannot add dependency job for unit lvm2-lvmetad.socket, ignoring: Unit is masked. Jan 12 16:52:34 cultivar0 systemd: Started oVirt Hosted Engine High Availability Communications Broker. Jan 12 16:52:34 cultivar0 systemd: Starting oVirt Hosted Engine High Availability Communications Broker... 
Jan 12 16:52:36 cultivar0 journal: vdsm storage.TaskManager.Task ERROR (Task='73141dec-9d8f-4164-9c4e-67c43a102eff') Unexpected error#012Traceback (most recent call last):#012 File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run#012 return fn(*args, **kargs)#012 File "<string>", line 2, in prepareImage#012 File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method#012 ret = func(*args, **kwargs)#012 File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 3162, in prepareImage#012 raise se.VolumeDoesNotExist(leafUUID)#012VolumeDoesNotExist: Volume does not exist: (u'8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8',) Jan 12 16:52:36 cultivar0 journal: vdsm storage.Dispatcher ERROR FINISH prepareImage error=Volume does not exist: (u'8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8',) Jan 12 16:52:36 cultivar0 python: detected unhandled Python exception in '/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker' Jan 12 16:52:36 cultivar0 abrt-server: Not saving repeating crash in '/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker' Jan 12 16:52:36 cultivar0 systemd: ovirt-ha-broker.service: main process exited, code=exited, status=1/FAILURE Jan 12 16:52:36 cultivar0 systemd: Unit ovirt-ha-broker.service entered failed state. Jan 12 16:52:36 cultivar0 systemd: ovirt-ha-broker.service failed.
Jan 12 16:52:36 cultivar0 systemd: ovirt-ha-broker.service holdoff time over, scheduling restart. Jan 12 16:52:36 cultivar0 systemd: Cannot add dependency job for unit lvm2-lvmetad.socket, ignoring: Unit is masked. Jan 12 16:52:36 cultivar0 systemd: Started oVirt Hosted Engine High Availability Communications Broker. Jan 12 16:52:36 cultivar0 systemd: Starting oVirt Hosted Engine High Availability Communications Broker... Jan 12 16:52:37 cultivar0 journal: vdsm storage.TaskManager.Task ERROR (Task='bc7af1e2-0ab2-4164-ae88-d2bee03500f9') Unexpected error#012Traceback (most recent call last):#012 File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run#012 return fn(*args, **kargs)#012 File "<string>", line 2, in prepareImage#012 File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method#012 ret = func(*args, **kwargs)#012 File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 3162, in prepareImage#012 raise se.VolumeDoesNotExist(leafUUID)#012VolumeDoesNotExist: Volume does not exist: (u'8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8',) Jan 12 16:52:37 cultivar0 journal: vdsm storage.Dispatcher ERROR FINISH prepareImage error=Volume does not exist: (u'8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8',) Jan 12 16:52:37 cultivar0 python: detected unhandled Python exception in '/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker' Jan 12 16:52:38 cultivar0 abrt-server: Not saving repeating crash in '/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker' Jan 12 16:52:38 cultivar0 systemd: ovirt-ha-broker.service: main process exited, code=exited, status=1/FAILURE Jan 12 16:52:38 cultivar0 systemd: Unit ovirt-ha-broker.service entered failed state. Jan 12 16:52:38 cultivar0 systemd: ovirt-ha-broker.service failed. Jan 12 16:52:38 cultivar0 systemd: ovirt-ha-broker.service holdoff time over, scheduling restart. Jan 12 16:52:38 cultivar0 systemd: Cannot add dependency job for unit lvm2-lvmetad.socket, ignoring: Unit is masked. 
Jan 12 16:52:38 cultivar0 systemd: start request repeated too quickly for ovirt-ha-broker.service
Jan 12 16:52:38 cultivar0 systemd: Failed to start oVirt Hosted Engine High Availability Communications Broker.
Jan 12 16:52:38 cultivar0 systemd: Unit ovirt-ha-broker.service entered failed state.
Jan 12 16:52:38 cultivar0 systemd: ovirt-ha-broker.service failed.
Jan 12 16:52:40 cultivar0 systemd: ovirt-ha-agent.service holdoff time over, scheduling restart.
Jan 12 16:52:40 cultivar0 systemd: Cannot add dependency job for unit lvm2-lvmetad.socket, ignoring: Unit is masked.
Jan 12 16:52:40 cultivar0 systemd: Started oVirt Hosted Engine High Availability Communications Broker.
Jan 12 16:52:40 cultivar0 systemd: Starting oVirt Hosted Engine High Availability Communications Broker...
Jan 12 16:52:40 cultivar0 systemd: Started oVirt Hosted Engine High Availability Monitoring Agent.
Jan 12 16:52:40 cultivar0 systemd: Starting oVirt Hosted Engine High Availability Monitoring Agent...
Jan 12 16:52:41 cultivar0 journal: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Failed to start necessary monitors
Jan 12 16:52:41 cultivar0 journal: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Traceback (most recent call last):#012 File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent#012 return action(he)#012 File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper#012 return he.start_monitoring()#012 File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 416, in start_monitoring#012 self._initialize_broker()#012 File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 535, in _initialize_broker#012 m.get('options', {}))#012 File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 83, in start_monitor#012 .format(type, options, e))#012RequestError: Failed to start monitor ping, options {'addr': '192.168.0.1'}: [Errno 2] No such file or directory
Jan 12 16:52:41 cultivar0 journal: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Trying to restart agent
Jan 12 16:52:42 cultivar0 systemd: ovirt-ha-agent.service: main process exited, code=exited, status=157/n/a
Jan 12 16:52:42 cultivar0 systemd: Unit ovirt-ha-agent.service entered failed state.
Jan 12 16:52:42 cultivar0 systemd: ovirt-ha-agent.service failed.
On Sun, Jan 14, 2018 at 9:46 AM, Yedidyah Bar David <didi@redhat.com> wrote:
On Sun, Jan 14, 2018 at 3:37 PM, Jayme <jaymef@gmail.com> wrote:
First, apologies for all the posts to this list lately. I've been having a heck of a time after the 4.2 upgrade, and you've been helpful; I appreciate that.
Since the 4.2 upgrade I've been experiencing a few problems that I'm trying to debug.
Current status: the engine and all hosts are upgraded to 4.2, and the cluster and domain are set to 4.2 compatibility. The Hosted Engine VM is running and the UI is accessible, and all VMs on the hosts are running, but there is no HA service. The web UI gives a few errors when checking network and snapshots on the hosted engine VM only; it doesn't give errors on any of the other VMs that I spot-checked.
1. HA-agent and HA-broker are continually crashing on all three hosts, over and over, every few seconds. I sent an email to the users list with more details on this problem but unfortunately haven't heard anything back yet. The general error in the logs seems to be: VolumeDoesNotExist(leafUUID)#012VolumeDoesNotExist: Volume does not exist: (u'8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8',) -- What? Volume doesn't exist, why not?
If agent/broker logs do not reveal this, the next step is usually checking vdsm logs and/or system logs. Can you please check/share these? Thanks.
2. An error when clicking "network interfaces" in the web GUI for the hosted engine VM.
3. Similar to #2 above, an error is given when clicking "snapshots" in the web GUI for the hosted engine VM.
The errors for #2 and #3 are the generic "cannot read property 'a' of null". I've read previous postings on the oVirt mailing list that suggest you can install the debuginfo package to get a human-readable error, but this package does not seem to be compatible with 4.2; it expects 4.1: Requires: "ovirt-engine-webadmin-portal = 4.1.2.2-1.el7.centos" -- Perhaps this package is no longer required? I do see some additional details in the ui.log that I can post if helpful.
There is obviously something odd going on here with the hosted engine VM. All three errors appear to be related to a problem with it, although it is indeed up and running. I'd really like to get the HA broker and agent back up and running, and fix these GUI errors related to the hosted engine VM. Could all three problems be connected to one common issue?
Thanks in advance!
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
-- Didi

On Sun, Jan 14, 2018 at 4:34 PM, Yedidyah Bar David <didi@redhat.com> wrote:
On Sun, Jan 14, 2018 at 3:57 PM, Jayme <jaymef@gmail.com> wrote:
Sure, not a problem. Regarding the first issue, the agent and broker crashing: again, the hosted engine VM is up and running at this time. I have no idea why the logs say the volume doesn't exist, or why the file /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8 is reported as missing when it actually does exist at that path.
Perhaps not enough permissions?
Can you try reading it as user 'vdsm'? E.g.
su - vdsm
su - vdsm -s /bin/bash
cp /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8 /dev/null
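As a complement to the cp test above, here is a minimal Python sketch that walks a path one component at a time and reports the first component that cannot be resolved by the current user. The function name first_bad_component is made up for this illustration and is not part of vdsm or ovirt-hosted-engine-ha. One possibility it can reveal (an assumption, not something the logs confirm) is a symlink somewhere in the path whose target is missing or unreachable for that user, which would also surface as "No such file or directory" even though a listing shows the entry.

```python
import errno
import os


def _errname(exc):
    # Translate an OSError errno into its symbolic name, e.g. ENOENT or EACCES.
    return errno.errorcode.get(exc.errno, str(exc.errno))


def first_bad_component(path):
    """Return (component, reason) for the first unresolvable path component.

    Returns None when every component resolves and the final path is
    readable by the user running this code.
    """
    current = os.sep
    for part in [p for p in os.path.abspath(path).split(os.sep) if p]:
        current = os.path.join(current, part)
        try:
            os.lstat(current)  # does the entry itself exist / can we reach it?
        except OSError as exc:
            return current, _errname(exc)
        if os.path.islink(current):
            try:
                os.stat(current)  # follow the link to check its target
            except OSError as exc:
                return current, "symlink to %s: %s" % (
                    os.readlink(current), _errname(exc))
    if not os.access(path, os.R_OK):
        return os.path.abspath(path), "exists but is not readable"
    return None
```

Calling first_bad_component() on the volume path from the logs, while running as the same user the broker runs as, should either return None or name the exact directory or link where resolution stops.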
I assume this problem is most likely related to, or causing, my other problems when accessing the snapshots section of the web GUI for the hosted engine VM.
vdsm log:
jsonrpc/0::ERROR::2018-01-14 09:48:09,302::task::875::storage.TaskManager.Task::(_setError) (Task='37eba553-9c13-4e69-90f7-d0c987cc694c') Unexpected error
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "<string>", line 2, in prepareImage
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 3162, in prepareImage
    raise se.VolumeDoesNotExist(leafUUID)
VolumeDoesNotExist: Volume does not exist: (u'8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8',)
jsonrpc/0::ERROR::2018-01-14 09:48:09,303::dispatcher::82::storage.Dispatcher::(wrapper) FINISH prepareImage error=Volume does not exist: (u'8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8',)
agent log:
MainThread::ERROR::2018-01-14 09:49:26,546::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::ERROR::2018-01-14 09:49:37,782::hosted_engine::538::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors
MainThread::ERROR::2018-01-14 09:49:37,783::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
    return action(he)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
    return he.start_monitoring()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 416, in start_monitoring
    self._initialize_broker()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 535, in _initialize_broker
    m.get('options', {}))
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 83, in start_monitor
    .format(type, options, e))
RequestError: Failed to start monitor ping, options {'addr': '192.168.0.1'}: [Errno 2] No such file or directory
MainThread::ERROR::2018-01-14 09:49:37,783::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
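The vdsm and agent excerpts above share a "::"-separated record layout (thread, level, timestamp, module, line number, logger, then the function and message). For anyone who wants to filter or count these entries rather than read them by eye, here is a rough Python sketch. The field names and the parse_record / iter_records helpers are my own labels for illustration, inferred only from the lines pasted here, not from any oVirt documentation.

```python
from collections import namedtuple

# Field names are illustrative guesses based on the pasted log lines.
LogRecord = namedtuple(
    "LogRecord", "thread level timestamp module lineno logger message")


def parse_record(line):
    """Parse one '::'-separated log line; return None for continuation lines."""
    parts = line.rstrip("\n").split("::", 6)
    if len(parts) != 7 or not parts[4].isdigit():
        return None  # e.g. a traceback line belonging to the previous record
    return LogRecord(*parts)


def iter_records(lines):
    """Yield (record, continuation_lines) pairs, attaching tracebacks
    to the record that precedes them."""
    current = None
    for line in lines:
        rec = parse_record(line)
        if rec is not None:
            if current is not None:
                yield current
            current = (rec, [])
        elif current is not None:
            current[1].append(line.rstrip("\n"))
    if current is not None:
        yield current
```

Feeding the lines of agent.log or vdsm.log through iter_records() and keeping only records whose level is ERROR gives a compact view of which logger and function fail on each restart cycle.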
broker log:
StatusStorageThread::ERROR::2018-01-12 14:03:57,629::status_broker::85::ovirt_hosted_engine_ha.broker.status_broker.StatusBroker.Update::(run) Failed to update state.
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/status_broker.py", line 81, in run
    entry.data
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 212, in put_stats
    .format(str(e)))
RequestError: failed to write metadata: [Errno 2] No such file or directory: '/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8'
StatusStorageThread::ERROR::2018-01-12 14:03:57,629::storage_broker::160::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(get_raw_stats) Failed to read metadata from /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 151, in get_raw_stats
    f = os.open(path, direct_flag | os.O_RDONLY | os.O_SYNC)
OSError: [Errno 2] No such file or directory: '/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8'
StatusStorageThread::ERROR::2018-01-12 14:03:57,630::status_broker::92::ovirt_hosted_engine_ha.broker.status_broker.StatusBroker.Update::(run) Failed to read state.
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/status_broker.py", line 88, in run
    self._storage_broker.get_raw_stats()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 162, in get_raw_stats
    .format(str(e)))
RequestError: failed to read metadata: [Errno 2] No such file or directory: '/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8'
Syslog:
Jan 12 16:52:34 cultivar0 journal: vdsm storage.Dispatcher ERROR FINISH prepareImage error=Volume does not exist: (u'8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8',)
Jan 12 16:52:34 cultivar0 python: detected unhandled Python exception in '/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker'
Jan 12 16:52:34 cultivar0 abrt-server: Not saving repeating crash in '/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker'
Jan 12 16:52:34 cultivar0 systemd: ovirt-ha-broker.service: main process exited, code=exited, status=1/FAILURE
Jan 12 16:52:34 cultivar0 systemd: Unit ovirt-ha-broker.service entered failed state.
Jan 12 16:52:34 cultivar0 systemd: ovirt-ha-broker.service failed.
Jan 12 16:52:34 cultivar0 systemd: ovirt-ha-broker.service holdoff time over, scheduling restart.
Jan 12 16:52:34 cultivar0 systemd: Cannot add dependency job for unit lvm2-lvmetad.socket, ignoring: Unit is masked.
Jan 12 16:52:34 cultivar0 systemd: Started oVirt Hosted Engine High Availability Communications Broker.
Jan 12 16:52:34 cultivar0 systemd: Starting oVirt Hosted Engine High Availability Communications Broker...
Jan 12 16:52:36 cultivar0 journal: vdsm storage.TaskManager.Task ERROR (Task='73141dec-9d8f-4164-9c4e-67c43a102eff') Unexpected error#012Traceback (most recent call last):#012 File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run#012 return fn(*args, **kargs)#012 File "<string>", line 2, in prepareImage#012 File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method#012 ret = func(*args, **kwargs)#012 File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 3162, in prepareImage#012 raise se.VolumeDoesNotExist(leafUUID)#012VolumeDoesNotExist: Volume does not exist: (u'8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8',)
Jan 12 16:52:36 cultivar0 journal: vdsm storage.Dispatcher ERROR FINISH prepareImage error=Volume does not exist: (u'8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8',)
Jan 12 16:52:36 cultivar0 python: detected unhandled Python exception in '/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker'
Jan 12 16:52:36 cultivar0 abrt-server: Not saving repeating crash in '/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker'
Jan 12 16:52:36 cultivar0 systemd: ovirt-ha-broker.service: main process exited, code=exited, status=1/FAILURE
Jan 12 16:52:36 cultivar0 systemd: Unit ovirt-ha-broker.service entered failed state.
Jan 12 16:52:36 cultivar0 systemd: ovirt-ha-broker.service failed.
Jan 12 16:52:36 cultivar0 systemd: ovirt-ha-broker.service holdoff time over, scheduling restart.
Jan 12 16:52:36 cultivar0 systemd: Cannot add dependency job for unit lvm2-lvmetad.socket, ignoring: Unit is masked.
Jan 12 16:52:36 cultivar0 systemd: Started oVirt Hosted Engine High Availability Communications Broker.
Jan 12 16:52:36 cultivar0 systemd: Starting oVirt Hosted Engine High Availability Communications Broker...
Jan 12 16:52:37 cultivar0 journal: vdsm storage.TaskManager.Task ERROR (Task='bc7af1e2-0ab2-4164-ae88-d2bee03500f9') Unexpected error#012Traceback (most recent call last):#012 File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run#012 return fn(*args, **kargs)#012 File "<string>", line 2, in prepareImage#012 File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method#012 ret = func(*args, **kwargs)#012 File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 3162, in prepareImage#012 raise se.VolumeDoesNotExist(leafUUID)#012VolumeDoesNotExist: Volume does not exist: (u'8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8',)
Jan 12 16:52:37 cultivar0 journal: vdsm storage.Dispatcher ERROR FINISH prepareImage error=Volume does not exist: (u'8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8',)
Jan 12 16:52:37 cultivar0 python: detected unhandled Python exception in '/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker'
Jan 12 16:52:38 cultivar0 abrt-server: Not saving repeating crash in '/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker'
Jan 12 16:52:38 cultivar0 systemd: ovirt-ha-broker.service: main process exited, code=exited, status=1/FAILURE
Jan 12 16:52:38 cultivar0 systemd: Unit ovirt-ha-broker.service entered failed state.
Jan 12 16:52:38 cultivar0 systemd: ovirt-ha-broker.service failed.
Jan 12 16:52:38 cultivar0 systemd: ovirt-ha-broker.service holdoff time over, scheduling restart.
Jan 12 16:52:38 cultivar0 systemd: Cannot add dependency job for unit lvm2-lvmetad.socket, ignoring: Unit is masked.
Jan 12 16:52:38 cultivar0 systemd: start request repeated too quickly for ovirt-ha-broker.service
Jan 12 16:52:38 cultivar0 systemd: Failed to start oVirt Hosted Engine High Availability Communications Broker.
Jan 12 16:52:38 cultivar0 systemd: Unit ovirt-ha-broker.service entered failed state.
Jan 12 16:52:38 cultivar0 systemd: ovirt-ha-broker.service failed.
Jan 12 16:52:40 cultivar0 systemd: ovirt-ha-agent.service holdoff time over, scheduling restart.
Jan 12 16:52:40 cultivar0 systemd: Cannot add dependency job for unit lvm2-lvmetad.socket, ignoring: Unit is masked.
Jan 12 16:52:40 cultivar0 systemd: Started oVirt Hosted Engine High Availability Communications Broker.
Jan 12 16:52:40 cultivar0 systemd: Starting oVirt Hosted Engine High Availability Communications Broker...
Jan 12 16:52:40 cultivar0 systemd: Started oVirt Hosted Engine High Availability Monitoring Agent.
Jan 12 16:52:40 cultivar0 systemd: Starting oVirt Hosted Engine High Availability Monitoring Agent...
Jan 12 16:52:41 cultivar0 journal: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Failed to start necessary monitors
Jan 12 16:52:41 cultivar0 journal: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Traceback (most recent call last):#012 File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent#012 return action(he)#012 File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper#012 return he.start_monitoring()#012 File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 416, in start_monitoring#012 self._initialize_broker()#012 File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 535, in _initialize_broker#012 m.get('options', {}))#012 File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 83, in start_monitor#012 .format(type, options, e))#012RequestError: Failed to start monitor ping, options {'addr': '192.168.0.1'}: [Errno 2] No such file or directory
Jan 12 16:52:41 cultivar0 journal: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Trying to restart agent
Jan 12 16:52:42 cultivar0 systemd: ovirt-ha-agent.service: main process exited, code=exited, status=157/n/a
Jan 12 16:52:42 cultivar0 systemd: Unit ovirt-ha-agent.service entered failed state.
Jan 12 16:52:42 cultivar0 systemd: ovirt-ha-agent.service failed.
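The #012 sequences in the syslog entries above look like the syslog daemon's octal escape for an embedded newline (octal 012 is line feed), which is why each Python traceback is squashed onto one line. A small sketch, assuming that escape format, to turn such an entry back into a readable multi-line traceback; unescape_syslog is a name invented here:

```python
import re

# '#' followed by exactly three octal digits, e.g. #012 (LF) or #011 (TAB).
_ESCAPE = re.compile(r"#([0-7]{3})")


def _decode(match):
    code = int(match.group(1), 8)
    # Only undo escapes for control characters; leave text like '#123' alone.
    if code < 0x20 or code == 0x7F:
        return chr(code)
    return match.group(0)


def unescape_syslog(text):
    """Replace #ooo control-character escapes with the original characters."""
    return _ESCAPE.sub(_decode, text)
```

For example, passing one of the "Unexpected error#012Traceback ..." entries through unescape_syslog() and printing the result shows the traceback with one frame per line, the same way it appears in vdsm.log.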
-- Didi
participants (3): Greg Sheremeta, Jayme, Yedidyah Bar David