[ovirt-users] Server Not Responding

Piotr Kliczewski piotr.kliczewski at gmail.com
Tue Sep 19 07:22:11 UTC 2017


Bryan,

I checked your logs and I see extensive activity of ioprocess at that
time which seems to cause getAllVmStats to be processed 7 seconds
later.
The call was invoked on vdsm side much later than we saw heartbeat
exceeded exception on the engine side.

@Nir please take a look

Thanks,
Piotr

2017-09-13 04:07:07,126-0500 INFO  (itmap/0) [IOProcessClient]
Starting client ioprocess-10031 (__init__:330)
2017-09-13 04:07:07,176-0500 INFO  (itmap/1) [IOProcessClient]
Starting client ioprocess-10032 (__init__:330)
2017-09-13 04:07:07,222-0500 INFO  (ioprocess/9466) [IOProcess]
Starting ioprocess (__init__:452)
2017-09-13 04:07:07,226-0500 INFO  (itmap/2) [IOProcessClient]
Starting client ioprocess-10033 (__init__:330)
2017-09-13 04:07:07,279-0500 INFO  (ioprocess/9474) [IOProcess]
Starting ioprocess (__init__:452)
2017-09-13 04:07:07,284-0500 INFO  (itmap/3) [IOProcessClient]
Starting client ioprocess-10034 (__init__:330)
2017-09-13 04:07:07,332-0500 INFO  (ioprocess/9482) [IOProcess]
Starting ioprocess (__init__:452)
2017-09-13 04:07:07,340-0500 INFO  (itmap/4) [IOProcessClient]
Starting client ioprocess-10035 (__init__:330)
2017-09-13 04:07:07,395-0500 INFO  (ioprocess/9502) [IOProcess]
Starting ioprocess (__init__:452)
2017-09-13 04:07:07,401-0500 INFO  (itmap/5) [IOProcessClient]
Starting client ioprocess-10036 (__init__:330)
2017-09-13 04:07:07,461-0500 INFO  (ioprocess/9523) [IOProcess]
Starting ioprocess (__init__:452)
2017-09-13 04:07:07,469-0500 INFO  (itmap/6) [IOProcessClient]
Starting client ioprocess-10037 (__init__:330)
2017-09-13 04:07:07,525-0500 INFO  (ioprocess/9538) [IOProcess]
Starting ioprocess (__init__:452)
2017-09-13 04:07:07,532-0500 INFO  (itmap/7) [IOProcessClient]
Starting client ioprocess-10038 (__init__:330)
2017-09-13 04:07:07,579-0500 INFO  (ioprocess/9556) [IOProcess]
Starting ioprocess (__init__:452)
2017-09-13 04:07:07,583-0500 INFO  (itmap/8) [IOProcessClient]
Starting client ioprocess-10039 (__init__:330)
2017-09-13 04:07:07,628-0500 INFO  (ioprocess/9569) [IOProcess]
Starting ioprocess (__init__:452)
2017-09-13 04:07:07,642-0500 INFO  (monitor/cacbac0)
[storage.StorageDomain] Resource namespace
01_img_cacbac0c-4b08-46cd-ace8-b23266c92291 already registered
(sd:727)
2017-09-13 04:07:07,642-0500 INFO  (monitor/cacbac0)
[storage.StorageDomain] Resource namespace
02_vol_cacbac0c-4b08-46cd-ace8-b23266c92291 already registered
(sd:736)
2017-09-13 04:07:07,646-0500 INFO  (monitor/e371d38)
[storage.StorageDomain] Resource namespace
01_img_e371d380-7194-4950-b901-5f2aed5dfb35 already registered
(sd:727)
2017-09-13 04:07:07,646-0500 INFO  (monitor/e371d38)
[storage.StorageDomain] Resource namespace
02_vol_e371d380-7194-4950-b901-5f2aed5dfb35 already registered
(sd:736)
2017-09-13 04:07:07,655-0500 INFO  (monitor/ea15319)
[storage.StorageDomain] Resource namespace
01_img_ea153191-ac25-4984-aaec-4a59cbf383a5 already registered
(sd:727)
2017-09-13 04:07:07,655-0500 INFO  (monitor/ea15319)
[storage.StorageDomain] Resource namespace
02_vol_ea153191-ac25-4984-aaec-4a59cbf383a5 already registered
(sd:736)
2017-09-13 04:07:07,661-0500 INFO  (monitor/58a7c5d)
[storage.StorageDomain] Resource namespace
01_img_58a7c5dd-0b31-4066-ae05-8f541614dfde already registered
(sd:727)
2017-09-13 04:07:07,661-0500 INFO  (monitor/58a7c5d)
[storage.StorageDomain] Resource namespace
02_vol_58a7c5dd-0b31-4066-ae05-8f541614dfde already registered
(sd:736)
2017-09-13 04:07:07,665-0500 INFO  (ioprocess/9582) [IOProcess]
Starting ioprocess (__init__:452)
2017-09-13 04:07:07,668-0500 INFO  (monitor/f9cab69)
[storage.StorageDomain] Resource namespace
01_img_f9cab69e-ae3d-4b6d-8a02-389e4b1d8764 already registered
(sd:727)
2017-09-13 04:07:07,668-0500 INFO  (monitor/f9cab69)
[storage.StorageDomain] Resource namespace
02_vol_f9cab69e-ae3d-4b6d-8a02-389e4b1d8764 already registered
(sd:736)
2017-09-13 04:07:07,669-0500 INFO  (monitor/f927ceb)
[storage.StorageDomain] Resource namespace
01_img_f927ceb8-91d2-41bd-ba42-dc795395b6d0 already registered
(sd:727)
2017-09-13 04:07:07,669-0500 INFO  (monitor/f927ceb)
[storage.StorageDomain] Resource namespace
02_vol_f927ceb8-91d2-41bd-ba42-dc795395b6d0 already registered
(sd:736)
2017-09-13 04:07:07,688-0500 INFO  (monitor/031c06e)
[storage.StorageDomain] Resource namespace
01_img_031c06e4-9ba6-4572-8800-39da4a5420f0 already registered
(sd:727)
2017-09-13 04:07:07,688-0500 INFO  (monitor/031c06e)
[storage.StorageDomain] Resource namespace
02_vol_031c06e4-9ba6-4572-8800-39da4a5420f0 already registered
(sd:736)
2017-09-13 04:07:07,725-0500 INFO  (monitor/804be30)
[storage.StorageDomain] Resource namespace
01_img_804be302-3763-403b-96db-febebcbfd778 already registered
(sd:727)
2017-09-13 04:07:07,725-0500 INFO  (monitor/804be30)
[storage.StorageDomain] Resource namespace
02_vol_804be302-3763-403b-96db-febebcbfd778 already registered
(sd:736)
2017-09-13 04:07:14,688-0500 INFO  (jsonrpc/5) [jsonrpc.JsonRpcServer]
RPC call Host.getAllVmStats succeeded in 0.02 seconds (__init__:539)

On Mon, Sep 18, 2017 at 6:43 PM, Bryan Sockel <Bryan.Sockel at altn.com> wrote:
> Here are the logs as requested.
>
>
> Bryan Sockel
> Systems Administrator
> Alt-N Technologies | Grapevine, TX
> Office 817.601.3222 x233
>
> Sent using Alt-N's own MDaemon Messaging Server
> Get to know the Alt-N family by liking us on Facebook!
>
>
>
> -----Original Message-----
> From: "piotr.kliczewski at gmail.com -- Piotr Kliczewski"
> <piotr.kliczewski at gmail.com>
> To: Bryan Sockel <Bryan.Sockel at altn.com>
> Cc: "users at ovirt.org" <users at ovirt.org>
> Date: Thu, 14 Sep 2017 21:30:19 +0200
> Subject: Re: [ovirt-users] Server Not Responding
>
> Bryan,
>
> In your engine logs I see :
>
> 2017-09-13 04:07:07,599-05 ERROR
> [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand]
> (DefaultQuartzScheduler3) [] Command 'GetAllVmStatsVDSCommand(HostName
> = vm-host-colo-1, VdsIdVDSCommandParametersBase:{runAsync='true',
> hostId='e75d4446-9bfc-47cb-8bf8-a2e681720b66'})' execution failed:
> VDSGenericzException: VDSNetworkException: Heartbeat exceeded
>
> It would br great to understand what happened on vdsm side because the
> engine was still trying to connect at: 2017-09-13 09:30:46,275-05
>
> In vdsm logs you provided I see that they start at 2017-09-13
> 09:01:08,895-0500 and end at 2017-09-13 09:53:24,760-0500.
>
> Please provide vdsm logs from the time the issue occurred.
>
> Thanks,
> Piotr
>
> On Wed, Sep 13, 2017 at 5:09 PM, Bryan Sockel <Bryan.Sockel at altn.com> wrote:
>>
>> Hi
>>
>> Having an issue where i frequently have a server that is set to not
>> responsive.  VM's are set to unknown status, but still continue to run.
>> This issue is isolated to just a single host.  My Setup is currently a 2
>> Data Center Configuration with 2 servers in each data center.  Issue is
>> occurring at my remote site.
>>
>> The primary storage volumes are setup on dedicated hardware, with the
>> arbiter running on the server that is having issues.  There is also
>> another
>> gluster replica volume hosted on this box, the replica is the other
>> dedicated server.
>>
>> The logs are showing:
>>
>> [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand]
>> (DefaultQuartzScheduler8) [] Command 'GetCapabilitiesVDSCommand(HostName =
>> vm-host-colo-1, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true',
>> hostId='e75d4446-9bfc-47cb-8bf8-a2e681720b66',
>> vds='Host[vm-host-colo-1,e75d4446-9bfc-47cb-8bf8-a2e681720b66]'})'
>> execution
>> failed: java.rmi.ConnectException: Connection timeout
>>
>> [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring]
>> (DefaultQuartzScheduler8) [] Failure to refresh host 'vm-host-colo-1'
>> runtime info: java.rmi.ConnectException: Connection timeout.
>>
>>
>> I have attached the vdsm.log from the server with issues and the
>> engine.log.
>>
>> Thanks
>>
>> Bryan Sockel
>>
>>
>> _______________________________________________
>> Users mailing list
>> Users at ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/users
>>


More information about the Users mailing list