[Users] Hosted Engine always reports "unknown stale-data"

Hi,

I was wondering if anyone has this same notice when they run:

hosted-engine --vm-status

The "engine status" will always be "unknown stale-data" even when the VM is powered on and the engine is online. engine-health will actually report the correct status.

eg.

--== Host 1 status ==--

Status up-to-date : False
Hostname : 172.16.0.11
Host ID : 1
Engine status : unknown stale-data

Is it some sort of blocked port causing this or is this by design?

Thanks,
Andrew

----- Original Message -----
From: "Andrew Lau" <andrew@andrewklau.com>
To: "users" <users@ovirt.org>
Sent: Monday, February 3, 2014 12:32:45 PM
Subject: [Users] Hosted Engine always reports "unknown stale-data"
Hi Andrew, it looks like an issue with the time stamp. Which time stamp do you have? How relevant is it?

On Mon, Feb 3, 2014 at 9:53 PM, Doron Fediuck <dfediuck@redhat.com> wrote:
> Hi Andrew, it looks like an issue with the time stamp. Which time stamp do you have? How relevant is it?

timestamps seem to be outdated by a lot, interesting error in the broker.log

Thread-24::INFO::2014-02-03 22:33:14,801::engine_health::90::engine_health.CpuLoadNoEngine::(action) VM not running on this host, status down
Thread-22::INFO::2014-02-03 22:33:14,834::mem_free::53::mem_free.MemFree::(action) memFree: 27382
Thread-23::ERROR::2014-02-03 22:33:14,922::cpu_load_no_engine::156::cpu_load_no_engine.EngineHealth::(update_stat_file) Failed to getVmStats: 'pid'
Thread-23::INFO::2014-02-03 22:33:14,923::cpu_load_no_engine::121::cpu_load_no_engine.EngineHealth::(calculate_load) System load total=0.0124, engine=0.0000, non-engine=0.0124

I'm assuming that update_stat_file is the metadata file the vm-status is getting pulled from?
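
For anyone who wants to pull only the errors out of a broker.log like the excerpt above, here is a minimal Python sketch. It assumes every line follows the same `::`-delimited layout as the four lines quoted in this message (thread, level, timestamp, module, line number, logger, then "(function) message"). The names `BrokerLogRecord` and `parse_broker_line` are made up for illustration and are not part of any oVirt tool.

```python
from typing import NamedTuple


class BrokerLogRecord(NamedTuple):
    thread: str
    level: str
    timestamp: str
    module: str
    lineno: int
    logger: str
    function: str
    message: str


def parse_broker_line(line: str) -> BrokerLogRecord:
    """Split one '::'-delimited log line into named fields."""
    thread, level, timestamp, module, lineno, logger, rest = (
        line.strip().split("::", 6)
    )
    # The last field looks like "(function) free-form message".
    function, _, message = rest.partition(") ")
    return BrokerLogRecord(thread, level, timestamp, module, int(lineno),
                           logger, function.lstrip("("), message)


# The four lines quoted in the message above.
SAMPLE = """\
Thread-24::INFO::2014-02-03 22:33:14,801::engine_health::90::engine_health.CpuLoadNoEngine::(action) VM not running on this host, status down
Thread-22::INFO::2014-02-03 22:33:14,834::mem_free::53::mem_free.MemFree::(action) memFree: 27382
Thread-23::ERROR::2014-02-03 22:33:14,922::cpu_load_no_engine::156::cpu_load_no_engine.EngineHealth::(update_stat_file) Failed to getVmStats: 'pid'
Thread-23::INFO::2014-02-03 22:33:14,923::cpu_load_no_engine::121::cpu_load_no_engine.EngineHealth::(calculate_load) System load total=0.0124, engine=0.0000, non-engine=0.0124
"""

records = [parse_broker_line(line) for line in SAMPLE.splitlines() if line.strip()]
errors = [r for r in records if r.level == "ERROR"]
for r in errors:
    print(r.timestamp, r.function, r.message)
```

On this excerpt the only ERROR record is the `update_stat_file` failure, which is the line the question above is about.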

----- Original Message -----
From: "Andrew Lau" <andrew@andrewklau.com>
To: "Doron Fediuck" <dfediuck@redhat.com>
Cc: "users" <users@ovirt.org>, "Jiri Moskovcak" <jmoskovc@redhat.com>, "Greg Padgett" <gpadgett@redhat.com>
Sent: Monday, February 3, 2014 1:35:01 PM
Subject: Re: [Users] Hosted Engine always reports "unknown stale-data"
> I'm assuming that update_stat_file is the metadata file the vm-status is getting pulled from?

Yep. Can you please verify the time your hosts actually have? We have a known issue with time, since we assume all hosts are in sync. So if one of your hosts has a time sync issue, that could explain the problem you see.

On Mon, Feb 3, 2014 at 10:40 PM, Doron Fediuck <dfediuck@redhat.com> wrote:
> Yep. Can you please verify the time your host actually has? ie- we have a known issue with time, since we assume all hosts are in sync. So if one of your hosts has a time sync issue, this can explain the problem you see.

--== Host 1 status ==--

Status up-to-date : False
Hostname : 172.16.0.11
Host ID : 1
Engine status : unknown stale-data
Score : 0
Local maintenance : False
Host timestamp : 1391417611

--== Host 2 status ==--

Status up-to-date : False
Hostname : 172.16.0.12
Host ID : 2
Engine status : unknown stale-data
Score : 0
Local maintenance : False
Host timestamp : 1391417171

[root@hv01 ~]# date +%s
1391427754

[root@hv02 ~]# date +%s
1391427755
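
The numbers above can be compared directly. The following Python sketch works out how old each host's metadata timestamp is and how far apart the two hosts' clocks are. It assumes hv01 is Host ID 1 and hv02 is Host ID 2 (the thread does not say so explicitly), and that the garbled second `date +%s` reading is 1391427755. The helper `host_timestamps` is a made-up name for illustration.

```python
import re
from typing import Dict

# Output pasted in the message above, abridged to the fields used here.
VM_STATUS = """\
--== Host 1 status ==--

Status up-to-date : False
Host ID : 1
Engine status : unknown stale-data
Host timestamp : 1391417611

--== Host 2 status ==--

Status up-to-date : False
Host ID : 2
Engine status : unknown stale-data
Host timestamp : 1391417171
"""

# `date +%s` readings; mapping hv01 -> host 1 and hv02 -> host 2 is an assumption.
WALL_CLOCK = {1: 1391427754, 2: 1391427755}


def host_timestamps(status_text: str) -> Dict[int, int]:
    """Map each 'Host ID' to the 'Host timestamp' that follows it."""
    ids = re.findall(r"^Host ID\s*:\s*(\d+)$", status_text, flags=re.MULTILINE)
    stamps = re.findall(r"^Host timestamp\s*:\s*(\d+)$", status_text,
                        flags=re.MULTILINE)
    return {int(i): int(s) for i, s in zip(ids, stamps)}


stamps = host_timestamps(VM_STATUS)
# Seconds between the last recorded metadata timestamp and the host's clock.
ages = {host: WALL_CLOCK[host] - ts for host, ts in stamps.items()}
# Difference between the two hosts' own clocks.
clock_skew = abs(WALL_CLOCK[1] - WALL_CLOCK[2])

print(ages)        # {1: 10143, 2: 10584}
print(clock_skew)  # 1
```

Under those assumptions the two clocks agree to within a second, while both metadata timestamps are close to three hours old. That suggests the metadata had stopped being updated rather than the hosts' clocks having drifted apart.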

The issue was a split-brain issue on the dom_md/ids file causing an input/output error, thanks!

On 02/03/2014 01:19 PM, Andrew Lau wrote:
> The issue was a split-brain issue on the dom_md/ids file causing an input/output error, thanks!

is this with gluster?

On Mon, Feb 3, 2014 at 11:23 PM, Itamar Heim <iheim@redhat.com> wrote:
> is this with gluster?

Yup, a two-brick replicated gluster volume serving the NFS server. Sorry, I meant to say I resolved it too.

On 02/03/2014 01:25 PM, Andrew Lau wrote:
> Yup a 2 brick gluster replicated instance serving the NFS server, sorry was meant to say I resolved it too.

You have to use gluster with quorum, or this will happen often.

On Mon, Feb 3, 2014 at 11:27 PM, Itamar Heim <iheim@redhat.com> wrote:
> you have to use a gluster with quorum, or this will happen often

Yeah, I disabled quorum temporarily because I only have two hosts, and I need to be able to shut one of them down without the VMs ending up in a paused state.
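
The trade-off described here can be illustrated with a little arithmetic. The Python sketch below uses a plain strict-majority rule, which is a simplification for illustration and not a description of gluster's actual quorum options (those have more nuance). The function name `has_majority_quorum` is made up.

```python
def has_majority_quorum(bricks_up: int, replica_count: int) -> bool:
    """Strict majority: more than half of the replicas must be reachable."""
    return bricks_up > replica_count / 2


for replica_count in (2, 3):
    survivors = replica_count - 1  # one host shut down
    print(replica_count, survivors,
          has_majority_quorum(survivors, replica_count))
```

With two replicas, a single surviving brick is not a majority, so under this rule writes would be refused while one host is down, which matches the paused-VM concern. With three replicas, losing one still leaves a majority. Without any quorum check, each side can keep accepting writes on its own, and that is how files such as dom_md/ids can end up in split-brain.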

On 02/03/2014 01:29 PM, Andrew Lau wrote:
> Yeah, I disabled quorum temporarily because I'm only using a two host scenario and I need to have the case scenario where one is to be shutdown the VMs won't end up in a paused state.

IIUC, without quorum you'll get both the hosted engine and the SPM into split-brain.

On Mon, Feb 3, 2014 at 11:33 PM, Itamar Heim <iheim@redhat.com> wrote:
> iiuc, without quorum you'll get both hosted engine and the SPM into split brains

I've been using the non-quorum setup for a while, though, and it has seemed to work all right. The only time I had the split-brain was because I was actually messing with gluster and deleting full brick contents to test cgroups.
participants (3)
- Andrew Lau
- Doron Fediuck
- Itamar Heim