On Mon, Feb 3, 2014 at 11:23 PM, Itamar Heim <iheim@redhat.com> wrote:
On 02/03/2014 01:19 PM, Andrew Lau wrote:
The cause was a split-brain on the dom_md/ids file, which produced an
input/output error. Thanks!

Is this with Gluster?

Yup, a 2-brick replicated Gluster volume serving the NFS server. Sorry, I meant to say that I resolved it too.

 


On Mon, Feb 3, 2014 at 10:43 PM, Andrew Lau <andrew@andrewklau.com> wrote:

    On Mon, Feb 3, 2014 at 10:40 PM, Doron Fediuck <dfediuck@redhat.com> wrote:



        ----- Original Message -----
         > From: "Andrew Lau" <andrew@andrewklau.com>
         > To: "Doron Fediuck" <dfediuck@redhat.com>
         > Cc: "users" <users@ovirt.org>, "Jiri Moskovcak" <jmoskovc@redhat.com>, "Greg Padgett" <gpadgett@redhat.com>
         > Sent: Monday, February 3, 2014 1:35:01 PM
         > Subject: Re: [Users] Hosted Engine always reports "unknown stale-data"
         >
         > On Mon, Feb 3, 2014 at 9:53 PM, Doron Fediuck <dfediuck@redhat.com> wrote:
         >
         > >
         > >
         > > ----- Original Message -----
         > > > From: "Andrew Lau" <andrew@andrewklau.com>
         > > > To: "users" <users@ovirt.org>
         > > > Sent: Monday, February 3, 2014 12:32:45 PM
         > > > Subject: [Users] Hosted Engine always reports "unknown stale-data"
         > > >
         > > > Hi,
         > > >
         > > > I was wondering if anyone has this same notice when they run:
         > > > hosted-engine --vm-status
         > > >
         > > > The "engine status" will always be "unknown stale-data" even when the VM is
         > > > powered on and the engine is online. engine-health will actually report the
         > > > correct status.
         > > >
         > > > e.g.
         > > >
         > > > --== Host 1 status ==--
         > > >
         > > > Status up-to-date : False
         > > > Hostname : 172.16.0.11
         > > > Host ID : 1
         > > > Engine status : unknown stale-data
         > > >
         > > > Is it some sort of blocked port causing this or is this by design?
         > > >
         > > > Thanks,
         > > > Andrew
         > > >
         > > > _______________________________________________
         > > > Users mailing list
         > > > Users@ovirt.org
         > > > http://lists.ovirt.org/mailman/listinfo/users
         > > >
         > >
         > > Hi Andrew,
         > > It looks like an issue with the timestamp.
         > > Which timestamp do you have? How relevant is it?
         > >
         >
         > The timestamps seem to be outdated by a lot. There is an interesting error in
         > broker.log:
         >
         > Thread-24::INFO::2014-02-03 22:33:14,801::engine_health::90::engine_health.CpuLoadNoEngine::(action) VM not running on this host, status down
         > Thread-22::INFO::2014-02-03 22:33:14,834::mem_free::53::mem_free.MemFree::(action) memFree: 27382
         > Thread-23::ERROR::2014-02-03 22:33:14,922::cpu_load_no_engine::156::cpu_load_no_engine.EngineHealth::(update_stat_file) Failed to getVmStats: 'pid'
         > Thread-23::INFO::2014-02-03 22:33:14,923::cpu_load_no_engine::121::cpu_load_no_engine.EngineHealth::(calculate_load) System load total=0.0124, engine=0.0000, non-engine=0.0124
         >
         > I'm assuming that update_stat_file is the metadata file the vm-status is
         > getting pulled from?
         >

        Yep.
        Can you please verify the time your host actually has?
        We have a known issue with time, since we assume all
        hosts are in sync. If one of your hosts has a time sync
        issue, that could explain the problem you see.
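
One rough way to check the "all hosts are in sync" assumption is to compare epoch readings taken on each host at about the same moment. The sketch below does only that. The values are the `date +%s` readings posted in this thread for hv01 and hv02 (hv02 is read as 1391427755, joining the digit that wrapped onto its own line), and `TOLERANCE_SECONDS` is a made-up threshold for illustration, not a value taken from the hosted-engine agent.

```python
def max_clock_skew(readings):
    """Return the largest pairwise difference, in seconds, between epoch readings."""
    values = list(readings.values())
    return max(values) - min(values)


# Epoch seconds from `date +%s` on each host, as reported in this thread.
readings = {"hv01": 1391427754, "hv02": 1391427755}

# Hypothetical tolerance; the thread does not say what skew the agent accepts.
TOLERANCE_SECONDS = 5

skew = max_clock_skew(readings)
print(f"max skew: {skew}s, within tolerance: {skew <= TOLERANCE_SECONDS}")
```

This assumes the readings were captured close together. If they were taken minutes apart, the difference says nothing about clock sync.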


    --== Host 1 status ==--

    Status up-to-date                  : False
    Hostname                           : 172.16.0.11
    Host ID                            : 1
    Engine status                      : unknown stale-data
    Score                              : 0
    Local maintenance                  : False
    Host timestamp                     : 1391417611

    --== Host 2 status ==--

    Status up-to-date                  : False
    Hostname                           : 172.16.0.12
    Host ID                            : 2
    Engine status                      : unknown stale-data
    Score                              : 0
    Local maintenance                  : False
    Host timestamp                     : 1391417171


    ​
    [root@hv01 ~]# date +%s
    1391427754

    [root@hv02 ~]# date +%s
    1391427755

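To put numbers on "outdated by a lot", the "Host timestamp" values from the status output can be subtracted from the `date +%s` readings. This is a small sketch of that arithmetic. It assumes hv01 is Host 1 and hv02 is Host 2, which the thread does not state explicitly, and it reads hv02's clock as 1391427755.

```python
from datetime import timedelta


def staleness(host_timestamp, now):
    """Seconds between a host's last reported timestamp and the current time."""
    return now - host_timestamp


# (Host timestamp from --vm-status, `date +%s` on that host), as posted above.
# The hv01 -> Host 1 and hv02 -> Host 2 pairing is an assumption.
hosts = {
    "hv01 (Host 1)": (1391417611, 1391427754),
    "hv02 (Host 2)": (1391417171, 1391427755),
}

for name, (host_ts, now) in hosts.items():
    age = staleness(host_ts, now)
    print(f"{name}: metadata is {age}s old ({timedelta(seconds=age)})")
```

Both hosts come out at well over two and a half hours behind, while the two clocks themselves differ by about a second. That would suggest the metadata is not being refreshed, rather than the hosts being out of sync.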


