[ovirt-users] Unable to add host to cluster after network

Eitan Raviv eraviv at redhat.com
Wed Apr 18 13:41:14 UTC 2018


Hi Stack,

I read through your ordeal and I would like to post a few comments:

   - When I try to reproduce your scenario with the second network set to
   'not required' before onboarding the second host, the host is processed
   and set to 'Up' by the engine without any hiccups or errors in the log.
   - On the other hand, if the network is 'required', the scenario
   reproduces, but on my setup it can be resolved: initially the second
   network is reported as missing and the host becomes non-operational, with
   its interfaces disappearing from the engine as you reported. But if the
   second network is made 'not required' (or even deleted from the engine,
   for that matter), the engine reconnects to the second host within a
   couple of minutes and the host goes to 'Up' status.
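The difference between the two cases above can be pictured with a small model. This is a minimal sketch under stated assumptions, not the engine's actual compliance check: it only encodes the rule "a host lacking any *required* cluster network goes Non-Operational, while a missing optional network is tolerated". The class, the functions and the "bmc" network name are hypothetical, used for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterNetwork:
    """A logical network attached to a cluster (hypothetical, simplified model)."""
    name: str
    required: bool


def missing_required_networks(cluster_networks, host_networks):
    """Return the names of required cluster networks the host does not have."""
    attached = set(host_networks)
    return sorted(
        net.name
        for net in cluster_networks
        if net.required and net.name not in attached
    )


def host_status(cluster_networks, host_networks):
    """Any missing *required* network makes the host NonOperational;
    missing optional networks are ignored (a simplification)."""
    missing = missing_required_networks(cluster_networks, host_networks)
    if missing:
        return "NonOperational", missing
    return "Up", []


if __name__ == "__main__":
    # Hypothetical cluster: ovirtmgmt plus a second "bmc" network.
    required_bmc = [ClusterNetwork("ovirtmgmt", True), ClusterNetwork("bmc", True)]
    optional_bmc = [ClusterNetwork("ovirtmgmt", True), ClusterNetwork("bmc", False)]
    new_host = ["ovirtmgmt"]  # node2 with only the primary network configured
    print(host_status(required_bmc, new_host))  # ('NonOperational', ['bmc'])
    print(host_status(optional_bmc, new_host))  # ('Up', [])
```

Flipping `required` to `False` on the second network is the model's counterpart of the workaround described in the second bullet. The model does not explain why the engine reported 'ovirtmgmt' itself as missing in the original report.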

HTH

On Tue, Apr 17, 2018 at 11:35 PM, ~Stack~ <i.am.stack at gmail.com> wrote:

> Greetings,
>
> After a few days of trial, error, and madness - I *think* I found the
> source of my problem. Or at least I can now replicate it reliably. These
> are the basics of my speed-run-to-test-failures setup.
>
> Fresh minimal install of Scientific Linux 7.4 on a physical host for my
> engine. Add the 4.2 repo and run engine-setup - just blast through the
> defaults. Configure it with default DC and cluster.
>
> Fresh minimal install of Scientific Linux 7.4 on node1 - configure only
> the primary network card. Add the ovirt repo.
>
> Add the host to the cluster. It provisions just fine. Life is good.
>
> Now here is where things split.
>
> Scenario 1: build node2 the same as node1, configuring only the primary
> network card, and add it as a host. It provisions just fine. Life is good.
>
> Scenario 2: Configure a second network. In my case a BMC/IPMI network.
> It doesn't matter whether it is required or not; both cause failures,
> although the errors are slightly more evident with required. Make sure
> the network is assigned to node1, is properly assigned an IP, and is
> configured in the up state. Now build node2 the same as before, with only
> the primary network configured, and add it as a host.
>
> Failure followed by infinite loop of setting it into Non-Operational!
>
>
> The pop-up gives you some crap about "Host has no default route." but
> that is 100% a red herring.
>
> Dig a little deeper and you get a message like this:
> "node2 does not comply with the cluster Default networks, the following
> networks are missing on host: 'ovirtmgmt'"
>
> Ah. That's a bit more relevant, but why can't it configure it? Or at
> least get to the point where it asks me "Hey, networking is a bit off -
> do you want to configure that now?" That would be nice...
>
> Fortunately the troubleshooting guide has something about that!
> https://www.ovirt.org/documentation/how-to/troubleshooting/troubleshooting/
>
> Unfortunately, it doesn't do anything to help. Even after doing these
> steps, the loop just keeps going...nothing changes.
> https://www.ovirt.org/develop/developer-guide/vdsm/installing-vdsm-from-rpm/
>
> Scratch it all and completely rebuild AGAIN for...
> Scenario 3: Configure a second network (BMC) and assign it to node1 just
> like before. Build out node2 same as node1 but this time add in the
> EXACT SAME NETWORK CONFIGURATION THAT IS WORKING ON NODE1 - ALL of the
> ifcfg-* files (but update the IP address for the correct host, obviously).
> Now add it as a host.
>
> Doh! Same error. :-/
>
> OK fine. Let's really get into it. First off, the networking page for
> the host is blank. It never pulls back the network cards so you can't
> actually make changes via the web page. Nor can you assign networks. So
> the web interface doesn't help at all.
>
> Let's look at the engine log instead.
>
>
> 2018-04-17 14:33:00,336-05 INFO
> [org.ovirt.engine.core.bll.VdsEventListener]
> (EE-ManagedThreadFactory-engine-Thread-1091) []
> ResourceManager::vdsNotResponding entered for Host
> 'f0a3d515-8ba2-490e-8d65-54edbb52cefc', '192.168.1.4'
> 2018-04-17 14:33:00,360-05 INFO
> [org.ovirt.engine.core.bll.pm.VdsNotRespondingTreatmentCommand]
> (EE-ManagedThreadFactory-engine-Thread-1091) [5291eee5] Lock Acquired to
> object
> 'EngineLock:{exclusiveLocks='[f0a3d515-8ba2-490e-8d65-54edbb52cefc=VDS_FENCE]',
> sharedLocks=''}'
> 2018-04-17 14:33:00,388-05 ERROR
> [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand]
> (EE-ManagedThreadFactory-engineScheduled-Thread-44) [2b853e43] Host
> 'node2' is set to Non-Operational, it is missing the following networks:
> 'ovirtmgmt'
> 2018-04-17 14:33:00,403-05 WARN
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (EE-ManagedThreadFactory-engineScheduled-Thread-44) [2b853e43] EVENT_ID:
> VDS_SET_NONOPERATIONAL_NETWORK(519), Host node2 does not comply with the
> cluster Default networks, the following networks are missing on host:
> 'ovirtmgmt'
> 2018-04-17 14:33:00,407-05 INFO
> [org.ovirt.engine.core.bll.pm.VdsNotRespondingTreatmentCommand]
> (EE-ManagedThreadFactory-engine-Thread-1091) [5291eee5] Running command:
> VdsNotRespondingTreatmentCommand internal: true. Entities affected :
> ID: f0a3d515-8ba2-490e-8d65-54edbb52cefc Type: VDS
>
>
> There's the message from before. Good. On the right track. Not sure why
> it thinks the host is unreachable, because the host is just fine.
>
> 2018-04-17 14:33:01,978-05 ERROR
> [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand]
> (EE-ManagedThreadFactory-engineScheduled-Thread-31) [] Command
> 'GetAllVmStatsVDSCommand(HostName = node2,
> VdsIdVDSCommandParametersBase:{hostId='f0a3d515-8ba2-490e-8d65-54edbb52cefc'})'
> execution failed: java.net.NoRouteToHostException: No route to host
>
> Huh. Again with the no route to host. But THERE IS! The network is
> functioning perfectly. IPs all work. DNS all works. Routing is fine. I
> have no idea what it is complaining about.
>
> 2018-04-17 14:33:03,873-05 INFO
> [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand]
> (EE-ManagedThreadFactory-engineScheduled-Thread-39) [4f72afaa] START,
> SetVdsStatusVDSCommand(HostName = node2,
> SetVdsStatusVDSCommandParameters:{hostId='f0a3d515-8ba2-490e-8d65-54edbb52cefc',
> status='NonOperational', nonOperationalReason='NETWORK_UNREACHABLE',
> stopSpmFailureLogged='false', maintenanceReason='null'}), log id: 7459a748
>
> Which network is unreachable? Because every single one of them is fine!
> Ugh!
>
> I am completely stumped as to why it works perfectly
> pre-additional-networks but fails every time after a network is configured.
>
> A couple of questions.
>
> 1. I assume people have added hosts _after_ they've configured multiple
> networks. So what am I doing wrong? Why am I unable to add a host?
> Again, if I don't configure that second network, it will happily add all
> my hosts. But what happens when I want to add a host in the future?
>
> 2. How do I break that infuriating infinite non-operational loop? I
> can't put it into maintenance mode, I can't delete the host, or anything
> else. The options are greyed out. The only solution I've found is to yank
> the power; after it freaks out for about 30 minutes because it can't
> find the host, it will stop trying. But I still can't seem to remove the
> bad host. There has to be a way via command-line to say "stop timing
> out, knock that off, and delete this host!" but I'm not finding it in my
> searching.
>
> 3. I feel like I go through periods with oVirt where everything is
> running exactly the way I want then something happens (like me trying to
> add a host! Or thinking I can just change a host IP without the whole
> thing dying on me!) and it all just falls apart. I feel like I am just
> stumbling through most of it. I've previously gotten a lot out of the
> Red Hat classes and work has offered to send me to a training of my
> choice this year. I am really considering taking the 318 Virtualization
> class. I'm curious though, how close is that to what I would be working
> with oVirt? I'm guessing that since 4.2 came out recently, there is
> probably little chance the class covers 4.2, but maybe it is close
> enough? I would love to hear feedback.
>
> Thanks!
> ~Stack~
>
>
> _______________________________________________
> Users mailing list
> Users at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
>
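When the engine keeps cycling a host through Non-Operational, it can help to pull the reported reasons out of engine.log instead of reading the wrapped excerpt quoted above by eye. The following is a minimal sketch, assuming each log entry occupies a single line in the raw log (the mail client wrapped them) and that the message wording matches the excerpt exactly. The regexes and the `summarize` helper are hypothetical, written for this example only.

```python
import re
from collections import defaultdict

# Patterns for two messages in the quoted excerpt; wording assumed to match.
MISSING_RE = re.compile(
    r"Host '(?P<host>[^']+)' is set to Non-Operational, "
    r"it is missing the following networks: (?P<nets>.*)$"
)
REASON_RE = re.compile(
    r"SetVdsStatusVDSCommand\(HostName = (?P<host>[^,]+),"
    r".*?nonOperationalReason='(?P<reason>[A-Z_]+)'"
)
QUOTED_RE = re.compile(r"'([^']*)'")  # network names are single-quoted


def summarize(lines):
    """Map each host to the networks reported missing and to the
    nonOperationalReason values logged for it."""
    missing = defaultdict(set)
    reasons = defaultdict(set)
    for line in lines:
        m = MISSING_RE.search(line)
        if m:
            missing[m["host"]].update(QUOTED_RE.findall(m["nets"]))
        r = REASON_RE.search(line)
        if r:
            reasons[r["host"]].add(r["reason"])
    return dict(missing), dict(reasons)


if __name__ == "__main__":
    # Two entries from the excerpt, rejoined onto single lines.
    sample = [
        "2018-04-17 14:33:00,388-05 ERROR "
        "[org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] "
        "(EE-ManagedThreadFactory-engineScheduled-Thread-44) [2b853e43] Host "
        "'node2' is set to Non-Operational, it is missing the following "
        "networks: 'ovirtmgmt'",
        "2018-04-17 14:33:03,873-05 INFO "
        "[org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] "
        "(EE-ManagedThreadFactory-engineScheduled-Thread-39) [4f72afaa] START, "
        "SetVdsStatusVDSCommand(HostName = node2, "
        "SetVdsStatusVDSCommandParameters:{hostId='f0a3d515-8ba2-490e-8d65-"
        "54edbb52cefc', status='NonOperational', "
        "nonOperationalReason='NETWORK_UNREACHABLE', "
        "stopSpmFailureLogged='false', maintenanceReason='null'}), "
        "log id: 7459a748",
    ]
    print(summarize(sample))
    # ({'node2': {'ovirtmgmt'}}, {'node2': {'NETWORK_UNREACHABLE'}})
```

To use it on a real engine, pass an open file object for the engine log (usually /var/log/ovirt-engine/engine.log) instead of the sample list. Any entry whose wording differs from the excerpt is silently skipped.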


-- 
Eitan Raviv
IRC: erav (#ovirt #vdsm #devel #rhev-dev)