
Hello: I'm still fairly new to oVirt. I'm running a 3-node cluster largely built from Jason Brooks' howto for oVirt+Gluster in the contributed docs section of the oVirt webpage. I had everything mostly working, and this morning when I logged in, I saw a new symbol attached to all three of my hosts indicating an upgrade was available. So I clicked on engine3 and told it to upgrade. It migrated my VMs off, did its upgrade, and everything looked good. I was able to migrate a VM or two back, and they continued to function just fine.

Then I tried to upgrade engine1, which was running my hosted engine. In theory, all three engines/hosts were set up to be able to run the engine, per Jason's instructions. However, it failed to migrate the engine off host1, and I realized that I still have the same issue I had on an earlier incarnation of this cluster: inability to migrate the engine around. Ok, I'll deal with that later (with help from this list, hopefully).

I went on about the work I came in to do, and tried to start up a VM. It appeared to start, but it never booted. It did raise the CPU usage for that VM, but the console was all black, no resize or anything. I tried several settings. This was on a VM I had just powered down. I noticed it was starting the VM on engine3, so I did a Run Once specifying that the VM start on engine2. It booted up just fine. After booting, I could migrate it to engine3, and all was good.

What happened? I get no error messages. Starting any VM on engine3 (start paused, attaching the display, then running it), I always get the same thing: blank console, about 50% CPU usage reported by the web interface, no response on any network, and by all signs available to me, no actual booting (reminds me of a PC that doesn't POST). Simply changing the host it starts on to one that has not been upgraded fixes the problem.

I'd greatly appreciate your help:

1) How to fix it so the upgraded host can start VMs again
2) How to fix the cluster so the HostedEngine can migrate between hosts (and I'm able to put host1 in maintenance mode)

oVirt 4 series, latest in repos as of last weekend (Jan 1).

--Jim
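For anyone who wants to script the Run Once workaround described above, here is a minimal sketch using the oVirt 4 Python SDK (ovirtsdk4). The engine URL, credentials, CA file, and the VM/host names ('myvm', 'engine2') are placeholder assumptions, and the placement override is assumed to behave like the web UI's Run Once host pinning:

# Minimal sketch using the oVirt 4 Python SDK (ovirtsdk4). The engine URL,
# credentials, CA file, and the 'myvm'/'engine2' names are placeholders,
# not values taken from this thread.
import ovirtsdk4 as sdk
import ovirtsdk4.types as types

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='secret',
    ca_file='/etc/pki/ovirt-engine/ca.pem',
)

vms_service = connection.system_service().vms_service()
vm = vms_service.list(search='name=myvm')[0]
vm_service = vms_service.vm_service(vm.id)

# Start the VM with a one-off placement override, pinning it to engine2 for
# this run only -- assumed to mirror the web UI's Run Once host selection.
vm_service.start(
    vm=types.Vm(
        placement_policy=types.VmPlacementPolicy(
            hosts=[types.Host(name='engine2')],
        ),
    ),
)

connection.close()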


Well, it turned out it was 100% of one core; the percentage reported took into account how many cores the VM had assigned. Rebooting the node did fix the problem.

Just to be clear, the "proper" procedure for rebooting a host in oVirt is to put it in maintenance mode, ssh to the node, issue the reboot, then after confirming it's back up, right-click on the node in the web UI and select "confirm node reboot", then take it out of maintenance mode?

--Jim

On Sun, Jan 8, 2017 at 9:10 AM, Robert Story <rstory@tislabs.com> wrote:
On Sat, 7 Jan 2017 15:02:10 -0800 Jim wrote:

JK> I went on about the work I came in to do, and tried to start up a VM. It
JK> appeared to start, but it never booted. It did raise the CPU usage for
JK> that VM, but console was all black, no resize or anything. Tried several
JK> settings. This was on a VM I had just powered down. I noticed it was
JK> starting the VM on engine3, so I did a runonce specifying the vm start on
JK> engine2. Booted up just fine. After booting, I could migrate to engine3,
JK> and all was good.
JK>
JK> What happened? I get no error messages, starting any vm on engine3, start
JK> paused, attaching display, then running it, I always get the same thing:
JK> blank console, about 50% cpu usage reported by the web interface, no
JK> response on any network, and by all signs available to me, no actual
JK> booting (reminds me of a PC that doesn't POST). Simply changing the engine
JK> it starts on to one that has not been upgraded fixes the problem.
I had this issue too, except I had 100% CPU usage reported on the web interface. Have you rebooted the troublesome host since it was upgraded? I think that was what solved it for me.
Robert
--
Senior Software Engineer @ Parsons

On Sun, 8 Jan 2017 18:15:27 -0800 Jim wrote:

JK> Just to be clear, the "proper" procedure for rebooting a host in oVirt is
JK> to put it in maintenance mode, ssh to the node, issue the reboot, then after
JK> confirming it's back up, right-click on the node in the web UI and select
JK> "confirm node reboot", then take it out of maintenance mode?

I think the 'confirm node reboot' step can be skipped if you put the host into maintenance before rebooting. It's needed when a host which was running VMs gets hung and you need to force a reboot: confirming the reboot lets the engine know that those VMs are no longer running on that node. But it certainly won't do any harm to do it anyway.

Robert

--
Senior Software Engineer @ Parsons
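For reference, the maintenance -> reboot -> activate cycle discussed above can also be scripted against the engine. Below is a minimal sketch using the oVirt 4 Python SDK (ovirtsdk4), with the engine URL, credentials, and the host name 'engine1' as placeholder assumptions; the reboot itself still has to happen out-of-band (e.g. over ssh):

# Sketch of the maintenance -> reboot -> activate cycle via the oVirt 4
# Python SDK (ovirtsdk4). The engine URL, credentials, and the host name
# 'engine1' are placeholders; the reboot itself happens out-of-band.
import time

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='secret',
    ca_file='/etc/pki/ovirt-engine/ca.pem',
)

hosts_service = connection.system_service().hosts_service()
host = hosts_service.list(search='name=engine1')[0]
host_service = hosts_service.host_service(host.id)

# Ask the engine to put the host into maintenance; it migrates the host's
# VMs away before the status changes.
host_service.deactivate()
while host_service.get().status != types.HostStatus.MAINTENANCE:
    time.sleep(5)

# ... ssh to the host and reboot it here, then wait for it to come back ...

# Bring the host back into the cluster once it is up again.
host_service.activate()

connection.close()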