October 2017 - Users - oVirt List Archives

Doubt about iptables host config
by Gianluca Cecchi 03 Oct '17


Hello,
I have read this interesting blog post:
https://www.ovirt.org/blog/2016/12/extension-iptables-rules-oVirt-hosts/

In my case, to allow incoming connections from the Nagios server to the Nagios NRPE daemon installed on the hosts, I ran:

[root@ovmgr1 ~]# engine-config --set IPTablesConfigSiteCustom='
> -A INPUT -p tcp --dport 5666 -s 10.4.5.99/32 -m comment --comment "Nagios NRPE daemon" -j ACCEPT
> '
[root@ovmgr1 ~]#

and

systemctl restart ovirt-engine

BTW: the link above is missing the final ' (single quote) at the end of the similar command in the given example.

In the meantime, on my running oVirt host (CentOS 7.4), I ran:

[g.cecchi@ov300 ~]$ sudo iptables -I INPUT 16 -p tcp --dport 5666 -s 10.4.5.99/32 -m comment --comment "Nagios NRPE daemon" -j ACCEPT

The existing "reject-with icmp-host-prohibited" rule was line 16, so I inserted the new rule right before it.

So far so good.
My doubt is whether the rule will remain if the host is put into maintenance and then reactivated, or rebooted. Or do I have to add a line to some file on the host anyway to make it persistent?
I would rather not reinstall the host only to statically set a new iptables rule.

Thanks,
Gianluca
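The step of inserting the rule "right before" the reject line can be sketched without hard-coding position 16. The following is a minimal Python sketch, not part of the original post: it assumes a listing in the shape printed by `iptables -L INPUT -n --line-numbers`, and the sample rules, `find_reject_line` and `build_insert_command` are hypothetical names and data used only for illustration. It only builds the command; it does not run it and says nothing about whether the rule persists across a reboot or a host reactivation.

```python
import shlex
from typing import List, Optional

# Illustrative listing in the shape printed by
# `iptables -L INPUT -n --line-numbers`. The rules are made up for this
# sketch; only the final REJECT rule mirrors the one mentioned in the post.
SAMPLE_LISTING = """\
Chain INPUT (policy ACCEPT)
num  target     prot opt source               destination
1    ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0            state RELATED,ESTABLISHED
2    ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:22
3    REJECT     all  --  0.0.0.0/0            0.0.0.0/0            reject-with icmp-host-prohibited
"""


def find_reject_line(listing: str) -> Optional[int]:
    """Return the rule number of the first icmp-host-prohibited REJECT rule."""
    for line in listing.splitlines():
        fields = line.split()
        # Rule lines start with their number; header lines do not.
        if fields and fields[0].isdigit() and "reject-with icmp-host-prohibited" in line:
            return int(fields[0])
    return None


def build_insert_command(position: int) -> List[str]:
    """Build the iptables invocation from the post for a given position."""
    return [
        "iptables", "-I", "INPUT", str(position),
        "-p", "tcp", "--dport", "5666", "-s", "10.4.5.99/32",
        "-m", "comment", "--comment", "Nagios NRPE daemon",
        "-j", "ACCEPT",
    ]


if __name__ == "__main__":
    position = find_reject_line(SAMPLE_LISTING)
    if position is not None:
        # Print the command instead of executing it.
        print(shlex.join(build_insert_command(position)))
```

Inserting at the reject rule's own number pushes that rule down by one, so the new ACCEPT rule is evaluated first, which is the effect described in the post.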


Proper Network Configuration
by ~Stack~ 03 Oct '17


Greetings,

For various reasons I have multiple networks that I am required to work with. I just want to ensure that I've understood the documentation for setting up oVirt correctly.

- First is my BMC/ilo network. The security team wants as few entry points into this as possible and wants as much segregation as possible.

- Second is my "management" access network. For my other machines on this network this means admin-SSH/rsyslog/SaltStack configuration management/etc.

- Third is my high speed network where my NFS storage sits and applications that need the bandwidth do their thing.

- Fourth is my "public" access.

My Engine host has the "management" and "public" networks.
My Hypervisor hosts have the "BMC/ilo", "management", and "storage" networks.

Is there a reason why I should add "public" on the hypervisors?

Is there a reason why I may need "BMC/ilo" or "storage" on the Engine host?

Thanks!
~Stack~


Hosted engine setup question
by Demeter Tibor 03 Oct '17


Hi,

I just installed a four-node hosted engine cluster on Gluster storage.
It seems to be working fine, but I have some questions about it.

- I would like to create my own cluster and datacenter. Is it possible to remove a host and re-add it to another cluster while it is running the hosted engine?
- Is it possible to remove the default datacenter without any problems?

- I have a production oVirt cluster based on the 3.5 series. It uses shared NFS storage. Is it possible to migrate VMs from 3.5 to 4.1 by detaching the shared storage from the old cluster and attaching it to the new cluster?
- If yes, what will happen to the VM properties, for example MAC addresses, limits, etc.? Will those be migrated or not?

Thanks in advance,
Regards,

Tibor


Help with Power Management network
by ~Stack~ 02 Oct '17


Greetings,

I hit up the IRC earlier, but only crickets. Guess no one wants to stick around late on a Friday night. :-D

I'm an oVirt newb here. I've been going through the docs setting up 4.1 on Scientific Linux 7.4. For the most part everything is going well once I learn how to do it. I am, however, stuck on power management.

I have multiple networks:

192.168.1.x is my BMC/ilo network. The security team wants as few entry points into this as possible and wants as much segregation as possible.

192.168.2.x is my "management" access network. For my other machines on this network this means admin-SSH/rsyslog/SaltStack configuration management/etc.

192.168.3.x is my high speed network where my NFS storage sits and applications that need the bandwidth do their thing.

10.10.86.x is my "public" access.

All networks are configured on the Host network settings. Mostly confident I got it right...at least each network/IP matches the right interface. ;-)

Right now I only have the engine server and one hypervisor. On either host I can ssh into the command line and run

fence_ipmilan -a 192.168.1.x -l USER -p PASS -o status -v -P

and it works, all is good. However, when I try to add it in the oVirt interface I get an error. :-/

Edit Host -> Power Management:
Address: 192.168.1.14
User Name: root
Password: SorryCantTellYou
Type: ipmilan
Options: <blank>

Test

Test failed: Failed to run fence status-check on host '192.168.2.14'. No other host was available to serve as proxy for the operation.

Yes, same host because I only have one right now. :-)

Any help or guidance would be much appreciated. In the meantime I'm going back to the docs to poke at a few other things I need to figure out. :-)

Thanks!
~Stack~


Re: [ovirt-users] xfs fragmentation problem caused data domain to hang
by Jason Keltz 02 Oct '17


On 10/02/2017 11:00 AM, Yaniv Kaul wrote:
> On Mon, Oct 2, 2017 at 5:57 PM, Jason Keltz <jas(a)cse.yorku.ca> wrote:
>> On 10/02/2017 10:51 AM, Yaniv Kaul wrote:
>>> On Mon, Oct 2, 2017 at 5:14 PM, Jason Keltz <jas(a)cse.yorku.ca> wrote:
>>>> On 10/02/2017 01:22 AM, Yaniv Kaul wrote:
>>>>> On Mon, Oct 2, 2017 at 5:11 AM, Jason Keltz <jas(a)cse.yorku.ca> wrote:
>>>>>> Hi.
>>>>>>
>>>>>> For my data domain, I have one NFS server with a large
>>>>>> RAID filesystem (9 TB).
>>>>>> I'm only using 2 TB of that at the moment. Today, my NFS
>>>>>> server hung with the following error:
>>>>>>
>>>>>> xfs: possible memory allocation deadlock in kmem_alloc
>>>>>
>>>>> Can you share more of the log so we'll see what happened
>>>>> before and after?
>>>>> Y.
>>>>
>>>> Here is engine-log from yesterday.. the problem started
>>>> around 14:29 PM.
>>>> http://www.eecs.yorku.ca/~jas/ovirt-debug/10012017/engine-log.txt
>>>>
>>>> Here is the vdsm log on one of the virtualization hosts, virt01:
>>>> http://www.eecs.yorku.ca/~jas/ovirt-debug/10012017/vdsm.log.2
>>>>
>>>> Doing further investigation, I found that the XFS error
>>>> messages didn't start yesterday. You'll see they started at
>>>> the very end of the day on September 23. See:
>>>>
>>>> http://www.eecs.yorku.ca/~jas/ovirt-debug/messages-20170924
>>>
>>> Our storage guys do NOT think it's an XFS fragmentation issue,
>>> but we'll be looking at it.
>>
>> Hmmm... almost sorry to hear that because that would be easy to
>> "fix"...
>>
>>>> They continued on the 24th, then on the 26th... I think there
>>>> were a few "hangs" on those times that people were complaining
>>>> about, but we didn't catch the problem. However, the errors
>>>> hit big time yesterday at 14:27 PM... see here:
>>>>
>>>> http://www.eecs.yorku.ca/~jas/ovirt-debug/messages-20171001
>>>>
>>>> If you want any other logs, I'm happy to provide them. I just
>>>> don't know exactly what to provide.
>>>>
>>>> Do you know if I can run the XFS defrag command live? Rather
>>>> than on a disk by disk, I'd rather just do it on the whole
>>>> filesystem. There really aren't that many files since it's
>>>> just ovirt disk images. However, I don't understand the
>>>> implications to running VMs. I wouldn't want to do anything to
>>>> create more downtime.
>>>
>>> Should be enough to copy the disks to make them less fragmented.
>>
>> Yes, but this requires downtime.. but there's plenty of
>> additional storage, so this would fix things well.
>
> Live storage migration could be used.
> Y.
>
>>>> I had upgraded the engine server + 4 virtualization hosts from
>>>> 4.1.1 to current on September 20 along with upgrading them
>>>> from CentOS 7.3 to CentOS 7.4. virtfs, the NFS file server,
>>>> was running CentOS 7.3 and kernel
>>>> vmlinuz-3.10.0-514.16.1.el7.x86_64. Only yesterday, did I
>>>> upgrade it to CentOS 7.4 and hence kernel
>>>> vmlinuz-3.10.0-693.2.2.el7.x86_64.
>>>>
>>>> I believe the problem is fully XFS related, and not ovirt at
>>>> all. Although, I must admit, ovirt didn't help either. When I
>>>> rebooted the file server, the iso and export domains were
>>>> immediately active, but the data domain took quite a long
>>>> time. I kept trying to activate it, and it couldn't do it. I
>>>> couldn't make a host an SPM. I found that the data domain
>>>> directory on the virtualization host was a "stale NFS file
>>>> handle". I rebooted one of the virtualization hosts (virt1),
>>>> and tried to make it the SPM. Again, it wouldn't work.
>>>> Finally, I ended up turning everything into maintenance mode,
>>>> then activating just it, and I was able to make it the SPM. I
>>>> was then able to bring everything up. I would have expected
>>>> ovirt to handle the problem a little more gracefully, and give
>>>> me more information because I was sweating thinking I had to
>>>> restore all the VMs!
>>>
>>> Stale NFS is on our todo list to handle. Quite challenging.
>>
>> Thanks..
>>
>>>> I didn't think when I chose XFS as the filesystem for my
>>>> virtualization NFS server that I would have to defragment the
>>>> filesystem manually. This is like the old days of running
>>>> Norton SpeedDisk to defrag my 386...
>>>
>>> We are still not convinced it's an issue - but we'll look into
>>> it (and perhaps ask for more stats and data).
>>
>> Thanks!
>>
>>> Y.
>>>
>>>> Thanks for any help you can provide...
>>>>
>>>> Jason.
>>>>
>>>>>> All 4 virtualization hosts of course had problems since
>>>>>> there was no longer any storage.
>>>>>>
>>>>>> In the end, it seems like the problem is related to XFS
>>>>>> fragmentation...
>>>>>>
>>>>>> I read this great blog here:
>>>>>>
>>>>>> https://blog.codecentric.de/en/2017/04/xfs-possible-memory-allocation-deadlock-kmem_alloc/
>>>>>>
>>>>>> In short, I tried this:
>>>>>>
>>>>>> # xfs_db -r -c "frag -f" /dev/sdb1
>>>>>> actual 4314253, ideal 43107, fragmentation factor 99.00%
>>>>>>
>>>>>> Apparently the fragmentation factor doesn't mean much, but
>>>>>> the fact that "actual" number of extents is considerably
>>>>>> higher than "ideal" extents seems that it may be the problem.
>>>>>>
>>>>>> I saw that many of my virtual disks that are written to a
>>>>>> lot have, of course, a lot of extents...
>>>>>>
>>>>>> For example, on our main web server disk image, there were
>>>>>> 247,597 extents alone! I took the web server down, and ran
>>>>>> the XFS defrag command on the disk...
>>>>>>
>>>>>> # xfs_fsr -v 9a634692-1302-471f-a92e-c978b2b67fd0
>>>>>> 9a634692-1302-471f-a92e-c978b2b67fd0
>>>>>> extents before:247597 after:429 DONE 9a634692-1302-471f-a92e-c978b2b67fd0
>>>>>>
>>>>>> 247,597 before and 429 after! WOW!
>>>>>>
>>>>>> Are virtual disks a problem with XFS? Why isn't this memory
>>>>>> allocation deadlock issue more prevalent. I do see this
>>>>>> article mentioned on many web posts. I don't specifically
>>>>>> see any recommendation to *not* use XFS for the data domain
>>>>>> though.
>>>>>>
>>>>>> I was running CentOS 7.3 on the file server, but before
>>>>>> rebooting the server, I upgraded to the latest kernel and
>>>>>> CentOS 7.4 in the hopes that if there was a kernel issue,
>>>>>> that this would solve it.
>>>>>>
>>>>>> I took a few virtual systems down, and ran the defrag on the
>>>>>> disks. However, with over 30 virtual systems, I don't really
>>>>>> want to do this individually. I was wondering if I could run
>>>>>> xfs_fsr on all the disks LIVE? It says in the manual that
>>>>>> you can run it live, but I can't see how this would be good
>>>>>> when a system is using that disk, and I don't want to deal
>>>>>> with major corruption across the board. Any thoughts?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Jason.
>>>>>>
>>>>>> _______________________________________________
>>>>>> Users mailing list
>>>>>> Users(a)ovirt.org
>>>>>> http://lists.ovirt.org/mailman/listinfo/users
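The figures quoted in the message above (xfs_db reporting "actual 4314253, ideal 43107, fragmentation factor 99.00%", and xfs_fsr reporting 247597 extents before and 429 after) can be checked with a little arithmetic. The following is a minimal Python sketch, not part of the thread. It assumes the reported factor is computed as (actual - ideal) / actual, which reproduces the 99.00% in the output; the thread itself does not state the formula, and `fragmentation_factor` and `extent_reduction` are hypothetical helper names.

```python
def fragmentation_factor(actual_extents: int, ideal_extents: int) -> float:
    """Percentage of extents in excess of the ideal count.

    Assumed formula: (actual - ideal) / actual, expressed as a percentage.
    """
    return (actual_extents - ideal_extents) / actual_extents * 100.0


def extent_reduction(before: int, after: int) -> float:
    """How many times fewer extents a file has after defragmentation."""
    return before / after


if __name__ == "__main__":
    # Numbers taken from the xfs_db output quoted in the message.
    print(f"fragmentation factor: {fragmentation_factor(4314253, 43107):.2f}%")
    # Numbers taken from the xfs_fsr output for the web server disk image.
    print(f"extent reduction: {extent_reduction(247597, 429):.0f}x")
```

Under this assumed formula the factor approaches 100% whenever the actual count is much larger than the ideal one, which fits the remark in the message that the raw extent counts say more than the percentage does.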


libvirt: XML-RPC error : authentication failed: Failed to start SASL
by Ozan Uzun 02 Oct '17


Hello,

Today I updated my oVirt engine (v3.5) and all the hosts in one datacenter (the CentOS 7.4 ones), and suddenly my vdsm and vdsm-network services stopped working. BTW: my other DC is CentOS 6 based (managed from the same oVirt engine), and everything works just fine there.

vdsm fails because it depends on the vdsm-network service, which fails with lots of RPC errors. I tried vdsm-tool configure --force, deleted everything (vdsm-libvirt) and reinstalled. I could not make it work.

My logs are filled with the following:

Sep 18 23:06:01 node6 python[5340]: GSSAPI Error: Unspecified GSS failure. Minor code may provide more information (No Kerberos credentials available (default cache: KEYRING:persistent:0))
Sep 18 23:06:01 node6 vdsm-tool[5340]: libvirt: XML-RPC error : authentication failed: Failed to start SASL negotiation: -1 (SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure. Minor code may provide more information (No Kerberos credent
Sep 18 23:06:01 node6 libvirtd[4312]: 2017-09-18 20:06:01.954+0000: 4312: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error

------- journalctl -xe output for vdsm-network

Sep 18 23:06:02 node6 vdsm-tool[5340]: libvirt: XML-RPC error : authentication failed: Failed to start SASL negotiation: -1 (SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure. Minor code may provide more information (No Kerberos credent
Sep 18 23:06:02 node6 vdsm-tool[5340]: Traceback (most recent call last):
Sep 18 23:06:02 node6 vdsm-tool[5340]: File "/usr/bin/vdsm-tool", line 219, in main
Sep 18 23:06:02 node6 libvirtd[4312]: 2017-09-18 20:06:02.558+0000: 4312: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error
Sep 18 23:06:02 node6 vdsm-tool[5340]: return tool_command[cmd]["command"](*args)
Sep 18 23:06:02 node6 vdsm-tool[5340]: File "/usr/lib/python2.7/site-packages/vdsm/tool/upgrade_300_networks.py", line 83, in upgrade_networks
Sep 18 23:06:02 node6 vdsm-tool[5340]: networks = netinfo.networks()
Sep 18 23:06:02 node6 vdsm-tool[5340]: File "/usr/lib/python2.7/site-packages/vdsm/netinfo.py", line 112, in networks
Sep 18 23:06:02 node6 vdsm-tool[5340]: conn = libvirtconnection.get()
Sep 18 23:06:02 node6 vdsm-tool[5340]: File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 159, in get
Sep 18 23:06:02 node6 vdsm-tool[5340]: conn = _open_qemu_connection()
Sep 18 23:06:02 node6 vdsm-tool[5340]: File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 95, in _open_qemu_connection
Sep 18 23:06:02 node6 vdsm-tool[5340]: return utils.retry(libvirtOpen, timeout=10, sleep=0.2)
Sep 18 23:06:02 node6 vdsm-tool[5340]: File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 1108, in retry
Sep 18 23:06:02 node6 vdsm-tool[5340]: return func()
Sep 18 23:06:02 node6 vdsm-tool[5340]: File "/usr/lib64/python2.7/site-packages/libvirt.py", line 105, in openAuth
Sep 18 23:06:02 node6 vdsm-tool[5340]: if ret is None:raise libvirtError('virConnectOpenAuth() failed')
Sep 18 23:06:02 node6 vdsm-tool[5340]: libvirtError: authentication failed: Failed to start SASL negotiation: -1 (SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure. Minor code may provide more information (No Kerberos credentials availa
Sep 18 23:06:02 node6 systemd[1]: vdsm-network.service: control process exited, code=exited status=1
Sep 18 23:06:02 node6 systemd[1]: Failed to start Virtual Desktop Server Manager network restoration.

----- libvirt is running but throws some errors.

[root@node6 ~]# systemctl status libvirtd
● libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/libvirtd.service.d
           └─unlimited-core.conf
   Active: active (running) since Mon 2017-09-18 23:15:47 +03; 19min ago
     Docs: man:libvirtd(8)
           http://libvirt.org
 Main PID: 6125 (libvirtd)
   CGroup: /system.slice/libvirtd.service
           └─6125 /usr/sbin/libvirtd --listen

Sep 18 23:15:56 node6 libvirtd[6125]: 2017-09-18 20:15:56.195+0000: 6125: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error
Sep 18 23:15:56 node6 libvirtd[6125]: 2017-09-18 20:15:56.396+0000: 6125: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error
Sep 18 23:15:56 node6 libvirtd[6125]: 2017-09-18 20:15:56.597+0000: 6125: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error

----------------

[root@node6 ~]# virsh
Welcome to virsh, the virtualization interactive terminal.

Type:  'help' for help with commands
       'quit' to quit

virsh # list
error: failed to connect to the hypervisor
error: authentication failed: Failed to start SASL negotiation: -1 (SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure. Minor code may provide more information (No Kerberos credentials available (default cache: KEYRING:persistent:0)))

=================

I do not want to lose all my virtual servers; is there any way to recover them? Currently everything is down. I am OK with installing a new oVirt engine if I can somehow restore my virtual servers. I can also split CentOS 6 and CentOS 7 into separate oVirt engines.
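
The GSSAPI errors above suggest that libvirt's SASL layer is offering GSSAPI (Kerberos) to clients that hold no Kerberos ticket. Below is a minimal sketch for checking which mechanisms are configured, assuming the host keeps them in the mech_list line of /etc/sasl2/libvirt.conf (the usual location for libvirt, but an assumption about this host). The function name and the sample contents are hypothetical and not taken from the logs.

```python
def sasl_mechanisms(conf_text):
    """Return the mechanisms named by the last active mech_list line."""
    mechs = []
    for raw in conf_text.splitlines():
        # Drop comments, so a commented-out mech_list is ignored.
        line = raw.split("#", 1)[0].strip()
        if line.startswith("mech_list:"):
            mechs = line.split(":", 1)[1].split()
    return [m.lower() for m in mechs]


# Hypothetical file contents, for illustration only.
sample = """\
# SASL settings for libvirtd
#mech_list: digest-md5
mech_list: gssapi
"""

print(sasl_mechanisms(sample))
```

On a real host the text would come from reading /etc/sasl2/libvirt.conf. If the result contains only gssapi while the client has no Kerberos credentials, that would be consistent with the "No Kerberos credentials available" failures shown above.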

3 6

SPM recovery after disaster
by Alexander Vrublevskiy 02 Oct '17

Hello Community!

Recently we had a disaster with our oVirt 4.1 three-node cluster with HE and a GlusterFS (RF=3) storage domain. We moved one node to maintenance, and during the actual maintenance one of the working nodes, the one with the SPM role, went down. It was a hardware failure, so we had to remove it from the cluster.

After tinkering around we now have an almost working cluster with two nodes and GlusterFS RF=2. But the problem is that oVirt can't find the SPM and is spamming the web interface logs with the "HSMGetAllTasksStatusesVDS failed: Not SPM" error.

After some time of operating with the stated configuration we somehow lost the contents of dom_md.

It looks like these two problems are related and the second one is a consequence of the first.

Please suggest how to recover the SPM and dom_md. Is there a way to recreate both?

TIA

Regards
Alex

1 0

Having issue with external IPA
by Yan Naing Myint 02 Oct '17

Hello guys,

I'm having a problem adding users from my FreeIPA server to oVirt.

1. ovirt-engine-extension-aaa-ldap-setup reports success with RHDS
2. I cannot add IPA users in the oVirt webadmin panel
3. The oVirt webadmin panel says "Error while executing action AddUser: Internal Engine Error"

What could the problem be, or is it a bug? Any suggestions on how to make it work?

engine.log says:

2017-10-01 17:30:52,436+06 ERROR [org.ovirt.engine.core.bll.aaa.AddUserCommand] (default task-113) [bf5822eb-39da-49e5-b2ab-9865f71346a3] Transaction rolled-back for command 'org.ovirt.engine.core.bll.aaa.AddUserCommand'.
2017-10-01 17:30:52,459+06 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-113) [bf5822eb-39da-49e5-b2ab-9865f71346a3] EVENT_ID: USER_FAILED_ADD_ADUSER(327), Correlation ID: bf5822eb-39da-49e5-b2ab-9865f71346a3, Call Stack: null, Custom ID: null, Custom Event ID: -1, Message: Failed to add User 'mgorca' to the system.

in cyberwings.local.properties

ovirt.engine.extension.name = cyberwings.local
ovirt.engine.extension.bindings.method = jbossmodule
ovirt.engine.extension.binding.jbossmodule.module = org.ovirt.engine-extensions.aaa.ldap
ovirt.engine.extension.binding.jbossmodule.class = org.ovirt.engineextensions.aaa.ldap.AuthzExtension
ovirt.engine.extension.provides = org.ovirt.engine.api.extensions.aaa.Authz
config.profile.file.1 = ../aaa/cyberwings.local.properties
config.globals.baseDN.simple_baseDN = dc=cyberwings,dc=local

in cyberwings.local-authn.properties

ovirt.engine.extension.name = cyberwings.local-authn
ovirt.engine.extension.bindings.method = jbossmodule
ovirt.engine.extension.binding.jbossmodule.module = org.ovirt.engine-extensions.aaa.ldap
ovirt.engine.extension.binding.jbossmodule.class = org.ovirt.engineextensions.aaa.ldap.AuthnExtension
ovirt.engine.extension.provides = org.ovirt.engine.api.extensions.aaa.Authn
ovirt.engine.aaa.authn.profile.name = cyberwings.local
ovirt.engine.aaa.authn.authz.plugin = cyberwings.local
config.profile.file.1 = ../aaa/cyberwings.local.properties
config.globals.baseDN.simple_baseDN = dc=cyberwings,dc=local

--
Yan Naing Myint
CEO Server & Network Engineer
Cyber Wings Co., Ltd
http://cyberwings.asia
09799950510
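
One thing that can be checked mechanically in a pair of files like the two above is that the authn extension's ovirt.engine.aaa.authn.authz.plugin value names the authz extension. The sketch below does only that, using keys and values copied from the post. The helper names are hypothetical, and this is a sanity check rather than a diagnosis of the AddUser error.

```python
def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def authn_points_at_authz(authn, authz):
    """True when the authn profile references the authz extension by name."""
    wanted = authn.get("ovirt.engine.aaa.authn.authz.plugin")
    return wanted is not None and wanted == authz.get("ovirt.engine.extension.name")


# Values copied from the two files quoted in the post (abridged).
AUTHZ_TEXT = """\
ovirt.engine.extension.name = cyberwings.local
ovirt.engine.extension.provides = org.ovirt.engine.api.extensions.aaa.Authz
config.profile.file.1 = ../aaa/cyberwings.local.properties
"""
AUTHN_TEXT = """\
ovirt.engine.extension.name = cyberwings.local-authn
ovirt.engine.extension.provides = org.ovirt.engine.api.extensions.aaa.Authn
ovirt.engine.aaa.authn.profile.name = cyberwings.local
ovirt.engine.aaa.authn.authz.plugin = cyberwings.local
config.profile.file.1 = ../aaa/cyberwings.local.properties
"""

authz = parse_properties(AUTHZ_TEXT)
authn = parse_properties(AUTHN_TEXT)
print(authn_points_at_authz(authn, authz))
```

With the values as posted the cross-reference is consistent, so a mismatch there does not appear to be the cause.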

3 2

xfs fragmentation problem caused data domain to hang
by Jason Keltz 02 Oct '17

Subject: [ovirt-users] xfs fragmentation problem caused data domain to hang

Hi.

For my data domain, I have one NFS server with a large RAID filesystem (9 TB). I'm only using 2 TB of that at the moment. Today, my NFS server hung with the following error:

> xfs: possible memory allocation deadlock in kmem_alloc

All 4 virtualization hosts of course had problems since there was no longer any storage.

In the end, it seems like the problem is related to XFS fragmentation...

I read this great blog here:

https://blog.codecentric.de/en/2017/04/xfs-possible-memory-allocation-deadl…

In short, I tried this:

# xfs_db -r -c "frag -f" /dev/sdb1
actual 4314253, ideal 43107, fragmentation factor 99.00%

Apparently the fragmentation factor doesn't mean much, but the fact that the "actual" number of extents is considerably higher than the "ideal" number suggests that this may be the problem.

I saw that many of my virtual disks that are written to a lot have, of course, a lot of extents...

For example, on our main web server disk image, there were 247,597 extents alone! I took the web server down, and ran the XFS defrag command on the disk...

# xfs_fsr -v 9a634692-1302-471f-a92e-c978b2b67fd0
9a634692-1302-471f-a92e-c978b2b67fd0
extents before:247597 after:429 DONE 9a634692-1302-471f-a92e-c978b2b67fd0

247,597 before and 429 after! WOW!

Are virtual disks a problem with XFS? Why isn't this memory allocation deadlock issue more prevalent? I do see this article mentioned on many web posts. I don't specifically see any recommendation to *not* use XFS for the data domain though.

I was running CentOS 7.3 on the file server, but before rebooting the server, I upgraded to the latest kernel and CentOS 7.4 in the hope that if there was a kernel issue, this would solve it.

I took a few virtual systems down, and ran the defrag on the disks. However, with over 30 virtual systems, I don't really want to do this individually. I was wondering if I could run xfs_fsr on all the disks LIVE? It says in the manual that you can run it live, but I can't see how this would be good when a system is using that disk, and I don't want to deal with major corruption across the board. Any thoughts?

Thanks,

Jason.
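
The numbers quoted above can be put side by side with a few lines of Python. This is a sketch that assumes the fragmentation factor is computed as (actual - ideal) / actual, which does reproduce the 99.00% printed by xfs_db here. The helper names are hypothetical, and the regular expressions only match the two output lines shown in this message.

```python
import re


def parse_frag(line):
    """Extract (actual, ideal) extent counts from an xfs_db 'frag' line."""
    m = re.search(r"actual (\d+), ideal (\d+)", line)
    if m is None:
        raise ValueError("not an xfs_db frag line")
    return int(m.group(1)), int(m.group(2))


def frag_factor(actual, ideal):
    """Percentage of extents beyond the ideal count (assumed formula)."""
    return (actual - ideal) * 100.0 / actual if actual else 0.0


def parse_fsr(line):
    """Extract (before, after) extent counts from an xfs_fsr -v line."""
    m = re.search(r"extents before:(\d+) after:(\d+)", line)
    if m is None:
        raise ValueError("not an xfs_fsr extents line")
    return int(m.group(1)), int(m.group(2))


actual, ideal = parse_frag("actual 4314253, ideal 43107, fragmentation factor 99.00%")
print("factor: %.2f%%" % frag_factor(actual, ideal))
print("extents per ideal extent: %.1f" % (actual / ideal))

before, after = parse_fsr("extents before:247597 after:429 DONE 9a634692-1302-471f-a92e-c978b2b67fd0")
print("web server image: %d -> %d extents" % (before, after))
```

The ratio of actual to ideal extents is the more telling figure: the percentage saturates near 100% as soon as files average more than a handful of extents each, which fits the remark that the factor itself "doesn't mean much".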

1 0

iSCSI VLAN host connections - bond or multipath & IPv6
by Ben Bradley 01 Oct '17

Hi All,

I'm looking to add a new host to my oVirt lab installation. I'm going to share out some LVs from a separate box over iSCSI and will hook the new host up to that.

I have 2 NICs on the storage host and 2 NICs on the new oVirt host to dedicate to the iSCSI traffic. I also have 2 separate switches, so I'm looking for redundancy here. Both the iSCSI host and the oVirt host are plugged into both switches.

If this was non-iSCSI traffic and without oVirt, I would create bonded interfaces in active-backup mode and layer the VLANs on top of that. But for iSCSI traffic without oVirt involved, I wouldn't bother with a bond and would just use multipath.

From scanning the oVirt docs it looks like there is an option to have oVirt configure iSCSI multipathing.

So what's the best/most-supported option for oVirt? Manually create active-backup bonds so oVirt just sees a single storage link between host and storage? Or leave them as separate interfaces on each side and use oVirt's multipath/bonding?

Also, I quite like the idea of using IPv6 for the iSCSI VLAN, purely down to the fact that I could use link-local addressing and not have to worry about setting up static IPv4 addresses or DHCP. Is IPv6 iSCSI supported by oVirt?

Thanks,
Ben
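
To illustrate why link-local addressing needs no static configuration or DHCP, here is a sketch of the classic modified EUI-64 derivation of an fe80::/64 address from a NIC's MAC address. It assumes the interface uses EUI-64 interface identifiers, while many current Linux setups generate stable-privacy identifiers instead. The MAC address and the function name are made up for the example.

```python
import ipaddress


def eui64_link_local(mac):
    """Derive the modified EUI-64 link-local address for a 48-bit MAC."""
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    octets[0] ^= 0x02  # flip the universal/local bit
    # Insert ff:fe in the middle to form the 64-bit interface identifier.
    iid = octets[:3] + [0xFF, 0xFE] + octets[3:]
    # fe80::/64 prefix (8 bytes) followed by the interface identifier.
    packed = bytes([0xFE, 0x80] + [0] * 6 + iid)
    return ipaddress.IPv6Address(packed)


addr = eui64_link_local("52:54:00:12:34:56")  # hypothetical MAC
print(addr, addr.is_link_local)
```

A link-local address is only meaningful together with the interface it lives on, so any iSCSI portal address built this way would have to carry an interface scope (for example a %ifname suffix). Whether oVirt accepts that for iSCSI targets is exactly the open question in the post.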

4 8