Infra issue retrospective

R P Herrold

22 Jan 2014

4:35 p.m.

For the weekly sync, I see the following matters.

I was absent Monday for an appointment, and do not see an email with minutes. The prior week was skipped because of member availability issues as well, so this is a summary from the list traffic for the last few days.

In no particular order:

- Kimchi asks for jenkins coverage (#105)
- ditto, standing up an Ubuntu test instance was requested
- Disk space issues on lists were hit on a transient basis Sunday
- I have observed 'wink' outages on gerrit and lists, each of less than an hour's duration
- The linode PTR and, it turns out, A and AAAA records have not proceeded, as the request was being 'sat on'. This is really needed to solve an email filtering issue at Comcast and, one assumes, other ISPs. They also examine this data, along with _SPF TXT records.

  DNS management is weak, as responsibility and capability to solve are not unified here
- gerrit is sluggish for unknown reasons when doing version control checkouts (several reports) ... possible bandwidth limitations on some link paths?
- jenkins got a 'just in case' reboot last week, but no root cause analysis was performed

Personally I am building a 'knock-off' iSCSI and NFS unit, based on the QNAP doco and git content, for oVirt testing locally, ... particularly performance timing trials

-- 
-- end
==================================
.-- -... ---.. ... -.- -.--
Copyright (C) 2014 R P Herrold
herrold@owlriver.com
My words are not deathless prose, but they are mine.
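The transient disk-space problem on the lists host is the sort of thing a small periodic check can flag before mail delivery is affected. A minimal sketch, assuming Python 3 is available on the host; the function names, the `/` path, and the 10% threshold are illustrative assumptions, not values taken from this thread.

```python
import shutil


def free_fraction(total_bytes, free_bytes):
    """Return the share of the filesystem that is still free (0.0 to 1.0)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes


def is_low_on_space(total_bytes, free_bytes, min_free_fraction=0.10):
    """True when less than min_free_fraction of the filesystem is free.

    The 10% default is an arbitrary example threshold.
    """
    return free_fraction(total_bytes, free_bytes) < min_free_fraction


def check_path(path="/", min_free_fraction=0.10):
    """Check the filesystem holding `path` using shutil.disk_usage."""
    usage = shutil.disk_usage(path)
    return is_low_on_space(usage.total, usage.free, min_free_fraction)
```

Run from cron, `check_path("/")` could gate a warning mail to the list; which path actually holds the mail archives on the Linode instance is not stated in the thread.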


Eyal Edri

22 Jan 2014

5:05 p.m.

----- Original Message -----
> From: "R P Herrold" <herrold@owlriver.com>
> To: "oVirt infrastructure ML" <infra@ovirt.org>
> Sent: Wednesday, January 22, 2014 5:35:31 PM
> Subject: Infra issue retrospective
>
> for the weekly sync, I see the following matters
>
> I was absent Monday for an appt, and do not see an email with minutes.

I was absent as well, but I think a meeting was held; maybe the summary wasn't sent. kiril/dcaro?

> Prior week was skipped because of member availability issues as well. So
> this is a summary from the list traffic for the last few days
>
> In no particular order:
> - Kimchi asks for jenkins coverage #105

You mixed up 2 requests:

#105 Jenkins server for oVirt Kimchi incubator project

Personally, I'm not familiar with Kimchi, but our resources on ovirt are very limited right now (both physical resources like servers/storage/etc... and especially human resources, which right now is mostly dcaro handling multiple infra failures on jenkins). If they are willing to pitch in with resources such as hosts and people to support jenkins failures, we can consider integrating them into jenkins.ovirt.org; otherwise I think we can mostly give support in knowledge.

#107 Enable coverage report during vdsm unit and functional tests

Again, this will be handled after most infra issues are resolved, unless someone from the vdsm team power users is ready to take this on.

> - ditto standing up an Ubuntu test instance was requested

At first, a minidell was going to be added for this, but due to the lack of resources for running findbugs/other per-patch tests it was decided to allocate it to fedora/centos for now. We can reinstall one of the rackspace vms for that.

> - Disk space issues on lists were hit on a transient basis Sunday

This is a well-known, painful issue. I think we should address it ASAP, either by purchasing a storage server running SSDs from softlayer or by expanding the 50GB disk we have now on linode (how much does it cost to add a 100-200 GB disk there?)

> - I have observed wink outages on gerrit, and lists of less than an
>   hour's duration

gerrit is a major issue which we suffer from almost daily. I'm not sure if it's from the slow tlv network or whether the VM itself needs to migrate to much stronger infra with a high-availability feature.

> - linode PTR and it turns out A and AAAA record have not proceeded, as
>   the request was being 'sat on'. This is really needed to solve an email
>   filtering issue at Comcast, and one assumes other ISPs. They also
>   examine this data, along with _SPF TXT records.
>
>   DNS management is weak as responsibility and capability to solve are
>   not unified here
>
> - the gerrit is sluggish for unknown reasons doing version control CO's
>   (several reports) ... possible BW limitations on some link paths?

Possibly since the upgrade to 2.8?

> - jenkins got a 'just in case' reboot last week, but no root cause
>   analysis was performed

I believe I know the reason for the reboot, which was jobs being stuck and running for hours. dcaro solved this by finding the root cause: the findbugs job was running for more than one hour due to an option enabled on the job (comparing to older builds' results); disabling that reduced the time to 15 min.

> Personally I am building a 'knock-off' iscsi and NFS unit, based on the
> QNAP doco and git content, for oVirt testing locally, ... particularly
> performance timing trials

All in all, we're suffering from limited infra on jenkins due to a slow network connection, I believe, to the tlv office (mini dells) and high load on the rackspace servers (which we are scheduled to migrate from). The decision on migrating to softlayer is on hold due to limited budget and consideration of the optimal layout; I'll try to bring a suggestion to the next meeting.

Eyal.

> -- 
> -- end
> ==================================
> .-- -... ---.. ... -.- -.--
> Copyright (C) 2014 R P Herrold
> herrold@owlriver.com
> My words are not deathless prose, but they are mine.
> _______________________________________________
> Infra mailing list
> Infra@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/infra
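The stuck-job symptom described above (builds running for hours until someone reboots) can be spotted from build metadata rather than by eye. A minimal sketch, assuming each build is a dict shaped like the objects returned by the Jenkins JSON API (`building`, `timestamp` in milliseconds since the epoch, `url`); the helper names and the 60-minute limit are illustrative assumptions, and fetching the data from jenkins.ovirt.org is deliberately left out.

```python
import time


def running_minutes(build, now_ms=None):
    """Minutes a build has been running, or 0.0 if it is not in progress.

    `build` is assumed to carry the Jenkins JSON fields `building`
    (bool) and `timestamp` (start time, ms since the epoch).
    """
    if not build.get("building"):
        return 0.0
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return max(0, now_ms - build["timestamp"]) / 60000.0


def stuck_builds(builds, limit_minutes=60, now_ms=None):
    """Return the in-progress builds that have run longer than the limit.

    The 60-minute default is an example value, loosely matching the
    'more than one hour' findbugs runs mentioned in the thread.
    """
    return [b for b in builds if running_minutes(b, now_ms) > limit_minutes]
```

Passing `now_ms` explicitly keeps the check deterministic for testing; in normal use it defaults to the current time.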

Eyal Edri

5:14 p.m.

one more pending item: upgrading jenkins to latest LTS + updating jenkins plugins. seems like lots of issues are fixed and we should upgrade. eyal.

Karsten Wade

23 Jan 2014

7:46 p.m.

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 01/22/2014 07:35 AM, R P Herrold wrote:
> - Disk space issues on lists were hit on a transient basis Sunday

I would like to get us off the Linode instance ASAP. I don't have a lot of time to spare for it, but whatever I need to do as the Linode admin, let me know.

> - linode PTR and it turns out A and AAAA record have not proceeded, as
>   the request was being 'sat on'. This is really needed to solve an email
>   filtering issue at Comcast, and one assumes other ISPs. They also
>   examine this data, along with _SPF TXT records.

Getting some help here right now with the DNS admins (Red Hat IT). I fixed the reverse lookup; it should propagate soon. Working on the AAAA record for IPv6 (which I'm pretty ignorant about). Sorry that I didn't realize there was something hanging on my participation; thanks to Dave Neary for grabbing me by the ear and walking me through what I can do.

> DNS management is weak as responsibility and capability to solve are
> not unified here

If we want to take over being primary/secondary nameservers for *.ovirt.org, I think we can do that. Get things set up and we can get the person handling the registrar details to switch us over.

Do we want to handle our own DNS? Seems sane to me ... but I won't be doing the work. :)

- - Karsten
- -- 
Karsten 'quaid' Wade        .^\  CentOS Engineering Manager
http://TheOpenSourceWay.org    \  http://community.redhat.com
@quaid (identi.ca/twitter/IRC)  \v'             gpg: AD0E0C41
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iEYEARECAAYFAlLhY50ACgkQ2ZIOBq0ODEFSkACgmxF9A2WtXgeNQTy0lwMqq/FY
A5oAnRcdrUFclkTWjhQENMPvFoMIKncZ
=MygU
-----END PGP SIGNATURE-----
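Once the PTR, A and AAAA records have propagated, the property receiving mail servers commonly look for is that reverse and forward lookups agree (forward-confirmed reverse DNS). A minimal sketch of that check using only the Python standard library and the system resolver; the function names are hypothetical, SPF TXT records are not covered (the stdlib has no TXT lookup), and whether Comcast applies exactly this test is an assumption.

```python
import ipaddress
import socket


def _norm(addr):
    """Normalise an IPv4/IPv6 string, dropping any '%scope' suffix."""
    return ipaddress.ip_address(addr.split("%", 1)[0])


def reverse_names(ip):
    """PTR lookup via the system resolver; [] when no name is found."""
    try:
        name, aliases, _ = socket.gethostbyaddr(ip)
    except OSError:  # covers socket.herror and socket.gaierror
        return []
    return [name] + list(aliases)


def forward_addresses(hostname):
    """A/AAAA lookup via the system resolver; empty set on failure."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except OSError:
        return set()
    return {info[4][0] for info in infos}


def forward_confirmed(ip, reverse=reverse_names, forward=forward_addresses):
    """True if some PTR name for `ip` resolves back to the same address.

    `reverse` and `forward` are injectable so the logic can be
    exercised without touching real DNS.
    """
    target = _norm(ip)
    return any(
        target in {_norm(a) for a in forward(name)}
        for name in reverse(ip)
    )
```

Calling `forward_confirmed()` with the list server's IPv4 and IPv6 addresses (not given in this thread) would show whether both records line up after the change.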
