----- Original Message -----
From: "R P Herrold" <herrold(a)owlriver.com>
To: "oVirt infrastructure ML" <infra(a)ovirt.org>
Sent: Wednesday, January 22, 2014 5:35:31 PM
Subject: Infra issue retrospective
for the weekly sync, I see the following matters
I was absent Monday for an appt, and do not see an email with
minutes.
i was absent as well, but i think a meeting was held;
maybe the summary wasn't sent. kiril/dcaro?
Prior week was skipped because of member availability issues
as well.
So this is a summary from the list traffic for the last few
days
In no particular order:
- Kimchi asks for jenkins coverage #105
you mixed up 2 requests:
#105 Jenkins server for oVirt Kimchi incubator project
personally, i'm not familiar with Kimchi, but our resources on ovirt are
very limited right now: both physical resources (servers, storage, etc.)
and especially human resources, which right now is mostly dcaro handling
multiple infra failures on jenkins.
if they are willing to pitch in with resources such as hosts and people
to support jenkins failures, we can consider integrating them into
jenkins.ovirt.org; otherwise i think we can mostly give support in
knowledge.
#107 Enable coverage report during vdsm unit and functional tests
again, this will be handled after most of the infra issues are resolved,
unless one of the vdsm team power users is ready to take this on.
- ditto standing up an Ubuntu test instance was requested
at first, adding a minidell was considered, but due to the lack of
resources for running findbugs and other per-patch tests it was decided
to allocate it to fedora/centos for now.
we can reinstall one of the rackspace vms for that.
- Disk space issues on lists were hit on a transient basis
Sunday
this is a well-known, painful issue, and i think we should address it
ASAP, either by purchasing a storage server running SSDs from softlayer
or by expanding the 50GB disk we have now on linode (how much does it
cost to add a 100-200 GB disk there?)
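Until the disk is expanded, a small check could give early warning before the lists host fills up again. This is a minimal sketch, not something currently deployed: the function names and the 10% threshold are assumptions chosen for illustration, and the path to watch would need to be the mount that holds the list archives.

```python
import shutil

def free_fraction(total_bytes, free_bytes):
    """Return the free share of a filesystem as a value between 0 and 1."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes

def disk_is_low(path="/", min_free=0.10):
    """True when the filesystem holding `path` has less than `min_free` free.

    The 10% default is an assumed threshold, not a project policy.
    """
    usage = shutil.disk_usage(path)
    return free_fraction(usage.total, usage.free) < min_free

if __name__ == "__main__":
    print("low on space" if disk_is_low("/") else "ok")
```

Run from cron, the script's output could be mailed to the infra list, so a transient fill-up is noticed before mail delivery starts failing.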
- I have observed wink outages on gerrit, and lists of less
than an hour's duration
gerrit is a major issue which we suffer from almost daily. not sure if
it's caused by the slow tlv network or if the VM itself needs to migrate
to a much stronger infra with a high-availability feature.
- linode PTR and, it turns out, A and AAAA records have not
proceeded, as the request was being 'sat on'
This is really needed to solve an email filtering issue at
Comcast and, one assumes, other ISPs. They also examine this data,
along with _SPF TXT records.
DNS management is weak, as responsibility and capability to
solve are not unified here
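Once the records are in place, the PTR and A/AAAA pairing that receiving ISPs look at can be verified with a forward-confirmed reverse DNS check. The sketch below uses only the Python standard library; the function names are made up for illustration, and it does not look at the SPF TXT record, which would need a DNS library outside the standard library.

```python
import ipaddress
import socket

def ptr_name(ip):
    """Reverse (PTR) lookup; returns the hostname, or None if there is none."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None

def forward_addresses(hostname):
    """Forward (A and AAAA) lookup; returns a set of address strings."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except OSError:
        return set()
    return {info[4][0] for info in infos}

def _normalize(addr):
    # Drop any IPv6 scope id and compare in canonical form.
    return ipaddress.ip_address(addr.split("%")[0])

def forward_confirmed(ip, reverse=ptr_name, forward=forward_addresses):
    """True when the PTR name of `ip` resolves back to `ip`.

    `reverse` and `forward` are injectable so the logic can be
    exercised without touching the network.
    """
    name = reverse(ip)
    if name is None:
        return False
    return _normalize(ip) in {_normalize(a) for a in forward(name)}
```

Calling `forward_confirmed` with the mail host's public IPv4 and IPv6 addresses would show whether both the PTR and the matching A/AAAA records have actually been published.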
- the gerrit is sluggish for unknown reasons doing version
control CO's (several reports)
... possible BW limitations on some link paths?
possibly since the upgrade to 2.8?
- jenkins got a 'just in case' reboot last week, but no root
cause analysis was performed
i believe i know the reason for the reboot: jobs were getting stuck and
running for hours.
dcaro solved this by finding the root cause: the findbugs job was
running for more than one hour because of an option enabled on the job
that compares against older builds' results. disabling that option
reduced the run time to 15 min.
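Jobs that run for hours could be flagged before a reboot is needed. The following is a rough sketch only: it assumes a list of build records shaped like the build objects in Jenkins' JSON API, with a `building` flag and a `timestamp` start time in milliseconds since the epoch. Fetching those records from jenkins.ovirt.org is left out, and the function name and the 60 minute limit are assumptions.

```python
import time

def stuck_builds(builds, max_minutes=60, now_ms=None):
    """Return the builds that are still running after `max_minutes`.

    Each build is a dict with 'building' (bool) and 'timestamp'
    (start time, milliseconds since the epoch).
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    limit_ms = max_minutes * 60 * 1000
    return [
        b for b in builds
        if b.get("building") and now_ms - b["timestamp"] > limit_ms
    ]
```

Anything this returns is a candidate for a closer look at the job configuration, as happened with the findbugs job.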
Personally I am building a 'knock-off' iscsi and NFS unit,
based on the QNAP doco and git content, for oVirt testing
locally, ... particularly performance timing trials
all in all, we're suffering from limited infra on jenkins, due to, i
believe, the slow network connection to the tlv office (mini dells)
and high load on the rackspace servers (which we are scheduled to
migrate from).
the decision on migrating to softlayer is on hold due to limited budget
and open questions about the optimal layout;
i'll try to bring a suggestion to the next meeting.
Eyal.
--
--
end
==================================
.-- -... ---.. ... -.- -.--
Copyright (C) 2014 R P Herrold
herrold(a)owlriver.com
My words are not deathless prose,
but they are mine.
_______________________________________________
Infra mailing list
Infra(a)ovirt.org
http://lists.ovirt.org/mailman/listinfo/infra