
Hi,

Due to CVEs in openssl and in the kernel, I upgraded various pieces of the infrastructure ( foreman, lists, stats, monitoring ), which implied a few reboots ( the kernel was lagging behind, which is not that great with a local root exploit ). As this is Friday and I assumed most of the Tel Aviv office was not working, I hope this kept the disruption to a minimum. However, if something is broken, please tell us so we can fix it.

This also got me thinking. In order to bring a bit more order, what about having a fixed schedule for upgrades?

In my previous position, we were doing that once per month ( except during the end-of-quarter freeze ), with a mandatory reboot ( because if something does not boot, you want to know it during a planned outage, not when everyone is running around updating stuff ). Fedora has a rather complex procedure to decide what to upgrade, described at http://infrastructure.fedoraproject.org/infra/docs/massupgrade.txt

So we could adopt a schedule ( once per month, unless there is something critical, in which case we do it ASAP, with a warning on the list and irc ).

The schedule should of course take into account "business need", which is the "release schedule of ovirt".

So what about "first Friday of the month, unless exception"?

And by update, I mean "yum upgrade -y". Cleaning the list of repos on the various servers is also IMHO another task to discuss, to make sure the upgrade can be safely executed. ( Having something like mcollective/ansible/func is also needed, but that's more a convenience than a requirement at this stage. )

--
Michael Scherer
Open Source and Standards, Sysadmin
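For reference, the "first Friday of the month" rule proposed above is easy to compute. The following is only a minimal sketch: the function names `first_friday` and `next_window` are made up for illustration, and the "unless exception" part (release schedule, critical fixes) is deliberately not modelled.

```python
import calendar
import datetime


def first_friday(year: int, month: int) -> datetime.date:
    """Return the date of the first Friday of the given month."""
    # monthrange() gives the weekday of the 1st (Monday == 0 ... Sunday == 6).
    first_weekday, _ = calendar.monthrange(year, month)
    offset = (calendar.FRIDAY - first_weekday) % 7
    return datetime.date(year, month, 1 + offset)


def next_window(today: datetime.date) -> datetime.date:
    """Return the next maintenance window falling on or after `today`."""
    candidate = first_friday(today.year, today.month)
    if candidate >= today:
        return candidate
    # This month's window has passed: roll over to the following month.
    if today.month == 12:
        return first_friday(today.year + 1, 1)
    return first_friday(today.year, today.month + 1)


if __name__ == "__main__":
    print(next_window(datetime.date.today()))
```

Something like this could feed a calendar entry or a reminder mail, with exceptions still handled by hand.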

On Fri, Jun 06, 2014 at 01:29:44PM +0200, Michael Scherer wrote:
> Due to CVEs in openssl and in the kernel, I upgraded various pieces of the infrastructure ( foreman, lists, stats, monitoring ), which implied a few reboots ( the kernel was lagging behind, which is not that great with a local root exploit ). As this is Friday and I assumed most of the Tel Aviv office was not working, I hope this kept the disruption to a minimum. However, if something is broken, please tell us so we can fix it.
Nice work.
> This also got me thinking. In order to bring a bit more order, what about having a fixed schedule for upgrades?
>
> In my previous position, we were doing that once per month ( except during the end-of-quarter freeze ), with a mandatory reboot ( because if something does not boot, you want to know it during a planned outage, not when everyone is running around updating stuff ). Fedora has a rather complex procedure to decide what to upgrade, described at http://infrastructure.fedoraproject.org/infra/docs/massupgrade.txt
At my previous employer we had something similar. I also wrote a puppet custom fact reboot_needed which checks if the running kernel matches the default kernel that would be booted.
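The check described above can be sketched as follows. This is not the actual Puppet fact (which is not shown in this thread and would be written in Ruby), only a Python illustration of the same comparison. It assumes a RHEL/CentOS/Fedora host where `grubby --default-kernel` prints the path of the default kernel; the helper names are hypothetical.

```python
import os
import subprocess


def kernel_release_from_path(kernel_path: str) -> str:
    """Turn '/boot/vmlinuz-2.6.32-431.el6.x86_64' into '2.6.32-431.el6.x86_64'."""
    name = os.path.basename(kernel_path.strip())
    prefix = "vmlinuz-"
    return name[len(prefix):] if name.startswith(prefix) else name


def reboot_needed(running_release: str, default_kernel_path: str) -> bool:
    """True when the running kernel differs from the one booted by default."""
    return running_release.strip() != kernel_release_from_path(default_kernel_path)


def check_local_host() -> bool:
    """Query the local machine (assumption: grubby is installed)."""
    running = os.uname().release
    default = subprocess.check_output(["grubby", "--default-kernel"], text=True)
    return reboot_needed(running, default)
```

Keeping the comparison separate from the commands that gather the data makes it testable without touching a real host.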
> So we could adopt a schedule ( once per month, unless there is something critical, in which case we do it ASAP, with a warning on the list and irc ).
+1
> The schedule should of course take into account "business need", which is the "release schedule of ovirt".
>
> So what about "first Friday of the month, unless exception"?
We should also make sure that we don't reboot ALL servers at once. So if we have multiple centos 6 jenkins slaves, try to just reboot one at a time. Also would be nice if the slave did proper scheduling in jenkins so no jobs are running.
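The "one at a time" idea can be sketched as a small loop. This is only an illustration under stated assumptions: `drain`, `reboot` and `is_healthy` are hypothetical callbacks that would wrap whatever tool ends up being used (taking a Jenkins slave offline, running the upgrade over ssh, checking that the host is back), and none of them is an existing API.

```python
from typing import Callable, Iterable, List


def rolling_reboot(
    hosts: Iterable[str],
    drain: Callable[[str], None],
    reboot: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Reboot hosts strictly one at a time; stop at the first unhealthy host.

    Returns the hosts that were rebooted and came back healthy.
    """
    done: List[str] = []
    for host in hosts:
        drain(host)   # hypothetical: take the slave offline, wait until idle
        reboot(host)  # hypothetical: upgrade and reboot the host
        if not is_healthy(host):
            # Leave the remaining hosts untouched so capacity is preserved.
            raise RuntimeError(
                f"{host} did not come back healthy after {len(done)} host(s); stopping"
            )
        done.append(host)
    return done
```

Stopping at the first failure means a bad update takes out at most one slave of a given kind instead of all of them.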
> And by update, I mean "yum upgrade -y". Cleaning the list of repos on the various servers is also IMHO another task to discuss, to make sure the upgrade can be safely executed. ( Having something like mcollective/ansible/func is also needed, but that's more a convenience than a requirement at this stage. )
We sometimes have pinned versions on jenkins build slaves. That means we should either do a proper yum versionlock or find something else. Note that I'm all in favor of being able to do blind yum upgrades.
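As one possible reading of "find something else", the upgrade command could be built with yum's `--exclude` option for every pinned package. This is a sketch only: the list of pinned packages and the function name are hypothetical, and it is an alternative to versionlock rather than what is currently deployed.

```python
from typing import Iterable, List


def build_upgrade_command(pinned: Iterable[str]) -> List[str]:
    """Build a 'yum upgrade -y' command line that skips pinned packages."""
    cmd = ["yum", "upgrade", "-y"]
    # Deduplicate and sort so the command is stable from run to run.
    names = sorted({name.strip() for name in pinned if name.strip()})
    for name in names:
        cmd.append(f"--exclude={name}")
    return cmd


if __name__ == "__main__":
    # Example package names are placeholders, not the real pinned list.
    print(" ".join(build_upgrade_command(["maven", "java-1.7.0-openjdk"])))
```

With an empty list this degrades to the blind "yum upgrade -y" discussed above.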

On Friday 06 June 2014 at 14:36 +0200, Ewoud Kohl van Wijngaarden wrote:
> At my previous employer we had something similar. I also wrote a puppet custom fact reboot_needed which checks if the running kernel matches the default kernel that would be booted.
We also want to restart services that are affected. Hence a reboot would be simpler from an engineering point of view.
> We should also make sure that we don't reboot ALL servers at once. So if we have multiple centos 6 jenkins slaves, try to just reboot one at a time. Also would be nice if the slave did proper scheduling in jenkins so no jobs are running.
Yep, proper orchestration would be nice. Now, if jenkins is resilient enough ( i.e., if it can survive a 5-minute downtime ), it may not be that business critical to have it running all the time. I am not a jenkins expert, not even a user, so I defer to David for that.
--
Michael Scherer
Open Source and Standards, Sysadmin

----- Original Message -----
From: "Michael Scherer" <mscherer@redhat.com>
To: infra@ovirt.org
Sent: Friday, June 6, 2014 2:29:44 PM
Subject: infra security update
> And by update, I mean "yum upgrade -y". Cleaning the list of repos on the various servers is also IMHO another task to discuss, to make sure the upgrade can be safely executed. ( Having something like mcollective/ansible/func is also needed, but that's more a convenience than a requirement at this stage. )
We use 'fabric' for this kind of thing at Red Hat, so we might be able to use it for oVirt as well.

+1 for a monthly maintenance window. Friday sounds good, since most of the users are in the TLV office. We can also keep Sunday as an option, in case a critical server has to be up on a certain Friday, so either the TLV office or non-TLV people can perform the outages.

It is also worth adding it to the calendar as a monthly maintenance outage, during which we can update servers like jenkins/gerrit/foreman etc. We can use either the ovirt cal [1] or open a new infra cal for that.

Though, we should map the packages we want to keep at the latest version and ensure them via puppet, while the maintenance windows will be used for reboots and downtimes.

We should also update that info on the wiki once ready [2].

[1] https://www.google.com/calendar/ical/ppqtk46u9cglj7l987ruo2l0f8%40group.cale...
[2] http://www.ovirt.org/Infra
_______________________________________________ Infra mailing list Infra@ovirt.org http://lists.ovirt.org/mailman/listinfo/infra

On Sunday 08 June 2014 at 02:55 -0400, Eyal Edri wrote:
> We use 'fabric' for this kind of thing at Red Hat, so we might be able to use it for oVirt as well.
Well, IT uses taboot/func, and Fedora uses ansible ( some parts of IT are also looking at ansible ). Openshift uses mcollective.

Basically, what we need is a tool to enforce configuration ( which is puppet ), and one to execute remote commands with a specific schedule and orchestration.

For a simple task, they would likely all work fine, so maybe we should start by listing what we want, and then see which one fulfills the needs and some constraints.

Based on the issues I faced in the past, and my understanding of the situation, I would propose the following list of criteria:

- be packaged for Fedora/Centos ( agent and server )
- packaged in epel/main repo (so we can be sure there is a proper update policy)
- optionally, packaged for Debian/Ubuntu (cf. the need for jenkins builders on other platforms)
- the simpler, the better (I am not fond of having to deploy amqp for mcollective)
- known by at least a few people in the current team (the more, the better)
- able to orchestrate, i.e., run tasks on X servers one after the other
- still maintained (i.e., unlike func)
- not moving too fast

For now we want it to run the upgrade tasks, but later we may need more from it ( regenerating ssl certificates, restarting some specific service ).

We should also take a look at the infra deployment, i.e. do we run things over ssh ( ansible, fabric ), which means giving a lot of access (basically, root), with likely a need for something like a restricted user and a bastion. Or there is mcollective, salt, func, where you have a second communication channel and fine-grained ACLs.
> +1 for a monthly maintenance window. Friday sounds good, since most of the users are in the TLV office. We can also keep Sunday as an option, in case a critical server has to be up on a certain Friday, so either the TLV office or non-TLV people can perform the outages.
>
> It is also worth adding it to the calendar as a monthly maintenance outage, during which we can update servers like jenkins/gerrit/foreman etc. We can use either the ovirt cal [1] or open a new infra cal for that.

Also, what happens if we forget? And how do we make sure that more than one person can do the job?

> Though, we should map the packages we want to keep at the latest version and ensure them via puppet, while the maintenance windows will be used for reboots and downtimes.
>
> We should also update that info on the wiki once ready [2].
>
> [1] https://www.google.com/calendar/ical/ppqtk46u9cglj7l987ruo2l0f8%40group.calendar.google.com/public/basic.ics
> [2] http://www.ovirt.org/Infra
/me still has to enable https on the wiki

--
Michael Scherer
Open Source and Standards, Sysadmin