[JIRA] (OVIRT-1490) Simplify oVirt storage configuration
by Barak Korren (oVirt JIRA)
[ https://ovirt-jira.atlassian.net/browse/OVIRT-1490?page=com.atlassian.jir... ]
Barak Korren updated OVIRT-1490:
--------------------------------
Summary: Simplify oVirt storage configuration (was: Simplify oVirt sotrage configuration)
> Simplify oVirt storage configuration
> ------------------------------------
>
> Key: OVIRT-1490
> URL: https://ovirt-jira.atlassian.net/browse/OVIRT-1490
> Project: oVirt - virtualization made easy
> Issue Type: Improvement
> Components: storage
> Reporter: Barak Korren
> Assignee: infra
> Priority: Highest
> Labels: infra, storage
>
> Today's outage was a clear reminder that our current storage configuration does not serve us well. We hardly know how to debug it, it does not seem to be resilient to the very issues it was supposed to protect against, and it introduces potential failure scenarios of its own.
> I suggest we implement a new storage layout that meets the following criteria:
> # Ultimate simplicity at the lower level of the stack. More specifically:
> ## The storage servers should be simple NFS or iSCSI servers. No DRBD and no exotic file-systems.
> ## Only simple storage will be presented to oVirt for use as storage domains
> # Separation of resources between critical services - The "Jenkins" master, for example, should not share resources with the "resources" server or anything else. The separation should hold true down to the physical spindle level.
> # Duplication of services and use of local storage where possible - this is a longer-term effort, but we have some low-hanging fruit here, such as Artifactory, where simple DNS/LB-based fail-over between two identical hosts would probably suffice.
> # Complexity only where needed and up the stack. For example, we can just have the storage for Jenkins be mirrored at the VM level with fail-over to a backup VM.
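The DNS/LB-based fail-over between two identical hosts suggested above could be sketched roughly as follows. This is only an illustration, not something taken from the ticket: the function names, the host names in the usage note, and the TCP-connect health check are all assumptions.

```python
import socket
from typing import Callable, Optional, Sequence


def tcp_alive(host: str, port: int, timeout: float = 2.0) -> bool:
    """Assumed health check: the host counts as healthy if a TCP connect succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_backend(hosts: Sequence[str],
                 is_alive: Callable[[str], bool]) -> Optional[str]:
    """Return the first healthy host in priority order, or None if all are down.

    `hosts` lists identical servers, primary first. The health check is
    passed in so that the selection logic stays independent of the probe.
    """
    for host in hosts:
        if is_alive(host):
            return host
    return None
```

A load balancer or DNS updater could call `pick_backend(["artifactory1.example", "artifactory2.example"], lambda h: tcp_alive(h, 443))` periodically and point traffic at the result; both host names are hypothetical.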
--
This message was sent by Atlassian JIRA
(v1000.1092.0#100053)
7 years, 4 months
Build failed in Jenkins: system-sync_mirrors-epel-el6-x86_64 #480
by jenkins@jenkins.phx.ovirt.org
See <http://jenkins.ovirt.org/job/system-sync_mirrors-epel-el6-x86_64/480/disp...>
------------------------------------------
Started by timer
[EnvInject] - Loading node environment variables.
Building remotely on mirrors.phx.ovirt.org (mirrors) in workspace <http://jenkins.ovirt.org/job/system-sync_mirrors-epel-el6-x86_64/ws/>
> git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
> git config remote.origin.url http://gerrit.ovirt.org/jenkins.git # timeout=10
Cleaning workspace
> git rev-parse --verify HEAD # timeout=10
Resetting working tree
> git reset --hard # timeout=10
> git clean -fdx # timeout=10
Pruning obsolete local branches
Fetching upstream changes from http://gerrit.ovirt.org/jenkins.git
> git --version # timeout=10
> git fetch --tags --progress http://gerrit.ovirt.org/jenkins.git +refs/changes/13/75913/5:patch --prune
> git rev-parse origin/patch^{commit} # timeout=10
> git rev-parse patch^{commit} # timeout=10
Checking out Revision 4b0fe3e0c9fba26cdbaafe2b29fddd3411225d6f (patch)
> git config core.sparsecheckout # timeout=10
> git checkout -f 4b0fe3e0c9fba26cdbaafe2b29fddd3411225d6f
> git rev-list 4b0fe3e0c9fba26cdbaafe2b29fddd3411225d6f # timeout=10
[system-sync_mirrors-epel-el6-x86_64] $ /bin/bash -xe /tmp/hudson2203138297466724195.sh
+ jenkins/scripts/mirror_mgr.sh resync_yum_mirror epel-el6 x86_64 jenkins/data/mirrors-reposync.conf
Checking if mirror needs a resync
Traceback (most recent call last):
  File "/usr/bin/reposync", line 343, in <module>
    main()
  File "/usr/bin/reposync", line 175, in main
    my.doRepoSetup()
  File "/usr/lib/python2.7/site-packages/yum/__init__.py", line 681, in doRepoSetup
    return self._getRepos(thisrepo, True)
  File "/usr/lib/python2.7/site-packages/yum/__init__.py", line 721, in _getRepos
    self._repos.doSetup(thisrepo)
  File "/usr/lib/python2.7/site-packages/yum/repos.py", line 157, in doSetup
    self.retrieveAllMD()
  File "/usr/lib/python2.7/site-packages/yum/repos.py", line 96, in retrieveAllMD
    downloading = repo._commonRetrieveDataMD_list(mdtypes)
  File "/usr/lib/python2.7/site-packages/yum/yumRepo.py", line 1609, in _commonRetrieveDataMD_list
    os.rename(local, local + '.old.tmp')
OSError: [Errno 2] No such file or directory
Build step 'Execute shell' marked build as failure
Build failed in Jenkins: system-sync_mirrors-centos-ovirt-4.0-el7-x86_64 #483
by jenkins@jenkins.phx.ovirt.org
See <http://jenkins.ovirt.org/job/system-sync_mirrors-centos-ovirt-4.0-el7-x86...>
------------------------------------------
Started by timer
[EnvInject] - Loading node environment variables.
Building remotely on mirrors.phx.ovirt.org (mirrors) in workspace <http://jenkins.ovirt.org/job/system-sync_mirrors-centos-ovirt-4.0-el7-x86...>
> git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
> git config remote.origin.url http://gerrit.ovirt.org/jenkins.git # timeout=10
Cleaning workspace
> git rev-parse --verify HEAD # timeout=10
Resetting working tree
> git reset --hard # timeout=10
> git clean -fdx # timeout=10
Pruning obsolete local branches
Fetching upstream changes from http://gerrit.ovirt.org/jenkins.git
> git --version # timeout=10
> git fetch --tags --progress http://gerrit.ovirt.org/jenkins.git +refs/changes/13/75913/5:patch --prune
> git rev-parse origin/patch^{commit} # timeout=10
> git rev-parse patch^{commit} # timeout=10
Checking out Revision 4b0fe3e0c9fba26cdbaafe2b29fddd3411225d6f (patch)
> git config core.sparsecheckout # timeout=10
> git checkout -f 4b0fe3e0c9fba26cdbaafe2b29fddd3411225d6f
> git rev-list 4b0fe3e0c9fba26cdbaafe2b29fddd3411225d6f # timeout=10
[system-sync_mirrors-centos-ovirt-4.0-el7-x86_64] $ /bin/bash -xe /tmp/hudson2303515589631969475.sh
+ jenkins/scripts/mirror_mgr.sh resync_yum_mirror centos-ovirt-4.0-el7 x86_64 jenkins/data/mirrors-reposync.conf
Checking if mirror needs a resync
Traceback (most recent call last):
  File "/usr/bin/reposync", line 343, in <module>
    main()
  File "/usr/bin/reposync", line 175, in main
    my.doRepoSetup()
  File "/usr/lib/python2.7/site-packages/yum/__init__.py", line 681, in doRepoSetup
    return self._getRepos(thisrepo, True)
  File "/usr/lib/python2.7/site-packages/yum/__init__.py", line 721, in _getRepos
    self._repos.doSetup(thisrepo)
  File "/usr/lib/python2.7/site-packages/yum/repos.py", line 157, in doSetup
    self.retrieveAllMD()
  File "/usr/lib/python2.7/site-packages/yum/repos.py", line 88, in retrieveAllMD
    dl = repo._async and repo._commonLoadRepoXML(repo)
  File "/usr/lib/python2.7/site-packages/yum/yumRepo.py", line 1465, in _commonLoadRepoXML
    if self._latestRepoXML(local):
  File "/usr/lib/python2.7/site-packages/yum/yumRepo.py", line 1443, in _latestRepoXML
    repomd = self.metalink_data.repomd
  File "/usr/lib/python2.7/site-packages/yum/yumRepo.py", line 916, in <lambda>
    metalink_data = property(fget=lambda self: self._getMetalink(),
  File "/usr/lib/python2.7/site-packages/yum/yumRepo.py", line 912, in _getMetalink
    self._metalink = metalink.MetaLinkRepoMD(self.metalink_filename)
  File "/usr/lib/python2.7/site-packages/yum/metalink.py", line 189, in __init__
    raise MetaLinkRepoErrorParseFail, "File %s is not XML" % filename
yum.metalink.MetaLinkRepoErrorParseFail: File /home/jenkins/mirrors_cache/fedora-updates-fc24/metalink.xml is not XML
Build step 'Execute shell' marked build as failure
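The failure above is raised for a cached metalink.xml that is not valid XML, and that file sits in the cache directory of a different repo (fedora-updates-fc24) than the one being synced. A minimal sketch for locating such files follows. It assumes the cache holds one directory per repo under a common root; the helper names are hypothetical and not part of mirror_mgr.sh or yum.

```python
import os
import xml.etree.ElementTree as ET


def is_well_formed_xml(path):
    """Return True if the file at `path` parses as XML, False otherwise."""
    try:
        ET.parse(path)
        return True
    except (ET.ParseError, OSError):
        # ParseError covers empty or truncated files as well as non-XML content.
        return False


def find_broken_metalinks(cache_root):
    """Walk `cache_root` and list every metalink.xml that is not well-formed XML."""
    broken = []
    for dirpath, _dirnames, filenames in os.walk(cache_root):
        if "metalink.xml" in filenames:
            path = os.path.join(dirpath, "metalink.xml")
            if not is_well_formed_xml(path):
                broken.append(path)
    return sorted(broken)
```

Running `find_broken_metalinks("/home/jenkins/mirrors_cache")` on the mirrors host would list candidates for removal before the next resync; whether deleting them is enough to recover is an assumption that would need checking.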
[JIRA] (OVIRT-1487) ML listing broken
by Marc Dequènes (Duck) (oVirt JIRA)
[ https://ovirt-jira.atlassian.net/browse/OVIRT-1487?page=com.atlassian.jir... ]
Marc Dequènes (Duck) updated OVIRT-1487:
----------------------------------------
Resolution: Fixed
Status: Done (was: To Do)
So, my bad: I looked at https://lists.phx.ovirt.org/ and not https://lists.ovirt.org/, and since it is not the same domain, Mailman displays the listing differently. Was tired, sorry for the noise :-/
> ML listing broken
> -----------------
>
> Key: OVIRT-1487
> URL: https://ovirt-jira.atlassian.net/browse/OVIRT-1487
> Project: oVirt - virtualization made easy
> Issue Type: Bug
> Components: Mailing lists (Mailman)
> Reporter: Marc Dequènes (Duck)
> Assignee: infra
>
> Quack,
> After I rebooted for OVIRT-1473 I checked the services and noticed the list of MLs is empty. The server works well, and even the ML archives and admin panel are all there; only the listing seems affected.
> There was an upgrade of the package, see the changelog:
> https://centos.pkgs.org/6/centos-x86_64/mailman-2.1.12-26.el6.x86_64.rpm....
> Here is the issue it solved:
> https://bugzilla.redhat.com/show_bug.cgi?id=1363835
> It does not seem to be related, as the rest works.
> So it seems this upgrade is unrelated. Maybe the bug has been there for a while, or maybe the reboot uncovered it. I did not find anything obvious. Anyway, we need to investigate more.
> \_o<
[JIRA] (OVIRT-1491) Change oVirt DC production storage
by eedri (oVirt JIRA)
[ https://ovirt-jira.atlassian.net/browse/OVIRT-1491?page=com.atlassian.jir... ]
eedri updated OVIRT-1491:
-------------------------
Resolution: Duplicate
Status: Done (was: To Do)
https://ovirt-jira.atlassian.net/browse/OVIRT-1490
> Change oVirt DC production storage
> -----------------------------------
>
> Key: OVIRT-1491
> URL: https://ovirt-jira.atlassian.net/browse/OVIRT-1491
> Project: oVirt - virtualization made easy
> Issue Type: Task
> Reporter: eedri
> Assignee: infra
> Priority: Highest
>
> It seems that in the whole history of the DC there was not a single time we really used the DRBD features, and DRBD makes our storage configuration more complex and harder to maintain and debug than the value it brings.
> Let's think of a new SIMPLE storage setup that will increase the availability of oVirt services and prevent incidents like the previous one, where the storage was a single point of failure, without adding multiple services that introduce more failure points and complexity to the setup.
> One suggestion was to install 2 new storage servers running iSCSI, connect them independently to oVirt, and spread the load over both of them, so that if one is down the other will keep running.
> We can consider having a few redundant VMs on both servers if possible, such as the resources server and maybe others as well.
> Please add more suggestions to this ticket; it should be our number one priority right now.