Datacenter unresponsive: recovery procedure?
by Andrea Ghelardi
(sorry: resending as I wasn’t part of the list, yet)
hi,
This is my first post, so hello all and thank you for reading.
I have an issue with my production oVirt environment (3.5.1.1-1.el6).
My system consists of several datacenters.
Two of them are connected to an iSCSI SAN and were working fine, until
I had the bad idea of deleting a SAN volume from the SAN manager before
deleting the associated storage in oVirt. From that moment, the DC where
this storage was mounted became unresponsive: it cannot attach the master
storage (or any other).
I tried to:
1) manually destroy the offending storage (select -> destroy)
2) right-click on the master storage and activate it
3) re-initialize the datacenter using an NFS storage from the working sister
DC
None of these recovered the situation.
All hosts are still running even though their status is "unknown".
All VMs are still running even though their status is "not responding".
I partly resolved the issue by manually restarting the host where the
datastore was originally mounted. This cleared the orphaned multipath.
However, the SPM still does not come up.
This is an extract of the log
2015-04-16 03:51:48,069 WARN [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (DefaultQuartzScheduler_Worker-14) [61a44b19] could not stop spm of pool 00000002-0002-0002-0002-00000000009c on vds 89254f23-8748-402a-afc9-08438dca0975 - reason: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues
2015-04-16 03:51:48,072 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (DefaultQuartzScheduler_Worker-14) [61a44b19] FINISH, SpmStopVDSCommand, log id: 4354cf46
2015-04-16 03:51:48,072 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData] (DefaultQuartzScheduler_Worker-14) [61a44b19] spm stop on spm failed, stopping spm selection!
2015-04-16 03:51:58,223 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData] (DefaultQuartzScheduler_Worker-4) [4ca2d938] hostFromVds::selectedVds - Brachetto, spmStatus Free, storage pool IRDC-INTEL
2015-04-16 03:51:58,225 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData] (DefaultQuartzScheduler_Worker-4) [4ca2d938] SPM Init: could not find reported vds or not up - pool:IRDC-INTEL vds_spm_id: 3
2015-04-16 03:51:58,239 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData] (DefaultQuartzScheduler_Worker-4) [4ca2d938] SPM selection - vds seems as spm sovana
2015-04-16 03:51:58,252 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (DefaultQuartzScheduler_Worker-4) [4ca2d938] START, SpmStopVDSCommand(HostName = sovana, HostId = 89254f23-8748-402a-afc9-08438dca0975, storagePoolId = 00000002-0002-0002-0002-00000000009c), log id: 63a17687
storagePoolId = 00000002-0002-0002-0002-00000000009c is (was) hertz-dstore2,
which no longer exists either on the SAN or in oVirt.
HostId 89254f23-8748-402a-afc9-08438dca0975 is the sovana server (current SPM).
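To pull only the SPM-related warnings and errors out of a longer engine log, something like the following might help. It is a minimal sketch, assuming each entry has first been joined onto a single line and follows the layout seen in the extract above (timestamp, level, bracketed logger class, thread in parentheses, bracketed correlation ID, message). The regular expression and the helper names `parse_engine_line` and `spm_problems` are illustrative assumptions, not part of oVirt.

```python
import re

# Pattern inferred from the log extract above; an assumption, not an official format spec.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<logger>[^\]]+)\]\s+"
    r"\((?P<thread>[^)]+)\)\s+"
    r"\[(?P<corr>[^\]]+)\]\s+"
    r"(?P<msg>.*)$"
)


def parse_engine_line(line):
    """Return the fields of one single-line log entry as a dict, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def spm_problems(lines):
    """Keep WARN/ERROR entries whose message mentions 'spm' (case-insensitive)."""
    found = []
    for line in lines:
        entry = parse_engine_line(line)
        if entry and entry["level"] in ("WARN", "ERROR") and "spm" in entry["msg"].lower():
            found.append(entry)
    return found


# Three entries taken from the extract above, each joined onto one line.
sample = [
    "2015-04-16 03:51:48,072 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] "
    "(DefaultQuartzScheduler_Worker-14) [61a44b19] FINISH, SpmStopVDSCommand, log id: 4354cf46",
    "2015-04-16 03:51:48,072 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData] "
    "(DefaultQuartzScheduler_Worker-14) [61a44b19] spm stop on spm failed, stopping spm selection!",
    "2015-04-16 03:51:58,225 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData] "
    "(DefaultQuartzScheduler_Worker-4) [4ca2d938] SPM Init: could not find reported vds or not up "
    "- pool:IRDC-INTEL vds_spm_id: 3",
]

for entry in spm_problems(sample):
    print(entry["ts"], entry["level"], entry["msg"])
```

Grouping the results by the correlation ID (`corr`) would show which scheduler run each failure belongs to.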
I’m thinking about doing the following:
1) Put the hosted engine host into Maintenance
2) Shut down oVirt Manager
3) Reboot the SPM server
4) Restart oVirt Manager
5) Take the hosted engine host out of Maintenance
Any help or clue is very welcome, with cheers and beers.
Thank you!
Andrea
9 years, 7 months
oVirt Weekly Sync - 2015-04-15
by Sandro Bonazzola
=========================
#ovirt: oVirt Weekly Sync
=========================
Meeting started by sbonazzo at 14:01:45 UTC. The full logs are available
at http://ovirt.org/meetings/ovirt/2015/ovirt.2015-04-15-14.01.log.html
.
Meeting ended Wed Apr 15 14:59:06 2015 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
Minutes: http://ovirt.org/meetings/ovirt/2015/ovirt.2015-04-15-14.01.html
Minutes (text): http://ovirt.org/meetings/ovirt/2015/ovirt.2015-04-15-14.01.txt
Log: http://ovirt.org/meetings/ovirt/2015/ovirt.2015-04-15-14.01.log.html
Meeting summary
---------------
* Agenda and Roll Call (sbonazzo, 14:01:45)
* infra_ownr|dcaro: Security incident status (sbonazzo, 14:01:45)
* bkp: conferences and workshops (sbonazzo, 14:01:45)
* ydary: Sync future and format change (sbonazzo, 14:01:45)
* sbonazzo: Outages in Jenkins due to no space left on master
(sbonazzo, 14:01:46)
* sbonazzo: Blockers review for 3.5.2 GA (sbonazzo, 14:01:46)
* sbonazzo: Packages review for 3.5.2 GA (sbonazzo, 14:01:48)
* sbonazzo: Next week is feature submission deadline: plans for
reviewing submitted features (sbonazzo, 14:01:48)
* sbonazzo: Support for JDK 1.8 / Wildfly progress updates /
Contingency plan of building Java7 rpms for Fedora 22. (sbonazzo,
14:01:50)
* SvenKieske: review Bug 1156115 - [RFE] don't use dd to copy sparse
images in order to save time and bandwith from exportdomain
(sbonazzo, 14:01:50)
* Security incident status (sbonazzo, 14:03:46)
* We did get hacked: a couple of files were uploaded, but there was no
privilege escalation. (sbonazzo, 14:05:18)
* oVirt infra solved the issue, though they are still scanning for further
security problems and plan to greatly improve security asap
(sbonazzo, 14:05:18)
* ACTION: oVirt infra to highly improve the security asap (sbonazzo,
14:05:18)
* conferences and workshops (sbonazzo, 14:07:41)
* the most important thing is yet-another request for help at Grazer
Linux Days in Graz, Austria this weekend (sbonazzo, 14:07:41)
* Rene Koch is seeking backup help manning the booth at that event
(sbonazzo, 14:07:41)
* LINK: http://lists.ovirt.org/pipermail/users/2015-April/032251.html
(sbonazzo, 14:07:41)
* All other event and conference news is the same as two weeks ago.
(sbonazzo, 14:07:41)
* LINK:
http://plain.resources.ovirt.org/meetings/ovirt/2015/ovirt.2015-04-01-14....
(sbonazzo, 14:07:42)
* Sync future and format change (sbonazzo, 14:09:25)
* bkp sent some thoughts to mailing list (sbonazzo, 14:09:25)
* LINK: http://lists.ovirt.org/pipermail/users/2015-April/032389.html
(sbonazzo, 14:09:25)
* LINK: previous meeting logs:
http://plain.resources.ovirt.org/meetings/ovirt/2015/ovirt.2015-04-08-14....
(sbonazzo, 14:10:21)
* AGREED: on a series of "office hours," where a rotating set of oVirt
team members will commit to being on IRC to answer questions and be
generally available. This could be done by one or two people, a few
times a week/once a week, and not even a whole hour. Once a schedule
is figured out, it would be posted and people would be openly
encouraged to participate. This would be an open session on IRC...
support questions, c (sbonazzo, 14:17:34)
* AGREED: on carrying over to the mailing list topics that need to be
covered there because the volunteer doesn't have the answers
(sbonazzo, 14:18:35)
* Outages in Jenkins due to no space left on master (sbonazzo,
14:20:11)
* we had several issues in last few days with jenkins master going out
of space (sbonazzo, 14:24:36)
* solution is to limit job history to ~100 tops (sbonazzo, 14:24:36)
* ACTION: jenkins job owners to review their jobs and fix them if
needed (sbonazzo, 14:24:36)
* Blockers review for 3.5.2 GA (sbonazzo, 14:25:31)
* LINK: oVirt 3.5.z status:
http://lists.ovirt.org/pipermail/devel/2015-April/010261.html
(sbonazzo, 14:25:51)
* Storage updates (sbonazzo, 14:32:01)
* Ceph integration - We gave QE a scratch build and it underwent first
pre-integration. We got feedback from them: some of it is no longer
relevant because more work has been done since the scratch build was
given to them, and some is relevant and we are working on it (sbonazzo,
14:32:14)
* SPM removal - nothing new from last week, work is still in progress
(sbonazzo, 14:32:22)
* Resize LUN - We had our first meeting with QE, no feedback from
them yet (sbonazzo, 14:32:33)
* for the rest of the RFEs there's nothing to report for now, work is in
progress (sbonazzo, 14:32:41)
* Packages review for 3.5.2 GA (sbonazzo, 14:34:55)
* LINK: candidates to 3.5.2 GA:
http://lists.ovirt.org/pipermail/devel/2015-April/010259.html
(sbonazzo, 14:35:35)
* qemu-kvm-rhev-1.5.3-86 is best candidate for 3.5.2 GA due to
https://bugzilla.redhat.com/show_bug.cgi?id=1209034 related to
qemu-kvm-ev-2.1.2-23 (sbonazzo, 14:37:24)
* ACTION: maintainers please review the package list and ensure it's
all you expect to be released there (sbonazzo, 14:37:53)
* Support for JDK 1.8 / Wildfly progress updates / Contingency plan of
building Java7 rpms for Fedora 22. (sbonazzo, 14:38:28)
* infra update: Proceeding well with 3.6 items, fence refactoring is
almost done, VDSM events work is in progress, host upgrade manager
is progressing well on both the host and engine sides, and more. We
will also start looking at Jboss-wildfly support in a few days, and
see how we can proceed with supporting it. (sbonazzo, 14:38:51)
* ACTION: infra to follow up on Jboss-wildfly support for 3.6
(sbonazzo, 14:39:48)
* ACTION: integration to check if contingency plan of using jdk7 may
work (sbonazzo, 14:40:24)
* SvenKieske: review Bug 1156115 - [RFE] don't use dd to copy sparse
images in order to save time and bandwith from exportdomain
(sbonazzo, 14:40:47)
* Next week is feature submission deadline: plans for reviewing
submitted features (sbonazzo, 14:41:45)
* LINK: features spreadsheet: http://goo.gl/9X3G49 (sbonazzo,
14:47:46)
* features without a feature page will take the bugzilla page as
feature page to be reviewed (sbonazzo, 14:48:35)
* Usually such features wouldn't need e.g. a contingency plan and
test cases will be relatively obvious. (sbonazzo, 14:48:52)
* ACTION: features owner to ensure all features have at least the
bugzilla entry to be reviewed (sbonazzo, 14:49:26)
* submission deadline is April 22nd (sbonazzo, 14:50:07)
* ACTION: follow up on how to review the proposed features (irc,
hangout, ...) (sbonazzo, 14:54:15)
* Other topics (sbonazzo, 14:54:25)
Meeting ended at 14:59:06 UTC.
Action Items
------------
* oVirt infra to highly improve the security asap
* jenkins job owners to review their jobs and fix them if needed
* maintainers please review the package list and ensure it's all you
expect to be released there
* infra to follow up on Jboss-wildfly support for 3.6
* integration to check if contingency plan of using jdk7 may work
* features owner to ensure all features have at least the bugzilla entry
to be reviewed
* follow up on how to review the proposed features (irc, hangout, ...)
Action Items, by person
-----------------------
* **UNASSIGNED**
* oVirt infra to highly improve the security asap
* jenkins job owners to review their jobs and fix them if needed
* maintainers please review the package list and ensure it's all you
expect to be released there
* infra to follow up on Jboss-wildfly support for 3.6
* integration to check if contingency plan of using jdk7 may work
* features owner to ensure all features have at least the bugzilla
entry to be reviewed
* follow up on how to review the proposed features (irc, hangout, ...)
People Present (lines said)
---------------------------
* sbonazzo (130)
* lvernia (31)
* ovirtbot (17)
* tal (11)
* SvenKieske (7)
* infra_ownr|dcaro (6)
* fromani (3)
* alitke (2)
* Mossel (1)
* danken (1)
* apuimedo (1)
Generated by `MeetBot`_ 0.1.4
.. _`MeetBot`: http://wiki.debian.org/MeetBot
--
Sandro Bonazzola
Better technology. Faster innovation. Powered by community collaboration.
See how it works at redhat.com
9 years, 7 months
Is ovirt on HA-iscsi???
by Arman Khalatyan
I am trying to use an HA iSCSI target as oVirt storage.
Smooth failover works nicely without the oVirt setup.
My config is as follows:
HA-iscsi: hostA<-drbd->hostB->floatingIP->iscsi(iet)
ovirtengine(host)
cloud01->iscsi storage from floatingIP
Expected failover:
reboot hostB, hostA takes over.
cloud01 should not see any error.
What I actually see is that the VMs end up in the paused state.
It would be nice if someone could provide a reference on how oVirt/vdsm
reacts to an iSCSI failure.
Thanks,
Arman.
9 years, 7 months
Agenda for today oVirt Weekly Sync
by Sandro Bonazzola
Hi,
I would like to add to the oVirt Weekly Sync agenda the following topics
- Packages review for 3.5.2 GA
- Blockers review for 3.5.2 GA
- Outages in Jenkins due to no space left on master
- Next week is feature submission deadline: plans for reviewing submitted features
- Support for JDK 1.8 / Wildfly progress updates / Contingency plan of building Java7 rpms for Fedora 22.
--
Sandro Bonazzola
9 years, 7 months
ovirt-shell for ubuntu
by Nathanaël Blanchet
Hi all,
Does an ovirt-shell-cli equivalent exist for Debian/Ubuntu clients?
Since it is written in Python, it should be quite easy to port.
9 years, 7 months
importing a single qcow2/raw disk
by Nathanaël Blanchet
Hi all,
Is it possible to simply import a single qcow2/raw disk image into a
storage domain, without first creating a dedicated qemu-kvm VM with it
and then importing the full VM with virt-v2v?
Thank you.
9 years, 7 months
many dd ioctl messages on the SPM host.
by Arman Khalatyan
Hi,
I can see many
dd: sending ioctl 80306d02 to a partition!
messages on my CentOS 7.1 SPM host.
In top I can see that the lvm process is using 100%.
My migration from iscsiA to iscsiB storage is terribly slow.
The lvm process keeps changing its PID.
/sbin/lvm lvextend --config devices { preferred_names =
["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0
disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [
'a|/dev/mapper/149455400000000007265735f69534353494c6f676963616c|/dev/mapper/360014056427aebd90ef41b1b5df2cd79|',
'r|.*|' ] } global { locking_type=1 prioritise_write_locks=1
wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50
retain_days = 0 } --autobackup n --size 5373952m
ab17ac41-3c43-4a57-902e-a88577d4a6c1/a57c120c-190a-4122-8f71-5d78dd5279c1
Are there any known problems with migration?
I am using oVirt 3.5.x with iSCSI storage.
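As a quick arithmetic check on the `--size 5373952m` argument in the command above: LVM treats size suffixes as base-two units, so `m` means MiB. The short sketch below converts that value to GiB and TiB; the helper name `mib_to_gib_tib` is only illustrative.

```python
MIB_PER_GIB = 1024
GIB_PER_TIB = 1024


def mib_to_gib_tib(size_mib):
    """Convert a size in MiB to a (GiB, TiB) pair."""
    gib = size_mib / MIB_PER_GIB
    return gib, gib / GIB_PER_TIB


gib, tib = mib_to_gib_tib(5373952)
print(f"{gib:.0f} GiB = {tib:.3f} TiB")  # prints "5248 GiB = 5.125 TiB"
```

So the lvextend call is requesting a target logical volume size of 5248 GiB.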
***********************************************************
Dr. Arman Khalatyan eScience -SuperComputing
Leibniz-Institut für Astrophysik Potsdam (AIP)
An der Sternwarte 16, 14482 Potsdam, Germany
***********************************************************
9 years, 7 months
[QE][ACTION REQUIRED] oVirt 3.5.2 and 3.5.3 status
by Sandro Bonazzola
Hi,
We have 1 open blocker for 3.5.2[1]:
Bug ID   Whiteboard  Status    Summary
1154399  network     ASSIGNED  VDSM script reset network configuration on every reboot when based on predefined bond
and 1 dependency not yet fixed:
Bug ID   Status  Whiteboard  Summary
1209486  POST    network     Networks created by vdsm are not persisted
ACTION: Assignee please provide an ETA for above bugs and check if they must block GA requiring another release candidate before GA.
We still have 1 bug in MODIFIED and 3 in ON_QA[2]:
          MODIFIED  ON_QA  Total
network          0      1      1
sla              1      0      1
storage          0      1      1
virt             0      1      1
Total            1      3      4
ACTION: Testers: you're welcome to verify bugs currently ON_QA.
ACTION: Assignee and Maintainers to check modified bugs to see if they're fixed in RC4
All remaining bugs not marked as blockers have been moved to 3.5.3.
A release management entry has been added for tracking the schedule of 3.5.3[3]
A bug tracker [4] has been created for 3.5.3 and currently shows no blockers.
We have 26 bugs currently targeted to 3.5.3[5]:
Whiteboard  NEW  ASSIGNED  POST  Total
docs          1         0     0      1
external      1         0     0      1
gluster       1         0     0      1
infra         1         1     1      3
network       1         0     1      2
node          4         0     1      5
ppc           0         0     1      1
sla           4         0     0      4
storage       4         0     2      6
ux            1         0     0      1
virt          1         0     0      1
Total        19         1     6     26
ACTION: Maintainers / Assignee: to review the bugs targeted to 3.5.3 ensuring they're correctly targeted.
ACTION: Maintainers: to fill release notes for 3.5.2, the page has been created and updated here [6]
ACTION: Testers: please add yourself to the test page [7]
[1] https://bugzilla.redhat.com/1186161
[2] http://goo.gl/UEVTCf
[3] http://www.ovirt.org/OVirt_3.5.z_Release_Management#oVirt_3.5.3
[4] https://bugzilla.redhat.com/1198142
[5] https://bugzilla.redhat.com/buglist.cgi?quicksearch=product%3Aovirt%20tar...
[6] http://www.ovirt.org/OVirt_3.5.2_Release_Notes
[7] http://www.ovirt.org/Testing/oVirt_3.5.2_Testing
--
Sandro Bonazzola
9 years, 7 months
Network profile not being populated in cluster
by Pavel Gandalipov
Hi, I have a problem with assigning vNIC profiles.
In the current 3.5 release, after creating a non-required logical network (and
assigning it to the cluster), I can't apply its profile in the VM Edit Network
Interface menu.
In 3.4 it works without any additional config. Should I change something to
apply another vNIC profile that belongs to a non-default logical network? I
also don't see the "empty network" that was in 3.4.
Regards,
Pavel Gandalipov
9 years, 7 months