On Tue, Feb 5, 2019 at 7:15 PM Milan Zamazal <mzamazal(a)redhat.com> wrote:
Hi, I'm trying to run Vdsm with Python 3, which exposes various
Python 3
incompatibilities in Vdsm code. I'm currently semi-stuck on Vdsm
crashing very shortly after startup with:
Panic: Error initializing IRS
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/vdsm/vdsmd.py", line 95, in
serve_clients
irs = Dispatcher(HSM())
File "/usr/lib/python3.6/site-packages/vdsm/storage/dispatcher.py",
line 46, in __init__
self._exposeFunctions(obj)
File "/usr/lib/python3.6/site-packages/vdsm/storage/dispatcher.py",
line 56, in _exposeFunctions
if hasattr(funcObj, _EXPORTED_ATTRIBUTE) and callable(funcObj):
File "/usr/lib/python3.6/site-packages/vdsm/storage/sp.py", line 85,
in __getattr__
raise se.StoragePoolNotConnected
vdsm.storage.exception.StoragePoolNotConnected: Storage pool not
connected: ()
This should be fixed first. exposing HSM function should not raise this
error.
The best way would be to write tests for the dispatcher and run the tests
with python 3.
When the dispatcher is ported to python 3 you can try again to start vdsm.
Even if you port entire vdsm to python 3, sanlock does not ship yet
python 3 bindings, so you cannot have any storage except local storage,
iso domain and export domain.
That wouldn't be a problem by itself, errors are expected in Python 3
experiments. However the problem is that this is often the *only*
error
I receive.
I've found out that HSM() call spawns threads, which apparently don't
have chance to report their errors before the panic above kills Vdsm.
For instance, there was a string-byte mismatch in running iscsiadm
scanning commands. Only after I added time.sleep to the end of
HSM.__init__, I got an error + traceback from that and could see what's
wrong. That worries me a bit, because it might mean that if some
initialization external commands are not fast enough, Vdsm may not be
able to start.
The real fix for this issue to eliminate the storage startup code.
Storage should do the minimal initialization needed (e.g. start monitoring
udev events) and wait for engine. When engine ask to connect to storage,
vdsm should the needed scanning and lvm initialization that is done now
during startup.
Then if some storage code fail, only connecting to this kind of storage
would fail and vdsm will not panic.
As temporary solution you can disable the storage refresh thread
started in HSM.__init__, and set HSM.ready to True immediately.
This will have similar effect to the long term fix.
After fixing some problems, I get the panic again, even with the sleep
at the end of HSM.__init__, and again without any other error.
It's
difficult to proceed in such a situation -- any tips what to look at or
to look for are welcome.
You can disable irs in the config, but then you will have no storage.
Maybe this is good enough at this point.
Nir
FWIW, my vdsm.log now ends with
INFO (MainThread) [storage.check] Starting check service (check:91)
DEBUG (check/loop) [root] START thread <Thread(check/loop, started
daemon 140180434425600)> (func=<bound method EventLoop.run_forever of
<EventLoop running=True closed=False at 0x140181096264704>>, args=(),
kwargs={}) (concurrent:193)
INFO (check/loop) [storage.asyncevent] Starting <EventLoop running=True
closed=False at 0x140181096264704> (asyncevent:125)
DEBUG (hsm/init) [storage.SamplingMethod] Returning last result
(misc:386)
DEBUG (hsm/init) [storage.SamplingMethod] Trying to enter sampling
method (vdsm.storage.hba.<function rescan at 0x7f7e7a597e18>) (misc:376)
DEBUG (hsm/init) [storage.SamplingMethod] Got in to sampling method
(misc:379)
DEBUG (hsm/init) [storage.HBA] Starting scan (hba:58)
DEBUG (hsm/init) [storage.HBA] Scan finished (hba:64)
DEBUG (hsm/init) [storage.SamplingMethod] Returning last result
(misc:386)
DEBUG (hsm/init) [root] /usr/bin/taskset --cpu-list 0-0 /sbin/udevadm
settle --timeout=5 (cwd None) (commands:200)
DEBUG (hsm/init) [root] SUCCESS: <err> = b''; <rc> = 0
(commands:221)
DEBUG (hsm/init) [storage.SamplingMethod] Returning last result
(misc:386)
DEBUG (hsm/init) [storage.Misc.excCmd] /usr/bin/taskset --cpu-list 0-0
/usr/bin/sudo -n /sbin/lvm pvs --config 'devices {
preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1
write_cache_state=0 disable_after_error_count=3 filter=["r|.*|"] } global
{ locking_type=1 prioritise_write_locks=1 wait_for_locks=1
use_lvmetad=0 } backup { retain_min=50 retain_days=0 }' --noheadings
--units b --nosuffix --separator '|' --ignoreskippedcluster -o
uuid,name,size,vg_name,vg_uuid,pe_start,pe_count,pe_alloc_count,mda_count,dev_size,mda_used_count
(cwd None) (commands:200)
DEBUG (hsm/init) [storage.Misc.excCmd] SUCCESS: <err> = b''; <rc> =
0
(commands:221)
DEBUG (hsm/init) [storage.Misc.excCmd] /usr/bin/taskset --cpu-list 0-0
/usr/bin/sudo -n /sbin/lvm vgs --config 'devices {
preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1
write_cache_state=0 disable_after_error_count=3 filter=["r|.*|"] } global
{ locking_type=1 prioritise_write_locks=1 wait_for_locks=1
use_lvmetad=0 } backup { retain_min=50 retain_days=0 }' --noheadings
--units b --nosuffix --separator '|' --ignoreskippedcluster -o
uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name
(cwd None) (commands:200)
DEBUG (hsm/init) [storage.Misc.excCmd] SUCCESS: <err> = b''; <rc> =
0
(commands:221)
DEBUG (hsm/init) [storage.Misc.excCmd] /usr/bin/taskset --cpu-list 0-0
/usr/bin/sudo -n /sbin/lvm lvs --config 'devices {
preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1
write_cache_state=0 disable_after_error_count=3 filter=["r|.*|"] } global
{ locking_type=1 prioritise_write_locks=1 wait_for_locks=1
use_lvmetad=0 } backup { retain_min=50 retain_days=0 }' --noheadings
--units b --nosuffix --separator '|' --ignoreskippedcluster -o
uuid,name,vg_name,attr,size,seg_start_pe,devices,tags (cwd None)
(commands:200)
DEBUG (hsm/init) [storage.Misc.excCmd] SUCCESS: <err> = b''; <rc> =
0
(commands:221)
DEBUG (hsm/init) [storage.Misc.excCmd] /usr/bin/taskset --cpu-list 0-0
/usr/bin/sudo -n /sbin/lvm vgs --config 'devices {
preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1
write_cache_state=0 disable_after_error_count=3 filter=["r|.*|"] } global
{ locking_type=1 prioritise_write_locks=1 wait_for_locks=1
use_lvmetad=0 } backup { retain_min=50 retain_days=0 }' --noheadings
--units b --nosuffix --separator '|' --ignoreskippedcluster -o
uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name
(cwd None) (commands:200)
DEBUG (hsm/init) [storage.Misc.excCmd] SUCCESS: <err> = b''; <rc> =
0
(commands:221)
INFO (hsm/init) [storage.HSM] FINISH HSM init succeeded in 0.43 seconds
(hsm:410)
DEBUG (hsm/init) [storage.HSM] FINISH thread <Thread(hsm/init, stopped
daemon 140180442818304)> (concurrent:196)
and is completely silent after that.
Thanks,
Milan