On Tue, Feb 5, 2019 at 9:25 PM Nir Soffer <nsoffer(a)redhat.com> wrote:
On Tue, Feb 5, 2019 at 7:15 PM Milan Zamazal
<mzamazal(a)redhat.com> wrote:
> Hi, I'm trying to run Vdsm with Python 3, which exposes various Python 3
> incompatibilities in Vdsm code. I'm currently semi-stuck on Vdsm
> crashing very shortly after startup with:
>
> Panic: Error initializing IRS
> Traceback (most recent call last):
> File "/usr/lib/python3.6/site-packages/vdsm/vdsmd.py", line 95, in
> serve_clients
> irs = Dispatcher(HSM())
> File "/usr/lib/python3.6/site-packages/vdsm/storage/dispatcher.py",
> line 46, in __init__
> self._exposeFunctions(obj)
> File "/usr/lib/python3.6/site-packages/vdsm/storage/dispatcher.py",
> line 56, in _exposeFunctions
> if hasattr(funcObj, _EXPORTED_ATTRIBUTE) and callable(funcObj):
> File "/usr/lib/python3.6/site-packages/vdsm/storage/sp.py", line 85,
> in __getattr__
> raise se.StoragePoolNotConnected
> vdsm.storage.exception.StoragePoolNotConnected: Storage pool not
> connected: ()
>
This should be fixed first. exposing HSM function should not raise this
error.
Can you check if
https://gerrit.ovirt.org/c/97590/ fixes this issue?
The best way would be to write tests for the dispatcher and run the
tests
with python 3.
When the dispatcher is ported to python 3 you can try again to start vdsm.
Even if you port entire vdsm to python 3, sanlock does not ship yet
python 3 bindings, so you cannot have any storage except local storage,
iso domain and export domain.
That wouldn't be a problem by itself, errors are expected in Python 3
> experiments. However the problem is that this is often the *only* error
> I receive.
>
> I've found out that HSM() call spawns threads, which apparently don't
> have chance to report their errors before the panic above kills Vdsm.
> For instance, there was a string-byte mismatch in running iscsiadm
> scanning commands. Only after I added time.sleep to the end of
> HSM.__init__, I got an error + traceback from that and could see what's
> wrong. That worries me a bit, because it might mean that if some
> initialization external commands are not fast enough, Vdsm may not be
> able to start.
>
The real fix for this issue to eliminate the storage startup code.
Storage should do the minimal initialization needed (e.g. start monitoring
udev events) and wait for engine. When engine ask to connect to storage,
vdsm should the needed scanning and lvm initialization that is done now
during startup.
Then if some storage code fail, only connecting to this kind of storage
would fail and vdsm will not panic.
As temporary solution you can disable the storage refresh thread
started in HSM.__init__, and set HSM.ready to True immediately.
This will have similar effect to the long term fix.
After fixing some problems, I get the panic again, even with the sleep
> at the end of HSM.__init__, and again without any other error. It's
> difficult to proceed in such a situation -- any tips what to look at or
> to look for are welcome.
>
You can disable irs in the config, but then you will have no storage.
Maybe this is good enough at this point.
Nir
>
> FWIW, my vdsm.log now ends with
>
> INFO (MainThread) [storage.check] Starting check service (check:91)
> DEBUG (check/loop) [root] START thread <Thread(check/loop, started
> daemon 140180434425600)> (func=<bound method EventLoop.run_forever of
> <EventLoop running=True closed=False at 0x140181096264704>>, args=(),
> kwargs={}) (concurrent:193)
> INFO (check/loop) [storage.asyncevent] Starting <EventLoop
> running=True closed=False at 0x140181096264704> (asyncevent:125)
> DEBUG (hsm/init) [storage.SamplingMethod] Returning last result
> (misc:386)
> DEBUG (hsm/init) [storage.SamplingMethod] Trying to enter sampling
> method (vdsm.storage.hba.<function rescan at 0x7f7e7a597e18>) (misc:376)
> DEBUG (hsm/init) [storage.SamplingMethod] Got in to sampling method
> (misc:379)
> DEBUG (hsm/init) [storage.HBA] Starting scan (hba:58)
> DEBUG (hsm/init) [storage.HBA] Scan finished (hba:64)
> DEBUG (hsm/init) [storage.SamplingMethod] Returning last result
> (misc:386)
> DEBUG (hsm/init) [root] /usr/bin/taskset --cpu-list 0-0 /sbin/udevadm
> settle --timeout=5 (cwd None) (commands:200)
> DEBUG (hsm/init) [root] SUCCESS: <err> = b''; <rc> = 0
(commands:221)
> DEBUG (hsm/init) [storage.SamplingMethod] Returning last result
> (misc:386)
> DEBUG (hsm/init) [storage.Misc.excCmd] /usr/bin/taskset --cpu-list 0-0
> /usr/bin/sudo -n /sbin/lvm pvs --config 'devices {
> preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1
> write_cache_state=0 disable_after_error_count=3 filter=["r|.*|"] }
global
> { locking_type=1 prioritise_write_locks=1 wait_for_locks=1
> use_lvmetad=0 } backup { retain_min=50 retain_days=0 }' --noheadings
> --units b --nosuffix --separator '|' --ignoreskippedcluster -o
>
uuid,name,size,vg_name,vg_uuid,pe_start,pe_count,pe_alloc_count,mda_count,dev_size,mda_used_count
> (cwd None) (commands:200)
> DEBUG (hsm/init) [storage.Misc.excCmd] SUCCESS: <err> = b'';
<rc> = 0
> (commands:221)
> DEBUG (hsm/init) [storage.Misc.excCmd] /usr/bin/taskset --cpu-list 0-0
> /usr/bin/sudo -n /sbin/lvm vgs --config 'devices {
> preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1
> write_cache_state=0 disable_after_error_count=3 filter=["r|.*|"] }
global
> { locking_type=1 prioritise_write_locks=1 wait_for_locks=1
> use_lvmetad=0 } backup { retain_min=50 retain_days=0 }' --noheadings
> --units b --nosuffix --separator '|' --ignoreskippedcluster -o
>
uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name
> (cwd None) (commands:200)
> DEBUG (hsm/init) [storage.Misc.excCmd] SUCCESS: <err> = b'';
<rc> = 0
> (commands:221)
> DEBUG (hsm/init) [storage.Misc.excCmd] /usr/bin/taskset --cpu-list 0-0
> /usr/bin/sudo -n /sbin/lvm lvs --config 'devices {
> preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1
> write_cache_state=0 disable_after_error_count=3 filter=["r|.*|"] }
global
> { locking_type=1 prioritise_write_locks=1 wait_for_locks=1
> use_lvmetad=0 } backup { retain_min=50 retain_days=0 }' --noheadings
> --units b --nosuffix --separator '|' --ignoreskippedcluster -o
> uuid,name,vg_name,attr,size,seg_start_pe,devices,tags (cwd None)
> (commands:200)
> DEBUG (hsm/init) [storage.Misc.excCmd] SUCCESS: <err> = b'';
<rc> = 0
> (commands:221)
> DEBUG (hsm/init) [storage.Misc.excCmd] /usr/bin/taskset --cpu-list 0-0
> /usr/bin/sudo -n /sbin/lvm vgs --config 'devices {
> preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1
> write_cache_state=0 disable_after_error_count=3 filter=["r|.*|"] }
global
> { locking_type=1 prioritise_write_locks=1 wait_for_locks=1
> use_lvmetad=0 } backup { retain_min=50 retain_days=0 }' --noheadings
> --units b --nosuffix --separator '|' --ignoreskippedcluster -o
>
uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name
> (cwd None) (commands:200)
> DEBUG (hsm/init) [storage.Misc.excCmd] SUCCESS: <err> = b'';
<rc> = 0
> (commands:221)
> INFO (hsm/init) [storage.HSM] FINISH HSM init succeeded in 0.43
> seconds (hsm:410)
> DEBUG (hsm/init) [storage.HSM] FINISH thread <Thread(hsm/init, stopped
> daemon 140180442818304)> (concurrent:196)
>
> and is completely silent after that.
>
> Thanks,
> Milan
>