Problems with pyxlma_flash_sort_grid script #51
Comments
I haven't tested the issue yet, but I believe this is largely what #42 is designed to address: handling datasets where the network changes configuration across different files. I haven't tested that draft against an inconsistent number of columns across files, only inconsistent column data across files, but I'll be sure to do so before it gets merged. (Given my current schedule, that PR being polished and merged is probably a ways out, so it might be good to look at fixing this now as a temporary measure.)

As noted in my rambling comments on that PR, regarding out-of-memory operations, one of the things I haven't added yet (but would be relatively trivial) is to allow lma_read.dataset to include an

I have done some benchmarking of pyxlma_flash_sort_grid vs. lmatools and I remember it being faster, but I think I just compared script runtimes, and lmatools' version also generates the PDF files with the plots. This was a while ago, so some retesting with plotting disabled in lmatools would be a good idea. I believe the goal is to eventually replace lmatools, but clearly we aren't there yet.
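On the out-of-memory point: one generic way to cap memory use, independent of anything in pyxlma itself, is to hand files to the reader in fixed-size batches instead of all at once. A minimal sketch follows. The `batched` helper and the file names are hypothetical, and flashes that straddle a batch boundary would need separate handling that is not shown here.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batched(paths: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most batch_size paths (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(paths)
    while True:
        # islice pulls the next batch_size items without materializing the rest.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Made-up file names, for illustration only.
paths = [f"file_{i:02d}.dat.gz" for i in range(5)]
batches = list(batched(paths, 2))
```

Each batch could then be passed to the existing read and sort calls in turn, so that only one batch's worth of data is held in memory at a time.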
Thanks for sharing your thoughts, @wx4stg. I'll keep an eye on the updates!
I found a possible issue while trying to run the examples/pyxlma_flash_sort_grid.py file. The code breaks and throws an error at the line

dataset, start_time = lma_read.dataset(paths_to_read)

within the flash_sort_grid function. This occurs because the lmafile class calls the gen_sta_data function.

Apparently, this happens when some LYLOUT*.dat.gz files have an inconsistent number of columns under station data. For example, here's what I found in two different OKLMA files from the same day. Notice how the second file doesn't contain any values corresponding to the dec_win(us) column header.

File content in LYLOUT_110524_000000_0600.dat.gz
File content in LYLOUT_110524_205000_0600.dat.gz
I figured some flexibility in both the gen_sta_data and gen_sta_info functions can deal with this inconsistency, and modifying them along those lines worked for me.

I could run the flash_sort script after these modifications, but it was quite slow compared to simply running lmatools. Ingesting too many files at the same time overloaded the kernel because of out-of-memory issues in the xarray data handling. I am not sure if this script is still WIP or is meant to replace lmatools eventually, but at the time of testing it did not offer any advantage over good old lmatools in processing speed. I'd love to hear what @deeplycloudy or @wx4stg have to say. Happy to be corrected, of course!
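To illustrate the kind of flexibility meant for the station-data parsing, here is a minimal sketch. It is not the actual patch and not pyxlma's code: `align_station_row`, the header layout, and the sample rows are all hypothetical. The idea is to accept a row either with every header column present or with the optional column (here dec_win(us)) absent, and to fill the absent value with None so later columns stay aligned.

```python
from typing import Dict, List, Optional, Sequence


def align_station_row(
    columns: Sequence[str],
    values: Sequence[str],
    optional: Sequence[str] = ("dec_win(us)",),
) -> Dict[str, Optional[str]]:
    """Map one station row onto the header, tolerating absent optional columns."""
    if len(values) == len(columns):
        return dict(zip(columns, values))

    # Retry with the optional columns dropped from the header.
    required: List[str] = [c for c in columns if c not in optional]
    if len(values) == len(required):
        record: Dict[str, Optional[str]] = {c: None for c in columns}
        record.update(zip(required, values))
        return record

    raise ValueError(
        f"expected {len(columns)} or {len(required)} fields, got {len(values)}"
    )


# Hypothetical header and rows; the real LYLOUT station table has other columns.
header = ["id", "name", "win(us)", "dec_win(us)", "sources"]
full_row = ["A", "StationA", "80", "10", "1234"]
short_row = ["A", "StationA", "80", "1234"]

full_record = align_station_row(header, full_row)
short_record = align_station_row(header, short_row)
```

Padding by column name rather than appending None at the end matters here, because the missing value sits in the middle of the row and a positional pad would shift every later column.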