Pandas-based LMA datafile reader #2
Comments
OK, this is really straightforward. So then some next things to add would be:
Per number 4: the number of stations contributing to each source should already be archived in the LMA file, right? Or are you thinking of a selection based on the list of which stations were contributing?
Not from all networks. OKLMA data I used back in the day had the station count, but you have to count the bits for the WTLMA data and others I've seen.
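For files that carry only the station mask, counting the set bits is one way to recover the station count. This is a minimal sketch, assuming the mask is stored as a hexadecimal string in which each set bit flags one contributing station; `count_stations` is a hypothetical helper, not a function from lmatools.

```python
def count_stations(mask_hex: str) -> int:
    """Count contributing stations from a hex station mask string.

    Assumption: every set bit in the mask marks one contributing station.
    """
    # int(..., 16) accepts the mask with or without a leading "0x".
    return bin(int(mask_hex, 16)).count("1")


# Made-up example masks, not values taken from a real LMA file.
masks = ["0x0f", "3a", "ff"]
counts = [count_stations(m) for m in masks]
print(counts)
```

The same function can be mapped over a mask column to build a station-count column where the file does not provide one.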
Given the differences in the two formats, it seems that we actually need to parse on the basis of the
Alright, I've been staring at the one format too long! That should be an easy adjustment.
Here's a snippet to automatically pull the column headers from the file header. I'm sure there's a more fundamental method to parse the text line from the zipped file, but this does get the job done.
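One way this kind of header parsing could look is sketched below. It assumes the header contains a line beginning with `Data:` that lists the comma-separated column names, and that a `*** data ***` line marks the end of the header; both are assumptions about the file layout, and `read_column_names` is a hypothetical name. The sample text is made up for illustration.

```python
import gzip
import io


def read_column_names(fileobj):
    """Return (column_names, n_header_lines) from a gzipped LMA-style file.

    Assumptions: a 'Data:' header line lists the column names, and a
    '*** data ***' line is the last line of the header.
    """
    names = []
    with gzip.open(fileobj, "rt") as f:
        for n_lines, line in enumerate(f, start=1):
            if line.startswith("Data:"):
                names = [c.strip() for c in line[len("Data:"):].split(",")]
            if line.startswith("*** data ***"):
                return names, n_lines
    raise ValueError("no '*** data ***' marker found")


# Made-up sample standing in for a real .dat.gz file.
sample = (
    "Number of stations: 2\n"
    "Data: time (UT sec of day), lat, lon, alt(m), reduced chi^2, P(dBW), mask\n"
    "*** data ***\n"
    "100.5 33.6 -101.8 5000.0 0.5 10.2 0x0f\n"
)
buf = io.BytesIO(gzip.compress(sample.encode()))
names, n_header = read_column_names(buf)
print(names, n_header)
```

The returned header length can then be passed as the number of rows to skip when reading the data section.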
Added some pieces for pulling more of the header information via Pandas and separating out all of the stations in the mask using the mask_to_int function from lmatools. Time issues 1-3 are addressed by the addition of a new column called 'Datetime.' Potential issue: header information (station locations, % contributions, etc.) is read in with a fixed-width format, since station names may contain white space. I'm unsure how consistent those widths are among existing file formats.
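A 'Datetime' column of this kind can be built by adding the seconds-of-day values to the file's start date. The sketch below assumes the time column holds UT seconds of day and that the date is already known; `start_date` and the column name are hypothetical placeholders, and this is not necessarily how the reader in this thread does it.

```python
import pandas as pd

# Made-up source times in UT seconds of day.
df = pd.DataFrame({"time (UT sec of day)": [0.0, 3600.5, 86399.0]})

# Hypothetical date; in practice it would come from the file header.
start_date = pd.Timestamp("2019-06-01")

# Convert seconds of day to timedeltas and offset from midnight of the start date.
df["Datetime"] = start_date + pd.to_timedelta(df["time (UT sec of day)"], unit="s")
print(df)
```

Files that span midnight would need extra handling, which this sketch does not attempt.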
Nice, thanks! Could you add a complete version of your reader class in something like pyxlma will live at the same level as the main README, and will be the eventual package name. We'll probably need to rearrange/rename later. lmalib is to separate the plot-agnostic calculations and readers from any GUI or plotting stuff that makes use of data processed in lmalib.
Regarding the header format, yeah – there's no
Regarding the lack of a check on the format: one has been added. Maybe a check on whether the values read in are floats or strings would work? Pandas can also infer the widths instead of having them specified. I have not had luck with that method, but it may be worth troubleshooting.
Cool, I just swapped this into my glueviz example (now added, too) and it works great. I also added some minimal packaging stuff so Technically we've fulfilled the basics of this issue. Are we ready to start splitting out separate issues for specific things like the header i/o? Or keep this as an open issue until it's a bit more mature?
It's probably fine to split it into smaller issues and work on the specific things along the way. I just played around with the glueviz example, looks like it will be useful!
Closing this issue, having opened #6 to continue the discussion about the remaining issue about the header format.
A minimal, Pandas-based reader for LMA files would be useful. An example of such a method is below with a basic selection within the dataframe, which could be easily extended to other criteria.
Basic Pandas object from .dat.gz file
Use example:
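As a minimal sketch of this kind of reader (not necessarily the code originally posted), the example below reads whitespace-delimited source rows from a gzipped file into a dataframe and applies a basic selection. The column names, the two-line header, the thresholds, and the sample rows are all assumptions made up for illustration.

```python
import gzip
import io

import pandas as pd

# Made-up sample standing in for a real .dat.gz file.
sample = (
    "Data: time (UT sec of day), lat, lon, alt(m), reduced chi^2, P(dBW), mask\n"
    "*** data ***\n"
    "100.5 33.6 -101.8 5000.0 0.5 10.2 0x0f\n"
    "101.0 33.7 -101.9 25000.0 4.0 12.0 0x3a\n"
)
buf = io.BytesIO(gzip.compress(sample.encode()))

# Hypothetical short column names; a real reader would take them from the header.
names = ["time", "lat", "lon", "alt", "chi2", "power", "mask"]

# Decompress as text and skip the two header lines of this sample.
with gzip.open(buf, "rt") as f:
    df = pd.read_csv(f, sep=r"\s+", skiprows=2, header=None, names=names)

# Basic selection: keep well-fit sources below 20 km (illustrative thresholds).
good = df[(df["chi2"] <= 1.0) & (df["alt"] < 20000.0)]
print(good)
```

Other criteria, such as a minimum station count, could be added to the same boolean mask.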