Exclude.list management #249

Open
jbpaduan opened this issue Jun 3, 2024 · 5 comments

Comments

@jbpaduan
Collaborator

jbpaduan commented Jun 3, 2024

The exclude.list, by which directories in SeafloorMapping get excluded from consideration by the SMDB load script, has become cumbersome, will only get longer, and must be managed by someone with a working SMDB Docker. This issue proposes improving its management by changing how: 1) the exclude.list is constructed, 2) additions/subtractions to the list are made, and 3) the list can be evaluated.

  1. Spreadsheets in the year/SMDB directories, named like 2023/exclude_list.xlsx, will contain the paths of directories to be excluded from the load, and the load script will concatenate these source spreadsheets to regenerate the exclude.list file in the repository.
  2. Additions/subtractions can be made by modifying the source spreadsheets.
  3. In the header of the SMDB website, a pop-up window can be opened that lists, in descending year order and then alphabetically, the contents of the exclude.list, as a reflection of what the database thinks should be excluded. Presumably the Load Log Output should flag any path that is not found with a warning, making it obvious that there might be a typo in a source spreadsheet.
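
The not-found warning in item 3 could look roughly like the sketch below. It is only an illustration, not code from load.py: the function name `find_missing_exclude_paths` is hypothetical, and it assumes that each exclude path is a directory path relative to the SeafloorMapping root.

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def find_missing_exclude_paths(root, exclude_paths):
    """Return the exclude paths that are not directories under root.

    Assumption: each entry is relative to the SeafloorMapping root.
    A warning is logged for every entry that cannot be found, so that a
    typo in a source spreadsheet shows up in the load log.
    """
    missing = []
    for rel_path in exclude_paths:
        if not (Path(root) / rel_path).is_dir():
            logger.warning("Exclude path not found (possible typo): %s", rel_path)
            missing.append(rel_path)
    return missing
```

Returning the missing paths, in addition to logging them, would also let the web page mark those entries in the pop-up list.
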
@MBARIMike
Contributor

I suggest naming the exclude_list files like:

2023/exclude_list_2023.xlsx
OceanImaging2017/exclude_list_OceanImaging2017.xlsx

Including the parent directory name in the .xlsx file name will avoid confusion if multiple spreadsheets are opened in Excel.

I suggest moving the button for the Load Log Output from the home page back to the header (undoing 4133a33 and 6415533) so that it's visible from all the other pages of the site. I often find myself on the missions or compilations page and want to check the load log to see why something is the way it is. It'd be nice to have simple hrefs with target='_blank' to the load log output and the exclude list in the header so that they're easily accessible from all pages on the site.

@MBARIMike
Contributor

On second thought I propose naming the exclude_list files like:

/Volumes/SeafloorMapping/2024/SMDB/2024_exclude_list.csv
/Volumes/SeafloorMapping/MappingAUVOps2006/SMDB/MappingAUVOps2006_exclude_list.csv

This is analogous to how the survey_tally files are named. I also propose using the same workflow as is done for the survey_tally files:

  1. Existing exclude_list paths will be written to a .csv file
  2. The .csv file will be imported into an Excel file, then saved and edited as an .xlsx file
  3. The exclude_list paths used during the load will be those that are in the .xlsx files

Once all of the .xlsx files are created from the .csv files I'll change the load.py logic to use them instead of the repo's smdb/config/exclude.list file.
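
Step 1 of the workflow above, combined with the proposed file naming, might be sketched as follows. This is not the actual load.py code: the function names are hypothetical, and it assumes that the existing exclude paths are relative to the SeafloorMapping root (so the first path component is the parent directory) and that each .csv holds one path per row with no header.

```python
import csv
from collections import defaultdict
from pathlib import Path


def exclude_list_csv(root, parent):
    """Build <root>/<parent>/SMDB/<parent>_exclude_list.csv (proposed naming)."""
    return Path(root) / parent / "SMDB" / f"{parent}_exclude_list.csv"


def write_exclude_csvs(root, exclude_paths):
    """Group exclude paths by parent directory and write one .csv per parent.

    Returns a dict mapping each written file to its number of paths.
    """
    by_parent = defaultdict(list)
    for rel_path in exclude_paths:
        rel_path = rel_path.strip()
        if not rel_path:
            continue  # skip blank lines from the source list
        # Assumption: the first path component names the parent directory
        by_parent[Path(rel_path).parts[0]].append(rel_path)

    written = {}
    for parent, paths in sorted(by_parent.items()):
        csv_file = exclude_list_csv(root, parent)
        csv_file.parent.mkdir(parents=True, exist_ok=True)
        with csv_file.open("w", newline="") as f:
            csv.writer(f).writerows([p] for p in sorted(paths))
        written[csv_file] = len(paths)
    return written
```

For example, `write_exclude_csvs("/Volumes/SeafloorMapping", paths)` would place the 2024 entries in `/Volumes/SeafloorMapping/2024/SMDB/2024_exclude_list.csv`, ready to be imported into Excel.
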

@MBARIMike
Contributor

#254 has been pulled to production and executed with this output:

INFO 2024-06-05 19:54:42,369 load.py read_config_exclude_list():1760 Read 177 paths to exclude from /app/config/exclude.list
INFO 2024-06-05 19:54:42,519 load.py read_exclude_path_xlsxs():1783 Read 6 paths to exclude from /mbari/SeafloorMapping/2019/SMDB/2019_exclude_list.xlsx
INFO 2024-06-05 19:54:42,550 load.py write_exclude_path_csvs():1808 Wrote 6 paths to /mbari/SeafloorMapping/2019/SMDB/2019_exclude_list.csv
INFO 2024-06-05 19:54:42,553 load.py write_exclude_path_csvs():1808 Wrote 6 paths to /mbari/SeafloorMapping/2020/SMDB/2020_exclude_list.csv
INFO 2024-06-05 19:54:42,555 load.py write_exclude_path_csvs():1808 Wrote 4 paths to /mbari/SeafloorMapping/2021/SMDB/2021_exclude_list.csv
INFO 2024-06-05 19:54:42,557 load.py write_exclude_path_csvs():1808 Wrote 18 paths to /mbari/SeafloorMapping/2022/SMDB/2022_exclude_list.csv
INFO 2024-06-05 19:54:42,559 load.py write_exclude_path_csvs():1808 Wrote 23 paths to /mbari/SeafloorMapping/2024/SMDB/2024_exclude_list.csv
INFO 2024-06-05 19:54:42,561 load.py write_exclude_path_csvs():1808 Wrote 9 paths to /mbari/SeafloorMapping/MappingAUVOps2006/SMDB/MappingAUVOps2006_exclude_list.csv
INFO 2024-06-05 19:54:42,563 load.py write_exclude_path_csvs():1808 Wrote 5 paths to /mbari/SeafloorMapping/MappingAUVOps2007/SMDB/MappingAUVOps2007_exclude_list.csv
INFO 2024-06-05 19:54:42,565 load.py write_exclude_path_csvs():1808 Wrote 7 paths to /mbari/SeafloorMapping/MappingAUVOps2008/SMDB/MappingAUVOps2008_exclude_list.csv
INFO 2024-06-05 19:54:42,567 load.py write_exclude_path_csvs():1808 Wrote 8 paths to /mbari/SeafloorMapping/MappingAUVOps2009/SMDB/MappingAUVOps2009_exclude_list.csv
INFO 2024-06-05 19:54:42,569 load.py write_exclude_path_csvs():1808 Wrote 8 paths to /mbari/SeafloorMapping/MappingAUVOps2010/SMDB/MappingAUVOps2010_exclude_list.csv
INFO 2024-06-05 19:54:42,571 load.py write_exclude_path_csvs():1808 Wrote 5 paths to /mbari/SeafloorMapping/MappingAUVOps2011/SMDB/MappingAUVOps2011_exclude_list.csv
INFO 2024-06-05 19:54:42,573 load.py write_exclude_path_csvs():1808 Wrote 4 paths to /mbari/SeafloorMapping/MappingAUVOps2012/SMDB/MappingAUVOps2012_exclude_list.csv
INFO 2024-06-05 19:54:42,575 load.py write_exclude_path_csvs():1808 Wrote 3 paths to /mbari/SeafloorMapping/MappingAUVOps2013/SMDB/MappingAUVOps2013_exclude_list.csv
INFO 2024-06-05 19:54:42,577 load.py write_exclude_path_csvs():1808 Wrote 3 paths to /mbari/SeafloorMapping/MappingAUVOps2014/SMDB/MappingAUVOps2014_exclude_list.csv
INFO 2024-06-05 19:54:42,579 load.py write_exclude_path_csvs():1808 Wrote 5 paths to /mbari/SeafloorMapping/MappingAUVOps2015/SMDB/MappingAUVOps2015_exclude_list.csv
INFO 2024-06-05 19:54:42,581 load.py write_exclude_path_csvs():1808 Wrote 14 paths to /mbari/SeafloorMapping/MappingAUVOps2016/SMDB/MappingAUVOps2016_exclude_list.csv
INFO 2024-06-05 19:54:42,583 load.py write_exclude_path_csvs():1808 Wrote 7 paths to /mbari/SeafloorMapping/MappingAUVOps2017/SMDB/MappingAUVOps2017_exclude_list.csv
INFO 2024-06-05 19:54:42,585 load.py write_exclude_path_csvs():1808 Wrote 2 paths to /mbari/SeafloorMapping/MappingAUVOps2018/SMDB/MappingAUVOps2018_exclude_list.csv
INFO 2024-06-05 19:54:42,587 load.py write_exclude_path_csvs():1808 Wrote 1 paths to /mbari/SeafloorMapping/MappingAUVOpsStuff/SMDB/MappingAUVOpsStuff_exclude_list.csv
INFO 2024-06-05 19:54:42,589 load.py write_exclude_path_csvs():1808 Wrote 1 paths to /mbari/SeafloorMapping/OceanImaging2012/SMDB/OceanImaging2012_exclude_list.csv
INFO 2024-06-05 19:54:42,591 load.py write_exclude_path_csvs():1808 Wrote 12 paths to /mbari/SeafloorMapping/OceanImaging2013/SMDB/OceanImaging2013_exclude_list.csv
INFO 2024-06-05 19:54:42,593 load.py write_exclude_path_csvs():1808 Wrote 10 paths to /mbari/SeafloorMapping/OceanImaging2014/SMDB/OceanImaging2014_exclude_list.csv
INFO 2024-06-05 19:54:42,595 load.py write_exclude_path_csvs():1808 Wrote 6 paths to /mbari/SeafloorMapping/OceanImaging2015/SMDB/OceanImaging2015_exclude_list.csv
INFO 2024-06-05 19:54:42,597 load.py write_exclude_path_csvs():1808 Wrote 5 paths to /mbari/SeafloorMapping/OceanImaging2016/SMDB/OceanImaging2016_exclude_list.csv
INFO 2024-06-05 19:54:42,599 load.py write_exclude_path_csvs():1808 Wrote 3 paths to /mbari/SeafloorMapping/OceanImaging2018/SMDB/OceanImaging2018_exclude_list.csv
INFO 2024-06-05 19:54:42,601 load.py write_exclude_path_csvs():1808 Wrote 1 paths to /mbari/SeafloorMapping/mbsystem/SMDB/mbsystem_exclude_list.csv
INFO 2024-06-05 19:54:42,603 load.py write_exclude_path_csvs():1808 Wrote 1 paths to /mbari/SeafloorMapping/swathdata/SMDB/swathdata_exclude_list.csv

File /mbari/SeafloorMapping/2019/SMDB/2019_exclude_list.xlsx was created from the corresponding .csv file as a test. The remaining .csv files need to be converted to .xlsx files where new edits can be made.

@MBARIMike
Contributor

The .xlsx -> load -> .csv workflow is now in place in production with the sorted consolidated exclude list written to https://smdb.shore.mbari.org/media/logs/exclude_list.txt
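
For reference, the ordering requested in the original issue (descending year, then alphabetical) could be produced with a sort key like the one sketched here. The ordering actually used for exclude_list.txt is not stated in this thread, so this is only an assumption-laden illustration: the function names are hypothetical, the year is taken from four trailing digits of the first path component (e.g. `2024`, `MappingAUVOps2006`), and directories without a year (e.g. `mbsystem`) are placed last.

```python
import re


def year_of(path):
    """Return the trailing 4-digit year of the first path component, or 0 if none."""
    first = path.split("/", 1)[0]
    match = re.search(r"(\d{4})$", first)
    return int(match.group(1)) if match else 0


def sort_exclude_paths(paths):
    """Sort by descending year, then alphabetically; year-less parents go last."""
    return sorted(paths, key=lambda p: (-year_of(p), p))
```
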

@MBARIMike
Contributor

Last weekend's load failed to exclude any exclude_paths because the logic was changed in #258. #259 should fix this.
