-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
improve data_prep.R file loading approach with config file to specify… #39
base: main
Are you sure you want to change the base?
Conversation
… which files to read in every ecoregion
Reviewer's Guide by SourceryThis pull request improves the data loading approach in the Class diagram for updated data processing in prep_data.RclassDiagram
class process_ecoregion_data {
-ecoregion
-config
+process_ecoregion_data(ecoregion, config)
}
class YAMLConfig {
+load_file(path)
+get_config_for_ecoregion(ecoregion)
}
class DataList {
+lapply(names, function)
+remove_null_entries()
}
process_ecoregion_data --> YAMLConfig : uses
process_ecoregion_data --> DataList : creates
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @Neilmagi - I've reviewed your changes and found some issues that need to be addressed.
Blocking issues:
- Update function call to match new
process_ecoregion_data
signature (link)
Overall Comments:
- Consider implementing more robust error handling and logging, especially for missing files or configuration errors. The current approach of warning and continuing execution could lead to silent failures.
- The function signature change for
process_ecoregion_data
might break existing code. Consider maintaining backward compatibility or clearly documenting this breaking change.
Here's what I looked at during the review
- 🔴 General issues: 1 blocking issue, 1 other issue
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟢 Complexity: all looks good
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
# Read raw data files for the ecoregion based on config | ||
data_list <- lapply(names(config$files), function(file_key) { | ||
file_path <- file.path(RAW_DATA_DIR, ecoregion, config$files[[file_key]]) | ||
if (!file.exists(file_path)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Enhance error handling and logging for missing data files
Consider adding more detailed logging, such as which ecoregion and file type is missing. This would help in debugging and data completeness verification.
if (!file.exists(file_path)) {
log_message <- sprintf("File not found for ecoregion '%s', file type '%s': %s",
ecoregion, file_key, file_path)
warning(log_message)
logging::logwarn(log_message)
} | ||
|
||
# Get list of ecoregions (assuming each subdirectory in RAW_DATA_DIR is an ecoregion) | ||
ecoregions <- list.dirs(RAW_DATA_DIR, full.names = FALSE, recursive = FALSE) | ||
|
||
# Process data for each ecoregion | ||
all_data <- map(ecoregions, process_ecoregion_data) | ||
all_data <- map(ecoregions, process_ecoregion_data, all_config) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (bug_risk): Update function call to match new process_ecoregion_data
signature
The process_ecoregion_data
function now expects two arguments, but it's being called with only one. This will cause an error. Update the map
call to pass both ecoregions
and all_config
.
… which files to read in every ecoregion
Summary by Sourcery
Enhance the data loading approach in the data_prep.R script by using a configuration file to specify the files to read for each ecoregion, allowing for more dynamic and customizable data processing.
Enhancements: