improve data_prep.R file loading approach with config file to specify… #39

Open
wants to merge 1 commit into base: main

Neilmagi (Collaborator) commented Oct 7, 2024

… which files to read in every ecoregion

Summary by Sourcery

Enhance the data loading approach in the prep_data.R script by using a configuration file to specify which files to read for each ecoregion, so that data processing can be customized without editing the script.

Enhancements:

  • Introduce a configuration file (config.yml) to specify which files to read for each ecoregion, improving flexibility and maintainability.
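
For illustration, a config.yml of this kind might be shaped roughly as in the sketch below. The ecoregion names, file keys, and file names are hypothetical, and the per-ecoregion `files` mapping is an assumption inferred from the `config$files[[file_key]]` lookup in the diff. The sketch builds the parsed result by hand as a plain R list so it runs without extra packages.

```r
# Hypothetical config.yml layout (all names are illustrative assumptions):
#
#   coastal_plain:
#     files:
#       species: species_counts.csv
#       sites: site_metadata.csv
#   boreal_shield:
#     files:
#       species: species_counts.csv
#
# Parsing such a file (e.g. with yaml::read_yaml) would yield a nested list
# like the one below, written out here to keep the sketch self-contained.
all_config <- list(
  coastal_plain = list(
    files = list(species = "species_counts.csv", sites = "site_metadata.csv")
  ),
  boreal_shield = list(
    files = list(species = "species_counts.csv")
  )
)

# Look up the files configured for one ecoregion.
config <- all_config[["coastal_plain"]]
names(config$files)
config$files[["sites"]]
```

An ecoregion with no entry in the list yields `NULL` on lookup, which is one place where a "missing configuration" check could hook in.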

sourcery-ai bot (Contributor) commented Oct 7, 2024

Reviewer's Guide by Sourcery

This pull request improves the data loading approach in the prep_data.R file by introducing a configuration file (YAML) to specify which files to read for each ecoregion. The changes make the data processing more flexible and configurable.

Class diagram for updated data processing in prep_data.R

classDiagram
    class process_ecoregion_data {
        -ecoregion
        -config
        +process_ecoregion_data(ecoregion, config)
    }
    class YAMLConfig {
        +load_file(path)
        +get_config_for_ecoregion(ecoregion)
    }
    class DataList {
        +lapply(names, function)
        +remove_null_entries()
    }
    process_ecoregion_data --> YAMLConfig : uses
    process_ecoregion_data --> DataList : creates

File-Level Changes

Change: Introduce YAML configuration for file loading
Files: data-raw/prep_data.R
Details:
  • Add yaml library
  • Create config_path variable
  • Load YAML configuration file
  • Update process_ecoregion_data function to accept config parameter
  • Implement config-based file loading logic

Change: Implement dynamic file loading based on configuration
Files: data-raw/prep_data.R
Details:
  • Replace hardcoded file reading with dynamic approach
  • Add error handling for missing configuration
  • Add warning for missing files
  • Implement file existence check
  • Remove NULL entries for files not found

Change: Update main processing loop
Files: data-raw/prep_data.R
Details:
  • Modify map function call to include config parameter
  • Return data_list instead of hardcoded list
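
The changes listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's actual code: the `raw_dir` argument stands in for `RAW_DATA_DIR`, the config layout (one `files` mapping per ecoregion) is inferred from the diff, and `read.csv` is assumed as the reader since the real file formats are not shown.

```r
# Sketch of config-driven loading. The raw_dir argument, the config layout,
# and the use of read.csv are assumptions for illustration only.
process_ecoregion_data <- function(ecoregion, all_config, raw_dir) {
  config <- all_config[[ecoregion]]
  if (is.null(config) || is.null(config$files)) {
    stop(sprintf("No file configuration found for ecoregion '%s'", ecoregion))
  }

  data_list <- lapply(names(config$files), function(file_key) {
    file_path <- file.path(raw_dir, ecoregion, config$files[[file_key]])
    if (!file.exists(file_path)) {
      warning(sprintf("File not found for '%s' (%s): %s",
                      ecoregion, file_key, file_path))
      return(NULL)
    }
    read.csv(file_path)
  })
  names(data_list) <- names(config$files)

  # Drop entries for files that were not found
  Filter(Negate(is.null), data_list)
}

# Usage with a temporary directory standing in for the raw data directory
raw_dir <- tempfile("raw_data_")
dir.create(file.path(raw_dir, "coastal_plain"), recursive = TRUE)
write.csv(data.frame(site = c("a", "b"), count = c(3, 5)),
          file.path(raw_dir, "coastal_plain", "species_counts.csv"),
          row.names = FALSE)

all_config <- list(
  coastal_plain = list(
    files = list(species = "species_counts.csv", sites = "site_metadata.csv")
  )
)

# site_metadata.csv does not exist, so a warning is raised and it is dropped
result <- suppressWarnings(
  process_ecoregion_data("coastal_plain", all_config, raw_dir)
)
names(result)
```

Naming the list by file key before filtering means downstream code can refer to `result$species` rather than relying on position.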

sourcery-ai bot (Contributor) left a comment

Hey @Neilmagi - I've reviewed your changes and found some issues that need to be addressed.

Blocking issues:

  • Update function call to match new process_ecoregion_data signature

Overall Comments:

  • Consider implementing more robust error handling and logging, especially for missing files or configuration errors. The current approach of warning and continuing execution could lead to silent failures.
  • The function signature change for process_ecoregion_data might break existing code. Consider maintaining backward compatibility or clearly documenting this breaking change.
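
One way the first comment could be addressed is sketched below: collect every missing file up front and either stop or warn once, depending on a flag. The helper name `check_configured_files`, the `strict` parameter, and the example file names are all hypothetical and not part of the PR.

```r
# Hypothetical helper: report every configured file that is missing, then
# either stop (strict) or warn once. Not part of the PR.
check_configured_files <- function(ecoregion, files, raw_dir, strict = TRUE) {
  paths <- file.path(raw_dir, ecoregion, unname(unlist(files)))
  missing_paths <- paths[!file.exists(paths)]
  if (length(missing_paths) > 0) {
    msg <- sprintf("Ecoregion '%s' is missing %d configured file(s): %s",
                   ecoregion, length(missing_paths),
                   paste(missing_paths, collapse = ", "))
    if (strict) stop(msg) else warning(msg)
  }
  invisible(missing_paths)
}

# Temporary directory standing in for the raw data directory
raw_dir <- tempfile("raw_data_")
dir.create(file.path(raw_dir, "coastal_plain"), recursive = TRUE)
file.create(file.path(raw_dir, "coastal_plain", "species_counts.csv"))

files <- list(species = "species_counts.csv", sites = "site_metadata.csv")

# Non-strict mode warns and returns the missing paths
missing_paths <- suppressWarnings(
  check_configured_files("coastal_plain", files, raw_dir, strict = FALSE)
)
basename(missing_paths)
```

With `strict = TRUE` the same call raises an error instead, which would avoid the silent-failure case the comment describes.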
Here's what I looked at during the review
  • 🔴 General issues: 1 blocking issue, 1 other issue
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good


# Read raw data files for the ecoregion based on config
data_list <- lapply(names(config$files), function(file_key) {
  file_path <- file.path(RAW_DATA_DIR, ecoregion, config$files[[file_key]])
  if (!file.exists(file_path)) {

suggestion: Enhance error handling and logging for missing data files

Consider adding more detailed logging, such as which ecoregion and file type is missing. This would help in debugging and data completeness verification.

if (!file.exists(file_path)) {
  log_message <- sprintf("File not found for ecoregion '%s', file type '%s': %s",
                         ecoregion, file_key, file_path)
  warning(log_message)
  logging::logwarn(log_message)
}

# Get list of ecoregions (assuming each subdirectory in RAW_DATA_DIR is an ecoregion)
ecoregions <- list.dirs(RAW_DATA_DIR, full.names = FALSE, recursive = FALSE)

# Process data for each ecoregion
- all_data <- map(ecoregions, process_ecoregion_data)
+ all_data <- map(ecoregions, process_ecoregion_data, all_config)

issue (bug_risk): Update function call to match new process_ecoregion_data signature

The process_ecoregion_data function now expects two arguments, but it's being called with only one. This will cause an error. Update the map call to pass both ecoregions and all_config.
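
For reference, extra arguments placed after the function in `map()` are forwarded to every call, the same way as in base R's `lapply()`. The sketch below uses `lapply()` to avoid a package dependency; the function body and config values are hypothetical stand-ins, since the real ones are not shown in this hunk.

```r
# Stand-in for the two-argument function; the real body is not shown here.
process_ecoregion_data <- function(ecoregion, config) {
  paste(ecoregion, "->", length(config[[ecoregion]]$files), "file(s)")
}

# Hypothetical config values for illustration
all_config <- list(
  coastal_plain = list(files = list(species = "species_counts.csv")),
  boreal_shield = list(files = list())
)
ecoregions <- names(all_config)

# Each element of ecoregions becomes the first argument; all_config is
# forwarded unchanged as the second argument on every call.
all_data <- lapply(ecoregions, process_ecoregion_data, all_config)
all_data[[1]]
```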
