1031 split tests #1039

Open
wants to merge 6 commits into
base: mar25-update
Conversation

lizihao-anu (Contributor) commented Dec 24, 2024

This branch writes each test out to its own xlsx file, so that no two tests write to the same file simultaneously, and then combines the test outputs into one file per year. The final outputs are therefore the same as before.

There is another branch, 1031-remove-tests-from-targets, which takes all tests out of targets and runs them in sequence. Finishing all tests for each year takes around 25 minutes that way, so that approach is much less favourable.

Summary of key changes:

  • Functions added/changed include
    • setup_tests_file_name: extracted from write_tests_xlsx; it sets up the paths for the tests file and folder.
    • combine_multi_xlsx, combine_extracts_tests_year, combine_lookup_tests, combine_tests: combine xlsx files together.
  • The targets script has been restored to run the tests in parallel, and there is no longer a problem of writing tests into one file simultaneously.
  • process_tests_sds now applies a check_year_valid filter to avoid corruption in targets for tests_sds_2425.
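
The write-separately-then-combine idea described above can be sketched roughly as follows. This is a minimal illustration and not the code in this branch: it assumes the openxlsx package, and the helper names (write_single_test, combine_workbooks), the file naming scheme, and the "ae" sheet are hypothetical.

```r
library(openxlsx)

# Hypothetical helper: write one test comparison to its own workbook, so that
# parallel workers never write to the same file.
write_single_test <- function(data, sheet_name, year, dir) {
  path <- file.path(dir, paste0(year, "_", sheet_name, "_tests.xlsx"))
  wb <- createWorkbook()
  addWorksheet(wb, sheet_name)
  writeData(wb, sheet_name, data)
  saveWorkbook(wb, path, overwrite = TRUE)
  path
}

# Hypothetical helper: copy every sheet of every per-test workbook into one
# combined workbook. Sheet names are assumed to be unique across the inputs.
combine_workbooks <- function(paths, out_path) {
  combined <- createWorkbook()
  for (path in paths) {
    for (sheet in getSheetNames(path)) {
      addWorksheet(combined, sheet)
      writeData(combined, sheet, read.xlsx(path, sheet = sheet))
    }
  }
  saveWorkbook(combined, out_path, overwrite = TRUE)
  out_path
}

# Usage with made-up data in a temporary folder.
tmp <- tempfile("tests_")
dir.create(tmp)
paths <- c(
  write_single_test(data.frame(measure = "n_rows", value = 10), "sds", "2425", tmp),
  write_single_test(data.frame(measure = "n_rows", value = 20), "ae", "2425", tmp)
)
combined_path <- combine_workbooks(paths, file.path(tmp, "2425_extract_tests.xlsx"))
```

Because each worker owns its output file, the writes can run in parallel; only the final combine step needs to run once all per-test files exist.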

A thorough test run has been done and completed without any problems, so this is ready for review and then merge.

@lizihao-anu lizihao-anu requested a review from Jennit07 December 24, 2024 10:49

@check-spelling-bot Report

🔴 Please review

See the 📂 files view, the 📜action log, or 📝 job summary for details.

Unrecognized words (2)

processs
SPSS

These words are not needed and should be removed: anomymous, datas, scoial, spss

To accept these unrecognized words as correct and remove the previously acknowledged and now absent words, you could run the following commands

... in a clone of the git@github.com:Public-Health-Scotland/source-linkage-files.git repository
on the 1031-split-tests branch (ℹ️ how do I use this?):

curl -s -S -L 'https://raw.githubusercontent.com/check-spelling/check-spelling/main/apply.pl' |
perl - 'https://github.com/Public-Health-Scotland/source-linkage-files/actions/runs/12480701565/attempts/1'

OR

To have the bot accept them for you, comment in the PR quoting the following line:
@check-spelling-bot apply updates.

Available 📚 dictionaries could cover words (expected and unrecognized) not in the 📘 dictionary

This includes both expected items (313) from .github/actions/spelling/expect.txt and unrecognized words (2)

| Dictionary | Entries | Covers | Uniquely |
| --- | --- | --- | --- |
| cspell:fullstack/dict/fullstack.txt | 419 | 3 | 3 |
| cspell:k8s/dict/k8s.txt | 153 | 4 | 1 |
| cspell:php/dict/php.txt | 1689 | 4 | |
| cspell:node/dict/node.txt | 891 | 3 | 1 |
| cspell:npm/dict/npm.txt | 302 | 3 | |

Consider adding them (in .github/workflows/spelling.yml) in jobs:/spelling: for uses: check-spelling/check-spelling@main in its with:

      with:
        extra_dictionaries: |
          cspell:fullstack/dict/fullstack.txt
          cspell:k8s/dict/k8s.txt
          cspell:php/dict/php.txt
          cspell:node/dict/node.txt
          cspell:npm/dict/npm.txt

To stop checking additional dictionaries, add (in .github/workflows/spelling.yml) for uses: check-spelling/check-spelling@main in its with:

check_extra_dictionaries: ''
Errors (1)

See the 📂 files view, the 📜action log, or 📝 job summary for details.

❌ Errors Count
❌ ignored-expect-variant 3

See ❌ Event descriptions for more information.

If the flagged items are 🤯 false positives

If items relate to a ...

  • binary file (or some other file you wouldn't want to check at all).

    Please add a file path to the excludes.txt file matching the containing file.

    File paths are Perl 5 Regular Expressions - you can test yours before committing to verify it will match your files.

    ^ refers to the file's path from the root of the repository, so ^README\.md$ would exclude README.md (on whichever branch you're using).

  • well-formed pattern.

    If you can write a pattern that would match it,
    try adding it to the patterns.txt file.

    Patterns are Perl 5 Regular Expressions - you can test yours before committing to verify it will match your lines.

    Note that patterns can't match multiline strings.

Comment on lines +10 to +27
    if (check_year_valid(year, "sds")) {
      data <- data %>%
        slfhelper::get_chi()

      old_data <- get_existing_data_for_tests(data)

      data <- rename_hscp(data)

      comparison <- produce_test_comparison(
        old_data = produce_source_sds_tests(old_data),
        new_data = produce_source_sds_tests(data)
      ) %>%
        write_tests_xlsx(sheet_name = "sds", year, workbook_name = "extract")

      return(comparison)
    } else {
      return(NULL)
    }
Collaborator

Agree with this. Looks fine

Jennit07 (Collaborator) commented Jan 8, 2025

Hi Zihao, I have had a look through your changes. I think we can have a discussion about this when you are back from leave. I have a few questions:

  • When you say "the targets script has been restored to run the tests in parallel, and there is no longer a problem of writing tests into one file simultaneously", I'm a bit confused about how we should be running the tests. Is this still in targets, but processing each extract separately? When would we combine them?
  • I am wondering what happens to each individual workbook once the tests are combined. Do we remove the individual workbooks? I am not sure how much space they will take up, although I know these are small files. However, I think we should have some structure/organisation in place, as we are writing multiple tests each update.
  • In the final combined tests output, the formatting appears to be missing, even though the individual extracts do have the formatting in place. I'm not sure where this is lost.
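
For the last two questions, here is a rough sketch of one possible approach; it is not taken from this branch. It rests on an unconfirmed assumption: if the combine step reads cell values into a data frame (for example with openxlsx::read.xlsx) and writes them to a new workbook, cell styles would not carry over and would need to be re-applied. The helper name restyle_and_tidy, the bold header style, and the file names are hypothetical.

```r
library(openxlsx)

# Hypothetical helper: re-apply a simple header style to every sheet of a
# combined workbook, then delete the per-test workbooks it was built from.
restyle_and_tidy <- function(combined_path, individual_paths = character(0)) {
  wb <- loadWorkbook(combined_path)
  header_style <- createStyle(textDecoration = "bold")
  for (sheet in getSheetNames(combined_path)) {
    data <- read.xlsx(combined_path, sheet = sheet)
    # Skip sheets with no data, since there is no header row to style.
    if (is.null(data) || ncol(data) == 0) next
    cols <- seq_len(ncol(data))
    addStyle(wb, sheet, header_style, rows = 1, cols = cols, gridExpand = TRUE)
    setColWidths(wb, sheet, cols = cols, widths = "auto")
  }
  saveWorkbook(wb, combined_path, overwrite = TRUE)
  unlink(individual_paths) # remove the individual workbooks after combining
  invisible(combined_path)
}

# Usage with made-up data in a temporary folder.
tmp <- tempfile("tests_")
dir.create(tmp)
single <- file.path(tmp, "2425_sds_tests.xlsx")
combined <- file.path(tmp, "2425_extract_tests.xlsx")
write.xlsx(data.frame(measure = "n_rows", value = 10), single, sheetName = "sds")
file.copy(single, combined)
restyle_and_tidy(combined, individual_paths = single)
```

Whether to delete the individual workbooks or keep them in a dedicated subfolder is a design choice for the discussion; the unlink() call is only one option.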

Happy to have a discussion about this when you are back. Thanks :)
