Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Request up to 5 principal components in PCA process #1404

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

jkokosar
Copy link
Member

Checklist

  • Update CHANGELOG.rst for each commit separately:
    • Pay attention to write entries under the "Unreleased" section.
    • Mark all breaking changes as "BACKWARD INCOMPATIBLE:" and put them
      before non-breaking changes.
    • If a commit modifies a feature listed under "Unreleased" section,
      it might be sufficient to modify the existing CHANGELOG entry from previous
      commit(s).
  • Bump the process version:
    • MAJOR version (first number): Backward incompatible changes (changes
      that break the api/interface). Examples: renaming the input/output, adding
      mandatory input, removing input/output...
    • MINOR version (middle number): add functionality or changes in a
      backwards-compatible manner. Examples: add output field, add non-mandatory
      input parameter, use a different tool that produces same results...
    • PATCH version (last number): changes/bug fixes that do not affect
      the api/interface. Examples: typo fix, change/add warning messages...
  • All inputs are used in process.
  • All output fields have a value assigned to them.

Additional guidelines

Processes

  • Code that was once accepted should be kept like it is. It should be changed
    only when its functionality has changed. Small parts may be changed
    to fix bugs (by changing code only to make it nicer you might introduce a bug).
  • Set the optimal number of cores that the tool can use (check if parallelization
    can be performed).
  • Description of the tool should be self explanatory with links to the tool's
    homepage and publication.
  • Write explanatory input labels and add short command line parameter names
    in square brackets (for example: Maximum length [--maxlength]).
  • Where possible provide default values for the process input fields.
    If they are provided, then required = False is not needed.
  • Add descriptive error and warning messages.
  • Use Cmd() for each call in the process (call of a tool, a
    Bash command, a Python script or an R script) and handle return codes.
  • Add self.progress() calls where appropriate.

Commit and PR messages

  • Each commit should be minimal (i.e. 1 change) and self-contained (including
    tests).
  • Mark incomplete PRs with the [WIP] (Work In Progress) tag.
  • Do not paste links to private repositories from the public one.

Tests

  • Add meaningful names of test files:
    • Genomes: genome_species.fa.gz
    • Annotations: annotation_species.gtf.gz
    • Adapters: adapters-name/type-source.
  • Specific files should be saved in a folder with the name of the tested tool.
    This folder should include separate subfolders for inputs and outputs.
  • Keep test files' size small. If there is no other option, add the large
    test file to the files/large directory so it will be tracked with Git LFS.
  • Pay attention to edge cases: what will happen if any data is
    missing, if the length of some list is 0, if a string is given where one
    expects an integer, if a file is empty, ...
  • When testing workflows, try to verify workflow success by checking one
    of the last steps. Pick the one that is most likely to fail. Also test
    that all steps have 'OK' status.

Copy link
Contributor

@marcellevstek marcellevstek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I wonder why there is an error on GHA.

@jkokosar jkokosar force-pushed the extend_pca_components branch 10 times, most recently from 5cea1ca to 5e342b4 Compare December 19, 2024 14:49
@jkokosar jkokosar force-pushed the extend_pca_components branch from 5e342b4 to a0736fb Compare December 19, 2024 14:50
@jkokosar jkokosar requested a review from gregorjerse December 19, 2024 14:51
@jkokosar
Copy link
Member Author

@marcellevstek please re-review. I had to modify what is being tested. In this test case, 4 PCA components are computed. On GHA, the last component (with the least amount of explained variance) differs from the value returned by the locally triggered tests. I've tried modifying the value of svd_solver from "auto" to "full", but it did't help to overcome the discrepancy. The same goes for fixing the value of "random_state" parameter to a fixed value.

@marcellevstek
Copy link
Contributor

@marcellevstek please re-review. I had to modify what is being tested. In this test case, 4 PCA components are computed. On GHA, the last component (with the least amount of explained variance) differs from the value returned by the locally triggered tests. I've tried modifying the value of svd_solver from "auto" to "full", but it did't help to overcome the discrepancy. The same goes for fixing the value of "random_state" parameter to a fixed value.

LGTM from the PR point of view.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants