Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor TopmedPipeline.py Cluster classes #55

Open
amstilp opened this issue Mar 20, 2023 · 0 comments
Open

Refactor TopmedPipeline.py Cluster classes #55

amstilp opened this issue Mar 20, 2023 · 0 comments

Comments

@amstilp
Copy link
Contributor

amstilp commented Mar 20, 2023

These changes could make it easier and more straightforward to subclass Cluster for your own specific cluster instance. I'm also not sure that it would work with all of the existing cluster subclasses.

  • Cluster: Add methods to process the arguments passed to submitJob. Any class subclassing Cluster would be expected to override these methods. The method in the Cluster class should return a raise a NotImplementedError.
  • Cluster: Add a submitJob method to the Cluster class. This method would call the methods defined in the previous step and then finally submit the generated command to the cluster. This also defines the keywords that are allowed to be passed to submitJob (which I think are only defined in the subclass methods)?
  • Subclasses of cluster: implement the methods from the first step, specific for the cluster type.

Users would then only have to override the get_job_name, get_submit_command etc methods in their subclass, rather than overriding the entire submitJob method.

Here is a brief toy example:

class Cluster:
    def __init__(self):
        # intended to hold a list of arguments to pass to the submit command (e.g. "-N <job_name>" for SGE and "--job-name=<job_name>" for Slurm.
        self.submit_arguments = []

    def set_job_name(self):
        raise NotImplementedError("implement get_job_name")

    def get_submit_command(self):
        raise NotImplementedError("implement get_submit_command")

    def construct_command(self, submit_command):
         return self.get_submit_command + " " + " ".join(self.submit_arguments)

    def submitJob(self, job_name):
        self.set_job_name(self, job_name):
        cmd = self.construct_command()
        (jobid, status) = port_popen.popen(cmd)
        return jobid

class SGECluster(Cluster):
    def get_job_name(self, job_name):
        self.submit_arguments = "-N " + job_name

    def get_submit_command(self):
        return "qsub"

class SlurmCluster(Cluster):
    def get_job_name(self, job_name):
        self.submit_arguments = "--job-name " + job_name

    def get_submit_command(self):
        return "sbatch"

Obviously we would need to think about the exact implementation, what would be defined where, and what methods would be needed, but in the long run it would probably make it easier for a user to create a new subclass.

amstilp added a commit that referenced this issue Mar 22, 2023
runRscript.sh was written for use with SGE, using SGE-specific
environment variables. Modify the script to process either SGE or
Slurm environment variables by setting cluster-agnostic variables
at the beginning of the script based on the SGE variables or the
Slurm variables, depending on are set. Then use these cluster-agnostic
variables in the remainder of the script.

Note: it would probably be better to have the Cluster subclasses
in TopmedPipeline.py set the proper variables for the script, if
we refactored the classes following #55.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant