[Pipeline integration] Starting pipelines #4031

Open · 4 tasks

Vince-janv opened this issue Dec 18, 2024 · 3 comments · May be fixed by #4227
Comments


Vince-janv commented Dec 18, 2024

Description

The process of starting pipelines is currently extremely flexible. Most pipelines consist of the same CLI commands and the same methods to support them, but this results in a lot of code duplication. The flexibility also allows each pipeline to implement the methods slightly differently, causing maintainability issues.

Given this, we can also define rules that pipelines need to abide by in order to be implemented.
For example (sketched in code after this list):

  • The input parameters should be able to be parsed from a file
  • Sample information can be submitted in a samplesheet
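
A minimal sketch of how such rules could be codified as an abstract base class (class and method names are hypothetical, not taken from cg):

from abc import ABC, abstractmethod
from pathlib import Path


class StartablePipeline(ABC):
    """Contract every pipeline integration would have to satisfy."""

    @abstractmethod
    def parse_parameters(self, params_file: Path) -> dict:
        """Input parameters must be parseable from a file."""

    @abstractmethod
    def write_samplesheet(self, case_id: str) -> Path:
        """Sample information must be submittable as a samplesheet."""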

Additionally, we currently rely solely on cg invoking the CLI of the pipeline. For Nextflow analyses we can send requests to seqera-platform instead.

Acceptance criteria

  • One should be able to add extra flags that will be appended to the start command (see the sketch after this list)
  • New Nextflow pipelines should use the seqera-platform API instead of invoking the CLI
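
A minimal sketch of the first criterion, assuming a hypothetical build_start_command helper:

def build_start_command(template: str, config: dict, extra_flags: list[str] | None = None) -> str:
    # Render the workflow's command template, then append any caller-supplied flags verbatim.
    command = template.format(**config)
    return " ".join([command, *(extra_flags or [])])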

Notes

Example start commands:

  • BALSAMIC: /home/proj/production/bin/miniconda3/bin/conda run --name P_balsamic16.0.0 /home/proj/production/bin/miniconda3/envs/P_balsamic16.0.0/bin/balsamic run analysis --account production --mail-user [email protected] --qos high --sample-config /home/proj/production/cancer/cases/invitingswan/invitingswan.json --run-analysis --benchmark
  • BALSAMIC-UMI: /home/proj/stage/bin/miniconda3/bin/conda run --name S_balsamic16.0.0 /home/proj/stage/bin/miniconda3/envs/S_balsamic16.0.0/bin/balsamic run analysis --account development --mail-user [email protected] --qos low --sample-config /home/proj/stage/cancer/cases/solidladybird/solidladybird.json --benchmark
  • FLUFFY: /home/proj/stage/bin/miniconda3/envs/S_fluffy/bin/fluffy --config /home/proj/stage/servers/config/hasta.scilifelab.se/fluffy-automated-stage.json --sample /home/proj/stage/nipt/cases/goldentiger/SampleSheet_1604_13_Orderform_Ready_Made_Libraries_NIPT2202196.csv --project /home/proj/stage/nipt/cases/goldentiger/fastq --out /home/proj/stage/nipt/cases/goldentiger/output --analyse --batch-ref --slurm_params qos:high
  • MICROSALT: /home/proj/stage/bin/miniconda3/bin/conda run --name S_microSALT /home/proj/stage/bin/miniconda3/envs/S_microSALT/bin/microSALT analyse /home/proj/stage/microbial/queries/poeticlioness.json --input /home/proj/stage/microbial/fastq/poeticlioness
  • MIP-DNA: /home/proj/production/bin/miniconda3/bin/conda run --name P_mip12.1 /home/proj/production/bin/miniconda3/envs/P_mip12.1/bin/mip analyse rd_dna --config /home/proj/production/servers/config/hasta.scilifelab.se/mip12.1-dna.yaml engagingcrow --slurm_quality_of_service normal --email [email protected]
  • MIP-RNA: /home/proj/production/bin/miniconda3/bin/conda run --name P_mip12.1 /home/proj/production/bin/miniconda3/envs/P_mip12.1/bin/mip analyse rd_rna --config /home/proj/production/servers/config/hasta.scilifelab.se/mip12.1-rna.yaml <case> --slurm_quality_of_service normal --email [email protected]
  • MUTANT: /home/proj/stage/bin/miniconda3/bin/conda run --name S_mutant /home/proj/stage/bin/miniconda3/envs/S_mutant/bin/mutant analyse sarscov2 --config_case /home/proj/stage/mutant/cases/happysole/case_config.json --outdir /home/proj/stage/mutant/cases/happysole/results /home/proj/stage/mutant/cases/happysole/fastq
  • RNAFUSION (current): /home/proj/production/bin/tw launch --work-dir /home/proj/production/analysis/cases/popularjavelin/work --profile singularity --params-file /home/proj/production/analysis/cases/popularjavelin/popularjavelin_params_file.yaml --config /home/proj/production/analysis/cases/popularjavelin/popularjavelin_nextflow_config.json --name popularjavelin --revision 3.0.1 --compute-env hasta-high nfcore_rnafusion
  • TAXPROFILER: /home/proj/stage/bin/tw launch --work-dir /home/proj/stage/analysis/cases/epicalien/work --profile singularity,hasta --params-file /home/proj/stage/analysis/cases/epicalien/epicalien_params_file.yaml --config /home/proj/stage/analysis/cases/epicalien/epicalien_nextflow_config.json --name epicalien --revision 1.1.7 --compute-env hasta-normal nfcore_taxprofiler
  • TOMTE: /home/proj/stage/bin/tw launch --work-dir /home/proj/stage/analysis/cases/legalpeacock/work --profile singularity --params-file /home/proj/stage/analysis/cases/legalpeacock/legalpeacock_params_file.yaml --config /home/proj/stage/analysis/cases/legalpeacock/legalpeacock_nextflow_config.json --name legalpeacock --revision 3.0.0 --compute-env hasta-low gms_tomte
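
Worth noting: the conda-based commands above all share the same shape, which is what makes a template approach plausible. A rough common template (placeholder names are mine):

CONDA_RUN_TEMPLATE = (
    "{conda_binary} run --name {environment} "
    "{pipeline_binary} {subcommand} {workflow_specific_options}"
)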

Vince-janv commented Jan 27, 2025

Meeting notes 2025-01-27

  • Discussion largely centered around how parameters (which vary between pipelines) should be passed into the submitters, and how the submitters should translate them into a CLI string or a JSON request.
  • The idea was brought forward to use template strings (see cg/apps/demultiplex/sbatch.py for example).
  • The configurator would then send which workflow the submitter should start, as well as the values for the template placeholders. The submitter uses the workflow to fetch the correct template and creates its CLI command or request body.

Example of the current suggestion (I didn't bother to put in the ABC inheritance in this example):

from subprocess import run
import requests

from pydantic import BaseModel, Field

## These are the command templates for the CLIs of the different workflows
BALSAMIC_TEMPLATE = ("{balsamic_binary} run analysis --account {account} --mail-user {mail_user} --qos {qos} "
                     "--sample-config {config} --run-analysis --benchmark")

FLUFFY_TEMPLATE = "{something} {else}"


## This is the model translating the configs into the request body for the Tower API
class TomteRequestBody(BaseModel):
    input: str = Field(..., alias="sample_sheet")
    run_bwa_mem: bool = Field(..., alias="run_alignment")


## These map workflows to their respective command templates
WORKFLOW_COMMAND_MAP: dict[str, str] = {
    "balsamic": BALSAMIC_TEMPLATE,
    "fluffy": FLUFFY_TEMPLATE
}

WORKFLOW_REQUEST_MAP: dict[str, type[BaseModel]] = {
    "Tomte": TomteRequestBody,
}


## These are the configs returned by the configurators
class WorkflowConfig(BaseModel):
    workflow: str


class BalsamicConfig(WorkflowConfig):
    balsamic_binary: str
    account: str
    mail_user: str
    qos: str
    config: str


class TomteConfig(WorkflowConfig):
    run_alignment: bool
    sample_sheet: str


class SubProcessSubmitter:
    def submit(self, case_config: WorkflowConfig) -> None:
        command = WORKFLOW_COMMAND_MAP[case_config.workflow].format(**case_config.model_dump())
        print(command)
        # run(command.split(), check=True)


class TowerSubmitter:
    def submit(self, case_config: WorkflowConfig) -> None:
        tower_request = WORKFLOW_REQUEST_MAP[case_config.workflow](**case_config.model_dump())
        print(tower_request.model_dump_json())
        # requests.post("https://tower.com/api/v1/workflows", json=tower_request.model_dump())


class BalsamicConfigurator:
    def configure(self, case_id: str) -> BalsamicConfig:
        config_file = self.write_config(case_id)
        return BalsamicConfig(
            workflow="balsamic",
            balsamic_binary="/usr/bin/balsamic",
            account="stage",
            mail_user="[email protected]",
            qos="high",
            config=config_file
        )

    def write_config(self, case_id: str) -> str:
        return f"config_{case_id}.yaml"


class TomteConfigurator:
    def configure(self, case_id: str) -> TomteConfig:
        sample_sheet = self.write_sample_sheet(case_id)
        return TomteConfig(
            workflow="Tomte",
            sample_sheet=sample_sheet,
            run_alignment=True
        )

    def write_sample_sheet(self, case_id: str) -> str:
        return f"sample_sheet_{case_id}.csv"


if __name__ == '__main__':
    balsamic_case = "case_1"
    balsamic_configurator = BalsamicConfigurator()
    balsamic_config = balsamic_configurator.configure(balsamic_case)
    submitter = SubProcessSubmitter()
    submitter.submit(balsamic_config)

    tomte_case = "case_2"
    tomte_configurator = TomteConfigurator()
    tomte_config = tomte_configurator.configure(tomte_case)
    submitter = TowerSubmitter()
    submitter.submit(tomte_config)
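
For the TowerSubmitter, the commented-out request is a stand-in. Based on my reading of the public Seqera Platform (Tower) API, an actual launch call might look roughly like the sketch below; the endpoint shape, token, workspace and compute-environment ids are assumptions to verify against the current API docs:

import json
import requests

def launch_on_seqera(request_body: BaseModel) -> str:
    # All ids and the token below are placeholders, not real values.
    response = requests.post(
        "https://api.cloud.seqera.io/workflow/launch",
        headers={"Authorization": "Bearer <access-token>"},
        params={"workspaceId": "<workspace-id>"},
        json={
            "launch": {
                "computeEnvId": "<compute-env-id>",
                "pipeline": "<pipeline-repo-url>",
                "revision": "<revision>",
                "workDir": "<work-dir>",
                # The launch endpoint takes the params file content as text.
                "paramsText": json.dumps(request_body.model_dump()),
            }
        },
    )
    response.raise_for_status()
    return response.json()["workflowId"]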

Vince-janv commented

How to proceed?

Suggestion:

  • Aim to rewrite the starting logic for Raredisease and microSALT (one using tower and one using CLI)

Vince-janv commented

Draft of design
[image: design diagram]

diitaz93 linked a pull request Feb 19, 2025 that will close this issue