[Pipeline integration] Starting pipelines #4031

Open · 4 tasks

Vince-janv opened this issue Dec 18, 2024 · 3 comments · May be fixed by #4227
Comments


Vince-janv commented Dec 18, 2024

Description

The process of starting pipelines is currently extremely flexible. Most pipelines consist of the same CLI commands and the same methods to support them, but this results in a lot of code duplication. The flexibility also allows each pipeline to implement the methods slightly differently, causing maintainability issues.

Given this, we can also define rules that pipelines need to abide by in order to be implemented.
For example (sketched in code after this list):

  • The input parameters should be able to be parsed from a file
  • Sample information can be submitted in a samplesheet
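
A minimal sketch of how such rules could be codified as an abstract base class (class and method names are hypothetical, not taken from cg):

from abc import ABC, abstractmethod
from pathlib import Path


class StartablePipeline(ABC):
    """Contract every pipeline integration would have to satisfy."""

    @abstractmethod
    def parse_parameters(self, params_file: Path) -> dict:
        """Input parameters must be parseable from a file."""

    @abstractmethod
    def write_samplesheet(self, case_id: str) -> Path:
        """Sample information must be submittable as a samplesheet."""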

Additionally, we currently rely solely on cg invoking the CLI of the pipeline. For Nextflow analyses we can send requests to seqera-platform instead.

Acceptance criteria

  • One should be able to add extra flags that will be appended to the start command (see the sketch after this list)
  • New Nextflow pipelines should use the seqera-platform API instead of invoking the CLI
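
A minimal sketch of the first criterion, assuming a hypothetical build_start_command helper:

def build_start_command(template: str, config: dict, extra_flags: list[str] | None = None) -> str:
    # Render the workflow's command template, then append any caller-supplied flags verbatim.
    command = template.format(**config)
    return " ".join([command, *(extra_flags or [])])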

Notes

Example start commands:

  • BALSAMIC: /home/proj/production/bin/miniconda3/bin/conda run --name P_balsamic16.0.0 /home/proj/production/bin/miniconda3/envs/P_balsamic16.0.0/bin/balsamic run analysis --account production --mail-user [email protected] --qos high --sample-config /home/proj/production/cancer/cases/invitingswan/invitingswan.json --run-analysis --benchmark
  • BALSAMIC-UMI: /home/proj/stage/bin/miniconda3/bin/conda run --name S_balsamic16.0.0 /home/proj/stage/bin/miniconda3/envs/S_balsamic16.0.0/bin/balsamic run analysis --account development --mail-user [email protected] --qos low --sample-config /home/proj/stage/cancer/cases/solidladybird/solidladybird.json --benchmark
  • FLUFFY: /home/proj/stage/bin/miniconda3/envs/S_fluffy/bin/fluffy --config /home/proj/stage/servers/config/hasta.scilifelab.se/fluffy-automated-stage.json --sample /home/proj/stage/nipt/cases/goldentiger/SampleSheet_1604_13_Orderform_Ready_Made_Libraries_NIPT2202196.csv --project /home/proj/stage/nipt/cases/goldentiger/fastq --out /home/proj/stage/nipt/cases/goldentiger/output --analyse --batch-ref --slurm_params qos:high
  • MICROSALT: /home/proj/stage/bin/miniconda3/bin/conda run --name S_microSALT /home/proj/stage/bin/miniconda3/envs/S_microSALT/bin/microSALT analyse /home/proj/stage/microbial/queries/poeticlioness.json --input /home/proj/stage/microbial/fastq/poeticlioness
  • MIP-DNA: /home/proj/production/bin/miniconda3/bin/conda run --name P_mip12.1 /home/proj/production/bin/miniconda3/envs/P_mip12.1/bin/mip analyse rd_dna --config /home/proj/production/servers/config/hasta.scilifelab.se/mip12.1-dna.yaml engagingcrow --slurm_quality_of_service normal --email [email protected]
  • MIP-RNA: /home/proj/production/bin/miniconda3/bin/conda run --name P_mip12.1 /home/proj/production/bin/miniconda3/envs/P_mip12.1/bin/mip analyse rd_rna --config /home/proj/production/servers/config/hasta.scilifelab.se/mip12.1-rna.yaml <case> --slurm_quality_of_service normal --email [email protected]
  • MUTANT: /home/proj/stage/bin/miniconda3/bin/conda run --name S_mutant /home/proj/stage/bin/miniconda3/envs/S_mutant/bin/mutant analyse sarscov2 --config_case /home/proj/stage/mutant/cases/happysole/case_config.json --outdir /home/proj/stage/mutant/cases/happysole/results /home/proj/stage/mutant/cases/happysole/fastq
  • RNAFUSION (current): /home/proj/production/bin/tw launch --work-dir /home/proj/production/analysis/cases/popularjavelin/work --profile singularity --params-file /home/proj/production/analysis/cases/popularjavelin/popularjavelin_params_file.yaml --config /home/proj/production/analysis/cases/popularjavelin/popularjavelin_nextflow_config.json --name popularjavelin --revision 3.0.1 --compute-env hasta-high nfcore_rnafusion
  • TAXPROFILER: /home/proj/stage/bin/tw launch --work-dir /home/proj/stage/analysis/cases/epicalien/work --profile singularity,hasta --params-file /home/proj/stage/analysis/cases/epicalien/epicalien_params_file.yaml --config /home/proj/stage/analysis/cases/epicalien/epicalien_nextflow_config.json --name epicalien --revision 1.1.7 --compute-env hasta-normal nfcore_taxprofiler
  • TOMTE: /home/proj/stage/bin/tw launch --work-dir /home/proj/stage/analysis/cases/legalpeacock/work --profile singularity --params-file /home/proj/stage/analysis/cases/legalpeacock/legalpeacock_params_file.yaml --config /home/proj/stage/analysis/cases/legalpeacock/legalpeacock_nextflow_config.json --name legalpeacock --revision 3.0.0 --compute-env hasta-low gms_tomte
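
Worth noting: the conda-based commands above all share the same shape, which is what makes a template approach plausible. A rough common template (placeholder names are mine):

CONDA_RUN_TEMPLATE = (
    "{conda_binary} run --name {environment} "
    "{pipeline_binary} {subcommand} {workflow_specific_options}"
)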

Vince-janv commented Jan 27, 2025

Meeting notes 2025-01-27

  • Discussion largely centered around how parameters (which vary between pipelines) should be passed into the submitters, and how the submitters should translate them into a CLI string or a JSON request.
  • The idea was brought forward to use template strings (see cg/apps/demultiplex/sbatch.py for example).
  • The configurator would then send which workflow the submitter should start, as well as the values for the template placeholders. The submitter uses the workflow to fetch the correct template and creates its CLI command or request body.

Example of the current suggestion (I didn't bother to put in the ABC inheritance in this example):

from subprocess import run
import requests

from pydantic import BaseModel, Field

## These are the command templates for the CLIs of the different workflows
BALSAMIC_TEMPLATE = ("{balsamic_binary} run analysis --account {account} --mail-user {mail_user} --qos {qos} "
                     "--sample-config {config} --run-analysis --benchmark")

FLUFFY_TEMPLATE = "{something} {else}"


## This is the model translating the configs into the request body for the Tower API
class TomteRequestBody(BaseModel):
    input: str = Field(..., alias="sample_sheet")
    run_bwa_mem: bool = Field(..., alias="run_alignment")


## These map workflows to their respective command templates
WORKFLOW_COMMAND_MAP: dict[str, str] = {
    "balsamic": BALSAMIC_TEMPLATE,
    "fluffy": FLUFFY_TEMPLATE
}

WORKFLOW_REQUEST_MAP: dict[str, type[BaseModel]] = {
    "Tomte": TomteRequestBody,
}


## These are the configs returned by the configurators
class WorkflowConfig(BaseModel):
    workflow: str


class BalsamicConfig(WorkflowConfig):
    balsamic_binary: str
    account: str
    mail_user: str
    qos: str
    config: str


class TomteConfig(WorkflowConfig):
    run_alignment: bool
    sample_sheet: str


class SubProcessSubmitter:
    def submit(self, case_config: WorkflowConfig) -> None:
        command = WORKFLOW_COMMAND_MAP[case_config.workflow].format(**case_config.model_dump())
        print(command)
        # run(command.split(), check=True)


class TowerSubmitter:
    def submit(self, case_config: WorkflowConfig) -> None:
        tower_request = WORKFLOW_REQUEST_MAP[case_config.workflow](**case_config.model_dump())
        print(tower_request.model_dump_json())
        # requests.post("https://tower.com/api/v1/workflows", json=tower_request.model_dump())


class BalsamicConfigurator:
    def configure(self, case_id: str) -> BalsamicConfig:
        config_file = self.write_config(case_id)
        return BalsamicConfig(
            workflow="balsamic",
            balsamic_binary="/usr/bin/balsamic",
            account="stage",
            mail_user="[email protected]",
            qos="high",
            config=config_file
        )

    def write_config(self, case_id: str) -> str:
        return f"config_{case_id}.yaml"


class TomteConfigurator:
    def configure(self, case_id: str) -> TomteConfig:
        sample_sheet = self.write_sample_sheet(case_id)
        return TomteConfig(
            workflow="Tomte",
            sample_sheet=sample_sheet,
            run_alignment=True
        )

    def write_sample_sheet(self, case_id: str) -> str:
        return f"sample_sheet_{case_id}.csv"


if __name__ == '__main__':
    balsamic_case = "case_1"
    balsamic_configurator = BalsamicConfigurator()
    balsamic_config = balsamic_configurator.configure(balsamic_case)
    submitter = SubProcessSubmitter()
    submitter.submit(balsamic_config)

    tomte_case = "case_2"
    tomte_configurator = TomteConfigurator()
    tomte_config = tomte_configurator.configure(tomte_case)
    submitter = TowerSubmitter()
    submitter.submit(tomte_config)
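
For the TowerSubmitter, the commented-out request is a stand-in. Based on my reading of the public Seqera Platform (Tower) API, an actual launch call might look roughly like the sketch below; the endpoint shape, token, workspace and compute-environment ids are assumptions to verify against the current API docs:

import json
import requests

def launch_on_seqera(request_body: BaseModel) -> str:
    # All ids and the token below are placeholders, not real values.
    response = requests.post(
        "https://api.cloud.seqera.io/workflow/launch",
        headers={"Authorization": "Bearer <access-token>"},
        params={"workspaceId": "<workspace-id>"},
        json={
            "launch": {
                "computeEnvId": "<compute-env-id>",
                "pipeline": "<pipeline-repo-url>",
                "revision": "<revision>",
                "workDir": "<work-dir>",
                # The launch endpoint takes the params file content as text.
                "paramsText": json.dumps(request_body.model_dump()),
            }
        },
    )
    response.raise_for_status()
    return response.json()["workflowId"]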

Vince-janv commented

How to proceed?

Suggestion:

  • Aim to rewrite the starting logic for Raredisease and microSALT (one using tower and one using CLI)

Vince-janv commented

Draft of design
[image: design diagram]

diitaz93 linked a pull request Feb 19, 2025 that will close this issue