Add toolsheet-related implementations #443

Open · wants to merge 37 commits into dev

Conversation

@suzannejin commented Mar 6, 2025

Changes made to enable the usage of toolsheets:

  • Add predefined toolsheets -> they define the combinations of tools allowed for each study type
  • Add toolsheet_custom flag -> needed later for benchmark functionalities
  • Merge the gsea_run and gprofiler2_run flags into a single functional_method flag -> more consistent with the ch_tools behaviour
  • Update report rmd file to handle differential_method and functional_method flags
  • Add code to parse ch_tools from toolsheets
  • Handle the results channels inside the workflow in a way that it considers the tools used
  • Make ch_tools args accessible to modules.config through meta.args (a minimal sketch follows this list)
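
To illustrate that last point, here is a minimal sketch of how a per-module configuration might pick the toolsheet-derived arguments out of the meta map. The process selector and the flat key/value shape of `meta.args` are assumptions made for illustration, not the actual code in this PR:

```groovy
// modules.config (sketch): resolve ext.args per task from the meta map.
// 'DESEQ2_DIFFERENTIAL' and the flat meta.args map are illustrative only;
// the PR itself may carry these values under keys like diff_args / func_args.
process {
    withName: 'DESEQ2_DIFFERENTIAL' {
        ext.args = { meta.args ? meta.args.collect { k, v -> "--${k} ${v}" }.join(' ') : '' }
    }
}
```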

Nice to have, but left for next PRs (to not extend this one too much):

  • Replace the checks on params.functional_method, etc. with checks on ch_tools
  • Add checks for the analysis_name flag
  • Remove tool-specific fixed params (e.g. differential_file_suffix, etc.) from the user params scope (having too many params is confusing) and have them defined inside the workflow based on the chosen method.
  • Add tests for gprofiler2 runs
  • Clean up some redundant checks/code/tests related to study_type vs tool combinations - previously these combinations had to be checked explicitly, but now the predefined toolsheets dictate which combinations can go together.

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/differentialabundance branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nf-test test main.nf.test -profile test,docker).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@suzannejin suzannejin changed the base branch from master to dev March 6, 2025 16:57
@nf-core nf-core deleted a comment from github-actions bot Mar 6, 2025

github-actions bot commented Mar 6, 2025

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit bec9609

  • ✅ 310 tests passed
  • ❔ 6 tests were ignored
  • ❗ 6 tests had warnings

❗ Test warnings:

  • pipeline_todos - TODO string in nextflow.config: Update the field with the details of the contributors to your pipeline. New with Nextflow version 24.10.0
  • pipeline_todos - TODO string in ro-crate-metadata.json: the "description" field still contains the unedited nf-core template README (introduction placeholder, samplesheet/usage example, credits and citation placeholders)
  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
  • pipeline_todos - TODO string in base.config: Check the defaults for all processes


Run details

  • nf-core/tools version 3.2.0
  • Run at 2025-03-20 10:15:48

@suzannejin suzannejin changed the title Add toolsheet Add toolsheet-related implementations Mar 18, 2025
@pinin4fjords (Member) commented:

Thanks @suzannejin - another big one. Will review thoroughly when I can, just give me some time :-)

@suzannejin (Author) commented Mar 20, 2025

> Thanks @suzannejin - another big one. Will review thoroughly when I can, just give me some time :-)

Unfortunately, many parts of the pipeline had to be changed to properly use the toolsheet args, otherwise odd behaviours arise...

However, if it makes the review process easier for you, I can split off a sub-PR with the changes related to params.differential_method and params.functional_method, which are more independent from the rest.

@suzannejin suzannejin marked this pull request as ready for review March 20, 2025 11:57
@pinin4fjords (Member) commented:

> Thanks @suzannejin - another big one. Will review thoroughly when I can, just give me some time :-)
>
> Unfortunately, many parts of the pipeline had to be changed to properly use the toolsheet args, otherwise odd behaviours arise...
>
> However, if it makes the review process easier for you, I can split off a sub-PR with the changes related to params.differential_method and params.functional_method, which are more independent from the rest.

That's OK, I just need some dedicated time with it to understand and look for simplifications, as in the previous PRs.

@pinin4fjords (Member) left a comment:
Scheduling some time in hackathon hours next week. Just putting a block here so others know I really want a hand in this one :-)

@@ -224,6 +224,16 @@ To override the above options, you may also supply your own features table as a

By default, if you don't provide features, for non-array data the workflow will fall back to attempting to use the matrix itself as a source of feature annotations. For this to work you must make sure to set the `features_id_col`, `features_name_col` and `features_metadata_cols` parameters to the appropriate values, for example by setting them to 'gene_id' if that is the identifier column on the matrix. This will cause the gene ID to be used everywhere rather than more accessible gene symbols (as can be derived from the GTF), but the workflow should run. Please use this option for MaxQuant analysis, i.e. do not provide features.
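
As a purely illustrative sketch of that fallback, assuming 'gene_id' is the identifier column of your matrix, the relevant parameters could be set like this:

```groovy
// Custom config sketch: point all feature-annotation columns at the matrix ID column
params {
    features_id_col        = 'gene_id'
    features_name_col      = 'gene_id'
    features_metadata_cols = 'gene_id'
}
```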

## Toolsheet

We provide a set of toolsheet files that define the tools that make sense to run for a given study type. These files are in the `assets` directory of the pipeline.
Review comment (Member):
It would be good to add a link to the directory on github



Each row defines a combination of differential analysis tool and functional analysis tool (optional), with the respective arguments. Note that the arguments defined in the toolsheet have highest priority, meaning that they will overwrite any other arguments defined in the command line or in the configuration files.
Review comment (Member):
Suggested change:
- Each row defines a combination of differential analysis tool and functional analysis tool (optional), with the respective arguments. Note that the arguments defined in the toolsheet have highest priority, meaning that they will overwrite any other arguments defined in the command line or in the configuration files.
+ Each row defines a combination of differential analysis tool and functional analysis tool (optional), with the respective arguments.
+ > [!WARNING]
+ > Note that the arguments defined in the toolsheet have highest priority, meaning that they will overwrite any other arguments defined in the command line or in the configuration files.
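
For reference, a toolsheet row could look roughly like the sketch below. The column names and the argument syntax are guesses pieced together from field names visible elsewhere in this PR (analysis_name, diff_method, diff_args and their functional counterparts); the real schema in assets/schema_tools.json may differ:

```csv
analysis_name,diff_method,diff_args,func_method,func_args
rnaseq_deseq2_gsea,deseq2,--alpha 0.05,gsea,--permutations 1000
rnaseq_deseq2_only,deseq2,,,
```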

@@ -90,9 +90,13 @@ params {
exploratory_log2_assays = 'raw,normalised'
exploratory_palette_name = 'Set1'

// Tools options
analysis_name = null
toolsheet_custom = null
Review comment (Member):
Could this be named just toolsheet? I think that is more intuitive. Or do you think it would be confusing?
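
Whatever the flag ends up being called, passing a user-supplied sheet would presumably look something like this (illustrative values only, using the parameter names from the diff above):

```groovy
// Custom config sketch: use a user-defined toolsheet instead of the predefined one
params {
    study_type       = 'rnaseq'
    toolsheet_custom = 'my_toolsheet.csv'
}
```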

} else if (params.study_type == 'maxquant') {
ch_toolsheet = Channel.fromList(samplesheetToList("${projectDir}/assets/toolsheet_maxquant.csv", "${projectDir}/assets/schema_tools.json"))
} else {
error("Please make sure to mention the correct study_type")
Review comment (Member):
Suggested change:
- error("Please make sure to mention the correct study_type")
+ error("Please make sure to mention the correct study_type. The available options are: 'rnaseq', 'affy_array', 'geo_soft_file' or 'maxquant'")

def meta = [
analysis_name: it[0].analysis_name,
diff_method : it[0].diff_method,
diff_args : getParams('differential', it[0].diff_method) + parseArgs(it[0].diff_args) + [differential_method: it[0].diff_method],
Review comment (Member):
Could you explain what getParams is used for, and why you pass the string 'differential' or 'functional'?
It's also not clear to me why you are adding the [differential_method: it[0].diff_method] map to the arguments.
Thanks!

// args_functional as a map to stay consistent. We also remove null values
// from files list.
def meta = (it[0].method_functional) ? it[0] : it[0] + [args_functional: [:]]
[it[0], it.tail().grep().flatten()]
Review comment (Member):
Could you add a comment describing this channel, like you did with the ones before?

def pattern_with_tools = params_pattern + "|${meta.method_differential}" + (meta.method_functional ? "|functional|${meta.method_functional}" : "")
// return params for report
params_with_tools.findAll{ k,v -> k.matches(~/(${pattern_with_tools}).*/) } +
[report_file_names, files.collect{ f -> f.name}].transpose().collectEntries()
Review comment (Member):
Please add a description of the channel.
