make trimming input reliant on fq lint output #1514

Open
dmalzl opened this issue Mar 17, 2025 · 2 comments · May be fixed by #1523

Comments

@dmalzl

dmalzl commented Mar 17, 2025

Description of feature

First of all, thanks for the great work. It is really a breeze working with this pipeline and having so much QC and other perks like the auto strandedness at hand. However, I recently found a missing piece that I think would come in very handy in situations where one processes many samples downloaded from a public repository (like I currently do).

The problem:
So my main workflow nowadays is (i) generate a sample list, (ii) download with fetchngs (sratools), (iii) process data with a modified version of rnaseq (I only wanted the auto strandedness and did my own alignment processing). Especially with public repositories and sratools, there sometimes seem to be non-terminating exceptions: the fetchngs process appears to complete normally, but the files are actually missing some reads or are otherwise broken (I encountered a bunch of different situations). This subsequently causes problems when processing with rnaseq, where either FQ_LINT or trimming fails because read files (especially paired ones) don't match. At first I thought this could be easily solved by just ignoring FQ_LINT errors, which would safeguard trimming and everything downstream, because in my mind the data flow was FQ_LINT -> TRIM -> everything else. Unfortunately, I found that FQ_LINT is not feeding into the trimming processes, which results in trimming errors on the same samples that also fail linting.

Expected behaviour:
Ignoring linting errors should safeguard trimming by simply skipping all samples that fail linting.

Observed behaviour:
Linting and trimming are completely independent processes fed from the same channel, so even though linting fails, trimming still commences on the same sample.

Solution:
My solution for this is simply to join the output of FQ_LINT with the input channel of the trimming stage like so (file subworkflows/nf-core/fastq_qc_trim_filter_setstrandedness/main.nf):

  ch_filtered_reads
      .join ( FQ_LINT.out.lint )
      .map { it[0..-2] }
      .set { ch_linted_reads }

This construct basically filters out all samples that fail linting and prevents trimming errors later on. I personally find this to be the correct behaviour, and I think it would help other people dealing with failing pipelines when processing a large number of samples.
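
To illustrate why the join acts as a filter, here is a rough plain-Python model of the channel logic. It is not Nextflow code: the sample names, file names and the `join_by_key` helper are made up for illustration, and it assumes that a sample whose lint task fails (with the error ignored) simply emits nothing on the lint output.

```python
# Plain-Python model of the channel logic above; this is NOT Nextflow code.
# Sample names, file names and join_by_key are hypothetical.

def join_by_key(left, right):
    """Inner join on the first element of each tuple, roughly what a
    default channel join on the first item does: unmatched items are dropped."""
    right_by_key = {item[0]: item[1:] for item in right}
    return [item + right_by_key[item[0]] for item in left if item[0] in right_by_key]

# Input to trimming: (sample id, paired read files)
ch_filtered_reads = [
    ("sample_A", ("A_1.fastq.gz", "A_2.fastq.gz")),
    ("sample_B", ("B_1.fastq.gz", "B_2.fastq.gz")),
]

# Assumption: sample_B failed linting and the error was ignored,
# so only sample_A has a lint output.
fq_lint_out = [("sample_A", "sample_A.fq_lint.txt")]

# Join, then drop the trailing lint file (the equivalent of it[0..-2]).
joined = join_by_key(ch_filtered_reads, fq_lint_out)
ch_linted_reads = [item[:-1] for item in joined]

print(ch_linted_reads)
```

Only sample_A survives the join, so in this model trimming would never see the sample that failed linting.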

@dmalzl
Author

dmalzl commented Mar 18, 2025

Just saw that this is actually already in the code but just not used. So maybe someone forgot to use the right channel downstream.

@pinin4fjords
Member

pinin4fjords commented Mar 21, 2025

Yes, you're correct, it's just a bad channel usage (by me). Here's the fix in nf-core/modules. This will need merging and then updating in the workflow: nf-core/modules#7881

@pinin4fjords pinin4fjords linked a pull request Mar 21, 2025 that will close this issue