make trimming input reliant on fq lint output #1514

dmalzl · 2025-03-17T15:25:55Z

Description of feature

First of all, thanks for the great work. It is really a breeze working with this pipeline and having so much QC and other perks like the auto strandedness at hand. However, I recently found a missing piece that I think would come in very handy when processing many samples downloaded from a public repository (as I currently do).

The problem:
So my main workflow nowadays is (i) generate a sample list, (ii) download with fetchngs (sratools), (iii) process the data with a modified version of rnaseq (I only wanted the auto strandedness and did my own alignment processing). Especially with public repositories and sratools, non-terminating exceptions sometimes occur: the fetchngs process seems to complete normally, but the files are actually missing some reads or are otherwise incomplete (I encountered a bunch of different situations). This then causes problems when processing with rnaseq, where either FQ_LINT or trimming fails because the read files (especially paired ones) don't match. At first I thought this could easily be solved by ignoring FQ_LINT errors, which would safeguard trimming and everything downstream, because in my mind the data flow was FQ_LINT -> TRIM -> everything else. Unfortunately, I found that FQ_LINT does not feed into the trimming processes, so trimming fails on the same samples that also fail linting.

Expected behaviour:
Ignoring linting errors should safeguard trimming by simply skipping all samples that fail linting.

Observed behaviour:
Linting and trimming are completely independent processes fed from the same channel, so even when linting fails, trimming still starts on the same sample.

Solution:
My solution for this is simply to join the output of FQ_LINT with the input channel of the trimming stage, like so (in subworkflows/nf-core/fastq_qc_trim_filter_setstrandedness/main.nf):

  ch_filtered_reads
      // keep only samples that also emitted a lint output
      .join ( FQ_LINT.out.lint )
      // drop the appended lint element again
      .map { it[0..-2] }
      .set { ch_linted_reads }

This construct filters out all samples that fail linting and so prevents trimming errors later on. I personally find this to be the correct behaviour, and I think it would help other people whose pipelines fail when processing a large number of samples.
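To illustrate why the join acts as a filter, here is a minimal standalone sketch with toy channels. The sample ids and file names are made up, and `ch_lint` is only a stand-in for `FQ_LINT.out.lint`, assuming that a sample whose linting error was ignored emits nothing on that channel. By default `join` matches on the first element and drops items without a partner, so only linted samples survive.

```groovy
// Toy example, not the pipeline code: all ids and file names are hypothetical.
workflow {
    // Stand-in for the reads channel: [ meta, reads ]
    ch_filtered_reads = Channel.of(
        [ [id: 'sample_ok'],  ['ok_1.fastq.gz',  'ok_2.fastq.gz']  ],
        [ [id: 'sample_bad'], ['bad_1.fastq.gz', 'bad_2.fastq.gz'] ]
    )

    // Stand-in for FQ_LINT.out.lint: [ meta, lint report ]
    // sample_bad is absent, as if its lint task had failed and been ignored.
    ch_lint = Channel.of(
        [ [id: 'sample_ok'], 'sample_ok.fq_lint.txt' ]
    )

    ch_filtered_reads
        .join ( ch_lint )       // inner join on meta: unmatched samples are dropped
        .map { it[0..-2] }      // strip the lint report, back to [ meta, reads ]
        .set { ch_linted_reads }

    // Only sample_ok should be emitted here
    ch_linted_reads.view()
}
```

Saved as, say, `join_demo.nf` and run with `nextflow run join_demo.nf`, this should only pass `sample_ok` on to whatever consumes `ch_linted_reads`.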


dmalzl · 2025-03-18T09:37:52Z

Just saw that this is actually already in the code but just not used. So maybe someone forgot to use the right channel downstream.

pinin4fjords · 2025-03-21T14:37:54Z

Yes, you're correct, it's just a bad channel usage (by me). Here's the fix in nf-core/modules. This will need merging and then updating in the workflow: nf-core/modules#7881

dmalzl added the enhancement label Mar 17, 2025

pinin4fjords linked a pull request Mar 21, 2025 that will close this issue

Update preprocessing subworkflow to fix linting block on trimming #1523

Open

11 tasks
