First of all, thanks for the great work. It is really a breeze working with this pipeline and having so much QC and other perks like the auto strandedness at hand. However, I recently found a missing piece that I think would come in very handy when one processes many samples downloaded from a public repository (as I currently do).
The problem:
So my main workflow nowadays is (i) generate a sample list, (ii) download with fetchngs (sratools), (iii) process the data with a modified version of rnaseq (I only wanted the auto strandedness and did my own alignment processing). Especially with public repositories and sratools, it sometimes happens that non-terminating exceptions make the fetchngs process appear to complete normally while the files are actually incomplete, for example missing some reads (I encountered a bunch of different situations). This subsequently causes problems in rnaseq, where either FQ_LINT or trimming fails because the read files (especially paired ones) don't match. At first I thought this could easily be solved by just ignoring FQ_LINT errors, which would safeguard trimming and everything downstream, because in my mind the data flow was FQ_LINT -> TRIM -> everything else. Unfortunately, I found that FQ_LINT does not feed into the trimming processes, which results in trimming errors on the same samples that also fail linting.
Expected behaviour:
Ignoring linting errors should safeguard trimming by simply dropping all samples that fail linting.
Observed behaviour:
Linting and trimming are completely independent processes fed from the same channel, so even when linting fails, trimming still runs on the same sample.
Solution:
My solution for this is simply to join the output of FQ_LINT with the input channel of the trimming stage (in the file subworkflows/nf-core/fastq_qc_trim_filter_setstrandedness/main.nf).
This construct basically filters out all samples that fail linting and prevents trimming errors later on. I personally find this to be the correct behaviour, and I think it would help other people dealing with failing pipelines when processing a large number of samples.
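The join described above can be sketched roughly as follows. This is a minimal illustration and not the snippet from the subworkflow: the channel names (`ch_reads`, `ch_lint`, `ch_reads_linted`), the sample IDs and the file names are all assumptions made up for the example, and `ch_lint` merely stands in for whatever FQ_LINT emits. It relies on the fact that a task whose error is ignored emits no output, so joining on the meta map drops that sample.

```nextflow
// Minimal, self-contained sketch; channel contents are made-up placeholders.
workflow {
    // Stand-in for the reads channel that currently feeds both linting and trimming.
    ch_reads = Channel.of(
        [ [id: 'sampleA'], ['a_1.fastq.gz', 'a_2.fastq.gz'] ],
        [ [id: 'sampleB'], ['b_1.fastq.gz', 'b_2.fastq.gz'] ]
    )

    // Stand-in for the FQ_LINT output: sampleB is assumed to have failed linting
    // with its error ignored, so it emitted nothing.
    ch_lint = Channel.of(
        [ [id: 'sampleA'], 'sampleA.fq_lint.txt' ]
    )

    // Keep only the reads whose meta map also appears in the lint output.
    ch_reads_linted = ch_reads.join(ch_lint.map { meta, _log -> [ meta ] })

    // Only sampleA is expected to get this far; this is what trimming would receive.
    ch_reads_linted.view()
}
```

By default `join` discards items that have no match in the other channel, which is what removes the samples that failed linting. When linting is skipped entirely, the reads channel would have to be passed through unchanged instead.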
Yes, you're correct, it's just a bad channel usage (by me). Here's the fix in nf-core/modules. This will need merging and then updating in the workflow: nf-core/modules#7881