feat: Release v17.0.0 #1543

Open
wants to merge 25 commits into
base: master

mathiasbio
Collaborator

@mathiasbio mathiasbio commented Mar 3, 2025

Description

Big update with a few primary features:

Added:

Changed:

Removed:

Fixed:

Pre-Validation Checklist

Before proceeding with the validation process, ensure that the following tasks have been completed:

  • Install Balsamic in stage and production environments in hasta and build its cache.

    • BALSAMIC was installed on stage after making the BALSAMIC release_v17.0.0 branch, from the instructions given above:
    1. sudo <...>
    2. tmux new -s <...>
    3. Activate stage and conda environments.
    4. pip uninstall balsamic
    5. pip install --no-cache-dir -U git+https://github.com/Clinical-Genomics/BALSAMIC@release_v17.0.0
    6. balsamic init --out-dir /home/proj/stage/cancer/balsamic_cache --account development --cosmic-key ${COSMIC_KEY} --genome-version hg19 --cache-version 17.0.0 --run-mode local --snakemake-opt "--cores 8" -r
    7. balsamic init --out-dir /home/proj/stage/cancer/balsamic_cache --account development --cosmic-key ${COSMIC_KEY} --genome-version hg38 --cache-version 17.0.0 --run-mode local --snakemake-opt "--cores 8" -r
    8. Export most recent loqusDB databases by following: https://atlas.scilifelab.se/infrastructure/bioinformatic_pipelines/BALSAMIC/export_loqusdb_variants/
  • Confirm the availability of necessary resources, such as test cases. (Made script to verify this automatically: /home/proj/stage/cancer/validation/scripts/verify_presence_of_test_samples.sh)

  • Review the changelog and ensure all changes and updates are documented:

    | Document | Sections to Be Updated | Pull Request |
    |---|---|---|
    | Balsamic Documentation | Already updated in development PRs | NA |
    | Atlas Documentation | QC thresholds, Sex check, New validation samples | https://github.com/Clinical-Genomics/atlas/pull/3302 |
  • Set up the stage environment with the necessary software and configurations:

    | Software | Current Version | Pull Request with the Required Updates |
    |---|---|---|
    | CG | 69.1.3 | add soft-filter-normal argument to paired tga analyses cg#4157 |
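
The automated check for test cases mentioned in the checklist is a shell script that is not shown in this PR. As a rough illustration of the idea only, a minimal Python sketch could look like the following. The one-directory-per-case layout and the function name `missing_cases` are assumptions; the case IDs are the ones used in the workflow integrity table.

```python
from pathlib import Path

# Case IDs taken from the workflow integrity verification table in this PR.
CASE_IDS = ["civilsole", "fleetjay", "setamoeba", "unitedbeagle", "uphippo", "equalbug"]


def missing_cases(case_ids, cases_dir):
    """Return the case IDs that have no directory under cases_dir.

    Assumption: each test case lives in a directory named after its case ID.
    """
    root = Path(cases_dir)
    return [case_id for case_id in case_ids if not (root / case_id).is_dir()]
```

An empty list from `missing_cases(CASE_IDS, cases_dir)` would mean every expected test case is present under the given directory.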

Workflow integrity results

Workflow Integrity Verification Cases

More details here:

| CaseID | limsID | AnalysisType | T/TN | SequencingType | ExpectedQC | Status(Pass/Fail) | Warnings | Warnings observed before (yes / no) |
|---|---|---|---|---|---|---|---|---|
| civilsole | ACC7204A2 | balsamic | tumor-only | WGS | Fail (PCT_60X=0.004532) | QC metric PCT_60X: 0.004522 validation has failed. | 🟢 | |
| fleetjay | ACC6307A1:ACC5821A7 | balsamic | tumor-normal | WGS | QC metric PCT_60X: 0.006589 validation has failed. (Condition: gt 0.8, ID: ACC5821A7) | QC metric PCT_60X: 0.006589 validation has failed. (Condition: gt 0.8, ID: ACC5821A7), QC metric MEDIAN_COVERAGE: 25.0 validation has failed. (Condition: gt 26, ID: ACC5821A7) | 🟢 | * See comment below |
| setamoeba | ACC8254A2 | balsamic | tumor-only | TGA | Pass | Pass | 🟢 | |
| unitedbeagle | ACC6225A18:ACC6225A14 | balsamic | tumor-normal | TGA | Pass | Pass | 🟢 | |
| uphippo | ACC5611A3 | balsamic-umi | tumor-only | TGA | Fail (GC dropout=1.650392), QC metric PCT_TARGET_BASES_1000X: 0.915272 validation has failed. (Condition: gt 0.95, ID: ACC5611A3) | QC metric PCT_TARGET_BASES_1000X: 0.915272 validation has failed. (Condition: gt 0.95, ID: ACC5611A3). QC metric GC_DROPOUT: 1.62827 validation has failed. (Condition: lt 1.0, ID: ACC5611A3), COMPARE_PREDICTED_TO_GIVEN_SEX: male validation has failed. (Condition: eq female, ID: uphippo_tumor) | 🟢 | Sex is set to unknown, which defaults to female so sex check failing is expected |
| equalbug | ACC7363A2:ACC7356A4 | balsamic-umi | tumor-normal | TGA | Fail (GC_DROPOUT=1.087173 and RELATEDNESS=-0.524) | QC metric GC_DROPOUT: 1.069459 validation has failed. (Condition: lt 1.0, ID: ACC7356A4)., QC metric COMPARE_PREDICTED_TO_GIVEN_SEX: male validation has failed. (Condition: eq female, ID: equalbug_tumor), QC metric COMPARE_PREDICTED_TO_GIVEN_SEX: male validation has failed. (Condition: eq female, ID: equalbug_normal). QC metric RELATEDNESS: -0.512 validation has failed. (Condition: gt 0.8, ID: equalbug). | 🟢 | Sex is set to unknown, which defaults to female so sex check failing is expected |

Issues:
* This is a new QC threshold for normals which should not fail for tumor samples. I opened an issue for this (#1549) and solved it in #1555.
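
To make the warning format and the MEDIAN_COVERAGE issue easier to follow, here is a minimal sketch of how conditions such as `gt 0.8`, `lt 1.0` and `eq female` can be evaluated per sample. It is not the BALSAMIC implementation: the `Condition` class, the `sample_types` field and `failed_conditions` are hypothetical names, and the tumor metric values are illustrative. Only the two thresholds (GC_DROPOUT lt 1.0, MEDIAN_COVERAGE gt 26) come from the warnings above.

```python
import operator
from dataclasses import dataclass

# Condition names as printed in the warnings, mapped to Python comparisons.
OPERATORS = {"gt": operator.gt, "lt": operator.lt, "eq": operator.eq}


@dataclass(frozen=True)
class Condition:
    metric: str
    norm: str  # "gt", "lt" or "eq"
    threshold: object  # number or string, e.g. 26 or "female"
    # Assumption: a condition can be restricted to certain sample types.
    sample_types: tuple = ("tumor", "normal")


def failed_conditions(metrics, sample_type, conditions):
    """Return one message per condition that fails for a single sample."""
    failures = []
    for cond in conditions:
        if sample_type not in cond.sample_types or cond.metric not in metrics:
            continue  # condition does not apply to this sample
        value = metrics[cond.metric]
        if not OPERATORS[cond.norm](value, cond.threshold):
            failures.append(
                f"QC metric {cond.metric}: {value} validation has failed. "
                f"(Condition: {cond.norm} {cond.threshold})"
            )
    return failures


CONDITIONS = [
    Condition("GC_DROPOUT", "lt", 1.0),
    # Restricting this threshold to normals mirrors the intent of the fix.
    Condition("MEDIAN_COVERAGE", "gt", 26, sample_types=("normal",)),
]

# Illustrative metric values, not taken from a real sample.
tumor_metrics = {"GC_DROPOUT": 0.5, "MEDIAN_COVERAGE": 25.0}
print(failed_conditions(tumor_metrics, "tumor", CONDITIONS))
print(failed_conditions(tumor_metrics, "normal", CONDITIONS))
```

With the restriction in place, the same metrics raise no warning when the sample is treated as a tumor, but do fail the MEDIAN_COVERAGE condition when treated as a normal.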

Hg38 integrity verifications:

| AnalysisType | T/TN | SequencingType | ExpectedQC | Status(Pass/Fail) |
|---|---|---|---|---|
| balsamic | tumor-normal | WGS | should complete without errors | N/A |

Issues discovered for hg38 in release 16.0.0 that have not been solved here:

The cg command `cg workflow balsamic start --genome-version hg38 wgscase` collects loqusDB and other references specific to hg19 and adds them to the case.

Even when the case is started manually, the analysis fails in multiple places. The failed case is copied to a folder for investigation after the release: /home/proj/development/cancer/failed_cases/hg38_release16.0.0_failed

Release specific integration verifications:

AnalysisType T/TN SequencingType ExpectedQC Status(Pass/Fail)
N/A N/A N/A N/A
  • This section has been verified successfully

Storage, Delivery and Upload Integrity Verifications

| Processes | Affected in current version | Affected workflows |
|---|---|---|
| New files to be stored | N/A | N/A |
| New files to be delivered | N/A | N/A |
| New files to be uploaded to Scout | N/A | N/A |
| Changes to Housekeeper IDs | N/A | N/A |
| Changes to Scout upload | N/A | N/A |

| AnalysisType | T/TN | Storage status | Delivery status | Upload status |
|---|---|---|---|---|
| balsamic wgs | tumor-only | pass | pass | pass |
| balsamic wgs | tumor-normal | pass | pass | pass |
| balsamic | tumor-only | pass | pass | pass |
| balsamic | tumor-normal | pass | pass | pass |
| balsamic-umi | tumor-only | pass | pass | pass |
| balsamic-umi | tumor-normal | pass | pass | pass |
  • This section has been verified successfully, or been identified as irrelevant for the current verification

Validation and implementation plan status

Pull-request for validation-report made here: https://github.com/Clinical-Genomics/validations/pull/241

  • Validation report signed

Pull-request for implementation-plan here: LINK

  • Implementation plan signed

mathiasbio and others added 17 commits November 21, 2024 12:18
#### Added

- sex prediction tools and specified sex-verification for all workflows
#### Added

- new argument `--soft-filter-normal` to disable hard filtration of matched normal filters

#### Changed

- refactored variant-filter constants and reworked bcftools filters
- renamed high_normal_tumor_af_frac to in_normal
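
For readers unfamiliar with the distinction behind `--soft-filter-normal`, here is a minimal sketch of hard versus soft filtering. This is not the BALSAMIC code: the function name, the dict-based variant records, the AF-ratio test and the `max_frac` value are all illustrative assumptions. Only the filter name `in_normal` comes from this PR.

```python
def apply_normal_filter(variants, soft_filter_normal, max_frac=0.3):
    """Tag or drop variants whose normal AF is high relative to the tumor AF.

    max_frac is a made-up threshold used only for illustration.
    """
    kept = []
    for variant in variants:
        flagged = variant["normal_af"] > max_frac * variant["tumor_af"]
        if flagged and not soft_filter_normal:
            continue  # hard filter: the record is removed from the output
        filters = list(variant.get("filters", []))
        if flagged:
            filters.append("in_normal")  # soft filter: record kept, but annotated
        kept.append({**variant, "filters": filters})
    return kept


variants = [
    {"id": "a", "tumor_af": 0.4, "normal_af": 0.0},
    {"id": "b", "tumor_af": 0.2, "normal_af": 0.1},
]
print(len(apply_normal_filter(variants, soft_filter_normal=False)))
print(apply_normal_filter(variants, soft_filter_normal=True)[1]["filters"])
```

With hard filtering the flagged variant disappears from the output; with soft filtering it stays and carries the filter name, so it can still be reviewed downstream.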
#### Removed

- WGS-level GC bias metric from TGA workflow
#### Added

- TNscope MNV merge script

#### Changed

- merge SNVs into MNVs in TNscope TGA
- change raw delivery SNV file for TGA to before any post-processing
#### Changed

- lowered threads for bcftools and CADD rules

#### Fixed

- changed name of benchmark files for annotation rules to avoid name conflicts
#### Changed

- Replaced bcftools concat with custom python script for merging VCFs from VarDict and TNscope
#### Changed

- VarDict java memory option and threads
#### Changed

- pinned arraymancer version in somalier docker build recipe
#### Added

- New ONC annotations from Clinvar 

#### Changed

- Added GT field to IGH-DUX4 variant 
- Changed QC thresholds for WGS normal and WES
#### Added

-  max SOR 4 filter to WGS TN SNV quality filter
#### Added

- Adds an option to specify memory
- Adds memory to tnscope tn wgs

#### Changed

- Set different resources for tnscope tga and wgs

#### Changed

- Disabled SV calling in TNscope
#### Added

- max RPA 12 filter to TNscope TGA workflow
- max SOR 3 filter to TNscope TGA TN workflow
- filter to remove MERGED variants after MNV post-processing in TGA workflow

#### Fixed

- vcf sort step to TNscope VCF after MNV post-processing in TGA workflow
- set correct exome DP filter by refactoring some retrieve filter functions
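
A sort step after MNV post-processing is presumably needed because merged records can end up out of coordinate order. As a minimal sketch of such a sort (not the script used in this PR), the hypothetical function `sort_vcf_records` below orders records by the contig order declared in the header, then by position:

```python
import re


def sort_vcf_records(header_lines, record_lines):
    """Sort VCF record lines by header contig order, then by position."""
    contig_order = {}
    for line in header_lines:
        match = re.match(r"##contig=<ID=([^,>]+)", line)
        if match:
            contig_order.setdefault(match.group(1), len(contig_order))

    def sort_key(line):
        chrom, pos = line.split("\t", 2)[:2]
        # Contigs missing from the header are placed last, ordered by name.
        return (contig_order.get(chrom, len(contig_order)), chrom, int(pos))

    return sorted(record_lines, key=sort_key)


header = ["##fileformat=VCFv4.2", "##contig=<ID=1>", "##contig=<ID=2>", "##contig=<ID=10>"]
records = [
    "10\t500\t.\tA\tT",
    "2\t300\t.\tG\tC",
    "1\t900\t.\tC\tG",
    "1\t100\t.\tAC\tTG",
]
for line in sort_vcf_records(header, records):
    print(line)
```

Using the header order rather than plain string sorting keeps contig 10 after contig 2, which matches what tools that require a sorted VCF usually expect.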

codecov bot commented Mar 3, 2025

Codecov Report

Attention: Patch coverage is 98.93617% with 1 line in your changes missing coverage. Please review.

Project coverage is 99.40%. Comparing base (3419936) to head (baa2e84).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| BALSAMIC/models/scheduler.py | 80.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1543      +/-   ##
==========================================
- Coverage   99.44%   99.40%   -0.04%     
==========================================
  Files          40       40              
  Lines        1975     2019      +44     
==========================================
+ Hits         1964     2007      +43     
- Misses         11       12       +1     
| Flag | Coverage Δ |
|---|---|
| unittests | 99.40% <98.93%> (-0.02%) ⬇️ |
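
The project percentages in the diff can be reproduced from the Hits and Lines rows. The sketch below assumes the percentages are truncated rather than rounded to two decimals, which is a guess that happens to match the reported 99.44% and 99.40%. The patch size of 94 lines is also inferred (1 missing line at 98.93617%), not stated in the report.

```python
import math


def coverage_percent(hits, lines):
    """Coverage as a percentage, truncated (not rounded) to two decimals."""
    return math.floor(hits / lines * 10_000) / 100


base = coverage_percent(1964, 1975)  # master
head = coverage_percent(2007, 2019)  # this PR
delta = round(head - base, 2)

# Inferred: 1 uncovered line out of 94 changed lines gives the patch coverage.
patch = (94 - 1) / 94 * 100

print(base, head, delta, round(patch, 5))
```

Under these assumptions the script gives 99.44 for the base, 99.4 for the head, a change of -0.04, and a patch coverage of about 98.93617.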


@mathiasbio mathiasbio self-assigned this Mar 4, 2025
#### Removed

- remove parallelization per chromosome for VarDict
#### Changed

- renamed CLNACC to CLNVID for Clinvar annotations
@mathiasbio mathiasbio marked this pull request as ready for review March 11, 2025 16:20
@mathiasbio mathiasbio requested a review from a team as a code owner March 11, 2025 16:20
#### Fixed

- excluded MEDIAN_COVERAGE from tumor coverage qc threshold
@mathiasbio mathiasbio requested a review from fevac March 19, 2025 10:16
Successfully merging this pull request may close these issues.

[Bug] Tumor WGS QC thresholds fails on new normal threshold
2 participants