Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add referencer utilities and constants for Enterobacteriaceae #201

Open
wants to merge 8 commits into
base: rc4.2.0
Choose a base branch
from

Conversation

karlnyr
Copy link
Contributor

@karlnyr karlnyr commented Mar 12, 2025

Description

This pull request includes several changes to the microSALT project, focusing on improving the handling of organism references and restructuring the codebase for better organization. The most important changes include updating import paths, adding utility functions for organism reference checks, and modifying methods to use these new utilities.

Codebase restructuring and improvements:

  • microSALT/utils/job_creator.py: Updated the import path for Referencer and added a new import for get_reference_if_enterobacteriaceae. Modified the create_blast_search method to use the new utility function for checking Enterobacteriaceae references. [1] [2]

  • microSALT/utils/referencer/constants.py: Introduced StrEnum and new enumerations for Escherichia and Klebsiella species, along with their corresponding references.

  • microSALT/utils/referencer/referencer.py: Renamed the file for better organization and updated the organism2reference method to streamline the reference matching process. [1] [2]

  • microSALT/utils/referencer/utils.py: Added utility functions is_escherichia, is_klebsiella, and get_reference_if_enterobacteriaceae to check organism names and return appropriate references.

  • microSALT/utils/scraper.py: Updated the import path for Referencer and added a new import for get_reference_if_enterobacteriaceae. Modified the scrape_blast method to use the new utility function for checking Enterobacteriaceae references. [1] [2]

Primary function of PR

  • Hotfix
  • Patch
  • Minor functionality improvement
  • New type of analysis
  • Backward-breaking functionality improvement
  • This change requires internal documents to be updated
  • This change requires another repository to be updated

Testing

If the update is a hotfix, it is sufficient to rely on the development testing along with the Travis self-test automatically applied to the PR.

Test routine to verify the stability of the PR:

  • bash /home/proj/production/servers/resources/hasta.scilifelab.se/install-microsalt-stage.sh BRANCHNAME
  • us
  • conda activate S_microSALT
  • (SITUATIONAL) export MICROSALT_CONFIG=/home/proj/dropbox/microSALT.json
  • Select a relevant subset of the following:
  • microSALT analyse project MIC3109
  • microSALT analyse project MIC4107
  • microSALT analyse project MIC4109
  • microSALT analyse project ACC5551

Verify that the results for projects MIC3109, MIC4107, MIC4109 & ACC5551 are consistent with the results attached to AMSystem doc 1490, Microbial_WGS.xlsx

Test results

These are the results of the tests, and necessary conclusions, that prove the stability of the PR.

Sign-offs

Sorry, something went wrong.

@karlnyr karlnyr requested a review from a team as a code owner March 12, 2025 08:53
@karlnyr karlnyr changed the title Add referencer utilities and constants for Escherichia and Klebsiella Add referencer utilities and constants for Enterobacteriaceae Mar 12, 2025
@karlnyr karlnyr linked an issue Mar 12, 2025 that may be closed by this pull request
Copy link
Contributor

@samuell samuell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, apart from some concerns I have with the clarity of some of the code constructs. See below.

Copy link
Contributor

@samuell samuell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it pretty much solved it, but I suggest using a list of tuples instead of a set of tuples, as sets are not sorted (if that matters), and can be a little more confusing for people not used to them.

"""Checks if the organism name is a proper Klebsiella."""
return any(species_name == member.value for member in Klebsiella)


def get_reference_if_enterobacteriaceae(organism_name: str) -> str:
"""Returns the reference for the organism name if it belongs to Enterobacteriaceae."""
species: str = get_species(organism_name)
GENUS_REFERENCE_MAP = {
Copy link
Contributor

@samuell samuell Mar 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The surrounding {} also should probably preferably be changed, e.g. to [] to make it a list, since right now you actually got a "set" (consisting of tuples), since that is what you get with {} containing single items.

Well, a set might technically work here, but I think they also are not guaranteed to be in sorted order, if that matters...

@@ -43,8 +43,8 @@ def test_is_klebsiella(species_name, expected_result):
@pytest.mark.parametrize(
"organism_name, expected_reference",
[
(Escherichia.COLI, ESCHERICHIA_REFERENCE),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these changes related to the previous discussion, or was this something else?

(No opinions here, just curious)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

microSALT cannot update MLST profile databases due to outdated logic
2 participants