Implementation of fetch_pdb() #4943

jauy123 · 2025-03-03T18:28:48Z

Changes made in this Pull Request:

This is still a work in progress, but here's an implementation of @BradyAJohnston's code wrapped into classes. I still need to write tests and docs for the whole thing.

Added two classes, DownloaderBase and PDBDownloader, to implement downloading structure files from online sources such as the Protein Data Bank (PDB).
Added requests as a dependency
mda.fetch_pdb() is implemented as a wrapper around the most commonly used options of PDBDownloader.
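The two-class layout described above might look roughly like the following. This is a minimal sketch under stated assumptions, not the PR's code: the `url()` method, the `file_format` parameter with a `pdb.gz` default, the set of supported formats and the `fetch_pdb_url()` wrapper are all hypothetical, and no network request is made. The URL pattern `https://files.rcsb.org/download/<ID>.<format>` is the RCSB PDB file download service.

```python
# Hypothetical sketch: class names follow the PR description, everything else is assumed.
from abc import ABC, abstractmethod


class DownloaderBase(ABC):
    """Base class: subclasses only need to know how to build a download URL."""

    def __init__(self, identifier: str, file_format: str):
        self.identifier = identifier
        self.file_format = file_format

    @abstractmethod
    def url(self) -> str:
        """Return the URL from which the structure file is downloaded."""

    @property
    def filename(self) -> str:
        return f"{self.identifier}.{self.file_format}"

    def __str__(self) -> str:
        return f"{type(self).__name__}({self.filename!r} from {self.url()})"


class PDBDownloader(DownloaderBase):
    """Downloader for the RCSB PDB file service (sketch)."""

    # formats assumed to be served by files.rcsb.org/download/
    SUPPORTED_FORMATS = ("pdb", "pdb.gz", "cif", "cif.gz")

    def __init__(self, pdb_id: str, file_format: str = "pdb.gz"):
        if file_format not in self.SUPPORTED_FORMATS:
            raise ValueError(
                f"unsupported format {file_format!r}; "
                f"choose from {self.SUPPORTED_FORMATS}"
            )
        # PDB IDs are case-insensitive; normalise to upper case
        super().__init__(pdb_id.upper(), file_format)

    def url(self) -> str:
        return f"https://files.rcsb.org/download/{self.filename}"


def fetch_pdb_url(pdb_id: str, file_format: str = "pdb.gz") -> str:
    """Hypothetical thin wrapper exposing the most common PDBDownloader options."""
    return PDBDownloader(pdb_id, file_format).url()
```

Keeping the format a parameter rather than hard-coding `.pdb` would make it easy to switch the default to mmCIF later if an mmCIF reader lands.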

PR Checklist

Issue raised/referenced?
Tests updated/added?
Documentation updated/added?
package/CHANGELOG file updated?

Developers Certificate of Origin

I certify that I can submit this code contribution as described in the Developer Certificate of Origin, under the MDAnalysis LICENSE.

📚 Documentation preview 📚: https://mdanalysis--4943.org.readthedocs.build/en/4943/

jauy123 · 2025-03-03T18:31:48Z

I'm not sure where this code belongs in the codebase, so for now I've created a new folder for it. I'm open to moving it elsewhere.

Some things I'd still like to add (besides tests and docs):

Verbose option for PdbDownloader.download() (I think tqdm was a dependency last time I checked?)
Integration with MDAnalysis' logger
Probably wrap the cache logic in a separate function
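Pulling the cache logic into separate functions could look like this minimal sketch. The function names, the `.part` temporary-file convention and the rule that an empty file counts as "not cached" are assumptions for illustration, not code from the PR.

```python
from pathlib import Path
from typing import Optional, Union


def cached_path(cache_dir: Union[str, Path], filename: str) -> Optional[Path]:
    """Return the path of a previously downloaded file, or None if it is not cached."""
    path = Path(cache_dir) / filename
    # treat empty files (e.g. left by an interrupted download) as not cached
    if path.is_file() and path.stat().st_size > 0:
        return path
    return None


def store_in_cache(cache_dir: Union[str, Path], filename: str, data: bytes) -> Path:
    """Write downloaded bytes into the cache directory and return the final path."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / filename
    # write under a temporary name, then rename, so a crash never leaves
    # a partial file under the final name
    tmp = path.with_name(path.name + ".part")
    tmp.write_bytes(data)
    tmp.replace(path)
    return path
```

A download function would then call `cached_path()` first and only hit the network (followed by `store_in_cache()`) on a miss.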

BradyAJohnston · 2025-03-04T01:20:19Z

I think others will have to confirm, but we'll likely want requests to be an optional dependency to keep core library dependencies to a minimum (fetching structures won't be something that a lot of users will be doing).

Additionally, although it's not finalised yet, if the mmCIF reader in #2367 is finalised then the default download format shouldn't be .pdb (but it can stay that way for now).

codecov · 2025-03-04T20:44:31Z

Codecov Report

Attention: Patch coverage is 91.86047% with 7 lines in your changes missing coverage. Please review.

Project coverage is 93.40%. Comparing base (dcaa087) to head (651bf26).

Files with missing lines	Patch %	Lines
package/MDAnalysis/web/downloaders.py	91.02%	3 Missing and 4 partials ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #4943      +/-   ##
===========================================
- Coverage    93.42%   93.40%   -0.02%     
===========================================
  Files          177      193      +16     
  Lines        21859    23014    +1155     
  Branches      3078     3091      +13     
===========================================
+ Hits         20422    21497    +1075     
- Misses         986     1062      +76     
- Partials       451      455       +4

jauy123 · 2025-03-05T02:49:30Z

I'm ok with that. I can make the code raise an exception if requests is not in the environment.

jauy123 · 2025-03-10T17:41:23Z

Assuming that requests will be an optional dependency, how exactly would I specify that in the build files? Right now I'm just hard-coding it so that the GitHub CI tests can build and run successfully.

BradyAJohnston · 2025-03-19T12:36:18Z

You've added it to one of the optional dependency categories, which is all that should be required. In the files where it's actually used, you'll need a setup like the one used for biopython:

mdanalysis/package/MDAnalysis/analysis/align.py, lines 200 to 207 in dcaa087:

    try:
        import Bio.AlignIO
        import Bio.Align
        import Bio.Align.Applications
    except ImportError:
        HAS_BIOPYTHON = False
    else:
        HAS_BIOPYTHON = True

I'm not an expert on the pipelines, so someone else would have to pitch in more on that.

jauy123 · 2025-03-19T23:14:58Z

Thanks for the comment!

jauy123 · 2025-03-19T23:15:52Z

I happen to have another question!

Is it normal for some of the tests to be inconsistent from commit to commit? From what I understand, each GitHub CI run has to fetch and build MDAnalysis from source, and from what I've observed across commits, this can sometimes time out.

The macOS job for the latest commit failed at 97% of the test suite because it reached the maximum wall time of two hours.

Even then, the latest Azure tests failed because of other tests in the codebase that I didn't write:

From Azure_Tests Win-Python310-64bit-full (commit 651bf267076d2d7da6491608b1b5136915caf2e2)

FAIL MDAnalysisTests/coordinates/test_h5md.py::TestH5MDReaderWithRealTrajectory::test_open_filestream - Issue #2884
XFAIL MDAnalysisTests/coordinates/test_h5md.py::TestH5MDWriterWithRealTrajectory::test_write_with_drivers[core] - occasional PermissionError on windows
XFAIL MDAnalysisTests/coordinates/test_memory.py::TestMemoryReader::test_frame_collect_all_same - reason: memoryreader allows independent coordinates
XFAIL MDAnalysisTests/coordinates/test_memory.py::TestMemoryReader::test_timeseries_values[slice0] - reason: MemoryReader uses deprecated stop inclusive indexing, see Issue #3893
XFAIL MDAnalysisTests/coordinates/test_memory.py::TestMemoryReader::test_timeseries_values[slice1] - reason: MemoryReader uses deprecated stop inclusive indexing, see Issue #3893
XFAIL MDAnalysisTests/coordinates/test_memory.py::TestMemoryReader::test_timeseries_values[slice2] - reason: MemoryReader uses deprecated stop inclusive indexing, see Issue #3893
XFAIL MDAnalysisTests/core/test_topologyattrs.py::TestResids::test_set_atoms
XFAIL MDAnalysisTests/lib/test_util.py::test_which - util.which does not get right binary on Windows
XFAIL MDAnalysisTests/converters/test_rdkit.py::TestRDKitFunctions::test_order_independant_issue_3339[C-[N+]#N] - Not currently tackled by the RDKitConverter
XFAIL MDAnalysisTests/converters/test_rdkit.py::TestRDKitFunctions::test_order_independant_issue_3339[C-N=[N+]=[N-]] - Not currently tackled by the RDKitConverter
XFAIL MDAnalysisTests/converters/test_rdkit.py::TestRDKitFunctions::test_order_independant_issue_3339[C-[O+]=C] - Not currently tackled by the RDKitConverter
XFAIL MDAnalysisTests/converters/test_rdkit.py::TestRDKitFunctions::test_order_independant_issue_3339[C-[N+]#[C-]] - Not currently tackled by the RDKitConverter
XFAIL MDAnalysisTests/coordinates/test_dcd.py::TestDCDReader::test_frame_collect_all_same - reason: DCDReader allows independent coordinates.This behaviour is deprecated and will be changedin 3.

orbeckst · 2025-03-20T18:17:46Z

In principle, tests should pass everywhere.

The Azure tests time out in the test

_________________________ Test_Fetch_Pdb.test_timeout _________________________

which looks like something you added. I haven't looked at your code, but it may simply be that something needs to be written differently for Windows.
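One hedged way to make a timeout test independent of the platform and of real network latency is to simulate the timeout instead of waiting for one. The sketch assumes the download function accepts an injectable HTTP getter (the hypothetical `get` parameter) and uses the built-in `TimeoutError` as a stand-in; with the real requests library one would catch `requests.exceptions.Timeout` and, in pytest, would more likely patch `requests.get` with `unittest.mock.patch` and use `pytest.raises`.

```python
from unittest import mock


class DownloadTimeoutError(RuntimeError):
    """Hypothetical error raised when the server does not answer in time."""


def download_bytes(url: str, timeout: float, get) -> bytes:
    """Download ``url`` with the injected HTTP ``get`` function (e.g. requests.get).

    Injecting ``get`` lets tests simulate network behaviour without any network.
    """
    try:
        response = get(url, timeout=timeout)
    except TimeoutError as err:
        # with requests, catch requests.exceptions.Timeout here instead
        raise DownloadTimeoutError(f"no response from {url} within {timeout} s") from err
    return response.content


def test_timeout():
    url = "https://files.rcsb.org/download/1AKE.pdb.gz"
    fake_get = mock.Mock(side_effect=TimeoutError)  # fails instantly, no waiting
    try:
        download_bytes(url, timeout=0.01, get=fake_get)
    except DownloadTimeoutError:
        pass
    else:
        raise AssertionError("expected DownloadTimeoutError")
    # the timeout value must be forwarded to the HTTP call
    fake_get.assert_called_once_with(url, timeout=0.01)
```

Because the timeout is raised by the mock immediately, the test never depends on how a particular OS or CI runner handles slow sockets.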

jauy123 added 3 commits February 25, 2025 12:41

Added requests as a dependency

0d80b72

Merge remote-tracking branch 'upstream/develop' into downloads

2a6c6b2

Inital download code

b0d9d9c

jauy123 added 2 commits March 3, 2025 12:53

fixed typo

93eb5e9

cleaner convert_to_universe()

ea12a4c

BradyAJohnston self-assigned this Mar 4, 2025

jauy123 added 3 commits March 4, 2025 10:14

Added abc module and allowed closing of file stream for downloaded text files

f99fa11

Merge remote-tracking branch 'upstream/develop' into downloads

d49c93b

Fixed __all__ -- should fixed pull request test on github

99fe8cc

jauy123 added 17 commits March 5, 2025 14:49

refactored cache logic

c1eb622

Initial tests

98880c2

Added __init.py to make tests work

5227226

typos fixed

3116310

Refactored Tests -- put them in classes!

cc4398f

PdbDownloader().download() now downloads in binary rather than text (since text files are just binary files with special encoding)

ef1f73e

Updated Tests to comply with pdb.gz

736cca0

Added Progress bar to PdbDownloader().download()

dbace5a

Added a few clarifications to _requests_progress_bar

4b63b0a

Added filename attribute() to BaseDownloader()

bcedd75

made _requests_progress_bar a private method of PdbDownloader

11ce34f

minor comments

0930a8c

Added Buffer as default option for PdbDownloader.download()

a3c9872

Instead of Temporary File (which are slower), buffers are used instead!

Renamed PdbDownloader to PDBDownloader to match PDBReader()

1f593c3

better __str__ method for BaseDownloader()

8cb4609

Enhanced tests

b2fa607

Added TODO list for future me

6d0a39c
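The binary-mode and in-memory-buffer commits above can be illustrated with the standard library alone: the raw (possibly gzip-compressed) response body is wrapped in `io.BytesIO`, decompressed with `gzip.GzipFile`, and exposed as text through `io.TextIOWrapper`, so nothing is written to disk. This is a sketch; the function name and the UTF-8 default are assumptions.

```python
import gzip
import io


def open_downloaded_text(data: bytes, compressed: bool = True,
                         encoding: str = "utf-8") -> io.TextIOWrapper:
    """Expose downloaded bytes as an in-memory text stream (no temporary file).

    ``data`` is the raw response body, e.g. ``response.content`` from requests.
    """
    raw = io.BytesIO(data)
    if compressed:
        # GzipFile decompresses on the fly as the text wrapper reads from it
        raw = gzip.GzipFile(fileobj=raw, mode="rb")
    return io.TextIOWrapper(raw, encoding=encoding)
```

MDAnalysis provides `MDAnalysis.lib.util.NamedStream` for passing a stream together with a filename (from which the format is inferred) to readers; whether the PR uses it is not shown here.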

jauy123 added 2 commits March 18, 2025 13:05

Merge remote-tracking branch 'upstream/develop' into downloads

7bac78f

Added requests as optional dep to pyproject.toml

651bf26

modified: pyproject.toml
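The progress bar commit (dbace5a) implies streaming the response in chunks rather than reading it all at once. A library-agnostic sketch of that loop follows; the function names and the `(bytes_done, total)` callback signature are assumptions, not the PR's implementation.

```python
import io
from typing import Callable, Iterable, Optional


def read_chunks_with_progress(
    chunks: Iterable[bytes],
    total: Optional[int],
    on_progress: Callable[[int, Optional[int]], None],
) -> bytes:
    """Accumulate streamed chunks in memory, reporting progress after each one.

    ``total`` is the expected size in bytes, or None if the server did not send it.
    """
    buffer = io.BytesIO()
    done = 0
    for chunk in chunks:
        if not chunk:  # skip empty chunks defensively
            continue
        buffer.write(chunk)
        done += len(chunk)
        on_progress(done, total)
    return buffer.getvalue()


def print_progress(done: int, total: Optional[int]) -> None:
    """Simple text reporter for a hypothetical verbose=True mode."""
    if total:
        print(f"\rdownloaded {done}/{total} bytes ({100 * done / total:.0f}%)", end="")
    else:
        print(f"\rdownloaded {done} bytes", end="")
```

With requests, `chunks` would typically be `response.iter_content(chunk_size=8192)` from `requests.get(url, stream=True, timeout=...)`, and `total` could come from the `Content-Length` header, which is not always present (hence `Optional`). A tqdm bar could be driven from the callback with `bar.update(done - bar.n)`.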
