Parallelizes MDAnalysis.analysis.msd #4896

Open
wants to merge 6 commits into
base: develop

Conversation

tanishy7777
Contributor

@tanishy7777 tanishy7777 commented Jan 20, 2025

Fixes #4676

Changes made in this Pull Request:

  • Added the split-apply-combine technique to parallelize the MDAnalysis.analysis.msd.EinsteinMSD
  • Added boilerplate fixture(s) to testsuite/analysis/conftest.py, analogous with existing ones
  • Added a client_EinsteinMSD fixture to all tests in testsuite/MDAnalysisTests/analysis/test_msd.py, and changed how the run() method is called to run(**client_EinsteinMSD)
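
For readers unfamiliar with the pattern, split-apply-combine can be sketched as below. This is a minimal, dependency-free illustration and not the code in this PR: split_frames, collect_positions, and split_apply_combine are hypothetical names, the trajectory is a plain list of coordinates, and a thread pool stands in for whichever backend is actually used.

```python
from concurrent.futures import ThreadPoolExecutor


def split_frames(n_frames, n_parts):
    """Split range(n_frames) into n_parts contiguous, nearly equal chunks."""
    base, extra = divmod(n_frames, n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        stop = start + base + (1 if i < extra else 0)
        chunks.append(range(start, stop))
        start = stop
    return chunks


def collect_positions(trajectory, frames):
    """'Apply' step: gather the per-frame data for one chunk of frames."""
    return [trajectory[i] for i in frames]


def split_apply_combine(trajectory, n_parts):
    """Collect positions chunk by chunk, then stitch them back together."""
    chunks = split_frames(len(trajectory), n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        # pool.map returns results in the order of the input chunks
        partial = list(
            pool.map(lambda c: collect_positions(trajectory, c), chunks)
        )
    # 'Combine' step: concatenate the partial results in frame order
    return [frame for part in partial for frame in part]


# Toy trajectory: 10 frames of a single particle moving along x
trajectory = [[float(i), 0.0, 0.0] for i in range(10)]
combined = split_apply_combine(trajectory, n_parts=3)
```

In this sketch the combine step must preserve frame order, because any later analysis that depends on time lags needs the frames in their original sequence.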

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

Developers certificate of origin


📚 Documentation preview 📚: https://mdanalysis--4896.org.readthedocs.build/en/4896/

@pep8speaks

pep8speaks commented Jan 20, 2025

Hello @tanishy7777! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2025-01-20 21:03:12 UTC


codecov bot commented Jan 20, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.41%. Comparing base (35d9d2e) to head (c0d855a).

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #4896      +/-   ##
===========================================
- Coverage    93.42%   93.41%   -0.01%     
===========================================
  Files          177      189      +12     
  Lines        21859    22940    +1081     
  Branches      3078     3078              
===========================================
+ Hits         20422    21430    +1008     
- Misses         986     1059      +73     
  Partials       451      451              


@tanishy7777
Contributor Author

Just a reminder that I think this is ready to be merged. Please do so at your convenience.
@RMeli @orbeckst

@orbeckst
Member

Thanks for your work. I'm currently quite busy, so might not be able to review in the next few days. Please be patient.

@orbeckst
Member

@talagayev / @marinegor can you have a look at this PR, please?

@talagayev
Member

Checked the code and ran locally, looks all good.

https://github.com/tanishy7777/mdanalysis/blob/18a2e516d914f6dc438b409b403be2a1a3429e77/testsuite/MDAnalysisTests/analysis/test_msd.py#L155

@tanishy7777, here you could also pass **client_EinsteinMSD to cover the parallelization in test_simple_start_stop_step_all_dims and test_fft_start_stop_step_all_dims, but I would defer to @orbeckst on whether these tests need **client_EinsteinMSD or not.

From my side it looks good, good job @tanishy7777 :)

@tanishy7777
Contributor Author

Thanks a lot for reviewing my PR. I will wait for the suggestions, as you mentioned.

@tanishy7777
Contributor Author

tanishy7777 commented Jan 25, 2025

Also, could you please review PR #4884? It's pretty similar. Or tell me if it needs any more work. Thanks again.

@talagayev
Member

Hey @tanishy7777, yes the PR is similar, I can take a look at it as well.

Member

@hmacdope hmacdope left a comment

Blocking here; I just need to check the implementation. IIRC there is a reason the MSD algorithm itself is non-parallelisable, but that may not apply if only the collection of particle positions is parallelised.
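
The distinction can be illustrated with a small sketch (a toy under stated assumptions, not the EinsteinMSD implementation): msd_windowed is a hypothetical pure-Python helper that averages squared displacements over all time origins for one particle. Each lag needs pairs of frames that may sit in different chunks, so chunks cannot be analysed in isolation, while collecting positions per chunk and concatenating them before the MSD step reproduces the serial result.

```python
def msd_windowed(positions):
    """Windowed MSD of one particle; positions is a list of (x, y, z) tuples.

    For each lag, average |r(t + lag) - r(t)|^2 over all valid origins t.
    """
    n = len(positions)
    msd = []
    for lag in range(n):
        squared = [
            sum((b - a) ** 2 for a, b in zip(positions[t], positions[t + lag]))
            for t in range(n - lag)
        ]
        msd.append(sum(squared) / len(squared))
    return msd


# Made-up 1D-like trajectory with x = t**2
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (4.0, 0.0, 0.0), (9.0, 0.0, 0.0)]

# Serial reference: MSD over the full trajectory
full = msd_windowed(positions)

# "Parallel" collection: gather frames in two chunks, combine, then compute
first, second = positions[:2], positions[2:]
from_chunks = msd_windowed(first + second)

# Computing the MSD inside a single chunk only reaches the lags in that chunk
first_only = msd_windowed(first)
```

Here from_chunks matches full, while first_only covers only two lags instead of four, which is the sense in which the averaging step needs the whole trajectory.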

@tanishy7777
Contributor Author

tanishy7777 commented Feb 1, 2025

The tests were passing, so I thought it had been parallelized.

Contributor

@marinegor marinegor left a comment

Hi @tanishy7777, sorry for the slow review -- many life things got in the way.

I think you're on a good path; I mentioned a few minor things in the comments.

Main action items:

  • move the hacky @staticmethod def f(arrays): pass out of the class
  • using the datafiles in MDAnalysisTests (imported at the top of test_msd.py), check that the parallelized run produces exactly the same results as the non-parallelized one (add a code snippet to the comments that anyone can run to check, along with its results)
  • add this check as a test (if it's too slow, we can always mark it as slow and not run it by default)

Please ask if you have questions, and ping me here if I don't reply for more than 48 hours.

Comment on lines +408 to +410
@staticmethod
def f(arrs):
    pass
Contributor

move this outside of the class and explicitly name it something like __noop to make sure no one uses that:

def __noop(arrs) -> None: pass
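
One caveat worth checking before adopting the double-underscore name: Python applies private name mangling to any identifier of the form __name that appears textually inside a class body, including references to module-level functions. The sketch below (Demo, _noop, and the method names are illustrative and not from this PR) shows that calling a module-level __noop from inside a method fails, whereas a single leading underscore works.

```python
def __noop(arrs) -> None:
    """Module-level no-op with a double leading underscore."""
    pass


def _noop(arrs) -> None:
    """Module-level no-op with a single leading underscore."""
    pass


class Demo:
    def use_double(self):
        # Inside a class body, __noop is rewritten to _Demo__noop,
        # which does not exist at module level.
        return __noop([])

    def use_single(self):
        # A single underscore is not mangled, so the lookup succeeds.
        return _noop([])


demo = Demo()
demo.use_single()  # runs without error

try:
    demo.use_double()
except NameError as err:
    print("NameError:", err)
```

If the helper is only ever referenced from module-level code, the double underscore is harmless; the issue arises only when it is referenced from within a class definition.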

Contributor

Since there are concerns about whether the algorithm is actually parallelizable, I would suggest you explicitly test for that: namely, ensure that all results.<something> attributes are exactly the same, regardless of how you run your analysis. You can do this yourself first, and then later add it as one of the tests here.

I should also say that if you make a convenient way to test that, it'd be nice to have it added to other parallelization tests, to ensure additional correctness.
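
A convenient check along these lines might look like the sketch below. It is only an illustration: differing_results and equal_nested are hypothetical helpers, the two dictionaries are made-up stand-ins for the results of a serial and a parallel run (the keys mirror result attribute names, the values are invented), and plain lists are used instead of NumPy arrays (for arrays, numpy.testing.assert_array_equal would be the natural comparison).

```python
def equal_nested(a, b):
    """Exact, element-wise equality for nested lists/tuples of numbers."""
    if isinstance(a, (list, tuple)) and isinstance(b, (list, tuple)):
        return len(a) == len(b) and all(
            equal_nested(x, y) for x, y in zip(a, b)
        )
    return a == b


def differing_results(serial, parallel):
    """Return the sorted names of result entries that differ between runs."""
    names = set(serial) | set(parallel)
    return sorted(
        name
        for name in names
        if name not in serial
        or name not in parallel
        or not equal_nested(serial[name], parallel[name])
    )


# Invented stand-ins for the results of two runs of the same analysis
serial = {
    "timeseries": [0.0, 1.5, 4.0],
    "msds_by_particle": [[0.0, 0.0], [1.0, 2.0], [3.0, 5.0]],
}
parallel_ok = {
    "timeseries": [0.0, 1.5, 4.0],
    "msds_by_particle": [[0.0, 0.0], [1.0, 2.0], [3.0, 5.0]],
}
parallel_bad = {
    "timeseries": [0.0, 1.5, 4.5],
    "msds_by_particle": [[0.0, 0.0], [1.0, 2.0], [3.0, 5.0]],
}

mismatches_ok = differing_results(serial, parallel_ok)
mismatches_bad = differing_results(serial, parallel_bad)
```

Returning the names of the mismatching entries, rather than a single boolean, makes a failing test message point directly at the attribute that diverged, which would also help if the same helper were reused for other parallelization tests.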

@tanishy7777
Contributor Author

Hey, sorry for the late response. I had semester exams so I was quite busy the last 2 weeks. Will start working on this soon!

@marinegor marinegor self-assigned this Mar 16, 2025
Development

Successfully merging this pull request may close these issues.

MDAnalysis.analysis.msd: Implement parallelization or mark as unparallelizable
6 participants