Protein function prediction with GO - Part 3 #64
base: dev
- migration from deep go format to chebai->go_uniprot format
- +migration structure changes
I have made the suggested changes for migration. Please check.
Config for DeepGO1:
class_path: chebai.preprocessing.datasets.go_uniprot.DeepGO1MigratedData
init_args:
  go_branch: "MF"
  max_sequence_length: 1002
  reader_kwargs: {n_gram: 3}
Config for DeepGO2:
class_path: chebai.preprocessing.datasets.go_uniprot.DeepGO2MigratedData
init_args:
  go_branch: "MF"
  max_sequence_length: 1000
  reader_kwargs: {n_gram: 3}
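For readers unfamiliar with the `n_gram` and `max_sequence_length` settings, here is a minimal sketch of what such parameters typically mean for protein sequences. The helper `sequence_to_ngrams` is hypothetical and not part of chebai; in particular, dropping over-long sequences (rather than truncating them) is an assumption made for illustration only.

```python
from typing import List, Optional


def sequence_to_ngrams(
    sequence: str, n: int = 3, max_sequence_length: int = 1002
) -> Optional[List[str]]:
    """Split a protein sequence into overlapping n-grams (hypothetical helper)."""
    # Assumption: sequences longer than the limit are skipped, not truncated.
    if len(sequence) > max_sequence_length:
        return None
    # Slide a window of width n over the sequence, one residue at a time.
    return [sequence[i:i + n] for i in range(len(sequence) - n + 1)]


tokens = sequence_to_ngrams("MKTAYIAK", n=3)
print(tokens)
```

A sequence of length 8 yields 8 - 3 + 1 = 6 overlapping trigrams.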
- consider proteins domain in the dataset which maps to any selected node irrespective of the hierarchy level
- prop annotations has both direct and transitive annotations
@sfluegel05, I have made the suggested changes for SCOPe. Please check. For DeepGO2, I re-checked the code and didn't find any discrepancy between our implementation and theirs.
I have generated a new SCOPe50 dataset, but there still seem to be labels which have 0 protein sequences assigned to them. Could you have a look at that?
@sfluegel05, I have resolved this issue. Please check.
Now, the number of instances per label is at least 1, but still less than 50 in many cases. The main issue seems to be that the threshold is applied before most of the processing. In the function This should be the other way round:
I hope this helps!
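The ordering problem described in this comment can be illustrated with a small sketch: run the processing that removes instances first, then count instances per label and apply the threshold last. Everything here is hypothetical (the function `select_labels`, the toy rows, and the length filter that stands in for "most of the processing"); it is not the code from this PR.

```python
from collections import Counter
from typing import List, Set, Tuple

Row = Tuple[str, Set[str]]  # (sequence, labels), a toy stand-in for a dataset row


def select_labels(rows: List[Row], threshold: int, max_len: int):
    """Filter rows first, then keep labels with at least `threshold` instances."""
    # Step 1: processing that can drop instances (here: a length filter).
    kept = [(seq, labels) for seq, labels in rows if len(seq) <= max_len]
    # Step 2: count instances per label on the data that actually remains.
    counts = Counter(label for _, labels in kept for label in labels)
    # Step 3: apply the threshold last, so it holds for the final dataset.
    selected = {label for label, count in counts.items() if count >= threshold}
    return kept, selected


rows = [
    ("MKT", {"a", "b"}),
    ("MKTAYIAKQR", {"a"}),  # removed by the length filter below
    ("MK", {"a", "b"}),
    ("MKTA", {"b"}),
]
kept, selected = select_labels(rows, threshold=3, max_len=5)
print(selected)
```

In this toy data, counting before the filter would accept label `a` with 3 instances, although only 2 survive the filter. Counting afterwards keeps only label `b`.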
Thanks for the suggestion. I have fixed the issue, and now all labels have at least 50 true instances for SCOPe50. I have also made the suggested changes to the SCOPe notebook. Please check.
My first guess is that you have to change
@sfluegel05, I increased the I have already started the training, but the issue now is that only 5 epochs have been completed in 17 hours.
Please check the results here after 24 hours of training: only 6 epochs were completed. The batch file has a maximum timeout of 24 hours.
PR for the Issue Protein function prediction with GO #36
Note: The above issue will be implemented in 3 PRs:
Protein function prediction with GO #39 (Merged)
Protein function prediction with GO - Part 2 #57 (Merged)
Protein function prediction with GO - Part 3 #64
PR for the issue Add SCOPe dataset to our pipeline #67
Changes to be done in this PR
From comment #36 (comment)