
Add ensemble model #72

Open
sfluegel05 opened this issue Feb 7, 2025 · 1 comment · May be fixed by #77
@sfluegel05 (Collaborator) commented:
Problem

Currently, we have a range of different approaches for classifying molecules in ChEBI: ELECTRA-based, GNN-based (https://github.com/ChEB-AI/python-chebai-graph), and algorithmic / logic-based (https://github.com/sfluegel05/chemlog2).

All approaches have specific strengths and weaknesses. The goal of an ensemble is to aggregate the predictions of these different methods so that the combined result is better than any individual result.

Task

The ensemble architecture should take the following input:

  • For each model and ChEBI class: a prediction (possible values: true, false, error / don't know, out of scope), and optionally a confidence value (a real number indicating how sure the model is about its prediction, e.g. from 0 (no confidence) to 1 (very confident))

It should aggregate these values into a single prediction per class, taking into account both the prediction of each model and the "trustworthiness" of that model (a score that is specific to each class, and possibly different for positive and negative predictions).
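To make the interface concrete, here is a minimal sketch of these inputs in Python (all names are hypothetical; nothing here exists in the codebase yet):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Label(Enum):
    """Possible prediction values for one (model, ChEBI class) pair."""
    TRUE = "true"
    FALSE = "false"
    DONT_KNOW = "error / don't know"
    OUT_OF_SCOPE = "out of scope"


@dataclass
class Prediction:
    model: str                          # e.g. "electra", "gnn", "chemlog"
    chebi_class: str                    # a ChEBI ID, e.g. "CHEBI:33704"
    label: Label
    confidence: Optional[float] = None  # 0 (no confidence) to 1 (very confident)
```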

Example:

Given a ChEBI class, we have received the following predictions:

| model A | model B | model C | model D      |
|---------|---------|---------|--------------|
| true    | false   | true    | out of scope |

The simplest approach would be to weight all models equally and return true for this class (based on a 2:1 vote). However, we should also take trustworthiness into account. These values might come from the positive predictive value (PPV, also known as precision; TP / (TP + FP)) and negative predictive value (NPV; TN / (TN + FN)) of a model on a test set.

| metric | model A | model B | model C | model D |
|--------|---------|---------|---------|---------|
| PPV    | 0.7     | 0.8     | 0.6     | 0.7     |
| NPV    | 0.9     | 0.99    | 0.8     | 1.0     |

In other words: if model A and model C predict "true" for this class, they are correct in 70% and 60% of cases respectively (according to their PPV). If model B predicts "false" for this class, it is correct in 99% of cases (according to its NPV).
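For reference, these trust scores follow directly from confusion counts on a held-out test set. A minimal sketch (the function names are hypothetical):

```python
def ppv(tp: int, fp: int) -> float:
    """Positive predictive value (precision): fraction of positive predictions that are correct."""
    return tp / (tp + fp)


def npv(tn: int, fn: int) -> float:
    """Negative predictive value: fraction of negative predictions that are correct."""
    return tn / (tn + fn)
```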

An aggregation method would then weigh two predictions with trustworthiness 0.7 and 0.6 against one with trustworthiness 0.99. Depending on the aggregation method used, it might decide to trust model B.
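To illustrate, here is one possible aggregation as a minimal sketch (the function and the log-odds weighting are illustrative assumptions, not a committed design):

```python
import math


def aggregate(votes: list[tuple[str, float]]) -> str:
    """Log-odds weighted vote over (label, trust) pairs.

    trust is the model's PPV for a "true" vote and its NPV for a
    "false" vote; "out of scope" / "don't know" predictions are
    omitted by the caller. Log-odds weighting lets near-certain
    models (trust close to 1) dominate the vote.
    """
    score = 0.0
    for label, trust in votes:
        trust = min(max(trust, 1e-6), 1 - 1e-6)   # clamp: avoids log(0) for trust of exactly 0 or 1
        weight = math.log(trust / (1.0 - trust))  # log-odds that this vote is correct
        score += weight if label == "true" else -weight
    return "true" if score > 0 else "false"


# Example from above: A and C vote true (PPV 0.7 and 0.6), B votes false
# (NPV 0.99), D is out of scope and drops out.
print(aggregate([("true", 0.7), ("true", 0.6), ("false", 0.99)]))  # -> false
```

Note that a plain weighted sum (0.7 + 0.6 = 1.3 for "true" vs. 0.99 for "false") would instead return true; the choice of aggregation method therefore decides whether a single near-certain negative vote can outweigh two moderately trusted positive ones.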

Future work

  • Extend this method towards a hierarchical ontology-based approach
  • Use bagging and boosting to improve performance further
@aditya0by0 aditya0by0 self-assigned this Feb 7, 2025
@aditya0by0 aditya0by0 linked a pull request Mar 17, 2025 that will close this issue