Here are some notes about this testing data. These results are accurate as of 2025-02-03, so be wary of relying on them if you are reading this much later.
On that date I ran these abstracts against each of the models below, and this is what they returned:

| Model | Abstract 1 | Abstract 2 | Abstract 3 |
|---|---|---|---|
| deepseek-r1:latest | 25% | 0% | 0% |
| granite3-dense:latest | 50% | 80% | 80% |
| granite3.1-dense:latest | 80% | 80% | 80% |
| llama3.1:latest | 80% | 23% | 82% |

As we add more test data, we should keep this overview of the different models up to date here.
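One way to keep this overview current is to store each run as data keyed by its date and regenerate the table from it. The sketch below is illustrative only: the `RESULTS` layout and the `to_markdown` helper are hypothetical, the scores are copied from the 2025-02-03 table (the notes do not say exactly what the percentages measure), and the Mean column is an extra summary that the original table does not include.

```python
from datetime import date
from statistics import mean

# Hypothetical layout: run date -> model name -> per-abstract scores (percent).
# Values copied from the 2025-02-03 results table.
RESULTS: dict[date, dict[str, list[int]]] = {
    date(2025, 2, 3): {
        "deepseek-r1:latest": [25, 0, 0],
        "granite3-dense:latest": [50, 80, 80],
        "granite3.1-dense:latest": [80, 80, 80],
        "llama3.1:latest": [80, 23, 82],
    },
}


def to_markdown(run: dict[str, list[int]]) -> str:
    """Render one run as a markdown table, adding a per-model Mean column."""
    n = max(len(scores) for scores in run.values())
    header = "| Model | " + " | ".join(f"Abstract {i + 1}" for i in range(n)) + " | Mean |"
    separator = "|---" * (n + 2) + "|"  # model column + n abstracts + mean column
    rows = [
        f"| {model} | " + " | ".join(f"{s}%" for s in scores) + f" | {mean(scores):.1f}% |"
        for model, scores in run.items()
    ]
    return "\n".join([header, separator, *rows])


if __name__ == "__main__":
    print(to_markdown(RESULTS[date(2025, 2, 3)]))
```

When a new run is done, adding another date entry to `RESULTS` keeps older runs available for comparison instead of overwriting them.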