Minions document OCR Parser with Gemma3 #26
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR implements an OCR parser for Minions with Gemma3. Currently, granite3.2-vision is used for OCR. Inspired by strong vision capability of the recent Gemma3 model and a streamlit demo showcasing the use of Gemma3 for OCR, I added a similar OCR parser to Minions. I copied
app.py
intominions_ocr.py
and updated theextract_text_from_image
function to use Gemma3 with a prompt instructing it to parse the image into structured markdown to be included in the Minions context.I tested the new OCR parser locally on my macBook by selecting
gemma3:4b
as the local model and using theMinion
protocol tosummarize all items purchased
on a restaurant receipt image. Gemma3 successfully summarized all the food items in a restaurant receipt.Here's the receipt I uploaded:

And here's the output trace:
For reference, here's the current behavior with Granite, which also correctly completed the task: