Activity

ENH: Improve PDFium text extraction (#11)

Pull request merge
MartinThoma pushed 1 commit to main • 4f14b3c…24c51dd • on Oct 31, 2023

Update: pypdf got faster

MartinThoma pushed 1 commit to main • ce340e8…4f14b3c • on Aug 26, 2023

Add header/footer removal for pypdf

MartinThoma pushed 1 commit to main • 267a925…ce340e8 • on Aug 2, 2023

Update after changing the ground truth

MartinThoma pushed 1 commit to main • 8e94c02…267a925 • on Aug 1, 2023

MAINT: Fix hyphenation

MartinThoma pushed 1 commit to main • e7fb117…8e94c02 • on Aug 1, 2023

MAINT: pypdf now applies post-processing

MartinThoma pushed 1 commit to main • 38a4fa6…e7fb117 • on Aug 1, 2023

BUG: Fix read ground truth

MartinThoma pushed 1 commit to main • 9633ada…38a4fa6 • on Aug 1, 2023

MAINT: pypdf==3.14.0 improved math extraction

MartinThoma pushed 1 commit to main • a9b8e27…9633ada • on Jul 29, 2023

BUG: Fix ground truth

MartinThoma pushed 1 commit to main • 5d37d6b…a9b8e27 • on Jul 29, 2023

pypdf==3.12.1 update

MartinThoma pushed 1 commit to main • 7345129…5d37d6b • on Jul 9, 2023

ENH: Add pdfrw for watermarking

MartinThoma pushed 1 commit to main • a78f609…7345129 • on Jul 2, 2023

Apply compression for pypdf when watermarking

MartinThoma pushed 1 commit to main • 38bdc80…a78f609 • on Jul 2, 2023

MAINT: Refactor benchmark script into modules

MartinThoma pushed 1 commit to main • 3689773…38bdc80 • on Jul 2, 2023

Reflect the fact that it contains several different benchmarks

MartinThoma pushed 1 commit to main • dec01bc…3689773 • on Jul 2, 2023

DOC: pdfplumber uses pdfminer.six

MartinThoma pushed 1 commit to main • 8000a53…dec01bc • on Jul 2, 2023

Fix borb text extraction code

MartinThoma pushed 1 commit to main • 6e2ba09…8000a53 • on Jul 2, 2023

ENH: Re-run benchmark with latest libraries

MartinThoma pushed 1 commit to main • a2611a6…6e2ba09 • on Jul 1, 2023

ENH: Add watermarking resulting file size

MartinThoma pushed 1 commit to main • 6397bd0…a2611a6 • on Jul 1, 2023

Add table extraction benchmark

MartinThoma pushed 1 commit to main • aa40d4c…6397bd0 • on Apr 21, 2023

Ensure tika server is running

MartinThoma pushed 1 commit to main • 6684872…aa40d4c • on Apr 21, 2023