Current best cost-effective way to extract structured data from semi-structured book review PDFs into CSV?

Via r/LocalLlama

Monday, Mar 23, 2026 · 1:20AM

Summary

I’m trying to extract structured data from PDFs that look like old book review/journal pages. Each entry has fields such as author, book title, publisher, year, and review text. The layout is semi-structured, as you can see, and a typical entry looks like a block of text where the bibliographic info comes
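To make the target output concrete, here is a minimal sketch of one CSV row per review entry, using only Python's standard library. The `ReviewEntry` class, its field names, and the `entries_to_csv` helper are hypothetical, chosen to mirror the fields listed in the post; it does not cover the extraction step itself, since the page layout is not fully described here.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class ReviewEntry:
    # Hypothetical schema: one attribute per field named in the post.
    author: str
    book_title: str
    publisher: str
    year: str  # kept as text, since a scanned year may be missing or malformed
    review_text: str


def entries_to_csv(entries):
    """Serialize entries to CSV text with a header row."""
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(ReviewEntry)])
    writer.writeheader()
    for entry in entries:
        # The csv module quotes commas and line breaks inside review text.
        writer.writerow(asdict(entry))
    return buf.getvalue()
```

Fixing the schema up front, whichever extraction tool ends up producing the values, makes it easier to check every entry against the same set of columns before it is written out.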


www.reddit.com