I’m trying to extract structured data from PDFs that look like old book review/journal pages. Each entry has fields like author, book title, publisher, year, review text, etc. The layout is semi-structured, as you can see, and a typical entry looks like a block of text where the bibliographic info comes