Stefan Karcher, « OpenMethods introduction to: An end-to-end approach for extracting and segmenting high-variance references from pdf documents », OpenMethods: Highlighting Digital Humanities Methods and Tools, ID : 10670/1.rh0mow
Introduction: Digital text analysis depends on one important thing: text that can be processed with little effort. Working with PDFs often makes this very difficult, as Zeyd Boukhers, Shriharsh Ambhore, and Steffen Staab describe in their paper. Their goal is to extract references from PDF documents. A highlight of the workflow they describe is its very impressive precision rates. The paper thereby encourages further development of the process and its application as a "method" in the humanities.