OCR of tables (e.g. WTTs)

OCR of tables (e.g. WTTs) 12/01/2023 at 14:18 #150131
DonRiver
151 posts
Was wondering if anyone's had a go at using OCR to parse scanned timetables, e.g. those in Network Rail's archive?

I've just been looking at Tesseract OCR's documentation (tesseract-ocr.github.io). It's designed for reading paragraphs of text, not tables, so I'm wondering if there are off-the-shelf image processing techniques for recognising each column by its borders, cropping it out of the image, and OCR'ing it in isolation… it _might_ not actually be difficult in Python.
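
A minimal sketch of the column-splitting step, assuming the scan has already been loaded as rows of greyscale pixel values (0 = black) and that columns are separated by near-solid vertical rules. The names find_column_spans and crop_columns and the threshold values are made up for illustration. Loading the image and running an OCR engine on each crop are left out.

```python
def find_column_spans(pixels, dark=128, rule_fraction=0.9, min_width=2):
    """Return (start, end) x-ranges lying between vertical rule lines.

    pixels is a list of rows, each a list of greyscale values (0 = black).
    All thresholds are illustrative guesses, not tuned values.
    """
    height = len(pixels)
    width = len(pixels[0])

    # A pixel column counts as a rule if nearly all of it is dark.
    is_rule = []
    for x in range(width):
        dark_count = sum(1 for y in range(height) if pixels[y][x] < dark)
        is_rule.append(dark_count >= rule_fraction * height)

    # Collect the runs of non-rule columns between consecutive rules.
    spans = []
    start = None
    for x, rule in enumerate(is_rule):
        if not rule and start is None:
            start = x
        elif rule and start is not None:
            if x - start >= min_width:
                spans.append((start, x))
            start = None
    if start is not None and width - start >= min_width:
        spans.append((start, width))
    return spans


def crop_columns(pixels, spans):
    """Cut one sub-image per span, ready to hand to an OCR engine."""
    return [[row[a:b] for row in pixels] for a, b in spans]
```

A real scan would probably need deskewing and binarising first, since a slightly tilted rule no longer fills a single pixel column.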

(named for the one in Tasmania, not in Russia)
OCR of tables (e.g. WTTs) 12/01/2023 at 16:08 #150132
bill_gensheet
1343 posts
No, but just tried to see how it would go:

https://www.onlineocr.net/pdftoexcel

Seemed quite good, except for times ending in ½, which came out as % or 1/2.
While fixing the % is easy, turning 11/221/2 into 11/22 ½ is more complicated.
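
One possible way to handle both cases is a regular expression, sketched here on the assumption that every time has exactly two minute digits, so a trailing 1/2 or % straight after them can only be a misread ½. The function name fix_half_minutes is made up for illustration, and the pattern has not been tried on real WTT output.

```python
import re

# Assumed shape: 1-2 hour digits, optional separator (/ . :), 2 minute
# digits, then a misread half written as "%" or "1/2".
_HALF = re.compile(r"\b(\d{1,2}[/.:]?\d{2})\s?(?:%|1/2)(?!\d)")


def fix_half_minutes(text):
    """Rewrite OCR'd half-minute suffixes as ' ½' after the time."""
    return _HALF.sub(lambda m: m.group(1) + " ½", text)


print(fix_half_minutes("11/221/2"))
print(fix_half_minutes("11/22%"))
```

Times without a half-minute, such as a plain 11/22, are left untouched.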

However, that was a 2015 file, which looked like it had been printed to PDF rather than scanned.

The following user said thank you: DonRiver