Tesseract is a cross-OS optical character recognition (OCR) engine developed by HP in the 1980s, and since 2006, maintained by Google as an open-source project with high marks for accuracy in reading raw image data into digital characters. The project has been continuously developed and now offers OCR supported by LSTM neural networks for highly improved results.
In this session, we’ll use the Python wrapper for Tesseract to first test drive OCR on images through code before connecting our solution to a live IP video feed from your smartphone processed through OpenCV, and then translating the resultant text stream into audible form with gTTS (Google Text-To-Speech), enabling our mashup program to automatically read out loud from any script it ‘sees’.
Real-time OCR Text-to-Speech with Tesseract
Tesseract is a cross-OS optical character recognition (OCR) engine developed by HP in the 1980s, and since 2006, maintained by Google as an open-source project with high marks for accuracy in reading raw image data into digital characters. The project has been continuously developed and now offers OCR supported by LSTM neural networks for highly improved results.
In this session, we’ll use the Python wrapper for Tesseract to first test drive OCR on images through code before connecting our solution to a live IP video feed from your smartphone processed through OpenCV, and then translating the resultant text stream into audible form with gTTS (Google Text-To-Speech), enabling our mashup program to automatically read out loud from any script it ‘sees’.
Prerequisites:
— Python IDE such as PyCharm
— The Tesseract engine
— A smartphone configured as an IP Webcam