When you work from home, the pajama commute is the same even when it snows. Unless, of course, the power goes out. Gotta have power to work!
I regularly test PDF conversion software. Today, I have a new project to organize. It has to do with scanned PDF documents and how words are detected and converted to editable text. In this project, I will be testing the accuracy of OCR functionality in an application. OCR?!? What the heck is OCR?
OCR (optical character recognition) is a software technology that converts scanned documents into documents with “live text,” aka readable, searchable text that you can copy, edit and basically do anything else you regularly do to text.
For more detailed information, visit this resource:
https://en.wikipedia.org/wiki/Optical_character_recognition
Testing for this project includes converting the scanned PDF files to DOCX and/or TEXT output and then comparing the words and characters to the original source file. Through the test process, I develop what is called a “TRUTH” file, which outlines the exact characters expected in the converted output. Our test lab uses automation where tests are farmed out to Virtual Machines of various configurations. We use the “TRUTH” file to create automated tests that will be executed in the lab on the various VMs (with every single build), to ensure output regressions are caught in a timely manner when changes to the code base are made.
This OCR project will be focused on CJK languages: Chinese, Japanese and Korean.
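For the curious, here is a rough sketch of what comparing converted output against a “TRUTH” file can look like. This is not our lab's actual tooling. The function names (`normalize`, `edit_distance`, `character_accuracy`), the choice of Levenshtein edit distance as the error measure, and the sample strings are all my own assumptions for illustration. Scoring by character rather than by word is a reasonable fit for CJK text, where words are not separated by spaces.

```python
import unicodedata


def normalize(text: str) -> str:
    """Apply NFC normalization and trim surrounding whitespace.

    Assumption: composed and decomposed forms of the same character
    should count as equal when scoring OCR output.
    """
    return unicodedata.normalize("NFC", text).strip()


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # delete ch_a
                            curr[j - 1] + 1,      # insert ch_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_accuracy(truth: str, ocr_output: str) -> float:
    """Return a score in [0.0, 1.0]: 1 minus errors per truth character."""
    truth = normalize(truth)
    ocr_output = normalize(ocr_output)
    if not truth:
        return 1.0 if not ocr_output else 0.0
    errors = edit_distance(truth, ocr_output)
    return max(0.0, 1.0 - errors / len(truth))


if __name__ == "__main__":
    # Hypothetical sample: one character misread out of eight.
    truth_text = "日本語のテキスト"
    ocr_text = "日木語のテキスト"
    print(f"{character_accuracy(truth_text, ocr_text):.3f}")  # prints 0.875
```

An automated test could then assert that the score for each converted file stays at or above the value recorded for the last known-good build, so a drop shows up as a regression.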
Yippee-ki-yay! That was my snow day. How was yours?