How do I perform OCR on documents?
How do I convert image-based documents into text-searchable documents?
You can perform OCR with PDF-XChange Editor, PDF-Tools or the discontinued PDF-XChange Viewer.
PDF-XChange Editor
Two optical character recognition (OCR) engines are available in PDF-XChange Editor: the default OCR engine and the enhanced OCR engine. The enhanced engine is available when PDF-XChange Editor Plus is purchased, either as a stand-alone product or as part of the PDF-XChange PRO bundle. It is faster, more accurate and more dynamic than the default engine, and it contains some extra features. Further information about the enhanced OCR engine is available here. Use the OCR preferences (available via Preferences in the File tab) to switch between the default and enhanced engines.
Default OCR Engine
The default engine analyzes image-based documents, recognizes the text they contain and places an invisible text layer with the recognized text over the page image. The original, image-based text can then be selected and searched through this invisible layer in the same manner as ordinary text, which is the main benefit of OCR. However, the text cannot be edited in the same manner as a normal, text-based document, because the document remains image-based despite the invisible text layer. To convert image-based text into editable text, use the enhanced OCR engine.
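As a rough illustration of what an invisible text layer is, the Python sketch below builds the kind of PDF content-stream operators that can place recognized text over a scanned page image. It relies on text rendering mode 3 (`3 Tr`) from the PDF specification, which neither fills nor strokes glyphs, so the text can be selected and searched but is never drawn. This is a simplified sketch under stated assumptions, not a description of how PDF-XChange Editor writes its text layer: the resource names `/Im1` and `/F1` and all function names are hypothetical, and real OCR output also has to scale each word so that it lines up with the underlying image.

```python
from typing import Iterable, Tuple


def escape_pdf_string(text: str) -> str:
    """Escape characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def invisible_text_ops(text: str, x: float, y: float, font_size: float) -> str:
    """Return operators that place `text` at (x, y) in invisible render mode."""
    return "\n".join([
        "BT",                               # begin a text object
        f"/F1 {font_size:g} Tf",            # select font resource /F1 (hypothetical name)
        "3 Tr",                             # render mode 3: neither fill nor stroke
        f"{x:g} {y:g} Td",                  # move to the word's position in page space
        f"({escape_pdf_string(text)}) Tj",  # show the recognized text
        "ET",                               # end the text object
    ])


def ocr_page_content(words: Iterable[Tuple[str, float, float]],
                     page_w: float, page_h: float, font_size: float = 10) -> str:
    """Draw the scanned image over the full page, then the invisible words."""
    # Scale the image XObject /Im1 (hypothetical name) to fill the page.
    parts = [f"q {page_w:g} 0 0 {page_h:g} 0 0 cm /Im1 Do Q"]
    for text, x, y in words:
        parts.append(invisible_text_ops(text, x, y, font_size))
    return "\n".join(parts)


# Two recognized words on a US Letter page (612 x 792 points).
print(ocr_page_content([("Invoice", 72, 700), ("(2023)", 72, 680)], 612, 792))
```

Because the image is painted first and the text is invisible, a viewer shows only the scan, while search and selection operate on the text operators.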
Click the Convert tab, then click OCR Pages to perform OCR on documents.
The OCR Pages dialog box will open.
Use the Page Range settings to determine the page range for OCR.
Use the Subset options to specify a subset of selected pages. Select All, Odd or Even as desired.
Use the Recognition Options to determine the language and accuracy of the OCR process. Higher accuracy settings increase processing time, and lower settings reduce it. Setting the accuracy to high may also produce unusual output if the document contains imperfections, because the software searches in greater depth and may attempt to recognize imperfections as text. Click Add/Update Languages to add or update the language packs used for OCR.
Click OK to OCR documents.
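The way the Page Range and Subset settings combine can be sketched as a simple filter. The Python function below is a hypothetical illustration, not PDF-XChange code, and it assumes that Odd and Even refer to the page numbers themselves (pages 1, 3, 5 and so on are odd) rather than to positions within the selected range.

```python
from typing import List


def pages_to_ocr(first: int, last: int, subset: str = "All") -> List[int]:
    """Return the 1-based page numbers in [first, last] that match `subset`.

    Hypothetical helper; assumes Odd/Even refer to the page number itself.
    """
    if first < 1 or last < first:
        raise ValueError("invalid page range")
    pages = range(first, last + 1)
    if subset == "All":
        return list(pages)
    if subset == "Odd":
        return [p for p in pages if p % 2 == 1]
    if subset == "Even":
        return [p for p in pages if p % 2 == 0]
    raise ValueError(f"unknown subset: {subset!r}")


print(pages_to_ocr(2, 7, "Odd"))
```

Under this assumption, a range of pages 2 to 7 with the Odd subset selects pages 3, 5 and 7, and the Even subset selects pages 2, 4 and 6.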
Enhanced OCR Engine
The Enhanced OCR dialog box contains the same options detailed above, plus additional Output Options.
Please note that in some cases, for example documents that contain one large graphic zone covering the whole page area with text zones over it, the visual output of the Editable Text and Images and Fine Page Content options will be very similar.
Click OK to OCR documents.
You can also perform OCR when PDF documents are created from images or scanned content, and you can perform OCR on only a selected area of a document, as detailed below.
Creating Documents from Images
1. Click the File tab, then click New Document and click From Images.
The Image to PDF dialog box will open.
2. Add files and determine settings as detailed here.
3. Click Options for further options. The Image to PDF Options dialog box will open. Click Image Post-Processing to view the OCR options available when images are converted to PDF.
4. Select the Run OCR check box to perform OCR on images when they are converted to PDF. Click OCR Settings to determine language and accuracy options, as detailed above.
Creating Documents from Scanned Content
1. Click the File tab, then click New Document.
2. Click From Scanner, then click Custom Scan.
3. The Scan Properties dialog box will open.
4. Determine settings as detailed here.
5. Click Images Insertion Options to determine options for inserted images. The Image to PDF Options dialog box will open. Click Image Post-Processing to view the OCR options available when scanned content is converted to PDF.
6. Select the Run OCR check box to perform OCR on images when they are converted to PDF. Click OCR Settings to determine language and accuracy options, as detailed above.
OCR Selected Regions
It is also possible to perform OCR on selected regions of documents when either the Snapshot Tool or the Crop Page Tool has been used to define a page area. For example, click Other Tools in the Organize tab, then click Snapshot Tool and click and drag the mouse to define a snapshot area.
When the area has been defined, right-click it and then click OCR Selected Region in the shortcut menu.
The OCR Options dialog box will open. Determine parameters as detailed above and then click OK to perform OCR on the selected region of the document.
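When a region is defined with the Snapshot Tool or the Crop Page Tool, only the part of the page image inside that rectangle needs to be recognized. The Python sketch below shows one way such a rectangle could be mapped from PDF page coordinates (points of 1/72 inch, origin at the bottom-left) to pixel coordinates in a page image rendered at a given resolution (origin at the top-left). It is an illustrative assumption about the geometry involved, not PDF-XChange Editor's implementation: it ignores page rotation and any CropBox offset, and `region_to_pixels` is a hypothetical name.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]


def region_to_pixels(rect: Rect, page_height_pt: float,
                     dpi: float) -> Tuple[int, int, int, int]:
    """Map (x0, y0, x1, y1) in PDF points to a (left, top, right, bottom) pixel box.

    Assumes an unrotated page whose origin is at the bottom-left corner.
    """
    x0, y0, x1, y1 = rect
    scale = dpi / 72.0  # default PDF user space has 72 units per inch
    left = round(min(x0, x1) * scale)
    right = round(max(x0, x1) * scale)
    # Flip the y axis: PDF y grows upward, image rows grow downward.
    top = round((page_height_pt - max(y0, y1)) * scale)
    bottom = round((page_height_pt - min(y0, y1)) * scale)
    return left, top, right, bottom


# A one-inch square, one inch in from the bottom-left of a US Letter page.
print(region_to_pixels((72, 72, 144, 144), page_height_pt=792, dpi=300))
```

For a US Letter page (792 points tall) rendered at 300 dpi, that square maps to the pixel box (300, 2700, 600, 3000), which is the area an OCR engine would then process.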
PDF-Tools
Two optical character recognition engines are available in PDF-Tools: the default OCR engine and the enhanced OCR engine. The enhanced engine is available when PDF-Tools is purchased as part of the PDF-XChange PRO bundle. It is faster, more accurate and more dynamic than the default engine, and it contains some extra features. Further information about the enhanced OCR engine is available here. Use the OCR preferences (available via Preferences in the Options tab) to switch between the default and enhanced engines.
Follow the steps below to perform OCR with PDF-Tools:
1. Open PDF-Tools and double-click the OCR Pages tool to run it.
2. Select the files/folders to be processed.
3. The OCR Pages dialog box will open.
Use the Page Range settings to determine the page range for OCR.
Use the Subset options to specify a subset of selected pages. Select All, Odd or Even as desired.
Use the Recognition Options to determine the language and accuracy of the OCR process. Higher accuracy settings increase processing time, and lower settings reduce it. Setting the accuracy to high may also produce unusual output if the document contains imperfections, because the software searches in greater depth and may attempt to recognize imperfections as text. Click Add/Update Languages to add or update the language packs used for OCR.
Use the Output Options to determine the output of OCR.
Please note that in some cases, for example documents that contain one large graphic zone covering the whole page area with text zones over it, the visual output of the Editable Text and Images and Fine Page Content options will be very similar.
Default OCR Engine
The default engine in PDF-Tools analyzes image-based documents, recognizes the text they contain and places an invisible text layer with the recognized text over the page image. The original, image-based text can then be selected and searched through this invisible layer in the same manner as ordinary text, which is the main benefit of OCR. However, the text cannot be edited in the same manner as a normal, text-based document, because the document remains image-based despite the invisible text layer. To convert image-based text into editable text, use the enhanced OCR engine.
You can also create custom tools that include OCR functionality, as detailed here.
PDF-XChange Viewer
1. Click Document in the Menu Toolbar, then click OCR Pages in the submenu (or press Ctrl+Shift+C). The OCR Pages dialog box will open.
2. Click OK to OCR documents.