
Optical Character Recognition, usually abbreviated to OCR, is a technology that converts images, scanned books, or photos containing text into editable formats. Many documents are stored as images inside PDF files, and the only way to extract text from such files is to perform OCR. Enolsoft PDF Converter with OCR for Mac has this technology built in. It is premium software designed for Mac users to turn scanned PDFs and images into editable document formats such as Word, Text, HTML, and Excel. It applies special pre-processing functions to extract text, images, and tables.

These are some of the file types suitable for Enolsoft PDF Converter with OCR:

  • Image PDF files obtained using flatbed scanners
  • Photos (BMP, PNG, TIFF, JPG, GIF) taken with digital cameras or mobile phones
  • Adobe PDF files

These are the output file types supported by Enolsoft PDF Converter with OCR:

  • Microsoft Office Word (*.docx)
  • Microsoft Office PowerPoint (*.pptx)
  • Plain Text file (*.txt)
  • Microsoft Office Excel (*.xlsx)
  • EPUB Format (*.epub)
  • HyperText Markup Language (*.html)
  • Rich Text Format Directory (*.rtfd)
  • Image files (BMP, PNG, TIFF, JPG, GIF)

How does Enolsoft PDF Converter with OCR work? Converting an image to an editable document is complicated, and every step is a set of related algorithms that each do one piece of the OCR job. The process is divided into 3 parts:

Part 1, Detection of Pictures

Once a picture is selected manually or automatically from a scanned PDF or an image, the OCR system treats it as a bitmap and detects the resolution and inversion of the picture area. However, some pictures are skewed or noisy, so the OCR engine then calls the deskew and denoising algorithms to improve the image quality. This pre-processing stage includes “binarization”, the conversion of the picture to pure black and white, and it is a very important step because incorrect binarization will cause a lot of problems in the later steps.
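
As an illustration of what binarization means in general, here is a minimal Python sketch that picks a global threshold with Otsu's method and turns a grayscale picture into ink (1) and background (0) pixels. The article does not say which algorithm Enolsoft uses, so Otsu's method, the function names, and the tiny sample picture are all assumptions chosen for illustration.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, each value 0 (black) to 255 (white)


def otsu_threshold(pixels: Image) -> int:
    """Return the gray level that best separates dark from light pixels.

    Otsu's method is a common textbook choice; it is an assumption here,
    not the algorithm Enolsoft is known to use.
    """
    hist = [0] * 256
    for row in pixels:
        for value in row:
            hist[value] += 1

    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_threshold, best_variance = 0, -1.0
    weight_dark, sum_dark = 0, 0
    for level in range(256):
        weight_dark += hist[level]
        sum_dark += level * hist[level]
        if weight_dark == 0:
            continue  # no pixels at or below this level yet
        weight_light = total - weight_dark
        if weight_light == 0:
            break  # every pixel is already in the dark class
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Between-class variance: large when the two groups are far apart.
        variance = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if variance > best_variance:
            best_variance, best_threshold = variance, level
    return best_threshold


def binarize(pixels: Image) -> Image:
    """Map every pixel to 1 (ink) or 0 (background)."""
    threshold = otsu_threshold(pixels)
    return [[1 if value <= threshold else 0 for value in row] for row in pixels]


# Hypothetical 3x4 picture: two dark ink pixels on a light, slightly uneven page.
page = [
    [220, 220, 220, 220],
    [220, 30, 20, 220],
    [220, 200, 200, 220],
]
bitmap = binarize(page)
```

A threshold that is too high would turn the light gray pixels (200) into ink as well, which is the kind of incorrect binarization that damages the later recognition steps.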

Part 2, Detection of Text

Detecting text lines and words is not an easy task because of different font sizes, languages, and small spaces between words. First, the OCR analyzes touching and broken characters to find the correct position of every character, in case some characters are broken into several parts or a few characters touch each other. Then the OCR calls the main algorithm to recognize characters, so that every character is converted to the appropriate character code in the right language. This algorithm may produce several similar character codes for an uncertain character. For instance, recognizing the image of a “C” character can produce both “C” and “G” codes. The final character code is selected later with the help of a dictionary, which improves recognition quality by choosing the more plausible code.
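
The dictionary step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the recognizer is assumed to return, for each character position, a set of candidate characters with confidence values, and the `pick_word` function, the confidence numbers, and the small dictionary are hypothetical rather than taken from the product.

```python
from itertools import product
from typing import Dict, List, Set

# One dict per character position: candidate character -> confidence (0..1).
Candidates = List[Dict[str, float]]


def pick_word(candidates: Candidates, dictionary: Set[str]) -> str:
    """Choose the most confident candidate combination that is a real word.

    Falls back to the per-character best guess when no combination
    appears in the dictionary.
    """
    fallback = "".join(max(pos, key=pos.get) for pos in candidates)

    best_word, best_score = "", -1.0
    for combo in product(*(pos.items() for pos in candidates)):
        word = "".join(char for char, _ in combo)
        score = 1.0
        for _, confidence in combo:
            score *= confidence
        if word.lower() in dictionary and score > best_score:
            best_word, best_score = word, score
    return best_word or fallback


# Hypothetical recognizer output for a scanned word: the first character
# looks slightly more like "G" than "C".
scanned = [
    {"G": 0.55, "C": 0.45},
    {"a": 0.90, "o": 0.10},
    {"t": 0.95},
]
english = {"cat", "cot", "gas"}  # tiny stand-in dictionary (lowercase)

word = pick_word(scanned, english)
```

Without the dictionary the per-character winner would be "Gat"; with it, the lower-scoring "C" is chosen because "Cat" is a known word. Trying every combination is fine for short words but grows quickly, so a real engine would need a smarter search.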

Part 3, Detection of Tables

Select the “Table Area” icon from the OCR Applied Zone, and you can then edit the table lines. The OCR system then begins line detection. This step improves line analysis and gives better recognition quality for a more accurate table.
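
To give a feel for line detection, here is a minimal Python sketch that finds the ruling lines of a table in a binarized picture by counting ink pixels in each row and column (a projection profile). The article does not describe how Enolsoft detects lines, so this technique, the `find_ruling_lines` name, and the `min_fill` parameter are assumptions for illustration only.

```python
from typing import List, Tuple

Bitmap = List[List[int]]  # 1 = ink, 0 = background


def find_ruling_lines(bitmap: Bitmap, min_fill: float = 0.8) -> Tuple[List[int], List[int]]:
    """Return (row indexes, column indexes) that are mostly ink.

    A row or column counts as a table line when at least `min_fill`
    of its pixels are ink. `min_fill` is a hypothetical tuning knob.
    """
    if not bitmap or not bitmap[0]:
        return [], []
    height, width = len(bitmap), len(bitmap[0])

    rows = [y for y, row in enumerate(bitmap) if sum(row) / width >= min_fill]
    cols = [
        x
        for x in range(width)
        if sum(row[x] for row in bitmap) / height >= min_fill
    ]
    return rows, cols


# Hypothetical 5x5 table area: a 2x2 grid of empty cells.
table_area = [
    [1, 1, 1, 1, 1],
    [1, 0, 1, 0, 1],
    [1, 1, 1, 1, 1],
    [1, 0, 1, 0, 1],
    [1, 1, 1, 1, 1],
]
horizontal, vertical = find_ruling_lines(table_area)
```

The detected row and column positions mark the cell boundaries, so the text inside each cell can then be recognized separately. This simple count only works when the picture has already been deskewed, since tilted lines spread their ink over several rows.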



It is these well-organised OCR algorithms that give Enolsoft PDF Converter with OCR its powerful features. The same goes for the other PDF OCR apps, such as PDF Converter to Word with OCR or PDF to PowerPoint with OCR. If you need to work with scanned PDFs or images, this software may be helpful to you. Click here for more tutorials on how to use them.
