This repository provides two functions for reading content and metadata: read.ocr for image-based PDF files and read.docx for Word documents. The read.ocr function uses Tesseract to perform optical character recognition (OCR) on an image PDF file. The read.docx function unzips the .docx file to access the XML files inside it and extracts the data from them.
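The unzip-then-parse approach described for read.docx can be illustrated with a minimal Python sketch. This is not the repository's code: the function name `read_docx_sketch`, the returned dictionary layout, and the choice of metadata fields (title and creator) are assumptions made for illustration. It relies only on the fact that a .docx file is a ZIP archive whose body text is stored in `word/document.xml` and whose core metadata is stored in `docProps/core.xml`.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used inside .docx archives
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
DC_NS = "http://purl.org/dc/elements/1.1/"


def read_docx_sketch(path):
    """Hypothetical helper: return paragraph text and basic metadata."""
    with zipfile.ZipFile(path) as archive:
        # The document body lives in word/document.xml
        body = ET.fromstring(archive.read("word/document.xml"))
        paragraphs = []
        for para in body.iter(f"{{{W_NS}}}p"):
            # A paragraph's text is split across one or more w:t elements
            text = "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
            paragraphs.append(text)

        # Core metadata is optional, so check that the part exists first
        metadata = {}
        if "docProps/core.xml" in archive.namelist():
            core = ET.fromstring(archive.read("docProps/core.xml"))
            for field in ("title", "creator"):  # illustrative subset only
                node = core.find(f"{{{DC_NS}}}{field}")
                if node is not None:
                    metadata[field] = node.text
    return {"paragraphs": paragraphs, "metadata": metadata}
```

Calling `read_docx_sketch("report.docx")` (a hypothetical file name) would return a dictionary with a `paragraphs` list and a `metadata` dictionary. The OCR side is not sketched here because it depends on an external Tesseract installation.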
jmsquare/optical-character-recognition