Description
Current Behavior
When using Tesseract OCR to extract text from an image containing asterisks (*), the output does not preserve the asterisk character. Instead, it is replaced with seemingly random characters or repeated letters.
This issue can be seen in the attached screenshot, where the expected asterisk is missing and the extracted text contains unexpected sequences.
Expected Behavior
The OCR output should preserve every character in the source image, including asterisks (*). Each asterisk should appear in the extracted text at the same position as in the source, and no other characters should be substituted for it.
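The expected behavior can be checked mechanically when the ground-truth text of the image is known. The following is a minimal sketch, not part of Tesseract: `asterisk_mismatches` is a hypothetical helper, and it assumes you already have both the source text and Tesseract's output as strings. It aligns the two with Python's standard `difflib.SequenceMatcher` and reports every asterisk that was dropped or replaced, along with what OCR produced in its place.

```python
import difflib


def asterisk_mismatches(expected: str, ocr_output: str) -> list[tuple[int, str]]:
    """Hypothetical helper: list asterisks in `expected` that OCR did not preserve.

    Returns (index_in_expected, ocr_replacement) pairs. ocr_replacement is the
    OCR text aligned with the changed span ('' if the asterisk was dropped).
    """
    mismatches = []
    # autojunk=False so frequent characters are never treated as junk.
    matcher = difflib.SequenceMatcher(None, expected, ocr_output, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # this span was reproduced exactly
        # 'replace' and 'delete' spans cover characters of `expected` that OCR
        # changed; 'insert' spans have i1 == i2, so the loop body never runs.
        for i in range(i1, i2):
            if expected[i] == "*":
                mismatches.append((i, ocr_output[j1:j2]))
    return mismatches


if __name__ == "__main__":
    # An asterisk misread as 'x' is reported at its position in the source.
    print(asterisk_mismatches("Total: 5 * 3", "Total: 5 x 3"))
```

An empty result means every asterisk in the source text survived OCR unchanged.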
Suggested Fix
No response
tesseract -v
Reproduced with both 5.4.1 and 5.5.1.
Operating System
No response
Other Operating System
Linux 20.04.6
uname -a
No response
Compiler
No response
CPU
No response
Virtualization / Containers
No response
Other Information
Run OCR on the attached image to reproduce the issue.