LSTM models recognize random characters instead of asterisk (*) #4458

Current Behavior

When using Tesseract OCR to extract text from an image containing asterisks (*), the output does not preserve the asterisk character. Instead, it is replaced with seemingly random characters or repeated letters.
This issue can be seen in the attached screenshot, where the expected asterisk is missing and the extracted text contains unexpected sequences.

Expected Behavior

The OCR output should preserve every character in the image, including asterisks (*). Each asterisk should appear in the extracted text at the same position as in the source, and no other characters should be substituted for it.
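To make the expected behavior checkable, a small helper along these lines could compare a known ground-truth transcription of the image with the OCR output and report which asterisks were dropped or replaced, and by what. This is a sketch that uses Python's difflib to align the two strings. The function name lost_asterisks and the example strings are hypothetical and are not part of the original report.

```python
import difflib


def lost_asterisks(expected: str, ocr: str) -> list[tuple[int, str]]:
    """Return (index_in_expected, replacement) for each '*' in `expected`
    that does not survive in `ocr`. `replacement` is '' when the asterisk
    was simply dropped."""
    # autojunk=False so that frequent characters are not treated as junk.
    matcher = difflib.SequenceMatcher(a=expected, b=ocr, autojunk=False)
    lost = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # Any asterisk inside a non-matching block of the expected text was
        # replaced or deleted in the OCR output. If one block holds several
        # asterisks, each is reported with the whole replacement string.
        for i in range(i1, i2):
            if expected[i] == "*":
                lost.append((i, ocr[j1:j2]))
    return lost
```

For example, lost_asterisks("Total* 42", "Totalx 42") returns [(5, "x")], and an empty list means every asterisk was preserved.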

Suggested Fix

No response

tesseract -v

The issue occurs in both 5.4.1 and 5.5.1.

Operating System

No response

Other Operating System

Linux 20.04.6

uname -a

No response

Compiler

No response

CPU

No response

Virtualization / Containers

No response

Other Information

Run OCR on the attached image to reproduce the issue:

dxz00000001.tif
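One way to run the reproduction with only the LSTM engine is sketched below. It calls the tesseract CLI through subprocess, writes the text to stdout, and counts the asterisks in the result. It assumes tesseract is on PATH and relies on the default language model (English); the helper names build_command and ocr_lstm are hypothetical.

```python
import shutil
import subprocess


def build_command(image: str) -> list[str]:
    # "stdout" as the output base prints the text instead of writing a .txt file;
    # --oem 1 restricts recognition to the LSTM engine. The default language
    # model is assumed, since the report does not say which one was used.
    return ["tesseract", image, "stdout", "--oem", "1"]


def ocr_lstm(image: str) -> str:
    # Fail early with a clear message if the CLI is not installed.
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract not found on PATH")
    result = subprocess.run(
        build_command(image), capture_output=True, text=True, check=True
    )
    return result.stdout
```

Usage: text = ocr_lstm("dxz00000001.tif"), then print(text.count("*")) shows how many asterisks survived recognition.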
