Consider not using emojis in print statements, which cause an encoding error. #41945

@acane77

System Info

transformers version: latest commit

in file: https://github.com/huggingface/transformers/blob/main/src/transformers/utils/auto_docstring.py#L1121 (and any other files using emoji symbols)

This "🚨" symbol causes an encoding error when the system charset is not UTF-8. (This is common on Windows, where UTF-8 support is disabled by default and the system charset is GBK, cp1252, etc.)

For compatibility, it is better to avoid non-ASCII characters such as multi-byte emojis, which can only be printed when the output charset supports them.
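One possible library-side mitigation, if the emoji is to be kept, is to escape characters that the output stream cannot encode before printing. The following is a minimal sketch, not the code in `auto_docstring.py`; `safe_print` is a hypothetical helper name, and the cp1252 console is simulated with an in-memory stream.

```python
import io
import sys


def safe_print(message, stream=None):
    """Print `message`, escaping characters the stream's encoding cannot represent.

    Hypothetical helper for illustration; it is not part of transformers.
    """
    stream = stream if stream is not None else sys.stdout
    encoding = getattr(stream, "encoding", None) or "utf-8"
    # Round-trip through the target encoding: unencodable characters become
    # backslash escapes instead of raising UnicodeEncodeError.
    printable = message.encode(encoding, errors="backslashreplace").decode(encoding)
    print(printable, file=stream)


# Simulate a Windows console that uses the cp1252 code page.
buffer = io.BytesIO()
console = io.TextIOWrapper(buffer, encoding="cp1252")
safe_print("\U0001F6A8 Something looks off", stream=console)
console.flush()
output = buffer.getvalue().decode("cp1252")
```

With this approach the message still reaches the user on a legacy code page, at the cost of showing an escape sequence instead of the symbol. Replacing the emoji with plain ASCII text in the source avoids the problem entirely.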

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

On a Windows system without the UTF-8 charset enabled, run any code that reaches a print statement containing an emoji character such as "🚨".
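The failure can also be reproduced on any platform without changing system settings. This sketch assumes that an in-memory text stream encoded as cp1252 is a fair stand-in for the Windows console; it does not import transformers.

```python
import io

# Stand-in for sys.stdout on a Windows console using the cp1252 code page.
console = io.TextIOWrapper(io.BytesIO(), encoding="cp1252")

error = None
try:
    # "\U0001F6A8" is the 🚨 character, written as an escape so this file stays ASCII.
    print("\U0001F6A8 emoji in a log message", file=console)
    console.flush()
except UnicodeEncodeError as exc:
    error = exc

# str() of the exception shows the offending character in escaped form,
# so this line is safe to print on any console.
print(error)
```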

Expected behavior

The program should not crash. Currently it crashes with the following error message:

[6424] [8464]   File "transformers\models\bert\modeling_bert.py", line 778, in <module>
[6424] [8464]   File "transformers\utils\auto_docstring.py", line 2048, in auto_docstring
[6424] [8464]   File "transformers\utils\auto_docstring.py", line 2045, in auto_docstring_decorator
[6424] [8464]   File "transformers\utils\auto_docstring.py", line 1787, in auto_class_docstring
[6424] [8464]   File "transformers\utils\auto_docstring.py", line 1728, in auto_method_docstring
[6424] [8464]   File "transformers\utils\auto_docstring.py", line 1243, in _get_model_info
[6424] [8464]   File "transformers\utils\auto_docstring.py", line 1124, in get_model_name
[6424] [8464]   File "encodings\cp1252.py", line 19, in encode
[6424] [8464] UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f6a8' in position 0: character maps to <undefined>
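Until the emoji is removed, an affected user can work around the crash from their own entry point by relaxing the error handler on the standard streams before importing transformers. This is a sketch of a caller-side workaround, not a fix in the library; `relax_encoding_errors` is a hypothetical name, and it relies on `TextIOWrapper.reconfigure`, available since Python 3.7.

```python
import io
import sys


def relax_encoding_errors(stream):
    """Make `stream` escape unencodable characters instead of raising.

    Returns True if the stream was reconfigured, False if it does not
    support reconfigure() (for example, a replaced or captured stdout).
    """
    if hasattr(stream, "reconfigure"):
        stream.reconfigure(errors="backslashreplace")
        return True
    return False


# Apply to the real standard streams before importing the library.
relax_encoding_errors(sys.stdout)
relax_encoding_errors(sys.stderr)

# Demonstration on a simulated cp1252 console.
buffer = io.BytesIO()
console = io.TextIOWrapper(buffer, encoding="cp1252")
relax_encoding_errors(console)
print("\U0001F6A8 warning", file=console)
console.flush()
demo_output = buffer.getvalue().decode("cp1252")
```

Setting the environment variable `PYTHONUTF8=1` or `PYTHONIOENCODING=utf-8` before starting Python is another option, since both change the encoding used for standard output. Whether the console then renders the emoji correctly depends on the terminal.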
