Description
System Info
transformers version: latest commit
in file: https://github.com/huggingface/transformers/blob/main/src/transformers/utils/auto_docstring.py#L1121 (and any other files using emoji symbols)
The "🚨" symbol causes an encoding error when the system charset is not UTF-8. This is especially common on Windows, where UTF-8 support is disabled by default and the system charset is GBK, cp1252, etc.
For compatibility, it would be better to avoid non-ASCII, multi-byte characters such as emojis, which can only be printed when the output encoding supports them.
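As one possible mitigation on the caller's side, text can be made safe for the active console encoding before it is printed. The sketch below is only an illustration: `to_console_safe` is a hypothetical helper, not part of transformers, and it assumes that escaping unencodable characters is acceptable output.

```python
import sys
from typing import Optional


def to_console_safe(text: str, encoding: Optional[str] = None) -> str:
    """Return text in which characters the target encoding cannot
    represent are replaced by backslash escapes.

    Hypothetical helper for illustration; not part of transformers.
    """
    # Fall back to ASCII when the stream does not report an encoding.
    enc = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    # Round-trip through the target encoding, escaping what does not fit.
    return text.encode(enc, errors="backslashreplace").decode(enc)


# On a cp1252 console the emoji is escaped instead of raising.
print(to_console_safe("🚨 warning", "cp1252"))
```

Alternatively, on Python 3.7+ the output stream can be switched with `sys.stdout.reconfigure(encoding="utf-8")`, or UTF-8 mode can be enabled by setting the `PYTHONUTF8=1` environment variable. Whether the console then renders the emoji correctly depends on the terminal.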
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
On a Windows system without the UTF-8 charset enabled, run any code that reaches a print statement containing an emoji character, such as "🚨".
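The failure can also be simulated on any platform by printing to a stream that uses a non-UTF-8 encoding. This is a minimal sketch, not the code path in transformers: `print_with_encoding` and `reproduces_error` are hypothetical names, and the in-memory stream stands in for a Windows console using cp1252 or GBK.

```python
import io


def print_with_encoding(text: str, encoding: str) -> None:
    """Print text to an in-memory stream that uses the given encoding,
    mimicking a console whose charset is not UTF-8."""
    stream = io.TextIOWrapper(io.BytesIO(), encoding=encoding, errors="strict")
    print(text, file=stream)
    stream.flush()


def reproduces_error(text: str, encoding: str) -> bool:
    """Return True if printing text with this encoding raises UnicodeEncodeError."""
    try:
        print_with_encoding(text, encoding)
    except UnicodeEncodeError:
        return True
    return False


if __name__ == "__main__":
    for enc in ("cp1252", "gbk", "utf-8"):
        print(enc, reproduces_error("🚨", enc))
```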
Expected behavior
The program should not crash. Instead, it crashes with the following error message:
[6424] [8464] File "transformers\models\bert\modeling_bert.py", line 778, in <module>
[6424] [8464] File "transformers\utils\auto_docstring.py", line 2048, in auto_docstring
[6424] [8464] File "transformers\utils\auto_docstring.py", line 2045, in auto_docstring_decorator
[6424] [8464] File "transformers\utils\auto_docstring.py", line 1787, in auto_class_docstring
[6424] [8464] File "transformers\utils\auto_docstring.py", line 1728, in auto_method_docstring
[6424] [8464] File "transformers\utils\auto_docstring.py", line 1243, in _get_model_info
[6424] [8464] File "transformers\utils\auto_docstring.py", line 1124, in get_model_name
[6424] [8464] File "encodings\cp1252.py", line 19, in encode
[6424] [8464] UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f6a8' in position 0: character maps to <undefined>