
Decode piped stdin as UTF-8 and strip any BOM (PowerShell 5.1 compat) #433

Open

Jovenkemp wants to merge 1 commit into browser-use:main from Jovenkemp:fix/stdin-bom

Jovenkemp commented Jun 12, 2026

Problem

On Windows PowerShell 5.1, piped text often arrives with a UTF-8 BOM (EF BB BF). When $OutputEncoding is the stock [Text.Encoding]::UTF8 encoding, PowerShell emits its preamble when piping to native commands, and files written by PS 5.1 (Out-File -Encoding utf8) carry BOMs that survive re-piping. sys.stdin.read() keeps that BOM in the decoded source (as U+FEFF, or as the mojibake ï»¿ when decoded via the locale code page), so exec() fails on line 1 of every piped script:

SyntaxError: invalid non-printable character U+FEFF (<string>, line 1)

Since piping code into browser-harness is the documented (and only) usage mode, this breaks the harness entirely for PowerShell 5.1 callers.
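The failure can be reproduced without PowerShell by prepending the BOM bytes by hand. A minimal sketch, assuming cp1252 as the locale code page; `exec_fails` is a hypothetical helper, not part of browser-harness:

```python
# Bytes as a PS 5.1 pipe might deliver them: UTF-8 BOM + ordinary source.
BOM = b"\xef\xbb\xbf"
piped = BOM + b"print('hello')\n"

# Decoded as plain UTF-8, the BOM survives as the character U+FEFF.
as_utf8 = piped.decode("utf-8")

# Decoded via the cp1252 code page, it becomes three mojibake characters.
as_cp1252 = piped.decode("cp1252")


def exec_fails(source: str) -> bool:
    """Hypothetical helper: True if exec() rejects the source as invalid syntax."""
    try:
        exec(source, {})
    except SyntaxError:
        return True
    return False


print(repr(as_utf8[0]))       # '\ufeff'
print(as_cp1252[:3])          # ï»¿
print(exec_fails(as_utf8))    # True: invalid non-printable character U+FEFF
print(exec_fails(as_cp1252))  # True: the mojibake is not valid Python
```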

Fix

Read stdin as raw bytes and decode with utf-8-sig, which strips one leading UTF-8 BOM when present and is byte-for-byte identical to utf-8 otherwise.
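A minimal sketch of the decoding change, assuming stdin is read in one go. `read_piped_source` is a hypothetical name and the actual diff in run.py may be structured differently; taking the stream as a parameter makes it testable with io.BytesIO:

```python
import io
import sys


def read_piped_source(stream=None) -> str:
    """Hypothetical helper: read piped source as bytes, decode as UTF-8,
    and drop at most one leading UTF-8 BOM."""
    if stream is None:
        stream = sys.stdin.buffer  # raw bytes: no locale code page, no newline translation
    return stream.read().decode("utf-8-sig")


# Usage with an in-memory stand-in for a BOM-prefixed, CRLF pipe:
piped = io.BytesIO(b"\xef\xbb\xbfx = 1\r\ny = x + 1\r\n")
src = read_piped_source(piped)
ns = {}
exec(src, ns)
print(ns["y"])  # 2
```

Note that utf-8-sig removes only the first BOM; a second one (e.g. from concatenating two BOM-prefixed files) would still reach exec() as U+FEFF.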

Compatibility notes (each verified against CPython)

  • bash / cmd / heredoc input is unchanged. With no BOM present, utf-8-sig decodes exactly like utf-8.
  • Newline translation is not lost. Text-mode stdin used to translate \r\n to \n; compile()/exec() normalize newlines themselves, including inside string literals (after exec('s="""a\r\nb"""'), s == 'a\nb'), so CRLF input behaves identically.
  • UTF-16 stdin: deliberately out of scope. CPython rejects UTF-16 source files (SyntaxError: Non-UTF-8 code starting with '\xff' ... see PEP 263), so the harness keeps the same contract for piped source. A UTF-16 pipe now fails loudly instead of producing mojibake: with a BOM it raises UnicodeDecodeError at decode time, and without one exec() rejects the embedded NUL bytes (source code string cannot contain null bytes; a ValueError, or a SyntaxError on CPython 3.12+).
  • Side benefit: non-ASCII piped source now decodes as UTF-8 on Windows instead of the locale code page (e.g. cp1252), matching how Python defines source encoding (PEP 3120).
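The compatibility notes can be checked directly with a short script. This is a sketch, not the PR's test suite; `decode_piped` and `raises` are hypothetical helpers, and the null-byte case accepts both ValueError and SyntaxError because the exception type differs across CPython versions:

```python
def decode_piped(data: bytes) -> str:
    """Hypothetical helper: the same decode step as the fix."""
    return data.decode("utf-8-sig")


def raises(exc_types, fn, *args) -> bool:
    """Hypothetical helper: True if fn(*args) raises one of exc_types."""
    try:
        fn(*args)
    except exc_types:
        return True
    return False


# 1. No BOM: utf-8-sig decodes exactly like utf-8, non-ASCII included.
plain = "name = 'café'\n".encode("utf-8")
assert decode_piped(plain) == plain.decode("utf-8")

# 2. CRLF input: exec() normalizes newlines itself, even inside a
#    triple-quoted string literal, so text-mode translation is not needed.
ns = {}
exec(decode_piped(b's = """a\r\nb"""\r\n'), ns)
assert ns["s"] == "a\nb"

# 3a. UTF-16 with a BOM (FF FE or FE FF) is invalid UTF-8: fails at decode time.
assert raises(UnicodeDecodeError, decode_piped, "print(1)\n".encode("utf-16"))

# 3b. BOM-less UTF-16-LE decodes "successfully" (NUL is valid UTF-8),
#     but exec() then refuses the embedded null bytes.
bomless = decode_piped("x = 1\n".encode("utf-16-le"))
assert "\x00" in bomless
assert raises((ValueError, SyntaxError), exec, bomless, {})

# 4. Side benefit: decoding UTF-8 bytes via cp1252 produced mojibake.
assert "café".encode("utf-8").decode("cp1252") == "cafÃ©"
print("all compatibility checks passed")
```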

This mirrors the UTF-8 normalization run.py already applies to stdout (run.py lines 3–8).

🤖 Generated with Claude Code


Summary by cubic

Fix stdin decoding so piped scripts work on Windows PowerShell 5.1. We now read bytes and decode as UTF‑8 while stripping a BOM, preventing the first-line SyntaxError when piping into browser-harness.

  • Bug Fixes
    • Read from sys.stdin.buffer and decode with utf-8-sig to drop a leading BOM from PowerShell 5.1 pipes.
    • No change for non-BOM input; newline handling unchanged. UTF‑16 stdin remains unsupported and now fails early.

Written for commit 29655f1. Summary will update on new commits.


Windows PowerShell 5.1 commonly prepends a UTF-8 BOM when piping text
to a native command (its UTF8 $OutputEncoding emits a preamble, and
files written by PS 5.1 carry BOMs that survive re-piping).
sys.stdin.read() leaves that BOM in the source -- decoded as U+FEFF,
or as mojibake under the locale code page -- so exec() fails with a
SyntaxError on the first line of every piped script.

Read raw bytes and decode with utf-8-sig instead: it strips one
leading UTF-8 BOM when present and is byte-for-byte identical to
utf-8 otherwise, so piped input from bash/cmd decodes exactly as
before. Newline handling is unchanged too: compile() normalizes
\r\n itself, including inside string literals. Side benefit:
non-ASCII source now decodes as UTF-8 on Windows rather than the
locale code page.

UTF-16 stdin stays unsupported, matching CPython, which rejects
UTF-16 source files outright; it now fails loudly at decode time
instead of producing mojibake.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

cubic-dev-ai (bot) left a comment

No issues found across 1 file
