Background
Taking an example from Decoding GZIP encoded Body, BodyBytes (ByteArray) and BodyBytesArray from URLRead:
URLRead[
"https://api.stackexchange.com/2.2/info?site=mathematica", "Headers"
]
{ ... , "content-type" -> "application/json; charset=utf-8" , "content-encoding" -> "gzip" , ... }
Import @ "https://api.stackexchange.com/2.2/info?site=mathematica"
returns a JSON string, which can then be passed to ImportString.
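For example, a sketch of that two-step route (the top-level keys shown are whatever the Stack Exchange API happens to return):

```wolfram
(* import the decompressed body as a string, then parse it as JSON *)
json = Import @ "https://api.stackexchange.com/2.2/info?site=mathematica";
Keys @ ImportString[json, "RawJSON"]
```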
Problem
URLRead, however, throws a bunch of decoding errors, which suggests it ignores the "gzip" content-encoding and goes directly to the charset specified in content-type.
URLRead[
"https://api.stackexchange.com/2.2/info?site=mathematica", "Body"
]
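That the raw response body really is gzip-compressed (rather than already decompressed by the transport layer) can be checked by looking for the gzip magic number, bytes 31 and 139 (0x1F 0x8B), at the start of "BodyBytes" — a sketch:

```wolfram
(* the first two bytes of a gzip stream are always 31, 139 (0x1F 0x8B) *)
bytes = URLRead[
  "https://api.stackexchange.com/2.2/info?site=mathematica", "BodyBytes"
];
Take[bytes, 2] === {31, 139}
```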
Workaround
is already shown in the linked topic:
URLRead[
"https://api.stackexchange.com/2.2/info?site=mathematica"
, "BodyBytes"
] // FromCharacterCode // ImportString[#, {"gzip", "RawJSON"}] &
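A sketch of an equivalent that skips the FromCharacterCode round-trip, assuming a version with ImportByteArray (11.1+) and that it accepts the same compound format specification as ImportString:

```wolfram
(* "BodyByteArray" yields a ByteArray that ImportByteArray can decompress and parse *)
URLRead[
  "https://api.stackexchange.com/2.2/info?site=mathematica"
  , "BodyByteArray"
] // ImportByteArray[#, {"GZIP", "RawJSON"}] &
```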
Question
Should that be the case? Is it a bug, or am I missing the purpose of URLRead's "Body" property?
URLFetch behaves the same way, so I'm surprised this wasn't asked before.
URLFetch[
"https://api.stackexchange.com/2.2/info?site=mathematica"
, "Content"
]
Related
Who is to blame: parsing UTF8 encoded JSON HTTPResponse fails
Comments

Import actually does not need the "content-encoding" -> "gzip" header for recognizing gzip-compressed data. You can check it with file = URLDownload["https://api.stackexchange.com/2.2/info?site=mathematica"]; Import @ file. The created file is a binary gzip-compressed file and Import recognizes the compression method from the first few bytes of the file; the HTTP "content-encoding" header isn't necessary at all.

Import worked, from the quoted question. BTW, "As of Version 11, URLFetch has been superseded by URLRead and URLExecute."

gzip is of crucial importance for such a function as URLRead. As I wrote above, I would expect it to recognize gzip even without the explicit "content-encoding" -> "gzip". Probably the latter can be checked using the file: protocol.