Python 3 - Encode/Decode vs Bytes/Str [duplicate]

Question

I am new to Python 3, coming from Python 2, and I am a bit confused by Unicode fundamentals. I've read some good posts that made it all much clearer, but I see there are two methods in Python 3 that handle encoding and decoding, and I'm not sure which one to use.

So the idea in Python 3 is that every string is Unicode and can be encoded and stored as bytes, or decoded back into a Unicode string again.

But there are 2 ways to do it:
u'something'.encode('utf-8') will generate b'something', but so does bytes(u'something', 'utf-8').
And b'bytes'.decode('utf-8') seems to do the same thing as str(b'bytes', 'utf-8').

Now my question is: why are there two methods that seem to do the same thing, and is either better than the other (and why)? I've been trying to find an answer to this on Google, but with no luck.

>>> original = '27岁少妇生孩子后变老'
>>> type(original)
<class 'str'>
>>> encoded = original.encode('utf-8')
>>> print(encoded)
b'27\xe5\xb2\x81\xe5\xb0\x91\xe5\xa6\x87\xe7\x94\x9f\xe5\xad\xa9\xe5\xad\x90\xe5\x90\x8e\xe5\x8f\x98\xe8\x80\x81'
>>> type(encoded)
<class 'bytes'>
>>> encoded2 = bytes(original, 'utf-8')
>>> print(encoded2)
b'27\xe5\xb2\x81\xe5\xb0\x91\xe5\xa6\x87\xe7\x94\x9f\xe5\xad\xa9\xe5\xad\x90\xe5\x90\x8e\xe5\x8f\x98\xe8\x80\x81'
>>> type(encoded2)
<class 'bytes'>
>>> print(encoded+encoded2)
b'27\xe5\xb2\x81\xe5\xb0\x91\xe5\xa6\x87\xe7\x94\x9f\xe5\xad\xa9\xe5\xad\x90\xe5\x90\x8e\xe5\x8f\x98\xe8\x80\x8127\xe5\xb2\x81\xe5\xb0\x91\xe5\xa6\x87\xe7\x94\x9f\xe5\xad\xa9\xe5\xad\x90\xe5\x90\x8e\xe5\x8f\x98\xe8\x80\x81'
>>> decoded = encoded.decode('utf-8')
>>> print(decoded)
27岁少妇生孩子后变老
>>> decoded2 = str(encoded2, 'utf-8')
>>> print(decoded2)
27岁少妇生孩子后变老
>>> type(decoded)
<class 'str'>
>>> type(decoded2)
<class 'str'>
>>> print(str(b'27\xe5\xb2\x81\xe5\xb0\x91\xe5\xa6\x87\xe7\x94\x9f\xe5\xad\xa9\xe5\xad\x90\xe5\x90\x8e\xe5\x8f\x98\xe8\x80\x81', 'utf-8'))
27岁少妇生孩子后变老
>>> print(b'27\xe5\xb2\x81\xe5\xb0\x91\xe5\xa6\x87\xe7\x94\x9f\xe5\xad\xa9\xe5\xad\x90\xe5\x90\x8e\xe5\x8f\x98\xe8\x80\x81'.decode('utf-8'))
27岁少妇生孩子后变老
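The session above can be condensed into a small self-checking script. This is only a sketch. Beyond the session, it also checks that both spellings accept the same optional `errors` argument (here `'replace'`, which substitutes U+FFFD for undecodable bytes). The byte string `bad` is an illustrative input that does not appear in the question.

```python
# Compare the method-based and constructor-based conversions on the same text.
original = '27岁少妇生孩子后变老'

encoded = original.encode('utf-8')      # str method
encoded2 = bytes(original, 'utf-8')     # bytes constructor
assert encoded == encoded2
assert type(encoded) is type(encoded2) is bytes

decoded = encoded.decode('utf-8')       # bytes method
decoded2 = str(encoded2, 'utf-8')       # str constructor
assert decoded == decoded2 == original

# Both spellings also take the same optional `errors` argument.
bad = b'ok\xff'                         # 0xff is never valid in UTF-8
assert bad.decode('utf-8', 'replace') == str(bad, 'utf-8', 'replace')
print(ascii(bad.decode('utf-8', 'replace')))   # 'ok\ufffd'
```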

That's why we need bytes() in the community. Any weird text can be like \xe5\xb2\x81\xe5\xb0\x91\xe5\xa6\x87\xe7\x94\x9f\xe5\xad\xa9\xe5\xad\x90\xe5\x90\x8e\xe5\x8f\x98\xe8\x80\x81 :P
– JSmyth, Commented Jan 12, 2014 at 16:39

Lennart Regebro · Accepted Answer · 2013-01-23 06:09:07Z

73

Neither is better than the other; they do exactly the same thing. However, using .encode() and .decode() is the more common way to do it, and it is also compatible with Python 2.
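The claim holds when an encoding is passed explicitly. The sketch below checks that case. It also shows one difference that only appears when the encoding is omitted: in Python 3, `.encode()`/`.decode()` default to UTF-8, whereas `bytes(text)` raises `TypeError`, and `str(some_bytes)` returns the repr rather than decoded text. Running Python with `-b` makes that last call emit a `BytesWarning`. The sample string `'café'` is illustrative only. The Python 2 point is not exercised here. In Python 2, `bytes` is an alias of `str`, so `bytes(u'...', 'utf-8')` fails there, while `u'...'.encode('utf-8')` works in both versions.

```python
# With an explicit encoding, the method and constructor spellings agree.
text = 'café'
assert text.encode('utf-8') == bytes(text, 'utf-8') == b'caf\xc3\xa9'
assert b'caf\xc3\xa9'.decode('utf-8') == str(b'caf\xc3\xa9', 'utf-8') == text

# In Python 3 the methods fall back to UTF-8 when no encoding is passed.
assert text.encode() == b'caf\xc3\xa9'
assert b'caf\xc3\xa9'.decode() == text

# The constructors do not: bytes() refuses a str without an encoding...
try:
    bytes(text)
    bytes_raised = False
except TypeError as exc:
    bytes_raised = True
    print('bytes(text) raised TypeError:', exc)

# ...and str() on bytes returns the repr instead of decoding.
as_repr = str(b'abc')
print(as_repr)  # b'abc'
```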


Tcll Over a year ago

The only incompatibility with Py2 is that .encode() returns a bytes object instead of a str object (or a unicode object in the OP's case).

Lennart Regebro Over a year ago

Encode() returns an 8-bit string in both cases. It's called "str" in Python 2 and "bytes" in Python 3, but both are 8-bit strings.

Tcll Over a year ago

str and bytes are 2 different classes; bytes is an 8-bit array represented as a string, which is useful, but the two are not converted into each other in this particular circumstance (Py2 was less of a headache here).

Lennart Regebro Over a year ago

"str and bytes are 2 different classes" - In Python 3, yes. In Python 2, no. I repeat: Encode() returns an 8-bit string both under Python 2 and Python 3. It's called "str" in Python 2 and "bytes" in Python 3, but both are 8-bit strings.

Lennart Regebro Over a year ago

In fact, Python 3 is less of a headache than Python 2. One of the big reasons for Python 3 was that unicode was a big pain on Python 2.
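The point about encoding producing an "8-bit string" can be checked in Python 3. The result is a `bytes` object whose items are integers in `range(256)`, and Python 3 refuses to mix it with `str` implicitly. This sketch uses an illustrative string, `'héllo'`. It does not demonstrate Python 2 behaviour.

```python
data = 'héllo'.encode('utf-8')
print(type(data))                 # <class 'bytes'>
print(len('héllo'), len(data))    # 5 6 ('é' needs two bytes in UTF-8)
print(data[1])                    # 195: indexing bytes yields an int
print(list(data[1:3]))            # [195, 169]

# Python 3 will not mix the two types implicitly.
try:
    'h' + data
    mixed = True
except TypeError as exc:
    mixed = False
    print('TypeError:', exc)
```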


Alex S · Accepted Answer · 2017-05-25 21:10:52Z

To add to Lennart Regebro's answer, there is even a third way that can be used:

encoded3 = str.encode(original, 'utf-8')
print(encoded3)

Anyway, it is actually exactly the same as the first approach. The second way may also look like syntactic sugar for the third approach.
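The reason the third way matches the first is ordinary Python method lookup: for a regular method, `obj.method(arg)` behaves like `type(obj).method(obj, arg)`. The sketch below checks this in both directions, reusing the question's sample string.

```python
original = '27岁少妇生孩子后变老'

encoded = original.encode('utf-8')          # bound method on the instance
encoded3 = str.encode(original, 'utf-8')    # same method looked up on the class
assert encoded == encoded3

# The same holds in the decoding direction with bytes.decode.
assert bytes.decode(encoded3, 'utf-8') == encoded.decode('utf-8') == original
print(encoded3)
```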

A programming language is a means to express abstract ideas formally, to be executed by a machine. A programming language is considered good if it contains the constructs one needs. Python is a hybrid language, i.e. more natural and more versatile than pure OO or pure procedural languages. Sometimes functions are more appropriate than object methods, and sometimes the reverse is true. It depends on the mental picture of the problem being solved.

Anyway, the feature mentioned in the question is probably a by-product of the language's implementation/design. In my opinion, this is a nice example that shows alternative ways of thinking about technically the same thing.

In other words, calling an object method means thinking in terms of "let the object give me the wanted result". Calling a function instead means "let the outside code process the passed argument and extract the wanted value".

The first approach emphasizes the ability of the object to do the task on its own; the second emphasizes the ability of a separate algorithm to extract the data. Sometimes the separate code may be so specialized that it is not wise to add it as a general method to the object's class.
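A small illustration of the two mindsets, as a sketch. The first form asks each object to encode itself. The second passes the unbound `str.encode` to `map()` as a plain function; with no encoding argument it uses the UTF-8 default. The helper `to_text` is hypothetical and not part of any library. It shows the kind of "separate code" that accepts either type, which would not belong as a method on `str` or `bytes`.

```python
words = ['naïve', 'café', 'plain']

# Method style: ask each object to encode itself.
encoded_a = [w.encode('utf-8') for w in words]

# Function style: hand the unbound method str.encode to map().
encoded_b = list(map(str.encode, words))
assert encoded_a == encoded_b

def to_text(value, encoding='utf-8'):
    """Hypothetical helper: return str whether given str or bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

print([to_text(v) for v in (b'caf\xc3\xa9', 'café')])   # ['café', 'café']
```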

Yes, actually, now that I think of it, this makes perfect sense.
It is the same as the first way: 'a'.encode('utf-8') is equivalent to str.encode('a', 'utf-8').

Colin Enstone · Accepted Answer · 2014-05-09 06:36:31Z

12

To add to the previous answer, there is even a fourth way that can be used:

import codecs
encoded4 = codecs.encode(original, 'utf-8')
print(encoded4)
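For text, `codecs.encode()` and `codecs.decode()` round-trip the same way as the other variants. This minimal sketch reuses the question's sample string.

```python
import codecs

original = '27岁少妇生孩子后变老'

encoded4 = codecs.encode(original, 'utf-8')   # str -> bytes
assert encoded4 == original.encode('utf-8')

decoded4 = codecs.decode(encoded4, 'utf-8')   # bytes -> str
assert decoded4 == original
print(decoded4)
```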


jfs Over a year ago

Note: unlike the other variants, it works for arbitrary encodings, e.g. bytes -> bytes encodings: codecs.encode(b'a', 'hex') -> b'61'
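The bytes-to-bytes case from the note above can be checked directly in Python 3. `'hex'` is a standard codec alias. Since `str.encode()` accepts only text encodings, it rejects the same codec with a `LookupError`. This is a sketch; the exact error message may vary between versions.

```python
import codecs

hexed = codecs.encode(b'a', 'hex')        # bytes -> bytes
print(hexed)                              # b'61'
print(codecs.decode(hexed, 'hex'))        # b'a'

# str.encode() only accepts text encodings, so 'hex' is refused there.
try:
    'a'.encode('hex')
    rejected = False
except LookupError as exc:
    rejected = True
    print('str.encode refused:', exc)
```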

rassa45 Over a year ago

I get that it can't convert bytes to string implicitly
