How to split a string of multiple words into a list with strings of a certain number of words?

Question

I have a string which has characters from multiple languages:

'죄송합니다 how are you doing? My name is Yudhiesh and I am 아니 doing good 저기요'

I am trying to chunk this single string into a list of strings, each containing at most a given number of words. With a chunk size of 7 (i.e. at most 7 words per string), the result should be:

['죄송합니다 how are you doing? My name', 'is Yudhiesh and I am 아니 doing', 'good 저기요']

My current attempt, which is based on how you would chunk a list, is not working:

>>> s = '죄송합니다 how are you doing? My name is Yudhiesh and I am 아니 doing good 저기요'
>>> parts = [str(s[i:i+7]) for i in range(0, len(s), 7)]
>>> parts
['죄송합니다 h', 'ow are ', 'you doi', 'ng? My ', 'name is', ' Yudhie', 'sh and ', 'I am 아니', ' doing ', 'good 저기', '요']

If overlaps are allowed, these are called n-grams: stackoverflow.com/questions/13423919/…
– OneCricketeer, Commented Feb 11, 2021 at 15:23
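
For contrast with the non-overlapping chunks asked for in the question, here is a minimal sketch of the overlapping word n-grams that the comment refers to. The helper name `word_ngrams` and the sample sentence are hypothetical and are not taken from the linked question.

```python
def word_ngrams(text, n):
    """Return every run of n consecutive words; neighbours overlap by n - 1 words."""
    words = text.split()
    # Slide a window of n words forward one word at a time.
    return [' '.join(words[i:i + n]) for i in range(len(words) - n + 1)]


print(word_ngrams('how are you doing today', 3))
# ['how are you', 'are you doing', 'you doing today']
```

Unlike chunking, each word can appear in several results, so this is only appropriate when overlap is wanted.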

Dorian Turba · Accepted Answer · 2021-02-11 15:48:12Z

1

First, you can create a list of words, and then, create chunks and join them.

Here is what you need in a function:

def split_max_num(string, max_words):
    """
    >>> split_max_num('죄송합니다 how are you doing? My name is Yudhiesh and I am 아니 doing good 저기요', 7)
    ['죄송합니다 how are you doing? My name', 'is Yudhiesh and I am 아니 doing', 'good 저기요']
    """
    words = string.split()
    len_words = len(words)

    res = list()
    for index in range(0, len_words, max_words):
        res.append(' '.join(words[index:index+max_words]))
    return res

edited Feb 11, 2021 at 15:48

answered Feb 11, 2021 at 15:30

Dorian Turba



1 Comment

yudhiesh Over a year ago

Thanks this worked on my other test cases as well.

Ricardo Lohmann · Accepted Answer · 2021-02-11 15:47:29Z

1

How about the following?

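def split_max(words, n):
    # Split on whitespace, group the words n at a time, then rejoin each group.
    words = words.split()
    chunks = [words[i:i + n] for i in range(0, len(words), n)]
    return [' '.join(chunk) for chunk in chunks]


data = '죄송합니다 how are you doing? My name is Yudhiesh and I am 아니 doing good 저기요'
split_max(data, 7)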

answered Feb 11, 2021 at 15:47

Ricardo Lohmann



bibblybobbly · Accepted Answer · 2021-02-11 15:28:06Z

0

Are you sure you're not setting s=['죄송합니다 how are you doing? My name', 'is 도와 주세요 and I am doing', 'good 저기요'] before producing parts in this example?

You may want to split your original string with "somestring".split(" "), i.e. split on spaces, to get a list of all the words. Then you can chop that list with indexing, as you've tried to do.
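
A small sketch of that suggestion follows. One caveat comes from standard `str.split` behavior rather than from this answer: `split(" ")` yields empty strings when the text contains consecutive spaces, while `split()` with no argument treats any run of whitespace as one separator. The sample string is made up for illustration.

```python
text = 'one  two three'  # note the double space (made-up sample)

print(text.split(' '))  # ['one', '', 'two', 'three']
print(text.split())     # ['one', 'two', 'three']

# Chop the word list with the same slicing you would use on any list.
words = text.split()
pairs = [words[i:i + 2] for i in range(0, len(words), 2)]
print(pairs)  # [['one', 'two'], ['three']]
```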

answered Feb 11, 2021 at 15:28

bibblybobbly


1 Comment

bibblybobbly Over a year ago

Have you tried this, and it doesn't work? ` somestring = "yoursentence"; wordlist = somestring.split(" "); sevens = [" ".join(wordlist[i:i+7]) for i in range(0, len(wordlist), 7)] `

yudhiesh · Accepted Answer · 2021-02-11 15:34:33Z

0

Use .split() on the string and chunk it:

from typing import List

def chunk_list(lst: List[str], chunk_size: int) -> List[List[str]]: 
    return [
        lst[i:i + chunk_size] 
        for i in range(0, len(lst), chunk_size)
    ]
     

def chunk_string(string: str, chunk_size: int) -> List[List[str]]:
    # Note: this returns lists of words, not joined strings.
    return chunk_list(string.split(), chunk_size)

edited Feb 11, 2021 at 15:34

yudhiesh


answered Feb 11, 2021 at 15:27

Jonathan Herrera


1 Comment

yudhiesh Over a year ago

Thanks for the answer but it produces

[['죄송합니다', 'how', 'are', 'you', 'doing?', 'My', 'name'], ['is', 'Yudhiesh', 'and', 'I', 'am', '아니', 'doing'], ['good', '저기요']]
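
The nested lists appear because the word chunks are returned without being rejoined. Below is a minimal sketch of one way to adapt the answer so that each chunk becomes a single string; the helper name `chunk_string_joined` is hypothetical.

```python
from typing import List


def chunk_string_joined(string: str, chunk_size: int) -> List[str]:
    words = string.split()
    # Join each group of words back into one space-separated string.
    return [
        ' '.join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


s = '죄송합니다 how are you doing? My name is Yudhiesh and I am 아니 doing good 저기요'
print(chunk_string_joined(s, 7))
# ['죄송합니다 how are you doing? My name', 'is Yudhiesh and I am 아니 doing', 'good 저기요']
```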

OneCricketeer · Accepted Answer · 2021-02-11 15:36:20Z

0

You're converting a list into a string representation of a list.

I think you meant to rejoin the words

s = '죄송합니다 how are you doing? My name is Yudhiesh and I am 아니 doing good 저기요'
s = s.split()
print([" ".join(s[i:i+7]) for i in range(0, len(s), 7)])

edited Feb 11, 2021 at 15:36

answered Feb 11, 2021 at 15:25

OneCricketeer


4 Comments

yudhiesh Over a year ago

Hi thanks for the answer but this produces

['죄 송 합 니 다   h', 'o w   a r e  ', 'y o u   d o i', 'n g   m y   n', 'a m e   i s  ', 'Y u d h i e s', 'h   a n d   I', '  a m   아 니  ', 'd o i n g   g', 'o o d   저 기 요']

OneCricketeer Over a year ago

What is s? The string, or the list of words?

yudhiesh Over a year ago

It's the entire string

OneCricketeer Over a year ago

Okay, then you'll still have to split it for s[i:i+7] to get words rather than characters
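
To make that difference concrete, here is a short sketch, using a made-up sample string, of the same kind of slice applied first to a string and then to a list of words:

```python
s = 'how are you doing today'

# Slicing a str counts characters.
print(s[0:7])                # how are

# Slicing the list returned by split() counts words.
words = s.split()
print(words[0:3])            # ['how', 'are', 'you']
print(' '.join(words[0:3]))  # how are you
```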
