52

I have a list of stopwords. And I have a search string. I want to remove the words from the string.

As an example:

stopwords=['what','who','is','a','at','is','he']
query='What is hello'

Now the code should strip 'What' and 'is'. However in my case it strips 'a', as well as 'at'. I have given my code below. What could I be doing wrong?

for word in stopwords:
    if word in query:
        print word
        query=query.replace(word,"")

If the input query is "What is Hello", I get the output as:
wht s llo

Why does this happen?

2
  • If you want to do full word match, you should be splitting the query to a list, and search. query.split() Commented Aug 17, 2014 at 3:28
  • Is there a way to do this using a regex? Commented Aug 15, 2020 at 3:53

7 Answers 7

83

This is one way to do it:

query = 'What is hello'
stopwords = ['what', 'who', 'is', 'a', 'at', 'is', 'he']
querywords = query.split()

resultwords  = [word for word in querywords if word.lower() not in stopwords]
result = ' '.join(resultwords)

print(result)

I noticed that you want to also remove a word if its lower-case variant is in the list, so I've added a call to lower() in the condition check.

Sign up to request clarification or add additional context in comments.

7 Comments

This does work! Can you please explain what the 4th line does? I am getting a bit confused with it.
It creates a new list with all the words of which the lower-case variant is not found in stopwords.
@RohitShinde: It's a list comprehension.
@RobbyCornelissen works just with words but if you had a stopword which would be like ['what about this','what about that'] and your query would be "Hello just try to remove what about this or what about that", it won't work.
this works well in languages that use space to separate vocabularies. For some languages like Chinese, this does not work.
|
16

the accepted answer works when provided a list of words separated by spaces, but that's not the case in real life when there can be punctuation to separate the words. In that case re.split is required.

Also, testing against stopwords as a set makes lookup faster (even if there's a tradeoff between string hashing & lookup when there's a small number of words)

My proposal:

import re

query = 'What is hello? Says Who?'
stopwords = {'what','who','is','a','at','is','he'}

resultwords  = [word for word in re.split("\W+",query) if word.lower() not in stopwords]
print(resultwords)

output (as list of words):

['hello','Says','']

There's a blank string in the end, because re.split annoyingly issues blank fields, that needs filtering out. 2 solutions here:

resultwords  = [word for word in re.split("\W+",query) if word and word.lower() not in stopwords]  # filter out empty words

or add empty string to the list of stopwords :)

stopwords = {'what','who','is','a','at','is','he',''}

now the code prints:

['hello','Says']

Comments

11

building on what karthikr said, try

' '.join(filter(lambda x: x.lower() not in stopwords,  query.split()))

explanation:

query.split() #splits variable query on character ' ', e.i. "What is hello" -> ["What","is","hello"]

filter(func,iterable) #takes in a function and an iterable (list/string/etc..) and
                      # filters it based on the function which will take in one item at
                      # a time and return true.false

lambda x: x.lower() not in stopwords   # anonymous function that takes in variable,
                                       # converts it to lower case, and returns true if
                                       # the word is not in the iterable stopwords


' '.join(iterable) #joins all items of the iterable (items must be strings/chars)
                   #using the string/char in front of the dot, i.e. ' ' as a joiner.
                   # i.e. ["What", "is","hello"] -> "What is hello"

5 Comments

I tried what you said. It is giving me an error saying that argument of type function is not iterable.
Can you explain me what the line does?
why are you using list and split(' ')?
i didn't originally use list, it was an edit, i wasn't sure why he was getting an error. put it in thinking maybe his filter object couldn't be used as an iterable in an older version or something. split is there to seperate the string into words
yes but you don't need to specify ' ', try splitting 'What is hello'
8

Looking at the other answers to your question I noticed that they told you how to do what you are trying to do, but they did not answer the question you posed at the end.

If the input query is "What is Hello", I get the output as:

wht s llo

Why does this happen?

This happens because .replace() replaces the substring you give it exactly.

for example:

"My, my! Hello my friendly mystery".replace("my", "")

gives:

>>> "My, ! Hello  friendly stery"

.replace() is essentially splitting the string by the substring given as the first parameter and joining it back together with the second parameter.

"hello".replace("he", "je")

is logically similar to:

"je".join("hello".split("he"))

If you were still wanting to use .replace to remove whole words you might think adding a space before and after would be enough, but this leaves out words at the beginning and end of the string as well as punctuated versions of the substring.

"My, my! hello my friendly mystery".replace(" my ", " ")
>>> "My, my! hello friendly mystery"

"My, my! hello my friendly mystery".replace(" my", "")
>>> "My,! hello friendlystery"

"My, my! hello my friendly mystery".replace("my ", "")
>>> "My, my! hello friendly mystery"

Additionally, adding spaces before and after will not catch duplicates as it has already processed the first sub-string and will ignore it in favor of continuing on:

"hello my my friend".replace(" my ", " ")
>>> "hello my friend"

For these reasons your accepted answer by Robby Cornelissen is the recommended way to do what you are wanting.

Comments

1
" ".join([x for x in query.split() if x not in stopwords])

Comments

0

here is my the best i get

function removeCommonWords($input) {

$input=strtolower($input);
$commonWords = array('a','able','about','above','abroad','according','accordingly','across','actually','adj','after','afterwards','again','against','ago','ahead','ain\'t','all','allow','allows','almost','alone','along','alongside','already','also','although','always','am','amid','amidst','among','amongst','an','and','another','any','anybody','anyhow','anyone','anything','anyway','anyways','anywhere','apart','appear','appreciate','appropriate','are','aren\'t','around','as','a\'s','aside','ask','asking','associated','at','available','away','awfully','b','back','backward','backwards','be','became','because','become','becomes','becoming','been','before','beforehand','begin','behind','being','believe','below','beside','besides','best','better','between','beyond','both','brief','but','by','c','came','can','cannot','cant','can\'t','caption','cause','causes','certain','certainly','changes','clearly','c\'mon','co','co.','com','come','comes','concerning','consequently','consider','considering','contain','containing','contains','corresponding','could','couldn\'t','course','c\'s','currently','d','dare','daren\'t','definitely','described','despite','did','didn\'t','different','directly','do','does','doesn\'t','doing','done','don\'t','down','downwards','during','e','each','edu','eg','eight','eighty','either','else','elsewhere','end','ending','enough','entirely','especially','et','etc','even','ever','evermore','every','everybody','everyone','everything','everywhere','ex','exactly','example','except','f','fairly','far','farther','few','fewer','fifth','first','five','followed','following','follows','for','forever','former','formerly','forth','forward','found','four','from','further','furthermore','g','get','gets','getting','given','gives','go','goes','going','gone','got','gotten','greetings','h','had','hadn\'t','half','happens','hardly','has','hasn\'t','have','haven\'t','having','he','he\'d','he\'ll','hello','help','hence','her','here','hereafter','hereby','herein','here\'s','hereupon','hers','herself','he\'s','hi','him','himself','his','hither','hopefully','how','howbeit','however','hundred','i','i\'d','ie','if','ignored','i\'ll','i\'m','immediate','in','inasmuch','inc','inc.','indeed','indicate','indicated','indicates','inner','inside','insofar','instead','into','inward','is','isn\'t','it','it\'d','it\'ll','its','it\'s','itself','i\'ve','j','just','k','keep','keeps','kept','know','known','knows','l','last','lately','later','latter','latterly','least','less','lest','let','let\'s','like','liked','likely','likewise','little','look','looking','looks','low','lower','ltd','m','made','mainly','make','makes','many','may','maybe','mayn\'t','me','mean','meantime','meanwhile','merely','might','mightn\'t','mine','minus','miss','more','moreover','most','mostly','mr','mrs','much','must','mustn\'t','my','myself','n','name','namely','nd','near','nearly','necessary','need','needn\'t','needs','neither','never','neverf','neverless','nevertheless','new','next','nine','ninety','no','nobody','non','none','nonetheless','noone','no-one','nor','normally','not','nothing','notwithstanding','novel','now','nowhere','o','obviously','of','off','often','oh','ok','okay','old','on','once','one','ones','one\'s','only','onto','opposite','or','other','others','otherwise','ought','oughtn\'t','our','ours','ourselves','out','outside','over','overall','own','p','particular','particularly','past','per','perhaps','placed','please','plus','possible','presumably','probably','provided','provides','q','que','quite','qv','r','rather','rd','re','really','reasonably','recent','recently','regarding','regardless','regards','relatively','respectively','right','round','s','said','same','saw','say','saying','says','second','secondly','see','seeing','seem','seemed','seeming','seems','seen','self','selves','sensible','sent','serious','seriously','seven','several','shall','shan\'t','she','she\'d','she\'ll','she\'s','should','shouldn\'t','since','six','so','some','somebody','someday','somehow','someone','something','sometime','sometimes','somewhat','somewhere','soon','sorry','specified','specify','specifying','still','sub','such','sup','sure','t','take','taken','taking','tell','tends','th','than','thank','thanks','thanx','that','that\'ll','thats','that\'s','that\'ve','the','their','theirs','them','themselves','then','thence','there','thereafter','thereby','there\'d','therefore','therein','there\'ll','there\'re','theres','there\'s','thereupon','there\'ve','these','they','they\'d','they\'ll','they\'re','they\'ve','thing','things','think','third','thirty','this','thorough','thoroughly','those','though','three','through','throughout','thru','thus','till','to','together','too','took','toward','towards','tried','tries','truly','try','trying','t\'s','twice','two','u','un','under','underneath','undoing','unfortunately','unless','unlike','unlikely','until','unto','up','upon','upwards','us','use','used','useful','uses','using','usually','v','value','various','versus','very','via','viz','vs','w','want','wants','was','wasn\'t','way','we','we\'d','welcome','well','we\'ll','went','were','we\'re','weren\'t','we\'ve','what','whatever','what\'ll','what\'s','what\'ve','when','whence','whenever','where','whereafter','whereas','whereby','wherein','where\'s','whereupon','wherever','whether','which','whichever','while','whilst','whither','who','who\'d','whoever','whole','who\'ll','whom','whomever','who\'s','whose','why','will','willing','wish','with','within','without','wonder','won\'t','would','wouldn\'t','x','y','yes','yet','you','you\'d','you\'ll','your','you\'re','yours','yourself','yourselves','you\'ve','z','zero');

$input = preg_replace('/\b('.implode('|',$commonWords).')\b/','',$input);

$input = trim(preg_replace('/\s\s+/', ' ', str_replace("\n", " ", $input)));

return $input;

}

2 Comments

Are you sure you've posted this on the correct question? This is a Python question not a PHP one.
This code looks like a PHP code. Did you intend to answer that way?
-1
stopwords=['for','or','to']
p='Asking for help, clarification, or responding to other answers.'
for i in stopwords:
  n=p.replace(i,'')
  p=n
print(p)

2 Comments

Although this code might solve the problem, a good answer should explain what the code does and how it helps
This code will do exactly the same thing the question is asking about. It neither resolves the problem nor explains why it is not doing what they want.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.