6,086 questions
0
votes
1
answer
64
views
BeautifulSoup - Extracting content blocks after specific subheadings within a larger section, ignoring document introduction
I am scraping the Dead by Daylight Fandom wiki (specifically TOME pages, e.g., https://deadbydaylight.fandom.com/wiki/Tome_1_-_Awakening) to extract memory logs.
The goal is to extract the Memory ...
0
votes
2
answers
203
views
Beautiful Soup, children are clearly inside but can't get it
From the structure below, I only want the value of the href attribute. But rec_block returns the h5 element without its children, so basically <h5 class="series">Recommendations</h5>.
<...
0
votes
0
answers
56
views
Issue With Jsoup Document Selector
I'm using Java Spring Boot with jsoup, and I recently upgraded jsoup to version 1.21.1.
My code creates a search query and searches for it in the document:
Elements targetElements = document.select(...
Advice
1
vote
0
replies
96
views
Parsing with Python html.parser: accessing and using raw tags
I'm not a Python specialist, so bear with me. I'm trying to replace a Perl HTML::TokeParser-based parser, which I use for translating templates into foreign languages, with Python's html.parser. Here's the ...
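A minimal sketch of one way to reach raw tags with html.parser, since the excerpt above is cut off before the asker's code. It relies on `HTMLParser.get_starttag_text()`, which returns the most recent start tag exactly as written in the source. The class name `RawTagEcho`, the helper `rewrite`, and the `translate` callable are hypothetical names introduced for illustration, not taken from the question.

```python
from html.parser import HTMLParser


class RawTagEcho(HTMLParser):
    """Rebuild the input: start tags pass through verbatim, text is transformed."""

    def __init__(self, translate):
        # convert_charrefs=False reports entities such as &amp; as separate events
        super().__init__(convert_charrefs=False)
        self.translate = translate  # hypothetical callable: str -> str
        self.out = []

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() gives the start tag exactly as it appeared in the source
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are also echoed verbatim
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        # html.parser has no raw-text accessor for end tags, so rebuild it
        # (the tag name arrives lowercased)
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(self.translate(data))

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def rewrite(html, translate):
    parser = RawTagEcho(translate)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

For example, `rewrite('<p CLASS="x">Hello</p>', str.upper)` keeps the start tag's original casing and quoting while only the text is changed. Comments, doctype declarations, and processing instructions are not handled in this sketch and would be dropped.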
3
votes
1
answer
61
views
Beautiful Soup; splitting a paragraph only by <br> where stripped_strings is not working
I'm rather new to Beautiful Soup and I'm having trouble splitting some HTML correctly by looking only at line breaks (<br>) and ignoring other HTML elements such as changes in font color.
The ...
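The question's own markup is cut off above, so the following is only a sketch of the general idea: start a new chunk at each <br> and let text inside any other element flow into the current chunk. It uses the standard library's html.parser rather than Beautiful Soup, and the names `BrSplitter` and `split_on_br` are hypothetical.

```python
from html.parser import HTMLParser


class BrSplitter(HTMLParser):
    """Collect text into chunks, starting a new chunk only at <br>."""

    def __init__(self):
        super().__init__()
        self.parts = [""]

    def handle_starttag(self, tag, attrs):
        # Only <br> ends a chunk; <font>, <b>, <span>, etc. are ignored
        if tag == "br":
            self.parts.append("")

    def handle_data(self, data):
        self.parts[-1] += data


def split_on_br(html):
    parser = BrSplitter()
    parser.feed(html)
    parser.close()
    # Drop chunks that are empty after trimming surrounding whitespace
    return [part.strip() for part in parser.parts if part.strip()]
```

Because inline elements never close a chunk, a line whose color changes halfway through still comes back as a single string.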
0
votes
1
answer
47
views
Can we combine '[class="a"]' and '[id="c d"]' in the same command?
I have an HTML document where I want to get elements with class="a" and id="c d". If I have only one of them, I can use soup.select('[class="a"]') and soup.select('[id="c d"...
0
votes
0
answers
99
views
Customizing how marked parses and renders a list
I've read the docs at https://marked.js.org/using_pro#renderer, but they have no example for the list I want to customize.
More detail: https://github.com/markedjs/marked/blob/master/src/Tokens.ts#L137 as the ...
4
votes
5
answers
184
views
How to extract links from an html page
I have an HTML page that has data like this:
<td><a href="test-2025-03-24_17-05.log">test-2025-03-24_17-05.log</a></td>
<td><a href="PASS_report_test_2025-...
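A minimal sketch of collecting href values from markup like the rows above, assuming Python and the standard library's html.parser (the excerpt does not say which language the asker uses). The names `LinkCollector` and `extract_links` are hypothetical, and the second row in the usage example is made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Gather the href attribute of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; value is None for bare attributes
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    sample = (
        '<td><a href="test-2025-03-24_17-05.log">test-2025-03-24_17-05.log</a></td>\n'
        '<td><a href="example.log">example.log</a></td>'  # hypothetical second row
    )
    print(extract_links(sample))
```

Anchors without an href (for example, named anchors) are skipped rather than reported as empty strings.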
3
votes
1
answer
93
views
Why isn't the end tag included in an ASIDE.OuterHTML?
My intent was to give advice on the question "Delete everything between two strings (inclusive)": use the HTMLDocument parser instead of a text-based replace command.
But somehow the OuterHTML ...
-1
votes
1
answer
52
views
Why is my HTML parser not outputting the wanted number?
My programming teacher had us write a Python calculator for fuel consumption in L/100 km, and I decided to go further and have it also calculate the price per 100 km, but here's the ...
1
vote
0
answers
34
views
Passing CSRF token through Dart html parsing
I'm making an app where students can log in to their portal website and it shows their data. However, I'm having trouble authenticating users. When I did this project on another website, I used ...
1
vote
2
answers
93
views
Extracting text from Wikisource using BeautifulSoup returns empty result
I'm trying to extract the text of a book from a Wikisource page using BeautifulSoup, but the result is always empty. The page I'm working on is Le Père Goriot by Balzac.
Here's the code I'm using:
...
-1
votes
2
answers
87
views
Python parser returns an empty list (I guess it's an HTML class selection issue)
The idea is: I want to collect the name and price of every flat on the website as a list.
I've made a simple parser in Python, but it looks like I can't get any values, since it returns an ...
1
vote
1
answer
151
views
How can I scrape a table from baseball reference using pandas and beautiful soup? [duplicate]
I am trying to scrape the pitching stats from this URL and then save the DataFrame to a CSV file.
https://www.baseball-reference.com/boxes/ARI/ARI202204070.shtml
My current code is below (Python 3.9.7)
...
0
votes
1
answer
54
views
Duplicate and extra data when web scraping fbref.com
I am trying to scrape the EPL league table, but I am getting duplicate links as well as links to teams that are not even in the Premier League, which makes no sense.
Here ...