51,743 questions
0
votes
1
answer
56
views
BeautifulSoup - Extracting content blocks after specific subheadings within a larger section, ignoring document introduction
I am scraping the Dead by Daylight Fandom wiki (specifically TOME pages, e.g., https://deadbydaylight.fandom.com/wiki/Tome_1_-_Awakening) to extract memory logs.
The goal is to extract the Memory ...
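A minimal sketch of one approach, assuming the memory logs sit under h3/h4 subheadings whose text contains "Memory"; the heading levels and the text match are guesses about the Fandom markup, not confirmed:
import requests
from bs4 import BeautifulSoup

url = "https://deadbydaylight.fandom.com/wiki/Tome_1_-_Awakening"
soup = BeautifulSoup(requests.get(url).text, "html.parser")

logs = {}
for heading in soup.find_all(["h3", "h4"]):
    title = heading.get_text(strip=True)
    if "Memory" not in title:
        continue  # skips the document introduction and unrelated subheadings
    block = []
    for sibling in heading.find_next_siblings():
        if sibling.name in ("h2", "h3", "h4"):
            break  # stop collecting at the next heading
        block.append(sibling.get_text(" ", strip=True))
    logs[title] = "\n".join(block)

for title, text in logs.items():
    print(title, "->", text[:80])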
-2
votes
0
answers
56
views
How to run the Firefox browser using Playwright Python's launch_persistent_context [closed]
from playwright.sync_api import sync_playwright
profile_path = r"C:\Users\kdutt\AppData\Roaming\Mozilla\Firefox\Profiles\p283dicx.default-release"
firefox_path = r"C:\Program Files\...
3
votes
1
answer
45
views
Nodriver does not raise an exception if an element is not found?
I am trying to search for elements on a webpage and have used various methods, including text and XPath. It seems that the timeout option does not work the way I expected, and no exception is raised ...
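A minimal sketch, assuming nodriver's select() accepts a timeout and raises asyncio.TimeoutError when nothing matches before it elapses; verify both assumptions against the version you have installed:
import asyncio
import nodriver as uc

async def main():
    browser = await uc.start()
    page = await browser.get("https://example.com")
    try:
        element = await page.select("#does-not-exist", timeout=5)
        print("found:", element)
    except asyncio.TimeoutError:
        print("element not found within 5 seconds")

if __name__ == "__main__":
    uc.loop().run_until_complete(main())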
0
votes
2
answers
203
views
Beautiful Soup: children are clearly inside but I can't get them
From the structure below I only want the value of the href attribute. But rec_block is returning the h5 element without its children, so basically <h5 class="series">Recommendations</h5>.
<...
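When a tag comes back "empty", the links are often siblings of the heading rather than children, or the default parser has closed the tag early on malformed markup. A sketch with stand-in HTML mirroring the structure described:
from bs4 import BeautifulSoup

html = """
<div class="rec">
  <h5 class="series">Recommendations</h5>
  <a href="/title/1">One</a>
  <a href="/title/2">Two</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")  # try "lxml" if results differ
rec_block = soup.find("h5", class_="series")

# Walk forward from the heading instead of into it:
for a in rec_block.find_all_next("a"):
    print(a["href"])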
0
votes
0
answers
46
views
Extracting UPS fuel surcharge history [closed]
I previously extracted the US fuel surcharge history using this JSON endpoint:
https://www.ups.com/assets/resources/fuel-surcharge/us.json
But it stopped updating after 9/22/2025.
How can I ...
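A sketch that re-checks the endpoint from the question; the response schema isn't shown, so inspect the raw payload before parsing further:
import requests

url = "https://www.ups.com/assets/resources/fuel-surcharge/us.json"
resp = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=30)
resp.raise_for_status()
data = resp.json()
print(type(data), str(data)[:500])  # confirm the shape and the latest dates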
0
votes
0
answers
77
views
URL-targeted web crawler [closed]
I have a bit of code I am building to take a specific Tumblr page and then iteratively scan post numbers sequentially, checking whether each page exists. If it does, it will print that full URL to ...
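A sketch of that sequential scan; the blog host and ID range are placeholders:
import requests

blog = "example.tumblr.com"  # placeholder
session = requests.Session()
session.headers["User-Agent"] = "Mozilla/5.0"

for post_id in range(1, 101):
    url = f"https://{blog}/post/{post_id}"
    resp = session.get(url, timeout=10)
    if resp.status_code == 200:
        print(url)  # page exists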
2
votes
0
answers
101
views
How to stop/kill a finished Scrapy spider instance within RStudio
I'm making a tutorial on how to scrape with Scrapy. For that, I use Quarto/RStudio and the website https://quotes.toscrape.com/. For pedagogic purposes, I need to run a first crawl on the first page, ...
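The usual obstacle here is that Twisted's reactor cannot be restarted within one Python session. A common workaround, sketched below, is to run each crawl in its own child process so the reactor dies with the process:
from multiprocessing import Process
from scrapy.crawler import CrawlerProcess

def _crawl(spider_cls, **kwargs):
    process = CrawlerProcess()
    process.crawl(spider_cls, **kwargs)
    process.start()  # blocks until this crawl finishes

def run_spider(spider_cls, **kwargs):
    # Each call gets a fresh process, and therefore a fresh reactor.
    p = Process(target=_crawl, args=(spider_cls,), kwargs=kwargs)
    p.start()
    p.join()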
Advice
0
votes
4
replies
47
views
How to fetch a real-time news data feed
I want to know how I can get live Indian news feed data with zero or minimal latency (30-40 seconds). I tried some RSS feeds, but they all deliver the data with some latency, so ...
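A sketch of cheap tight polling using feedparser's conditional-GET support, so unchanged polls return 304 and cost almost nothing; the feed URL is a placeholder:
import time
import feedparser

url = "https://example.com/news/rss"  # placeholder feed
etag = modified = None

while True:
    feed = feedparser.parse(url, etag=etag, modified=modified)
    if getattr(feed, "status", None) != 304:  # 304 means nothing new
        etag = getattr(feed, "etag", None)
        modified = getattr(feed, "modified", None)
        for entry in feed.entries:
            print(entry.title)
    time.sleep(30)  # sub-30-second latency really needs a push API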
0
votes
0
answers
49
views
Camoufox browser window remains visible in WSL even when `headless` is set to `virtual`
Description
When headless is set to "virtual", the Camoufox browser window still appears on the screen in ...
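A sketch, assuming the camoufox Python package's sync API: virtual mode relies on Xvfb (sudo apt install xvfb), and under WSLg an inherited DISPLAY can surface the window anyway, so it may help to drop it first:
import os
from camoufox.sync_api import Camoufox

os.environ.pop("DISPLAY", None)  # keep WSLg from showing the window

with Camoufox(headless="virtual") as browser:
    page = browser.new_page()
    page.goto("https://example.com")
    print(page.title())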
1
vote
0
answers
86
views
Invoke-WebRequest URL encoding
I want to retrieve content from a web page. However, when I tried the method above, the error still occurs when the query string contains Chinese characters.
code
$json = Get-Content -Encoding utf8 -Path "./...
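The usual fix is to percent-encode the non-ASCII query value before building the URL; in PowerShell that is [uri]::EscapeDataString($value). The same idea sketched in Python (the language of the other examples here), with a placeholder endpoint:
import requests
from urllib.parse import quote

query = "中文測試"  # placeholder Chinese query string
url = "https://example.com/search?q=" + quote(query, safe="")
resp = requests.get(url, timeout=30)
print(resp.status_code)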
-4
votes
2
answers
75
views
How can I get BBFC ratings in Python? [closed]
I am trying to write code to give me BBFC film ratings. I am using Selenium to do this, but I would be happy with any solution that works reliably. After a lot of work I finally came up with this code:
#...
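A hedged Selenium sketch with explicit waits; the search URL shape and CSS selector are assumptions about bbfc.co.uk's current markup, so verify them in the browser's devtools:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
try:
    driver.get("https://www.bbfc.co.uk/search?q=Alien")  # assumed URL shape
    results = WebDriverWait(driver, 10).until(
        EC.presence_of_all_elements_located(
            (By.CSS_SELECTOR, ".search-result")  # assumed selector
        )
    )
    for result in results:
        print(result.text)
finally:
    driver.quit()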
0
votes
1
answer
214
views
Fetch data from https://www.sofascore.com/?
This is my Python code, run on Ubuntu, to try to fetch and extract data from
https://www.sofascore.com/
I created this test code before using it on an E2 device in my plugin
# python3 -m venv venv
# source venv/...
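Sofascore renders client-side and tends to reject bare requests, so a first hedged step is browser-like headers, with a real browser (e.g. Playwright) as the fallback if this returns 403:
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
    "Accept-Language": "en-US,en;q=0.9",
}
resp = requests.get("https://www.sofascore.com/", headers=headers, timeout=30)
print(resp.status_code, len(resp.text))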
0
votes
1
answer
72
views
Using HTTPKerberosAuth with a JavaScript-enabled web scraper
I'm working on integration tests for a web application that's running in a Docker container within our GitLab CI/CD pipeline. The application is a frontend that requires Kerberos/SPNEGO authentication ...
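One common pattern, sketched below: complete the SPNEGO handshake with requests-kerberos, then hand the authenticated cookies to a JS-capable browser. The app URL is a placeholder, and the cookie translation assumes the app uses cookie-based sessions after the handshake:
import requests
from requests_kerberos import HTTPKerberosAuth, OPTIONAL
from playwright.sync_api import sync_playwright

app_url = "https://app.internal.example"  # placeholder
sess = requests.Session()
sess.get(app_url, auth=HTTPKerberosAuth(mutual_authentication=OPTIONAL))

cookies = [
    {"name": c.name, "value": c.value, "domain": c.domain, "path": c.path}
    for c in sess.cookies
]

with sync_playwright() as p:
    browser = p.chromium.launch()
    context = browser.new_context(ignore_https_errors=True)
    context.add_cookies(cookies)
    page = context.new_page()
    page.goto(app_url)
    print(page.title())
    browser.close()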
0
votes
1
answer
65
views
Scrapy: how to handle status 202
I'm quite new to web scraping, and in particular to using Scrapy's spiders, pipelines...
I'm getting 202 statuses on some of my spider's responses, so the page content is not available yet
...
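A sketch: tell Scrapy's retry middleware to treat 202 ("accepted, content not ready") as retryable; the values are illustrative:
import scrapy

class AcceptedSpider(scrapy.Spider):
    name = "accepted"
    start_urls = ["https://example.com/slow-endpoint"]  # placeholder

    custom_settings = {
        # 202 added to the defaults so RetryMiddleware re-requests it.
        "RETRY_HTTP_CODES": [202, 500, 502, 503, 504, 522, 524, 408, 429],
        "RETRY_TIMES": 5,
        "DOWNLOAD_DELAY": 2,  # give the server time to produce the page
    }

    def parse(self, response):
        yield {"status": response.status, "url": response.url}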
-1
votes
1
answer
48
views
How to loop an Apps Script / Cheerio web scraper over multiple URLs? [closed]
I have this Apps Script / Cheerio function that successfully scrapes the data I want from the URL. The site only displays 25 entries at this URL. I can find additional entries on subsequent pages (by ...
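The shape of the pagination loop, sketched in Python like the other examples on this page; in Apps Script the fetch becomes UrlFetchApp.fetch(url) and the parse stays your existing Cheerio call. The base URL and row selector are placeholders:
import requests
from bs4 import BeautifulSoup

base = "https://example.com/directory?page={}"  # placeholder
rows, page = [], 1
while True:
    html = requests.get(base.format(page), timeout=30).text
    batch = BeautifulSoup(html, "html.parser").select("tr.entry")  # placeholder selector
    if not batch:
        break  # an empty page means we walked past the last one
    rows.extend(row.get_text(strip=True) for row in batch)
    page += 1

print(len(rows), "entries scraped")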