51,743 questions
0
votes
1
answer
56
views
BeautifulSoup - Extracting content blocks after specific subheadings within a larger section, ignoring document introduction
I am scraping the Dead by Daylight Fandom wiki (specifically TOME pages, e.g., https://deadbydaylight.fandom.com/wiki/Tome_1_-_Awakening) to extract memory logs.
The goal is to extract the Memory ...
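A minimal sketch of one approach, assuming the memory logs sit under h3/h4 subheadings whose text contains "Memory"; the heading levels and the text match are guesses about the Fandom markup, not confirmed:
import requests
from bs4 import BeautifulSoup

url = "https://deadbydaylight.fandom.com/wiki/Tome_1_-_Awakening"
soup = BeautifulSoup(requests.get(url).text, "html.parser")

logs = {}
for heading in soup.find_all(["h3", "h4"]):
    title = heading.get_text(strip=True)
    if "Memory" not in title:
        continue  # skips the document introduction and unrelated subheadings
    block = []
    for sibling in heading.find_next_siblings():
        if sibling.name in ("h2", "h3", "h4"):
            break  # stop collecting at the next heading
        block.append(sibling.get_text(" ", strip=True))
    logs[title] = "\n".join(block)

for title, text in logs.items():
    print(title, "->", text[:80])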
-2
votes
0
answers
56
views
How to run the Firefox browser using Playwright Python's launch_persistent_context [closed]
from playwright.sync_api import sync_playwright
profile_path = r"C:\Users\kdutt\AppData\Roaming\Mozilla\Firefox\Profiles\p283dicx.default-release"
firefox_path = r"C:\Program Files\...
3
votes
1
answer
45
views
Nodriver does not raise an exception if an element is not found?
I am trying to search for elements on a webpage and have used various methods, including text and XPath. It seems that the timeout option does not work the way I expected, and no exception is raised ...
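A minimal sketch, assuming nodriver's select() accepts a timeout and raises asyncio.TimeoutError when nothing matches before it elapses; verify both assumptions against the version you have installed:
import asyncio
import nodriver as uc

async def main():
    browser = await uc.start()
    page = await browser.get("https://example.com")
    try:
        element = await page.select("#does-not-exist", timeout=5)
        print("found:", element)
    except asyncio.TimeoutError:
        print("element not found within 5 seconds")

if __name__ == "__main__":
    uc.loop().run_until_complete(main())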
0
votes
2
answers
203
views
Beautiful Soup: children are clearly inside but I can't get them
From the structure below I only want the value of the href attribute. But rec_block is returning the h5 element without its children, so basically <h5 class="series">Recommendations</h5>.
<...
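When a tag comes back "empty", the links are often siblings of the heading rather than children, or the default parser has closed the tag early on malformed markup. A sketch with stand-in HTML mirroring the structure described:
from bs4 import BeautifulSoup

html = """
<div class="rec">
  <h5 class="series">Recommendations</h5>
  <a href="/title/1">One</a>
  <a href="/title/2">Two</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")  # try "lxml" if results differ
rec_block = soup.find("h5", class_="series")

# Walk forward from the heading instead of into it:
for a in rec_block.find_all_next("a"):
    print(a["href"])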
0
votes
0
answers
46
views
Extracting UPS fuel surcharge history [closed]
I previously extracted the US fuel surcharge history using this JSON endpoint:
https://www.ups.com/assets/resources/fuel-surcharge/us.json
But it stopped updating after 9/22/2025.
How can I ...
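A sketch that re-checks the endpoint from the question; the response schema isn't shown, so inspect the raw payload before parsing further:
import requests

url = "https://www.ups.com/assets/resources/fuel-surcharge/us.json"
resp = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=30)
resp.raise_for_status()
data = resp.json()
print(type(data), str(data)[:500])  # confirm the shape and the latest dates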
0
votes
0
answers
77
views
URL-targeted web crawler [closed]
I have a bit of code I am building to take a specific Tumblr page and then iteratively scan post numbers sequentially, checking whether each page exists. If it does, it will print that full URL to ...
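A sketch of that sequential scan; the blog host and ID range are placeholders:
import requests

blog = "example.tumblr.com"  # placeholder
session = requests.Session()
session.headers["User-Agent"] = "Mozilla/5.0"

for post_id in range(1, 101):
    url = f"https://{blog}/post/{post_id}"
    resp = session.get(url, timeout=10)
    if resp.status_code == 200:
        print(url)  # page exists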
2
votes
0
answers
101
views
How to stop/kill a finished Scrapy spider instance within RStudio
I'm making a tutorial on how to scrape with Scrapy. For that, I use Quarto/RStudio and the website https://quotes.toscrape.com/. For pedagogic purposes, I need to run a first crawl on the first page, ...
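The usual obstacle here is that Twisted's reactor cannot be restarted within one Python session. A common workaround, sketched below, is to run each crawl in its own child process so the reactor dies with the process:
from multiprocessing import Process
from scrapy.crawler import CrawlerProcess

def _crawl(spider_cls, **kwargs):
    process = CrawlerProcess()
    process.crawl(spider_cls, **kwargs)
    process.start()  # blocks until this crawl finishes

def run_spider(spider_cls, **kwargs):
    # Each call gets a fresh process, and therefore a fresh reactor.
    p = Process(target=_crawl, args=(spider_cls,), kwargs=kwargs)
    p.start()
    p.join()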
Advice
0
votes
4
replies
47
views
How to fetch a real-time news data feed
I want to know how I can get live Indian news feed data with zero or minimal latency (30-40 seconds). I tried some RSS feeds, but they all deliver the data with some latency, so ...
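A sketch of cheap tight polling using feedparser's conditional-GET support, so unchanged polls return 304 and cost almost nothing; the feed URL is a placeholder:
import time
import feedparser

url = "https://example.com/news/rss"  # placeholder feed
etag = modified = None

while True:
    feed = feedparser.parse(url, etag=etag, modified=modified)
    if getattr(feed, "status", None) != 304:  # 304 means nothing new
        etag = getattr(feed, "etag", None)
        modified = getattr(feed, "modified", None)
        for entry in feed.entries:
            print(entry.title)
    time.sleep(30)  # sub-30-second latency really needs a push API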
0
votes
0
answers
49
views
Camoufox browser window remains visible in WSL even when `headless` is set to `virtual`
Description
When headless is set to "virtual", the Camoufox browser window still appears on the screen in ...
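A sketch, assuming the camoufox Python package's sync API: virtual mode relies on Xvfb (sudo apt install xvfb), and under WSLg an inherited DISPLAY can surface the window anyway, so it may help to drop it first:
import os
from camoufox.sync_api import Camoufox

os.environ.pop("DISPLAY", None)  # keep WSLg from showing the window

with Camoufox(headless="virtual") as browser:
    page = browser.new_page()
    page.goto("https://example.com")
    print(page.title())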
1
vote
0
answers
86
views
Invoke-WebRequest URL encoding
I want to retrieve content from a web page. However, when I tried the method above, the error still occurs when the query string contains Chinese characters.
code
$json = Get-Content -Encoding utf8 -Path "./...
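The usual fix is to percent-encode the non-ASCII query value before building the URL; in PowerShell that is [uri]::EscapeDataString($value). The same idea sketched in Python (the language of the other examples here), with a placeholder endpoint:
import requests
from urllib.parse import quote

query = "中文測試"  # placeholder Chinese query string
url = "https://example.com/search?q=" + quote(query, safe="")
resp = requests.get(url, timeout=30)
print(resp.status_code)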
-4
votes
2
answers
75
views
How can I get BBFC ratings in Python? [closed]
I am trying to write code to give me BBFC film ratings. I am using Selenium to do this, but I would be happy with any solution that works reliably. After a lot of work I finally came up with this code:
#...
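A hedged Selenium sketch with explicit waits; the search URL shape and CSS selector are assumptions about bbfc.co.uk's current markup, so verify them in the browser's devtools:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
try:
    driver.get("https://www.bbfc.co.uk/search?q=Alien")  # assumed URL shape
    results = WebDriverWait(driver, 10).until(
        EC.presence_of_all_elements_located(
            (By.CSS_SELECTOR, ".search-result")  # assumed selector
        )
    )
    for result in results:
        print(result.text)
finally:
    driver.quit()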
0
votes
1
answer
214
views
Fetch data from https://www.sofascore.com/?
This is my Python code, run on Ubuntu, to try to fetch and extract data from
https://www.sofascore.com/
I created this test code before using it on an E2 device in my plugin
# python3 -m venv venv
# source venv/...
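Sofascore renders client-side and tends to reject bare requests, so a first hedged step is browser-like headers, with a real browser (e.g. Playwright) as the fallback if this returns 403:
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
    "Accept-Language": "en-US,en;q=0.9",
}
resp = requests.get("https://www.sofascore.com/", headers=headers, timeout=30)
print(resp.status_code, len(resp.text))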
0
votes
1
answer
72
views
Using HTTPKerberosAuth with a JavaScript-enabled web scraper
I'm working on integration tests for a web application that's running in a Docker container within our GitLab CI/CD pipeline. The application is a frontend that requires Kerberos/SPNEGO authentication ...
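One common pattern, sketched below: complete the SPNEGO handshake with requests-kerberos, then hand the authenticated cookies to a JS-capable browser. The app URL is a placeholder, and the cookie translation assumes the app uses cookie-based sessions after the handshake:
import requests
from requests_kerberos import HTTPKerberosAuth, OPTIONAL
from playwright.sync_api import sync_playwright

app_url = "https://app.internal.example"  # placeholder
sess = requests.Session()
sess.get(app_url, auth=HTTPKerberosAuth(mutual_authentication=OPTIONAL))

cookies = [
    {"name": c.name, "value": c.value, "domain": c.domain, "path": c.path}
    for c in sess.cookies
]

with sync_playwright() as p:
    browser = p.chromium.launch()
    context = browser.new_context(ignore_https_errors=True)
    context.add_cookies(cookies)
    page = context.new_page()
    page.goto(app_url)
    print(page.title())
    browser.close()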
0
votes
1
answer
65
views
Scrapy: how to handle status 202
I'm quite new to web scraping, and in particular to using Scrapy's spiders, pipelines...
I'm getting 202 statuses on some of my spider's responses, so the page content is not available yet
...
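A sketch: tell Scrapy's retry middleware to treat 202 ("accepted, content not ready") as retryable; the values are illustrative:
import scrapy

class AcceptedSpider(scrapy.Spider):
    name = "accepted"
    start_urls = ["https://example.com/slow-endpoint"]  # placeholder

    custom_settings = {
        # 202 added to the defaults so RetryMiddleware re-requests it.
        "RETRY_HTTP_CODES": [202, 500, 502, 503, 504, 522, 524, 408, 429],
        "RETRY_TIMES": 5,
        "DOWNLOAD_DELAY": 2,  # give the server time to produce the page
    }

    def parse(self, response):
        yield {"status": response.status, "url": response.url}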
-1
votes
1
answer
48
views
How to loop an Apps Script / Cheerio web scraper over multiple URLs? [closed]
I have this Apps Script / Cheerio function that successfully scrapes the data I want from the URL. The site only displays 25 entries at this URL. I can find additional entries on subsequent pages (by ...
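The shape of the pagination loop, sketched in Python like the other examples on this page; in Apps Script the fetch becomes UrlFetchApp.fetch(url) and the parse stays your existing Cheerio call. The base URL and row selector are placeholders:
import requests
from bs4 import BeautifulSoup

base = "https://example.com/directory?page={}"  # placeholder
rows, page = [], 1
while True:
    html = requests.get(base.format(page), timeout=30).text
    batch = BeautifulSoup(html, "html.parser").select("tr.entry")  # placeholder selector
    if not batch:
        break  # an empty page means we walked past the last one
    rows.extend(row.get_text(strip=True) for row in batch)
    page += 1

print(len(rows), "entries scraped")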