Feat: Capture screenshots while scraping, not in a separate revisit pass #1105

Description

@amhsirak

The problem
When a crawl or search robot is set to also take screenshots of each page, here's what currently happens:
The robot visits every page once and scrapes the text/data from it.
After that's all done, it goes back and visits every single page again, just to take a screenshot.
That means every page gets loaded twice - once to scrape, once to screenshot. For a robot covering dozens or hundreds of pages, this roughly doubles the time the job takes, for no good reason, since the page was already fully loaded and visible the first time.

What needs to change
While the robot is scraping a page (the first and only pass it should need), it should also take the screenshot right there, if screenshots were requested.
The separate "go back and revisit every page to take a screenshot" step should then be removed for crawl and search jobs, since it becomes unnecessary.
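
The single-pass behaviour described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual code: `PageLike`, `PageResult`, `extractText`, and `crawlSinglePass` are hypothetical names invented for the example, and a real robot would work against its own browser page object.

```typescript
// Hypothetical minimal page interface (an assumption for this sketch).
interface PageLike {
  goto(url: string): Promise<void>;
  extractText(): Promise<string>;
  screenshot(): Promise<Uint8Array>;
}

// Hypothetical per-page result shape.
interface PageResult {
  url: string;
  text: string;
  screenshot?: Uint8Array;
}

// Visit each URL once: scrape it, then capture the screenshot on the same
// load if screenshots were requested. There is no second revisit loop.
async function crawlSinglePass(
  page: PageLike,
  urls: string[],
  captureScreenshots: boolean
): Promise<PageResult[]> {
  const results: PageResult[] = [];
  for (const url of urls) {
    await page.goto(url);
    const text = await page.extractText();
    const result: PageResult = { url, text };
    if (captureScreenshots) {
      // The page is already loaded here, so no extra navigation is needed.
      result.screenshot = await page.screenshot();
    }
    results.push(result);
  }
  return results;
}
```

Keeping the screenshot inside the same loop iteration as the scrape is what removes the second navigation; the `captureScreenshots` flag leaves runs without screenshots unchanged.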

What "done" looks like
Run a crawl or search robot with screenshots turned on.
Each page should only be visited once during the run.
The final results should still include a screenshot for each page, same as before, just captured faster and in a single pass.
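
One way to check these criteria in a test is sketched below. It assumes the run can produce a log of visited URLs plus a list of results; `ResultLike` and `checkSinglePass` are hypothetical names for illustration only, not part of the existing codebase.

```typescript
// Hypothetical result shape: only the fields this check needs.
interface ResultLike {
  url: string;
  screenshot?: Uint8Array;
}

// Returns a list of human-readable problems; an empty list means the run
// satisfied "each page visited once" and "every result has a screenshot".
function checkSinglePass(visitLog: string[], results: ResultLike[]): string[] {
  const problems: string[] = [];

  // Count how many times each URL was loaded during the run.
  const visits = new Map<string, number>();
  for (const url of visitLog) {
    visits.set(url, (visits.get(url) ?? 0) + 1);
  }
  visits.forEach((count, url) => {
    if (count > 1) {
      problems.push(`${url} was visited ${count} times`);
    }
  });

  // Every result should still carry a non-empty screenshot.
  for (const r of results) {
    if (!r.screenshot || r.screenshot.length === 0) {
      problems.push(`${r.url} has no screenshot`);
    }
  }
  return problems;
}
```

A run that passes would return an empty array; a run that still revisits pages, or that drops screenshots, would return one message per offending page.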
