The problem
When a crawl or search robot is set to also take screenshots of each page, here's what currently happens:
The robot visits every page once and scrapes the text/data from it.
After that's all done, it goes back and visits every single page again, just to take a screenshot.
That means every page gets loaded twice: once to scrape, once to screenshot. For a robot covering dozens or hundreds of pages, this roughly doubles the number of page loads, and with it the time the job takes, for no good reason, since the page was already fully loaded and visible the first time.
What needs to change
While the robot is scraping a page (the first and only pass it should need), it should also take the screenshot right there, if screenshots were requested.
The separate "go back and revisit every page to take a screenshot" step should then be removed for crawl and search jobs, since it becomes unnecessary.
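A minimal sketch of the single-pass flow described above, in Python. Every name here (`FakePage`, `scrape_page`, `run_job`, `take_screenshots`) is a hypothetical stand-in, not the robot's actual code; the real page object would come from whatever browser automation library the robot uses.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakePage:
    """Hypothetical stand-in for a browser page; counts loads per URL."""
    loads: dict = field(default_factory=dict)
    current_url: Optional[str] = None

    def goto(self, url: str) -> None:
        self.loads[url] = self.loads.get(url, 0) + 1
        self.current_url = url

    def extract_text(self) -> str:
        return f"text of {self.current_url}"

    def screenshot(self) -> bytes:
        return f"png of {self.current_url}".encode()


def scrape_page(page: FakePage, url: str, take_screenshots: bool) -> dict:
    """Load the page once, scrape it, and capture the screenshot if requested."""
    page.goto(url)
    result = {"url": url, "text": page.extract_text()}
    if take_screenshots:
        # Capture while the page is still loaded; no second visit is needed.
        result["screenshot"] = page.screenshot()
    return result


def run_job(page: FakePage, urls: list, take_screenshots: bool) -> list:
    """Single pass over all URLs; there is no separate screenshot pass."""
    return [scrape_page(page, url, take_screenshots) for url in urls]
```

The point of the sketch is only the ordering: the screenshot call sits inside the same function that scrapes, directly after the page has loaded, so the job loop never needs a second pass.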
What "done" looks like
Run a crawl or search robot with screenshots turned on.
Each page should only be visited once during the run.
The final results should still include a screenshot for each page, same as before, just captured during the single pass instead of a second one.
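The acceptance criteria above could be checked with something like the following Python sketch. It assumes a hypothetical `visit_log` (the list of URLs the robot loaded during the run, in order) and a hypothetical result shape with `url` and `screenshot` keys; neither is taken from the actual codebase.

```python
from collections import Counter


def check_single_pass(visit_log: list, results: list) -> list:
    """Return a list of problems; an empty list means the run meets the criteria."""
    problems = []
    # Each page should be loaded exactly once during the run.
    for url, count in Counter(visit_log).items():
        if count != 1:
            problems.append(f"{url} visited {count} times")
    # Every result should still carry a screenshot.
    for result in results:
        if not result.get("screenshot"):
            problems.append(f"{result['url']} has no screenshot")
    return problems
```

With the old two-pass behavior, every URL would show up twice in `visit_log`, so a check like this would report each page.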