The National Archives

Congressional & Federal Government Web Harvests

About Web Harvests

Web harvesting is the process of automatically copying and organizing unstructured information from pages and data on the World Wide Web. It is also known as web mining, web scraping, and web crawling. Websites are identified with a "seed list" of URLs, which are "harvested" so that content within, or linked to, an identified site is captured and copied. This work is conducted for public interest purposes.
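
As a rough illustration of how a seed list drives a harvest, the following Python sketch walks outward from a set of seed URLs and copies each page it reaches. It is not NARA's harvesting tooling; the names `harvest`, `LinkExtractor`, `fetch`, and `max_pages` are hypothetical, and the rule that only links on the seed sites' own hosts are followed is an assumption made to keep the example small.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def harvest(seed_urls, fetch, max_pages=100):
    """Breadth-first capture starting from a seed list.

    fetch(url) is a hypothetical callable supplied by the caller. It returns
    the page's HTML as a string, or None if the page cannot be retrieved.
    Returns a dict mapping each captured URL to its HTML.
    """
    # Assumption: only pages on the same hosts as the seeds are in scope.
    allowed_hosts = {urlparse(url).netloc for url in seed_urls}
    queue = deque(seed_urls)
    seen = set(seed_urls)
    captured = {}

    while queue and len(captured) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        captured[url] = html

        # Queue every in-scope link that has not been seen yet.
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link, _fragment = urldefrag(urljoin(url, href))
            if urlparse(link).netloc in allowed_hosts and link not in seen:
                seen.add(link)
                queue.append(link)

    return captured


# Usage with an in-memory stand-in for the web (no network access).
sample_pages = {
    "https://example.gov/": '<a href="/about">About</a> <a href="https://other.example/">Elsewhere</a>',
    "https://example.gov/about": '<a href="/">Home</a>',
}
copies = harvest(["https://example.gov/"], sample_pages.get)
```

Here `copies` holds the two `example.gov` pages, while the link to `other.example` is skipped because its host is not in the seed list. A real harvest would also need to handle non-HTML content, politeness limits, and broader scoping rules, all of which this sketch leaves out.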

White House Harvest

In 2024, as part of its transition activities, NARA conducted a web harvest of websites related to the Presidential Records Act to determine whether harvested content could viably replace or accompany the “frozen in time” websites.

Congressional Web Harvest

Starting in 2006, NARA began conducting web harvests of Congressional websites at the end of each Congress. The harvest includes Member, committee, organizational office, and leadership websites. Recent Congressional harvests have expanded in scope to capture not only content hosted and stored on Member and committee websites, but also content hosted on a number of social media sites.

2004 Presidential Term Harvest

The National Archives and Records Administration (NARA) conducted a web harvest of Federal agency public websites in 2004. In January 2005, NARA issued Guidance on Managing Web Records to address agencies' responsibilities for identifying, managing, and scheduling web materials they identify as Federal records. Accordingly, each agency is responsible, in coordination with NARA, for determining how to manage its web records, including whether to preserve a periodic snapshot of its entire web page.

NARA also maintains an overview on Web Records at the National Archives.

Accuracy of Harvests

The accuracy of each harvest was affected by these factors:

NARA has made every reasonable effort to ensure that websites' code and programming were captured accurately. NARA is not responsible for any website's compliance with Federal laws, regulations, and requirements. NARA is responsible for providing public access to these copied websites but is not responsible for maintaining code such as links, accessibility features, search or site maps, or other functionality that may have been present on the sites before they were copied.

Mention of commercial products, services, or resources within this notice does not constitute an endorsement by the National Archives and Records Administration or the United States Government.

Learn more: To find current Federal agency websites, please use https://www.usa.gov/agency-index. For technical issues, contact info@archive.org. For questions about these Federal records, contact cer@nara.gov. For questions about Presidential records, contact presidential.libraries@nara.gov.