This repository was archived by the owner on Apr 9, 2021. It is now read-only.

The deed metadata scraper provides a simple web interface for extracting RDFa metadata from a given URL


cc-archive/metadata_scraper

Deed Metadata Scraper

Date: $LastChangedDate$
Version: $LastChangedRevision$
Author: Nathan R. Yergler <nathan@creativecommons.org>
Organization: Creative Commons
Copyright: 2006-2007, Nathan R. Yergler, Creative Commons; licensed to the public under the MIT license.

Overview

The deed metadata scraper provides a simple web interface for extracting RDFa metadata from a given URL. The metadata is returned in JSON format, which is used by the referrer-metadata.js script to update a Creative Commons deed with attribution links.
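As a rough illustration of the request and response flow described above, the following Python 3 sketch builds a request URL for a page to inspect and decodes a JSON response body. The host, path (`/scrape`), and query parameter name (`url`) are assumptions made for this example; this README does not document the actual endpoint or the shape of the JSON it returns.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint: the real host, path, and parameter name are not
# documented in this README.
SCRAPER_BASE = "http://localhost:8080/scrape"


def build_scrape_url(target_url, base=SCRAPER_BASE):
    """Return a request URL asking the scraper to inspect target_url."""
    return base + "?" + urlencode({"url": target_url})


def parse_response(body):
    """Decode a JSON response body (str or bytes) into Python objects."""
    return json.loads(body)


request_url = build_scrape_url("http://example.org/photo.html")
print(request_url)
```

The target URL is percent-encoded so that it survives as a single query parameter. Fetching `request_url` (for example with `urllib.request.urlopen`) and passing the body to `parse_response` would complete the round trip against a running server.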

Installation

The server and dependencies may be deployed using a combination of Subversion and zc.buildout. For example:

$ svn co https://svn.sourceforge.net/svnroot/cctools/metadata_scraper
$ cd metadata_scraper
$ python bootstrap/bootstrap.py
$ ./bin/buildout

Running the buildout command will download any uninstalled dependencies as Python Eggs and place them in an ./eggs sub-directory. It will also create a script in ./bin, paster, which can be used to run the metadata scraper as an independent server process.

Note: zc.buildout is intended for use as an installation construction tool, and as such "bakes in" explicit paths to the eggs it downloads. If it is necessary to move an installation of the software, you must run buildout again.

Testing

cc.deedscraper ships with a test runner:

$ ./bin/python cc/deedscraper/tests.py

Note that ./bin/python is a script created by zc.buildout which starts a Python interpreter with the application dependency eggs on the PYTHONPATH.
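To make the effect of such a wrapper concrete, here is a minimal Python sketch of the idea: explicit egg paths are placed at the front of the module search path before any application code is imported. The egg path shown is hypothetical, and the script that buildout actually generates may differ in detail.

```python
import sys

# Hypothetical egg location; a real buildout writes the absolute paths of
# the eggs it actually downloaded into the generated script.
EGG_PATHS = [
    "/srv/metadata_scraper/eggs/example_dependency-1.0-py2.4.egg",
]


def prepend_eggs(search_path, egg_paths=EGG_PATHS):
    """Insert egg paths at the front of search_path, preserving their order."""
    search_path[0:0] = [p for p in egg_paths if p not in search_path]
    return search_path


# After this call, imports consult the eggs before anything else on the path.
prepend_eggs(sys.path)
```

Because the paths are absolute, a wrapper like this stops working if the installation directory is moved, which is why buildout has to be run again after a move.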

Running the Server

cc.deedscraper is a WSGI application, and can be run in a variety of containers. The buildout will install and create a script for running the application using CherryPy's built-in web server, with the process managed by zdaemon.
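For readers unfamiliar with WSGI, the sketch below shows the general shape of such an application in Python 3: a callable that receives the request environment and a `start_response` function, and returns the response body as an iterable of bytes. It is a toy stand-in, not the cc.deedscraper implementation; the `url` query parameter and the JSON payload are assumptions made for illustration.

```python
import json
from urllib.parse import parse_qs


def application(environ, start_response):
    """Toy WSGI app: echo the requested URL back as a JSON document."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    # "url" is a hypothetical parameter name chosen for this example.
    target = query.get("url", [""])[0]
    body = json.dumps({"url": target}).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(app, query_string):
    """Invoke a WSGI app in-process and return (status, decoded JSON body)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    chunks = app({"QUERY_STRING": query_string}, start_response)
    return captured["status"], json.loads(b"".join(chunks))


print(call_app(application, "url=http%3A%2F%2Fexample.org%2F"))
```

Because the application is just a callable, the same object can be handed to any WSGI container. For a quick local experiment, the standard library's `wsgiref.simple_server.make_server("localhost", 8080, application)` would serve it.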

Before running the server, you must create a configuration file. A sample file, local.conf.sample, is provided in the cc.deedscraper package; copy it to local.conf to use the default settings.

To start the application run:

$ ./bin/zdaemon -C scraper_zd.conf start

To stop the application, run:

$ ./bin/zdaemon -C scraper_zd.conf stop

An interactive console can be accessed by running:

$ ./bin/zdaemon -C scraper_zd.conf

Type help within the console to see a full list of available commands.

Support

Problems? Write to us at hackers@creativecommons.org or send us a pull request...

Updated: April 2014.
