Scrape a website efficiently, block by block, page by page.
This is a Cheerio-based scraper, useful for extracting data from a website using CSS selectors.
The motivation behind this package is to provide a simple Cheerio-based scraping tool that can divide a website into blocks and transform each block into a JSON object using CSS selectors.
Related projects:

- https://github.com/cheeriojs/cheerio
- https://github.com/chriso/curlrequest
- https://github.com/kriskowal/q
- https://github.com/dharmafly/noodle
Install the module with: npm install cheers
Configuration options:
- config.url: the URL to scrape
- config.blockSelector: the CSS selector to apply on the page to divide it into scraping blocks. This field is optional ("body" is used by default)
- config.scrape: the definition of what you want to extract in each block. Each key has two mandatory attributes: selector (a CSS selector, or "." to stay on the current node) and extract. The possible values for extract are text, html, outerHTML, or the name of an attribute of the HTML element (e.g. "href")
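For instance, since blockSelector is optional, a minimal configuration can rely on the "body" default and simply name the fields to extract. The sketch below is illustrative only: the URL, the key names (heading, firstLink) and the selectors are placeholders chosen for the example, not part of the library.

var cheers = require('cheers');

// no blockSelector here, so the whole page body is treated as a single block
var minimalConfig = {
  url: "http://example.com/",   // placeholder URL
  scrape: {
    heading: {
      selector: "h1",           // CSS selector applied inside the block
      extract: "text"           // extract the element's text content
    },
    firstLink: {
      selector: "a",
      extract: "href"           // extract the value of the href attribute
    }
  }
};

cheers.scrape(minimalConfig).then(function (results) {
  console.log(JSON.stringify(results));
}).catch(function (error) {
  console.error(error);
});

The fuller example below scrapes the Echo JS front page, dividing it into one block per article: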
var cheers = require('cheers');

// let's scrape this excellent JS news website
var config = {
  url: "http://www.echojs.com/",
  blockSelector: "article",
  scrape: {
    title: {
      selector: "h2 a",
      extract: "text"
    },
    link: {
      selector: "h2 a",
      extract: "href"
    },
    articleInnerHtml: {
      selector: ".",
      extract: "html"
    },
    articleOuterHtml: {
      selector: ".",
      extract: "outerHTML"
    }
  }
};

cheers.scrape(config).then(function (results) {
  console.log(JSON.stringify(results));
}).catch(function (error) {
  console.error(error);
});
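The exact shape of results is not spelled out above, but since each block is turned into a JSON object whose keys come from config.scrape, a reasonable assumption is an array with one object per matched article block. Under that assumption, the results from the example above could be consumed like this:

cheers.scrape(config).then(function (results) {
  // assumption: results is an array with one object per "article" block,
  // carrying the keys defined in config.scrape (title, link, ...)
  results.forEach(function (article) {
    console.log(article.title + " -> " + article.link);
  });
}).catch(function (error) {
  console.error(error);
});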
Roadmap:

- Website pagination
- Option to use a headless browser
- Unit tests
Cheers!
Copyright (c) 2014 Fabien Allanic
Licensed under the MIT license.