pop-api-scraper

Features

The pop-api-scraper project aims to provide the core modules for the popcorn-api scraper, but can also be used for other purposes by using middleware.

  • Strategy pattern with providers
  • Cronjobs
  • Scraper wrapper class
  • HttpService with got

Installation

 $ npm install --save pop-api-scraper pop-api

Documentation

Usage

For the basic setup you need to create a Provider (strategy) the PopApiScraper instance can use. The PopApiScraper implements the strategy pattern, where the providers are the strategies.

The example below makes an HTTP GET request to a web service or website. From there on, you are free to implement how and what data you want to get from it.

// ./ExampleProvider.js
import { AbstractProvider, HttpService } from 'pop-api-scraper'

// Extend from the internal AbstractProvider.
export default class ExampleProvider extends AbstractProvider {

  constructor(PopApiScraper, {name, configs, maxWebRequests = 2}) {
    super(PopApiScraper, {name, configs, maxWebRequests})
  }

  // Override the `scrapeConfig` method to get the content from one
  // configuration.
  scrapeConfig(config) {
    // An HTTP service to send HTTP requests.
    this.httpService = new HttpService({
      baseUrl: config.baseUrl
    })

    // HTTP GET request to: https://jsonplaceholder.typicode.com/posts?foo=bar
    return this.httpService.get('/posts', config.httpOptions)
      .then(res => res.data)
  }

}

Bundle it all up together with pop-api:

// ./index.js
import os from 'os'
import { PopApi } from 'pop-api'
import { join } from 'path'
import { Cron, PopApiScraper } from 'pop-api-scraper'

import ExampleProvider from './ExampleProvider'

(async () => {
  try {
    // Let the PopApiScraper use the ExampleProvider to scrape data.
    PopApiScraper.use(ExampleProvider, {
      name: 'example-provider',
      configs: [{
        baseUrl: 'https://jsonplaceholder.typicode.com',
        httpOptions: {
          query: {
            foo: 'bar'
          }
        }
      }],
      maxWebRequests: 2
    })

    // Register the PopApiScraper middleware to the pop-api instance.
    PopApi.use(PopApiScraper, {
      statusPath: join(os.tmpdir(), 'status.json'),
      updatedPath: join(os.tmpdir(), 'updated.json')
    })
    // Optionally you can use the Cron middleware to scrape for content on a
    // regular basis.
    PopApi.use(Cron, {
      cronTime: '0 0 */6 * * *',
      start: false
    })

    // PopApi now has a `scraper` instance.
    const res = await PopApi.scraper.scrape()
    console.info(res[0])
  } catch (err) {
    console.error(err)
  }
})()
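
To make the strategy pattern mentioned above more concrete, here is a minimal, library-independent sketch. The names `MiniProvider`, `MiniScraper`, `UpperCaseProvider` and `LengthProvider` are hypothetical stand-ins and not part of pop-api-scraper. The sketch is also synchronous for brevity, whereas a real provider would normally return promises.

```javascript
// Hypothetical stand-ins for illustration; not the pop-api-scraper classes.
class MiniProvider {
  constructor ({ name, configs }) {
    this.name = name
    this.configs = configs
  }

  // Each strategy decides how a single configuration is scraped.
  scrapeConfig (config) {
    throw new Error('scrapeConfig must be overridden by a provider')
  }

  // Run the strategy once per configuration and collect the results.
  scrapeConfigs () {
    return this.configs.map(config => this.scrapeConfig(config))
  }
}

// Two interchangeable strategies with the same interface.
class UpperCaseProvider extends MiniProvider {
  scrapeConfig (config) {
    return config.text.toUpperCase()
  }
}

class LengthProvider extends MiniProvider {
  scrapeConfig (config) {
    return config.text.length
  }
}

// The context only knows the shared interface, not the concrete strategy.
class MiniScraper {
  constructor () {
    this.providers = []
  }

  use (Provider, opts) {
    this.providers.push(new Provider(opts))
    return this
  }

  scrape () {
    return this.providers.map(provider => ({
      name: provider.name,
      results: provider.scrapeConfigs()
    }))
  }
}

const miniScraper = new MiniScraper()
miniScraper
  .use(UpperCaseProvider, { name: 'upper', configs: [{ text: 'foo' }] })
  .use(LengthProvider, { name: 'length', configs: [{ text: 'foo' }, { text: 'quux' }] })

const miniResults = miniScraper.scrape()
console.info(miniResults)
```

The scraper never inspects which provider it holds, so a new data source can be added by writing another class that overrides `scrapeConfig`.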

License

MIT License

Middleware

Scraper

The PopApiScraper middleware implements a strategy pattern where you can use your own Providers (strategies) for scraping content from the web.

import os from 'os'
import { PopApi } from 'pop-api'
import { PopApiScraper } from 'pop-api-scraper'
import { join } from 'path'

import ExampleProvider from './ExampleProvider'

const providerOpts = {
  name: 'example-provider',  // The name of the provider.
  configs: [{                // The configurations to scrape with.
    key: 'value'             // Put anything you like into the configuration.
  }],
  maxWebRequests: 2          // The maximum concurrent web requests at a time.
}
PopApiScraper.use(ExampleProvider, providerOpts)

// Join paths for the scraper options. The provider name is reused as the
// directory name here; any directory name works.
const tmpDir = join(os.tmpdir(), providerOpts.name)
const statusPath = join(tmpDir, 'status.json')
const updatedPath = join(tmpDir, 'updated.json')

const scraperOpts = {
  statusPath,  // The path to the status file where the scraper status is
               // saved.
  updatedPath  // The path to the updated file where the time of the scraping
               // process is saved.
}
PopApi.use(PopApiScraper, scraperOpts)

// Start the scraping process by calling the `scrape` method.
PopApi.scraper.scrape()
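
The `maxWebRequests` option above caps how many web requests run at the same time. The sketch below shows one way such a cap can work in general. `mapWithLimit` and `fakeRequest` are hypothetical helpers written for this illustration, and pop-api-scraper may implement its limit differently.

```javascript
// Hypothetical helper; not taken from pop-api-scraper.
// Runs `worker` over `items` with at most `limit` calls in flight at once.
function mapWithLimit (items, limit, worker) {
  const results = new Array(items.length)
  let next = 0

  // Each runner claims the next unprocessed index until none are left.
  const run = async () => {
    while (next < items.length) {
      const index = next++
      results[index] = await worker(items[index], index)
    }
  }

  const runners = Array.from({ length: Math.min(limit, items.length) }, run)
  return Promise.all(runners).then(() => results)
}

// Simulated request that records how many calls overlap.
let active = 0
let peak = 0
const fakeRequest = async config => {
  active++
  peak = Math.max(peak, active)
  await new Promise(resolve => setTimeout(resolve, 5))
  active--
  return config.id * 2
}

const limitedConfigs = [{ id: 1 }, { id: 2 }, { id: 3 }, { id: 4 }, { id: 5 }]
const pending = mapWithLimit(limitedConfigs, 2, fakeRequest)
pending.then(output => console.info(output, 'peak concurrency:', peak))
```

Results keep the order of the input configurations even though requests finish at different times, because each runner writes to the index it claimed.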

Cron

The Cron middleware allows for the scraping process to be started regularly.

import { PopApi } from 'pop-api'
import { Cron } from 'pop-api-scraper'

const cronOpts = {
  cronTime: '0 0 */6 * * *',  // The cron time for the cronjob.
  start: false                // Start the cron job on creation.
}
PopApi.use(Cron, cronOpts)

// PopApi.cron will be an instance of: https://github.com/merencia/node-cron
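
The `cronTime` value above has six fields. Assuming the six-field layout in which the first field is seconds, the sketch below labels each field and expands the `*/6` step in the hour field. `describeCronTime` and `expandStep` are hypothetical helpers for illustration only and are not part of pop-api-scraper or the cron library.

```javascript
// Assumed field order: seconds first, then the five classic cron fields.
const FIELD_NAMES = ['second', 'minute', 'hour', 'dayOfMonth', 'month', 'dayOfWeek']

// Split a six-field cron expression into named fields.
function describeCronTime (cronTime) {
  const fields = cronTime.trim().split(/\s+/)
  if (fields.length !== FIELD_NAMES.length) {
    throw new Error(`Expected ${FIELD_NAMES.length} fields, got ${fields.length}`)
  }
  return FIELD_NAMES.reduce((named, name, i) => {
    named[name] = fields[i]
    return named
  }, {})
}

// Expand a `*/n` step field into the values it matches within [min, max].
// Returns null for any other kind of field.
function expandStep (field, min, max) {
  const match = /^\*\/(\d+)$/.exec(field)
  if (!match) return null
  const step = Number(match[1])
  if (step < 1) throw new Error('Step must be at least 1')
  const values = []
  for (let value = min; value <= max; value += step) values.push(value)
  return values
}

const cronParts = describeCronTime('0 0 */6 * * *')
const cronHours = expandStep(cronParts.hour, 0, 23)
console.info(cronParts, cronHours)
```

Under that assumption, the expression fires at second 0 and minute 0 of hours 0, 6, 12 and 18, which is every six hours.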

0.1.0 (2017-12-27)

Features

  • initial-release: Initial release for npm registry (e8deafd)

Contributing

So you're interested in giving us a hand? That's awesome! We've put together some brief guidelines that should help you get started quickly and easily.

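There are lots of ways to get involved. This document covers raising issues, feature requests, pull requests, commit messages, styleguides, and setting up for development.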

Raising Issues

If you're about to raise an issue because you think that you've found a problem with the application, or you'd like to make a request for a new feature in the codebase, or any other reason… please read this first.

The GitHub issue tracker is the preferred channel for bug reports, feature requests, and pull requests, but please respect the following restrictions:

  • Please do not use the issue tracker for personal support requests.
  • Please do not derail or troll issues. Keep the discussion on topic and respect the opinions of others.

Report A Bug

A bug is a demonstrable problem that is caused by the code in the repository. Good bug reports are extremely helpful - thank you!

Guidelines for bug reports:

  1. Use the GitHub issue search — check if the issue has already been reported.
  2. Check if the issue has been fixed — try to reproduce it using the latest master or look for closed issues.
  3. Include a screencast if relevant - Is your issue about a design or front end feature or bug? The most helpful thing in the world is if we can see what you're talking about. Just drop the picture after writing your issue, it'll be uploaded and shown to the developers.
  4. Use the Issue tab on GitHub to start creating a bug report. A good bug report shouldn't leave others needing to chase you up for more information. Be sure to include all the possible required details and the steps to take to reproduce the issue.

Feature Requests

Feature requests are welcome. Before you submit one be sure to:

  1. Use the GitHub Issues search and check the feature hasn't already been requested.
  2. Take a moment to think about whether your idea fits with the scope and aims of the project, or if it might better fit being an app/plugin.
  3. Remember, it's up to you to make a strong case to convince the project's leaders of the merits of this feature. Please provide as much detail and context as possible; this means explaining the use case and why it is likely to be common.
  4. Clearly indicate whether this is a feature request for the application itself, or for packages like Providers, Metadatas, or other.

Pull Requests

Pull requests are awesome. If you're looking to raise a PR for something which doesn't have an open issue, please think carefully about raising an issue which your PR can close, especially if you're fixing a bug. This makes it more likely that there will be enough information available for your PR to be properly tested and merged. To make sure your PR is accepted as quickly as possible, you should be sure to have read the guidelines on commit messages and styleguides below.

Commit Messages

This project uses the Conventional Commits convention. If you are not familiar with this convention please read about it first before creating a commit message or a PR.
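
As a rough illustration of the header shape that Conventional Commits describes (`type(scope): description`, with an optional scope and an optional `!` for breaking changes), the sketch below parses a commit header. `parseCommitHeader` and the sample messages are hypothetical; the project's own tooling may check commits differently.

```javascript
// Hypothetical checker for the first line of a commit message.
// Shape assumed: type, optional (scope), optional !, then ": description".
const HEADER_PATTERN = /^([a-z]+)(?:\(([^)]+)\))?(!)?: (.+)$/

function parseCommitHeader (header) {
  const match = HEADER_PATTERN.exec(header)
  if (!match) return null
  const [, type, scope, breaking, description] = match
  return {
    type,
    scope: scope || null,
    breaking: Boolean(breaking),
    description
  }
}

// A header that follows the convention.
const parsedHeader = parseCommitHeader('fix(provider): retry failed web requests')
// A header that does not follow the convention.
const rejectedHeader = parseCommitHeader('Fixed some stuff')
console.info(parsedHeader, rejectedHeader)
```

The sketch only looks at the header line; the convention also covers the commit body and footers, which are not checked here.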

Styleguides

JavaScript Styleguide

All JavaScript must adhere to JavaScript Standard Style.

  • Inline exports with expressions whenever possible

    // Use this:
    export default class ClassName {
    
    }
    
    // Instead of:
    class ClassName {
    
    }
    export default ClassName
    

Tests Styleguide

  • Include thoughtfully-worded, well-structured Mocha tests in the ./test folder.
  • Treat describe as a noun or situation.
  • Treat it as a statement about state or how an operation changes state.

Documentation Styleguide

  • Use Markdown.
  • Reference methods and classes in markdown with the custom {} notation:
    • Reference classes with {ClassName}
    • Reference instance methods with {ClassName.methodName}
    • Reference class methods with {ClassName#methodName}

Setting up for development

To setup your local machine to start working on the project you can follow these steps:

  1. Install Node.js (v7.10.1 or greater)
  2. Clone the repository with: git clone https://github.com/popcorn-official/pop-api-scraper.git
  3. Install dependencies with: npm i
  4. Install the flow-typed libraries with npm run flow-typed

npm scripts

The following npm-scripts are available in order to help you with the development of the project.

 $ npm run build    # Transform the code with 'babel'
 $ npm run docs     # Generate the documentation with 'esdoc'
 $ npm run debug    # Run the application in debug mode
 $ npm run dev      # Run the application in development mode
 $ npm run flow     # Check flow typings
 $ npm run lint     # Check javascript style
 $ npm run test     # Run unit tests

Git hooks

The following git hooks are available to ensure the changes you are about to make follow the styleguides and make sure your changes pass the tests.

pre-commit          # npm run lint && npm run flow
pre-push            # npm run test

Contributor Covenant Code of Conduct

Our Pledge

In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation.

Our Standards

Examples of behavior that contributes to creating a positive environment include:

  • Using welcoming and inclusive language
  • Being respectful of differing viewpoints and experiences
  • Gracefully accepting constructive criticism
  • Focusing on what is best for the community
  • Showing empathy towards other community members

Examples of unacceptable behavior by participants include:

  • The use of sexualized language or imagery and unwelcome sexual attention or advances
  • Trolling, insulting/derogatory comments, and personal or political attacks
  • Public or private harassment
  • Publishing others' private information, such as a physical or electronic address, without explicit permission
  • Other conduct which could reasonably be considered inappropriate in a professional setting

Our Responsibilities

Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior.

Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful.

Scope

This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers.

Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team at hello@popcorntime.sh. The project team will review and investigate all complaints, and will respond in a way that it deems appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately.

Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership.

Attribution

This Code of Conduct is adapted from the Contributor Covenant, version 1.4, available at http://contributor-covenant.org/version/1/4