pop-api-scraper
Features
The pop-api-scraper project aims to provide the core modules for the popcorn-api scraper, but can also be used for other purposes by using middleware.
- Strategy pattern with providers
- Cronjobs
- Scraper wrapper class
- HttpService with got
Installation
$ npm install --save pop-api-scraper pop-api
Documentation
Usage
For the basic setup you need to create a Provider (strategy) that the PopApiScraper instance can use. The PopApiScraper implements the strategy pattern, where the providers are the strategies.
The example below makes an HTTP GET request to a web service or website. From there on you are free to implement how and what data you want to get from it.
// ./ExampleProvider.js
import { AbstractProvider, HttpService } from 'pop-api-scraper'
// Extend from the internal AbstractProvider.
export default class ExampleProvider extends AbstractProvider {
  constructor(PopApiScraper, {name, configs, maxWebRequests = 2}) {
    super(PopApiScraper, {name, configs, maxWebRequests})
  }
  // Override the `scrapeConfig` method to get the content from one
  // configuration.
  scrapeConfig(config) {
    // An HTTP service to send HTTP requests.
    this.httpService = new HttpService({
      baseUrl: config.baseUrl
    })
    // HTTP GET request to: https://jsonplaceholder.typicode.com/posts?foo=bar
    return this.httpService.get('/posts', config.httpOptions)
      .then(res => res.data)
  }
}
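To make the strategy pattern concrete, here is a minimal, library-free sketch of how a scraper can dispatch to interchangeable providers. The names SketchProvider, UppercaseProvider and SketchScraper are hypothetical and exist only for illustration. This is not how pop-api-scraper is implemented internally, and no network request is made.

```javascript
// Hypothetical stand-ins for illustration only; these are NOT the classes
// exported by pop-api-scraper.
class SketchProvider {
  constructor ({ name, configs }) {
    this.name = name
    this.configs = configs
  }

  // Each concrete strategy is expected to override this method.
  scrapeConfig (config) {
    return Promise.reject(new Error('scrapeConfig is not implemented'))
  }

  // Scrape every configuration of this provider, one after another.
  async scrapeConfigs () {
    const results = []
    for (const config of this.configs) {
      results.push(await this.scrapeConfig(config))
    }
    return results
  }
}

class UppercaseProvider extends SketchProvider {
  // The "scraped" content is derived from the config instead of the web.
  scrapeConfig (config) {
    return Promise.resolve(config.text.toUpperCase())
  }
}

class SketchScraper {
  constructor () {
    this.providers = []
  }

  // Register a strategy together with its options.
  use (Provider, opts) {
    this.providers.push(new Provider(opts))
    return this
  }

  // The scraper only relies on the interface shared by all providers.
  async scrape () {
    const results = []
    for (const provider of this.providers) {
      results.push({
        name: provider.name,
        data: await provider.scrapeConfigs()
      })
    }
    return results
  }
}

const scraper = new SketchScraper().use(UppercaseProvider, {
  name: 'uppercase-provider',
  configs: [{ text: 'foo' }, { text: 'bar' }]
})

const scraped = scraper.scrape()
scraped.then(res => console.info(JSON.stringify(res)))
```

Swapping in another provider only requires a class with its own scrapeConfig method; the scraper itself does not change.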
Bundle it all up together with pop-api:
// ./index.js
import os from 'os'
import { PopApi } from 'pop-api'
import { join } from 'path'
import { Cron, PopApiScraper } from 'pop-api-scraper'
import ExampleProvider from './ExampleProvider'
(async () => {
  try {
    // Let the PopApiScraper use the ExampleProvider to scrape data.
    PopApiScraper.use(ExampleProvider, {
      name: 'example-provider',
      configs: [{
        baseUrl: 'https://jsonplaceholder.typicode.com',
        httpOptions: {
          query: {
            foo: 'bar'
          }
        }
      }],
      maxWebRequests: 2
    })
    // Register the PopApiScraper middleware to the pop-api instance.
    PopApi.use(PopApiScraper, {
      statusPath: join(os.tmpdir(), 'status.json'),
      updatedPath: join(os.tmpdir(), 'updated.json')
    })
    // Optionally you can use the Cron middleware to scrape for content on a
    // regular basis.
    PopApi.use(Cron, {
      cronTime: '0 0 */6 * * *',
      start: false
    })
    // PopApi now has a `scraper` instance.
    const res = await PopApi.scraper.scrape()
    console.info(res[0])
  } catch (err) {
    console.error(err)
  }
})()
License
MIT License
Middleware
Scraper
The PopApiScraper middleware implements a strategy pattern where you can use your own Providers (strategies) for scraping content from the web.
import os from 'os'
import { PopApi } from 'pop-api'
import { PopApiScraper } from 'pop-api-scraper'
import { join } from 'path'
import ExampleProvider from './ExampleProvider'
const providerOpts = {
  name: 'example-provider', // The name of the provider.
  configs: [{ // The configurations to scrape with.
    key: 'value' // Put anything you like into the configuration.
  }],
  maxWebRequests: 2 // The maximum number of concurrent web requests.
}
PopApiScraper.use(ExampleProvider, providerOpts)
// Join paths for the scraper options, using the provider name as the
// directory name.
const tmpDir = join(os.tmpdir(), providerOpts.name)
const statusPath = join(tmpDir, 'status.json')
const updatedPath = join(tmpDir, 'updated.json')
const scraperOpts = {
  statusPath, // The path to the status file where the scraper status is
              // saved.
  updatedPath // The path to the updated file where the time of the scraping
              // process is saved.
}
PopApi.use(PopApiScraper, scraperOpts)
// Start the scraping process by calling the `scrape` method.
PopApi.scraper.scrape()
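The maxWebRequests option caps how many web requests run at the same time. The sketch below shows one common way to enforce such a cap with a small worker pool. mapWithLimit and fakeRequest are hypothetical names, the requests are simulated with timers, and the library's own implementation may differ.

```javascript
// Hypothetical helper: run `task` for every item with at most `limit`
// invocations in flight at any time. Results keep the order of `items`.
async function mapWithLimit (items, limit, task) {
  const results = new Array(items.length)
  let next = 0

  // Each worker claims the next unprocessed index until none are left.
  async function worker () {
    while (next < items.length) {
      const index = next++
      results[index] = await task(items[index], index)
    }
  }

  const workerCount = Math.min(limit, items.length)
  const workers = []
  for (let i = 0; i < workerCount; i++) {
    workers.push(worker())
  }
  await Promise.all(workers)
  return results
}

// Simulated "web requests" that record how many run at the same time.
let active = 0
let peak = 0
const fakeRequest = async config => {
  active++
  peak = Math.max(peak, active)
  await new Promise(resolve => setTimeout(resolve, 10))
  active--
  return `scraped ${config.key}`
}

const sampleConfigs = [
  { key: 'a' }, { key: 'b' }, { key: 'c' }, { key: 'd' }, { key: 'e' }
]
const limited = mapWithLimit(sampleConfigs, 2, fakeRequest)
limited.then(res => console.info(res, 'peak concurrency:', peak))
```

With a limit of 2, five configurations are processed in overlapping pairs rather than all at once, which keeps the load on the remote service bounded.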
Cron
The Cron middleware allows for the scraping process to be started regularly.
import { PopApi } from 'pop-api'
import { Cron } from 'pop-api-scraper'
const cronOpts = {
  cronTime: '0 0 */6 * * *', // The cron time for the cronjob.
  start: false // Start the cron job on creation.
}
PopApi.use(Cron, cronOpts)
// PopApi.cron will be an instance of: https://github.com/merencia/node-cron
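The cronTime above uses six fields. Assuming the format in which an optional leading seconds field precedes the usual five cron fields (which node-cron accepts), the hypothetical helper below labels each field so the schedule is easier to read. It only splits the expression and does not validate the field values.

```javascript
// Field names for a six-field cron expression with seconds first.
const CRON_FIELDS = [
  'second', 'minute', 'hour', 'dayOfMonth', 'month', 'dayOfWeek'
]

// Hypothetical helper: label each field of a six-field cron expression.
function describeCronTime (cronTime) {
  const parts = cronTime.trim().split(/\s+/)
  if (parts.length !== CRON_FIELDS.length) {
    throw new Error(
      `Expected ${CRON_FIELDS.length} fields, got ${parts.length}`
    )
  }
  const fields = {}
  CRON_FIELDS.forEach((name, i) => {
    fields[name] = parts[i]
  })
  return fields
}

const cronFields = describeCronTime('0 0 */6 * * *')
console.info(cronFields)
```

Read this way, '0 0 */6 * * *' means second 0 of minute 0 of every sixth hour, which works out to four runs per day.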
Contributing
So you're interested in giving us a hand? That's awesome! We've put together some brief guidelines that should help you get started quickly and easily.
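There are lots of ways to get involved. This document covers raising issues, feature requests, pull requests, commit messages, styleguides, and setting up for development.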
Raising Issues
If you're about to raise an issue because you think that you've found a problem with the application, or you'd like to make a request for a new feature in the codebase, or any other reason… please read this first.
The GitHub issue tracker is the preferred channel for bug reports, feature requests, and pull requests, but please respect the following restrictions:
- Please do not use the issue tracker for personal support requests.
- Please do not derail or troll issues. Keep the discussion on topic and respect the opinions of others.
Report A Bug
A bug is a demonstrable problem that is caused by the code in the repository. Good bug reports are extremely helpful - thank you!
Guidelines for bug reports:
- Use the GitHub issue search — check if the issue has already been reported.
- Check if the issue has been fixed: try to reproduce it using the latest master or look for closed issues.
- Include a screencast if relevant: Is your issue about a design or front end feature or bug? The most helpful thing in the world is if we can see what you're talking about. Just drop the picture after writing your issue, it'll be uploaded and shown to the developers.
- Use the Issue tab on GitHub to start creating a bug report. A good bug report shouldn't leave others needing to chase you up for more information. Be sure to include all the possible required details and the steps to take to reproduce the issue.
Feature Requests
Feature requests are welcome. Before you submit one be sure to:
- Use the GitHub Issues search and check the feature hasn't already been requested.
- Take a moment to think about whether your idea fits with the scope and aims of the project, or if it might better fit being an app/plugin.
- Remember, it's up to you to make a strong case to convince the project's leaders of the merits of this feature. Please provide as much detail and context as possible. This means explaining the use case and why it is likely to be common.
- Clearly indicate whether this is a feature request for the application itself, or for packages like Providers, Metadatas, or other.
Pull Requests
Pull requests are awesome. If you're looking to raise a PR for something which doesn't have an open issue, please think carefully about raising an issue which your PR can close, especially if you're fixing a bug. This makes it more likely that there will be enough information available for your PR to be properly tested and merged. To make sure your PR is accepted as quickly as possible, you should be sure to have read all the guidelines below.
Commit Messages
This project uses the Conventional Commits convention. If you are not familiar with this convention, please read about it before creating a commit message or a PR.
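As a rough illustration of the header shape that Conventional Commits describes, `<type>[optional scope][!]: <description>`, the sketch below parses a single header line. parseCommitHeader is a hypothetical helper and deliberately simplified (lowercase types only, no body or footer handling). It is not a tool used by this project, and the sample messages are made up.

```javascript
// Simplified pattern for `<type>[optional scope][!]: <description>`.
const HEADER_PATTERN = /^([a-z]+)(?:\(([^()\s]+)\))?(!)?: (\S.*)$/

// Hypothetical helper: return the parts of a commit header, or null when the
// header does not match the simplified pattern above.
function parseCommitHeader (header) {
  const match = HEADER_PATTERN.exec(header)
  if (!match) return null
  const [, type, scope, breaking, description] = match
  return {
    type,
    scope: scope || null,
    breaking: breaking === '!',
    description
  }
}

console.info(parseCommitHeader('feat(provider): add retry option'))
console.info(parseCommitHeader('fixed stuff'))
```

A header such as 'fixed stuff' yields null here because it lacks the type prefix and the colon separator.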
Styleguides
JavaScript Styleguide
All JavaScript must adhere to JavaScript Standard Style.
Inline exports with expressions whenever possible:
// Use this:
export default class ClassName {
}
// Instead of:
class ClassName {
}
export default ClassName
Tests Styleguide
- Include thoughtfully-worded, well-structured Mocha tests in the ./test folder.
- Treat describe as a noun or situation.
- Treat it as a statement about state or how an operation changes state.
Documentation Styleguide
- Use Markdown.
- Reference methods and classes in markdown with the custom {} notation:
  - Reference classes with {ClassName}
  - Reference instance methods with {ClassName.methodName}
  - Reference class methods with {ClassName#methodName}
Setting up for development
To set up your local machine to start working on the project, follow these steps:
- Install NodeJS (v7.10.1 or greater)
- Clone the repository with: git clone https://github.com/popcorn-official/pop-api-scraper.git
- Install dependencies with: npm i
- Install the flow-typed libraries with: npm run flow-typed
npm scripts
The following npm-scripts are available in order to help you with the development of the project.
$ npm run build # Transform the code with 'babel'
$ npm run docs # Generate the documentation with 'esdoc'
$ npm run debug # Run the application in debug mode
$ npm run dev # Run the application in development mode
$ npm run flow # Check flow typings
$ npm run lint # Check javascript style
$ npm run test # Run unit tests
Git hooks
The following git hooks are available to ensure the changes you are about to make follow the styleguides and make sure your changes pass the tests.
pre-commit # npm run lint && npm run flow
pre-push # npm run test
Contributor Covenant Code of Conduct
Our Pledge
In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation.
Our Standards
Examples of behavior that contributes to creating a positive environment include:
- Using welcoming and inclusive language
- Being respectful of differing viewpoints and experiences
- Gracefully accepting constructive criticism
- Focusing on what is best for the community
- Showing empathy towards other community members
Examples of unacceptable behavior by participants include:
- The use of sexualized language or imagery and unwelcome sexual attention or advances
- Trolling, insulting/derogatory comments, and personal or political attacks
- Public or private harassment
- Publishing others' private information, such as a physical or electronic address, without explicit permission
- Other conduct which could reasonably be considered inappropriate in a professional setting
Our Responsibilities
Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior.
Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful.
Scope
This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers.
Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team at hello@popcorntime.sh. The project team will review and investigate all complaints, and will respond in a way that it deems appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately.
Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership.
Attribution
This Code of Conduct is adapted from the Contributor Covenant, version 1.4, available at http://contributor-covenant.org/version/1/4