Manual Reference Source Test

Middleware

Scraper

The PopApiScraper middleware implements a strategy pattern where you can use your own Providers (strageties) for scraping content from the web.

import os from 'os'
import { PopApi } from 'pop-api'
import { PopApiScraper } from 'pop-api-scraper'
import { join } from 'path'

import ExampleProvider  from './ExampleProvider'

const providerOpts = {
  name: 'example-provider',  // The name of the provider.
  configs: [{                // The configurations to scrape with.
    key: 'value'             // Put anything you like into the configuration.
  }],
  maxWebRequests: 2          // The maximum concurrent web requests at a time.
}
PopApiScraper.use(ExampleProvider, providerOpts)

// Join paths for the scraper options.
const tmpDir = join(...[os.tmpdir(), name])
const statusPath = join(...[tmpDir, 'status.json'])
const updatedPath = join(...[tmpDir, 'updated.json'])

const scraperOpts = {
  statusPath,  // The path to the status file where the scraper status is
               // saved.
  updatedPath  // The path to the updated file where the time of the scraping
               // process is saved.
}
PopApi.use(PopApiScraper, scraperOpts)

// Start the scraping process by calling the `scrape` method.
PopApi.scraper.scrape()

Cron

The Cron middleware allows for the scraping process to be started regularly.

import { PopApi } from 'pop-api'
import { Cron } from 'pop-api-scraper'

const cronOpts = {
  cronTime: '0 0 */6 * * *',  // The ctron time for the cronjob.
  start: false                // Start the cron job on creation.
}
PopApi.use(Cron, cronOpts)

// PopApi.cron will be an instance of: https://github.com/merencia/node-cron