Tutorial
In this tutorial section you can read about the scraping process of the API.
Scraper
The `Scraper` class is the entry point of the scraping process, which is started with its `scrape` method. This method iterates through an array of methods, each of which scrapes an individual content provider.
```javascript
// Runs every content-provider scraper one after another.
scrape() {
  Scraper._util.setLastUpdated();
  asyncq.eachSeries([
    this._scrapeExtratorrentShows,
    this._scrapeEZTVShows,
    this._scrapeKATShows,
    this._scrapeExtratorrentMovies,
    this._scrapeKATMovies,
    this._scrapeYTSMovies,
    this._scrapeExtratorrentAnime,
    this._scrapeKATAnime,
    this._scrapeHorribelSubsAnime,
    this._scrapeNyaaAnime
  ], scraper => scraper.call(this)) // keep `this` bound to the Scraper instance
    .then(() => Scraper._util.setStatus())
    .catch(err => Scraper._util.onError(`Error while scraping: ${err}`));
}
```
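`asyncq.eachSeries` starts each scraper only after the previous one has settled, so the providers are scraped one at a time. For readers unfamiliar with asyncq, here is a minimal sketch of the same control flow with plain `async`/`await`. `runInSeries` and the stub scrapers are hypothetical, and stopping at the first rejection is an assumption that matches the single `.catch` in `scrape()`.

```javascript
// Hypothetical helper: run promise-returning tasks strictly one after another.
// A rejection ends the loop and is passed on to the caller's .catch().
async function runInSeries(tasks) {
  const results = [];
  for (const task of tasks) {
    // The next task is only started once this one has resolved.
    results.push(await task());
  }
  return results;
}

// Stub scrapers that finish after `ms` milliseconds and record their name.
const finished = [];
const stubScraper = (name, ms) => () =>
  new Promise(resolve => setTimeout(() => {
    finished.push(name);
    resolve(name);
  }, ms));

// The first stub is the slowest: run in parallel it would finish last,
// run in series it still finishes first.
runInSeries([stubScraper("EZTV", 30), stubScraper("YTS", 10), stubScraper("Nyaa", 20)])
  .then(() => console.log(finished.join(" -> "))) // EZTV -> YTS -> Nyaa
  .catch(err => console.error(`Error while scraping: ${err}`));
```

Running the tasks in series keeps the load on each provider site low and makes the log output easy to follow, at the cost of a longer total run.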
Content Providers
Popcorn API gets its torrent content from various sources. Here you can see where the content is coming from.
Provider | Anime | Movie | Show |
---|---|---|---|
ExtraTorrent | X [1] | X | X |
EZTV | | | X |
HorribleSubs | X | | |
KAT [2] | X | X | X |
Nyaa | X | | |
YTS | | X | |
[1] Anime can be scraped from ExtraTorrent, but this is currently not done because scraping anime torrents from ExtraTorrent is very ineffective, due to a lack of good ExtraTorrent providers.
[2] The main KAT website is currently down. It was used for movie and TV show scraping, but was taken down around the time the anime provider was being developed. If KAT ever comes back in the state it was in before, it can be used again. In that case, either the `baseUrl` of kat-api-pt needs to be changed, or the `options` passed to the constructor need to override the default `baseUrl`.
ExtraTorrent
Content from extratorrent.cc is grabbed with so-called 'ExtraTorrent providers', which are defined in the `extratorrentAnimeProviders`, `extratorrentMovieProviders` and `extratorrentShowProviders` arrays. The extratorrent-api module converts each ExtraTorrent provider into a search query to extratorrent.cc.
Each provider needs a `name` property and a `query` property. The `name` property is a `String` used for logging, so that issues with the provider are easier to track down. The `query` property is an `Object` that can contain various properties, which are converted into a search query to extratorrent.cc.
The following `query` properties can be used:
- page # Number of the page you want to search
- with_words # With all of the words **REQUIRED!**
- extact # With the exact phrase
- without # Without the words
- category # See categories
- added # Only torrents added in the last day (1), 3 days (3) or week (7)
- seeds_from # Seeds more than the number given
- seeds_to # Seeds less than the number given
- leechers_from # Leechers more than the number given
- leechers_to # Leechers less than the number given
- size_from # Torrent size more than the number given
- size_to # Torrent size less than the number given
- size_type # b for byte, kb for kilobyte, etc.
All three types of content can be scraped from extratorrent.cc through the `ExtraTorrent` class in each of the provider folders. By default, the `ExtraTorrent` class adds a few properties to the providers. The `page` property does not need to be specified, since the algorithm for scraping extratorrent.cc goes through all the available pages (a maximum of 200 pages/10,000 torrents due to site limitations). The `category` property also gets a default value matching the type of content.
An example of an ExtraTorrent provider:
```javascript
{
  name: "ETTV LOL",
  query: {
    with_words: "ettv hdtv x264 lol",
    without: "720p 1080p"
  }
}
```
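Because `with_words` is required and the `name`/`query` shape is fixed, a provider can be checked before it is handed to extratorrent-api. A minimal sketch, assuming a hypothetical `prepareProvider` helper: it validates the provider, drops `page` (the scraper walks all pages itself) and merges in defaults. The `"tv"` category is an illustrative value, not necessarily what the `ExtraTorrent` class uses, and letting the provider's own properties win over the defaults is an assumption.

```javascript
// Hypothetical helper: validate an ExtraTorrent provider and merge in defaults.
// Properties set in the provider's own query take precedence over `defaults`.
function prepareProvider(provider, defaults = {}) {
  if (typeof provider.name !== "string" || provider.name.length === 0) {
    throw new TypeError("Provider needs a non-empty String `name`");
  }
  const query = provider.query;
  if (query === null || typeof query !== "object") {
    throw new TypeError(`${provider.name}: \`query\` must be an Object`);
  }
  if (!query.with_words) {
    throw new TypeError(`${provider.name}: \`query.with_words\` is required`);
  }
  // `page` is dropped: the scraper goes through all pages by itself.
  const { page, ...rest } = query;
  return { name: provider.name, query: { ...defaults, ...rest } };
}

const ettv = prepareProvider(
  { name: "ETTV LOL", query: { with_words: "ettv hdtv x264 lol", without: "720p 1080p" } },
  { category: "tv" } // illustrative default
);
console.log(JSON.stringify(ettv.query));
// {"category":"tv","with_words":"ettv hdtv x264 lol","without":"720p 1080p"}
```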
If you want to create a provider for extratorrent.cc, it is highly recommended that you first try the search manually in the browser on extratorrent.cc. This is because the torrent titles are matched against regular expressions by the extractors to 'extract' information about each torrent, which is then used to find metadata.
EZTV
Content from eztv.ag is grabbed through the eztv-api-pt module. The module contains two methods, `getAllShows` and `getShowData`.
getAllShows
This method returns a list of all the shows available on eztv.ag. Using regular expressions, it grabs the show title, the id used by eztv.ag and the slug.
```javascript
[{
  show: "10 O'Clock Live",
  id: "449",
  slug: "10-o-clock-live"
}, {
  show: "10 Things I Hate About You",
  id: "308",
  slug: "10-things-i-hate-about-you"
},
...
]
```
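A minimal sketch of this kind of regex-based extraction. It assumes the show list contains links of the form `<a href="/shows/449/10-o-clock-live/" class="thread_link">10 O'Clock Live</a>`; that markup and the `parseShowList` name are assumptions, not taken from eztv-api-pt, and HTML entities in titles are not decoded.

```javascript
// Hypothetical parser: pull { show, id, slug } out of show-list HTML.
// Assumed link format: <a href="/shows/<id>/<slug>/" ...>Title</a>
function parseShowList(html) {
  const linkPattern = /<a href="\/shows\/(\d+)\/([^\/"]+)\/"[^>]*>([^<]+)<\/a>/g;
  const shows = [];
  let match;
  // exec() with the g flag resumes from lastIndex, so the loop visits every link.
  while ((match = linkPattern.exec(html)) !== null) {
    shows.push({ show: match[3].trim(), id: match[1], slug: match[2] });
  }
  return shows;
}

const sampleHtml = `
  <a href="/shows/449/10-o-clock-live/" class="thread_link">10 O'Clock Live</a>
  <a href="/shows/308/10-things-i-hate-about-you/" class="thread_link">10 Things I Hate About You</a>`;

console.log(parseShowList(sampleHtml).map(s => s.slug).join(", "));
// 10-o-clock-live, 10-things-i-hate-about-you
```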
getShowData
Each show returned by `getAllShows` can be passed to the `getShowData` method to get more data on that show. In this process the slug can change to another slug or to an IMDb id that is compatible with Trakt.tv. The torrents are added to the `episodes` property, which is compatible with the `Helper` class that inserts the torrents into the MongoDB database. The `episodes` property is keyed by season number, each season by episode number, and each episode by the available torrent qualities.
```javascript
{
  show: "10 O'Clock Live",
  id: "449",
  slug: "tt1811399",
  dateBased: false,
  episodes: {
    "1": {
      "1": {
        "480p": {
          url: "magnet:?xt=urn:btih:LMJXHHNOW33Z3YGXJLCTJZ23WK2D6VO4&dn=10.OClock.Live.S01E01.WS.PDTV.XviD-PVR&tr=udp://tracker.openbittorrent.com:80&tr=udp://open.demonii.com:80&tr=udp://tracker.coppersurfer.tk:80&tr=udp://tracker.leechers-paradise.org:6969&tr=udp://exodus.desync.com:6969",
          seeds: 0,
          peers: 0,
          provider: "EZTV"
        }
      },
      ...
    }
  }
}
```
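The season → episode → quality nesting can be built up one torrent at a time. A minimal sketch with a hypothetical `addTorrent` helper (not part of eztv-api-pt); the magnet link is shortened from the example above, and letting a later torrent overwrite an earlier one for the same slot is an illustrative choice.

```javascript
// Hypothetical helper: put one torrent into the nested structure
// episodes[season][episode][quality] = { url, seeds, peers, provider }.
function addTorrent(episodes, { season, episode, quality, url, seeds, peers, provider }) {
  const s = String(season);
  const e = String(episode);
  // Create the season and episode levels the first time they are needed.
  if (!episodes[s]) episodes[s] = {};
  if (!episodes[s][e]) episodes[s][e] = {};
  // A later torrent for the same season/episode/quality replaces the earlier one.
  episodes[s][e][quality] = { url, seeds, peers, provider };
  return episodes;
}

const episodes = {};
addTorrent(episodes, {
  season: 1,
  episode: 1,
  quality: "480p",
  url: "magnet:?xt=urn:btih:LMJXHHNOW33Z3YGXJLCTJZ23WK2D6VO4", // shortened
  seeds: 0,
  peers: 0,
  provider: "EZTV"
});
console.log(JSON.stringify(episodes));
// {"1":{"1":{"480p":{"url":"magnet:?xt=urn:btih:LMJXHHNOW33Z3YGXJLCTJZ23WK2D6VO4","seeds":0,"peers":0,"provider":"EZTV"}}}}
```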
HorribleSubs
Content from horriblesubs.info is grabbed through the horriblesubs-api module. The module contains two methods, `getAllAnime` and `getAnimeData`. It is based on the eztv-api-pt module, and its usage within the API is very similar to that of the EZTV provider.
getAllAnime
This method returns a list of all the anime available on horriblesubs.info. Using the cheerio module, it grabs the anime title, the slug and the link to the page with more details about the anime.
```javascript
[{
  link: "/shows/91-days",
  slug: "91-days",
  title: "91 Days"
}, {
  link: "/shows/absolute-duo",
  slug: "absolute-duo",
  title: "Absolute Duo"
}, ...]
```
getAnimeData
Each anime returned by `getAllAnime` can be passed to the `getAnimeData` method to get more data on that anime. In this process the slug can change to another slug that is compatible with Hummingbird.me. The `hs_showid` is added, and the torrents are added to the `episodes` property, which is compatible with the `Helper` class that inserts the torrents into the MongoDB database. The `episodes` property is keyed by season number, each season by episode number, and each episode by the available torrent qualities.
```javascript
{
  link: "/shows/91-days",
  slug: "ninety-one-days",
  title: "91 Days",
  hs_showid: "731",
  episodes: {
    "1": {
      "1": {
        "480": {
          url: "magnet:?xt=urn:btih:AYIJKPLP5WVVF36O25JBB3FFPNJEBBPQ&tr=http://open.nyaatorrents.info:6544/announce&tr=udp://tracker.openbittorrent.com:80/announce&tr=udp://tracker.coppersurfer.tk:6969/announce",
          seeds: 0,
          peers: 0,
          provider: "HorribleSubs"
        }
      },
      ...
    }
  }
}
```
KAT
Content from kat.cr is grabbed with so-called 'KAT providers', which are defined in the `katAnimeProviders`, `katMovieProviders` and `katShowProviders` arrays. The kat-api-pt module converts each KAT provider into a search query to kat.cr.
The following `query` properties can be used:
- query # Search for keywords
- category # The category to search for e.g. tv or movies
- uploader # The name of the uploader of the torrents
- min_seeds # The minimum number of seeds
- age # The age of the torrents
- min_files # The minimum number of files
- imdb # The IMDb id for a tv show (only works with category:tv)
- tvrage # The TVRage id for a tv show (only works with category:tv)
- language # The language of the movie/tv show e.g. en or pl
- adult_filter # Filter out the adult torrents
- verified # Show only the verified torrents
- season # Season number of a tv show (only works with category:tv)
- episode # Episode number of a tv show (only works with category:tv)
- page # The page to search on
- sort_by # Sort by property
- order # Order the list asc or desc
All three types of content can be scraped from kat.cr through the `KAT` class in each of the provider folders. By default, the `KAT` class adds a few properties to the providers. The `page` property does not need to be specified, since the algorithm for scraping kat.cr goes through all the available pages (a maximum of 400 pages/10,000 torrents due to site limitations). The `category` property also gets a default value matching the type of content. The `adult_filter` and `verified` properties are turned on by default to filter out potentially harmful content.
An example of a provider:
```javascript
{
  name: "ZonerLOL",
  query: {
    query: "x264-LOL",
    min_seeds: 3
  }
}
```
If you want to create a provider for kat.cr, it is highly recommended that you first try the search manually in the browser on kat.cr. This is because the torrent titles are matched against regular expressions by the extractors to 'extract' information about each torrent, which is then used to find metadata.
Nyaa
Additional anime content can be scraped from nyaa.se. The method is similar to the ExtraTorrent and KAT methods: it uses the `nyaaAnimeProviders` array, and the nyaa-api-pt module converts each `Object` in the array into a search query to nyaa.se.
Each provider needs a `name` property and a `query` property. The `name` property is a `String` used for logging, so that issues with the provider are easier to track down. The `query` property is an `Object` that can contain various properties, which are converted into a search query to nyaa.se.
The following `query` properties can be used:
- filter # Trusted uploader filter
- category # The category to filter
- sub_category # The sub category to filter
- term # A search term
- user # The id of the uploader
- offset # The page to search on
Only anime is scraped from nyaa.se, because nyaa.se focuses on East Asian content. The `Nyaa` class automatically adds the `category` and `sub_category` properties. The `offset` property does not need to be specified, since the algorithm for scraping nyaa.se goes through all the available pages (a maximum of 100 pages/10,500 torrents due to site limitations).
An example of a provider:
```javascript
{
  name: "Commie",
  query: {
    term: "mkv",
    user: 76430,
    filter: "trusted_only"
  }
}
```
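Converting a provider's `query` into a search URL can be sketched with the standard `URL`/`URLSearchParams` API. This assumes each property maps one-to-one onto a URL parameter and uses the placeholder host `nyaa.example`; `toSearchUrl` is hypothetical, nyaa-api-pt may rename or combine parameters, and the `Nyaa` class would also add `category` and `sub_category`.

```javascript
// Hypothetical helper: turn a provider's query Object into a search URL.
// Assumes each property maps one-to-one onto a URL parameter.
function toSearchUrl(baseUrl, query) {
  const url = new URL(baseUrl);
  for (const [key, value] of Object.entries(query)) {
    // Skip properties that were left unset.
    if (value !== undefined && value !== null) {
      url.searchParams.set(key, String(value));
    }
  }
  return url.toString();
}

const commie = { term: "mkv", user: 76430, filter: "trusted_only" };
// The scraper supplies `offset` per page; 1 is used here for illustration.
console.log(toSearchUrl("https://nyaa.example/", { ...commie, offset: 1 }));
// https://nyaa.example/?term=mkv&user=76430&filter=trusted_only&offset=1
```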
If you want to create a provider for nyaa.se, it is highly recommended that you first try the search manually in the browser on nyaa.se. This is because the torrent titles are matched against regular expressions by the extractor for anime content, and the information the extractor 'extracts' is used by the metadata providers.
YTS
NOTE: This provider will most likely be changed to use a YTS API wrapper module. No promise-based API wrapper exists for YTS yet, so one needs to be made.
Extractors
The extractors get torrents from the content providers and extract content data from those torrents.
Base Extractor
The base extractor retrieves all the torrents from the ExtraTorrent, KAT and Nyaa content providers. It has a method that iterates through all the available pages of a content provider and returns all the torrents it finds. All other extractors extend this class.
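The page walk can be sketched as a loop that requests one page after another until a page comes back empty or the site's page limit is reached. `getAllTorrents` and the stub search function are hypothetical; the real base extractor's method names, its stop condition and how it handles Nyaa's `offset` may differ.

```javascript
// Hypothetical sketch of the base extractor's page walk.
// `searchPage(page)` must resolve to the array of torrents on that page.
async function getAllTorrents(searchPage, maxPages) {
  const torrents = [];
  for (let page = 1; page <= maxPages; page++) {
    const results = await searchPage(page);
    // An empty page means the provider has no more results.
    if (results.length === 0) break;
    torrents.push(...results);
  }
  return torrents;
}

// Stub provider with three pages of two torrents each.
const fakeSearch = async page =>
  page <= 3 ? [{ title: `t${page}a` }, { title: `t${page}b` }] : [];

// 200 is the ExtraTorrent page limit mentioned earlier; KAT would use 400.
getAllTorrents(fakeSearch, 200).then(all => console.log(all.length)); // 6
```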
Anime Extractor
The regular expression for anime needs to get a `title`, `episode` and `quality` property. A `season` property is optional: if the season is not in the torrent title, the extractor assumes the torrent is from season 1. Below you can see the method that gets the needed data for an anime episode. If your content does not match any of these regular expressions, you can add a regular expression to the method.
```javascript
_getAnimeData(torrent) {
  const secondSeasonQuality = /\[.*\].(.*)\W+S(\d)...(\d{2,3})\W+(\d{3,4}p)/i; // [HorribleSubs] Fairy Tail S2 - 70 [1080p].mkv
  const oneSeasonQuality = /\[.*\].(\D+)...(\d{2,3})\W+(\d{3,4}p)/i; // [HorribleSubs] Gangsta - 06 [480p].mkv
  const secondSeason = /\[.*\].(\D+).S(\d+)...(\d{2,3}).*\.mkv/i; // [Commie] The World God Only Knows S2 - 12 [C0A4301E].mkv
  const oneSeason = /\[.*\].(\D+)...(\d{2,3}).*\.mkv/i; // [Commie] Battery - 05 [38EC4270].mkv

  // The patterns are tried in order; the first one that matches is used.
  if (torrent.title.match(secondSeasonQuality)) {
    return this._extractAnime(torrent, secondSeasonQuality);
  } else if (torrent.title.match(oneSeasonQuality)) {
    return this._extractAnime(torrent, oneSeasonQuality);
  } else if (torrent.title.match(secondSeason)) {
    return this._extractAnime(torrent, secondSeason);
  } else if (torrent.title.match(oneSeason)) {
    return this._extractAnime(torrent, oneSeason);
  } else {
    logger.warn(`${this.name}: Could not find data from torrent: '${torrent.title}'`);
  }
};
```
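To see what each pattern captures, the sketch below runs the same four regular expressions in the same order outside the class. `parseAnimeTitle` is hypothetical and the mapping of capture groups to fields is an assumption, since `_extractAnime` is not shown. The two `.mkv`-only patterns capture no quality, so the sketch returns `null` for it.

```javascript
// The four patterns from _getAnimeData, in the same order.
// hasSeason tells which capture groups are present.
const animePatterns = [
  { regex: /\[.*\].(.*)\W+S(\d)...(\d{2,3})\W+(\d{3,4}p)/i, hasSeason: true },
  { regex: /\[.*\].(\D+)...(\d{2,3})\W+(\d{3,4}p)/i, hasSeason: false },
  { regex: /\[.*\].(\D+).S(\d+)...(\d{2,3}).*\.mkv/i, hasSeason: true },
  { regex: /\[.*\].(\D+)...(\d{2,3}).*\.mkv/i, hasSeason: false }
];

// Hypothetical helper: returns { title, season, episode, quality } or null.
function parseAnimeTitle(title) {
  for (const { regex, hasSeason } of animePatterns) {
    const m = title.match(regex);
    if (!m) continue;
    // Groups: title, [season], episode, [quality].
    const [episode, quality] = hasSeason ? m.slice(3) : m.slice(2);
    return {
      title: m[1].trim(),
      season: hasSeason ? parseInt(m[2], 10) : 1, // no season in the title: season 1
      episode: parseInt(episode, 10),
      quality: quality || null // the .mkv-only patterns capture no quality
    };
  }
  return null; // no pattern matched
}

console.log(JSON.stringify(parseAnimeTitle("[HorribleSubs] Fairy Tail S2 - 70 [1080p].mkv")));
// {"title":"Fairy Tail","season":2,"episode":70,"quality":"1080p"}
```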
Movie Extractor
The regular expression for movies needs to get a `title`, `year` and `quality` property. Below you can see the method that gets the needed data for a movie. If your content does not match any of these regular expressions, you can add a regular expression to the method.
```javascript
_getMovieData(torrent, language) {
  // Match the literal tokens "3D" and "4K" (case-insensitive), not single characters.
  const threeDimensions = /(.*).(\d{4}).3D\D+(\d{3,4}p)/i; // Journey to Space 2015 3D 1080p BRRip Half-SBS x264 AAC-ETRG
  const fourKay = /(.*).(\d{4}).4K\D+(\d{3,4}p)/i; // Spider Man 2002 4K REMASTERED Bluray 1080p TrueHD x264-Grym
  const withYear = /(.*).(\d{4})\D+(\d{3,4}p)/; // Batman Begins 2005 720p BluRay x264 AC3 - Ozlem

  if (torrent.title.match(threeDimensions)) {
    return this._extractMovie(torrent, language, threeDimensions);
  } else if (torrent.title.match(fourKay)) {
    return this._extractMovie(torrent, language, fourKay);
  } else if (torrent.title.match(withYear)) {
    return this._extractMovie(torrent, language, withYear);
  } else {
    console.warn(`${this.name}: Could not find data from torrent: '${torrent.title}'`);
  }
};
```
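A standalone sketch that classifies movie titles with patterns modelled on `_getMovieData`, written with the literal tokens `3D` and `4K` (case-insensitive). A single-character class such as `[3Dd]` matches only one character, so a hypothetical title like `Some Movie 2014 Director's Cut 1080p` would be taken for 3D. `parseMovieTitle` and its `variant` field are hypothetical.

```javascript
// Patterns modelled on _getMovieData, tried in the same order. The 3D and 4K
// patterns match the literal tokens "3D" / "4K" (case-insensitive).
const moviePatterns = [
  { variant: "3D", regex: /(.*).(\d{4}).3D\D+(\d{3,4}p)/i },
  { variant: "4K", regex: /(.*).(\d{4}).4K\D+(\d{3,4}p)/i },
  { variant: null, regex: /(.*).(\d{4})\D+(\d{3,4}p)/ }
];

// Hypothetical helper: returns { title, year, quality, variant } or null.
function parseMovieTitle(title) {
  for (const { variant, regex } of moviePatterns) {
    const m = title.match(regex);
    if (m) {
      // Groups: title, year, quality.
      return { title: m[1].trim(), year: parseInt(m[2], 10), quality: m[3], variant };
    }
  }
  return null;
}

// A single-character class also accepts the "D" of "Director's":
console.log(/(.*).(\d{4}).[3Dd]\D+(\d{3,4}p)/.test("Some Movie 2014 Director's Cut 1080p")); // true
console.log(parseMovieTitle("Some Movie 2014 Director's Cut 1080p").variant); // null
```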
Show Extractor
The regular expression for shows needs to get a `title`, `season`, `episode` and `quality` property. Below you can see the method that gets the needed data for a show episode. If your content does not match any of these regular expressions, you can add a regular expression to the method.
```javascript
_getShowData(torrent) {
  const seasonBased = /(.*).[sS](\d{2})[eE](\d{2})/; // Dexter S08E09 720p HDTV x264-IMMERSE
  const vtv = /(.*).(\d{1,2})[x](\d{2})/; // The Whispers 1x09 (HDTV-x264-KILLERS)[VTV]
  const dateBased = /(.*).(\d{4}).(\d{2}.\d{2})/; // Jimmy Fallon 2016 08 02 Jonah Hill HDTV x264-CROOKS

  // The third argument tells _extractShow whether the match is date-based.
  if (torrent.title.match(seasonBased)) {
    return this._extractShow(torrent, seasonBased, false);
  } else if (torrent.title.match(vtv)) {
    return this._extractShow(torrent, vtv, false);
  } else if (torrent.title.match(dateBased)) {
    return this._extractShow(torrent, dateBased, true);
  } else {
    console.warn(`${this.name}: Could not find data from torrent: '${torrent.title}'`);
  }
};
```
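The same approach for shows: a standalone sketch that runs the three patterns from `_getShowData` in order and reports whether the match was date-based. `parseShowTitle` is hypothetical, and using the year as the season and the month/day as the episode for date-based titles is an assumption about what `_extractShow` does with its third argument.

```javascript
// The three patterns from _getShowData, in the same order.
const showPatterns = [
  { regex: /(.*).[sS](\d{2})[eE](\d{2})/, dateBased: false }, // S08E09
  { regex: /(.*).(\d{1,2})[x](\d{2})/, dateBased: false },    // 1x09
  { regex: /(.*).(\d{4}).(\d{2}.\d{2})/, dateBased: true }    // 2016 08 02
];

// Hypothetical helper: returns { title, season, episode, dateBased } or null.
function parseShowTitle(title) {
  for (const { regex, dateBased } of showPatterns) {
    const m = title.match(regex);
    if (!m) continue;
    if (dateBased) {
      // Assumption: the year acts as the season and "MM DD" as the episode.
      return { title: m[1].trim(), season: m[2], episode: m[3], dateBased };
    }
    return { title: m[1].trim(), season: parseInt(m[2], 10), episode: parseInt(m[3], 10), dateBased };
  }
  return null;
}

console.log(JSON.stringify(parseShowTitle("Jimmy Fallon 2016 08 02 Jonah Hill HDTV x264-CROOKS")));
// {"title":"Jimmy Fallon","season":"2016","episode":"08 02","dateBased":true}
```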
Helpers
The `helper.js` classes in each provider folder help the providers insert the scraped data into the MongoDB database. The providers need to call two methods.
Anime & Show Helpers
The first method to call is `getHummingbirdInfo` for anime or `getTraktInfo` for shows. These methods need a slug as a parameter (`getTraktInfo` can also use an IMDb id). They fetch metadata from Hummingbird.me or Trakt.tv and return an object based on the schema of the mongoose model, but without any episodes.
```javascript
getTraktInfo(slug);
getHummingbirdInfo(slug);
```
The second method to call is `addEpisodes`, which attaches the episodes to the object returned by `getHummingbirdInfo` or `getTraktInfo`. That object is the first parameter, the episodes object is the second and the slug is the third.
```javascript
addEpisodes(show, episodes, slug); // or addEpisodes(anime, episodes, slug)
```
The episodes are structured in a particular way. The episodes object is first keyed by season number. Nested in each season is an object keyed by episode number. Each episode object holds the qualities available for that episode, which can be `480p`, `720p` or `1080p`. Finally, each quality object contains the `url` of the torrent or magnet link, the number of `seeds` and `peers`, and the name of the `provider`.
```javascript
{
  "1": {
    "1": {
      "480p": {
        url: "magnet:?xt=urn:btih:LMJXHHNOW33Z3YGXJLCTJZ23WK2D6VO4&dn=10.OClock.Live.S01E01.WS.PDTV.XviD-PVR&tr=udp://tracker.openbittorrent.com:80&tr=udp://open.demonii.com:80&tr=udp://tracker.coppersurfer.tk:80&tr=udp://tracker.leechers-paradise.org:6969&tr=udp://exodus.desync.com:6969",
        seeds: 0,
        peers: 0,
        provider: "EZTV"
      }
    }
  }
}
```
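Because every torrent sits at the same depth, the structure is easy to walk, for example to log or sanity-check the data before calling `addEpisodes`. A minimal sketch with a hypothetical `listTorrents` helper that flattens the object into one row per season/episode/quality and flags quality keys other than the documented `480p`, `720p` and `1080p`.

```javascript
// Qualities the helpers are documented to accept.
const KNOWN_QUALITIES = new Set(["480p", "720p", "1080p"]);

// Hypothetical helper: flatten episodes[season][episode][quality] into rows,
// flagging any quality key the helpers do not expect.
function listTorrents(episodes) {
  const rows = [];
  for (const [season, episodesOfSeason] of Object.entries(episodes)) {
    for (const [episode, qualities] of Object.entries(episodesOfSeason)) {
      for (const [quality, torrent] of Object.entries(qualities)) {
        rows.push({
          season,
          episode,
          quality,
          provider: torrent.provider,
          knownQuality: KNOWN_QUALITIES.has(quality)
        });
      }
    }
  }
  return rows;
}

const sampleEpisodes = {
  "1": {
    "1": {
      "480p": { url: "magnet:?xt=urn:btih:LMJXHHNOW33Z3YGXJLCTJZ23WK2D6VO4", seeds: 0, peers: 0, provider: "EZTV" }
    }
  }
};
const rows = listTorrents(sampleEpisodes);
console.log(rows.map(r => `S${r.season}E${r.episode} ${r.quality} (${r.provider})`).join(", "));
// S1E1 480p (EZTV)
```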
Movie Helper
The first method to call is `getTraktInfo`. This method needs a slug as a parameter, but can also use an IMDb id. It fetches metadata from Trakt.tv and returns an object based on the schema of the mongoose model, but without any torrents.
```javascript
getTraktInfo(slug);
```
The second method to call is `addTorrents`, which attaches the torrents to the object returned by `getTraktInfo`. That object is the first parameter and the torrents for the movie are the second.
```javascript
addTorrents(movie, torrents);
```
The torrents are structured in a particular way. The torrents object is first keyed by the language of the torrents, represented by a language code, e.g. `en`. Nested inside each language are the qualities of the torrents, which can be `720p` or `1080p`. Finally, each quality object contains the `url` of the torrent or magnet link, the number of seeds (`seed`) and peers (`peer`), the `size` of the torrent in bytes, the `filesize`, which is a more readable version of `size`, and the name of the `provider`.
```javascript
{
  "en": {
    "720p": {
      url: "magnet:?xt=urn:btih:1BEA4C992D1F7A765F3C943E627E881AC7FDAA35&tr=udp://glotorrents.pw:6969/announce&tr=udp://tracker.opentrackr.org:1337/announce&tr=udp://torrent.gresille.org:80/announce&tr=udp://tracker.openbittorrent.com:80&tr=udp://tracker.coppersurfer.tk:6969&tr=udp://tracker.leechers-paradise.org:6969&tr=udp://p4p.arenabg.ch:1337&tr=udp://tracker.internetwarriors.net:1337",
      seed: 156,
      peer: 44,
      size: 819829146,
      filesize: "781.85 MB",
      provider: "YTS"
    }
  }
}
```
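The `filesize` string can be derived from `size`. A minimal sketch using 1024-based units; it reproduces the example above (819829146 bytes gives "781.85 MB"), but whether the API uses exactly this rounding and these unit labels is an assumption, and `formatFileSize` is hypothetical.

```javascript
// Hypothetical helper: turn a size in bytes into a readable string,
// dividing by 1024 per unit step and keeping two decimals.
function formatFileSize(bytes) {
  const units = ["B", "KB", "MB", "GB", "TB"];
  let size = bytes;
  let unit = 0;
  // Step up a unit until the value drops below 1024 or no larger unit is left.
  while (size >= 1024 && unit < units.length - 1) {
    size /= 1024;
    unit++;
  }
  return `${size.toFixed(2)} ${units[unit]}`;
}

console.log(formatFileSize(819829146)); // 781.85 MB
```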
Metadata Providers
Metadata providers get data on a movie or seasonal information on a show. Popcorn API uses several API services to get metadata and images for anime, movies and shows.
Trakt.tv
Trakt.tv is the metadata provider for movies and shows. It uses the trakt.tv module by Jean van Kasteel. For more information, see the Trakt API documentation.
TheTVDB.com
TheTVDB.com is the metadata provider for date-based shows such as '@Midnight'. It uses the node-tvdb module by Ed Wellbrook. For more information, see the TheTVDB API documentation.
Hummingbird.me
Hummingbird.me is the metadata provider for anime. It uses the hummingbird-api module. For more information, see the Hummingbird API documentation.
Fanart.tv
Fanart.tv provides the images used for movies and shows. It uses the fanart.tv-api module. For more information, see the Fanart.tv API documentation.
OMDBapi.com
OMDBapi.com provides the images used for movies. It uses the omdb-api-pt module. For more information, see the OMDb API documentation.
TheMovieDB.org
TheMovieDB.org provides the images used for movies and shows. It uses the themoviedbclient module by sarathkcm. For more information, see the TheMovieDB API documentation.