Package: Rcrawler 0.1.9-1
Rcrawler: Web Crawler and Scraper
Performs parallel web crawling and web scraping. It is designed to crawl, parse, and store web pages, producing data that can be used directly in analysis applications. For details see Khalil and Fakir (2017) <doi:10.1016/j.softx.2017.04.004>.
Authors:
Source: Rcrawler_0.1.9-1.tar.gz
Windows binaries: Rcrawler_0.1.9-1.zip (r-4.5), Rcrawler_0.1.9-1.zip (r-4.4), Rcrawler_0.1.9-1.zip (r-4.3)
macOS binaries: Rcrawler_0.1.9-1.tgz (r-4.4-any), Rcrawler_0.1.9-1.tgz (r-4.3-any)
Linux binaries: Rcrawler_0.1.9-1.tar.gz (r-4.5-noble), Rcrawler_0.1.9-1.tar.gz (r-4.4-noble)
WebAssembly binaries: Rcrawler_0.1.9-1.tgz (r-4.4-emscripten), Rcrawler_0.1.9-1.tgz (r-4.3-emscripten)
Rcrawler.pdf | Rcrawler.html
Rcrawler/json (API)
# Install 'Rcrawler' in R:
install.packages('Rcrawler', repos = c('https://salimk.r-universe.dev', 'https://cloud.r-project.org'))
Bug tracker: https://github.com/salimk/rcrawler/issues
Topics: crawler, crawlers, scraper, webcrawler, webscraper, webscraping, webscrapping
Last updated 5 years ago from: f9f403e2c3. Checks: OK: 1, NOTE: 6. Indexed: yes.
Target | Result | Date |
---|---|---|
Doc / Vignettes | OK | Nov 08 2024 |
R-4.5-win | NOTE | Nov 08 2024 |
R-4.5-linux | NOTE | Nov 08 2024 |
R-4.4-win | NOTE | Nov 08 2024 |
R-4.4-mac | NOTE | Nov 08 2024 |
R-4.3-win | NOTE | Nov 08 2024 |
R-4.3-mac | NOTE | Nov 08 2024 |
Exports: browser_path, ContentScraper, Drv_fetchpage, Getencoding, install_browser, LinkExtractor, LinkNormalization, Linkparameters, Linkparamsfilter, ListProjects, LoadHTMLFiles, LoginSession, Rcrawler, RobotParser, run_browser, stop_browser
Dependencies: askpass, base64enc, callr, cli, codetools, crayon, curl, data.table, debugme, doParallel, foreach, glue, httr, iterators, jsonlite, lifecycle, magrittr, mime, openssl, png, processx, ps, R6, rlang, selectr, showimage, stringi, stringr, sys, vctrs, webdriver, withr, xml2
Readme and manuals
Help Manual
Help page | Topics |
---|---|
Return browser (webdriver) location path | browser_path |
ContentScraper | ContentScraper |
Fetch a page using a web driver/session | Drv_fetchpage |
Getencoding | Getencoding |
Install PhantomJS webdriver | install_browser |
LinkExtractor | LinkExtractor |
Link Normalization | LinkNormalization |
Get the list of parameters and values from a URL | Linkparameters |
Link parameters filter | Linkparamsfilter |
ListProjects | ListProjects |
LoadHTMLFiles | LoadHTMLFiles |
Open a logged-in session | LoginSession |
Rcrawler | Rcrawler |
RobotParser: fetch and parse robots.txt | RobotParser |
Start a web driver process on localhost with a random port | run_browser |
Stop the web driver process and remove its object | stop_browser |
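The Linkparameters and Linkparamsfilter topics operate on the query string of a URL. Their actual arguments and return values are documented in the help manual and are not reproduced here. The block below is only a minimal base-R sketch of the underlying idea, using two hypothetical helpers, `parse_query_params` and `filter_query_params`, that are not part of Rcrawler. It assumes a simple `key=value&key=value` query string and does no percent-decoding.

```r
# Hypothetical helpers (NOT Rcrawler functions): a base-R sketch of splitting a
# URL's query string into parameter/value pairs and dropping selected parameters.

parse_query_params <- function(url) {
  # Strip any '#fragment', then keep only the text after the first '?'
  query <- sub("#.*$", "", url)
  if (!grepl("?", query, fixed = TRUE)) {
    return(data.frame(param = character(0), value = character(0)))
  }
  query <- sub("^[^?]*\\?", "", query)
  pairs <- strsplit(query, "&", fixed = TRUE)[[1]]
  pairs <- pairs[nzchar(pairs)]            # ignore empty segments like "a=1&&b=2"
  keys <- sub("=.*$", "", pairs)           # text before the first '='
  values <- sub("^[^=]*=?", "", pairs)     # text after the first '=' ("" if none)
  data.frame(param = keys, value = values, stringsAsFactors = FALSE)
}

filter_query_params <- function(url, drop) {
  # Preserve the fragment (if any) and the part of the URL before '?' or '#'
  fragment <- if (grepl("#", url, fixed = TRUE)) sub("^[^#]*", "", url) else ""
  base <- sub("[?#].*$", "", url)
  params <- parse_query_params(url)
  keep <- params[!(params$param %in% drop), , drop = FALSE]
  if (nrow(keep) == 0) return(paste0(base, fragment))
  query <- paste(keep$param, keep$value, sep = "=", collapse = "&")
  paste0(base, "?", query, fragment)
}

url <- "http://www.example.com/page?id=5&sort=asc&sid=abc#top"
parse_query_params(url)
filter_query_params(url, drop = "sid")
```

A common reason to filter parameters before crawling is deduplication: two URLs that differ only in a session-style parameter (here the illustrative `sid`) usually point to the same page.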