Server path: /web-scraping | Type: Embedded | PCID required: No

Tools

| Tool | Description |
| --- | --- |
| `web-scraping_scrape` | Scrape content from one or more web pages. Returns clean markdown, HTML, or structured data. Supports browser actions like screenshots, clicks, and scrolling for dynamic content. Use this for extracting content from specific URLs. |
| `web-scraping_crawl` | Crawl a website starting from one or more URLs to discover and scrape multiple pages. Follows links within the site with configurable depth limits and path filtering. Use this to extract content from entire websites or specific sections. |
| `web-scraping_map` | Generate a map of all URLs on a website without scraping content. Discovers pages via links and sitemap. Use this to understand site structure, find specific pages, or plan what to crawl/scrape. |
| `web-scraping_rss` | Read and parse RSS/Atom feeds from URLs. Supports checking feed validity, fetching all items, searching items by content, and getting the latest items sorted by date. |

web-scraping_scrape

Scrape content from one or more web pages. Returns clean markdown, HTML, or structured data. Supports browser actions like screenshots, clicks, and scrolling for dynamic content. Use this for extracting content from specific URLs. Parameters:
| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| urls | string[] | Yes | — | Array of URLs to scrape. Can be full URLs or just domain names like "google.com" |
| formats | string[] | No | ["markdown"] | Output formats: "markdown", "html", "rawHtml", "links", "summary" |
| onlyMainContent | boolean | No | true | Extract only main content, excluding headers/footers/nav |
| removeBase64Images | boolean | No | true | Remove base64-encoded images from output |
| waitFor | number | No | — | Milliseconds to wait before scraping. Use for pages with dynamic content that loads after the initial render (e.g. 2000 for 2 seconds) |
| actions | object[] | No | — | Browser actions to perform before scraping. Actions execute in order (see examples below) |
| includeTags | string[] | No | — | HTML tags to include (e.g. ["div", "p", "h1"]) |
| excludeTags | string[] | No | — | HTML tags to exclude (e.g. ["script", "style"]) |
| location | object | No | — | Location/language settings for geo-specific content |

Example `actions` entries:

- Wait: `{"type": "wait", "milliseconds": 2000}`
- Click button: `{"type": "click", "selector": "button.load-more"}`
- Scroll down: `{"type": "scroll", "selector": "body", "direction": "down"}`
- Type in input: `{"type": "write", "selector": "#search", "text": "search query"}`
- Press Enter: `{"type": "press", "key": "Enter"}`
- Take screenshot: `{"type": "screenshot", "fullPage": true}`
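As a sketch of how these parameters fit together, the following builds a `web-scraping_scrape` request payload for a page that reveals content only after a click and a scroll. The URL is illustrative, and the exact client used to send the payload is outside the scope of this reference; only the field names come from the table above.

```python
import json

# Hypothetical scrape request: wait for dynamic content, click a
# "load more" button, scroll, then extract markdown plus links.
scrape_request = {
    "urls": ["example.com/articles"],          # domain-only form also accepted
    "formats": ["markdown", "links"],
    "onlyMainContent": True,
    "waitFor": 2000,                            # ms before scraping starts
    "actions": [                                # executed in order
        {"type": "wait", "milliseconds": 2000},
        {"type": "click", "selector": "button.load-more"},
        {"type": "scroll", "selector": "body", "direction": "down"},
    ],
    "excludeTags": ["script", "style"],
}

# Payloads must be plain JSON-serializable data.
print(json.dumps(scrape_request, indent=2))
```

Note that `actions` run before extraction, so `formats` applies to the page state after the final action completes.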

web-scraping_crawl

Crawl a website starting from one or more URLs to discover and scrape multiple pages. Follows links within the site with configurable depth limits and path filtering. Use this to extract content from entire websites or specific sections. Parameters:
| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| urls | string[] | Yes | — | Starting URLs to crawl from. Can be full URLs or just domain names like "google.com" |
| limit | number | No | 10 | Maximum number of pages to crawl |
| maxDepth | number | No | — | Maximum link depth to follow from the starting URL |
| includePaths | string[] | No | — | Only crawl URLs matching these glob patterns (e.g. ["/blog/*"]) |
| excludePaths | string[] | No | — | Skip URLs matching these glob patterns (e.g. ["/admin/*"]) |
| allowExternalLinks | boolean | No | — | Allow crawling external domains |
| allowSubdomains | boolean | No | — | Include subdomains in the crawl |
| scrapeOptions | object | No | — | Options to apply when scraping each crawled page |
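A minimal sketch of a crawl request, plus a client-side illustration of how the path globs select URLs. Python's `fnmatch` is used here only as an approximation of the server's glob semantics, which this document does not specify; the payload field names come from the table above.

```python
from fnmatch import fnmatch

# Hypothetical crawl request: blog section only, 25 pages, two links deep.
crawl_request = {
    "urls": ["example.com"],
    "limit": 25,
    "maxDepth": 2,
    "includePaths": ["/blog/*"],
    "excludePaths": ["/admin/*"],
    "scrapeOptions": {"formats": ["markdown"], "onlyMainContent": True},
}

def allowed(path: str) -> bool:
    """Approximate the include/exclude glob filtering on a URL path."""
    included = any(fnmatch(path, p) for p in crawl_request["includePaths"])
    excluded = any(fnmatch(path, p) for p in crawl_request["excludePaths"])
    return included and not excluded

print(allowed("/blog/2024-roadmap"))  # True
print(allowed("/admin/users"))        # False
```

`scrapeOptions` accepts the same fields as `web-scraping_scrape`, so per-page output format and content filtering carry over to every crawled page.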

web-scraping_map

Generate a map of all URLs on a website without scraping content. Discovers pages via links and sitemap. Use this to understand site structure, find specific pages, or plan what to crawl/scrape. Parameters:
| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| urls | string[] | Yes | — | Starting URLs to map from. Can be full URLs or just domain names like "google.com" |
| search | string | No | — | Filter results to URLs containing this search term |
| limit | number | No | 100 | Maximum number of URLs to return |
| includeSubdomains | boolean | No | — | Include subdomains in the map |
| sitemap | string | No | "include" | Sitemap usage: "include", "skip" (ignore sitemap), "only" (only use sitemap) |
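A common pattern is map-then-scrape: map the site first, then pass a filtered subset of the discovered URLs to `web-scraping_scrape`. The sketch below assumes a response shaped as a flat list of URL strings, which this document does not guarantee; the `discovered` list is made up for illustration.

```python
# Hypothetical map request: find pricing-related pages on the site.
map_request = {
    "urls": ["example.com"],
    "search": "pricing",
    "limit": 100,
    "sitemap": "include",  # default; "skip" or "only" also accepted
}

# Stand-in for the tool's response (illustrative URLs only).
discovered = [
    "https://example.com/pricing",
    "https://example.com/pricing/enterprise",
    "https://example.com/about",
]

# Select targets for a follow-up web-scraping_scrape call.
scrape_targets = [u for u in discovered if "pricing" in u]
print(scrape_targets)
```

Because `map` never fetches page content, it is the cheap way to decide what a subsequent `crawl` or `scrape` should touch.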

web-scraping_rss

Read and parse RSS/Atom feeds from URLs. Supports checking feed validity, fetching all items, searching items by content, and getting the latest items sorted by date. Parameters:
| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| action | string | Yes | — | Action to perform: "check" (validate feed and get basic info), "get" (fetch all feed items), "search" (search items by query), "get_latest" (get most recent items by date) |
| url | string | Yes | — | The URL of the RSS/Atom feed |
| timeout | number | No | 10000 | Request timeout in milliseconds |
| limit | number | No | — | For "get" and "search" actions: maximum number of items to return |
| query | string | No | — | For "search" action: search query to match against item title, description, or content |
| caseSensitive | boolean | No | false | For "search" action: whether the search is case-sensitive |
| count | number | No | 10 | For "get_latest" action: number of latest items to return |
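Since the RSS tool dispatches on `action`, one payload per action makes the parameter scoping concrete. The feed URL below is illustrative; field names and defaults come from the table above.

```python
# Hypothetical web-scraping_rss payloads, one per action.
feed = "https://example.com/feed.xml"

rss_requests = {
    "check": {"action": "check", "url": feed},
    "get": {"action": "get", "url": feed, "limit": 20},
    "search": {"action": "search", "url": feed,
               "query": "release", "caseSensitive": False},
    "get_latest": {"action": "get_latest", "url": feed, "count": 3},
}

# action and url are the only universally required fields.
for name, req in rss_requests.items():
    assert {"action", "url"} <= req.keys()
```

A typical workflow is `check` once to validate the feed, then `get_latest` on a schedule to pick up new items.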