Server path: /web-scraping | Type: Embedded | PCID required: No

Tools

| Tool | Description |
| --- | --- |
| `web-scraping_scrape` | Scrape content from one or more web pages. Returns clean markdown, HTML, or structured data. Supports browser actions like screenshots, clicks, and scrolling for dynamic content. Use this for extracting content from specific URLs. |
| `web-scraping_crawl` | Crawl a website starting from one or more URLs to discover and scrape multiple pages. Follows links within the site with configurable depth limits and path filtering. Use this to extract content from entire websites or specific sections. |
| `web-scraping_map` | Generate a map of all URLs on a website without scraping content. Discovers pages via links and sitemap. Use this to understand site structure, find specific pages, or plan what to crawl/scrape. |
| `web-scraping_rss` | Read and parse RSS/Atom feeds from URLs. Supports checking feed validity, fetching all items, searching items by content, and getting the latest items sorted by date. |

web-scraping_scrape

Scrape content from one or more web pages. Returns clean markdown, HTML, or structured data. Supports browser actions like screenshots, clicks, and scrolling for dynamic content. Use this for extracting content from specific URLs. Parameters:
| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| urls | string[] | Yes | — | Array of URLs to scrape. Can be full URLs or just domain names like "google.com" |
| formats | string[] | No | ["markdown"] | Output formats: "markdown", "html", "rawHtml", "links", "summary" |
| onlyMainContent | boolean | No | true | Extract only main content, excluding headers/footers/nav |
| removeBase64Images | boolean | No | true | Remove base64-encoded images from output |
| waitFor | number | No | — | Milliseconds to wait before scraping. Use for pages with dynamic content that loads after the initial render (e.g. 2000 for 2 seconds) |
| actions | object[] | No | — | Browser actions to perform before scraping. Actions execute in order (see examples below) |
| includeTags | string[] | No | — | HTML tags to include (e.g. ["div", "p", "h1"]) |
| excludeTags | string[] | No | — | HTML tags to exclude (e.g. ["script", "style"]) |
| location | object | No | — | Location/language settings for geo-specific content |

Example `actions` entries:

- Wait: `{"type": "wait", "milliseconds": 2000}`
- Click button: `{"type": "click", "selector": "button.load-more"}`
- Scroll down: `{"type": "scroll", "selector": "body", "direction": "down"}`
- Type in input: `{"type": "write", "selector": "#search", "text": "search query"}`
- Press Enter: `{"type": "press", "key": "Enter"}`
- Take screenshot: `{"type": "screenshot", "fullPage": true}`
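As a sketch of how these parameters fit together, the following builds a `web-scraping_scrape` request payload for a page that reveals content only after a click and a scroll. The URL is illustrative, and the exact client used to send the payload is outside the scope of this reference; only the field names come from the table above.

```python
import json

# Hypothetical scrape request: wait for dynamic content, click a
# "load more" button, scroll, then extract markdown plus links.
scrape_request = {
    "urls": ["example.com/articles"],          # domain-only form also accepted
    "formats": ["markdown", "links"],
    "onlyMainContent": True,
    "waitFor": 2000,                            # ms before scraping starts
    "actions": [                                # executed in order
        {"type": "wait", "milliseconds": 2000},
        {"type": "click", "selector": "button.load-more"},
        {"type": "scroll", "selector": "body", "direction": "down"},
    ],
    "excludeTags": ["script", "style"],
}

# Payloads must be plain JSON-serializable data.
print(json.dumps(scrape_request, indent=2))
```

Note that `actions` run before extraction, so `formats` applies to the page state after the final action completes.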

web-scraping_crawl

Crawl a website starting from one or more URLs to discover and scrape multiple pages. Follows links within the site with configurable depth limits and path filtering. Use this to extract content from entire websites or specific sections. Parameters:
| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| urls | string[] | Yes | — | Starting URLs to crawl from. Can be full URLs or just domain names like "google.com" |
| limit | number | No | 10 | Maximum number of pages to crawl |
| maxDepth | number | No | — | Maximum link depth to follow from the starting URL |
| includePaths | string[] | No | — | Only crawl URLs matching these glob patterns (e.g. ["/blog/*"]) |
| excludePaths | string[] | No | — | Skip URLs matching these glob patterns (e.g. ["/admin/*"]) |
| allowExternalLinks | boolean | No | — | Allow crawling external domains |
| allowSubdomains | boolean | No | — | Include subdomains in the crawl |
| scrapeOptions | object | No | — | Options to apply when scraping each crawled page |
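A minimal sketch of a crawl request, plus a client-side illustration of how the path globs select URLs. Python's `fnmatch` is used here only as an approximation of the server's glob semantics, which this document does not specify; the payload field names come from the table above.

```python
from fnmatch import fnmatch

# Hypothetical crawl request: blog section only, 25 pages, two links deep.
crawl_request = {
    "urls": ["example.com"],
    "limit": 25,
    "maxDepth": 2,
    "includePaths": ["/blog/*"],
    "excludePaths": ["/admin/*"],
    "scrapeOptions": {"formats": ["markdown"], "onlyMainContent": True},
}

def allowed(path: str) -> bool:
    """Approximate the include/exclude glob filtering on a URL path."""
    included = any(fnmatch(path, p) for p in crawl_request["includePaths"])
    excluded = any(fnmatch(path, p) for p in crawl_request["excludePaths"])
    return included and not excluded

print(allowed("/blog/2024-roadmap"))  # True
print(allowed("/admin/users"))        # False
```

`scrapeOptions` accepts the same fields as `web-scraping_scrape`, so per-page output format and content filtering carry over to every crawled page.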

web-scraping_map

Generate a map of all URLs on a website without scraping content. Discovers pages via links and sitemap. Use this to understand site structure, find specific pages, or plan what to crawl/scrape. Parameters:
| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| urls | string[] | Yes | — | Starting URLs to map from. Can be full URLs or just domain names like "google.com" |
| search | string | No | — | Filter results to URLs containing this search term |
| limit | number | No | 100 | Maximum number of URLs to return |
| includeSubdomains | boolean | No | — | Include subdomains in the map |
| sitemap | string | No | "include" | Sitemap usage: "include", "skip" (ignore sitemap), "only" (only use sitemap) |
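A common pattern is map-then-scrape: map the site first, then pass a filtered subset of the discovered URLs to `web-scraping_scrape`. The sketch below assumes a response shaped as a flat list of URL strings, which this document does not guarantee; the `discovered` list is made up for illustration.

```python
# Hypothetical map request: find pricing-related pages on the site.
map_request = {
    "urls": ["example.com"],
    "search": "pricing",
    "limit": 100,
    "sitemap": "include",  # default; "skip" or "only" also accepted
}

# Stand-in for the tool's response (illustrative URLs only).
discovered = [
    "https://example.com/pricing",
    "https://example.com/pricing/enterprise",
    "https://example.com/about",
]

# Select targets for a follow-up web-scraping_scrape call.
scrape_targets = [u for u in discovered if "pricing" in u]
print(scrape_targets)
```

Because `map` never fetches page content, it is the cheap way to decide what a subsequent `crawl` or `scrape` should touch.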

web-scraping_rss

Read and parse RSS/Atom feeds from URLs. Supports checking feed validity, fetching all items, searching items by content, and getting the latest items sorted by date. Parameters:
| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| action | string | Yes | — | Action to perform: "check" (validate feed and get basic info), "get" (fetch all feed items), "search" (search items by query), "get_latest" (get most recent items by date) |
| url | string | Yes | — | The URL of the RSS/Atom feed |
| timeout | number | No | 10000 | Request timeout in milliseconds |
| limit | number | No | — | For "get" and "search" actions: maximum number of items to return |
| query | string | No | — | For "search" action: search query to match against item title, description, or content |
| caseSensitive | boolean | No | false | For "search" action: whether the search is case-sensitive |
| count | number | No | 10 | For "get_latest" action: number of latest items to return |
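Since the RSS tool dispatches on `action`, one payload per action makes the parameter scoping concrete. The feed URL below is illustrative; field names and defaults come from the table above.

```python
# Hypothetical web-scraping_rss payloads, one per action.
feed = "https://example.com/feed.xml"

rss_requests = {
    "check": {"action": "check", "url": feed},
    "get": {"action": "get", "url": feed, "limit": 20},
    "search": {"action": "search", "url": feed,
               "query": "release", "caseSensitive": False},
    "get_latest": {"action": "get_latest", "url": feed, "count": 3},
}

# action and url are the only universally required fields.
for name, req in rss_requests.items():
    assert {"action", "url"} <= req.keys()
```

A typical workflow is `check` once to validate the feed, then `get_latest` on a schedule to pick up new items.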