Generate XML Sitemaps

An intelligent site crawler that respects robots.txt, parses image tags, optimizes fetches with conditional HTTP caching, and writes compressed sitemaps.

100 pages

Capping the crawl prevents infinite link loops and limits load on the target server.
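
As a rough illustration of why a page cap stops loops, here is a minimal sketch of a breadth-first crawl over an in-memory link graph. The `LinkGraph` type, the `crawlWithCap` function, and the sample graph are hypothetical stand-ins for real HTTP fetches, not this tool's actual code.

```typescript
// Hypothetical in-memory link graph standing in for real page fetches.
type LinkGraph = Record<string, string[]>;

// Breadth-first crawl that stops once maxPages distinct pages were visited.
function crawlWithCap(start: string, graph: LinkGraph, maxPages: number): string[] {
  const visited = new Set<string>();
  const queue: string[] = [start];

  while (queue.length > 0 && visited.size < maxPages) {
    const url = queue.shift() as string;
    if (visited.has(url)) continue; // already crawled: this breaks link cycles
    visited.add(url);
    for (const next of graph[url] ?? []) {
      if (!visited.has(next)) queue.push(next);
    }
  }
  return Array.from(visited);
}

// Sample site with a cycle: "/" -> "/a" -> "/" and "/c" -> "/a".
const sampleGraph: LinkGraph = {
  "/": ["/a", "/b"],
  "/a": ["/", "/b"],
  "/b": ["/c"],
  "/c": ["/a"],
};

console.log(crawlWithCap("/", sampleGraph, 3));
```

The visited set handles cycles on its own; the cap additionally bounds sites whose URL space is effectively unbounded (calendars, faceted search, session parameters).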


Incremental Caching & Gzip

Uses ETag and Last-Modified headers to make conditional requests, skipping unchanged pages via the 304 Not Modified fast path, and delivers sitemaps as both XML and gzip-compressed (.gz) files.
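
A conditional fetch of this kind can be sketched as two small helpers: one that turns cached validators into request headers, and one that reuses the cached body on a 304. The `CacheEntry` shape and both function names are assumptions made for illustration; only the `If-None-Match` and `If-Modified-Since` header names come from the HTTP standard.

```typescript
// Hypothetical cache record kept from the previous crawl of a URL.
interface CacheEntry {
  etag?: string;
  lastModified?: string;
  body: string;
}

// Build the validators to send with the next request for this URL.
function conditionalHeaders(entry?: CacheEntry): Record<string, string> {
  const headers: Record<string, string> = {};
  if (entry?.etag) headers["If-None-Match"] = entry.etag;
  if (entry?.lastModified) headers["If-Modified-Since"] = entry.lastModified;
  return headers;
}

// Pick the body to parse: the cached copy on 304, otherwise the fresh one.
function resolveBody(status: number, freshBody: string | null, entry?: CacheEntry): string {
  if (status === 304 && entry) return entry.body; // fast path: nothing changed
  if (freshBody === null) throw new Error("no body available for status " + status);
  return freshBody;
}

const cached: CacheEntry = {
  etag: '"abc123"',
  lastModified: "Wed, 01 Jan 2025 00:00:00 GMT",
  body: "<html>cached</html>",
};
console.log(conditionalHeaders(cached));
console.log(resolveBody(304, null, cached));
```

The headers object can be passed to whichever HTTP client performs the request; a 304 response carries no body, which is where the bandwidth saving comes from.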

Image & Shadow DOM Parsing

Traverses shadow roots and extracts image elements to build sitemaps that use Google's image sitemap extension.
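
For reference, the output of such a step might look like the result of the sketch below, which serializes pages and their images using the standard sitemap namespace and Google's image extension namespace. The `PageEntry` type and function names are hypothetical, and the DOM traversal itself is assumed to have already produced the image URLs.

```typescript
// Hypothetical crawl result: one page URL plus the image URLs found on it.
interface PageEntry {
  loc: string;
  images: string[];
}

// Escape the five XML special characters so URLs are safe inside elements.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Serialize pages into a sitemap with <image:image> entries per page.
function buildImageSitemap(pages: PageEntry[]): string {
  const urls = pages.map((page) => {
    const lines = ["  <url>", `    <loc>${escapeXml(page.loc)}</loc>`];
    for (const src of page.images) {
      lines.push(`    <image:image><image:loc>${escapeXml(src)}</image:loc></image:image>`);
    }
    lines.push("  </url>");
    return lines.join("\n");
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"',
    '        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">',
    ...urls,
    "</urlset>",
  ].join("\n");
}

console.log(
  buildImageSitemap([
    { loc: "https://example.com/", images: ["https://example.com/hero.png?w=800&h=600"] },
  ])
);
```

Escaping matters here because image URLs often carry query strings, and a raw `&` makes the XML invalid.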

Robots.txt & Redirections

Follows redirects responsibly, consolidates protocol variants of the same URL, and evaluates wildcard and Allow rules according to RFC 9309.
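
The rule evaluation can be sketched roughly as follows, assuming the rules for the matching user-agent group have already been parsed. It applies the RFC 9309 conventions that `*` matches any character sequence, a trailing `$` anchors the end of the path, the longest matching pattern wins, and Allow wins a tie. The `Rule` type and function names are hypothetical.

```typescript
// Hypothetical parsed rule from the matching user-agent group.
type Rule = { type: "allow" | "disallow"; pattern: string };

// Convert a robots.txt path pattern into a regular expression.
function patternToRegExp(pattern: string): RegExp {
  const anchored = pattern.slice(-1) === "$";
  const body = anchored ? pattern.slice(0, -1) : pattern;
  const escaped = body
    .split("*")
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")) // escape regex metacharacters
    .join(".*"); // each "*" in the pattern matches any sequence
  return new RegExp("^" + escaped + (anchored ? "$" : ""));
}

// Longest matching pattern wins; on equal length, Allow wins.
function isAllowed(path: string, rules: Rule[]): boolean {
  let best: Rule | undefined;
  for (const rule of rules) {
    if (rule.pattern === "") continue; // an empty pattern matches nothing
    if (!patternToRegExp(rule.pattern).test(path)) continue;
    const longer = !best || rule.pattern.length > best.pattern.length;
    const tieForAllow =
      best !== undefined && rule.pattern.length === best.pattern.length && rule.type === "allow";
    if (longer || tieForAllow) best = rule;
  }
  return best ? best.type === "allow" : true; // no matching rule means allowed
}

const sampleRules: Rule[] = [
  { type: "disallow", pattern: "/private/" },
  { type: "allow", pattern: "/private/public/" },
  { type: "disallow", pattern: "/*.pdf$" },
];
console.log(isAllowed("/private/public/page", sampleRules));
```

This sketch omits user-agent group selection and percent-encoding normalization, both of which a complete implementation would need.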

Asynchronous Redis Queue & Stability

Crawls run via BullMQ workers with automatic Chromium recycling and SIGKILL cleanups to mitigate leaks and CFG crashes.

Smart SPA Detection

Automatically inspects the source of the index page to detect client-side-rendered (CSR) apps, and launches the heavier Puppeteer browser only for JavaScript-rendered sites.
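
One plausible heuristic for this check is sketched below: treat a page as client-rendered when it ships scripts and a common framework mount point but almost no visible text. The function name, the list of mount-point ids, and the 200-character threshold are all assumptions chosen for illustration; the tool's real criteria are not documented here.

```typescript
// Heuristic guess at whether raw HTML is an empty shell filled in by JavaScript.
function looksClientRendered(html: string): boolean {
  const bodyMatch = /<body[^>]*>([\s\S]*?)<\/body>/i.exec(html);
  const body = bodyMatch ? bodyMatch[1] : html;

  // Approximate the visible text by stripping scripts, styles, and tags.
  const visibleText = body
    .replace(/<script[\s\S]*?<\/script>/gi, "")
    .replace(/<style[\s\S]*?<\/style>/gi, "")
    .replace(/<[^>]+>/g, "")
    .replace(/\s+/g, " ")
    .trim();

  // Assumed mount-point ids commonly used by SPA frameworks.
  const hasMountPoint = /<div[^>]+id=["'](root|app|__next|__nuxt)["']/i.test(body);
  const hasScripts = /<script\b/i.test(html);

  // Assumed threshold: under 200 characters of text counts as an empty shell.
  return hasScripts && hasMountPoint && visibleText.length < 200;
}

const spaShell =
  '<html><body><div id="root"></div><script src="/app.js"></script></body></html>';
console.log(looksClientRendered(spaShell));
```

A crawler could call a check like this on the plain HTTP response and fall back to a headless browser only when it returns true, which keeps browser launches off static sites.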

Robots.txt Compliance

Enforces strict crawl safety rules. Auto-discovers existing sitemaps and skips disallowed routes so that sites are crawled respectfully, in line with RFC 9309.
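
Sitemap discovery is commonly done by reading the `Sitemap:` lines of robots.txt; whether this tool uses exactly that mechanism is an assumption. A minimal sketch, with a hypothetical `extractSitemapUrls` helper:

```typescript
// Collect the URLs declared on "Sitemap:" lines of a robots.txt file.
function extractSitemapUrls(robotsTxt: string): string[] {
  const urls: string[] = [];
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // drop trailing comments
    const match = /^sitemap\s*:\s*(\S+)/i.exec(line); // field name is case-insensitive
    if (match) urls.push(match[1]);
  }
  return urls;
}

const sampleRobots = [
  "User-agent: *",
  "Disallow: /admin/",
  "Sitemap: https://example.com/sitemap.xml",
  "sitemap: https://example.com/news-sitemap.xml # news section",
].join("\n");
console.log(extractSitemapUrls(sampleRobots));
```

Sitemap lines are independent of user-agent groups, so they can be collected in one pass over the whole file.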

Queue & Worker Engine

Decouples crawl execution from the main API thread. Uses BullMQ and Redis to queue long-running crawl jobs and process them reliably in separate workers.

SSE Progress Streams

Streams download details, crawl counters, and status updates to the client in real time over Server-Sent Events, removing the need for REST polling loops.
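
On the wire, each update is a small text frame in the Server-Sent Events format. The sketch below shows how one such frame might be built; the `CrawlProgress` fields and the `progress` event name are hypothetical, since the actual payload is not specified here.

```typescript
// Hypothetical progress payload pushed to the browser.
interface CrawlProgress {
  crawled: number;
  queued: number;
  status: "running" | "done" | "failed";
}

// Serialize one event in the text/event-stream wire format.
function formatSseEvent(event: string, data: unknown, id?: number): string {
  const lines: string[] = [];
  if (id !== undefined) lines.push(`id: ${id}`);
  lines.push(`event: ${event}`);
  // JSON.stringify without indentation emits no raw newlines,
  // so a single "data:" line is enough.
  lines.push(`data: ${JSON.stringify(data)}`);
  return lines.join("\n") + "\n\n"; // a blank line terminates the event
}

const update: CrawlProgress = { crawled: 5, queued: 12, status: "running" };
console.log(formatSseEvent("progress", update, 1));
```

A server would write frames like this to a response with `Content-Type: text/event-stream`, and a browser client can subscribe with the standard `EventSource` API and listen for the named event.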