Raclette

How It Works

Architecture

Raclette is composed of several cooperating modules:

Collector → extracts URIs from inputs (independent)

Raclette (facade)
  └── Client (dispatcher)
        ├── Filter → excludes URIs by pattern/IP/scheme
        └── Check by scheme:
              ├── file:// → FileChecker
              └── http(s):// → HostPool → WebsiteChecker

Request Pipeline

When you call raclette.check(uri):

  1. Filter: the URI is checked against exclude/include patterns, IP rules, and false-positive detection. Excluded URIs return immediately with an Excluded status.

  2. Route: the URI is dispatched based on its scheme:

    • file:// goes to the FileChecker
    • http:// and https:// go through rate limiting, then to the WebsiteChecker
    • mailto: and tel: are handled by the filter

  3. Check: the appropriate checker validates the URI and returns a Status.
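
The filter, route, check order can be sketched in a few lines of Java. This is only an illustration of the control flow, not Raclette's actual code: the class name PipelineSketch, the Status values other than Excluded, and the stubbed checkers are all assumptions. It also assumes that mailto: and tel: URIs end up as excluded, which the text above does not state explicitly.

```java
import java.net.URI;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical names throughout; only the filter -> route -> check order
// is taken from the description above.
final class PipelineSketch {

    enum Status { OK, EXCLUDED, UNSUPPORTED }

    // Stand-in for the exclude patterns. IP rules, include patterns, and
    // false-positive detection are left out of this sketch.
    private final List<Pattern> excludes;

    PipelineSketch(List<Pattern> excludes) {
        this.excludes = List.copyOf(excludes);
    }

    Status check(URI uri) {
        // 1. Filter: excluded URIs return immediately.
        String text = uri.toString();
        for (Pattern exclude : excludes) {
            if (exclude.matcher(text).find()) {
                return Status.EXCLUDED;
            }
        }

        // 2. Route: dispatch on the scheme.
        String scheme = uri.getScheme() == null
                ? ""
                : uri.getScheme().toLowerCase(Locale.ROOT);
        return switch (scheme) {
            // Assumption: the filter settles these as excluded.
            case "mailto", "tel" -> Status.EXCLUDED;
            case "file" -> checkFile(uri);
            case "http", "https" -> checkWebsite(uri);
            default -> Status.UNSUPPORTED;
        };
    }

    // 3. Check: placeholders that always succeed. A real checker would
    // touch the file system or the network here.
    private Status checkFile(URI uri) {
        return Status.OK;
    }

    private Status checkWebsite(URI uri) {
        return Status.OK;
    }
}
```

Putting the filter first means an excluded URI never reaches the file system or the network, which keeps excluded links cheap.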

FileChecker

The FileChecker resolves local file paths and optionally validates fragments:
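
As a rough sketch of what such a check might look like, the example below resolves a link against a base directory and, when asked, looks for a matching id attribute. The class name FileCheckSketch, the method signature, and the naive id="..." text search are assumptions made for illustration. A real fragment check would parse the document instead of searching the raw text.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not Raclette's FileChecker.
final class FileCheckSketch {

    // Returns true when the link target exists and, if fragment checking is
    // enabled and a fragment is present, the file contains id="<fragment>".
    static boolean check(Path baseDir, String link, boolean checkFragments) {
        int hash = link.indexOf('#');
        String pathPart = hash >= 0 ? link.substring(0, hash) : link;
        String fragment = hash >= 0 ? link.substring(hash + 1) : "";

        // Resolve relative links against the base directory.
        Path target = baseDir.resolve(pathPart).normalize();
        if (!Files.exists(target)) {
            return false;
        }
        if (!checkFragments || fragment.isEmpty() || Files.isDirectory(target)) {
            return true;
        }
        try {
            // Naive text search: good enough for a sketch, not for real HTML.
            String content = Files.readString(target, StandardCharsets.UTF_8);
            return content.contains("id=\"" + fragment + "\"");
        } catch (IOException e) {
            // An unreadable file counts as a failed check in this sketch.
            return false;
        }
    }
}
```

With fragment checking disabled, page.html#missing passes as long as page.html exists. With it enabled, the same link fails unless the file contains an element with that id.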

WebsiteChecker

The WebsiteChecker handles HTTP requests with:

Rate Limiting

HTTP requests go through the HostPool, which enforces per-host limits:

Both semaphores block with Semaphore.acquire(), which parks a waiting virtual thread instead of pinning its carrier thread, so no Thread.sleep() polling or synchronized blocks are needed.
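
A minimal sketch of per-host limiting with java.util.concurrent.Semaphore follows. It assumes the two semaphores are a global concurrency limit and a per-host limit, which is a guess at the design, not something stated above. The class name HostPoolSketch and its methods are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical sketch, not Raclette's HostPool.
final class HostPoolSketch {

    private final Semaphore global;
    private final int perHostLimit;
    private final ConcurrentHashMap<String, Semaphore> perHost = new ConcurrentHashMap<>();

    HostPoolSketch(int globalLimit, int perHostLimit) {
        this.global = new Semaphore(globalLimit);
        this.perHostLimit = perHostLimit;
    }

    // Runs the request while holding one per-host permit and one global permit.
    <T> T run(String host, Supplier<T> request) throws InterruptedException {
        Semaphore hostPermits =
                perHost.computeIfAbsent(host, h -> new Semaphore(perHostLimit));

        // Host permit first, so a request queued behind a busy host
        // does not occupy a global slot while it waits.
        hostPermits.acquire();
        try {
            global.acquire();
            try {
                return request.get();
            } finally {
                global.release();
            }
        } finally {
            hostPermits.release();
        }
    }

    // Permits currently free for a host (the full limit if the host is unseen).
    int availablePermits(String host) {
        Semaphore s = perHost.get(host);
        return s == null ? perHostLimit : s.availablePermits();
    }
}
```

The try/finally nesting releases both permits even when the request throws, so a failing host cannot leak permits and stall later checks.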

Virtual Threads

Raclette is built on Java 21 virtual threads. All blocking operations (HTTP requests, semaphore waits, file I/O) run on virtual threads, giving you lightweight concurrency without callback complexity.
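
To illustrate the style this enables, the sketch below fans a list of inputs out to one virtual thread each and collects the results in order. The class and method names are hypothetical. Only Executors.newVirtualThreadPerTaskExecutor() and the surrounding java.util.concurrent types are standard Java 21 APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper showing plain blocking code on virtual threads.
final class VirtualThreadSketch {

    // Runs check on every input, one virtual thread per input,
    // and returns the results in input order.
    static <T, R> List<R> checkAll(List<T> inputs, Function<T, R> check)
            throws InterruptedException, ExecutionException {
        // close() at the end of the try block waits for submitted tasks.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<R>> futures = new ArrayList<>();
            for (T input : inputs) {
                Callable<R> task = () -> check.apply(input);
                futures.add(executor.submit(task));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> future : futures) {
                // Blocking here is cheap: only the virtual thread waits.
                results.add(future.get());
            }
            return results;
        }
    }
}
```

The check function can block on I/O or a semaphore without any callbacks or reactive operators, because each call has its own inexpensive thread.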

The HtmlExtractor uses JSoup to parse HTML and extract URIs from elements such as <a>, <img>, <link>, <script>, <source>, and <form>, as well as from srcset attributes. Bare email addresses in text content are also detected.
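
The srcset case differs from the others because one attribute can hold several URLs. The sketch below shows one simple way to split such a value, in plain Java without JSoup so that it stays self-contained. The class name SrcsetSketch is hypothetical, and the comma split is a simplification that would mishandle URLs containing commas, such as data: URIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not Raclette's HtmlExtractor.
final class SrcsetSketch {

    // Extracts the URL from each "url descriptor" candidate in a srcset value.
    static List<String> urls(String srcset) {
        List<String> out = new ArrayList<>();
        for (String candidate : srcset.split(",")) {
            String trimmed = candidate.strip();
            if (trimmed.isEmpty()) {
                continue;
            }
            // The URL is the first whitespace-delimited token; the rest is
            // a width or density descriptor such as "480w" or "2x".
            out.add(trimmed.split("\\s+")[0]);
        }
        return out;
    }
}
```

For the value "small.jpg 480w, large.jpg 2x" this yields small.jpg and large.jpg, each of which would then be checked like any other link.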

Quirks

Some websites need special handling to avoid false positives. The quirks system rewrites URLs before checking:
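
One way to picture such a system is as an ordered list of match-and-rewrite rules applied to each URL before it is checked. The sketch below is an assumption about the general shape only. The QuirksSketch and Quirk names are hypothetical, and the video.example rule is invented for illustration, not one of Raclette's actual quirks.

```java
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.regex.Pattern;

// Hypothetical sketch of a URL rewriting step.
final class QuirksSketch {

    // A quirk pairs a pattern with the rewrite to apply when it matches.
    record Quirk(Pattern matches, UnaryOperator<String> rewrite) {}

    private final List<Quirk> quirks;

    QuirksSketch(List<Quirk> quirks) {
        this.quirks = List.copyOf(quirks);
    }

    // Made-up example rule: check an embed endpoint instead of a watch page.
    static QuirksSketch example() {
        return new QuirksSketch(List.of(
                new Quirk(
                        Pattern.compile("^https://video\\.example/watch\\?v="),
                        url -> url.replace("/watch?v=", "/embed/"))));
    }

    // Applies every matching quirk in order; other URLs pass through unchanged.
    String apply(String url) {
        String current = url;
        for (Quirk quirk : quirks) {
            if (quirk.matches().matcher(current).find()) {
                current = quirk.rewrite().apply(current);
            }
        }
        return current;
    }
}
```

Keeping the rules as data makes it easy to add a site-specific workaround without touching the checker itself.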