How does Innumbra discover and index dark web sites?

Innumbra uses a multi-source discovery pipeline with four phases: (1) bulk imports from public directories like Ahmia, (2) automated link crawling that follows links between .onion sites to discover new services, (3) ransomware CTI feed integration from sources like deepdarkCTI and ransomwatch, and (4) user submissions. Each discovered site is periodically re-checked for uptime, and its metadata is extracted, including server software, technology stack, content hashes, and OSINT entities.

What search filters and operators does Innumbra support?

Supported search operators include: status:online or status:offline for uptime filtering, cti:ransomware or cti:marketplace for threat type filtering, tech:nginx or tech:wordpress for technology stack filtering, lang:ru or lang:en for language filtering, and after:2025-01 or before:2026-01 for date range filtering. These can be combined freely with keyword searches. Users can also paste entity values directly — bitcoin addresses, email addresses, PGP fingerprints, or domain names — to find all sites sharing that identifier.

What OSINT entities does Innumbra extract from dark web sites?

Innumbra automatically extracts 17+ types of OSINT entities: Bitcoin addresses, Ethereum addresses, Monero addresses, Dogecoin addresses, Zcash addresses, email addresses, PGP key fingerprints, Jabber/XMPP identifiers, Telegram handles, Tox messenger IDs, I2P eepsite addresses, clearnet domain references, and more. The correlator uses shared entities across sites to detect common operators. Each entity type enables cross-site pivot analysis — searching for one bitcoin address reveals every indexed site that references it.

How does Innumbra detect mirrors and related dark web sites?

Innumbra uses content hashing of normalized page content to detect mirror sites serving identical content across different .onion addresses. It also uses operator correlation via shared entities (same crypto wallets, PGP keys, or email addresses across sites), favicon hash matching, and link topology analysis to identify sites likely operated by the same entity.
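Exact-hash mirror grouping can be sketched in a few lines of Python. The normalization rules (strip tags, collapse whitespace, lowercase), the choice of SHA-256, and the function names are assumptions for illustration; Innumbra's actual normalization is not documented here, and the `.onion` names are placeholders.

```python
import hashlib
import re
from collections import defaultdict


def normalize(html: str) -> str:
    """Assumed normalization: drop tags, collapse whitespace, lowercase."""
    text = re.sub(r"<[^>]+>", " ", html)
    text = re.sub(r"\s+", " ", text)
    return text.strip().lower()


def content_hash(html: str) -> str:
    """SHA-256 of the normalized text; only this digest needs to be kept."""
    return hashlib.sha256(normalize(html).encode("utf-8")).hexdigest()


def group_mirrors(pages: dict[str, str]) -> list[set[str]]:
    """Group .onion addresses whose normalized content hashes match exactly."""
    buckets: dict[str, set[str]] = defaultdict(set)
    for address, html in pages.items():
        buckets[content_hash(html)].add(address)
    # A bucket is a mirror set only if two or more addresses share it.
    return [addrs for addrs in buckets.values() if len(addrs) > 1]


pages = {
    "aaa.onion": "<h1>Shop</h1>  <p>Welcome</p>",
    "bbb.onion": "<h1>shop</h1>\n<p>welcome</p>",
    "ccc.onion": "<h1>Forum</h1>",
}
print(group_mirrors(pages))
```

Hashing the normalized text rather than the raw bytes means cosmetic differences in whitespace or letter case do not break the match, which is why `aaa.onion` and `bbb.onion` land in the same group.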

Innumbra

What Is Innumbra?

Innumbra is a free, metadata-only dark web meta-search engine that indexes .onion hidden services on the Tor network. It is designed for security researchers, journalists, and threat intelligence analysts who need to discover, monitor, and analyze Tor hidden services without accessing their content directly.

Innumbra extracts and indexes: site titles, HTTP server headers, technology stacks (CMS, frameworks, databases), page language, uptime history, content hashes for mirror detection, OSINT entities (cryptocurrency wallets, PGP keys, email addresses, messaging handles), link relationships between sites, warrant canaries, law enforcement seizure banners, and threat intelligence classifications. No actual page content is stored, cached, or proxied.
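Header-based technology fingerprinting can be illustrated with a small signature table. This Python sketch covers only HTTP response headers (not HTML signatures), and the signature table and `fingerprint` function are hypothetical examples, not Innumbra's rule set.

```python
import re
from typing import Optional

# Hypothetical signature table: header name -> (pattern, technology) pairs.
# Capture group 1, when the pattern has one, holds the version string.
HEADER_SIGNATURES = {
    "server": [
        (re.compile(r"nginx(?:/([\d.]+))?", re.I), "nginx"),
        (re.compile(r"apache(?:/([\d.]+))?", re.I), "Apache httpd"),
    ],
    "x-powered-by": [
        (re.compile(r"php(?:/([\d.]+))?", re.I), "PHP"),
        (re.compile(r"express", re.I), "Express"),
    ],
}


def fingerprint(headers: dict[str, str]) -> dict[str, Optional[str]]:
    """Map each detected technology to its exposed version, or None if hidden."""
    # HTTP header names are case-insensitive, so compare them lowercased.
    lowered = {name.lower(): value for name, value in headers.items()}
    found: dict[str, Optional[str]] = {}
    for header, signatures in HEADER_SIGNATURES.items():
        value = lowered.get(header)
        if value is None:
            continue
        for pattern, tech in signatures:
            match = pattern.search(value)
            if match:
                found[tech] = match.group(1) if pattern.groups else None
    return found


print(fingerprint({"Server": "nginx/1.18.0", "X-Powered-By": "PHP/8.1.2"}))
# {'nginx': '1.18.0', 'PHP': '8.1.2'}
```

Recording the version separately from the product name makes it easy to tell a bare `nginx` banner from one that leaks an exact release.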

Last updated: February 27, 2026 · 5,583 services indexed · 1,174 currently online

How to Search the Dark Web With Innumbra

Innumbra supports keyword search, advanced filter operators, and direct entity lookups. You can combine any of these in a single query to narrow results precisely.

Keywords: marketplace bitcoin
Entity search: bc1qxy2k...xyz or user@mail.com
Status filter: status:online status:offline
CTI type: cti:ransomware cti:marketplace
Technology: tech:nginx tech:wordpress
Language: lang:ru lang:de lang:en
Date range: after:2025-01 before:2026-01
Combine freely: forum status:online lang:en
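A query that mixes keywords and operators could be split as in the Python sketch below. It recognizes only the operator names listed above; the `parse_query` function and `ParsedQuery` type are hypothetical, and whether repeated operators (such as two `status:` values) are OR-ed or AND-ed is left open because the page does not say.

```python
from dataclasses import dataclass, field

# Operator names documented above; any other token is treated as a keyword.
OPERATORS = {"status", "cti", "tech", "lang", "after", "before"}


@dataclass
class ParsedQuery:
    keywords: list[str] = field(default_factory=list)
    filters: dict[str, list[str]] = field(default_factory=dict)


def parse_query(query: str) -> ParsedQuery:
    parsed = ParsedQuery()
    for token in query.split():
        key, sep, value = token.partition(":")
        if sep and value and key.lower() in OPERATORS:
            # Repeated operators accumulate; how they combine is up to the caller.
            parsed.filters.setdefault(key.lower(), []).append(value.lower())
        else:
            # Plain keywords and pasted entities (wallets, emails) land here.
            parsed.keywords.append(token)
    return parsed


q = parse_query("forum status:online lang:en")
print(q.keywords, q.filters)  # ['forum'] {'status': ['online'], 'lang': ['en']}
```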

Entity pivot search: Paste any entity value directly into the search bar — a Bitcoin address, email, PGP fingerprint, Jabber ID, or clearnet domain. Innumbra will show every indexed .onion site that references that identifier, enabling cross-site pivot analysis for OSINT investigations.
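At its core, an entity pivot is a lookup in an inverted index from identifier to sites. The `EntityIndex` class below is a hypothetical Python sketch, and the entity values are placeholders. It matches values exactly because Base58 Bitcoin addresses are case-sensitive; a production index would likely normalize per entity type (for example, lowercasing emails).

```python
from collections import defaultdict


class EntityIndex:
    """Hypothetical inverted index: entity value -> .onion sites that mention it."""

    def __init__(self) -> None:
        self._sites_by_value: dict[str, set[str]] = defaultdict(set)

    def add(self, site: str, values: list[str]) -> None:
        for value in values:
            # Exact match only: lowercasing would corrupt Base58 addresses.
            self._sites_by_value[value.strip()].add(site)

    def pivot(self, value: str) -> list[str]:
        """Every indexed site that references `value`, sorted for stable output."""
        return sorted(self._sites_by_value.get(value.strip(), set()))


index = EntityIndex()
index.add("market-a.onion", ["ops@example.com", "wallet-placeholder-1"])
index.add("market-b.onion", ["wallet-placeholder-1"])
print(index.pivot("wallet-placeholder-1"))  # ['market-a.onion', 'market-b.onion']
```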

What OSINT Entities Does Innumbra Extract?

The automated crawler extracts 17+ categories of identifiers from every indexed page. These entities enable cross-referencing between sites — for example, searching a single Bitcoin wallet address reveals every .onion service that shares it, a technique used in threat intelligence and law enforcement investigations.

💰 bitcoin — BTC wallet addresses

💎 ethereum — ETH wallet addresses

🔑 monero — XMR wallet addresses

🐕 dogecoin — DOGE addresses

🛡 zcash — ZEC transparent + shielded

📧 email — Email addresses

🔐 pgp_fingerprint — PGP key fingerprints

💬 jabber — Jabber/XMPP IDs

✈ telegram — Telegram handles

📞 tox_id — Tox messenger IDs

🌐 clearnet_ref — Clearnet dependencies

🧅 onion_link — Linked .onion services

🔗 i2p_address — I2P eepsite addresses

How Does Innumbra Discover Dark Web Sites?

Innumbra uses a four-phase discovery pipeline that combines public intelligence feeds, automated crawling, and community submissions:

Public directory imports — Bulk ingestion from Ahmia's hidden service index (18,000+ addresses), with CSAM blocklist filtering applied before any address enters the database.
Threat intelligence feeds — Automated imports from deepdarkCTI, ransomwatch, RansomLook, and curated CISA advisory sources covering ransomware data leak sites, dark web marketplaces, and forums.
Automated link crawling — The crawler follows outbound links from indexed sites to discover new .onion addresses, building a link graph that maps relationships between hidden services.
User submissions — Community-submitted .onion addresses are queued for verification and indexing after blocklist screening.
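Whatever the source, each candidate address needs the same screening before it reaches the database. The Python sketch below validates v3 onion addresses using the checksum defined in Tor's rend-spec-v3, then applies a blocklist and deduplicates. Several things are assumptions: the blocklist is modeled as a set of lowercase onion hostnames (the real Ahmia list's format may differ), the sketch accepts only v3 addresses, and `ingest` and the other names are hypothetical.

```python
import base64
import binascii
import hashlib


def onion_checksum(pubkey: bytes, version: bytes = b"\x03") -> bytes:
    # Tor rend-spec-v3: CHECKSUM = SHA3-256(".onion checksum" | PUBKEY | VERSION)[:2]
    return hashlib.sha3_256(b".onion checksum" + pubkey + version).digest()[:2]


def encode_v3(pubkey: bytes) -> str:
    """Build a v3 address from a 32-byte ed25519 public key (used here for testing)."""
    raw = pubkey + onion_checksum(pubkey) + b"\x03"
    return base64.b32encode(raw).decode("ascii").lower() + ".onion"


def is_valid_v3(address: str) -> bool:
    """Check length, base32 alphabet, version byte, and checksum of a v3 address."""
    host = address.lower().removesuffix(".onion")
    if len(host) != 56:
        return False
    try:
        raw = base64.b32decode(host.upper())
    except (binascii.Error, ValueError):
        return False
    pubkey, checksum, version = raw[:32], raw[32:34], raw[34:]
    return version == b"\x03" and checksum == onion_checksum(pubkey)


def ingest(candidates: list[str], blocklist: set[str], index: set[str]) -> list[str]:
    """Screen candidates from any source; return the addresses newly added."""
    added = []
    for candidate in candidates:
        address = candidate.strip().lower()
        if not is_valid_v3(address):
            continue  # malformed, truncated, or legacy v2 address
        if address in blocklist:
            continue  # screened out before it can enter the index
        if address in index:
            continue  # already known from another source
        index.add(address)
        added.append(address)
    return added


index: set[str] = set()
addr = encode_v3(bytes(range(32)))  # placeholder key, not a real service
print(ingest([addr, addr.upper(), "not-an-onion.onion"], set(), index) == [addr])  # True
```

Checking the checksum rather than just the length catches typos and truncated addresses scraped from directories, and running the blocklist test before the deduplication check means a blocked address is rejected no matter which feed supplied it.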

Every indexed site undergoes periodic status checks with retry logic (3 attempts per check, fresh Tor circuit per attempt). Sites confirmed offline across 5 consecutive check cycles are pruned from the active index. Technology fingerprinting, entity extraction, content hashing, and operator correlation run as scheduled background tasks.
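The retry and pruning rules above (3 attempts per check, pruning after 5 consecutive offline cycles) fit a small state machine. In this Python sketch, `probe` is a hypothetical stand-in for a request sent over a fresh Tor circuit; no actual Tor client code is shown, and the other names are also illustrative.

```python
from dataclasses import dataclass
from typing import Callable

ATTEMPTS_PER_CHECK = 3   # from the text: 3 attempts per check
PRUNE_AFTER_CYCLES = 5   # from the text: 5 consecutive offline cycles


@dataclass
class SiteStatus:
    address: str
    consecutive_offline: int = 0
    active: bool = True


def run_check(site: SiteStatus, probe: Callable[[str, int], bool]) -> bool:
    """One check cycle. `probe(address, attempt)` stands in for a request made
    over a fresh Tor circuit; it returns True if the service answered."""
    # any() stops at the first successful attempt, so retries end early.
    online = any(probe(site.address, attempt) for attempt in range(ATTEMPTS_PER_CHECK))
    if online:
        site.consecutive_offline = 0
    else:
        site.consecutive_offline += 1
        if site.consecutive_offline >= PRUNE_AFTER_CYCLES:
            site.active = False  # pruned from the active index
    return online
```

Requiring several consecutive failed cycles, each with its own retries, keeps a site from being pruned because of a single slow or broken circuit.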

What Analysis Features Does Innumbra Provide?

Feature	Description
Link graph visualization	Interactive force-directed graph showing how .onion sites link to each other, revealing clusters and hub sites.
Mirror detection	Content hashing identifies sites serving identical content across different .onion addresses.
Operator correlation	Identifies sites likely run by the same entity using shared crypto wallets, PGP keys, favicon hashes, and link topology.
Warrant canary tracking	Detects and monitors PGP-signed warrant canary statements across indexed sites.
Seizure detection	Identifies law enforcement seizure banners in 12 languages from 13+ agencies (FBI, Europol, BKA, etc.).
Technology fingerprinting	Identifies web servers, CMS platforms, frameworks, programming languages, and database technologies from HTTP headers and HTML signatures.
Change detection	Tracks title changes, status transitions (online↔offline), and content modifications with timestamped history.
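The "clusters and hub sites" that the link graph reveals can be approximated with two simple measures: in-degree for hubs, and weakly connected components for clusters. The Python sketch below is a stand-in, since the page does not describe Innumbra's clustering method. The function names are hypothetical and the site names are placeholders.

```python
from collections import defaultdict


def build_graph(links: dict[str, list[str]]) -> dict[str, set[str]]:
    """Directed graph: source .onion -> set of .onion addresses it links to."""
    graph: dict[str, set[str]] = defaultdict(set)
    for src, targets in links.items():
        graph[src].update(t for t in targets if t != src)  # ignore self-links
    return graph


def hubs(graph: dict[str, set[str]], top_n: int = 3) -> list[str]:
    """Sites with the most inbound links, ties broken alphabetically."""
    in_degree: dict[str, int] = defaultdict(int)
    for targets in graph.values():
        for dst in targets:
            in_degree[dst] += 1
    return sorted(in_degree, key=lambda site: (-in_degree[site], site))[:top_n]


def clusters(graph: dict[str, set[str]]) -> list[set[str]]:
    """Weakly connected components, treating every link as undirected."""
    undirected: dict[str, set[str]] = {}
    for src, targets in graph.items():
        undirected.setdefault(src, set())
        for dst in targets:
            undirected[src].add(dst)
            undirected.setdefault(dst, set()).add(src)
    seen: set[str] = set()
    components: list[set[str]] = []
    for start in undirected:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(undirected[node] - seen)
        components.append(component)
    return components
```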

Technical Deep Dive

How Does Operator Correlation Assign Confidence Scores?

Innumbra's operator correlator uses a proprietary probabilistic model to combine independent signals into a single confidence score. Each signal type has a base weight reflecting its discriminative power:

Signal	Base weight	Rationale
PGP key fingerprints	0.98	Nearly unique per operator
Email addresses	0.95	Strong identity signal
Bitcoin addresses	0.90	Payment infrastructure
Content hash	0.85	Identical page content
Server fingerprint combo	0.75	Same server + framework + language
Favicon hash	0.70	Same visual identity
Fuzzy content hash	0.65	Near-duplicate content

Default/common values (e.g., "admin@localhost", standard Apache error pages) are blocklisted to prevent false positives. Independent signal probabilities are combined into a single composite confidence score.
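The page calls the combining model proprietary. One standard way to merge independent probabilities is noisy-OR, where the combined confidence is 1 − ∏(1 − wᵢ), and the Python sketch below uses it as an assumed stand-in. The weights come from the table above, while the signal keys, the `COMMON_VALUES` blocklist contents, and the function name are hypothetical.

```python
from math import prod

# Base weights from the table above.
SIGNAL_WEIGHTS = {
    "pgp_fingerprint": 0.98,
    "email": 0.95,
    "bitcoin": 0.90,
    "content_hash": 0.85,
    "server_fingerprint": 0.75,
    "favicon_hash": 0.70,
    "fuzzy_content_hash": 0.65,
}

# Hypothetical blocklist of default values that must never count as evidence.
COMMON_VALUES = {"admin@localhost"}


def combined_confidence(shared: dict[str, str]) -> float:
    """Noisy-OR over independent signals: 1 - prod(1 - w_i).
    `shared` maps a signal type to the value two sites have in common."""
    weights = [
        SIGNAL_WEIGHTS[signal]
        for signal, value in shared.items()
        if signal in SIGNAL_WEIGHTS and value not in COMMON_VALUES
    ]
    return 1.0 - prod(1.0 - w for w in weights)


print(round(combined_confidence({"favicon_hash": "f1", "server_fingerprint": "nginx+php+en"}), 3))
# 0.925
```

Under noisy-OR, two moderate signals (0.70 and 0.75) together yield about 0.925, which is higher than either signal alone. A blocklisted value contributes nothing, so a pair of sites sharing only `admin@localhost` scores 0.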

What Entity Types Does the Crawler Extract?

The entity extraction pipeline uses pattern matching optimized for each identifier format. Current coverage includes 17+ entity types across 5 categories:

Cryptocurrency: Bitcoin (1/3/bc1 prefixes), Ethereum (0x, 40 hex), Monero (4/8 prefix, 95 chars), Dogecoin (D prefix), Zcash (t1/zs transparent + shielded)
Communication: Email addresses, Jabber/XMPP JIDs, Telegram handles (@username), Tox IDs (64 hex chars)
Cryptographic: PGP key fingerprints (40 hex chars, space-separated groups)
Network: .onion links (v2 16-char and v3 56-char), I2P eepsite addresses (.i2p), clearnet domain references
Image-derived (OCR): All of the above extracted from images via automated OCR, catching entities hidden in screenshots and image-based text
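The pattern-matching step can be approximated with one regular expression per identifier format. The Python patterns below are illustrative approximations built from the format notes above, not Innumbra's actual patterns. They cover only a subset of the types, and they do not verify checksums, so a real pipeline would still need a validation step to cut false positives.

```python
import re

B58 = "1-9A-HJ-NP-Za-km-z"   # Base58 alphabet (no 0, O, I, l)
BECH32 = "ac-hj-np-z02-9"    # Bech32 data charset (no 1, b, i, o)

# Illustrative patterns derived from the format notes above; no checksums verified.
PATTERNS = {
    "bitcoin": re.compile(rf"\b(?:[13][{B58}]{{25,34}}|bc1[{BECH32}]{{39,59}})\b"),
    "ethereum": re.compile(r"\b0x[0-9a-fA-F]{40}\b"),
    "monero": re.compile(rf"\b[48][{B58}]{{94}}\b"),
    "dogecoin": re.compile(rf"\bD[{B58}]{{33}}\b"),
    "zcash": re.compile(rf"\b(?:t1[{B58}]{{33}}|zs1[{BECH32}]{{75}})\b"),
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "pgp_fingerprint": re.compile(r"\b(?:[0-9A-F]{4}\s{1,2}){9}[0-9A-F]{4}\b"),
    "onion_link": re.compile(r"\b(?:[a-z2-7]{56}|[a-z2-7]{16})\.onion\b"),
}


def extract_entities(text: str) -> dict[str, list[str]]:
    """Return {entity_type: sorted unique matches} for every type that matched."""
    found = {}
    for entity_type, pattern in PATTERNS.items():
        matches = sorted(set(pattern.findall(text)))
        if matches:
            found[entity_type] = matches
    return found
```

Every group in these patterns is non-capturing, so `findall` returns whole matches, and each pattern is anchored with `\b` so it cannot match a fragment of a longer token.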

How Is the OPSEC Score Calculated?

Each site starts at a base score of 50/100. The scoring engine applies penalties and rewards based on observable indicators:

Penalties
IP address in headers	-25
Clearnet domain references	-20
Server version exposed	-15
Framework/runtime exposed	-10
Insecure cookies (no Secure/HttpOnly)	-10
Each missing security header	-5
Rewards
Content-Security-Policy present	+8
X-Content-Type-Options present	+5
Secure + HttpOnly cookies	+5

Score ranges: 0–29 (poor), 30–69 (moderate), 70–100 (strong). The OPSEC badge appears on every site dossier page.
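The scoring rules translate directly into arithmetic on the base score of 50. The page does not say which headers count as "security headers" or whether the score is clamped, so the Python sketch below assumes a four-header list and a clamp to 0 to 100; the `Observations` fields and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

# Assumed list of "security headers"; the text does not enumerate them.
SECURITY_HEADERS = (
    "content-security-policy",
    "x-content-type-options",
    "x-frame-options",
    "referrer-policy",
)


@dataclass
class Observations:
    ip_in_headers: bool = False
    clearnet_refs: bool = False
    server_version_exposed: bool = False
    framework_exposed: bool = False
    secure_cookies: Optional[bool] = None  # None: the site sets no cookies
    headers_present: set[str] = field(default_factory=set)  # lowercase names


def opsec_score(obs: Observations) -> int:
    score = 50  # base score from the text
    # Penalties
    score -= 25 if obs.ip_in_headers else 0
    score -= 20 if obs.clearnet_refs else 0
    score -= 15 if obs.server_version_exposed else 0
    score -= 10 if obs.framework_exposed else 0
    if obs.secure_cookies is False:
        score -= 10
    score -= 5 * sum(h not in obs.headers_present for h in SECURITY_HEADERS)
    # Rewards
    score += 8 if "content-security-policy" in obs.headers_present else 0
    score += 5 if "x-content-type-options" in obs.headers_present else 0
    if obs.secure_cookies is True:
        score += 5
    return max(0, min(100, score))  # clamping to 0-100 is assumed


def band(score: int) -> str:
    if score < 30:
        return "poor"
    return "moderate" if score < 70 else "strong"
```

Under these assumptions, a site with every listed reward and no penalties scores 68. Reaching the strong band therefore presumably depends on indicators that the table does not list.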


Frequently Asked Questions

What is Innumbra and who is it for?

Innumbra is a free dark web meta-search engine that indexes .onion hidden services on the Tor network. It is built for OSINT (Open Source Intelligence) professionals, cybersecurity researchers, journalists investigating dark web activity, and threat intelligence analysts. Unlike content search engines, Innumbra indexes only metadata — titles, technology stacks, uptime status, extracted entities, and structural relationships — without storing any page content.

How is Innumbra different from other dark web search engines?

Most dark web search engines index page content for full-text search. Innumbra takes a different approach: it indexes metadata only, focusing on structural intelligence. This includes technology fingerprinting (identifying web servers, CMS platforms, and frameworks), entity extraction (cryptocurrency wallets, PGP keys, email addresses), operator correlation (linking sites to the same operator via shared identifiers), mirror detection (identifying duplicate sites via content hashing), and uptime monitoring with historical tracking.

What data sources does Innumbra use?

Innumbra aggregates from multiple intelligence sources: Ahmia's public hidden service index, deepdarkCTI threat feeds (markets, forums, ransomware groups), ransomwatch and RansomLook ransomware tracking APIs, curated onion directories, curated CTI repositories, clearnet aggregator sites, and automated crawl discovery. All imported addresses are screened against the Ahmia CSAM blocklist before indexing.

Does Innumbra store or cache dark web page content?

No. Innumbra is strictly a metadata-only index. It stores site titles, HTTP response headers, technology fingerprints, uptime history, content hashes (for mirror detection), and extracted entity identifiers. It does not store, cache, proxy, or reproduce any actual page content from .onion hidden services.

How does operator correlation work?

Operator correlation identifies .onion sites likely operated by the same person or group. Innumbra analyzes multiple signals with weighted confidence scores: shared PGP key fingerprints (0.98), shared email addresses (0.95), shared cryptocurrency wallet addresses (0.90), identical content hashes (0.85), matching server fingerprints (0.75), matching favicon image hashes (0.70), near-duplicate fuzzy content hashes (0.65), and link topology patterns. Sites exceeding the confidence threshold are grouped into operator clusters.
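Grouping pairs above a threshold into clusters is a natural fit for union-find. The page does not state the threshold value, so the 0.9 default in this Python sketch is a placeholder. The function name and the site names are also hypothetical.

```python
from collections import defaultdict


def operator_clusters(
    pair_scores: dict[tuple[str, str], float], threshold: float = 0.9
) -> list[set[str]]:
    """Union-find over site pairs whose combined confidence exceeds `threshold`."""
    parent: dict[str, str] = {}

    def find(site: str) -> str:
        parent.setdefault(site, site)
        while parent[site] != site:
            parent[site] = parent[parent[site]]  # path halving
            site = parent[site]
        return site

    for (a, b), score in pair_scores.items():
        if score > threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    # Only sites from above-threshold pairs ever enter `parent`,
    # so every resulting group has at least two members.
    groups: dict[str, set[str]] = defaultdict(set)
    for site in list(parent):
        groups[find(site)].add(site)
    return list(groups.values())


print(operator_clusters({("a.onion", "b.onion"): 0.98, ("b.onion", "c.onion"): 0.925}))
```

Union-find makes clustering transitive. If A matches B and B matches C, all three end up in one cluster even when A and C share no signal directly. This is useful for mapping operator infrastructure, but one weak link can also merge two unrelated groups.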

Is Innumbra free to use?

Yes. Innumbra is completely free to use with no accounts, rate limits, or paywalls. The search engine, entity explorer, link graph visualization, site comparison tool, and all analysis features are openly accessible. The platform also provides an API, RSS feeds, an llms.txt file for AI systems, and a sitemap for search engine indexing.

How does Innumbra detect mirror sites and clones?

Innumbra uses two complementary methods for mirror detection. Exact content hashing fingerprints each page, and two sites with identical hashes are confirmed mirrors. Fuzzy hashing detects near-duplicates even when sites differ slightly (e.g., different headers but the same body content). Sites sharing the same content hash are automatically grouped on the mirror detection page. Near-duplicate detection runs efficiently across the entire index.
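The page does not say which fuzzy hash Innumbra uses (ssdeep and TLSH are common choices). To show the near-duplicate idea, this Python sketch substitutes a simpler technique: Jaccard similarity over word 3-shingles. The 0.8 default threshold and the function names are assumptions. The pairwise loop is quadratic, and at index scale something like MinHash with locality-sensitive hashing, or a real fuzzy hash, would be needed to avoid comparing every pair.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Set of overlapping k-word sequences from lowercased text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(pages: dict[str, str], threshold: float = 0.8) -> list[tuple[str, str, float]]:
    """All address pairs whose shingle sets overlap at least `threshold` (O(n^2))."""
    sigs = {addr: shingles(body) for addr, body in pages.items()}
    addrs = sorted(sigs)
    pairs = []
    for i, a in enumerate(addrs):
        for b in addrs[i + 1:]:
            score = jaccard(sigs[a], sigs[b])
            if score >= threshold:
                pairs.append((a, b, round(score, 3)))
    return pairs
```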

What is an OPSEC score and how is it calculated?

Innumbra rates each site's operational security on a 0–100 scale. The score starts at 50 and applies penalties for security failures: exposing server software versions (-15), clearnet domain references (-20), leaking IP addresses in headers (-25), missing security headers (-5 each), insecure cookies (-10), and exposing framework details (-10). Rewards are given for good practices: security headers like Content-Security-Policy (+8), X-Content-Type-Options (+5), and proper cookie flags (+5). Sites with scores below 30 are considered high-risk; scores of 70 or above indicate strong operational security.

Can I use Innumbra to trace a Bitcoin address across multiple dark web sites?

Yes. Innumbra extracts cryptocurrency wallet addresses (Bitcoin, Ethereum, Monero, Dogecoin, Zcash) from every indexed page. Paste any wallet address directly into the search bar to find every .onion site that references it. This entity pivot search is one of the most powerful features for threat intelligence and financial investigations — if the same Bitcoin address appears on three different dark web markets, those services may share an operator or payment processor. The entity explorer shows all extracted entities with cross-site relationships.

How does Innumbra handle illegal content and CSAM?

Innumbra applies the Ahmia CSAM blocklist to every address before it enters the index. Known child abuse sites are permanently excluded at the import stage. Innumbra does not store, cache, or proxy any page content — it indexes only metadata (titles, headers, technology fingerprints, entity identifiers). Users can report sites for removal via the content removal page. The platform is designed exclusively for legitimate OSINT research, journalism, and threat intelligence work.

Innumbra — Dark Web Meta-Search Engine for OSINT Research