How We Got Here
Thousands of improvements. Every one driven by a real document that deserved better protection.
Day One
It started with a simple question: what if you could strip sensitive data from a document before it ever left your desk? We picked a country, picked a document type, and started writing the rules. We had no idea how deep this would go.
Not Good Enough
Five days in, we ran a real insurance claim through the system. It found the email address and the phone number. It missed the policy number, the claim reference, and three people’s names. So we tore the whole thing apart and started over. The second version caught what the first one couldn’t.
Nothing Leaves Your Machine
Most redaction tools send your documents to a server somewhere. We refused. We moved everything into the browser itself. Your files never leave your computer. Not to us, not to anyone. This was the moment the product became what it is.
50 Countries. One Tool.
A Tax File Number in Australia looks nothing like a Social Security number in the US or an NHS number in the UK. Every country has its own ID formats, its own regulations, its own edge cases. We built recognition for 50+ of them, because your documents don’t stay inside one border.
How Sure Is Sure Enough?
Is "123-456-789" a phone number, a case reference, or a government ID? A blunt tool just highlights everything and leaves you to sort it out. We added confidence scoring. Now the system tells you how certain it is about each detection, so you can make the call. Your judgement, informed by ours.
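Confidence scoring can be pictured as each detection carrying a score that decides how it is surfaced for review. This is a minimal sketch, assuming a hypothetical `Detection` record, made-up scores, and illustrative thresholds of 0.9 and 0.5. The product's actual scoring model and cut-offs aren't described here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Detection:
    """Hypothetical detection record: matched text, proposed label, confidence in [0, 1]."""
    text: str
    label: str
    confidence: float


def triage(
    detections: List[Detection],
    high_threshold: float = 0.9,    # illustrative value, not the product's
    review_threshold: float = 0.5,  # illustrative value, not the product's
) -> Dict[str, List[Detection]]:
    """Bucket detections by confidence so uncertain ones stand out to the reviewer."""
    buckets: Dict[str, List[Detection]] = {"high": [], "review": [], "low": []}
    for d in detections:
        if d.confidence >= high_threshold:
            buckets["high"].append(d)
        elif d.confidence >= review_threshold:
            buckets["review"].append(d)
        else:
            buckets["low"].append(d)
    return buckets


# One string, several competing interpretations, each with its own (made-up) score.
candidates = [
    Detection("123-456-789", "government_id", 0.62),
    Detection("123-456-789", "case_reference", 0.31),
    Detection("123-456-789", "phone_number", 0.07),
]
best = max(candidates, key=lambda d: d.confidence)
buckets = triage(candidates)
```

Nothing is removed automatically in this sketch; the buckets only decide what the reviewer sees first, which keeps the final call with the person reading the document.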
Teaching It What to Ignore
The system was catching too much. Street addresses in the middle of paragraphs. Common words flagged as names. "Victoria" the state, not "Victoria" the person. We spent weeks on over a thousand targeted improvements, teaching it the difference between sensitive data and ordinary text. We broke it twice along the way. Worth it.
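One simple way to separate "Victoria" the state from "Victoria" the person is to look at the words around it. The sketch below is a toy context-window heuristic with small, hand-picked cue lists. The names `PERSON_CUES`, `PLACE_CUES`, `classify_ambiguous` and `find_ambiguous` are hypothetical, and a real system would lean on a far richer contextual model than two word lists.

```python
import re
from typing import List, Tuple

# Hand-picked cue words, for illustration only.
PERSON_CUES = {"dear", "mr", "mrs", "ms", "dr", "claimant", "signed"}
PLACE_CUES = {"in", "across", "state", "regional"}
AMBIGUOUS = {"victoria"}


def classify_ambiguous(tokens: List[str], i: int, window: int = 2) -> str:
    """Label tokens[i] as 'person', 'place' or 'unknown' by counting cue words
    within `window` tokens on either side of it."""
    neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
    context = {t.lower() for t in neighbours}
    person_votes = len(context & PERSON_CUES)
    place_votes = len(context & PLACE_CUES)
    if person_votes > place_votes:
        return "person"
    if place_votes > person_votes:
        return "place"
    return "unknown"  # ties and no cues: leave it for a human


def find_ambiguous(text: str) -> List[Tuple[str, str]]:
    """Return (word, guessed_kind) for each ambiguous word found in the text."""
    tokens = re.findall(r"[A-Za-z]+", text)
    return [(tok, classify_ambiguous(tokens, i))
            for i, tok in enumerate(tokens) if tok.lower() in AMBIGUOUS]


print(find_ambiguous("Dear Victoria, thank you for your claim."))
print(find_ambiguous("Our office in Victoria handles claims."))
```

Ties fall back to "unknown", which is exactly the kind of case that belongs with a reviewer rather than being silently redacted or silently ignored.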
Knowing, Not Guessing
A random 9-digit number is just a number. But a real Tax File Number follows a specific mathematical formula. We built validators that check whether an ID is structurally genuine, not just whether it looks like one. The difference matters. This single change cut false flags in half.
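The Tax File Number check shows what "structurally genuine" means in practice. A 9-digit TFN is valid when its digits, multiplied by the weights 1, 4, 3, 7, 5, 8, 6, 9, 10, sum to a multiple of 11. This is a minimal sketch of that published check, not the product's own validator. It ignores spaces and hyphens, and older 8-digit TFNs are out of scope.

```python
TFN_WEIGHTS = (1, 4, 3, 7, 5, 8, 6, 9, 10)


def is_valid_tfn(value: str) -> bool:
    """Return True if `value` is a structurally valid 9-digit Australian TFN.

    Spaces and hyphens are ignored; any other non-ASCII-digit character fails.
    Older 8-digit TFNs are not handled by this sketch.
    """
    cleaned = value.replace(" ", "").replace("-", "")
    if len(cleaned) != 9 or not (cleaned.isascii() and cleaned.isdigit()):
        return False
    # The weighted digit sum must be divisible by 11.
    total = sum(int(d) * w for d, w in zip(cleaned, TFN_WEIGHTS))
    return total % 11 == 0


print(is_valid_tfn("123 456 782"))  # True: passes the checksum
print(is_valid_tfn("123 456 789"))  # False: looks right, fails the checksum
```

Roughly one in eleven random 9-digit numbers will still pass, so a checksum proves an ID is well-formed rather than that it is real. That is why it works best alongside context, not instead of it.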
8 Layers of Verification
A single scan catches maybe 40% of the sensitive data in a document. That’s not good enough when your client’s Medicare number is on the line. So we built eight verification layers. Each one catches what the last one missed. By the eighth pass, what’s flagged is real, and what’s clean is actually clean.
3,000 Things It Recognises
Medicare numbers. BSBs. ABNs. Passport numbers. Licence plates. Medical record IDs. Insurance claim references. Internal case numbers. Each one researched against real documents, each one validated. The library quietly crossed 3,000 recognised data types.
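ABNs are another identifier with a published checksum: subtract 1 from the first digit, multiply the 11 digits by the weights 10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, and the total must be divisible by 89. A minimal sketch of that rule, again separate from the product's own validators:

```python
ABN_WEIGHTS = (10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19)


def is_valid_abn(value: str) -> bool:
    """Return True if `value` is a structurally valid 11-digit Australian Business Number."""
    cleaned = value.replace(" ", "")
    if len(cleaned) != 11 or not (cleaned.isascii() and cleaned.isdigit()):
        return False
    digits = [int(c) for c in cleaned]
    digits[0] -= 1  # the ABN scheme subtracts 1 from the leading digit
    total = sum(d * w for d, w in zip(digits, ABN_WEIGHTS))
    return total % 89 == 0


print(is_valid_abn("51 824 753 556"))  # True
print(is_valid_abn("51 824 753 557"))  # False: last digit changed
```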
UK. Canada. New Zealand. Ireland.
An Australian medical referral is not a British one. A Canadian tax return has different identifiers than a New Zealand one. We added dedicated country configurations, each one tested against real document formats from that jurisdiction. Not textbook examples. The real thing.
Starting Over. Again.
Good enough wasn’t good enough. We redesigned the entire detection system from scratch. Industry-specific configurations. Smarter analysis. A completely new approach. The fourth major version. The one that finally works the way documents actually work.
29 Industry-Specific Engines
An insurance claim has different sensitive data than a legal contract. A medical referral has nothing in common with a construction invoice. We built 29 industry-specific engines: legal, healthcare, government, finance, HR, insurance, and more. Each one tuned for exactly the documents your team handles every day.
Back to the Drawing Board. One Last Time.
We tore the whole thing apart again. Five detection layers, each doing one thing ruthlessly well: three rule-based layers for speed and precision, a contextual recognition layer for the patterns rules can’t catch, and a final layer of 29 industry-trained engines, each scoring above 0.997 F1. 600+ entity types. This is the architecture we were always building toward.
Why This Exists
We watched a colleague paste production server logs straight into ChatGPT. They needed help writing a SQL query. Reasonable enough, except those logs contained customer names, IP addresses, and database credentials. All of it, sent to a cloud server, stored who-knows-where.
The next week, a director composed a customer response using AI. The entire complaint email (name, account number, billing address, the whole thing) pasted in as context. They weren't being reckless. They were trying to do their job faster, and the AI was genuinely good at it.
That's the thing. They weren't careless. They needed AI to help with real work, and real work contains real data. There was no practical way to strip the sensitive parts first. We come from infrastructure. We know exactly where that data goes.
There had to be a better way. One that runs entirely in the browser. One where nothing, not a single character, ever leaves your machine.
The First Attempt
The first commit. Country selection, document type patterns, and a detection engine that could find phone numbers and email addresses. It was basic. It was slow. But it worked. Kind of.
The first real test was humbling. We ran a genuine insurance claim through the engine. It found the obvious things: the email, the phone number. It completely missed the policy number, the claim reference, the BSB, the Medicare number, and three people's names. Pattern matching alone catches maybe 40% of the sensitive data in a real document. The other 60% blends in, hiding in plain text where pattern matching will never find it.
Two days later, we ran a legal contract through the updated engine and it flagged a Tax File Number that would have gone straight into an AI prompt. That was the first moment we knew this mattered. Not because the technology was impressive. Because the absence of this tool was genuinely dangerous.
The Hard Problems
Every breakthrough was preceded by a wall. These are the questions that kept us up at night, and the ones that, eventually, made the product what it is.
How do you detect a person's name without sending it to a server?
How do you tell the difference between a random 9-digit number and a Tax File Number?
How do you handle 50+ countries, each with their own formats?
How do you stop false positives from making the tool useless?
How do you make a detection engine that understands insurance documents differently from legal documents?
How do you handle the things no engine can predict?
What We Believe
Nothing leaves your machine.
Not the document, not the detections, not the metadata. Everything runs in your browser. We don't have a server to send data to even if we wanted to.
Detection without compromise.
A single detection pass is guesswork. Layered verification is rigour. We chose rigour.
Your industry, your rules.
A legal firm and a hospital handle completely different data. The detection engine should know that.
The engine proposes.
Every detection is surfaced for review. Accept it, reject it, reclassify it, or add your own. Manual redaction for anything the engine misses. Custom patterns for your business-specific identifiers. Your document, your rules.
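Custom patterns for business-specific identifiers can be as simple as a labelled regular expression. This sketch assumes a made-up claim reference format, `CLM-` followed by a four-digit year and a six-digit sequence. The `CustomPattern` type and `apply_custom_patterns` function are hypothetical, not the product's configuration format.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CustomPattern:
    """Hypothetical user-defined rule: a label plus a regular expression."""
    label: str
    regex: str


def apply_custom_patterns(text: str, patterns: List[CustomPattern]) -> List[Tuple[str, str, int, int]]:
    """Return (label, matched_text, start, end) for every match, ordered by position."""
    hits = []
    for p in patterns:
        for m in re.finditer(p.regex, text):
            hits.append((p.label, m.group(0), m.start(), m.end()))
    return sorted(hits, key=lambda h: h[2])


# Invented format for illustration: CLM-YYYY-NNNNNN
claim_ref = CustomPattern("claim_reference", r"\bCLM-\d{4}-\d{6}\b")
print(apply_custom_patterns("Re: CLM-2024-000123, lodged 3 May.", [claim_ref]))
```

Matches from custom patterns would land in the same review list as every other detection, so they can be accepted, rejected or reclassified in the same way.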
The Final Iteration
We went back to the drawing board. Again. Because close enough isn't good enough when you're handling someone's most sensitive data.
The previous architecture worked. It caught most things. But “most things” is a dangerous phrase when the thing you missed is someone's Medicare number sitting in paragraph four of an insurance claim. We kept asking the same question: is this the best we can do? And the answer kept being no.
So we scrapped the single-pass approach entirely and rebuilt around a layered screening system. Each checkpoint does one thing, and does it well. The checkpoints don't compete; they compound. What one misses, the next catches.
This is not another pivot. This is the architecture we were always building toward. Every false positive we fought, every edge case we debugged, every domain we studied: it all led here.
The first three checkpoints are pure rules: format validators, checksum validation, and jurisdiction-specific registries. They run instantly. They don't guess. If a nine-digit number passes the Tax File Number checksum, it's structurally valid; if it fails, it isn't a TFN. No ambiguity.
Checkpoint four handles the grey area: the names, the addresses, the entities that only make sense in context. “Victoria” the state versus “Victoria” the person. 600+ entity types that format rules alone can't resolve.
Checkpoint five is where it all comes together. Twenty-nine industry screening bundles, each built from real documents in its field. Healthcare. Legal. Construction. Finance. Not a general-purpose tool hoping for the best, but twenty-nine specialists, each one built for its domain.
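The compounding idea behind the checkpoints can be sketched as a list of independent detectors whose findings are merged, so anything one misses can still be caught by the next. Everything here is a hypothetical stand-in: three toy regex checkpoints, including an invented `POL` policy-number format playing the part of an industry layer, take the place of the real validators, contextual layer and industry bundles, which aren't described in enough detail to reproduce.

```python
import re
from typing import Callable, Dict, List, Tuple

Span = Tuple[int, int, str]  # (start, end, label)
Checkpoint = Callable[[str], List[Span]]


def regex_checkpoint(pattern: str, label: str) -> Checkpoint:
    """Build a toy checkpoint that reports every regex match with a fixed label."""
    compiled = re.compile(pattern)

    def check(text: str) -> List[Span]:
        return [(m.start(), m.end(), label) for m in compiled.finditer(text)]

    return check


# Hypothetical stand-ins for real checkpoints.
email_check = regex_checkpoint(r"[\w.+-]+@[\w-]+\.[\w.]+", "email")
phone_check = regex_checkpoint(r"\b04\d{2} ?\d{3} ?\d{3}\b", "phone")
policy_check = regex_checkpoint(r"\bPOL\d{6}\b", "policy_number")  # invented format


def run_checkpoints(text: str, checkpoints: List[Checkpoint]) -> List[Span]:
    """Run every checkpoint and merge the results.

    A span flagged by any checkpoint is kept. If two checkpoints flag the exact
    same span, the earlier checkpoint's label wins. Overlapping but different
    spans are all kept, for a reviewer to resolve.
    """
    found: Dict[Tuple[int, int], str] = {}
    for check in checkpoints:
        for start, end, label in check(text):
            found.setdefault((start, end), label)
    return sorted((start, end, label) for (start, end), label in found.items())


text = "Email jo@example.com or call 0412 345 678 about POL123456."
print(len(run_checkpoints(text, [email_check])))                             # 1
print(len(run_checkpoints(text, [email_check, phone_check, policy_check])))  # 3
```

Each added checkpoint can only add findings, never remove them, which is the property that lets layers compound rather than compete.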
Everything still runs in your browser. Nothing leaves your machine. The architecture changed. The promise didn't.
Where We're Going
More industries. More languages. More countries. The list of sensitive data types grows every time a new regulation drops or a new industry reaches out. We're building for all of them.
What won't change: everything stays in the browser. Everything stays private. The detection gets better every week, but the architecture stays true to the promise we made on day one. Your data is yours. Period.
We know exactly where the trail leads. Here's what's over the horizon.
Want to help shape what comes next? We're listening.