Rubric bets that verification systems become the asymmetric opportunity as AI gets cheaper – hallucination is structural, not a bug to be fixed.
ENTRY ANGLES
Automated verification procedures built on codified expert knowledge from practitioners · Training specialists in AI output verification with foundational skepticism toward AI · Certification processes for specialized AI platforms
VERTICALS
CAPABILITIES
Expert knowledge extraction and codification from domain practitioners · AI output verification and validation methodology · Certification and compliance framework development
RUBRIC FOUNDER
“Building reliable AI for medicine is much harder than it looks”
Over the past few years, more than 10,000 "vertical" companies have emerged applying AI to build niche products. The problem: general-purpose AI models aren't well suited to most of them.
Demos look impressive – but in production, these products fail too often. The root issue is that general-purpose models are too generic. They produce answers "from first principles" because they lack the ability to reason at the depth of a genuine domain expert.
Rubric is building the infrastructure to fix that: tooling that extracts specialized knowledge from real experts and uses it to produce reliable, expert-calibrated AI systems. The infrastructure includes:
- A network of domain experts across different niches
- A platform for capturing their knowledge and using it to train specialized AI models
- Platforms and operational processes for ongoing monitoring of those trained models in production
The platform serves research labs, AI model developers, and vertical AI product builders – particularly in healthcare, finance, and legal.
Rubric is currently going through Y Combinator and published its platform description on the YC website a few days ago.
A [related review](/review/u-tvoego-ii-agenta-est-sertifikat) from a few days ago covered Amigo, a startup that raised $11 million for a platform building AI agents for medical clinics.
Amigo's core finding: "Building reliable AI for medicine is much harder than it looks." Any developer can spin up a general-purpose chatbot that answers medical questions using an off-the-shelf AI model.
The hard part is that any error can have fatal consequences. So the training, debugging, and ongoing monitoring processes must be exceptionally rigorous – and that's where almost all the time and effort actually go.
This turns out to be a specific instance of a much broader problem – which is exactly what Rubric is addressing. The startup laid out its view in a manifesto titled "The Age of Evaluation," arguing that building AI products has become the easy part. A new bottleneck has taken its place: the outputs of AI systems need to be verified and evaluated before they can be trusted – not just generated and deployed.
The core problem: even a first-year financial analyst can use AI to produce a financial model that *looks* like the work of a ten-year veteran. A medical student can generate a diagnosis that *sounds* clinically sound. A junior attorney can draft an opinion that *reads* professionally.
But not all that glitters is gold. Looking and sounding professional is not the same as being professional. Unless a genuine expert in the field reviews the output first, the difference only becomes clear downstream, through the consequences – which is the last place you want to discover it.
Historically, experts produced all such outputs directly – so the bottleneck was the production process itself. Now AI produces the outputs, shifting the bottleneck to verification: scrutinizing what the AI actually generated.
Many domains use standardized tests to calibrate AI performance – the same licensing and certification exams that doctors, lawyers, and other professionals take.
But those tests are only a first-pass filter. They screen out people or AI systems that clearly can't handle the material. They can't substitute for experience built through years of practice. AI needs to be evaluated not by whether it passes an exam – but by whether a seasoned expert can actually trust its judgment in the field.
Many domains also lack standardized tests entirely. In those areas, only expert judgment applies – making the evaluation problem even harder to systematize.
The most urgent dimension of this problem: it has to be solved *now*, while there's still a generation of professionals who built their expertise in the pre-AI era – and who therefore never had the option of delegating their thinking to a model.
Within a decade, that generation will retire. That leaves roughly ten years to build "gold standard" expert datasets that can serve as calibration foundations – not just for current AI systems, but for the generations of models that follow.
The real constraint on AI isn't compute or model architecture. It's the availability of genuine human experts in the domains where AI is supposed to operate. Everything except expertise can be scaled with capital. Expertise can only be scaled through people – and the pool of truly experienced practitioners is gradually shrinking.
Verifying AI outputs isn't a one-time task that can be completed and crossed off. It's a continuous process that needs to run permanently.
The importance, cost, and value of verification will initially rise to match the importance and value of the AI itself. Then it will begin to surpass them – as AI capabilities keep commoditizing while the underlying expertise they depend on remains scarce.
The need for human expertise layered on top of AI is starting to register more widely.
One illustrative example: Crosby ([related review](/review/bolee-prostaja-model-dlja-sozdanija-perspektivnogo-ii-produkta)), a legal AI firm that raised $25.8 million and reviews contracts and other legal documents. The AI handles the analysis; human lawyers verify the outputs before delivery.
But that approach isn't systemic. First, the verification process is entirely manual – making it a bottleneck built into the business model itself. Second, what happens when the last generation of lawyers trained to think independently, rather than defer to AI, ages out of the profession?
A new market for "AI verification" appears to be forming, with several distinct components:
- Automated verification procedures built on codified expert knowledge extracted from real practitioners – which is Rubric's specific focus.
- Specialists trained in verification from a position of foundational skepticism toward AI outputs – a genuinely interesting problem in its own right.
- Certification processes for specialized AI platforms – explored in a [related review](/review/u-tvoego-ii-agenta-est-sertifikat).
- Other approaches not yet obvious at first glance.
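
To make the first component more concrete, here is a minimal Python sketch of what "codified expert knowledge" might look like when applied to an AI-generated financial model. This is an illustration under stated assumptions, not Rubric's actual implementation: the `ExpertRule` structure, the `verify` function, the field names, and the thresholds are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ExpertRule:
    """One codified piece of practitioner knowledge (hypothetical structure)."""
    name: str
    check: Callable[[dict], bool]  # returns True when the output satisfies the rule
    rationale: str                 # the expert's reasoning, kept for reviewers


def verify(output: dict, rules: list[ExpertRule]) -> list[str]:
    """Return the names of every rule that the AI output violates."""
    return [rule.name for rule in rules if not rule.check(output)]


# Illustrative rules for a financial model; the thresholds are invented for this example.
RULES = [
    ExpertRule(
        name="balance_sheet_balances",
        check=lambda o: abs(o["assets"] - (o["liabilities"] + o["equity"])) < 0.01,
        rationale="Assets must equal liabilities plus equity.",
    ),
    ExpertRule(
        name="growth_rate_plausible",
        check=lambda o: -0.5 <= o["revenue_growth"] <= 1.0,
        rationale="Growth outside this band should be justified by a human analyst.",
    ),
]

# A model that looks polished but does not balance: 120 != 70 + 40.
ai_output = {"assets": 120.0, "liabilities": 70.0, "equity": 40.0, "revenue_growth": 0.15}

violations = verify(ai_output, RULES)
print(violations)  # ['balance_sheet_balances']
```

In a setup like this, automated rules would catch only what experts have already managed to articulate. Anything flagged, and a sample of what passes, would presumably still go to a human practitioner for review.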
The counterintuitive but strategically compelling direction: move toward AI verification infrastructure rather than AI development itself.
The core argument: as AI capabilities continue to commoditize, the value of the expertise that AI depends on – and the systems that verify AI outputs against it – will only grow. At some point, verification will be more valuable than development. That crossover is approaching, and those currently chasing the hype around building AI platforms will have to reckon with it.
This isn't just a timely opportunity. It's a long-term strategic position. Which angle would you take into this space?