Luel built a legal marketplace where anyone can license their data for AI training – and earn each time it's used in a model run.
ENTRY ANGLES
Domain-specific data suppliers for specialized AI verticals · Edge case collection through professional workflows and feedback loops · Vertical-specific training data libraries to improve specialized AI performance
CAPABILITIES
Domain expertise and ability to identify edge cases in specific verticals · Feedback loop infrastructure to collect training data from professional users · Data quality and curation at scale
This startup graduated from Y Combinator in March – and has already raised $31 million from General Catalyst, the venture fund behind Stripe, Anthropic, Ramp, Anduril, Legora, and Mercor.
Luel built a multimedia data marketplace where developers can legally license content for training AI models and agents.
The mechanic is straightforward: anyone can upload their data and earn money each time it's used in an AI training run. Data here means recorded phone calls or videos of people performing activities – anything from origami folding to gemstone cutting for jewelry.
Although the company was founded only last year, 600,000 people across 96 countries have already uploaded 15 million hours of audio and video to the platform.
The mobile app makes upload frictionless – record, upload, watch earnings accumulate in the in-app wallet. Users can upload whatever they want, but Luel also publishes a standing task board of guaranteed payouts: Spanish-language conversation recordings at $18/hour, Hebrew text readings at $42/hour, origami videos at $10.20/hour, and so on.
All uploaded content is listed on the Luel marketplace where AI model and agent developers can discover and pay to download it. The platform's business model is the spread: the difference between what it pays data contributors and what developers pay to license the data. Notably, the licensing is non-exclusive – multiple developers can buy the same dataset, which multiplies revenue without multiplying Luel's cost.
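The effect of non-exclusive licensing is easiest to see with numbers. Below is a minimal Python sketch, assuming a one-time contributor payout at the $18/hour task-board rate quoted above and a hypothetical license price of $30/hour per buyer. The price, the `Dataset` class, and the `gross_spread` function are illustrative assumptions, not Luel's actual pricing or API. Amounts are kept in integer cents to avoid floating-point rounding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """A licensable dataset. All figures are illustrative assumptions."""
    hours: int                    # hours of recorded audio/video
    payout_cents_per_hour: int    # one-time payout to contributors (assumed)
    license_cents_per_hour: int   # price charged to EACH buyer (hypothetical)


def gross_spread(ds: Dataset, buyers: int) -> int:
    """Return marketplace spread in cents for a given number of buyers.

    Cost is paid once; revenue repeats per buyer because the
    license is non-exclusive.
    """
    if buyers < 0:
        raise ValueError("buyers must be non-negative")
    cost = ds.hours * ds.payout_cents_per_hour
    revenue = ds.hours * ds.license_cents_per_hour * buyers
    return revenue - cost


# 100 hours of Spanish conversation at the $18/hour rate from the task board,
# licensed at an assumed $30/hour per buyer.
spanish = Dataset(hours=100, payout_cents_per_hour=1800, license_cents_per_hour=3000)

for n in (1, 3):
    print(f"{n} buyer(s): ${gross_spread(spanish, n) / 100:.2f}")
```

Under these assumptions, one buyer yields a $1,200 spread and three buyers yield $7,200: revenue triples while the contributor cost stays fixed. If contributors are instead paid on every training run, cost would scale with usage and the spread would be thinner.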
Beyond uploading data, users can also earn by reviewing the quality of other contributors' submissions. Only quality-verified content is visible to buyers.
The marketplace also supports bulk arrangements on both sides – wholesale data suppliers who aggregate large volumes from their own user bases, and bulk buyers who need datasets at scale. Wholesale buyers get pricing discounts; whether wholesale suppliers get premium rates for the same data types is an open question worth asking.
The scale of the data training market is genuinely jaw-dropping. Global projections put it at $4.8 trillion by 2033.
The supply problem is just as stark: the public internet has essentially been exhausted as a training source – AI companies have already scraped everything available. And what's left raises questions about both quality and licensing legality.
Luel's thesis is that a new market layer is emerging: organized, high-quality, legally licensable human-generated data. And it's not alone in that bet.
Kled, a structurally similar startup, has raised $10 million to date, including $5.5 million last October. The two are close enough that Kled's founder has publicly accused Luel of copying not just the concept but specific offer formats and interface design.
Neon ([related review](/review/glavnye-dengi-budut-zarabatyvat-ne-razrabotchiki)), which raised $25 million in March, takes a narrower approach: an app that pays users for recorded phone conversations, priced between $0.15 and $1.00 per minute by the users themselves. Neon applies its own quality filter and pays only for content that clears the bar. The recordings are stripped of personally identifiable information before being sold to AI developers as conversational training data. Neon pulled its app from the App Store shortly after launch when a security vulnerability was found, and relaunched this spring with stronger protections in place.
The momentum across these companies signals that the data supply chain for AI is becoming a distinct competitive front – not just an infrastructure cost.
In the early days of the AI boom, all the attention went to model quality. That race is still very much alive – the OpenAI vs. Anthropic competition is intensifying, not cooling. But it's now increasingly clear that training data quality matters just as much as algorithmic design.
This is opening a new competitive front: the race among data suppliers. And that front is still early enough to enter – but from a different angle than the current wave of general marketplaces.
The more interesting opportunity may be domain-specific data suppliers. The more complete and precise the training data for a specific vertical, the more accurately specialized AI products in that vertical will perform, which creates a natural opening for niche data providers.
Y Combinator alum Rima ([related review](/review/izjashhnaja-biznes-model)) illustrates this indirectly. Rima is building an AI platform for accountants, but one of its distinguishing features is a growing library of edge cases – the exceptions, nuances, and judgment calls that separate experienced accountants from novices. Rima collects these through a professional education program: accountants learn the platform on real client work and report back the edge cases the AI didn't handle cleanly.
Traini ([related review](/review/ne-otkazhutsja-ot-takogo-prilozhenija)), which raised $10.6 million (including $7.1 million at the end of last year), built a device and app that translates dog barks into emotional states. Its feedback loop collects owner corrections to keep improving the underlying model. A niche dataset, certainly – but even that one has obvious buyers beyond Traini itself (veterinary diagnostics, anyone?).
Most of these companies are collecting proprietary data for internal use only. The logical next step – building standalone data suppliers for specific verticals that license those datasets to AI platform developers – is still largely open.
The broader takeaway: breaking into the AI market doesn't require building models or products. Collecting, cleaning, and licensing high-quality domain-specific training data is a legitimate and defensible position – and in many verticals, the leading supplier hasn't emerged yet.
So – what kind of data could you systematically collect?