Dokai captures screen workflows into institutional knowledge bases – but who maintains them when the software updates?
ENTRY ANGLES
Browser automation to detect UI changes and flag/update outdated video documentation · AI engine that generates video documentation directly from source code and application logic · Human-in-the-loop system where AI flags UI changes for human correction, then learns the pattern
VERTICALS
CAPABILITIES
Browser automation and UI detection, AI/machine learning for pattern recognition and learning, Source code analysis and application logic traversal
Dokai entered the New York-based Entrepreneurs Roundtable Accelerator only in June, receiving $150K as its first funding. The website is still sparse – a description of core functionality and a waitlist for beta access.
Dokai is a platform for building and transferring institutional knowledge.
Use cases include customer support, employee onboarding, internal product documentation, and answering recurring "how do I" questions fielded by IT helpdesks, HR teams, and other departments.
The whole workflow runs through a browser extension.
To add knowledge to the base, a user flips the extension into "Training Mode" and records their actions in the browser – walking through whatever process they want to document. Voice commentary can presumably be added simultaneously. The recorded video gets saved to the knowledge base.
When someone has a question, they open the same extension and type it in. Dokai's AI engine searches the knowledge base for the relevant video and surfaces the step-by-step answer. Finding the right video requires the engine to intelligently parse and index the recorded footage.
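Dokai has not published how its engine indexes recordings, so the following is only a minimal sketch of the general idea, assuming each video already comes with a text transcript (from voice commentary or on-screen text). The names `build_index` and `search` and the sample video IDs are hypothetical, and plain keyword overlap stands in for whatever ranking a real engine would use.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(transcripts: dict[str, str]) -> dict[str, set[str]]:
    """Map each token to the set of video IDs whose transcript contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for video_id, text in transcripts.items():
        for token in tokenize(text):
            index[token].add(video_id)
    return index


def search(index: dict[str, set[str]], question: str) -> list[str]:
    """Rank videos by how many distinct question tokens their transcript shares."""
    scores: dict[str, int] = defaultdict(int)
    for token in set(tokenize(question)):
        for video_id in index.get(token, ()):
            scores[video_id] += 1
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda v: (-scores[v], v))


# Hypothetical knowledge base with two recorded walkthroughs.
transcripts = {
    "reset-password": "open settings then click reset password and confirm by email",
    "export-report": "open reports choose a date range and click export",
}
index = build_index(transcripts)
results = search(index, "How do I reset my password?")
```

A production system would more plausibly rely on speech-to-text, OCR of the frames, and semantic embeddings, but the shape stays the same: index once at recording time, rank at question time.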
Dokai claims the approach lets teams create documentation and FAQ answers five times faster than traditional methods, and cuts the time needed to train employees on a given process by 50%. Showing once is faster than explaining repeatedly.
Y Combinator graduate Clueso ([covered here](/review/srazu-konfetka-jeto-horoshij-recept)) built a comparable product, though with more mature functionality – it automatically trims dead air from recordings, adds visual effects, and generates a text-based step-by-step walkthrough with screenshots alongside the video. That fuller feature set helped it raise $1.4 million in April on top of $500K from YC.
Other players in the same space include Guidde ([covered previously](/review/gde-byl-tekst-budet-video)), which has raised $15.6 million; Tango (previously reviewed), at $19.7 million, with another $14 million added since the original review; and Minerva (previously reviewed), at $5.1 million. All have expanded their feature sets in the interim.
Augmend ([related review](/review/najdi-bolshoj-rynok-a-tehnologii-najdutsja)) uses similar technology for a narrower task: letting developers create documentation for their code and workflows that other developers can actually use. That startup raised $2.2 million.
Similar platforms also exist for industrial and construction settings – training workers on equipment and physical procedures rather than desktop workflows. In those cases, the apps record experts demonstrating techniques in the real world rather than screen actions. Squint ([covered here](/review/700-millionov-chelovek-kotorym-obychnye-platformy-ne-podojdut)) raised $19 million for that approach; DeepHow (previously reviewed) raised $37.1 million; and Zaptic (previously reviewed) raised $9 million.
The trend is clear: corporate knowledge bases are moving to video. Video explanations are simply easier to follow than text, even illustrated text – and recording a walkthrough is far easier than writing clear documentation. With short-form video now deeply habitual for most people, the barrier to creating it has dropped to near zero.
There's a structural problem with video documentation for cloud software, though: it goes stale fast.
Cloud products are in constant flux – interfaces change, menu items move, new workflows emerge. A user opens a recorded walkthrough only to find that the screen looks completely different. They might figure it out eventually, but that undermines the core promise of these platforms – an instant, complete, accurate guide. Navigating a video shot against an old interface can be just as frustrating as reading outdated written documentation.
This opens an interesting adjacent opportunity: entering the video documentation trend from a different angle – specifically, automating the *creation and updating* of video documentation.
Updating could actually be more tractable than creation. At its core, it's browser automation: the AI engine periodically re-runs the same process, compares what it sees to the recorded version, and flags or fixes the differences.
- If the interface has changed, the engine does what a human would do – explore menus, find the equivalent path, and re-record the updated workflow.
- If it can't figure out the mapping, it flags a human to guide it – and once the human helps, the engine learns the new pattern and re-records autonomously.
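The compare-and-flag step can be sketched roughly as follows. This is an illustration under stated assumptions, not any vendor's implementation: the `Step` and `Drift` types and the `detect_drift` function are hypothetical, the "current UI" is a plain dictionary standing in for what a browser automation run would actually observe, and fuzzy label matching from Python's standard `difflib` stands in for the engine's exploration of menus.

```python
from dataclasses import dataclass
from difflib import get_close_matches
from typing import Optional


@dataclass(frozen=True)
class Step:
    screen: str  # hypothetical identifier of the screen the step happens on
    label: str   # visible label of the element the user clicked


@dataclass(frozen=True)
class Drift:
    step_index: int
    recorded_label: str
    suggested_label: Optional[str]  # None means "no confident match, ask a human"


def detect_drift(
    recorded: list[Step],
    current_ui: dict[str, list[str]],
    cutoff: float = 0.6,
) -> list[Drift]:
    """Compare recorded steps with the labels currently visible on each screen."""
    drifts = []
    for i, step in enumerate(recorded):
        labels = current_ui.get(step.screen, [])
        if step.label in labels:
            continue  # step still matches the live interface
        # Look for a renamed element that closely resembles the recorded label.
        matches = get_close_matches(step.label, labels, n=1, cutoff=cutoff)
        drifts.append(Drift(i, step.label, matches[0] if matches else None))
    return drifts


# Hypothetical recording and a hypothetical snapshot of today's interface.
recorded = [
    Step("home", "Settings"),
    Step("settings", "Billing"),
    Step("billing", "Export CSV"),
]
current_ui = {
    "home": ["Settings", "Help"],
    "settings": ["Billing plan", "Profile"],
    "billing": ["Download report"],
}
for drift in detect_drift(recorded, current_ui):
    action = f"re-record using '{drift.suggested_label}'" if drift.suggested_label else "flag for human review"
    print(f"step {drift.step_index}: '{drift.recorded_label}' -> {action}")
```

In this sketch a renamed button gets a suggested replacement and could be re-recorded automatically, while a step with no plausible match is escalated to a person. The human's answer would then become training data for the next run.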
For a more ambitious version: an AI engine that generates video documentation directly from source code, traversing the application's logic to produce walkthroughs complete with branching paths and test data.
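To make "branching paths" concrete, here is a minimal sketch that assumes the application logic has already been reduced to a graph of screens and the actions that lead between them. Extracting such a graph from real source code is the hard part and is not shown. The function name `enumerate_walkthroughs` and the checkout example are hypothetical.

```python
def enumerate_walkthroughs(
    graph: dict[str, list[tuple[str, str]]],
    start: str,
    goal: str,
) -> list[list[str]]:
    """List every loop-free sequence of actions leading from start to goal.

    graph maps a screen name to a list of (action, next_screen) pairs.
    """
    walkthroughs: list[list[str]] = []

    def dfs(screen: str, actions: list[str], visited: set[str]) -> None:
        if screen == goal:
            walkthroughs.append(list(actions))
            return
        for action, next_screen in graph.get(screen, []):
            if next_screen in visited:
                continue  # skip cycles so each walkthrough stays finite
            actions.append(action)
            dfs(next_screen, actions, visited | {next_screen})
            actions.pop()

    dfs(start, [], {start})
    return walkthroughs


# Hypothetical checkout flow with two payment branches.
checkout_flow = {
    "cart": [("Click 'Checkout'", "checkout")],
    "checkout": [
        ("Choose 'Pay by card'", "card_form"),
        ("Choose 'Pay by invoice'", "invoice_form"),
    ],
    "card_form": [("Submit card details", "confirmation")],
    "invoice_form": [("Submit billing address", "confirmation")],
}
for number, steps in enumerate(enumerate_walkthroughs(checkout_flow, "cart", "confirmation"), 1):
    print(f"Walkthrough {number}: " + " -> ".join(steps))
```

Each resulting action list is a script that a browser automation layer could replay and record, one video per branch.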
The infrastructure isn't far off – platforms like Antithesis, Momentic, Codium, and QA Wolf already do automated browser-based testing. MutableAI, [covered a couple of months ago](/review/tekuchka-usilylasʹ-znachit-jeto-stalo-kritichno), auto-generates and updates code documentation from GitHub repositories.
If video documentation is genuinely the next wave – and the evidence suggests it is – then automated creation is the inevitable next step. The moment for building it is now.