Everyone wants to put AI agents to work on real-world data — but today the tooling forces a trade-off. You either write all the code yourself, or you settle for a no-code toy that can’t reach the data that matters, or you hand your data (and your customers’ data) to someone else’s cloud. We think that trade-off is false.
WebRobot is one agentic data platform — what we call “ETL for AI Agents” — with three front doors for three kinds of user, and it runs on your own cloud (BYOC). The interactive demo is launching very soon; in the meantime you can already explore the live portal and the open code on our GitHub. This article walks through the three angles — and asks which one is yours, because right now we are actively looking for your feedback.
What We’re Building: “ETL for AI Agents”
An AI agent is only as useful as the data it can reach and the work it can actually finish. WebRobot is the engine that gives an agent both. It acquires data from the open web — even where APIs are closed or non-existent — then cleans, matches and structures it through distributed pipelines, and finally lets agents act on it: build a dataset, answer a question, publish a result. It sits in the space between scraping/RPA and serious distributed data engineering, with an agentic layer on top.
So this is not a chatbot, and not just a scraper. It is the whole loop: stealth-browser acquisition, distributed processing on Apache Spark over Kubernetes, AI-native transformation stages, and agents that orchestrate the entire flow from a single instruction. You can see the product surface in the WebRobot app, and how the same engine powers real use cases in our piece on AI agents for lead generation and article marketing.
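Here is a minimal plain-Python sketch of the acquire, clean, match and structure loop described above. It is not WebRobot's code. `acquire`, `clean`, `match` and `structure` are hypothetical names. Acquisition is stubbed with hard-coded rows instead of a stealth browser. Matching uses a naive lower-cased name key, where a real pipeline would run fuzzy or AI-assisted matching as a distributed Spark job.

```python
import re
from dataclasses import dataclass

@dataclass
class Record:
    name: str
    price: float
    source: str

def acquire():
    """Hypothetical stand-in for browser-based acquisition: raw rows as scraped."""
    return [
        {"name": "  Acme Widget ", "price": "€19,90", "source": "shop-a"},
        {"name": "ACME widget", "price": "19.90", "source": "shop-b"},
        {"name": "Other Gadget", "price": "5", "source": "shop-a"},
    ]

def clean(rows):
    """Normalise whitespace and parse prices (assumes ',' is a decimal separator)."""
    records = []
    for row in rows:
        name = " ".join(row["name"].split())
        digits = re.sub(r"[^0-9.]", "", row["price"].replace(",", "."))
        records.append(Record(name, float(digits), row["source"]))
    return records

def match(records):
    """Group records that refer to the same entity, using a naive lower-cased key."""
    groups = {}
    for rec in records:
        groups.setdefault(rec.name.lower(), []).append(rec)
    return groups

def structure(groups):
    """Emit one structured row per matched entity, ready for an agent to query."""
    return [
        {
            "entity": key,
            "sources": sorted(rec.source for rec in recs),
            "min_price": min(rec.price for rec in recs),
        }
        for key, recs in sorted(groups.items())
    ]

dataset = structure(match(clean(acquire())))
```

Each stage is a separate function over plain records. That separation is what makes it easy to validate stages one at a time or to expose them to an agent individually.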
BYOC: Your Data Never Leaves Your Cloud
The thread that runs through everything is BYOC: bring your own cloud. WebRobot is designed to run on infrastructure you own and control, such as Hetzner, OVH, Scaleway or any compatible provider, with EU data sovereignty as the design default. Your raw data, your enriched datasets and your customers’ information stay inside your own perimeter. They are never funnelled into a third party’s SaaS just to be processed.
This is a deliberate stance, not a checkbox. Data sovereignty and compliance are the default. Cost is something you govern, because you scale resources up and down on your own account. And there is no lock-in. The platform’s job is to provision and orchestrate the full stack on your infrastructure, so the value it creates and the control over it both stay with you.
Three Front Doors — Which One Is Yours?
1. For Developers — Claude Code Skills & MCP
If you live in a terminal or an IDE, WebRobot meets you there. Through a set of Claude Code skills and a Model Context Protocol (MCP) server, an AI coding agent can drive the whole platform with typed tools: list the ETL stages, draft a pipeline, validate selectors on a real page, run the job, inspect the output — all by conversation, all scriptable and composable into your own workflows.
This is the code-first door: maximum power and automation, version-controlled, and open. The skills and MCP server live on our GitHub — clone them, read them, and tell us how they fit your stack.
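A quick primer for readers new to MCP: a server advertises each tool with a name, a description and a JSON Schema `inputSchema`. The agent then calls tools by name, with arguments that should match that schema. The sketch below shows what typed tools for the five actions above could look like. The tool names and parameters are hypothetical and are not the ones in the WebRobot repository. `check_call` is a deliberately tiny stand-in for real JSON Schema validation.

```python
# Hypothetical WebRobot-style tools, shaped like MCP tool definitions:
# each has a "name", a "description" and a JSON Schema "inputSchema".

def obj(properties, required=()):
    """Build a JSON Schema object type with the given properties."""
    return {"type": "object", "properties": properties, "required": list(required)}

STR = {"type": "string"}

TOOLS = [
    {"name": "list_stages", "description": "List the available ETL stages.",
     "inputSchema": obj({})},
    {"name": "draft_pipeline", "description": "Draft a pipeline from a goal.",
     "inputSchema": obj({"goal": STR}, ["goal"])},
    {"name": "validate_selectors", "description": "Test selectors on a live page.",
     "inputSchema": obj({"url": STR, "selectors": {"type": "array", "items": STR}},
                        ["url", "selectors"])},
    {"name": "run_job", "description": "Run a drafted pipeline.",
     "inputSchema": obj({"pipeline_id": STR}, ["pipeline_id"])},
    {"name": "inspect_output", "description": "Preview rows of a job's output.",
     "inputSchema": obj({"job_id": STR, "limit": {"type": "integer"}}, ["job_id"])},
]

# Python types accepted for each JSON Schema primitive used above.
JSON_TYPES = {"string": str, "integer": int, "array": list, "object": dict}

def check_call(name, arguments):
    """Return a list of problems with a tool call (an empty list means OK).

    Only top-level argument types are checked; array item types are not.
    """
    tool = next((t for t in TOOLS if t["name"] == name), None)
    if tool is None:
        return [f"unknown tool: {name}"]
    schema = tool["inputSchema"]
    errors = [f"missing argument: {key}" for key in schema["required"]
              if key not in arguments]
    for key, value in arguments.items():
        prop = schema["properties"].get(key)
        if prop is None:
            errors.append(f"unexpected argument: {key}")
        # bool is a subclass of int in Python, so reject it explicitly for integers.
        elif not isinstance(value, JSON_TYPES[prop["type"]]) or (
            prop["type"] == "integer" and isinstance(value, bool)
        ):
            errors.append(f"wrong type for {key}: expected {prop['type']}")
    return errors
```

The schema travels with the tool, so the coding agent can see up front that `validate_selectors` needs a URL and a list of selectors. That is what makes the conversation scriptable rather than free-form.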
2. For Operators & Business — The Conversational Chat
Not everyone wants to write code, and they shouldn’t have to. The integrated conversational chat turns the platform into something anyone can use: you describe the goal in plain language — “track competitor prices across these sites”, “summarise sentiment about our brand” — and the agent builds, validates and runs the pipeline for you, returning results, charts and links right in the conversation.
It is effectively a data engineer in a chat box, with a human in the loop to confirm the important steps. This is the door we most want business users to try today: it is live on the portal.
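The human-in-the-loop idea can be sketched as a confirmation gate. The agent's plan is a list of steps, some flagged as important, and execution pauses on each flagged step until a person approves it. `Step`, `execute_plan` and the `confirm` callback are hypothetical names. Which steps count as important is also an assumption; here, they are launching a full crawl and publishing a result.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Step:
    description: str
    needs_confirmation: bool = False  # flag for steps a human should approve first

def execute_plan(steps: List[Step], confirm: Callable[[Step], bool]) -> List[str]:
    """Run steps in order, asking `confirm` before any flagged step.

    Stops at the first flagged step the human rejects and returns a log.
    """
    log = []
    for step in steps:
        if step.needs_confirmation and not confirm(step):
            log.append(f"stopped before: {step.description}")
            break
        # A real agent would execute the step here; the sketch only records it.
        log.append(f"ran: {step.description}")
    return log

plan = [
    Step("draft pipeline for competitor prices"),
    Step("validate selectors on a sample page"),
    Step("run full crawl across all sites", needs_confirmation=True),
    Step("publish price chart", needs_confirmation=True),
]
```

In a chat interface, `confirm` would render an approve or reject prompt in the conversation. In a terminal, something like `lambda s: input(f"Approve {s.description}? [y/N] ") == "y"` plays the same role.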
3. For Power Users — The AI-Augmented Drag-and-Drop Designer
Between code and chat sits the visual pipeline designer: the familiar drag-and-drop experience of traditional no-code tools, but augmented by AI. Point it at a page and the assistant suggests the right selectors and steps; you keep the visual control and simply confirm or adjust, while the AI removes the tedious, error-prone parts.
This door is for the power user who thinks visually but wants modern intelligence built in — bridging the gap between classic automation builders and a fully agentic system, with a human always in the loop. Pairing it with capabilities like sentiment analysis turns a visual flow into real insight.
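Here is a deliberately simple, non-AI heuristic that shows what "suggests the right selectors" means mechanically. Given a page and an example value the user points at, it returns the tag-and-class path that leads to that text, which the user can then confirm or trim. It uses only Python's standard `html.parser`. `PathFinder` and `suggest_selector` are hypothetical names. This is not how WebRobot's assistant works internally, only the simplest version of the idea.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "br", "hr", "img", "input", "link", "meta", "source", "wbr"}

class PathFinder(HTMLParser):
    """Record the tag.class path of every text node equal to `target`."""

    def __init__(self, target: str):
        super().__init__()
        self.target = target.strip()
        self.stack = []    # open elements, e.g. ["div.product", "span.price"]
        self.matches = []  # CSS-style paths whose text equals the target

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        self.stack.append(tag + "".join("." + c for c in classes))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag, tolerating unclosed children.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i].split(".")[0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip() == self.target:
            self.matches.append(" > ".join(self.stack))

def suggest_selector(html: str, example: str):
    """Return the first path to `example` in `html`, or None if it is not found."""
    finder = PathFinder(example)
    finder.feed(html)
    finder.close()
    return finder.matches[0] if finder.matches else None
```

On a product card like `<div class="product"><span class="price">19.90</span></div>`, asking for "19.90" yields `div.product > span.price`. Such a full path is often too specific: it breaks if the page wraps the price in one more `div`. That kind of tedious, error-prone adjustment is what the text says the AI should take over, while the user keeps the final say.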
Under the Hood: Agents on the Anthropic Agent SDK
All three doors lead to the same brain. We orchestrate the agents on the Anthropic Agent SDK — a deliberate choice driven by development speed and by how cleanly it lets us manage flexible, allocatable resources. We can ship fast, and we stay in control of how much compute each task is allowed to consume.
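Controlling how much compute each task may consume can be illustrated independently of any agent framework. The sketch below does not use the Anthropic Agent SDK. `TaskBudget`, `BudgetExceeded` and the resource names (`browser_minutes`, `spark_executor_hours`) are hypothetical. A real system would also meter usage from the cluster itself rather than trust the numbers the caller reports.

```python
class BudgetExceeded(Exception):
    """Raised when a task tries to use more of a resource than it was allocated."""

class TaskBudget:
    def __init__(self, limits):
        # limits: mapping of resource name -> maximum units this task may consume
        self.limits = dict(limits)
        self.used = {name: 0.0 for name in self.limits}

    def charge(self, resource, amount):
        """Record usage, refusing (without recording) anything over the limit."""
        if resource not in self.limits:
            raise KeyError(f"no budget defined for {resource!r}")
        if amount < 0:
            raise ValueError("amount must be non-negative")
        projected = self.used[resource] + amount
        if projected > self.limits[resource]:
            raise BudgetExceeded(
                f"{resource}: {projected} would exceed limit {self.limits[resource]}")
        self.used[resource] = projected

    def remaining(self, resource):
        return self.limits[resource] - self.used[resource]

# Example allocation for a single scraping task.
budget = TaskBudget({"browser_minutes": 10, "spark_executor_hours": 2})
budget.charge("browser_minutes", 6)
```

The charge is refused before it is recorded. A task that hits its ceiling therefore leaves the budget unchanged, and the orchestrator can decide whether to stop, ask a human or grant more.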
Behind that agent layer is the real infrastructure: a scalable Spark cluster for distributed data work, an elastic stealth-browser fleet for acquisition, and a Ray cluster running the agents themselves. It is the same architecture that powers our work on agentic lead generation and content automation — now exposed through three front doors instead of one.
What’s Next: The Demo — and Your GPUs
The big near-term milestone is the interactive demo, launching imminently — a guided, hands-on tour through all three doors so you can feel the difference for yourself rather than take our word for it. Until then, the portal is live and the code is open on GitHub.
On the roadmap, one of our next sprints brings managed GPU providers into the BYOC model — starting with RunPod and other GPU server providers — so heavier AI and GPU workloads can be spun up elastically on the provider you choose, still on your own account. Mastering that elastic, multi-provider resource provisioning is, frankly, the heart of the whole approach.
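Multi-provider provisioning ultimately comes down to a placement decision: which of the providers you have accounts with should supply the capacity a job needs. The sketch below fills a GPU request greedily from the cheapest allowed offers. Provider names, GPU types and prices are made-up placeholders, not real quotes from RunPod or anyone else. A real planner would also weigh region, availability over time and data-transfer cost.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GpuOffer:
    provider: str
    gpu_type: str
    hourly_price: float  # placeholder numbers, not real quotes
    available: int

def plan_gpu_capacity(offers, gpu_type, count, allowed_providers):
    """Greedily fill `count` GPUs from the cheapest offers on allowed providers.

    Returns a list of (provider, gpus) pairs, or raises if capacity is short.
    """
    if count <= 0:
        return []
    candidates = sorted(
        (o for o in offers
         if o.gpu_type == gpu_type and o.provider in allowed_providers and o.available > 0),
        key=lambda o: o.hourly_price,
    )
    plan, needed = [], count
    for offer in candidates:
        take = min(needed, offer.available)
        plan.append((offer.provider, take))
        needed -= take
        if needed == 0:
            return plan
    raise RuntimeError(
        f"only {count - needed} of {count} {gpu_type} GPUs available from allowed providers")

# Hypothetical offers from three hypothetical providers.
OFFERS = [
    GpuOffer("provider-a", "gpu-x", 2.0, 2),
    GpuOffer("provider-b", "gpu-x", 1.5, 1),
    GpuOffer("provider-c", "gpu-x", 1.0, 8),
]
```

Restricting the search to `allowed_providers` is where BYOC shows up: the planner only ever places work on accounts you have chosen to connect, even if a cheaper provider exists outside that set.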
We are deliberately seeking feedback from all three angles at once — the developer, the business operator, and the visual power user — because the same engine should serve all of them. So: try the portal, explore the GitHub, and watch for the demo. Tell us what works and what doesn’t — we’re opening a handful of early pilots, and your perspective will directly shape what we build next.
Disclaimer: Always respect user privacy and copyright, follow ethical data scraping practices, and abide by the terms and conditions of the websites and platforms you collect data from, as unauthorised data extraction may lead to legal or privacy issues. This article is for information purposes only and not intended as legal advice.

