2nd Floor College House, 17 King Edwards Road, Ruislip, London HA4 7AE, UK

WebRobot and the Future of Big-Data

Mission, vision, and technology
Massive scraping on dedicated Spark clusters

WebRobot aims to become the hub and reference point for all web scraping specialists, drawing on the most innovative technologies to provide an affordable, high-quality big-data service at all times.

The Mission


Our goal is to become a complete ETL (Extract, Transform, Load) service based on cloud computing and big data, involving data extraction, web mining, machine learning, and big-data analytics.

In the future, we see WebRobot at the centre of the data supply chain, with the most powerful extraction engine and a comprehensive ETL system that collects and delivers data between the Web, IoT, drones, smart manufacturing, smart cities, and any other devices. Our purpose is to provide the right tools to improve business and life in the most efficient and scalable way.

The Vision


We believe technology is an opportunity, not a threat. We want to be protagonists of the upcoming revolution: a profound socio-economic change driven by technology.

We have been working with our team and through strategic partnerships to allow all our stakeholders to reach financial freedom through our services and business model.

Thanks to a unique B2B2X approach and a profit-share scheme applied to the data industry, we are building a complete ecosystem in which all parties involved can improve their business, products, and services, and monetize their effort and investment.

The Technological Context

WebRobot high-level architecture (diagram)

Data-analytics tools such as AWS Athena and NoSQL databases such as DynamoDB guarantee persistence. As for headless-browser technology, our main choice remains the excellent PhantomJS, although we are open to integrating the more recent headless Chromium.

The wrapper-induction algorithms are currently exposed through an internal API hosted on AWS Elastic Beanstalk. They will be progressively integrated into the acquisition framework, which is built on Spark and the Java/Scala languages. Visual support tools built on Node.js help less experienced users define their ETL pipelines.
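To give a flavour of what wrapper induction does, here is a minimal sketch in the classic LR-wrapper style: it learns the left and right delimiters that surround a target value in labeled example pages, then reuses them to extract values from new pages. This is an illustrative assumption about the technique in general, not WebRobot's internal algorithm.

```python
import re

def induce_lr_wrapper(examples):
    """Induce a simple left/right (LR) delimiter wrapper.

    examples: list of (html, target_value) pairs.
    Returns (left, right): the longest left context common to all
    examples and the longest common right context.
    """
    lefts, rights = [], []
    for html, value in examples:
        idx = html.index(value)
        lefts.append(html[:idx])
        rights.append(html[idx + len(value):])

    # Longest common suffix of the left contexts.
    left = lefts[0]
    for s in lefts[1:]:
        while not s.endswith(left):
            left = left[1:]

    # Longest common prefix of the right contexts.
    right = rights[0]
    for s in rights[1:]:
        while not s.startswith(right):
            right = right[:-1]

    return left, right

def apply_wrapper(html, left, right):
    """Extract every value enclosed by the induced delimiters."""
    pattern = re.escape(left) + r"(.*?)" + re.escape(right)
    return re.findall(pattern, html)
```

Given two product pages where the price sits inside the same markup, the induced delimiters generalize to unseen pages with that markup, which is the essence of learning an extraction rule instead of hand-writing it.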

Our ETL is developed using parser-generation technology (ANTLR) and will evolve continuously to converge towards our complete ETL vision.
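To illustrate the grammar-driven idea behind a parser-generated ETL, here is a toy hand-written recursive-descent parser for a tiny transform expression language. The grammar and the `trim`/`upper`/`lower` function names are illustrative assumptions for this sketch only; the actual WebRobot DSL and its ANTLR grammar are not shown here.

```python
import re

# Toy grammar (illustrative only):
#   expr : IDENT '(' expr ')' | FIELD
# e.g. "trim(upper(name))" reads record["name"], uppercases it, trims it.

TOKEN = re.compile(r"\s*([A-Za-z_]\w*|\(|\))")

FUNCS = {
    "trim": str.strip,   # hypothetical built-in transforms
    "upper": str.upper,
    "lower": str.lower,
}

def tokenize(src):
    """Split the source into identifiers and parentheses."""
    pos, tokens = 0, []
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if not m:
            raise SyntaxError(f"bad input at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def parse(tokens):
    """Recursive-descent parse into a nested AST of calls and fields."""
    def expr(i):
        name = tokens[i]
        if i + 1 < len(tokens) and tokens[i + 1] == "(":
            inner, j = expr(i + 2)
            assert tokens[j] == ")", "expected ')'"
            return ("call", name, inner), j + 1
        return ("field", name), i + 1
    tree, end = expr(0)
    assert end == len(tokens), "trailing input"
    return tree

def evaluate(tree, record):
    """Apply the parsed transform to one extracted record."""
    if tree[0] == "field":
        return record[tree[1]]
    _, name, arg = tree
    return FUNCS[name](evaluate(arg, record))
```

A generator such as ANTLR produces the lexer and parser from a declarative grammar file instead of the hand-rolled code above, which is what makes it practical to grow the transform language towards a complete ETL.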

Future research into decentralized platforms such as DFINITY and iExec will open new horizons for deployment. We will remain open to the latest paradigms that DLT technologies bring about.
