A few weeks ago, I had a great chat with Jamie Thomson from EMC about Web Data Services. I noticed that Jamie recently wrote an interesting blog post titled “ETL for HTML”. ETL is a well-known term to anyone working in data integration or data warehousing. It stands for Extract, Transform and Load, and describes a one-way process: extract data from a source, transform it into a new format, and load it into a destination. Traditional ETL vendors like Informatica are most effective when the sources can be accessed in traditional ways, through SQL, XML or program APIs. Many sources cannot, and this is where Web Data Services products like Kapow Web Data Server come in as a next-generation ETL tool. The Kapow Web Data Server allows users to extract data from, and load data into, all data sources, including those that cannot be accessed in traditional ways. The only prerequisite is that users can access and see the data in a normal web browser.
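To make the three ETL steps concrete for an HTML source, here is a minimal sketch in Python using only the standard library. It is not how Kapow Web Data Server works internally; the sample page, the table layout, and the `revenue` table name are all assumptions made up for illustration, and a real page would be fetched from a live web application rather than hard-coded.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical page content (assumption); normally fetched from a web application.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Revenue</th></tr>
  <tr><td>Acme</td><td>$1,200</td></tr>
  <tr><td>Globex</td><td>$950</td></tr>
</table>
"""


class TableExtractor(HTMLParser):
    """Collects the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._row.append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr":
            if self._row:  # header rows contain no <td> cells and are skipped
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


def extract(html):
    """Extract: pull the raw cell text out of the HTML."""
    parser = TableExtractor()
    parser.feed(html)
    return parser.rows


def transform(rows):
    """Transform: turn strings like "$1,200" into integers."""
    return [(name, int(amount.replace("$", "").replace(",", "")))
            for name, amount in rows]


def load(records, conn):
    """Load: write the cleaned records into a database table."""
    conn.execute("CREATE TABLE IF NOT EXISTS revenue (name TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO revenue VALUES (?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(SAMPLE_HTML)), conn)
print(conn.execute("SELECT name, amount FROM revenue").fetchall())
```

The point of the sketch is only that the browser-visible page plays the role a SQL query or XML feed plays in traditional ETL; everything downstream of the extract step is ordinary data integration.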
We live in a browser-centric world today, where “ETL for HTML” encompasses two extremes: Web 2.0 (e.g. web scraping, mashups, etc.) and Enterprise Data Management (e.g. data extraction, data collection, data mining, data conversion, data integration, etc.). “ETL for HTML” is a universal term for working with all the data we see in our web browsers. This approach gives us fast, automated access to any data in applications like Salesforce or NetSuite, or in any of the millions of other web-based applications that exist inside our firewall, at our business partners, with the government, or just out on the public web.
Jamie is spot-on with the term “ETL for HTML” as a way to describe how most of us will access web data. Although ETL traditionally describes a one-way process of moving data from point A to point B, Web Data Services provide two-way access to data. This means we can leave the data where it resides best (for example, in our HR or ERP applications) and get full programmatic access by using a product like the Kapow Web Data Server to “wrap” the applications into standard service APIs like REST, SOAP or .NET.
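As a rough illustration of what “wrapping” a browser-only application into a REST-style API can look like, here is a small Python sketch built on the standard WSGI interface. It is an assumption-laden toy, not Kapow's implementation: the HR page markup, the `fetch_hr_page` stand-in, and the `/employees` path are all hypothetical, and only the read direction is shown.

```python
import json
from html.parser import HTMLParser

# Hypothetical markup, as an HR web application might render it (assumption).
HR_PAGE = '<ul><li class="employee">Ada</li><li class="employee">Grace</li></ul>'


class EmployeeParser(HTMLParser):
    """Collects the text of every <li class="employee"> element."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._capture = False

    def handle_starttag(self, tag, attrs):
        if tag == "li" and ("class", "employee") in attrs:
            self._capture = True

    def handle_endtag(self, tag):
        if tag == "li":
            self._capture = False

    def handle_data(self, data):
        if self._capture and data.strip():
            self.names.append(data.strip())


def fetch_hr_page():
    """Stand-in for an HTTP request to the live application (assumption)."""
    return HR_PAGE


def app(environ, start_response):
    """WSGI app: GET /employees returns the names currently shown on the page."""
    is_get = environ.get("REQUEST_METHOD") == "GET"
    if is_get and environ.get("PATH_INFO") == "/employees":
        parser = EmployeeParser()
        parser.feed(fetch_hr_page())  # data stays in the source application
        status = "200 OK"
        body = json.dumps({"employees": parser.names}).encode("utf-8")
    else:
        status = "404 Not Found"
        body = json.dumps({"error": "not found"}).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]
```

To try it locally, the app could be served with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`. The data is read from the page on every request rather than copied into a separate store, which is the “leave the data where it resides” idea; write access would additionally require submitting the application's own forms.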
Why is this so important? For two reasons. First, with the data explosion around us, it is becoming impractical to move and synchronize data into one common data repository. Second, the data we need to perform our analysis and drive business decisions will change more and more rapidly. We will need new data sources daily, or at least weekly, to react to the ever-changing business needs of the future.
So what is a good replacement for the term “ETL for HTML”? I suggest something like “Access, Enrich and Serve Web data”. This is a superset of ETL that also covers the way we want to access data in the future.
What term do you think we should use?
By: Stefan Andreasen 