Extract, Transform, Load (ETL) has been with us since the beginning of Business Intelligence.
For decades ETL has been the IT capability responsible for producing ODS, EDW and MOLAP data sources for line-of-business consumption. This important function creates the tables and data marts that enable traditional reporting and analysis, and it includes activities such as data cleansing, normalization and validation.
The horizontal (wide-ranging) domain of big data means the associated information has less consistency and alignment than traditional corporate data. So much so that many pundits declare ETL will become extinct as we build out more big data solutions.
I gently suggest that ETL isn't going away. Instead, many related activities, like finding, filtering, organizing and categorizing, will be pushed up to the line-of-business user. Armed with new tools and techniques, this domain expert will enter the realm of self-service ETL.
The role of IT changes too. Large horizontal information sources are now provisioned in a "Data Landing Area" that stages unstructured, semi-structured and unmodelled data. Users pull data from these sources to support their specific analysis needs.
Does this mean IT is out of the ETL business? Not at all. When line-of-business users identify topics of interest, outliers or new data correlations, they push the metadata description of their analysis back to IT, which prioritizes requests based on actual use and then creates the most needed enterprise-ready (cleansed, normalized and validated) datasets for more general use.
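To make this workflow concrete, here is a minimal sketch of the round trip: a user filters raw rows pulled from the landing area, then packages a metadata description of that analysis for IT. The field names, the sample rows and the `"data-landing-area"` source label are all illustrative assumptions, not a real schema or product API.

```python
import json

# Hypothetical sample pulled from a "Data Landing Area".
# Field names and values are assumptions for illustration only.
landing_rows = [
    {"region": "EMEA", "product": "widget", "revenue": 1200},
    {"region": "APAC", "product": "widget", "revenue": 300},
    {"region": "EMEA", "product": "gadget", "revenue": 4500},
]

def self_service_filter(rows, region):
    """Line-of-business step: keep only the rows for one region."""
    return [r for r in rows if r["region"] == region]

def describe_analysis(rows, region):
    """Build the metadata description pushed back to IT, so the
    request can be prioritized and, if used widely, turned into an
    enterprise-ready (cleansed, normalized, validated) dataset."""
    return {
        "source": "data-landing-area",   # assumed source identifier
        "filter": {"region": region},
        "fields": sorted(landing_rows[0].keys()),
        "row_count": len(rows),
    }

subset = self_service_filter(landing_rows, "EMEA")
request = describe_analysis(subset, "EMEA")
print(json.dumps(request, sort_keys=True))
```

The key design point is that the user ships only the *description* of the analysis (source, filter, fields), not the data itself, which is what lets IT rank requests by actual use before investing in cleansing and normalization.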
The result: An efficient workflow serving a larger number of users with greater agility and accuracy.