Today’s businesses are trying to become more data-centric and to develop their data culture. To do so, they have to start leveraging insights from their data, which makes ETL (extract, transform, load) pipelines essential to data integration strategies. They allow businesses to gather data from multiple sources and consolidate it into a single, centralized location. They are usually created with two main objectives in mind:
- Firstly, feeding business intelligence tools to provide decision-makers with graphs that show business evolution and track moving targets.
- Secondly, building and deploying data science models to extract business value from your data.
Managing a continuously increasing number of ETLs is one of the biggest challenges that companies encounter during their life cycle. Little by little, the numbers can explode and get fairly out of control.
So most ETL projects will not go as smoothly as companies had hoped. Our experience has taught us that managing a large number of ETLs creates three big challenges inside a company. Let us illustrate these challenges with a concrete example from one of our use cases in the insurance sector. Managing ETLs is:
- Costly: This company has about a thousand ETLs maintained by 15 full-time employees. Creating and maintaining these ETLs was costing them more than 2 million euros a year.
- Expertise-demanding: New connectors must be added, broken ones must be fixed, and all of them must be maintained. As the number of ETLs increases, so does the difficulty of enabling business intelligence tools: getting data from the various operational systems into the data warehouse, and keeping it up to date, requires a lot of expertise.
- Time-consuming: In this kind of company, preparing, creating, and deploying 30 ETLs is time-consuming. In 16 of those data science use cases, raw data had to be extracted to develop and train the models, and each new model deployed in production requires new data pipelines. Preparing, creating, and deploying a single ETL takes 6 pipelines to get the model up and running, so implementing adequate engineering pipelines for all 30 ETLs (6 pipelines each) takes a lot of time.
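The pipeline count in the example above can be made concrete with a back-of-the-envelope calculation (the figures are those stated in the text; the variable names are illustrative):

```python
# Back-of-the-envelope pipeline count for the insurance example above.
ETL_COUNT = 30          # ETLs to prepare, create, and deploy
PIPELINES_PER_ETL = 6   # pipelines needed to get one model up and running

total_pipelines = ETL_COUNT * PIPELINES_PER_ETL
print(total_pipelines)  # 180 pipelines to build and maintain
```

Even at this modest scale, the engineering effort multiplies quickly, which is why the number of ETLs tends to get out of control.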
We can conclude from this example and our experience that many companies are not yet geared up to achieve their data-driven business ambitions. Digazu is a technological brick that helps companies leverage the full value of their data by extracting data only once, sharing the transformations on that data, and loading the transformed data multiple times. It rationalizes the use of data and lowers its cost.
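The "extract once, share transformations, load many times" principle can be sketched as follows. This is a conceptual illustration only; all function and variable names are hypothetical and do not reflect Digazu's actual API:

```python
# Conceptual sketch: one extraction and one shared transformation feed
# several destinations, instead of each consumer running its own ETL.

def extract():
    # Extract from the source system only once (hypothetical sample data).
    return [{"policy_id": 1, "premium": "100.0"},
            {"policy_id": 2, "premium": "250.5"}]

def transform(records):
    # A shared transformation, reused by every downstream consumer.
    return [{**r, "premium": float(r["premium"])} for r in records]

def load(records, sink):
    # Load the same transformed data into any number of destinations.
    sink.extend(records)

bi_warehouse, ml_feature_store = [], []
transformed = transform(extract())   # one extract, one shared transform...
load(transformed, bi_warehouse)      # ...loaded multiple times
load(transformed, ml_feature_store)

print(len(bi_warehouse), len(ml_feature_store))  # 2 2
```

The point of the pattern is that both the business intelligence warehouse and the data science feature store receive the same cleaned data without duplicating the extraction and transformation work.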