Data provides the amazing power to understand and grow businesses. But data always comes in different shapes and formats, and need to be transformed into a more analysis friendly structure to be able to get meaningful insights out of it.
Are you a Google BigQuery user? Do you have different data sources other than Google Analytics? Would you like to have a tool to collect all your data sources into BigQuery to have better control over it and start analyzing, visualizing and building various models to discover hidden patterns? If yes, then Talend is the perfect tool for you. Continue reading this post to find out how to use it to build your data pipeline.
Talend is one of our data integration partners for Google BigQuery, and is one of the most powerful data integration and ETL tools available in the market. It is easy to use, with more than 900+ connectors and components. Talend enables users to design their data pipelines, using drag-n-drop building block and each block is highly configurable, then converts this flow into Java code that can run on different machines making use of Java Portability. Another big feature is that developers can set Talend to produce Java code that runs on a single machine or produce MapReduce code that runs on a cluster of servers, making use of parallel/distributed computing, processing large amounts of data in a short time. Talend is also considered to be the next-generation leader in cloud and big data integration software.
Before we consider our use case, let’s first understand what ETL is. ETL stands for Extract, Transform and Load. It’s the process of moving raw data from various sources to a destination (data warehouse or a database) after manipulating the data.
There are various ETL tools in the market. ETL tools assist with the process, providing capabilities and advantages unavailable if you were to complete the ETL process on your own by using a programming language.
Advantages of using ETL Tools:
The use case at hand is one of a Chief Marketing Officer of a home appliance distribution company (let’s call it OME Appliances). Our CMO is in charge of marketing for a global network of retailers and dealers. Her Data/BI team is responsible for the consolidation and automation of reporting across all retailers and across all the brands they carry.
Daily files from different retailers containing transactions and revenue data are consolidated into two aggregated files: one for sales and another for services. Those files are then transformed and loaded it into BigQuery.
Hence the three main steps mapping to the ETL process:
At first, we considered custom development, using a programming language like Java or C#. But we realized it would have taken about 4 months.
We then decided, this is a task that should be implemented easily and efficiently using an ETL tool, and what was estimated to take 4 months, was created in just one week.
Now, we’ll get into some details of how we implemented the solution using Talend integration studio.
First let’s take an overall look on the job structure, then break it into the three ETL steps.
This step can be split into two, first uploading the files into GS then loading them into BigQuery, but it’s recommended to use this compact component.
After loading the data into BigQuery, now we can explore it in many ways, and make use of all the cool BigQuery features and tools that connect to BigQuery to analyze and visualize the data, to get meaningful insights, and answer business questions.
One of the powerful visualization tools that connect to BigQuery is Data Studio, below are sample screenshots of what can be accomplished in Data Studio using the external data loaded in BigQuery.
For a home appliances retailer, these Data Studio dashboards provide a high level overview and some detailed insight into the sales and service data for decision making.
For a brick and mortar retailer it helps to know for planning purposes which states sell the highest number of appliances, which brand of appliances have the highest gross profit. This will help the retailer move inventory in a way that maximizes revenue.
To allocate resources and plan ahead a home appliance servicing business needs to know which brands require the most labor hours and have the highest repair costs, and what is the average repair order by brand. This will help them procure and stock the required spare parts reducing delays. By knowing if the appliance being serviced is new or used the retailer can send personalized coupons or emails to customers encouraging them to get their appliances serviced regularly.
Data Engineering Team Lead Mahmoud is all about facing challenges, the harder the challenge the sweeter the victory. After receiving his Bachelor degree in Computer Science and a diploma in Software Development he worked as a software developer. Later switching careers to the Business Intelligence and Data Analysis field in which he found his passion working with data, analysis and revealing hidden insights to help companies make better business decisions.
As consumers become increasingly digitally savvy, and more and more brand touchpoints take place online,…
Marketers are on a constant journey to optimize the efficiency of paid search advertising. In…
Unassigned traffic in Google Analytics 4 (GA4) can be frustrating for data analysts to deal…
This website uses cookies.