Data Engineering for Business Intelligence
How to tackle the growing complexity of extracting business value from data.
The role of the data engineer has existed in some form for decades. Since the introduction of relational databases, data engineers have supported the development and implementation of data infrastructure and built the pipelines used to collect and transform data. Although the role was initially quite distinct in the tasks it performed, over time it has grown more and more similar to what is expected of Business Intelligence engineers. We will analyze the aspects in which the roles overlap, the challenges faced by BI engineers, how data engineering can support their operations, and what the future looks like in the era of real-time big data.
How data is changing BI
Since the introduction of data warehousing, BI engineers have been tasked with understanding data and extracting insights from it. The most important aspect of the BI role is understanding the data from a domain-knowledge perspective in order to create KPIs that measure the success of a given operation. BI engineers are tasked with presenting stakeholders with simple overviews of the business and with improving efficiency by automating repetitive tasks. These overviews are generally dashboards built on carefully thought-out KPIs: aggregations that convey the most important information at a glance. They are also responsible for surfacing insights from the data in time to flag when something is not working properly, or to spot new business opportunities worth exploring. BI teams create dashboards for operations such as budget reporting, sales and stock forecasting, payroll, and more. These dashboards then serve as sources of truth for various teams, which use the KPIs to drive business decisions, optimize operations, and become self-sufficient in harnessing the data required to support critical decisions.
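As a minimal sketch of the idea that a KPI is an aggregation over raw records, consider the toy example below. The data, column names, and the "daily revenue" metric are all hypothetical; in practice the records would come from a warehouse query rather than a hard-coded list:

```python
from collections import defaultdict

# Hypothetical transactional sales records, as they might arrive
# from a warehouse query.
sales = [
    {"date": "2023-01-01", "region": "NA", "revenue": 1200.0},
    {"date": "2023-01-01", "region": "EU", "revenue": 800.0},
    {"date": "2023-01-02", "region": "NA", "revenue": 1500.0},
]

# A KPI is typically a simple aggregation surfaced on a dashboard,
# e.g. total revenue per day.
daily_revenue = defaultdict(float)
for row in sales:
    daily_revenue[row["date"]] += row["revenue"]

print(dict(daily_revenue))  # one number per day, ready for a chart
```

The value of the KPI lies less in the computation itself, which is trivial, than in choosing an aggregation that genuinely reflects the health of the operation being measured.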
In the past, it took BI engineers a long time to create new visualizations and integrate new data. Data warehouses had rigid schemas that did not allow simple transformations, and there was no easy way to visualize the data. BI teams had to spend considerable time building new sources of truth and verifying that the data matched up before rolling out new dashboards.
Better Analytics Pipelines
Recently there has been a rise of operational intelligence, a trend that seeks to apply the successful ways of working developed in BI to daily operations. The main difference between the two is that operational intelligence provides insights from real-time data to support immediate decisions, whereas BI is generally used to support medium- to long-term decisions. This trend pushes the boundaries of the methodologies used to build BI dashboards. Dashboards that were previously built entirely on data warehouses required lengthy nightly processes to provide daily updates and took weeks, if not months, to be properly checked and designed. The new dashboards are required to aggregate data from streaming sources such as pub/sub systems, consume real-time analytics from IoT devices stored as small log files across several cloud storage buckets, and apply stateful computations on data streams with technologies such as Apache Spark or Apache Flink. This real-time data is aggregated into KPIs that reflect the context in which operations are taking place within minutes rather than days.
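The stateful computations mentioned above can be illustrated with a toy tumbling-window aggregation: the pipeline keeps running state per window and key, and emits an aggregate per window. Engines such as Spark Structured Streaming or Flink manage this state fault-tolerantly over unbounded streams; the sketch below (hypothetical event schema, 60-second windows) only shows the core idea on a finite list:

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # tumbling window size, an assumption for this sketch

def window_start(epoch_seconds: int) -> int:
    """Map an event timestamp to the start of its tumbling window."""
    return epoch_seconds - (epoch_seconds % WINDOW_SECONDS)

# Hypothetical IoT events: (epoch_seconds, sensor_id, reading)
events = [
    (100, "s1", 2.0),
    (110, "s1", 4.0),
    (125, "s2", 1.0),   # falls into the window starting at t=120
    (170, "s1", 6.0),   # also in the t=120 window
]

# Stateful aggregation: running (sum, count) per (window, sensor).
state = defaultdict(lambda: [0.0, 0])
for ts, sensor, reading in events:
    key = (window_start(ts), sensor)
    state[key][0] += reading
    state[key][1] += 1

# Emit per-window average readings, the kind of near-real-time KPI
# an operational dashboard would display.
averages = {key: total / count for key, (total, count) in state.items()}
print(averages)
```

In a real streaming engine the same logic is expressed declaratively (e.g. a windowed `groupBy` with watermarks to bound late data), and the state survives failures, which is precisely what makes these tools necessary beyond a toy loop.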
Data engineering plays a major role in supporting these operations: the architecture and tools selected to build the pipelines that fuel these dashboards are of paramount importance in properly supporting BI teams facing these challenges.
Conclusion
The future will be filled with examples of companies actively working to create data-driven cultures that support all operations with clear, real-time information. The changing and ever-increasing nature of data poses a challenge here, as BI and data engineers need to work closely together to create reliable pipelines that can handle these new requirements. Data engineers can leverage data lakes and transactional programming to overcome differences in tooling, and adopt a containerized microservice API vision to distribute information through the organization and provide easy ways to consume data. Moreover, the implementation of regulation and data governance practices holds these developments to a higher standard of quality. Data engineers should remember that their role underpins the informed decisions of an organization, and keep this in mind when designing the data pipelines of the future.
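To make the microservice-API idea concrete, here is a minimal sketch of a service exposing KPIs as JSON using only the Python standard library. The KPI names and values are hypothetical, and a production version would read from a data lake or serving store, add authentication, and run inside a container rather than hard-coding data:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical pre-computed KPIs; a real service would fetch these
# from a serving store fed by the data pipelines.
KPIS = {"daily_revenue": 3500.0, "open_orders": 42}

class KpiHandler(BaseHTTPRequestHandler):
    """Minimal endpoint letting other teams consume KPIs over HTTP."""

    def do_GET(self):
        body = json.dumps(KPIS).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging in this sketch

if __name__ == "__main__":
    # Bind to an ephemeral local port; a containerized deployment
    # would expose a fixed port instead.
    server = HTTPServer(("127.0.0.1", 0), KpiHandler)
    print(f"Serving KPIs on port {server.server_port}")
    # server.serve_forever()  # left commented so the sketch exits cleanly
```

Distributing data behind small, versioned APIs like this is one way to decouple consumers from the underlying storage and tooling differences mentioned above.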