Does your data strategy need a data warehouse?

5 min read

If you want to get involved in data analytics, data management and artificial intelligence, the first thing you have to do is make sure your data is consolidated. However, this is a challenge for many companies.

Content: 

  1. What is a data warehouse? 
  2. How does a data warehouse differ from a data lake? 
  3. Cloud-based data warehouses
  4. Challenges

What is a Data Warehouse? 

A data warehouse (DWH) helps companies consolidate data from different sources, manage it and process it as needed. The result is often described as a "single source of truth", i.e. one source that contains all necessary and current information.

The concept is not new: approaches that met similar needs existed as early as the mid-1980s, for example under the term "information warehouse".

A data warehouse has the following characteristics:

Integrated - Data that comes together from different sources and with different structures is stored uniformly.

Chronological - Historical data or data history also plays a role in the data warehouse, for example, to compare different time periods in reports.

Permanent - Data is not stored temporarily, but permanently.

Topic-oriented - Data is selected according to specific data objects that are relevant for the evaluation.
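The four characteristics can be illustrated with a minimal sketch. It assumes two hypothetical source exports (`crm_rows`, `shop_rows`) with different field names and uses an in-memory SQLite table as a stand-in for a warehouse; the table name `customer_history` and the load dates are invented for illustration.

```python
import sqlite3

# Hypothetical source records: a CRM export and a web-shop export
# that describe customers with different field names.
crm_rows = [{"CustomerName": "Ada Example", "Country": "DE"}]
shop_rows = [{"name": "Bob Sample", "country_code": "FR"}]

def to_uniform(row, source):
    """Map a source-specific record onto one shared layout (integrated)."""
    if source == "crm":
        return (row["CustomerName"], row["Country"], source)
    return (row["name"], row["country_code"], source)

def load_snapshot(conn, rows, source, load_date):
    """Append rows with their load date; existing rows are never overwritten."""
    conn.executemany(
        "INSERT INTO customer_history (name, country, source, load_date) "
        "VALUES (?, ?, ?, ?)",
        [to_uniform(r, source) + (load_date,) for r in rows],
    )

conn = sqlite3.connect(":memory:")
# One table per subject ("customer") reflects the topic-oriented idea.
conn.execute(
    "CREATE TABLE customer_history "
    "(name TEXT, country TEXT, source TEXT, load_date TEXT)"
)
load_snapshot(conn, crm_rows, "crm", "2024-01-31")
load_snapshot(conn, shop_rows, "shop", "2024-01-31")
load_snapshot(conn, crm_rows, "crm", "2024-02-29")

# Chronological: compare time periods by counting rows per load date.
per_period = conn.execute(
    "SELECT load_date, COUNT(*) FROM customer_history "
    "GROUP BY load_date ORDER BY load_date"
).fetchall()
print(per_period)
```

Because each load only appends rows, earlier snapshots stay available for period comparisons, which is the "permanent" and "chronological" behavior described above.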

What is the purpose of a data warehouse?

  • Bringing together data from different sources and different types of data in such a way that it is centrally available and can be viewed.
  • The data warehouse thus provides an ideal basis for analyzing data or for data mining, for example.
  • Optimization of data quality through data cleansing, a standardized taxonomy and metadata that records the history of the data.
  • Structuring of data so that it is informative and readable for the user.
  • Performance enhancement for complex queries without affecting operational systems.

(Source: Wikipedia)
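As a rough sketch of what data cleansing with a standardized taxonomy and history metadata could look like, the following example normalizes names and country values. The mapping `COUNTRY_TAXONOMY`, the field names and the metadata keys `_source` and `_loaded_on` are assumptions made for illustration, not part of any specific product.

```python
from datetime import date

# Hypothetical taxonomy: free-text spellings mapped to one standard code.
COUNTRY_TAXONOMY = {
    "germany": "DE", "deutschland": "DE", "de": "DE",
    "france": "FR", "fr": "FR",
}

def cleanse(record, source, loaded_on):
    """Return a cleaned copy of the record plus metadata about its origin."""
    raw_country = record.get("country", "").strip().lower()
    return {
        # Collapse repeated whitespace and unify capitalization.
        "name": " ".join(record.get("name", "").split()).title(),
        # None marks a value that the taxonomy does not cover.
        "country": COUNTRY_TAXONOMY.get(raw_country),
        "_source": source,
        "_loaded_on": loaded_on.isoformat(),
    }

raw = [
    {"name": "  ada   example ", "country": "Deutschland"},
    {"name": "BOB SAMPLE", "country": "Atlantis"},
]
cleaned = [cleanse(r, "crm_export", date(2024, 1, 31)) for r in raw]

# Records with unmapped values are collected for manual review
# instead of being loaded silently.
needs_review = [c for c in cleaned if c["country"] is None]
```

Keeping unmapped values visible, rather than guessing a code, is one possible design choice; a real pipeline might route them to a data steward instead.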


Read how the charity Diakonie RWL gained a 360° view of its members and introduced an association management system with the help of a data warehouse.


How does a data warehouse differ from a data lake?

Data warehouses are often mentioned in the same sentence as data lakes. Although both go together in many applications, they differ widely in function and setup.

The only similarity between a data warehouse and a data lake is that both can store huge amounts of data. Whether the data lake or data warehouse is suitable for an organization depends heavily on the intended use.

Data Warehouse: Structured data, Data use is known, Data use is easier, Less data with high quality

Data Lake: Raw data sets, Data use is not identified, Architecture can be easily adapted, Large amounts

Data Lake

  • Raw data sets, so data often still needs to be processed for use
  • The intended use of the data is not yet defined
  • Predominantly used by data scientists who, for example, "fish" information out of the data lake using artificial intelligence methods
  • Access is relatively easy
  • Often contains much larger amounts of data (because it has not yet been processed)
  • Data quality and data protection are particular challenges of a data lake
  • The architecture of the data lake can be easily adapted
  • Data storage is often cheaper, but processing is often more resource/cost intensive because the data has not been pre-structured for a specific purpose.

Uses: science, education, transportation, forecasting (predictive analytics), machine learning, and areas where data tends to be collected in unstructured form (e.g., the health care sector)

Data Warehouse

  • Processed and structured data sets
  • Use of data is known
  • Predominantly used by business professionals
  • Often stores less data, but with high data quality
  • Easier to use even for users who have no expertise in data analysis
  • Data can be used more easily in dashboards, tables, etc.
  • The architecture of the data warehouse is more specific and customizations are therefore more complex
  • Historization (the history of the data can be tracked, for example, to compare time periods)
  • Storage is often more costly, but at the same time costs can be saved because the data can be used more easily for the purpose for which it was intended

Uses: reports, financials, business applications, market analysis, evaluation of customer/user behavior, integration with other systems (CRM, data visualization, business intelligence)

Sources: talend "Data Lake vs. Data Warehouse" / Kleyman, Bill (2018) "The Many Use-Cases of A Data Warehouse" / Sulmont, Lis (2020) "Data Lakes vs. Data Warehouses"
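The difference between the two approaches can be sketched in a few lines. The example below is an illustration under stated assumptions: `raw_events` is invented sample data, a list of JSON strings stands in for a data lake, and an in-memory SQLite table stands in for a data warehouse.

```python
import json
import sqlite3

# Data lake style: raw events kept exactly as they arrived (hypothetical data).
raw_events = [
    '{"user": "u1", "amount": "19.90", "note": "first order"}',
    '{"user": "u2", "amount": null}',
    '{"user": "u1", "amount": "5.10", "extra": {"coupon": true}}',
]

def revenue_from_lake(lines):
    """Structure is applied only when reading, so each consumer handles gaps."""
    totals = {}
    for line in lines:
        event = json.loads(line)
        if event.get("amount") is None:
            continue  # incomplete raw record, skipped at read time
        user = event["user"]
        totals[user] = totals.get(user, 0.0) + float(event["amount"])
    return totals

# Data warehouse style: data is validated and typed once, at load time.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (user_id TEXT NOT NULL, amount_cents INTEGER NOT NULL)"
)
for line in raw_events:
    event = json.loads(line)
    if event.get("amount") is not None:
        cents = round(float(event["amount"]) * 100)
        conn.execute("INSERT INTO orders VALUES (?, ?)", (event["user"], cents))

# After loading, a report is a plain query on structured, typed columns.
warehouse_totals = dict(
    conn.execute("SELECT user_id, SUM(amount_cents) FROM orders GROUP BY user_id")
)
lake_totals = revenue_from_lake(raw_events)
```

In the lake variant the parsing effort is repeated by every consumer, whereas the warehouse variant pays that cost once during loading, which mirrors the trade-off between cheaper storage and easier use described in the lists above.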

Cloud-based data warehouses

A major advantage of modern data warehouse offerings is that they are available as cloud-based services. Cloud computing, especially when consumed as a service, can save infrastructure costs, enable scaling, and offer more transparent pricing structures. In addition, the service is the same regardless of where the user is located. Companies that operate internationally in particular can benefit from cloud-based solutions, because the necessary functions are equally accessible from every location.

Challenges  

When selecting a data warehouse, it is important to consider which data formats are currently in use, as not every warehouse can process all formats. The warehouse must also connect reliably to the existing source systems. With older systems in particular, this connection should be tested to make sure that data is actually loaded successfully.

In addition, data protection principles play an important role, especially in Europe. From data encryption to anonymization, companies may have to take various measures in order to store their data in line with the applicable requirements.
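One commonly used building block in this area is pseudonymization, where a direct identifier is replaced by a keyed hash before the data is loaded. The sketch below uses Python's standard `hmac` and `hashlib` modules; the key, the record and the field names are placeholders chosen for illustration. Note that pseudonymization is weaker than full anonymization, and whether it is sufficient depends on the requirements that apply in each case.

```python
import hashlib
import hmac

# Assumption: in practice the key would come from a secrets manager,
# not from source code.
SECRET_KEY = b"replace-with-a-managed-secret"

def pseudonymize(value: str) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256)."""
    normalized = value.lower().encode("utf-8")
    return hmac.new(SECRET_KEY, normalized, hashlib.sha256).hexdigest()

def protect(record: dict, sensitive_fields: set) -> dict:
    """Return a copy in which the sensitive fields are pseudonymized."""
    return {
        key: pseudonymize(val) if key in sensitive_fields else val
        for key, val in record.items()
    }

# Hypothetical member record; only the e-mail address is treated as sensitive.
member = {"email": "Ada@example.org", "country": "DE", "donation_eur": 50}
safe = protect(member, {"email"})
```

Because the same input always yields the same pseudonym, records from different sources can still be joined on the protected field without storing the original value.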


by Rosina Germanova

Rosina Germanova is a senior consultant in the field of business intelligence. She has over five years of professional experience and has the necessary toolbox to bring data to life and create valuable knowledge.
