Give your data a home: choosing between data warehouses, hubs, and lakes.
October 7, 2019
Focus Area — Innovation & Technology
Data is everywhere. It is often described as an abundant resource and, as with any abundant resource, we need to learn to manage and maintain it if we hope to make proper use of it. But what does proper use of data look like? For starters, it means analyzing it. Where and how we store our data affects how, when, and to what degree we can analyze it.
At our recent Council for Chief Data and Analytics Officers meetings, senior leaders from across Canada have been discussing how and where they store their large data sets. We've learned that three of the most common solutions are data warehouses, data lakes, and data hubs. Each comes with its own benefits and challenges; this blog post summarizes the pros and cons of each storage type.
Let’s start with data warehouses.
Data warehouses are a very common storage type. They are often used by businesses that need to report on and analyze data from different sources, such as Excel files, CRM systems, and financial records. In a data warehouse, data is stored in a structured format, meaning it must fit a predefined layout of tables, columns, and data types (a schema). As a result, data often has to be cleaned and reformatted before it can enter the warehouse.
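To make the reformatting step concrete, here is a minimal Python sketch of this "schema-on-write" approach. It assumes a hypothetical sales table and two hypothetical sources (a spreadsheet exported as CSV, and a CRM record with its own field names); every schema, field, and function name is invented for illustration. Each record is converted to the schema before loading, and records that cannot be converted are rejected. A real warehouse would do this with ETL tooling and SQL rather than plain Python lists.

```python
import csv
import io
from datetime import date

# Hypothetical warehouse schema: column name -> function that parses a raw value.
SCHEMA = {"customer_id": int, "sale_date": date.fromisoformat, "amount_cad": float}

def conform(record):
    """Reformat one source record to the schema; raises if any column cannot fit."""
    return {column: parse(record[column]) for column, parse in SCHEMA.items()}

def load(records, table):
    """Append records that conform to the schema; return how many were rejected."""
    rejected = 0
    for record in records:
        try:
            table.append(conform(record))
        except (KeyError, ValueError, TypeError):
            rejected += 1
    return rejected

def from_spreadsheet(text):
    # Hypothetical spreadsheet export whose CSV headers already match the schema.
    return list(csv.DictReader(io.StringIO(text)))

def from_crm(entry):
    # Hypothetical CRM record: rename its fields to the warehouse's column names.
    return {"customer_id": entry["CustomerID"], "sale_date": entry["Date"], "amount_cad": entry["Total"]}

warehouse_sales = []
csv_text = "customer_id,sale_date,amount_cad\n101,2019-09-30,250.00\n102,not-a-date,80.00\n"
rejected = load(from_spreadsheet(csv_text), warehouse_sales)
rejected += load([from_crm({"CustomerID": "103", "Date": "2019-10-01", "Total": "99.5"})], warehouse_sales)
print(len(warehouse_sales), rejected)  # 2 1
```

The row with an unreadable date never reaches the table. That up-front strictness is what makes warehouse data quick to query, and also why new or messy sources take effort to bring in.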
Data warehouses are popular because they allow businesses to quickly perform advanced analytics on current and historical data sets. The downside is that they can be expensive to scale, especially if they are not cloud-based. In addition, warehouses are not designed to hold raw, unstructured, or highly varied data, which is one reason many organizations have been moving towards data lakes.
Unlike data warehouses, data lakes can store structured, semi-structured, and unstructured data, all in their original formats. Yes, really! A data lake can simultaneously store video, audio, text, and highly structured system files. When the data needs to be analyzed, skilled data scientists can extract and structure it in whatever way the analysis requires.
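The contrast with a warehouse is often called "schema-on-read": data goes in as-is, and structure is applied only when someone reads it. Below is a minimal Python sketch of that idea, assuming an in-memory dictionary stands in for a real object store. The paths, file contents, and helper names are all hypothetical.

```python
import csv
import io
import json

# Hypothetical in-memory "lake": object paths mapped to raw bytes, stored untouched.
lake = {}

def ingest(path, raw):
    """Store data in its original format; no schema is checked on the way in."""
    lake[path] = raw

def read_json_events(prefix):
    """Schema-on-read: parse JSON objects under a path prefix only when asked."""
    events = []
    for path, raw in lake.items():
        if path.startswith(prefix) and path.endswith(".json"):
            events.extend(json.loads(raw))
    return events

def read_csv_column(path, column):
    """Schema-on-read: interpret one stored object as CSV and pull out a column."""
    text = lake[path].decode("utf-8")
    return [row[column] for row in csv.DictReader(io.StringIO(text))]

ingest("crm/2019/10/events.json", b'[{"type": "call"}, {"type": "email"}, {"type": "call"}]')
ingest("finance/2019-q3.csv", b"account,amount\nA,100\nB,250\n")
ingest("media/townhall.mp4", b"\x00\x00\x00\x18ftypmp42")  # placeholder bytes, never parsed here

calls = sum(1 for event in read_json_events("crm/") if event["type"] == "call")
total = sum(float(value) for value in read_csv_column("finance/2019-q3.csv", "amount"))
print(len(lake), calls, total)  # 3 2 350.0
```

Loading is trivial, but each analysis has to know how to interpret the files it touches. That is why lakes tend to rely on skilled data scientists.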
Data lakes can offer many benefits, largely thanks to their flexibility. Because data is kept in its raw form, a lake can be easier to load, access, analyze, and scale. Data lakes can also reduce the cost of storing huge quantities of diverse file types. However, a data lake is not always easy to justify. Businesses need to plan carefully how their existing data storage will integrate with any new lake. They must also weigh which data lake platform to use, as some are more complex and costly than others. Finally, not all data benefits from being kept in an unstructured, unclassified form.
Finally, let's move on to data hubs.
Data hubs act as centralized points of analysis, connected to other data storage systems. They are useful for collecting and connecting different data types and sources without the reformatting a warehouse requires. Hubs also offer more structure than data lakes, since they typically perform a specific set of functions on the data they collect and connect.
Data hubs can help businesses better understand their data by prompting conversations about its purpose, structure, governance, and analysis. Hubs can also help businesses produce more meaningful reports and visualizations. Though they are promising in theory, there is still a lot of confusion about what data hubs are and how they operate.
Which storage type should my business use?
Great question! Unfortunately, there is no single solution for all businesses; every business is different. Each collects and analyzes different data, and each must assess its readiness for different storage systems. Many factors will influence the decision. For example, what infrastructure do you currently own? What technologies can your staff manage today, or be retrained to manage? What does each solution cost to maintain?
Want to learn more?
These issues will be the focus of our upcoming Council for Chief Data and Analytics Officers meeting (Toronto, October 8-9). Join us! We'll have several presentations and discussion sessions about different data storage methods. To attend the event or learn more about the Council, please get in touch with Marianne Fotia, our Manager of Executive Networks.
Dr. Vanessa Thomas
Senior Research Associate I