Guest: Tony Baer (LinkedIn)
In this episode of TFiR: Let’s Talk, Swapnil Bhartiya sits down with Tony Baer, Founder and Industry Analyst at dbInsight, to discuss current market trends in the database and analytics space and what he expects to happen in the near term.
Key highlights of this video interview:
- Baer’s focus area is data and databases, particularly how the cloud has reinvented database architecture and the way data is managed.
- With on-prem, you’re managing to capacity. In the cloud, you’re managing to resource.
- In the cloud, because many parts of the infrastructure have been optimized, data can now be distributed in ways that were not possible before. Working with remote data raises issues of data locality, which affects performance, and data sovereignty, which determines whether the data can leave the country at all.
- With the data warehouse, you offload complex queries from the transaction system, and you only work with data that’s significant from an analytics standpoint. With the cloud came the ability to handle not just relational data but also multi-structured, or variably structured, data.
- A lakehouse is a data lake that supports ACID (atomicity, consistency, isolation, and durability) transactions.
- Baer believes the lakehouse will largely supplant the general-purpose, mixed-workload data warehouse because it will do the job far more economically. The caveat is that if you’re working with Teradata or a similar system, where the SQL query engine can do dozens, if not hundreds, of table joins, you’re still going to need a very high-end data warehouse.
- Awareness of what lakehouses are remains very low among practitioners and data professionals. The vendor community is ahead of the market in this respect.
- The data lakehouse ecosystem is still solidifying. The watershed event was when Snowflake announced in 2022 that it was going all in on Apache Iceberg. The household names (Oracle, IBM, Teradata, SAP, etc.) have not yet weighed in. Baer forecasts that they’ll make their choices in the next 12 to 18 months and he sees this becoming an open-source play.
- About 12 months from now, Baer expects companies to recognize that lakehouses can 1) save money, 2) scale, and 3) deliver “good enough” performance (though probably not as good as proprietary tables on block storage), and to gradually get on board.
This summary was written by Camille Gregory.