Dremio Rolls Out New Data Lakehouse Features

Dremio, the easy and open data lakehouse, announced the rollout of key new features at Subsurface LIVE 2023. These features make it easier for customers to build their data lakehouses: they can load data efficiently into Apache Iceberg tables, query and federate across more data sources with Dremio Sonar, automatically format SQL queries in the Dremio SQL Runner, and securely connect Microsoft PowerBI using single sign-on (SSO).

Dremio has also added rollback and data optimization features for Apache Iceberg tables, making it easier to manage data lakehouses built on the open table format. Customers can also use several new SQL functions.

Dremio’s expanding functionality with Apache Iceberg now includes:

Copying data into Apache Iceberg tables – Dremio’s new COPY INTO SQL command makes it even easier and faster to load data into Apache Iceberg tables, which are a foundational component of data lakehouses. With one command, customers can now copy data from CSV and JSON file formats stored in Amazon S3, Azure Data Lake Storage (ADLS), HDFS, and other supported data sources into Apache Iceberg tables using the columnar Parquet file format for performance. Dremio efficiently distributes the copy operation across the entire engine to load data more quickly.
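As a rough illustration of the load described above, a COPY INTO statement might look like the sketch below. The table name (sales.orders) and the source name and path (@s3_landing/orders/2023/) are hypothetical placeholders, and the exact options available depend on the Dremio version in use.

```sql
-- Sketch only: sales.orders and @s3_landing are hypothetical names,
-- not taken from the announcement.
-- Loads every CSV file under the given folder of a configured source
-- into an existing Apache Iceberg table.
COPY INTO sales.orders
  FROM '@s3_landing/orders/2023/'
  FILE_FORMAT 'csv';
```

The target table is assumed to exist already as an Iceberg table; the statement only appends the file contents to it.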

Optimizing Apache Iceberg tables – When customers use Dremio’s data manipulation language (DML) commands to insert, update, and delete data in an Apache Iceberg table, additional files are created to represent these changes. These operations often leave many small files behind, which can degrade read and write performance on the table and consume excess storage. To improve the performance of Apache Iceberg tables, customers can now use the OPTIMIZE command in Dremio Sonar to consolidate these files into an optimal size. Customers running frequent DML operations can run OPTIMIZE at regular intervals to keep their Apache Iceberg tables efficient.
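A compaction run of this kind might be written roughly as follows. The table name is a hypothetical placeholder and the 256 MB target is an arbitrary example value rather than a recommendation from the announcement.

```sql
-- Sketch only: sales.orders is a hypothetical table name.
-- Rewrites the table's small data files into larger ones.
OPTIMIZE TABLE sales.orders
  REWRITE DATA USING BIN_PACK (TARGET_FILE_SIZE_MB = 256);
```

Running the statement without any options (OPTIMIZE TABLE sales.orders) should fall back to the engine's default file sizes, which is usually a reasonable starting point before tuning.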

Table rollback for Apache Iceberg – Customers can now restore their Apache Iceberg tables to a specific point in time or snapshot ID with Dremio’s new ROLLBACK command, reverting a table to a previous state with a single command. When rolling back a table, Dremio creates a new Apache Iceberg snapshot from the prior state and uses it as the new current table state.
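The two rollback targets mentioned above, a timestamp and a snapshot ID, might be used along these lines. The table name, timestamp, and snapshot ID are made-up placeholders, and the snapshot listing query is an assumption about how snapshot IDs would be looked up, so it should be checked against the Dremio documentation.

```sql
-- Sketch only: sales.orders, the timestamp, and the snapshot ID are
-- hypothetical values chosen for illustration.

-- Assumed way to list a table's snapshots and find an ID to restore.
SELECT * FROM TABLE(table_snapshot('sales.orders'));

-- Restore the table to its state at a given point in time.
ROLLBACK TABLE sales.orders TO TIMESTAMP '2023-03-01 09:00:00';

-- Or restore it to a specific snapshot ID.
ROLLBACK TABLE sales.orders TO SNAPSHOT '2489484212521283189';
```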

Dremio’s new functionality also includes new connectors for Microsoft PowerBI, Snowflake, and IBM Db2. Customers using Dremio and PowerBI can now use single sign-on (SSO) to access their Dremio Cloud and Dremio Software engines from PowerBI, simplifying access control and user management across their data architecture.

Additionally, customers can now add Dremio clusters as data sources, enabling query federation across these clusters. This enables connectivity across Dremio environments, including hybrid environments with Dremio clusters running both in a public cloud and on-premises.