Microsoft Azure has announced the general availability of Apache Hadoop 3.0 on Azure HDInsight. New features in Hadoop 3.0 bring improvements to performance, scalability, and availability, reducing total cost of ownership and accelerating time-to-value.
With ACID transactions on by default and several performance improvements, Apache Hive 3.0, the latest version of Hive, enables developers to build “traditional database” applications on massive data lakes. The new Hive Warehouse Connector moves the integration between Apache Spark and Hive from the metastore layer to the query engine layer.
Apache HBase 2.0 and Apache Phoenix 5.0 introduce a number of performance, stability, and integration improvements. With HBase 2.0, in-memory compactions periodically reorganize the data in the memstore, which improves performance because data is flushed to and read from remote cloud storage less often. Phoenix 5.0 brings more visibility into queries through a query log, a new system table that captures information about the queries run against the cluster.
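To illustrate why reorganizing data in memory can reduce flushes to remote storage, here is a minimal toy model in Python. It is a sketch under simplifying assumptions, not HBase's actual implementation: the `ToyMemstore` class, the `count_flushes` helper, and all parameters are hypothetical names introduced for this example, and "compaction" here only means keeping the newest value per key.

```python
class ToyMemstore:
    """Toy write buffer. Hypothetical model, NOT HBase's real memstore."""

    def __init__(self, capacity, in_memory_compaction):
        self.capacity = capacity  # buffered cells that trigger a flush
        self.in_memory_compaction = in_memory_compaction
        self.cells = []           # (key, value) pairs in arrival order
        self.flushes = 0          # simulated writes to remote storage

    def put(self, key, value):
        self.cells.append((key, value))
        # When the buffer is full, first try to shrink it in memory.
        if len(self.cells) >= self.capacity and self.in_memory_compaction:
            self._compact()
        # Flush only if the buffer is still full after any compaction.
        if len(self.cells) >= self.capacity:
            self._flush()

    def _compact(self):
        # Keep only the most recent value written for each key.
        latest = {}
        for key, value in self.cells:
            latest[key] = value
        self.cells = list(latest.items())

    def _flush(self):
        self.flushes += 1
        self.cells = []


def count_flushes(in_memory_compaction, writes=1000, hot_keys=10, capacity=100):
    """Repeatedly overwrite a small set of keys and count simulated flushes."""
    store = ToyMemstore(capacity, in_memory_compaction)
    for i in range(writes):
        store.put(f"row-{i % hot_keys}", i)
    return store.flushes


if __name__ == "__main__":
    print("flushes without in-memory compaction:", count_flushes(False))
    print("flushes with in-memory compaction:", count_flushes(True))
```

In this toy workload, where the same few rows are overwritten repeatedly, the compacting buffer never fills with redundant versions and so flushes far less often. Real workloads with mostly unique keys would benefit less.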
Spark IO Cache, a data caching service for Azure HDInsight, improves the performance of Apache Spark jobs. IO Cache also works with Apache Tez and Apache Hive workloads, which can be run on Apache Spark clusters.
Hadoop 3.0 also brings enhanced Enterprise Security Package (ESP) support for Apache HBase as well as Bring Your Own Key (BYOK) support for Apache Kafka.
According to the company, Azure HDInsight offers rich development experiences with different integrated development environment (IDE) extensions, notebooks, and SDKs. It also supports a vibrant application ecosystem with a variety of big data applications available on Azure Marketplace, covering scenarios from interactive analytics to application migration.