Li Kang, Vice President of North America at Kyligence, chats with Swapnil Bhartiya about Kyligence Cloud 4.5. Kyligence provides an intelligent data cloud, which helps transform the way people use data. To make this happen, Kyligence provides a data analytics platform (both in the cloud and on-premises) that enables users to get insights from vast amounts of data stored on their data platform (such as a data lake or a cloud data warehouse). They can then use those insights to serve front-end BI applications, dashboards, or machine learning (ML) applications.
With the latest version of Kyligence (version 4.5), they’ve expanded the platform to cover a lot more use cases and scenarios. One new addition is called Smart Tiered Storage, which incorporates a new open-source project, called ClickHouse. The first tier of this new feature can handle ad-hoc query scenarios. The second tier is the Apache Kylin-based OLAP pre-computation technology. According to Kang, “That’s where we precompute the results. We know what questions you may ask and we have the answers stored in the Apache Kylin-based aggregate index.”
Kang says of this two-tiered storage, “We can support a much wider array of use cases so that whether it’s a precomputed result or ad-hoc query, we can support both of them with high performance and high concurrency.”
Another new feature is called Kyligence Real-Time, which makes it possible for the platform to ingest data from streaming sources (such as Kafka).
As to Kyligence’s association with Apache, Kang makes it clear that Apache Kylin is the core of their product. Kang says, “As you know, Kyligence was founded by the creators of Apache Kylin. So we continue developing the core OLAP processing engine with Apache Kylin.” Kang also indicates that Kyligence continues to invest in leading the development of the Apache Kylin project.
The roadmap for Kyligence is focused on three convergences in the data analytics domain: the data warehouse and the data lake, batch processing and real-time processing, and traditional BI analytics and AI/machine learning. Kang also mentions that Kyligence intends to enhance its products with more native support for AI/machine learning workloads.
Summary for this interview/discussion was written by Jack Wallen
Here is an edited transcript of the discussion…
Swapnil Bhartiya: Welcome to TFiR Newsroom. And this is your host Swapnil Bhartiya. My next guest is Li Kang, Vice President of North America of Kyligence. Li, it’s great to have you back on the show.
Li Kang: Yeah! Thanks for having me, Swap.
Swapnil Bhartiya: And today we are going to talk about Kyligence Cloud 4.5. But before we talk about this release, I want to talk a bit about the company.
Li Kang: We provide an intelligent data cloud and our mission is to transform the way people use data. So we provide a data analytics platform, both on cloud and on premises, that enables users to get insights from the vast amount of data that is stored on their data platform (whether it’s a data lake or a cloud data warehouse), and enables that insight to serve front-end applications like BI or dashboards or machine learning (ML) types of applications.
Swapnil Bhartiya: You offer software that folks can install or you offer a service?
Li Kang: It’s software that people can either install on-premises or deploy to their cloud environment.
Swapnil Bhartiya: Excellent! Now, let’s talk about the latest version 4.5. What’s new in this release?
Li Kang: There are lots of new features in the 4.5 release. First of all, talking about the theme, we expanded our platform to cover a lot more use cases and scenarios.
So to highlight a couple of things, one is what we call Smart Tiered Storage; that is, we incorporate a new open source project called ClickHouse in our Kyligence 4.5 release. With ClickHouse, we can now handle ad-hoc query scenarios.
So if you recall from our discussion back in January, we talked about the Kyligence 4 release. At that time, we focused on things like cloud-native architecture, a unified semantic layer, and smart pushdown. So we focused on enabling high-performance, high-concurrency analytics on the cloud or a data lake platform. And we leveraged our key technology or core enabling component, Apache Kylin, which is an OLAP technology, or a multi-dimensional online analytical processing technology, built on the big data platform.
So in this 4.5 release, we expanded on top of that capability. We introduced ClickHouse, which enables users to run any type of query. You may have any question that you come up with and you can just ask that question, send your query to your data platform and ClickHouse will help you find the answer. And that’s the first tier in our Smart Tiered Storage architecture.
The second tier is our Apache Kylin-based OLAP pre-computation technology. That’s where we precompute the results. We know what questions you may ask and we have the answers stored in the Apache Kylin-based aggregate index.
So with this two-tiered storage, we can support a much wider array of use cases so that whether it’s a precomputed result or ad-hoc query, we can support both of them with high performance and high concurrency. So Smart Tiered Storage, with its support for ad-hoc queries, is the first major feature.
The second major feature is called Kyligence Real-time. The OLAP or MOLAP technology was mainly used for batch processing in the past.
So with the 4.5 release, we expanded the support to streaming data sources. So now with our platform, we can ingest data from streaming sources, such as Kafka, and we have other sources that will be supported in the near future. So you have new events or new records or new messages arriving at the streaming data sources and we will pick up those new messages or new events, and we’ll merge that into our aggregate index. And then, we can run the OLAP analysis on top of that.
So in this scenario, we can process both the historical data and the latest information coming from the streaming data sources. So in all these different types of analysis, it is under the same data model, same unified semantic layer. So from the user standpoint, you don’t need to worry about whether you are dealing with real-time information from the streaming source or the historical data in the pre-built, pre-computed aggregate indexes. You are just sending the same query and the platform will handle different scenarios.
So these are the two major features. Other than that, we obviously further enhanced the performance. We introduced the new Spark, the latest Spark processing engine as the back-end processing engine. We expanded our unified semantic layer to support Power BI through the MDX Connector. So these are some of the other features that are available in 4.5.
Swapnil Bhartiya: Excellent. Thanks for explaining all those features in detail. Can you also talk about what role Apache Kylin plays in this release?
Li Kang: Apache Kylin is still the core of the product. As you know, Kyligence was founded by the creators of Apache Kylin. So we continue developing the core OLAP processing engine with Apache Kylin but as I mentioned, we expanded our open source effort. In this release, we introduced a new open source technology called ClickHouse, which has been very popular among analytics developers. And also we are contributing back to other open source projects, for example, Spark and Apache Hudi. So just overall, Kyligence is committed to open source development, and we continue to invest in leading the development of the Apache Kylin project.
Swapnil Bhartiya: If I may ask you, who are your direct competitors? Who do you compete against, and what edge do you bring to the market to stay ahead of them?
Li Kang: There are many products that play at the same tier where we play, that enable the insights on top of a big data platform, whether it’s on premises or on the cloud, right?
So in that sense, there are lots of vendors that are trying to solve this problem. We take a different approach. Some vendors focus on pre-computation with OLAP, some on real-time processing, and some on ad-hoc data exploration. With 4.5, we introduce a platform that is able to support all these different use cases or scenarios. So we can do OLAP batch processing, we can do real-time OLAP processing, and we support ad-hoc queries. And then we govern all three different scenarios with the same unified semantic layer, and behind the scenes are our AI-driven engines.
So the AI engine that automates the modeling and automates the optimization of the system is, I would say, our secret sauce, the engine that powers all the different use cases. And this engine is running behind the scenes to support this platform, which can support all three different use cases. So I think that’s our strength, and that’s how we can beat our competitors.
Swapnil Bhartiya: Now that 4.5 is out, I’m pretty sure the team is already working on the next release. What are the areas that you are going to focus on with the next release?
Li Kang: Beyond 4.5, I think the trend is quite obvious in this data analytics domain: the convergence of data warehouse and data lake, the convergence of batch processing and real-time processing, and the next convergence that’s happening, which is the convergence of traditional BI analytics and AI/machine learning workloads. So, I think that’s quite obvious and that’s our direction as well. We already talked about the convergence of batch and real time, and the convergence of ad-hoc and pre-computed queries. That’s already happening in our product.
So naturally, we will be enhancing the products with more native support for AI/machine learning workloads and continuing to support that in an open source fashion. You will see some new products released along that line pretty soon.
Swapnil Bhartiya: Li, thank you so much for taking time out today to talk about this new release and also the market that Kyligence plays in. And I would love to have you back on the show. Thank you.
Li Kang: Thank you, Swap.