
What’s New In Varada 3.0 | Eran Vanounou


Varada has released v3.0 of its analytics platform, bringing elastic scaling to its autonomous indexing. We sat down with Eran Vanounou, CEO of Varada, to discuss the release, what's new, and how it stacks up against Varada's core competitors.


Guest: Eran Vanounou
Company: Varada

Varada offers a new standard in data virtualization. Varada enables large and complex datasets to optimally serve analytic users and apps by automatically making them extremely fast and instantly operational. Varada delivers a zero data-ops approach and empowers data architects to seamlessly accelerate and optimize workloads. Varada's dynamic and adaptive indexing technology ensures optimal control over data and cluster performance, and enables users to balance performance and cost.

Here is an edited transcript of the discussion by Jack Wallen.

What is Varada all about?

Think about it as a data platform, and what we do is apply a different strategy to accelerate your analytics workloads. And we do that in a seamless way, mainly thanks to our main innovation, which is indexing... or I would say dynamic indexing. We embrace the data lake architecture. So, you can think about Varada as a middle tier that sits on top of your data lake and below your data consumers. We run within your virtual private cloud (VPC). And we don't move, copy, or model your data. Your data lake will be the single source of truth, and we will take it from there.

What’s new in Varada v3.0?

If I'm moving a little bit backward, I will divide our journey into three parts. The first was the most complicated, which is where we had to deal with IP, patents, and building really adaptive indexing. So that was the main technology. And when we started, we had to know exactly what needed to be indexed, and when. So we did two things. First of all, aside from the indexing, we added a smart cache. An acceleration strategy is a combination of the right data that needs to be cached and the right indexes. But the layer we added, at least in the second phase, was a smart machine learning layer that is sensitive to your workload patterns and behavior. And this is the layer that continuously makes the decisions about what needs to be indexed, what needs to be cached, and when.

It goes back to your queries: patterns and behavior. So you are actually getting autonomous decision-making about dynamic indexing. And what we announced today is powerful elasticity. We are bringing the ability to scale out and scale in, both within a cluster and between clusters.
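
To make the idea of workload-driven decisions concrete, here is a minimal, purely illustrative Python sketch. It is not Varada's actual machine-learning layer, which is proprietary; it only shows the general shape of the idea: observe which columns queries filter on and promote the hottest ones as indexing and caching candidates. The class name, the budget parameter, and the heuristic are all assumptions made for the example.

```python
from collections import Counter


class WorkloadObserver:
    """Toy illustration of workload-driven acceleration decisions.

    Tracks which columns queries filter on and promotes the most
    frequently used ones as candidates for indexing/caching.
    """

    def __init__(self, index_budget: int = 3):
        self.filter_counts = Counter()    # column -> times used in a predicate
        self.index_budget = index_budget  # hypothetical cap on indexed columns

    def observe(self, predicate_columns: list[str]) -> None:
        """Record the columns a query filtered on."""
        self.filter_counts.update(predicate_columns)

    def columns_to_index(self) -> list[str]:
        """Return the most frequently filtered columns, up to the budget."""
        return [col for col, _ in self.filter_counts.most_common(self.index_budget)]


observer = WorkloadObserver()
observer.observe(["session_id", "country"])
observer.observe(["session_id", "event_time"])
observer.observe(["session_id"])
print(observer.columns_to_index())  # ['session_id', 'country', 'event_time']
```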

Improvements in Adaptive Indexing, one of the core innovations of Varada

This is our core innovation, indexing, though when we say indexing, we treat it a little bit differently than the classic definition of indexing. The way we do that is by leveraging solid-state drives (SSD) and Non-Volatile Memory Express (NVMe). Imagine a huge table in your data lake, let's assume 900 billion rows across 1,000 columns. We look into a single column, let's say it's a session ID. Within this single column, we look into a few kilobytes, say 64,000 rows. It's really small; we call it a nanoblock. Within this nanoblock, we look at two things: the structure of the data, in this case, a session ID, so it's a number, and also the specific content. In this case, if it's a number, it could be a high-cardinality number or a low-cardinality number. And then we choose the right mathematics, the right index that will fit this narrow block. And we tailor that, whether it's a dictionary or even Lucene. If you zoom out, we do the same for every other block, and each block gets a tailored index, independent of the other blocks. Everything is connected to Presto as a SQL engine, and therefore you benefit from both the agility and the power of the community and everything that Presto brings, but with the smartness of our indexes. So this is actually the main innovation.
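
As a rough illustration of per-nanoblock index selection, here is a small Python sketch. The thresholds and index-family names below are assumptions for the example, not Varada's real selection logic; the point is simply that each nanoblock gets its own index, chosen from that block's structure and content.

```python
def choose_nanoblock_index(values):
    """Pick an index type for one nanoblock (a small slice of one column).

    Illustrative only: the thresholds and index families are assumptions.
    The idea is that the choice is made per nanoblock, so neighbouring
    blocks of the same column can end up with different index types.
    """
    distinct = len(set(values))
    sample = values[0]

    if isinstance(sample, str):
        # Free-text-like data: a text index (e.g. Lucene-style) fits better.
        return "text-index"
    if distinct <= 32:
        # Low-cardinality numbers compress well into a dictionary index.
        return "dictionary"
    # High-cardinality numbers: something like a range/bitmap structure.
    return "range-index"


# Two nanoblocks from the same column can get different indexes.
print(choose_nanoblock_index([1, 1, 2, 2, 3] * 100))            # dictionary
print(choose_nanoblock_index(list(range(64_000))))              # range-index
print(choose_nanoblock_index(["search query", "another one"]))  # text-index
```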

What’s the direct impact of this release on customers?

For this release, we have two main layers. The first layer is you, Mr. Customer. You have your own data lake. This is your data layer. This is the single source of truth, and we call it the cold layer. Then, on Varada's SSDs, we have what we call the hot layer, which is where the indexes and the cache sit. The recent release added another layer, which we call the warm layer, which holds the same indexes, already indexed, cached, and compressed, but kept in an object store and ready to be consumed when a cluster needs to scale out. Having those blocks already indexed and already cached in an object store enables each cluster to scale out really fast. It's a multi-cluster approach. If you need another region, you just build another cluster.
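
Here is a toy Python sketch of the three-tier lookup described above, with plain dictionaries standing in for the data lake (cold), the object store of pre-built indexes (warm), and the cluster's SSD cache (hot). The tier names follow the interview; the lookup logic itself is an assumption for illustration only.

```python
def fetch_nanoblock(block_id, ssd_cache, warm_store, data_lake):
    """Toy three-tier lookup mirroring the hot/warm/cold layering."""
    # Hot layer: indexes and cache already on the cluster's SSDs/NVMe.
    if block_id in ssd_cache:
        return ssd_cache[block_id]

    # Warm layer: already-indexed, compressed blocks parked in object storage,
    # so a newly added node can hydrate without re-indexing from scratch.
    if block_id in warm_store:
        ssd_cache[block_id] = warm_store[block_id]
        return ssd_cache[block_id]

    # Cold layer: the data lake, the single source of truth. Read raw data,
    # index it, and publish to both warmer tiers for next time.
    raw = data_lake[block_id]
    indexed = f"indexed({raw})"
    warm_store[block_id] = indexed
    ssd_cache[block_id] = indexed
    return indexed


lake = {"orders/0001": "raw parquet bytes"}
warm, hot = {}, {}
print(fetch_nanoblock("orders/0001", hot, warm, lake))  # cold path, fills warm + hot
print(fetch_nanoblock("orders/0001", hot, warm, lake))  # served from the hot SSD layer
```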

The pandemic accelerated digital transformation and the cloud journey. How does this release address some of the challenges these newcomers are facing?

This is a good question. We meet customers on a journey. One of the journeys you mentioned is that customers are very slow to move and adapt to new innovations. So they want to move to the cloud. And when we look at that sort of thing, the data lake is a great innovation. It's pretty easy, it's pretty cheap, and not necessarily due to Varada; it's due to Amazon and Google and everyone else. So when they decide that a data lake is the right way to go, they think, "Okay, now that I have this data finally accessible, how can I make it operational?" And then usually we see customers thinking, "Okay, is it a data warehouse approach or a data lake approach?" The data warehouse won't disappear, because we'll always have a demand for data warehouses. But with data lakes, you can do so many things with data easier, cheaper, and faster, versus moving the data and copying and modeling and waiting. So we see customers on the move from on-prem to the cloud, or customers that are using data warehouses, with all the BI and applications on the data warehouse, and they're realizing that it's easier, cheaper, and faster to offload from the data warehouse into data lakes. So we see those two phenomena happening together.

Direct competitors of Varada

So first and foremost, I would say that a well-known name would be Snowflake; they did an amazing job. But I think we differ from Snowflake in two or three key areas. The first one: we run in your VPC. We treat your data lake as the single source of truth, which means that you don't need to move, copy, or model your data. That's number one. Number two: we are leveraging our key innovation, which is indexing. The indexing makes a big difference. As I mentioned, indexing is an autonomous acceleration strategy. So actually, the system listens to your workload and adjusts itself to your needs. So this is a big advantage. And last but not least, it's just cheaper, due to the fact that you don't need to move the data and it's part of your infrastructure. Also, our solution is targeted more at the data platform team, at the CTO, CIO, and head of engineering, versus the analyst at the top. So, one competitor, a big one, would obviously be Snowflake, which is one of the alternatives that customers are considering. But we enable customers to control data democratization.

We're based on Presto. We like Presto. We adopted Presto because we believe in the power of the community. Presto was born to work on data lakes, and we practically leverage Presto as an engine. So obviously, there are customers that already use Presto. We also partnered with Starburst in a joint effort to go to market together. And we're also offering Varada as a Presto connector to your existing Presto environment. So, we're pretty flexible. Presto alone is also kind of a competitor, but we are adding meaningful value for those Presto users.

Dremio is yet another open-source solution that addresses data lakes and is doing a pretty nice job. But, again, our indexes make a big difference: the autonomous indexing. Our main competitor, Amazon Athena, also provides a solution to run on data lakes. This is essentially a Presto-based solution as well. But it's brute force, it's a full scan. So when you compare Amazon Athena to Varada, it's a huge difference in terms of performance, concurrency, and definitely cost.

How did competitors inspire some features of this release?

I would say that we started with the core; like every startup, it was technology first versus a solution. And then, step by step, we are adding more capabilities. One thing we learned from customers and also from competitors was to make sure that we focus a lot on ease of use, because this field is really, really complicated. How can we make our go-to person, which is the head of the data platform or the chief data officer... how can we make his life easier? How can we enable him to cope with the business need in a timely manner and in a cost-effective way? So that's practically what we do. And if you think about, for example, Kubernetes, that's something that we're about to release soon. This is part of the set of features that make Varada faster and easier to use.

What’s next in the pipeline for Varada?

Kubernetes is something that's happening pretty soon and will be released. The second thing is additional cloud vendors; we're starting with Google and Vantaggio. So that's something that is part of our plan. A few other things are related mainly to ease of use and the ability to adopt Varada in an integrative environment. So that means, for example, Iceberg connectivity, and a few other third parties that are part of the big data analytics ecosystem.