
Coalesce Comes Out Of Stealth To Address Data Transformation Challenges


Guest: Armon Petrossian
Company: Coalesce
Show: Let’s Talk

Coalesce, the data transformation company, recently emerged from stealth mode with $5.92 million in seed funding. The Coalesce Data Transformation platform enables data transformations at enterprise scale by increasing data engineer productivity and insight so teams can tackle today’s data-intensive architectures. Armon Petrossian, Co-Founder and CEO of Coalesce, joins us on TFiR to talk about the platform, how it could change the market, the company’s vision and more.

Here are the key takeaways from this discussion:

  • What problem were you trying to solve when you created the company?

“I, along with my Co-Founder, was exposed in a unique way to the largest enterprise data warehouse use cases that this world has seen, quite commonly Fortune 500 and global data warehouses. We were focused on solving the very specific problems of those data warehouses at scale and that was the expertise and experience that we took with us when we started Coalesce.”

  • How would you define data transformation? 

“Everything in between the process of taking data from its raw format that’s just been loaded from your source systems to the point where it is prepared properly with business rules and transformations documented with accurate data lineage, that’s what data transformation means to us.”

  • We live in a data-driven world, with every sensor, system and IoT device collecting data around us. Are there any specific areas that you folks are focusing on?

“From an architectural perspective, we’re the only cloud transformation tool that was built with column awareness in its architecture from the ground up. That’s what unlocks the ability for our users to develop at a dramatically faster rate than users of any other tool or the next best alternative, and also to manage these projects as they grow larger at scale.”

  • Can you talk about your multi-cloud strategy? How would you differentiate between a data lake and a data warehouse?

“We are very focused on the data warehousing use case, which is quite commonly the biggest aspect of working with data. And being data-driven means having a properly built data warehouse.”

“A data lake is where you can take unstructured or semi-structured data and do some prep prior to loading it into your data warehouse. But ultimately, to build data properly and get the most out of your data, you want to focus on building, managing and preparing a data warehouse that is consumable for data analysts or data scientists.”

  • What does the Coalesce platform look like? How does it actually help?

“Coalesce gives you that same look and feel of an intuitive GUI when you look at the user interface. However, behind the scenes, it’s generating native SQL for your target platform, and it’s all customizable, editable and flexible. You can then use the platform to create a standard for your organization.”

  • What kind of trends are you seeing in the data transformation space?

“One thing that I don’t think is going away is that the central IT team will still be responsible for the core datasets of an enterprise, rolling out those datasets in a certified fashion to those departments so that those departments can then add on or do department-specific data prep.”

“We’re seeing tons of issues in organizations with operational governance, and data pipelines that have been built incorrectly, inaccurately or inefficiently.”

  • Can you share some best practices for data engineering teams to tackle data-intensive architectures and improve data analytics?

“We want to empower our data experts to use a solution that is as flexible as they need it to be, but also as efficient as something that’s intuitive and user friendly.”

  • Coalesce recently raised $5.92 million in seed funding led by 11.2 Capital and GreatPoint Ventures. What are the key growth areas that the company plans to focus on with the new funding?

“Things that are yet to be solved properly are the ability to profile data, build a logical model of your data, and then push that forward to the transformation layer and build the physical model efficiently. So Coalesce plans to focus on data discovery, along with profiling and true data modeling.”

  • Anything that users should be excited about?

“One of the things that people should be excited about when it comes to Coalesce is being able to accomplish data projects at scale.”

“We’re really able to help our users and customers accomplish migrations more efficiently than any tool that’s on the market.”

[expander_maker] 

Swapnil Bhartiya  00:00 Hi, this is Swapnil Bhartiya and welcome to TFiR Let’s Talk. Today we have with us Armon Petrossian, Co-Founder and CEO of Coalesce, a data transformation tool built for scale. Armon, first of all, welcome to the show. There’s so much to talk about: of course, you folks are coming out of stealth, and you have raised, you know, $5 million in seed funding. So congratulations.

Armon Petrossian  00:24 Thank you so much. I’m so happy to be here.

Swapnil Bhartiya  00:36 Since you are a co-founder of the company, I am also curious what specific problem you saw in this space that led you to create the company with your co-founder. Tell us the story.

Armon Petrossian  00:46 Yeah, absolutely. So my co-founder and I, you know, we’ve known each other for quite some time; we worked together for nearly a decade. And we were exposed in a unique way to the largest enterprise data warehouse use cases that this world has seen. So quite commonly Fortune 500 and global data warehouses, and we were focused on solving the very specific problems of those data warehouses at scale. And that was the expertise and experience that we took with us when we started Coalesce. It’s all about how to transform data at scale, and as efficiently as possible, which is where most data teams spend the vast majority of their time today when it comes to trying to drive value out of data. It’s largely in the data transformation space. And that is exactly where we sit.

Swapnil Bhartiya  01:40 If I ask you, how would you define data transformation? What is that?

Armon Petrossian  01:45 Yeah, so you can see in the analytics landscape that automation has impacted pretty much every sector of getting data from your source, in its raw format, to actually driving insight, whether it’s using a BI tool or a data science tool. The bottleneck really is the stretch from when you land data in your cloud platform up until you’re actually using an insight-driven tool, like a dashboarding tool such as Tableau, or a data science tool. So everything in between the process of taking data from its raw format, just loaded from your source systems, whether it’s a relational database, a web source or an API, to the point where it is prepared properly with business rules and transformations, documented with accurate data lineage at both the table level and the column level, building dimensions and facts, everything that is really required to take data from its raw format to the point where it’s actually consumable, accurate and persisted properly. That is what data transformation means to us.

Swapnil Bhartiya  02:54 We kind of live in a data-driven world, whether you talk about small sensors and IoT devices, or even cars: everything is collecting data. Even my smart TV collects data on me. So are there any specific areas that you folks are focusing on? Or does it not really matter, and anybody can use it in any use case? We’ll talk about the platform also, but I just want to get there slowly.

Armon Petrossian  03:16 Yeah, definitely. So you’re totally right, we’re seeing data grow disproportionately compared to the number of people who can support those use cases. It’s incredibly common for data teams to be under a lot of pressure from the business to help drive insight and value from the data. The bottleneck for doing that is actually getting data prepared properly. It doesn’t matter if you’re a data scientist, a data analyst or a data engineer: the big bottleneck is transforming data efficiently and accurately. And what Coalesce does here is incredibly unique to us. From an architectural perspective, we’re the only cloud transformation tool that was built with column awareness in its architecture from the ground up. That’s what unlocks the ability for our users to develop at a dramatically faster rate than users of any other tool or the next best alternative, and also to manage these projects as they become larger and larger at scale. That’s the biggest differentiating factor, I would say, around architecture, and it targets where people spend the most time with these types of projects.

Swapnil Bhartiya  04:29 One more thing: as we talk about multi-cloud or hybrid cloud storage, depending on what you’re running and where you’re collecting the data, you’re sending it either to a data warehouse or to your own data lake. I’m pretty sure that whatever you folks are doing is purely multi-cloud, because you really don’t know where customers are running. But that data itself has no value, right? You have to extract value from it, and that is done through data transformation. So let’s start with the challenges that you see with the multi-cloud story. There are so many clouds. And then I would also like to hear your opinion on data lakes versus data warehouses. So let’s start there.

Armon Petrossian  05:19 Yeah. So to answer the first piece, which is around multi-cloud strategies, I don’t see that going away anytime soon. However, from a best-of-breed standpoint, we’re seeing many, many organizations adopt Snowflake, given that it supports any major cloud platform today and is a best-of-breed data warehouse. That’s a big reason why we chose that platform to be GA-ready for launch. So from a multi-cloud perspective, you may have data on different types of cloud systems, or maybe an OLTP versus an OLAP system. However, we are very focused on the data warehousing use case, which is quite commonly the biggest aspect of working with data, right? And being data-driven means having a properly built data warehouse. Which leads me to the next piece that you’re asking about, Swapnil, which is the differentiation between a data lake and a data warehouse. You’re also seeing a new term, the data lakehouse, become very popular and common. The way I like to view it is: what is the high-level goal we’re all aiming for? It’s getting value out of data. And to be able to do that, you need to prepare it in a way that’s consumable. That can happen on a data lake, but most commonly it’s done on a data warehouse. The way I view a data lake is as a place where you can take unstructured or semi-structured data and do some prep prior to loading it into your data warehouse. That’s a very common need, and I think organizations will continue to do that. But ultimately, to build data properly and get the most out of your data, you want to focus on building, managing and preparing a data warehouse that is consumable for, you know, data analysts or data scientists, in as easy a form as possible.
And oftentimes, you know, putting the pressure on some of these upstream users, whether it’s a data analyst or a data scientist, to do data prep from a data lake can lead to some poor behaviors, both from an operational governance perspective and from a lack of knowledge around data warehousing concepts.

Swapnil Bhartiya  07:46 And I’m guessing that’s where Coalesce enters the picture, to help with all these complexities and make it easy, because, as we discussed earlier, in today’s world everybody has to leverage data, but it can also become very complicated. So let’s talk about the platform: what does the platform look like? How does it actually help, whether you’re a data scientist or an engineer? Because today we look for the unicorn developer who should be able to do everything, right?

Armon Petrossian  08:11 Yeah, definitely. It’s interesting: I think we’ve seen the market shift quite a bit. Historically, you had users that were comfortable using a GUI-based tool. And what they found is that oftentimes those GUIs lack flexibility, or compromise that flexibility. Most of these data transformation tools were historically ETL tools, which were largely built on widget-based transformations, so that led to a lack of flexibility. As a result, you’re starting to see a shift toward this code-centric or code-first approach, which is becoming popular because people feel burned by the GUIs of the past: you know, ‘these never worked for us, we’re just going to do this manually, we’re going to write code.’ Coalesce really is the best of both worlds. When you look at the user interface, it gives you that same look and feel of an intuitive GUI; however, behind the scenes, it’s generating native SQL for your target platform. And that’s all customizable, all editable, all flexible, as flexible as code, really. You can then use the Coalesce platform to create a standard for your organization, which we’ve seen help so many organizations when it comes to empowering data insight people, whether they’re data scientists or data analysts, to do data prep properly and efficiently without having to be as knowledgeable about core data warehouse concepts, since that’s typically not their wheelhouse. But again, it’s driven by central IT, driven by the central data guild, if you will.

Swapnil Bhartiya  09:56 In today’s world, no matter what you do, you need to have a software, cloud or IT strategy, and sometimes the bar can be so high. You look at the CNCF landscape and the technology is so, you know, complicated, crazy. That’s why we also see the popularity of low code and no code. You actually did touch upon this, and that’s what I’m asking: how do you lower the barrier so that companies can easily get value out of the data that they’re collecting out there?

Armon Petrossian  10:28 Yeah, yeah, definitely. I would say that the area where people have gone wrong is where they go too far into one side of the spectrum. If you go too far into the GUI side of the house, you’re naturally going to compromise flexibility, because you can only work within the framework that the GUI allows, whether it’s a no-code or low-code tool. On the flip side, if you go too far into this code-first, programmatic approach, what you’re going to find is that as you migrate 10,000 database objects from an on-premises platform to the cloud, it becomes really time-consuming and a horrible pain to manage as you scale, not just for the initial implementation, but when changes come into the picture. When business requirements change, it becomes incredibly hard to manage these types of projects at scale. So with Coalesce, we wanted to take the best aspects of both worlds: give you that intuitive interface, but make it customizable, and code-first in the sense that you can do everything that you could with code, but it’s simply accelerated and so much easier to manage with this interface.

Swapnil Bhartiya  11:38 I mean, since we’re talking about it, once again, everything as code, right, infrastructure as code. I also want to talk quickly about the data transformation space in particular: what kind of movements, what kind of trends are you seeing? Because we also talked a lot about automation. When you look at all the movements happening in this space, how do you not only maintain a balance, but also meet customers where they are in their journey, irrespective of what approach they’re taking, whether it’s an infrastructure-as-code approach, low code, no code or a GUI? At the same time, as I just said, you’re able to do a lot of things, but automation is also driving a lot of things. So there are a lot of things bundled up there, but I want to hear your insights. What do you think?

Armon Petrossian  12:36 Yeah, from a trends perspective, one of the things that we’ve noticed here at Coalesce becoming more and more common, which I personally love, is that organizations are viewing data differently, in a much more positive fashion. Historically, the central IT team was predominantly focused on data, and now data is starting to branch off into other departments, almost like citizen data roles within specific departments, whether it’s finance or marketing. You’re starting to see organizations hire data personnel for each one of these departments, along with central IT. So that begs the next question: okay, great, you’ve got data personnel in every department, and you have your core IT team that has historically been responsible for data. How do you make them all work together in a uniform fashion, in a way that’s productive for everybody? One thing that I don’t think is going away is that the central IT team will still be responsible for the core datasets of an enterprise. They will also be responsible for rolling out those datasets in a certified fashion to those departments, so that those departments can then add on or do department-specific data prep. As of today, one of the things that makes it difficult to do this with a code-first approach is that it can get very messy. As you try to scale this out across organizations, in particular with people who aren’t familiar with data warehousing concepts, and as the project grows and the data demands grow, you can dig a really big hole by not having appropriate data warehousing concepts applied in those departments. So we’re seeing tons of issues in organizations with operational governance, and data pipelines that have been built incorrectly, inaccurately or inefficiently.
Whereas with Coalesce, we can empower the central IT team and the central data analytics personnel to roll out one solution that fits all the audiences: both the central IT and data analytics teams for the enterprise datasets, and the data analysts, data scientists and everyone in between.

Swapnil Bhartiya  14:48 Excellent. You touched upon this earlier: Coalesce does take care of a lot of things. But do you also have a kind of playbook so that, irrespective of what level your data practitioners are at, senior or junior, they know how they should prepare the data? That way it becomes easy not only to transform it, but also to handle the fact that you’re sometimes dealing with massive amounts of data, which you have to move around. So share some best practices, or things that they should or should not do.

Armon Petrossian  15:22 Yes. So Coalesce really helps guide users around the parameters that are most effective for doing data prep and transformations properly. That’s a byproduct of the platform: it comes out of the box with these best practices baked into the solution. Now, I think where it gets a little more tricky is in edge cases, or in scenarios where those out-of-the-box aspects of the tool need to be extended, or at least historically that’s been the case. With Coalesce, the benefit is that anything you could do on your target platform, in this case Snowflake, is possible within Coalesce. And even better, creating a standard with our tool is incredibly easy, so you never have to repeat the same thing twice, and you can also encourage the right behavior when it comes to doing data transformations properly. So we really want to empower our data experts to use a solution that is as flexible as they need it to be, but also as efficient as something that’s intuitive and user friendly.

Swapnil Bhartiya  16:33 Excellent. Now, we talked a lot about data and everything else. Let’s talk about the company. First of all, once again, congratulations. I want to understand what is going to be the focus and the vision of the company, because the world we live in is changing so fast. Plus, you have also raised, you know, $5 million. So I also want to understand the areas where you are trying to grow. Give me a high-level overview of the vision that you have for the company.

Armon Petrossian  16:56 Yeah, definitely. You know, we started with transformations for a reason: we saw that the largest bottleneck when it comes to data today is getting data transformed and prepared properly. Now, one thing that’s unique about Coalesce is the architecture that I mentioned earlier, which is focused on columns: being column-aware and building the solution from the ground up with that aspect incorporated into our platform. From here, we’re seeing different types of problems that we’ve been exposed to in the past that still really haven’t been answered today. And what’s most important is being able to integrate those solutions together with transformations. As of today, if you look around the modern data stack, a lot of the solutions in the vendor space are point solutions for one specific problem; they may have an integration with other parts or other solutions, but from an architectural perspective they aren’t aligned. So when I say that column awareness is so important to our architecture, it’s because that is what will allow us to go on and solve other problems and bring compounding value as we do that. One thing that I can say has yet to be solved properly is the ability to really profile data, build a logical model of your data, and then push that forward to the transformation layer and build the physical model efficiently. Data discovery is constantly a problem, along with profiling and true data modeling.

Swapnil Bhartiya  18:30 We are at the beginning of 2022, and you folks are coming out today. Of course, there are a lot of things you cannot yet talk about, and there will be a lot of announcements. But what are some of the things we should look out for, and be excited about, that you folks will be doing this year?

Armon Petrossian  18:50 The biggest thing that people should be excited about when it comes to Coalesce is being able to accomplish data projects at scale. What we’ve found as we go from organization to organization, enterprise to enterprise, is that the more common format of cloud use cases has been department-specific workloads, not yet the enterprise data warehouse, which is the most core function of analytics in an organization. The big reason why is that a migration from an on-premises platform, where your enterprise data warehouse was built over a decade or multiple decades, is a very time-consuming and effort-driven migration. Organizations are now starting to think about how to migrate their most core workloads and the DW, and with Coalesce that becomes possible at a more efficient rate than has ever been seen before. So we’re really able to help our users and our customers accomplish migrations more efficiently than any tool that’s on the market. We’re encouraging migrations to the cloud and helping them happen efficiently. Turning a three-year project into six months is something that we view as an incredible use case, and an incredible value add to the data space in general.

Swapnil Bhartiya  20:12 Thank you so much for taking time out today, and for talking not only about the company, but also about the challenges in the data transformation space that you’re tackling. Congratulations once again. As I said, there’s so much to talk about, and you gave us some kind of preview there, so I would love to have you back on the show. Thanks for your time today.

Armon Petrossian  20:29 Absolutely. Thank you so much for having me, Swapnil. It was a pleasure and I’m looking forward to doing this again soon.

[/expander_maker]