Guest: Taylor Murphy (LinkedIn, Twitter)
Company: Meltano (Twitter)
Keywords: DataOps, open source
Show: Let’s Talk
Summary: Meltano is building an open source DataOps OS that aims to be the foundation of every team’s ideal data stack. We recently hosted Taylor Murphy, Head of Product and Data of Meltano, to understand what exactly DataOps is, how companies are starting to recognize the value of the data that they have and the hidden value that’s there for individuals and larger organizations, the role of open source in the data stack, the value open source adds to the community as well as Meltano as a business, and more.
“We believe that open source adds a ton of value to the community and to us as well as a business. Several of the team come from GitLab, which shows that having an open core business model is fantastic and builds a large community where there’s open source that is free to use for every product, and then there’s a paid version that larger customers will pay for. That is something that doesn’t really exist for data teams and that’s what we’re trying to build,” said Murphy.
Highlights of the show:
- What is DataOps and why is it a game changer for data professionals?
- Where does Meltano fit into the modern data stack?
- What is the role open source plays with Meltano?
About Taylor Murphy: Taylor Murphy has been involved with Meltano since its inception, acting as the primary customer with whom the team engaged to understand the needs of modern data professionals. Taylor is passionate about maximizing the potential of data and building the future of the data profession.
About Meltano: Meltano enables everyone to realize the full potential of their data. We are building a modular, open source DataOps OS to be the foundation of every team’s ideal data stack. Meltano simplifies configuration, deployment and monitoring, and unites best-in-class open source tools and technologies for the data lifecycle. It allows data teams to benefit from DevOps best practices such as version control, code review and CI/CD. Meltano is in use today by organizations including GitLab, HackerOne, Netlify, Remote.com and Zapier.
Here is the full unedited transcript of the show:
- Swapnil Bhartiya: Hi, this is your host Swapnil Bhartiya, and welcome to TFiR Let’s Talk. And today we have with us Taylor Murphy, Head of Product and Data of Meltano. Taylor, it is great to have you on the show.
Taylor Murphy: Thanks for having me.
- Swapnil Bhartiya: Since this is the first time I’m talking to somebody from the company, I would love to know a bit more about the company. So tell us, what do you folks do?
Taylor Murphy: Yeah, so Meltano is building an open source DataOps OS, that aims to be the foundation of every team’s ideal data stack. So, at its core, it’s an open source project that will integrate with a number of different tools in the data ecosystem, and provide a foundational layer of DataOps best practices.
- Swapnil Bhartiya: Can you define a bit, what do you mean by DataOps?
Taylor Murphy: Yeah, so DataOps really, it’s two components, there is a cultural and also a technological component to DataOps. Colloquially, it’s kind of known as DevOps for data teams, and that specifically talks to bringing best practices from DevOps, such as version control, end to end testing, isolated environments, all of these great things that software engineering teams have, bringing those over to data teams and data professionals. Along with the technical side, there is also the cultural aspect of increasing quality and confidence in the insights that these data teams are generating. Data teams are dealing with a lot of data. Data is constantly changing and the business is constantly changing. And DataOps is a set of tools and methodology to really increase overall quality and confidence in the output of data teams.
- Swapnil Bhartiya: This could be an interesting kind of discussion about the topic. When we started talking about cloud native, we were looking at breaking down old silos, which were more structured around specific areas, say networking, storage, compute, and whatnot. It was more about specific people specialized in a specific field. We do want specialized folks. But now, when we look at all these new paradigm shifts that started with DevOps, we see that kind of soft silos are being built. And when I look at it, it’s nothing bad, because no matter how we look at it, there are certain folks who will have expertise and experience in certain areas. At the same time, as much as we want to have unicorn developers who can do everything, that is not pragmatic, that is not very practical, that’s also not very efficient. So can you also share your thoughts on this when you talk about DataOps? It’s not like the old silos; those silos were not even talking to each other, and here we do see that happening. So how would you define this change, this evolution?
Taylor Murphy: Yeah. So, I think there’s a couple of things going on there. Companies are really starting to recognize the value of the data that they have, and data teams in the past have typically been fairly ad hoc, or you’re pulling somebody from finance. Maybe you bring in somebody from the outside who’s under finance or under a specific team and is kind of starting to pull these insights. As technology and as processes have improved, companies are saying, oh, there’s a ton of value here, we need to understand what’s going on in our business, what’s going on with our customers, and companies are popping up to really level up the data profession. What we’re seeing on our end is that there are fantastic tools being created for specific problems that people have to solve, whether it’s… The data warehouse is kind of the center of gravity for data teams right now. There are tools specifically for moving data from point A to point B, tools for testing, tools for transformation, for discovery and cataloging. And right now they’re all fairly isolated, and aren’t well integrated with each other.
And it’s a very similar story to what software engineers kind of experience with different tools to build your full tool chain, and maybe folks would specialize in these areas, but maybe you have a small dev team that’s trying to do a little bit of everything. So, what we’re seeing is, there’s kind of an unbundling right now of fantastic tools and fantastic ways of working in specific areas. And we honestly predicted that at some point there will be a tool that can help bring these together, and we want to be that tool and be that foundation. Does that answer your question?
- Swapnil Bhartiya: It does. And it also leads to another query of mine, which is, if we look at today’s world, will it be fair to say that we actually live in a data driven world? Apps themselves have no value though, data is what is driving everything, all the way from Tesla cars to a small app that we use on our phone for whatever reason, healthcare, finance, whatever it is. So we do live in a data driven world. And that goes back to your point that, in the early days, you would pull some folks from somewhere, today we talk about the whole dev team, but you do need to extract value from the data. So, can you also talk about the importance of data in today’s world? And then we will talk about how you are kind of lowering the barrier of entry because not everybody can be data scientists, most of the time we are dealing with engineers and developers.
Taylor Murphy: Yeah. So, data I think, has always been important in companies. I think it’s just a recognition or even a rediscovery of the value that’s always been there. I think over the past 10, 15, 20 years as software is starting to eat the world, we demanded better user interfaces and experiences for this. But any app isn’t worth its salt, unless there’s a database behind it and there’s data in that. And so there has always been this focus and value on users’ data. And I think it’s just a recognition of how data kind of permeates everything, and the hidden value that’s still in there for individuals and for larger organizations. Even internally within a specific company, you are setting up applications and processes that are generating data, and there’s information and insights to be had from there. So it’s not just consumer facing or B2B; it’s also what’s going on inside.
There’s a lot of data being generated there as well. So it is very much data driven. I think I like to say, decisions and insights are data informed, less so than data driven, but the value of data is absolutely key and ever present.
- Swapnil Bhartiya: If you can also talk about, when we look at modern infrastructures, we do talk about other components of the stack. Can you talk about what the stack around data looks like?
Taylor Murphy: Yeah. So, typically if you were to go out and kind of query folks for what is the ideal modern data stack, and that’s kind of the term that’s come into parlance, what is the modern data stack. It’s anywhere from seven up to 12 or more different tools that you’re kind of pulling off a shelf. Those can be pure open source tools, or they can be SaaS vendors that you’re having to negotiate contracts with. Typically folks like to see a tool for moving your data from point A to point B, there’s going to be your warehouse, you may have a testing tool. If your organization is especially complex, you may have a catalog or discovery tool, there’s observability tools for understanding what’s going on. And honestly, it’s a very similar story to software engineering, where there were different tools for version control, and continuous integration. And integrating all of these can be very challenging.
And that’s what we’re hearing from folks is like, “Hey, this tool solves my problem, and it does it really well, but it doesn’t talk with this other tool.” Where I’m now having to renegotiate with this vendor, or I need to swap it out. And that’s one of the big problems that we’re seeing, is that there is this influx of fantastic tools for solving real problems, but the challenge of configuring them, potentially deploying them if you have a secure environment, and managing the integrations between them is what’s lacking. And that’s why Meltano exists, to help bring order to that chaos.
- Swapnil Bhartiya: Right. I actually want to talk a bit more about the chaos so that we can also understand how Meltano handles it. You touched upon some of the challenges, but if you look broadly also, when we talk about cloud, when we talk about multi-cloud, hybrid cloud, the challenges also differ. And with data, there are unique challenges as well. Some folks may put it in a data lake, some in a data warehouse, and then there is new jargon that blends these two words. So, can you just give us a quick glimpse, what are some of the biggest challenges you see folks face when they do deal with the data stack? And then we’ll talk about how Meltano comes in to help them.
Taylor Murphy: Yeah. I think the biggest challenge is, when you’re going from zero to one, if you are a relatively new company, and you’re just getting data, and you have specific questions to answer, the data tools that you can go out and buy or use get you to that answer fairly quickly. It then becomes an even harder challenge when you need to change things. So perhaps you have your raw data and you’ve transformed it, you’ve answered the question and you now have a dashboard, but something about the business changes. The question then becomes, how quick and how adaptable is your team in actually managing that change? And then how confident are you going to be in the insights and the reports that you’re generating downstream?
That’s where DataOps and the tooling come in. Day one, you can maybe get away with just buying a couple of tools, answering the question, but if you’re not able to handle change and deal with change management more broadly, that’s going to be a challenge. And that’s where we see people kind of falling down, as it were, in how they’re able to respond to what the business needs. So it’s that change management, it’s that confidence when you’re making changes, and it’s, how do you keep up with the dynamic state of the business and of the data that you’re working with?
- Swapnil Bhartiya: You mentioned day one, that’s where I also see a lot of play with open source. With open source, what happens is you can get the code, get it up and running, but day two challenges are where you need to add functionalities, update, upgrade, security, safety. So first of all, I want to hear from you, what role is open source playing in the data stack? And then we will talk about how Meltano comes in, because you folks do a lot of open source, as you also mentioned, to help, either it could be a commercial offering, or whatever support you offer so that folks can move from day one to day two.
Taylor Murphy: Yeah. So open source is very foundational to what we’re doing as a company and as a project. We believe that open source adds a ton of value to the community, and to us as a business. Several of the team come from GitLab, which has shown that having an open core business model is fantastic and builds a large community where there’s this open source, free to use for every product, and then there’s a paid version that larger customers will pay for. And something like that doesn’t really exist for data teams, and that’s what we’re trying to build. Open source provides two main benefits we see. One is that, there are just data teams that don’t have access to best in class tooling, whether it’s financing, or not being able to negotiate a contract, or whatever process it may be. Having a free open source tool that brings in these best practices and makes it easy to use, that just levels up the game for everybody, it grows and elevates the data profession generally.
The second part is, there’s a community side to this as well. We’ve seen with dbt Labs and their open core product, dbt, the value that the community around these tools can really have. On the extract and load side, there is a particularly unique challenge, where there are thousands of different SaaS tools and places where you may want to move data to and from. No vendor is going to be able to maintain all of those connections. The only way we believe that it can be done is through an open source community and open standards. That’s why we’ve focused heavily on the Singer ecosystem for transporting data, or replicating data. And so open source makes it easy to collaborate and to work with a community of people, to solve very similar problems. It also empowers people, I think, to understand what tooling they’re running, and to make changes. If you need to deploy your tools into a secure environment, or you want to look under the hood and see what’s going on, open source enables you to do that, and it really brings people together.
So, one of our core values as a company is community, and open source is a huge part of that. It’s not just something that we like to pay lip service to, it’s really important to us.
- Swapnil Bhartiya: You talked about standardization. Of course, that is another big challenge. If I’m not wrong, the Linux Foundation, I think last year, came out with an initiative which was more or less about standardization of data as well. How much effort do you think is going on, or how much of it is needed? Open source, once again, is just about code collaboration, but data, as you mentioned, is a totally different beast altogether.
Taylor Murphy: So, the standardization around data, in particular for data replication, is essentially about the transfer format. So there’s different ways to specify what data looks like. The Singer standard that Meltano currently integrates with specifies basically how records, schema messages, and metrics about data should be output by a particular program, whether it’s written in Python, or Go, or another language. The value of having the transfer specification be open and clear is that any number of people can write connectors that output this data in a certain format. And as long as the connector or the target can accept data in that format, you can pipe it anywhere that you want. And so, instead of having to build one-to-one integrations for this SaaS tool to this warehouse, and then another SaaS tool to the same warehouse, you can build one connector for the warehouse, and then as many connectors or taps as you need to pipe data. So it really just unlocks both sides of the equation, and allows us to build a few, other folks to build some, and kind of have this larger community of open tooling for moving data.
And this works in both directions, whether you’re moving data from a SaaS tool to like a warehouse, or from a warehouse or database and pushing it up to a SaaS tool, which is now typically called reverse ETL.
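The tap-and-target idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Meltano's or any real connector's code: the function names `tap_lines` and `target_load`, the `users` stream, and the sample rows are all hypothetical. The message shapes (JSON lines of type SCHEMA, RECORD, and STATE) follow the Singer specification as commonly documented. Real taps and targets are separate programs connected by a pipe from stdout to stdin; here they are plain functions so the flow is easy to follow.

```python
import json
from typing import Dict, Iterable, Iterator, List


def tap_lines() -> Iterator[str]:
    """Hypothetical tap: yield Singer-style messages, one JSON document per line."""
    # A SCHEMA message describes the stream before any of its records are sent.
    yield json.dumps({
        "type": "SCHEMA",
        "stream": "users",
        "schema": {
            "type": "object",
            "properties": {
                "id": {"type": "integer"},
                "name": {"type": "string"},
            },
        },
        "key_properties": ["id"],
    })
    # RECORD messages carry the rows themselves (made-up sample data).
    for row in [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]:
        yield json.dumps({"type": "RECORD", "stream": "users", "record": row})
    # A STATE message is a bookmark the tap can use to resume later.
    yield json.dumps({"type": "STATE", "value": {"users": 2}})


def target_load(lines: Iterable[str]) -> Dict[str, List[dict]]:
    """Hypothetical target: collect records per stream, requiring a schema first."""
    schemas: Dict[str, dict] = {}
    loaded: Dict[str, List[dict]] = {}
    for line in lines:
        message = json.loads(line)
        kind = message["type"]
        if kind == "SCHEMA":
            schemas[message["stream"]] = message["schema"]
            loaded.setdefault(message["stream"], [])
        elif kind == "RECORD":
            stream = message["stream"]
            if stream not in schemas:
                raise ValueError(f"record for {stream!r} arrived before its schema")
            loaded[stream].append(message["record"])
        # STATE messages are ignored in this sketch; a real target would persist them.
    return loaded


if __name__ == "__main__":
    # Any tap can feed any target because both sides agree on the message format.
    print(target_load(tap_lines()))
```

Because the only contract between the two sides is the message format, one warehouse target can be paired with any number of taps, which is the "both sides of the equation" point made above.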
- Swapnil Bhartiya: Yeah, thanks for sharing that. Now, if I ask you to just go back to the company itself, what are the things that we should be looking at for 2022? Of course, this is the beginning. And what COVID taught us was not to plan too much ahead, but either you can talk about the trends that we should be seeing in 2022, in terms of the data stack, or you can also talk about what we should expect from Meltano, what you folks are working on.
Taylor Murphy: In 2022, we’re seeing… We have a lot of predictions about what’s going to happen, and this actually informs our roadmap and how we’re developing Meltano with the community. One is that the data stack is going to continue to become more complex and evolve, as professionals are figuring out what their best practices look like. New tools, whether they’re closed source or open source, are going to be coming onto the market. And people are going to be trying these out to see if they work. What that means for us as Meltano, is we want to maintain a level of flexibility and composability with people’s data stack as they try out new tools. One of our goals for this year is to enable Meltano to be a full foundation for an entire data stack end to end, replicating your data, transforming it, testing it, visualizing it in a BI tool, and understanding more about the overall lineage and processes that are happening.
So, for us, we have a very ambitious roadmap, we want to integrate with more plugins, we want to increase the quality and usability of the tool itself, but we want to enable data teams to try out different tooling, and integrate it into their data stack overall, while bringing those DataOps capabilities as the foundation. So, built into any Meltano project are going to be end-to-end testing, multiple environments that enable you to deploy and configure your application for staging, production, or wherever you need to deploy it, and also integrations with different version control vendors, so that you can have confidence in the state of your data stack. So, one last point I would like to make about Meltano is that, we see this as something that we are building with the community, and we are very receptive to feedback, and we want to understand how people are using this. We don’t have a waterfall methodology, we have regular meetings with the community to understand where their pain points are, what they care about, and we love collaborating with them on building the future of data tooling.
- Swapnil Bhartiya: Taylor, thank you so much for taking time out today, and of course for talking about the company. But more interestingly, you shared your insights on the whole data stack. I love those insights, thanks for those insights, and I look forward to our next discussion. Thank you.
Taylor Murphy: Thanks for having me.