PrestoCon has arrived, and this year's event focuses on innovation, on what Presto has accomplished, and on Presto as a platform for SQL analytics.
Dipti Borkar, Presto Foundation Community Team, is particularly excited about the C++ native worker presentations. Borkar said, “There’s a couple of sessions about how that is progressing and how the new Presto worker that is being rewritten in C++ from Java is really going to transform Presto into next-gen Presto. You’ve seen there’s a lot of talk today about databases and data lakes, and we believe that users are looking for an open-source option for open data lakes. And with this new work, it will really take Presto and interactive querying to the next level.”
Tim Meehan, Chair of the PrestoDB Technical Steering Committee, pointed to “some use cases around scaling with Presto on Spark” and to talks on how Kafka is used at scale.
Girish Baliga, Chair of the Presto Foundation Governing Board, took a different perspective, focusing on Presto as a platform for SQL analytics in general. Baliga said, “The focus is more on the platform aspects. So this touches on many things, starting all the way from the bottom layer, which is the storage layer.”
As for how these companies use Presto, Uber relies on it for business reporting, analytics, data science, machine learning, and almost any kind of big data processing. Meta uses Presto for interactive analytics, batch workloads, and interactive flows. Finally, Ahana offers Presto as a managed cloud service, so that even a small data platform team can bring its power to data lakes.
The Presto community is committed to community-driven open source under open governance, such as that of the Linux Foundation, which hosts the Presto Foundation, or the ASF. What really excites Meehan about the community is “how it’s grown outside of Meta, and although it was sort of developed at Meta, seeing, once again, all the creative ways that people end up using Presto.”
Swapnil Bhartiya: Hi, this is your host, Swapnil Bhartiya, and welcome to TFiR: Let’s Talk. PrestoCon will be held on December 9th, and today we have figures from the community to talk about the event and, of course, the Presto community. We have, once again, Dipti Borkar, co-founder and CPO of Ahana and Chair of Presto Foundation. We have Tim Meehan, software engineer at Meta. And, once again, Girish Baliga, senior engineering manager at Uber. Dipti, Tim, Girish, it’s great to have you on the show.
Tim Meehan: Thank you.
Girish Baliga: Thank you.
Dipti Borkar: Great to be here, Swapnil. Nice to meet you again.
Swapnil Bhartiya: Of course, everybody knows what Meta is, everybody knows who Uber is, everybody knows who Ahana are, but I just want to hear it from your perspective, from Presto’s perspective. Girish, tell us how Uber is leveraging Presto.
Girish Baliga: Yeah. Great to be back here, Swapnil. So Uber is, as you all know, a company that’s determined to help everyone in the world go anywhere and get anything. And Presto is one of the secret sauces behind making this happen. Presto is used very widely at Uber, it’s used for all manner of business processes, all the way from business reporting and analysis to data science, machine learning, and pretty much any kind of big data processing that we need to do.
Swapnil Bhartiya: Tim, tell us how Meta is… actually Presto and Meta, I know it’s a different relationship altogether, but tell us.
Tim Meehan: Yeah. So Meta developed Presto originally and it’s used, similar to Uber, quite extensively within Meta, and it’s used for lots of different use cases across the company, from interactive analytics… People love using Presto because they can very quickly get insights into data without having to wait unnecessary amounts of time. And they like to apply what they use interactively into things like batch workloads, interactive flows. So it’s used very extensively within Meta and we support some very large clusters within Meta.
Swapnil Bhartiya: Dipti, tell us what is Ahana doing with Presto?
Dipti Borkar: Sure. I mean, you heard from Girish how Presto is being used at Uber, and Tim, how it’s being used at Meta. Of course, these are planet scale, internet scale companies, but the power of Presto can be used by everyone. And that’s what Ahana is about, we have a managed service for Presto that runs in the cloud so that a two-person data platform team can leverage that power on data lakes and get the same insights that Meta and Uber get from their data, and be a data-driven company leveraging open, open source, open formats with Presto in the cloud.
Swapnil Bhartiya: Excellent. Now, PrestoCon is almost here. Dipti, if I ask you, what is going to be the focus of this event?
Dipti Borkar: Yeah. We are very excited to bring together PrestoCon to everyone in the community. This time around, there are a lot of new innovations and there’s a lot of different speakers. Of course, the Presto Foundation members are presenting as well. I’m particularly excited about the C++, the native worker presentations. There’s a couple of sessions about how that is progressing and how the new Presto worker that is being rewritten in C++ from Java is really going to transform Presto into next gen Presto. You’ve seen there’s a lot of talk today, these days, about databases and data lakes, and we believe that users are looking for an open source option for open data lakes. And with this new work, it will really take Presto and interactive querying to the next level.
Swapnil Bhartiya: Tim, if I ask you from your perspective, what is going to be, once again, the focus of this year’s… on this PrestoCon?
Tim Meehan: Yeah, absolutely. So at PrestoCon, we kind of bring together a couple of different aspects. The first is technically what have we accomplished? What are we looking forward to? And so there’s some pretty exciting things that we’ve both accomplished since our last PrestoCon. And also, some really exciting technologies that we’re looking forward to sharing more broadly with the overall community, including C++ worker as Dipti mentioned. We’re extremely excited sharing more about that. Some use cases around scaling with Presto on Spark, very excited sharing more about that. And also, some talks regarding Kafka and how [inaudible 00:05:01] uses Kafka at scale. These are some very interesting technical topics that I’m really looking forward to at PrestoCon.
Swapnil Bhartiya: Since we are talking about, of course, planet scale or solar system scale company, so of course, Girish, tell me what you folks will be presenting at the event and how are you looking at this PrestoCon?
Girish Baliga: Sure. So we’ve been with Meta for a long time on this Presto journey. And I think over the past few years, we’ve worked on Presto as an engine, Presto as an open source engine, and so forth. But I think this year is really the year where we start to rethink Presto and look at Presto more as a platform for SQL analytics in general, right? So the focus is more on the platform aspects. So this touches on many things, starting all the way from the bottom layer, which is the storage layer. We support a couple of different formats: Meta does ORC, we do Parquet. So we’re going to talk about some contributions we made there. Then there is the security aspect, which is becoming very important, and there are a couple of exciting talks about Apache Ranger. And then there is how to use Presto not just for SQL analytics, but for developing general-purpose data analytics.
So to give you an example, we are talking about how we have deployed the Kafka connector for Presto at Uber. This is not just for doing analysis, this is for debugging production use cases, right? What went wrong in my Kafka message? What’s going on in my topic? What happened in this particular event? That kind of debugging. Earlier, it used to be command-line tools and a whole bunch of log analysis. Now, it’s just some Presto queries. And so I think these three examples, among many others, bring out very nicely the idea of Presto as a platform, which I think is now coming into focus, and which is what we are focusing on for PrestoCon this year.
Swapnil Bhartiya: Excellent. Dipti, this is going to be a bit tricky question for you to answer. If I ask, of course, this is kind of, you folks know, this is like a baby, that event there… But if I ask what are the… If I have to ask what are the things that you are going to be excited about? Of course, you folks mentioned some of the talks and discussion that we should look forward to. But if I ask, hey, this is one of the things you are excited about, or the other thing that you would recommend people to go and watch?
Dipti Borkar: Yeah, absolutely. Tim and Girish have covered a lot of great sessions that’ll be part of it. I think the top three, from my perspective, from a community perspective, that people have asked about: as we talked about, the C++ work is fantastic progress, and I would recommend people attend that. In addition, there is a lot of work going on in the cloud, and AWS will be presenting with Ahana on Lake Formation and how Presto integrates with Lake Formation. That will be one of the interesting sessions, to see how governance can be done on open data lakes, as security on data lakes is becoming more and more important.
And then along the same lines, the Apache Ranger session will be a great session for authorization of open data lakes. So those two in the security area will be really great.
I can’t finish without mentioning ByteDance, because ByteDance is presenting. And we have many new users of Presto this year who are presenting… talk about internet scale and planet scale. We’re excited to hear how they use Presto at large scale. They are transitioning from older technologies to Presto and have contributed back to the community as well. That’s another session that I would call out.
Swapnil Bhartiya: Since you brought up the point of community, I also want to learn, ever since the Presto community kind of emerged, what does the community look like today? Of course, we have three major contributors in the project, but talk about how diverse the community is and what it looks like. And once again, you also brought up the point of governance, though that was about data lakes, but let’s talk about this aspect as well.
Dipti Borkar: We are very excited about the community. The foundation itself has grown quite a bit. More recently, HPE joined the foundation. We have many, many sponsors for the conference, a lot more than we’ve had in the past, so we’re excited about that. And in terms of contributions, we see them coming in widely as well, from a range of different companies and from individuals. And in terms of the kind of capabilities being added, like the connectors we talked about, SAP is building a new connector for one of their data sources, and they will be presenting about it. So we are seeing a lot more contributions from a range of different companies and individuals, and the community has grown quite a bit.
In terms of governance, absolutely, we really believe in community-driven open source. And by that we mean the Linux Foundation, which is the gold standard for open source, or being a part of the ASF. These are very important aspects as enterprises bet on technologies for their next generation of data analytics. So it’s great to have Presto as a part of this open governance that makes it open. Tim is chair of our technical steering committee, and everyone can attend the meetings, come in and contribute, and listen to what’s going on. Girish is Chair of our governing board, and that is transparent and open to members as well. So we are very excited about this approach. We’ve seen it flourish, and we’re looking forward to growing it next year.
Swapnil Bhartiya: Tim, please, from your perspective, because it originated at Meta, or back in those days, Facebook, how do you see the community has grown? What are the things that you are excited about when you look at this community?
Tim Meehan: I think what excites me most about the community is how it’s grown outside of Meta. And although it was sort of developed at Meta, seeing, once again, all the creative ways that people end up using Presto, Girish touched on this, when you have a technology like Presto, it’s so general and so sort of universally applicable, it’s very interesting to see other companies and other people join the community for certain particular needs, for problems that they’re solving. And so as chair of the technical steering committee, I am doing my best to make it so that people who have these new use cases, or have these sort of things that they want to bring to the table, that they have an opportunity and that they have a very clear and easy means to bring it to Presto.
Swapnil Bhartiya: Girish, if I ask you, of course, not only from Uber’s perspective, but from the Presto community’s or foundation’s perspective, how do you focus… As she mentioned, new members keep joining… What is driving the adoption of Presto in general? Because when we look at these companies, these are, once again, planet scale companies, and how do you enable them or encourage them to not only become part of the community, but use it as they please, and also bring their changes back to the community so that it is better for everyone else?
Girish Baliga: Absolutely. I think one of the very key aspects of the Presto Foundation, and in general of the parent Linux Foundation, is this two-tier structure, right? We have a separation between the technical side and the governing board. What that gives us is the ability for our technical contributors to work and focus directly on the technical aspects of the project. So if you want to add a connector, if you want to contribute some code back, if you want to do a bug fix, you’re welcome, and you can just engage at that level. But we also encourage, and have a forum for, corporate sponsors and participants to come in and contribute. They contribute through foundation dues, which are actually not that much. If you consider the foundation dues, it’s more of an ability for them to come in and participate as a corporate institution.
And the governance structure here allows folks to come in and participate at different levels, whatever they’re comfortable with, and there is no obligation to do one or the other. So you can come in and just be a technical contributor, or you can join the board by making a financial contribution and be part of the governance structure. And with this, we get a foundation budget with which we can do various things, like sponsor PrestoCon and other activities that we do.
So I think this kind of an engagement structure has been very successful as it opens up not only for the technical, but also the corporate folks. And over the past couple of years, I’ve been very closely involved with getting new board members. And I’ve seen that this makes a lot of sense for folks. As Dipti was mentioning, we’ve had a whole bunch of new members. And this is great, they find that they can engage with us both technically as well as corporate entities, so.
Swapnil Bhartiya: Girish, Tim, Dipti, thank you so much for taking time out today, not only to share your own journeys with Presto, but also to talk about PrestoCon and the things that we should be looking forward to. Thanks for your time today, and I look forward to our next conversation. Thank you.
Dipti Borkar: Thank you. And if you haven’t registered, now is the time to register for PrestoCon. Looking forward to seeing everyone on December 9th. Take care. Bye-bye.
Girish Baliga: See you all at PrestoCon, folks.
Tim Meehan: Bye.