
Understanding Open Source Supply Chain | Dirk Hohndel


Modern software comprises many components that come from open source projects: libraries, frameworks, toolkits, or other software. Just as a car maker knows the source of each component used in its cars, companies building software products and services must have a full inventory, or bill of materials (BOM), of the code they use in their own products. It’s very important to understand the open source supply chain. But why is it important? What risks are companies exposed to if they don’t know what code is flowing through their products and services? And how can they track the software supply chain?

Guest: Dirk Hohndel
Company: VMware
Show: Conversations On Open Source

Here is an unedited transcript of the interview.

Swapnil Bhartiya: Welcome to another episode of Dirk and Swap: Conversations on Open Source. And once again, we have with us today Dirk Hohndel, VP and chief open source officer at VMware. Dirk, it’s good to have you on the show.

Dirk Hohndel: Thanks, Swap. It’s good to be here.

Swapnil Bhartiya: What exactly is the software supply chain, especially from the point of view of open source, because no matter if you are building a product or a project, you are bringing a lot of components that come from all over the place. You have no idea what code is in there, what kind of licenses there are. It can create a lot of issues, not only in terms of security, but also you may not comply with the license. So let’s understand basically, what is a software supply chain or bill of material.

Dirk Hohndel: So, I won’t give you a textbook academic definition. I will try to explain it in more practical terms. If you build a software application, whether you ship it to a user or whether you run it as a service over the web, one of the key questions you have to ask yourself is, “Well, so which components go into this?”

Dirk Hohndel: I always compare this to building a car because that seems far more tangible and easier to understand. You want to understand every screw, every piece of rubber, every piece of leather, everything that goes into the car. Who made it? Who assembled it? Who put it there? Where did it come from? What’s the genesis of this?

If we apply this to software, for each component that went into your application you want to understand where the sources are and who built these sources into a binary. And how can you track this back to its origin, so that you have all the correct licenses? Are you following the terms and conditions of an open source license by making sources available, by giving credit to the authors, or whatever else the license might require?

But then you also want to understand: are the sources that I believe are at the core of a specific module actually the sources that were used to build it, or is the binary built from something else? Are there additional parts in the binary? Say a backdoor or a tracking mechanism, or whatever. There are many, many options to consider.

When you think about your software supply chain, you really start from the end point, from where you are today, you’re building this application. And you want to go all the way back for every single individual component recursively. So for the components of the component, all the way back down to the sources, to the origin of every single piece that comes together to create your product.

If that sounds tedious to you, you’re correct. This is an amazing amount of work. This is really complex to get right. And that’s why there is a lot of tooling to help with that. And that’s why there are quite a few organizations out there that try to create more structure around this.
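Editor's illustration: the recursive walk Dirk describes, from the finished application back through the components of each component, can be sketched in a few lines. This is only a minimal sketch under stated assumptions: the `Component` record, the `bill_of_materials` function, and the sample names, versions, and example.org URLs are all hypothetical and do not come from the interview or from any real tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Component:
    """Hypothetical record for one piece of software in the supply chain."""
    name: str
    version: str
    license: str
    source_url: str
    dependencies: list[Component] = field(default_factory=list)


def bill_of_materials(root: Component) -> list[Component]:
    """Collect the root and every transitive dependency exactly once."""
    seen: dict[tuple[str, str], Component] = {}
    stack = [root]
    while stack:
        comp = stack.pop()
        key = (comp.name, comp.version)
        if key in seen:
            continue  # shared dependencies are listed only once
        seen[key] = comp
        stack.extend(comp.dependencies)  # components of the component
    return sorted(seen.values(), key=lambda c: (c.name, c.version))


# Made-up sample data: an application with one direct and one shared dependency.
zlib = Component("zlib", "1.0", "Zlib", "https://example.org/zlib")
libpng = Component("libpng", "2.0", "Libpng", "https://example.org/libpng", [zlib])
app = Component("my-app", "0.1", "Apache-2.0", "https://example.org/my-app", [libpng, zlib])

bom = bill_of_materials(app)
for comp in bom:
    print(comp.name, comp.version, comp.license, comp.source_url)
```

Real tooling also has to establish where the binaries came from and whether they match the claimed sources, which a simple inventory like this does not attempt.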

Swapnil Bhartiya: Why should you know the whole supply chain? Where is your code coming from? You gave some very good examples of how it could be a violation. You should also give attribution to the authors, but there may be something wrong with that project and you want to move. Just talk about, either you are just a user, or you are a vendor who’s repackaging it. Yes, it doesn’t matter who you are, why should you care about the supply chain?

Dirk Hohndel: I typically separate this out in a few different buckets that I think about. So, one of those buckets is the question of security. So, do you know, actually, what is truly running as part of this component? Do you know if there are security vulnerabilities? If there are intentional back doors, if there is spyware, malware, whatever?

Very closely related to that, of course, are privacy concerns. Look at the European GDPR, but really all over the world there are privacy rules that ensure that certain types of tracking are disclosed to the user. Are you sure that none of your components have a call home feature that you aren’t aware of?

These are the concerns that go into, does the software do what you think it does and nothing else? And then, on the other hand, there are other questions that are about legality. So can you actually use the software, or can you distribute the software, if you think about this from the point of view of the software maker?

And if you are a user, you care about the fact, whether the people who wrote components of this were actually correctly paid, or recognized, or attributed. Or if the license rules that they have put on their software were followed, because you want to make sure that the software that you use is actually legal in the way it has been delivered to you. The legal aspects of all this and the security, privacy technology aspects of all this really come together here.

One area that people often don’t want to think about in the context of software supply chain, is also this fitness for purpose. And that’s harder to capture because you want to ask yourself, let’s say you’re running a nuclear power plant, and you have to respond to specific signals within a certain amount of time.

You want to make sure that the software that you’re using is capable of doing this. Or let’s say you are running a global telecommunications network that is getting millions and tens of millions of messages a second. You want to make sure that the software that you’re running can handle that volume of data.

There is this other aspect of wanting to understand if all of the components that are used are actually tested for the scale or the scope of the use that you have in mind. That typically is something that is not thought much or talked much about in the context of software supply chain. But that always plays into every decision about the components that you create. But more typically, we talk about security, privacy and the legal aspects.

Swapnil Bhartiya: When you do talk about trust, you give so many examples. So it becomes confusing. Are you trusting the project name? Are you trusting the people? But at the same time, then you mentioned people. You are trusting just one person. At the same time, if you’re trusting a project, that project may change hands also.

You can trust the community, but there is no single community. There are communities. Sometimes the project gets forked. The main project got forked because of whatever reason. And all the original creators, they moved to create something else. So, who are you really trusting?

Dirk Hohndel: Trust is a fascinating question. Because the way we apply trust as humans is very illogical and emotional. And if you try to take a step back and you try to ask yourself, “Who are the decision-makers that influence that work product that I’m getting?” Now you are suddenly changing this from a person-to-person relationship to more of a trust structure.

Earlier, I said, for example, taking the Debian distribution feels reasonably safe. Why is that? Because there are a lot of people who have a long track record of having the best interests of the users in mind. And because there are so many senior people involved, it actually becomes pretty hard to sneak things past them because so many different people look at the different components from different angles. And they have this trust network that creates their hierarchy there, the way this project works within itself.

Applying trust to healthy, established communities like this is reasonably safe in my mind. On the flip side, if you have a project that is controlled by a single person or a single entity, where you don’t have any intermediaries in between that do the review for you, where you can trust the intermediaries, like the Debian project, now you need to understand far better: who is this entity? Who is this company? Who is this person?

And very often I find open source projects that have somewhat outlandish claims on functionality. And then you start clicking and it’s, “Oh, it’s this hacker group in Russia.” Not to have any blatant prejudices against certain nation states.

But maybe you don’t want to include that code without a little bit of digging deeper and understanding what it does. Because we do see a lot of nation state level attacks from China, from Russia, from other countries, on this software supply chain, specifically with the intent to undermine the security of our infrastructure.

Trust is always in the context of who sits between the original creator and you, who might be doing review steps and whom you can trust as a structure, as an organization. And the fewer of these layers you have, the more you need to engage yourself and make sure that you understand what code you’re using, and what the implications are for your project or product.

Swapnil Bhartiya: Now, when we do look at this whole conversation, one thing is pretty clear: keeping an eye on your supply chain is not an easy task. It is complicated. Plus, depending on what your nature is, you may not have all the resources to keep track of everything. Let’s look at the solution side. Earlier, when you started, you mentioned some projects. What kind of help is available there? Not only to big companies, which actually can afford all help, but a lot of other individual developers also. So that they can keep track of their software supply chain, without having to worry about all the complexities that come with it.

Dirk Hohndel: That is such a complicated question, Swap, because there are, of course, a ton of companies who will happily sell you this as a solution. From the largest software companies to a number of startups and many, many consultants who will help you get this right.

And I admire all of these companies. I have no intention of dissing anybody for their efforts there. Because I think all of these are well-intentioned and all of these are helping. Depending on the profile of the user, of the customer, many of these solutions are really what you should use to solve a problem that you don’t want to dig into.

But on the flip side, if you are interested in taking control of this for yourself, there are also a number of efforts that make this really easy. I’ve mentioned Debian a few times. The Debian community has done phenomenal work in tracking the path that every piece of code takes into a distribution. And to make sure you understand where everything comes from. That’s a great way to start.

Another one is the Yocto project. If you look more at the embedded side of things, the Yocto project has significant extensions that allow you to track in detail every component, the sources, the exact version, the Git SHA of every single component that goes into a binary build.

Full disclosure, obviously. I was one of the people who kicked off this project more than 10 years ago. But this work came long after I was actively involved. And this is really fascinating work that allows you to have complete control of every line of code that goes into a project.

Now, if we go up a little bit from the individual build projects for distribution building, we can look at things like the SPDX effort, which is used to identify components and their licenses, and to make it easy to create a full tree of all of the components that went into your application.

There is the Automated Compliance Tooling (ACT) project in the Linux Foundation, which SPDX is part of. There is a lot of tooling work that goes on. The Tern project that VMware donated to the ACT is a project to disassemble container images and help you understand what’s inside this container image.

There are a lot of different projects targeting different parts of the creation process to help you track where your software comes from and what the corresponding sources might be. Some of these are focused on licensing. Some of these are focused on compliance.

One of the things that’s really important is that whenever you make decisions about the components that you’re willing to take in a product, you keep both of these aspects in mind. Because if you run a scanner that verifies that all the licenses are fine, that everything is under a license you’re happy to include, that doesn’t mean that all the components are secure.

On the flip side, if you’re running a security scanner that says, “Oh, all of these components are safe. There are no back doors. There is no malware included,” that doesn’t mean that you aren’t including something under a license that, for whatever reasons, you can’t use. So, you always have to keep both of them in mind because both aspects of your supply chain are really critical for a successful outcome.
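Editor's illustration: the point that a clean license scan and a clean security scan are independent results can be shown with a small check that runs both and reports a component if either one fails. This is a hedged sketch only: the `ALLOWED_LICENSES` policy, the `review` function, and the sample component and vulnerability data are hypothetical and stand in for what real scanners would provide.

```python
# Hypothetical license policy: licenses this imaginary company accepts.
ALLOWED_LICENSES = {"MIT", "Apache-2.0", "BSD-3-Clause"}


def review(components, known_vulnerable):
    """Return {name: [problems]} for components failing either check.

    components: iterable of (name, version, license_id) tuples.
    known_vulnerable: set of (name, version) pairs flagged by a security scan.
    """
    problems = {}
    for name, version, license_id in components:
        issues = []
        # License check: passing this says nothing about security.
        if license_id not in ALLOWED_LICENSES:
            issues.append(f"license {license_id} not approved")
        # Security check: passing this says nothing about licensing.
        if (name, version) in known_vulnerable:
            issues.append("known vulnerability")
        if issues:
            problems[name] = issues
    return problems


# Made-up sample data for illustration.
components = [
    ("parser-lib", "1.4", "MIT"),         # fine license, flagged as vulnerable
    ("codec-lib", "3.0", "Proprietary"),  # no known issue, unapproved license
    ("log-lib", "2.2", "Apache-2.0"),     # passes both checks
]
known_vulnerable = {("parser-lib", "1.4")}

report = review(components, known_vulnerable)
for name, issues in report.items():
    print(name, "->", "; ".join(issues))
```

A component is accepted only when it is absent from the report, that is, when both checks pass.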

Swapnil Bhartiya: I always ask you to give me some playbooks. What should be the starting points? Because you mentioned there are a lot of projects: SPDX and other projects. They are doing a lot of things. But once again, whether you are a developer, a vendor, a user, what are the first steps that one should take to start keeping track of their supply chain and then evolve over time? Because this is a complex problem to be solved.

Dirk Hohndel: Yeah. I think this is a very complex topic that way exceeds what we can do here in a conversation. It really starts with understanding what is your development model? Where do your components come from? And start there.

Am I importing large binaries through container images, through virtual machine appliances, through distributions and libraries that I’m using? Or am I building from source myself? How is this all put together? Who controls the assembly process?

But very quickly, you have to go to your engineers and you have to talk to the engineers about how they are creating your final executable. And that really is the step where there is no shortcut. You can’t just magically wave your wand and say, “Oh, and now everything is fine.”

You have to understand the process, how your applications are created. And this is something that, especially as you get started, is really tedious and painful. But it’s also incredibly important.

Swapnil Bhartiya: Dirk, once again, thank you for talking about… I mean, as you know, this is a topic which is really close to our heart because we bring it in almost every discussion, software supply chain. Thanks for this discussion. And I look forward to our next topic. So thank you for your time today.

Dirk Hohndel: Thank you for having me, Swap.