Challenges and opportunities for open-source in Silicon — Part 2
In Part 1, we recognized the pervasiveness of open source in software and its role in accelerating innovation over the past few decades. We also highlighted that silicon hardware faces unique challenges in terms of low-cost iteration (something we will explore further in a future post).
In this post, we look at the Linux kernel to gain some prerequisite insight and perspective before we delve into the purely silicon-related aspects.
There is an OS kernel running on your phone/tablet/computer right now. You probably don’t interface with the kernel directly unless you’re doing systems programming. The OS kernel is low-level software responsible for:
- CPU scheduling and communication for all processes and their threads
- Virtual memory management
- File-system, storage, networking and device I/O
- System programming interface
- Security support (access control, crypto, integrity, etc.)
The diagram below shows, at a very high level, where the OS kernel sits in relation to the hardware platform below and the application software layers above it.
It should be obvious, then, that an OS kernel is the “central” software that runs on any “computer” (embedded chip, laptop/PC, workstation or cloud server). It should also be evident that this software is extremely critical and must be rock-solid and trusted.
Approximating the OS kernel as hardware is actually not far removed from reality. In fact, operations such as virtual memory management are so closely intertwined with the underlying CPU architecture (memory model, page-table management, TLB flushes) that each CPU architecture requires its own kernel variant.
See here for a map-view of the Linux kernel to give you some idea of the various components and their interactions.
The Linux kernel — Pervasiveness
The Linux OS, and thus its kernel, is the most prevalent open-source kernel in use today, spanning anything from low-power embedded devices to supercomputers. In fact, Linux is the most prevalent operating system for non-PC applications.
“As of 2017, the Linux operating system runs 90 percent of the public cloud workload, has 62 percent of the embedded market share, and 99 percent of the supercomputer market share. It runs 82 percent of the world’s smartphones and nine of the top ten public clouds” — linuxfoundation.org 2017 report
The smartphone numbers, for example, are not surprising given that in 2020, 85 percent of the smartphone market was Android-based, and Android runs on a modified Linux kernel (source).
The Linux kernel — Anatomy of its success
The Linux kernel today has over 27 million lines of code (“27 MLOC”). This is carefully crafted and tested code, because it needs to be extremely reliable, secure and performant, and to scale efficiently with increasing SMP cores, processes and threads.
An OS kernel can fail in many ways: not just the “blue screen” type of hard fail, but also more silent and insidious failures such as security breaches or data corruption. A great deal of trust and confidence is therefore placed on this critical software component.
So how did Linux become such a successful, reliable and trusted OS for commercial and mobile-consumer applications?
Based on my observations and research regarding this question, the following themes and mechanisms emerged:
- Lack of deadlines — Economy of “good intent”
- Individual engineer excellence — a natural tendency towards a meritocracy
- Observability and testing — vast and distributed code inspection, testing and usage (eyeballs)
- Distributed and scalable development practices — intrinsically valuable source code quality and management practices
The first theme (“Lack of deadlines”) may at first appear alarming, especially if, like me, you have spent your entire career developing commercial products while chasing deadlines; the notion of “we need this now” is likely part of your DNA. Note that despite this “economy”, the Linux kernel has a very predictable release schedule (a “major” release every nine or ten weeks). The premise of this theme is that Linux kernel developers are motivated by building the best solution rather than by a schedule deadline (which is typically a function of time-to-market-bound opportunity cost or planned project cost).
There is no debate that at the silicon product level (chip level) there is no way around hard schedules, but when you break things down to individual sub-components, it becomes more a matter of managing expectations around schedule and functionality “intercept-points” (more on this in a future post).
The second point (“Individual engineer excellence”) appears to imply a rather harsh state of affairs, but it is in fact a natural tendency of human beings as a social and community-oriented species. Similar economies exist in the sports and entertainment worlds, which are in fact “open-source” from a merit-attribution perspective. A top basketball player such as Michael Jordan, or a rock drummer such as Danny Carey (pictured above), is at the top of their game because of long-term sustained excellence. Their skills and accomplishments are fully “observable”, and so we want them to do the things they do. The open-source development world is similarly transparent, and it places the responsibility of development on excellent engineers by observing and attributing merit.
This brings us to the third point (“Observability and testing”).
Observability is the mechanism by which the collective assigns consensus merit.
To go back to the analogy of sport, the more people that “observe” a given sport, the more “pure” the attribution of merit to the players, teams and coaches will be. In sport, the “measure” of merit is very clear (points, wins, entertainment value etc.). In software development this amounts to automated testing, code review, code analysis, directed testing and of course usage (user experience, performance, reliability, security).
One might argue that a closed-source code base benefits just as much as an open-source one from usage-based testing, but the closed-source case will always be limited to the set of users who possess the means to obtain a license.
When it comes to code reviews and directed testing, the open source variant will always be subject to more extensive scrutiny (the number and diversity of “eyeballs” is hard to match). And thus we can expect that
as the code base increases in contribution and popularity, code inspection and directed testing at the very least scales accordingly.
This is in sharp contrast to the reality of how much a single enterprise could possibly scale their testing even as revenues and profits grow.
There is also the benefit of compute scale for automated testing where open source benefits from the cumulative power of all enterprises which contribute to the project. It should be noted that the majority of contributors to the Linux kernel are in fact from the commercial sector (source).
The fourth point (“Distributed and scalable development practices”) is a bit less succinct, but critically important.
According to a 2017 survey, 70 percent of source-code control is performed with Git (source). Git was invented by Linus Torvalds (the creator of the Linux kernel) for the purpose of distributed source-code management. This shows how the nature of distributed development itself spawns highly innovative tools and practices.
The quality assurance of the Linux kernel is based on scalable and repeatable development practices (e.g. coding standards), continuous integration, test frameworks and regressable test cases. It should not be surprising that there exist collaborative open-source projects such as LTP and CKI, which focus on testing the Linux kernel and its distributions and are developed and maintained by industry.
What does this have to do with Silicon again?
The Linux kernel represents one of the crowning successes of open source. Purveyors of silicon should take note of these key points:
- The Linux kernel is a very low-level and “central” piece of software. Can you think of any silicon chip components which are “central”? Actually, there are more than the processing units you may be thinking of. Are “central” components more appropriate for open source development?
- Written in low-level C and assembly and interacting directly with hardware platform specifics, the Linux kernel is reliable, secure and performant. It is the one piece of software that must continue to behave when layers of application software above misbehave. In this sense it very much resembles a hardware design.
- The Linux kernel is prevalent in all computing outside of personal computers and is thus “proven” in its quality and scalability, but it didn’t start life that way.
- The commercial sector is responsible for the majority of contribution and testing of the Linux kernel. This provides both checks and balances (diversity) and sharing of the burden of development costs.
- The key mechanisms by which the Linux kernel became reliable and trusted, hold valuable lessons in how to approach open source in relation to silicon development.
In future posts in this series, we will dig deeper into the last point and how these lessons translate to the silicon engineering realm. We’ll also start exploring concrete examples of “modern” open-source hardware efforts. Last but not least, we will highlight the less obvious benefits for the industry and for the humans who make it all happen.