Linus Torvalds Mini-Interview from 2021

Kristin Bell
Lf Asia, CC BY 3.0 https://creativecommons.org/licenses/by/3.0, via Wikimedia Commons

Back Story: Soooooo…I was taking an operating systems class back in the winter of 2021, and we were learning some stuff about Linux. Because I do random things all the time, when I found out that Linus Torvalds lived near Portland, Oregon (where I go to school), I thought “hey, why don’t I send him an email and see if he’d like to talk to our class! The worst that could happen would be that he wouldn’t reply.” This was during pandemic time, so our class was all online anyway. So, I found an email address online for Linus and thought “okay, I’ll try this email and see if it goes through!” Lo and behold, it went through and it got to him. Linus doesn’t like giving talks, but he agreed to answer some questions that my class posed for him. He was so generous with his answers. I have no way of thanking him. About a year or so later I asked him if it was okay to post his answers on my blog, but I didn’t get around to it until today. I figured that Jan. 1 was as good a time as any! So below you will find the mini-interview that Linus very graciously agreed to. I think I emailed him around February 5, 2021 for this interview. I have added quote marks around the replies and the A1, A2, etc., but I tried to keep everything else as it was written.

Q1)
What impact has RCU (read copy update) had on the scalability of Linux? Has it had other noticeable impacts?

A1)
So I’ll start from the other end of that question, and say that scalability in general is most definitely not about one single thing. It’s a lot of small details, and it’s often about getting quite complex locking (and memory ordering) right.

And in many ways, I think the single most important thing for scalability in Linux has ended up actually being the support infrastructure for finding problems that people have built up. So rather than all the details about when and how we do locking, it’s the verification system: notably “lockdep”, which finds locking violations where we might otherwise have deadlocks etc.

And I say that as somebody who wasn’t initially enthusiastic about it, and who wasn’t involved in the original design and implementation of it. But it has proven invaluable over the decades in actually making it possible for us to have complex locking and getting it right, because without that kind of verification and automation help we wouldn’t have been able to go as far as we have.

Note: it’s not _just_ lockdep, we have a lot of other tools, but lockdep is perhaps the most obvious one and the one that has made some of our complexity possible. Because scalability _is_ complicated. It took us years and years to go from the initial uni-processor model to the “big kernel lock” model, and then to individual locks and eventually to quite fine-grained locks.
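To make the lockdep idea a little more concrete, here is a heavily simplified sketch of its core check in plain C: remember which lock classes were taken while holding which others, and complain when a new acquisition would close a cycle in that graph (the classic AB-BA deadlock), even if this particular run never actually deadlocks. Everything here (`check_acquire()`, `note_release()`, the fixed-size tables, the single held-lock stack) is hypothetical; the real lockdep also tracks per-task held locks, interrupt contexts, read/write locks and a great deal more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_CLASSES 8
#define MAX_HELD 8

/* edge[a][b] == true: class b has been acquired while class a was held. */
static bool edge[MAX_CLASSES][MAX_CLASSES];
static int held[MAX_HELD];   /* locks currently held (one thread only) */
static int n_held;

/* Depth-first search: is there a recorded path from `from` to `to`? */
static bool reaches(int from, int to, bool *seen)
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int next = 0; next < MAX_CLASSES; next++)
        if (edge[from][next] && !seen[next] && reaches(next, to, seen))
            return true;
    return false;
}

/* Returns false if taking `cls` now could deadlock against some earlier
 * ordering; otherwise records the new "held -> cls" edges. */
static bool check_acquire(int cls)
{
    for (int i = 0; i < n_held; i++) {
        bool seen[MAX_CLASSES] = { false };
        if (reaches(cls, held[i], seen)) {
            fprintf(stderr, "possible deadlock: class %d taken while holding %d\n",
                    cls, held[i]);
            return false;
        }
    }
    for (int i = 0; i < n_held; i++)
        edge[held[i]][cls] = true;
    assert(n_held < MAX_HELD);
    held[n_held++] = cls;
    return true;
}

/* For simplicity this sketch assumes locks are released in LIFO order. */
static void note_release(int cls)
{
    assert(n_held > 0 && held[n_held - 1] == cls);
    n_held--;
}

/* Usage: A-then-B is accepted; a later B-then-A is flagged. */
static bool demo_abba(void)
{
    enum { LOCK_A, LOCK_B };
    if (!check_acquire(LOCK_A) || !check_acquire(LOCK_B))
        return false;
    note_release(LOCK_B);
    note_release(LOCK_A);
    if (!check_acquire(LOCK_B))
        return false;
    bool flagged = !check_acquire(LOCK_A);   /* closes the A->B->A cycle */
    note_release(LOCK_B);
    return flagged;
}
```

The point of the design is that the check runs on every acquisition, so an ordering bug is reported the first time both orders have been *seen*, not the first time they actually collide at runtime.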

With that out of the way, it is definitely true that in many cases we can go _beyond_ locking, and use actual lockfree algorithms, and RCU is by far the main one. We have a couple of other lockfree things (mainly one-way state transitions that can depend on memory ordering, but also statistics), but they are tiny details in comparison to all we do with RCU. And yes, RCU allows us to in many cases avoid the locking overhead entirely on the reader side, and that then helps get over another stepping stone in scalability. Even with “perfect” fine-grained scalability, the overhead of just the synchronization that a single local lock has is very noticeable, and will cause a lot of cacheline bouncing when you have loads that want to touch the same data structure on tens or hundreds of different CPUs.
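As an illustration of the "statistics" kind of lock-free code, and of why even one shared cacheline hurts, here is a minimal sketch in C11 with POSIX threads. It assumes a counter that nothing else needs to be ordered against, and a 64-byte cacheline. Each thread bumps its own cacheline-aligned slot with relaxed atomics and a reader sums the slots, loosely in the spirit of the kernel's per-CPU counters; all names here are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define N_THREADS 4
#define EVENTS_PER_THREAD 100000

/* One counter per thread, each on its own (assumed 64-byte) cacheline,
 * so incrementing never bounces a line between CPUs. */
struct stat_slot {
    _Alignas(64) atomic_ulong count;
};

static struct stat_slot slots[N_THREADS];

static void *worker(void *arg)
{
    struct stat_slot *mine = arg;
    for (int i = 0; i < EVENTS_PER_THREAD; i++) {
        /* Only this thread writes its slot, so a load + store is enough;
         * atomics just keep concurrent readers from seeing torn values.
         * Relaxed: a statistic needs no ordering against other memory. */
        unsigned long v = atomic_load_explicit(&mine->count, memory_order_relaxed);
        atomic_store_explicit(&mine->count, v + 1, memory_order_relaxed);
    }
    return NULL;
}

/* Summing while writers run gives an approximate value (fine for
 * statistics); after pthread_join() the total is exact. */
static unsigned long stat_total(void)
{
    unsigned long sum = 0;
    for (int i = 0; i < N_THREADS; i++)
        sum += atomic_load_explicit(&slots[i].count, memory_order_relaxed);
    return sum;
}

static unsigned long count_events(void)
{
    pthread_t t[N_THREADS];
    for (int i = 0; i < N_THREADS; i++) {
        atomic_store_explicit(&slots[i].count, 0, memory_order_relaxed);
        int rc = pthread_create(&t[i], NULL, worker, &slots[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < N_THREADS; i++)
        pthread_join(t[i], NULL);
    return stat_total();
}
```

A single shared `atomic_fetch_add()` counter would also be correct, but every increment would pull the same cacheline to the incrementing CPU, which is exactly the bouncing described above.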

So the advantages of RCU are very noticeable in certain situations – one big example being pathname lookup. Absolutely _everything_ looks up filenames, and while you can (and we do) spread out locking so that each lock is individual to the file or directory, you end up having an enormous amount of different processes that touch the same directories: things like the root directory or your current working directory is involved in the starting point of just about every single pathname lookup, and so even “local” locks like that end up having the potential for being just overwhelmed. And then each CPU in the system potentially spends all its time just passing the cacheline with that lock around.

That’s where RCU comes in, and avoids having to pass cachelines around between each CPU – if most users are just looking things up, and reading the state, then with the proper algorithms and RCU you can keep all the data in cache lines that can be shared across all CPUs and there is no further serialization needed – until a writer comes in.

So RCU helps in a lot of scenarios, but it does require that “mostly read” behavior. That “many more reads than writes” is a very common situation, though, and we use RCU quite extensively in the kernel.

We end up using RCU all over the place, with lots of internal kernel data structures being protected by RCU for readers, and then a “normal” lock for writers. It’s gotten to be such a common pattern in the Linux kernel that most kernel developers no longer even really have to think about it very much. It does add complexity – you do have to be very careful on the writing side (and particularly when releasing data structures), and only certain data structures are amenable to that “look up without holding any locks” pattern, but it’s very powerful when it does happen.
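The "readers take no lock, writers take a normal lock" pattern can be sketched in user-space C11 as below. This is not the kernel API: the acquire load stands in for `rcu_dereference()`, the release store for `rcu_assign_pointer()`, and instead of a real grace period (`synchronize_rcu()` or `call_rcu()`) the sketch simply keeps retired copies until every reader thread has been joined. `struct config` and the helper names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical read-mostly object. */
struct config {
    int timeout_ms;
    int retries;
};

static _Atomic(struct config *) current_cfg;
static pthread_mutex_t update_lock = PTHREAD_MUTEX_INITIALIZER;

#define MAX_RETIRED 8
static struct config *retired[MAX_RETIRED];  /* old copies awaiting reclamation */
static int n_retired;
static atomic_int bad_reads;

/* Read side: no lock and no shared writes. The acquire load guarantees the
 * fields of the object are seen fully initialized (cf. rcu_dereference()). */
static int read_timeout(void)
{
    struct config *c = atomic_load_explicit(&current_cfg, memory_order_acquire);
    return c->timeout_ms;
}

/* Write side: an ordinary lock serializes writers against each other. */
static void update_timeout(int ms)
{
    pthread_mutex_lock(&update_lock);
    struct config *old = atomic_load_explicit(&current_cfg, memory_order_relaxed);
    struct config *copy = malloc(sizeof *copy);
    assert(copy != NULL);
    *copy = *old;            /* copy ...                  */
    copy->timeout_ms = ms;   /* ... update the private copy ... */
    /* ... publish with a release store (cf. rcu_assign_pointer()). */
    atomic_store_explicit(&current_cfg, copy, memory_order_release);
    /* Readers may still be using `old`, so it must not be freed yet. */
    assert(n_retired < MAX_RETIRED);
    retired[n_retired++] = old;
    pthread_mutex_unlock(&update_lock);
}

static void *reader(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        int t = read_timeout();   /* always some complete version */
        if (t % 100 != 0 || t < 100 || t > 400)
            atomic_fetch_add(&bad_reads, 1);
    }
    return NULL;
}

static int run_demo(void)
{
    struct config *initial = malloc(sizeof *initial);
    assert(initial != NULL);
    initial->timeout_ms = 100;
    initial->retries = 3;
    atomic_store_explicit(&current_cfg, initial, memory_order_release);

    pthread_t r[4];
    for (int i = 0; i < 4; i++)
        pthread_create(&r[i], NULL, reader, NULL);
    for (int ms = 200; ms <= 400; ms += 100)
        update_timeout(ms);

    /* Crude "grace period": once all reader threads have exited, nobody can
     * hold an old pointer. Real RCU detects this without stopping readers. */
    for (int i = 0; i < 4; i++)
        pthread_join(r[i], NULL);
    for (int i = 0; i < n_retired; i++)
        free(retired[i]);
    n_retired = 0;
    return atomic_load(&bad_reads);
}
```

The hard part the text warns about is visible in `update_timeout()`: publishing is one store, but freeing the old copy has to wait until no reader can still be looking at it.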

RCU also has had an indirect impact on a related very subtle area: since it is our most common lock-free algorithm (by a _huge_ amount), it’s the one that has had almost all the discussion about different CPU memory orderings, and the most impact on our internal memory ordering model (and the tools that check it).

Basically, when you use locking, memory ordering doesn’t matter – the lock itself will handle any ordering requirements. Of course, the locking code itself can get it wrong (and that has happened), but the locking code is in the end _reasonably_ small, and tends to be about very controlled code sequences.

But there are two cases when locks break down: IO and lock-free programming. Generally, DMA and memory-mapped IO do not honor the same memory ordering rules that regular CPU memory accesses do, and so for IO you often end up needing special care with regard to CPU memory ordering. That’s a whole separate can of worms.

And for lock-free algorithms, you obviously by definition cannot rely on the lock to handle memory ordering for you. So when you do RCU or other lock-free algorithms, you really expose yourself to all the odd corner cases. And as a result, RCU has ended up being basically the driving force for all our discussions about memory ordering across different CPU architectures.
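A minimal sketch of that difference, in C11 with POSIX threads: the same one-value hand-off done once under a mutex, where the lock supplies the ordering, and once lock-free, where the code has to request release/acquire ordering itself. The variable and function names are illustrative only.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* With a lock: ordering comes for free. */
static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static int locked_data;
static bool locked_ready;

/* Without a lock: plain data published through an atomic flag. */
static int lf_data;
static atomic_bool lf_ready;

static int got_locked, got_lockfree;

static void *producer(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&m);
    locked_data = 42;
    locked_ready = true;
    pthread_mutex_unlock(&m);   /* unlock releases, the next lock acquires */

    lf_data = 42;
    /* Release: the write to lf_data cannot move after this store. */
    atomic_store_explicit(&lf_ready, true, memory_order_release);
    return NULL;
}

static int consume_locked(void)
{
    for (;;) {
        pthread_mutex_lock(&m);
        bool ready = locked_ready;
        int v = locked_data;
        pthread_mutex_unlock(&m);
        if (ready)
            return v;
    }
}

static int consume_lockfree(void)
{
    /* Acquire pairs with the release above. With memory_order_relaxed here,
     * reading lf_data would be a data race in C11, and weakly ordered CPUs
     * (e.g. Arm, POWER) could in practice return a stale value. */
    while (!atomic_load_explicit(&lf_ready, memory_order_acquire))
        ;
    return lf_data;
}

static int run_handoff(void)
{
    pthread_t t;
    if (pthread_create(&t, NULL, producer, NULL) != 0)
        return -1;
    got_locked = consume_locked();
    got_lockfree = consume_lockfree();
    pthread_join(t, NULL);
    return 0;
}
```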

Q2)
C and C++ are famous for their lack of type safety. Rust is a newish language designed for systems programming. What is your opinion of Rust? Do you envision that it will be used in Linux? The ownership model of Rust seems clunky but workable. Do you have an opinion on this?

A2)
So first off, I’d like to say that C can actually be fairly type safe, as long as you follow a few rules. The main one ends up being to not rely so much on the integer type system (because pure integer types end up having all the silent implicit type conversions), and to actively avoid casts.

In the kernel, we then also have a few tools to actively make the type system even stricter in kernel-specific ways, and we can do type-checking even on integer types by adding certain markings (most notably this helped us with CPU endianness problems, where you can mark an integer to be “32-bit little endian” etc, and have tooling to check that it is then used appropriately). We also use that same tool to distinguish user pointers from kernel pointers (and from IO space pointers) etc – things that normal C type checking doesn’t know about.
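The kernel's actual mechanism here is the sparse checker together with annotations such as `__le32` and `__user`. As a swapped-in, plain-C illustration of the same idea (making "32-bit little endian" a type of its own, so that mixing it with a native integer is a compile error), a sketch might wrap the bytes in a struct. `struct le32`, `to_le32()`, `from_le32()` and `struct disk_header` are hypothetical names, not kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* A distinct type for "32-bit little-endian value". Because it is a struct,
 * `uint32_t x = some_le32;` or `some_le32 + 1` will not compile, so every
 * use has to go through the conversion helpers below. */
typedef struct le32 { uint8_t b[4]; } le32;

static le32 to_le32(uint32_t v)
{
    le32 out = { { (uint8_t)v, (uint8_t)(v >> 8),
                   (uint8_t)(v >> 16), (uint8_t)(v >> 24) } };
    return out;
}

static uint32_t from_le32(le32 x)
{
    return (uint32_t)x.b[0] | (uint32_t)x.b[1] << 8 |
           (uint32_t)x.b[2] << 16 | (uint32_t)x.b[3] << 24;
}

/* Hypothetical on-disk header: the field types document the byte order. */
struct disk_header {
    le32 magic;
    le32 block_count;
};

static uint32_t header_blocks(const struct disk_header *h)
{
    /* return h->block_count;   <- rejected by the compiler */
    return from_le32(h->block_count);
}
```

Unlike this struct trick, sparse's annotations keep the values as ordinary integers for the compiler, so there is no cost in generated code or calling convention; the checking happens in a separate tool pass.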

But if you use a lot of structured data – and we do – and pass pointers around and generally avoid “void *” interfaces, even basic C does a fair amount of typechecking. And it does it at the right time – when building, because the types are static. I personally detest weakly typed languages that then do run-time conversions (or checking) from one type to another, and in many ways C is more strongly typed than those kinds of languages.

Now, C then obviously is very weakly typed in that C _allows_ you to do absolutely anything you want – a simple cast will turn any type into any other type. That’s the kind of thing you do actually require for low-level system programming (not just kernels, but many things like allocation libraries and heap management really want it).

But many of the “C is unsafe” opinions aren’t so much about the type system, but about the fact that C very inherently does pointer arithmetic, and that the language has almost no notion of an “array” outside of the purely syntactic ones. Arrays degrade to just pointers, and the two are basically 100% interchangeable. That’s a very specific kind of type weakness, and it’s one that is hugely powerful but also absolutely an enormous stumbling point and results in it being truly trivial to have array overflows and all the issues that causes.

Again, the pointer arithmetic part of C is often the thing that makes it very powerful for many situations, but it’s inarguably the thing that really makes for easy-to-make mistakes. Arrays are a very common data structure, and when any array access is something you really have to think about or should check, it’s unquestionably a problem.
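A short C illustration of the array-to-pointer decay described above; the names are arbitrary. Compilers may warn about `sizeof` on an array parameter (GCC, for example, has `-Wsizeof-array-argument`), but the code is accepted.

```c
#include <assert.h>
#include <stddef.h>

static int samples[8];

/* Where the array itself is in scope, sizeof measures all 8 elements. */
static size_t count_in_scope(void)
{
    return sizeof samples / sizeof samples[0];
}

/* As a parameter, `int buf[8]` is rewritten to `int *buf`: the 8 is
 * ignored, and sizeof now measures a pointer, not the array. */
static size_t count_as_parameter(int buf[8])
{
    return sizeof buf / sizeof buf[0];
}

/* Indexing is just pointer arithmetic: p[i] means *(p + i), with no
 * bounds check, so samples[8] would compile and read past the end. */
static int same_address(size_t i)
{
    return &samples[i] == samples + i;
}
```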

That said, for the core kernel, all these weaknesses are really things we use all the time, and something like a kernel virtual memory manager really fundamentally needs the ability to override any type and to do wild pointer arithmetic. So while pointer arithmetic is dangerous, it is also very very powerful, and it’s fundamentally the reason C is the go-to language for kernels and low-level libraries. Even when you don’t actually use C, you use languages that have that same model (ie C++ and Objective-C etc are not different in this regard).

HOWEVER.

What is a strength and a good thing when you do low-level memory management or similar, is not generally a good thing when you do “random code”. Not even in a kernel. So a language that has more strict rules, doesn’t allow arbitrary pointer arithmetic, and has to carry around more data in order to actually distinguish between any random pointer to one object, and a pointer to a _range_ of objects (ie an array) does have a lot of advantages.
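A minimal C sketch of that distinction, assuming a hypothetical `struct int_slice` that carries a length alongside the pointer (roughly what Rust's slices provide as a built-in language feature). The point is only that once the range travels with the pointer, every access can be checked.

```c
#include <assert.h>
#include <stddef.h>

/* A pointer to a *range* of objects: the length travels with it. */
struct int_slice {
    int *ptr;
    size_t len;
};

static struct int_slice slice_of(int *base, size_t len)
{
    return (struct int_slice){ base, len };
}

/* Checked access: an out-of-range index yields `fallback` instead of
 * reading whatever memory happens to follow the array. */
static int slice_get_or(struct int_slice s, size_t i, int fallback)
{
    return i < s.len ? s.ptr[i] : fallback;
}

/* Sub-ranges stay bounded: the result can only shrink, never grow. */
static struct int_slice slice_sub(struct int_slice s, size_t start, size_t end)
{
    if (end > s.len)
        end = s.len;
    if (start > end)
        start = end;
    return (struct int_slice){ s.ptr + start, end - start };
}
```

The cost is exactly what the text describes: every such pointer is twice as large, and each access carries a comparison.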

So there is a lot of interest in not rewriting the kernel in Rust (or something like it, but Rust does seem to be the main contender), but to allow certain _parts_ of the kernel to be written in a language that actively makes it harder to make simple mistakes, and that still can interface with the rest of the kernel.

Device drivers in particular are a very tempting target for using a stricter language. Yes, device drivers do certain very odd things (any IO access is very special, and interrupts are a thing), but those odd things have to be abstracted away for other reasons anyway (“memory mapped IO” simply works very differently on different architectures, and it’s not just about the memory ordering, it’s about how the actual access is physically done). And once you do that, restricting what you can do in other ways is generally a very good thing.

And device drivers are roughly half of the kernel, so device drivers aren’t some small special case. If we started encouraging device drivers to be written in Rust, we might well get the best of both worlds: C for the parts that really fundamentally do odd things, and Rust for a large amount of code that really shouldn’t be doing all those subtle things in the first place.

That said, it’s a big undertaking. It’s a whole new set of tools, it’s a fundamental expansion of our infrastructure. And nobody wants to rewrite all existing drivers – it’s more about rewriting a few individual ones as a test-case, and encouraging new drivers to then be written using Rust.

And even that is non-trivial, because all our existing infrastructure – all the driver interfaces, all the helper functions, but even things like our object tool verification _after_ it has been compiled – are geared towards existing C compilers. So our existing IO accessor helper functions are inline functions in C header files. The kernel isn’t even really written in “C” – we have our own per-compiler header files, we know _which_ C compiler we use, and we have workarounds for specific _versions_ of the compiler because of known bugs or limitations.
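To make that concrete, here is a hypothetical accessor in the style of such a header. `hyp_readl()`, `hyp_writel()`, the register layout and `fake_regs` are invented for this sketch; the real kernel accessors such as `readl()`/`writel()` are architecture-specific and also include ordering barriers. Because the helpers are `static inline`, they have internal linkage: each C file that includes the header gets its own copy, and there is no exported symbol that code in another language can call directly, so such bindings usually need small out-of-line C wrappers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical header-style accessors. `volatile` forces exactly one
 * 32-bit access per call; real accessors add arch-specific barriers. */
static inline uint32_t hyp_readl(const volatile uint32_t *reg)
{
    return *reg;
}

static inline void hyp_writel(uint32_t val, volatile uint32_t *reg)
{
    *reg = val;
}

/* Hypothetical device: register 0 is control, register 1 is status. */
enum { REG_CTRL = 0, REG_STATUS = 1 };
#define CTRL_ENABLE 0x1u

/* Read-modify-write of the control register; returns the read-back value. */
static uint32_t device_enable(volatile uint32_t *regs)
{
    hyp_writel(hyp_readl(&regs[REG_CTRL]) | CTRL_ENABLE, &regs[REG_CTRL]);
    return hyp_readl(&regs[REG_CTRL]);
}

/* Stand-in for a mapped register block so the sketch runs without hardware. */
static volatile uint32_t fake_regs[2];
```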

So yes, Rust as part of the kernel will almost certainly happen, but it’s been discussed for a few years already, and it’s not there yet even now.

Q3)
It is reported that you created Linux for your Ph.D. project. Were you at all intimidated by the idea of creating an OS? Did you ever want to give up? How did you bust through the blockades? Why did you decide to create an OS in the first place?

A3)
Actually, Linux was never really part of my studies originally. I did do it _while_ I was at the University of Helsinki, but the only actual real overlap with my academic studies was that I wrote my Master’s thesis on the Linux memory management.

I’m actually not a PhD, I’m just a MSc in computer science (ok, so I am a PhD h.c. twice over, but that happened later _because_ of Linux, rather than due to my studies).

Linux started originally because I was just trying to learn about the low-level details of my new computer – all the fancy things that the 80386 could do, and I had never had before. I didn’t come from a PC background, just home computers, so to me a 386 PC/AT machine was a new home computer that acted very differently from the ones I had had before (originally a VIC-20 with a 6502 CPU, then a Sinclair QL with a 68008 CPU). I wrote boot floppies to learn how the BIOS worked, and how to get into protected mode and turn on paging, just because it was a new CPU for me.

It didn’t even start as an operating system. It literally was me going through the architecture manuals and wanting to learn about all of it, so the initial “task switching” wasn’t because an operating system runs multiple tasks, it was literally because the 80386 had special hardware support for that, so me writing code to switch between two tasks was to test out the hardware, not because I wanted to run two processes. To test it, I just wrote two tasks that printed out “AAAAA..” and “BBBBB..” respectively at a certain rate, and then my task switching code switched between them every second or so.

That “two processes that printed out text” then became “one process that read from the keyboard and wrote to the serial line” and “another process that read from the serial line and wrote to the screen”. That’s how a terminal emulator works, and for a while I would literally boot into my very special terminal emulator and read internet news from the university computers that way. This was back in the days when a 9600bps modem was “state of the art”.

By that point, I realized I should have some way to download stuff to the disk too, and that requires not just a disk driver, but a filesystem too – and it became clear that what I really wanted to do was to write a full operating system.

And the rest is history.

Now, at this point, I should really give the Finnish education system a huge shout-out. Because while none of this was really directly about my studies, all of it was made possible because of how the Finnish education system works.

In particular, education in Finland is free. As in ENTIRELY free. Yeah, once you go to University, you have to pay for books. But even that is cheap, with the required literature generally being from the university press, and it’s basically printing costs. Some more advanced classes might have required reading that you really would want an outside book for, but it’s not the outrageously expensive kind.

And this is at the best university in Finland. And don’t take my word for it – look it up – Finland isn’t some third-world country when it comes to education. It’s consistently at or near the top in world rankings.

In fact, education isn’t just free, you actually used to get a stipend and the student loans were very cheap. So you’d get _paid_ to be educated. When you’re a small country like Finland, and you don’t have some incredible natural resources like oil or gold or diamonds, and you can’t rely on tourism (because honestly, the weather sucks), you’d better rely on an educated workforce.

So with that in mind, it took me something like eight years to get my MSc, and I ended up graduating with about $7k in student debt for living expenses. $7k for _eight_ years of living and getting an education. I was a TA for the last four years or so, so that is part of it, of course, but my point is that you guys are probably in a rather different situation at PSU. You presumably really want to get your undergraduate degree, and start working to pay off your student loans, and maybe get a MSc later? Or are studying _while_ working.

Me, I had fun at University, and Linux was a big part of my life, but I could afford for it to be purely a hobby on the side, and I could afford – by the last few years – to have it be a pretty *big* hobby which slowed down my studies because I spent more time on my hobby than I did on the studies.

Plus I could do so _without_ spending a second thinking about “this should help me pay off my student debt”. I knew that just having a CS degree would allow me to do that after a couple of months (and did). I think one of the most common questions I got from US people back in the days was “why didn’t you make money on it”. That simply wasn’t an issue.

And the computer science department was very encouraging. They knew about my hobby, and they let me put my machine on the CS network (this was back in the days of thick ethernet, and when people were more naive about network security etc). They started using some PCs running Linux as X terminals, and after I had ported Linux to an Alpha, the CS department actually bought an Alpha machine as one of the department servers, because the X terminals had worked out so well.

And this was all despite Linux very much *not* being part of the curriculum.

So Linux was never really part of my studies, but I do want to point out how it was made possible by a good education system, and how the CS department was very happy to use it and support my odd little hobby (of course, Linux was used at a number of universities by the mid-90s, it’s not like University of Helsinki was unusual in that sense – having a cheap Unix lookalike that worked on cheap PC clones was a big deal in academia that needed that kind of thing).

Q4)
We are working with xv6, and even that seems very complex to us. How did you ever manage the complexity of Linux?

A4)
I’ve literally been working on this for over three decades by now. I got the PC that started my “I need to learn how this thing works” exploration in early January of 1991, so just over 30 years ago. The 30th anniversary of the initial release of Linux will be later this year.

And the complexity has been incremental. That first release all those years ago was just ten thousand lines of code, and was a barely working OS. And over the decades, there have been thousands of people involved – with some of the current maintainers literally having become involved back in ’91 (ok, very few go _that_ far back, but still..)

I’m mostly a “technical lead” person these days, and don’t actually write a lot of code. I integrate code from other people, talk about issues over email, try to resolve conflicts (both technical and personal), and you should not think that I know every corner of the current 30 _million_ lines of code and documentation and scripts.

But after three decades of doing this, a lot of the complexity is just stuff that has become almost instinctive. When I read patches, it’s almost not even conscious any more: I just “see” what’s going on. You kind of internalize things eventually, and don’t have to even think about them.

Q5)
Do you think quantum computing will ever be viable for home computers? How difficult do you think it would be to make quantum processors work concurrently/in parallel with classical processors?

A5)
I’m not a believer when it comes to all the hype about quantum computers and how it’s the future of computing. I think it’s interesting research, but I don’t think it has any real relevance for general-purpose computing, and I don’t believe it ever will.

Don’t get me wrong – I think a lot of quantum effects are already very relevant for any day-to-day computing. Modern electronics is already at the point where quantum effects are very much part of the picture when it comes to the physical layer. Semiconductors very much _are_ a quantum phenomenon, and when you get into photonics etc it becomes even more obvious, and doing entanglement for communication security is not science fiction.

But the part where you actually start using entanglement for computation, I stop believing the hype. Yes, you can solve certain problems that way, and yes, depending on the problem and just how you look at it, we’ve already reached “quantum supremacy”.

And it’s all BS and almost entirely irrelevant for general-purpose computing. And more importantly, it will never even _become_ relevant for GP computing, simply because you can’t do it in something like a phone that you carry around. That’s simply not how the laws of physics work, and to actually keep things entangled long enough to use it for computation requires too much of a controlled environment for it to be relevant to normal situations.

So quantum computing will improve, and you’ll have more qubits, and who knows, maybe you’ll even have a _lot_ more qubits once some of the scaling and error handling issues get sorted out. Maybe photonics even makes it economical.

But look at what quantum computers can do: they are basically entirely about simulating physical problems (and generally about simulating physical problems in the quantum space). They are very much not about “Playing angry birds” on your cellphone, and they never will be.

So quantum computing can be relevant, but it’s not relevant in all that many areas, and it most certainly is not relevant as a _replacement_ for existing computing. It’s purely an augmentation, and honestly, it’s not even one that I think is all that interesting or wide-ranging. I think all the advances in neural networks are a lot more interesting area – again, that’s not something that will _replace_ existing general-purpose computing, but it’s something that is very obviously already augmenting it, and unlike quantum computing you can do it on a microcontroller in fairly chaotic environments.

But I should perhaps make my own personal hang-ups explicit: I think that what makes modern computing so interesting is exactly that it’s _everywhere_, and I’m a huge believer in the mass market as being what makes technology relevant. So a setup that fundamentally requires cooling to close to absolute zero to avoid perturbations of the quantum state is perhaps exciting physics, and yes, it allows you to do things you might not be able to practically do with classical computing, but at the same time, because I feel it’s fundamentally not something you can scale out to the mass market, it’s not something I believe will really change the world.

And if I was a quantum physicist, I’d consider the above paragraph the uneducated blathering of an inferior mind.

Q6)
How do you vary or schedule your work to avoid physical fatigue? I currently work in the industry and take night classes at the university, and I find it hard to always will myself to do activities and stay physically busy after paying all of the day’s mental taxes.

A6)
Well, partly probably because of how Linux came to be, I’ve never considered it work, and I never “scheduled” it. It was a hobby, and it’s absolutely never been 9-5 for me.

Yes, for almost two decades by now, it’s been my “full-time work”. After University, I moved to the US, and I worked at a startup for several years, and Linux continued to be something that was “on the side”, even though – like at University – the company I worked at did in fact use Linux fairly extensively. But around 2001 or so, it became clear that I really needed to spend more time on Linux, and first I took a year off in order to be able to get the Linux 3.0 release out the door (supported by OSDL, which later became the Linux Foundation), and then when it became clear that people wanted that situation to continue, the “one year off” became “a few years off”, and now it’s basically been my full-time job for two decades.

But despite being a full-time job, it never became some kind of structured office job. For random reasons, for me weekends actually end up being the busiest work-days of the week. I think it’s partly just historical – they used to be when I was able to concentrate basically entirely on the kernel – but now it has become a matter of “a lot of people work on Linux commercially, send me their work by the end of the week, and so the weekend ends up being when I tend to do a lot of merges and then on Monday the cycle starts anew”.

We have the “bigger cycle” of releases roughly every 10 weeks or so, with the first two weeks being the “merge window” when I merge all the new features (as opposed to the above “weekly fixes” thing on weekends), and so during those two weeks I tend to be very busy indeed the whole week. But then it calms down as we go through our release candidates, and right now, for example, I’m at the end of that release cycle, and we _should_ have not a lot of urgent stuff going on. Knock wood.

But even the above sounds more structured than it really necessarily is. On a daily basis, if I am feeling under the weather, or if I didn’t get enough sleep for whatever reason, I’ll just walk away from the computer. I don’t get migraines, but I get headaches occasionally, and when I feel one coming on, I’ll just go lie down during the middle of the day with my eye covers. And I don’t even think of that as “taking a break”. I just don’t have enough of a daily schedule for that to be an issue – and there is absolutely _zero_ point for me to work if I don’t feel up to it. I’d rather work three hours a day and feel like I was 100% on top of it, than feel like I had to be there 9-5.

So physical fatigue to me is just not an issue. If I feel it, I literally just walk away. If I get frustrated, and if I notice I start getting too pissy with people over email (and I don’t necessarily always notice in time – sorry), I walk away. Because trying to work through it would just make everything worse.

Of course, I say that, and I can do that, because I truly do like what I do. I still find it interesting. So I simply don’t have to force myself to work, because quite often, if I’m out doing something else, I’d rather be working. Still. After 30 years. I’d simply be *bored* not doing what I do.

But physical activities are a problem. I’ve never found them interesting, and I wonder about people who claim they get runner’s high etc. I spent 11 months in the Finnish army, so I’ve been forced to do physical things, and I have never ever EVER in my life gotten anything that could remotely be described as a “runner’s high”. My wife is into Tae Kwon Do, and she does biking (now, during the winter months, she spends 45 minutes a day on her bike exercises), and she _likes_ it. So I know people like that exist. But christ, no. Not me.

So being physically busy is hard, but I don’t blame my work (except in a “computers are a hell of a lot more interesting than jogging” way). I’ve tried to do “Beat Saber” in VR the last couple of years, and that actually has worked better than anything else I’ve ever done (ie “work up a sweat five days a week” kind of thing), but I have fallen off even that wagon, and in the last week I think I only did it one day.

So I’m 51, perfectly healthy (knock wood), but I’ve very much got a “dad bod”. I’m in shape – but that shape is called “pear”.

Q7)
Did you ever imagine that Linux would be so widely used? Did you envision that it would be able to play-nice with other interfaces like making something Windows compatible? In other words, has it surpassed the ideas you’ve had about its usefulness over the years?

A7)
Oh, it literally surpassed my expectations back in ’92.

Of course, part of that is that I really didn’t have very high expectations. It was always just a private personal project, and making it available originally was more of a “Hey, look at what I did” than anything else.

And I think that actually helped a lot. *Because* I didn’t have a lot of lofty expectations, when people then started talking about what they’d want and need, I didn’t have a lot of preconceived notions of where I wanted to push things, so I was very open to people’s ideas. Because literally, by the end of 1991, I had made Linux already do everything that I had initially expected it to do.

So for 29 of the last 30 years, all the really big development has come from other people having more interesting uses for it than I did.

In fact, all I really ever do – *still* – is compile the kernel, and read and write email.

And I think that’s been an advantage to Linux, because it made me so open to working with a lot of other people. Yes, I may have strong opinions when it comes to code quality and maintenance (and I have *very* strong opinions when it comes to the “#1 rule of kernel development: don’t break existing users”), but at the same time I’ve not tried to really control the direction all the various involved people – and companies – have when it comes to Linux.

Part of that has been conscious too. I’ve never worked for a commercial Linux company, because I really wanted to be neutral and I wanted my Linux work to be very obviously _not_ tainted by any direct commercial interests. My opinions are my own technical opinions, and people can (and do) disagree and try to convince me I’m wrong. And I can be convinced – particularly by numbers. “Numbers talk, bullshit walks”. And because I’m neutral, and because I’m not beholden to any particular commercial entity, I may have my personal hang-ups, but I think I’ve been pretty good at incorporating other people’s and other companies’ work into Linux.

End result: Linux is used in a lot of areas that I personally have never really had a lot of interest in, and that’s ok. In fact, it’s more than ok. It’s how things _should_ work, and I actually think it’s one of the big strengths of Linux, how it’s so usable in so many different areas. A lot of companies and individuals have had very specific things that they want to improve in Linux, but _I_ have generally not cared all that much about any particular use. As long as it makes technical sense and is maintainable, it’s good.
