Why I am building e8vm

2015 9/13

When I was in graduate school working on my PhD, I often preferred writing systems from scratch. Some people thought I was crazy, or at least doing it the wrong way. But from my personal perspective, I had a simple reason: when doing work, I prefer using software that I can understand, and writing it myself usually guarantees that I understand every detail of it.

I have also tried using other people’s software, and I have seen others do the same. It is often easy to get something running, but hard to make fundamental enhancements later. Freedom is lost. Innovation stops. What is left is endless layering.

However, building from scratch does not solve the problem. Oftentimes, the builder is the only one who really understands the code; everyone else is still a foreigner. The project effectively dies when my life moves on and I have no interest in maintaining it any further. The code might still compile and run, but it becomes hard for other people who are interested in taking it on to make fundamental enhancements. I see the cycle: to them, I have become one of the “others”.

I want to stop this cycle.

Code is the formal representation of a piece of executable logic. It should live independently of its maintainer, like literature. A piece of code should be understandable just by reading it.

Code understandability is even more important than performance, because time beats everything. In a Chinese kung fu novel, there was a man who sought revenge on his enemies for his family. He hid in a cave for decades and eventually trained himself to be the best kung fu master in the world, but when he finally came out of the cave, all his enemies were already dead. Time is the toughest killer, and it is the same for code. Human-understandable code is immortal, and immortal code will win everything eventually.

However, I find that code today is often extremely hard to read. Pick a random GitHub repo, say one with at least several thousand lines of code: could you even estimate precisely how long it would take to fully understand it? “Fully understood” might be a vague notion, so let me ask more specifically: could you estimate how long it would take you to rewrite it by hand in another programming language?

I once wanted to translate v86 to Go, but I didn’t even know where to start. Blindly translating the code line by line, file by file does not really work. It would be like writing thousands of lines of code without a single successful compile. A probably more feasible way is to follow the git commit log and reproduce the development history in another language. However, that means I would have to go through all the failed trials and attempts in the history as well, which would not be much better than building from scratch myself.

How about documentation then? Does code documentation make code more understandable?

It probably does, but developers do not like writing documentation, and there are perfectly good reasons for that. First, the working developers do not need documentation. When they are writing the code, the design is mostly in their heads already. Second, things keep changing, and documentation easily gets out of date. The argument that one will need the docs in the future hence does not hold for a typical piece of documentation, because it is either out of date and useless, or replaced by another piece that is more up to date. So why write words down if you are most likely to delete them later? Third, well-written code often documents itself. Packages, files and symbols all have readable names in English. Language keywords are also readable English words. So why restate what the code does in plain English again, and often in a less precise way?

There should be a better way.

So I ask myself: why is code often not readable in its plain form?

For this question, people have many different answers and have even proposed many solutions. Most of the solutions are technical ones, like designing a new language, a new programming paradigm, or a new heuristic for readability or complexity.

But I think there is a more fundamental reason: because people do not read code in its plain form.

We read books, we read articles, we read blogs, we read tweets, we read paintings, we read film scripts, we read recipes. We do not read code. Even professional programmers in the software industry are mostly afraid, or at least reluctant, to read code from a large project, not to mention learners, beginners, or non-programmers.

I might not have learned how to write good articles during my 5.5 years of PhD study (writing is hard), but I did learn how to improve one’s writing skills: you must ask others to read what you have written and give you feedback. Let the readers tell you which parts they do not understand and where they start to get lost. It is the only way to improve.

I think code is no different.

I want to build a website where people can publish, subscribe and read code, like reading a tweet, a blog post, or a story.

“How about GitHub?” one might ask.

That’s a good question. My opinion: GitHub is for sharing, not reading. Without reading, readability won’t improve. I think a code reading website needs to satisfy these requirements to enable and encourage reading:

  • It only uses programming languages that are designed to be easily readable.
  • It automatically breaks big projects into small pieces, so that a reader does not need to dive deep and read everything before understanding something.
  • It automatically provides a reading order of the project like a table of contents.
  • It compiles and runs the code inside the browser with just one click. Code is, after all, a living thing.
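
To make the "reading order" requirement a bit more concrete, here is a minimal sketch in Go of one way such an order could be computed. It is only an illustration, not how E8VM or the website actually works: the function name ReadingOrder and the package names in main are hypothetical, and it assumes the project is already split into packages whose imports are known. The idea is a plain topological sort, so that every package appears after the packages it imports and a reader never meets a piece before its dependencies.

```go
package main

import (
	"fmt"
	"sort"
)

// ReadingOrder is a hypothetical helper. It takes a map from each package
// to the packages it imports, and returns the packages ordered so that
// every package comes after everything it imports. Ties are broken
// alphabetically so the result is deterministic.
func ReadingOrder(imports map[string][]string) ([]string, error) {
	pending := make(map[string]int)    // package -> number of imports not yet placed
	users := make(map[string][]string) // package -> packages that import it
	for pkg, deps := range imports {
		if _, ok := pending[pkg]; !ok {
			pending[pkg] = 0
		}
		for _, dep := range deps {
			if _, ok := pending[dep]; !ok {
				pending[dep] = 0
			}
			pending[pkg]++
			users[dep] = append(users[dep], pkg)
		}
	}

	// Start with the packages that import nothing.
	var ready []string
	for pkg, n := range pending {
		if n == 0 {
			ready = append(ready, pkg)
		}
	}
	sort.Strings(ready)

	var order []string
	for len(ready) > 0 {
		pkg := ready[0]
		ready = ready[1:]
		order = append(order, pkg)
		// Placing pkg may unblock the packages that import it.
		for _, user := range users[pkg] {
			pending[user]--
			if pending[user] == 0 {
				ready = append(ready, user)
			}
		}
		sort.Strings(ready)
	}

	// Anything left over is part of an import cycle and has no valid position.
	if len(order) != len(pending) {
		return nil, fmt.Errorf("import cycle: %d of %d packages cannot be ordered",
			len(pending)-len(order), len(pending))
	}
	return order, nil
}

func main() {
	// Made-up package layout, for illustration only.
	project := map[string][]string{
		"compiler": {"parser", "asm"},
		"parser":   {"lexer"},
		"asm":      {"arch"},
		"lexer":    {},
		"arch":     {},
	}
	order, err := ReadingOrder(project)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, pkg := range order {
		fmt.Printf("%d. %s\n", i+1, pkg)
	}
}
```

A sketch like this only works when packages do not import each other in a circle, which is one reason a readable language might want to forbid circular dependencies in the first place.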

This code reading website is what I want to build, and E8VM is the core foundation of this website. It is written in readable code, for readable code. At least I hope it is readable. Its readability will be tested when the website launches and people start reading it.

This is how history will be made. Immortal code will start here.

Do you want to be a part of it?