Isaac Hung

Why do we need programming languages?

December 24, 2022


Programming languages have existed for almost the entire history of computing. Wikipedia has a list of over 700 different programming languages, including currently used and historically used languages, each with its own unique purpose and style. Needless to say, the field of programming languages has been undergoing constant evolution over the past few decades.

Most programmers can’t imagine writing code without a programming language. But computers cannot directly understand the source code humans write, so a translator is needed to turn source code into something the CPU can execute. Building this translator (typically an interpreter or compiler) is a non-trivial task, and using a programming language often incurs some form of runtime overhead, be that a VM or a garbage collector.

So, why do we need programming languages?

Abstraction

Abstraction is the ability to program in concepts and ideas, rather than implementation details. In other words, abstraction is hiding the irrelevant details and information and instead focusing only on what matters.

Languages like assembly and C are generally considered to be low-level languages, because they don’t abstract away most of the details, for example, memory management, or in the case of assembly, explicitly loading and storing data. On the other hand, languages like Python and Haskell are considered to be high-level, as they provide many abstractions to hide away details.

For example, Python and Haskell have automatic memory management, through means such as tracing garbage collection and reference counting. Python provides facilities for object-oriented programming, where logic is abstracted away into objects, and Haskell abstracts over function arity with currying. On the other side of the spectrum, a language like C expects the programmer to allocate and free heap memory manually, which leaves much more control to the programmer and opens up optimization opportunities.
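To make the currying idea concrete, here is a minimal Python sketch. Python does not curry functions automatically the way Haskell does, so the helper curry and the example function volume are hypothetical names introduced only for this illustration, and the helper assumes a function with a fixed number of positional parameters.

```python
from inspect import signature


def curry(func):
    """Return a version of func that can be fed its arguments a few at a time."""
    arity = len(signature(func).parameters)  # how many arguments func expects

    def accumulate(*args):
        if len(args) >= arity:
            return func(*args)  # enough arguments collected: call the real function
        # otherwise return a function that waits for the remaining arguments
        return lambda *more: accumulate(*args, *more)

    return accumulate


def volume(length, width, height):
    return length * width * height


curried_volume = curry(volume)
print(curried_volume(2)(3)(4))   # one argument at a time
print(curried_volume(2, 3)(4))   # or several at once
```

Both calls compute the same product; the caller no longer has to care whether the function takes its arguments in one go or step by step.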

Generally, abstraction is considered to be an advantage for most programs. However, in some cases — especially systems programming — a programmer may need to reach for lower-level features and languages in order to directly manipulate the computer’s resources. This may be for optimization purposes, or for complete control over hardware, which is a common need in embedded programming. On the flip side, it is easy for low-level languages to feel overwhelming and tough for newer programmers, and minor mistakes can lead to devastating bugs and security vulnerabilities.

While many languages abstract away lower-level implementation details, they still offer “escape hatches” to the programmer. For example, in Python, you can turn off the automatic cyclic garbage collector (reference counting keeps running):

import gc
gc.disable()  # no automatic collection passes until gc.enable() is called

Use of such features is uncommon, but the option is provided to the programmer in case they need it.
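As a rough sketch of what this escape hatch means in practice, the snippet below builds a reference cycle with the collector switched off and then runs a collection by hand. It assumes CPython, where reference counting alone cannot free cycles; Node and cycle_survives_until_collected are illustrative names, not part of any library.

```python
import gc
import weakref


class Node:
    """A tiny object that can point at another Node."""

    def __init__(self):
        self.partner = None


def cycle_survives_until_collected():
    """Return (alive_before, alive_after) around a manual gc.collect()."""
    was_enabled = gc.isenabled()
    gc.disable()  # no automatic cycle collection from here on
    try:
        a, b = Node(), Node()
        a.partner, b.partner = b, a  # reference cycle: a -> b -> a
        probe = weakref.ref(a)       # observe a without keeping it alive
        del a, b                     # drop our own references
        alive_before = probe() is not None
        gc.collect()                 # run the cycle collector by hand
        alive_after = probe() is not None
        return alive_before, alive_after
    finally:
        if was_enabled:
            gc.enable()              # restore the previous setting


print(cycle_survives_until_collected())
```

On CPython the cycle is expected to stay alive until the explicit gc.collect() call, which is the trade-off of taking over from the runtime: the programmer becomes responsible for deciding when cleanup happens.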

Portability

Back in the early days of computing, programmers fed their programs to the machine on punch cards, writing first in raw machine code and later in assembly language. In addition to being extremely tedious and error-prone, the programs were tied to the computer architecture they were written for, and couldn’t be executed on other architectures.

Nowadays, we have even more CPU architectures and operating systems, and we run our code on mobile devices, IoT devices and embedded systems. The development of high-level languages has also given us the ability to port our code almost everywhere.

Some languages that compile to native code, such as C and C++, have compilers whose backends support a multitude of architectures, while others like Java run “bytecode” (instructions for a made-up instruction set) on a virtual machine (the JVM in this case) that can be ported to many kinds of devices. Other ways people have found to run code anywhere include “transpiling” (compiling to another language’s source code, e.g. Elm compiling to JavaScript), or using LLVM as a compiler backend. Some modern native code compilers have support for over a hundred “targets” (CPU architectures or operating systems).
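The bytecode idea can be observed from within Python, which also compiles source code to bytecode for its own virtual machine. The sketch below uses the standard dis module to list the instruction names of a small function. It illustrates CPython's bytecode rather than the JVM's, and the helper opnames and the function add are names made up for this example.

```python
import dis


def add(a, b):
    return a + b


def opnames(func):
    """List the names of the bytecode instructions func was compiled to."""
    return [instruction.opname for instruction in dis.get_instructions(func)]


# The exact instruction names differ between Python versions.
print(opnames(add))
```

The same source runs unchanged wherever the interpreter has been ported, because only the virtual machine needs to know about the underlying hardware.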

Analysis and optimization

Writing programs at a higher abstraction level opens up new possibilities. While translating source code into an executable file, the compiler can apply many transformations to the program. To the programmer, the compiler is essentially a black box: what it does to your code does not matter, as long as the resulting program does exactly what you told it to.

The most obvious benefit of this is optimization. By applying certain transformations to the code during the compilation process, we can speed up the generated code, in some cases by a substantial amount. For example, division instructions are on the expensive side, so we might replace an integer division by 2 with a binary right shift by one bit instead (this optimization is called strength reduction). This naive optimization might not always produce the best performance, and for signed integers in a language like C the shift may round negative values differently from division, so the compiler has to compensate, but it serves as a simple example. Some other ways we might boost the performance of our code are constant folding, where we evaluate constant expressions at compile time to reduce the work done at runtime, eliminating dead code that never runs, and sharing common code between different computations. Modern compilers might try to exploit the parallel nature of today’s processors by arranging independent instructions so that the processor can execute them in parallel, increasing throughput. Speed isn’t the only factor that can be optimized for; compilers sometimes perform optimizations to reduce the size of the output binary as well, making it easier to run the generated program on an embedded device like an ESP32.
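Constant folding and strength reduction can be sketched as a toy pass over Python's own syntax tree. This is only an illustration under stated assumptions, not how any production compiler is implemented: TinyOptimizer and optimize are hypothetical names, only integer constants are folded, and rewriting x // 2 as x >> 1 assumes x is an integer (for Python integers both operations round towards negative infinity, so they agree). It needs Python 3.9 or later for ast.unparse.

```python
import ast
import operator

# Operators this toy pass is willing to evaluate at "compile time".
_FOLDABLE = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def _is_int_constant(node):
    return isinstance(node, ast.Constant) and type(node.value) is int


class TinyOptimizer(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)  # optimize the operands first
        left, right = node.left, node.right

        # Constant folding: both operands are known, so compute the result now.
        if _is_int_constant(left) and _is_int_constant(right) and type(node.op) in _FOLDABLE:
            value = _FOLDABLE[type(node.op)](left.value, right.value)
            return ast.copy_location(ast.Constant(value=value), node)

        # Strength reduction: x // 2 becomes x >> 1 (assumes x is an int).
        if isinstance(node.op, ast.FloorDiv) and _is_int_constant(right) and right.value == 2:
            shifted = ast.BinOp(left=left, op=ast.RShift(), right=ast.Constant(value=1))
            return ast.copy_location(shifted, node)

        return node


def optimize(source):
    """Parse an expression, apply the toy optimizations, return new source."""
    tree = ast.parse(source, mode="eval")
    tree = ast.fix_missing_locations(TinyOptimizer().visit(tree))
    return ast.unparse(tree)


print(optimize("60 * 60 * 24"))  # prints 86400
print(optimize("x // 2"))        # prints x >> 1
```

A real compiler would need type information before applying the second rewrite, since a right shift is not defined for floating-point values; that is one reason such transformations are easier in statically typed languages.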

Safety

In addition to optimization, most compilers perform some sort of safety analysis on the code. This can range from the basic semantic analysis and sanity checks that practically all compilers perform, to advanced data flow analysis techniques used to optimize memory management. Some languages feature static type systems, where the types of all values are checked at compile time to catch type errors before the program runs; others, like Haskell, track the side effects that functions may have; and the Rust compiler checks that programs comply with a set of ownership and borrowing rules in order to ensure memory safety and prevent data races.
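The idea of checking types before a program runs can be sketched with a deliberately tiny checker. It handles only integer and string literals combined with +, and the names TypeCheckError, infer_type and check are hypothetical, chosen for this illustration; real type systems are far richer.

```python
import ast


class TypeCheckError(Exception):
    """Raised when the toy checker rejects an expression."""


def infer_type(node):
    """Work out the type of an expression without evaluating it."""
    if isinstance(node, ast.Expression):
        return infer_type(node.body)
    if isinstance(node, ast.Constant) and type(node.value) in (int, str):
        return type(node.value)
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
        left, right = infer_type(node.left), infer_type(node.right)
        if left is not right:
            raise TypeCheckError(f"cannot add {left.__name__} and {right.__name__}")
        return left
    raise TypeCheckError(f"unsupported construct: {type(node).__name__}")


def check(source):
    """Parse source as an expression and return its inferred type."""
    return infer_type(ast.parse(source, mode="eval"))


print(check("1 + 2").__name__)      # prints int
print(check("'a' + 'b'").__name__)  # prints str
try:
    check("1 + 'a'")
except TypeCheckError as error:
    print(error)                    # prints cannot add int and str
```

The faulty expression is rejected without ever being executed, which is the essence of a compile-time check: the mistake is reported before the program has a chance to fail at runtime.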

These safety guarantees are invaluable for programming at scale, where codebases may contain millions of lines of code, written by multiple teams of engineers. By letting compilers verify properties of our code, we prevent many bugs from the beginning, for example segmentation faults and other program crashes. Compilers can save developers and companies real time and money.

The downsides

While the idea of programming without a high-level programming language (i.e. in assembly or machine code) might sound insane today, it wasn’t uncommon for software to be written this way. Early operating system kernels, video games (most notably, RollerCoaster Tycoon), and compilers were written in assembly. Nowadays, assembly is still commonly used in device drivers, embedded systems, kernels and education.

High-level programming languages do have their downsides. Often the code produced from high-level source isn’t optimal, either in execution speed or in size. For example, Python is generally much slower than most other programming languages, due to its highly dynamic and interpreted nature. Similar effects can be observed in other languages such as Java and C#, with their virtual machines adding an extra layer of overhead.

Some programming languages are standardized, which has the potential to cause all sorts of problems. The typical example here is C++ and its standard library, which has evolved over time into the bloated beast it is today. All serious programming languages have to maintain some form of backwards compatibility in order to prevent breaking existing codebases with language updates. The challenge is “stability without stagnation”: keeping the language stable and compatible with existing code without sacrificing important innovations, improvements and security updates.

Finally, all compiled or translated languages have some form of build process, which takes time. Some languages like Go focus on shortening compilation times, while others like Rust suffer from long build times, largely because of the amount of analysis and optimization their compilers perform.

Conclusion

From the reasons stated above, it is evident that programming languages play a key role in modern programming. While the field has already come a long way since the age of punch cards and machine code, I believe that it will continue to bring improvements to programming and software in general for the foreseeable future, evolving alongside changing requirements and demands for software development.