Programming Linguistics #1 — First Things First

December 1, 2015 · Posted in Programming Linguistics 

Programming is something I’ve been doing since I was in elementary school. Maybe back then it wasn’t quite what it is today, but from a young age I experimented with making computers do the things I wanted them to do. One of the first things I tried was crafting batch files that used all kinds of weird hackery to accomplish simple tasks. I slowly moved into HTML and a little bit of JavaScript, and after some time gradually learned to write PHP (I only learned about CSS later on because at the time, CSS didn’t even exist yet).

As the years passed I learned plenty of other languages. I still do a lot of PHP, but I also like using C and C++ (which, despite their similarities, are very different languages). But, as with most things, aspects of the tools I use have started to bother me over time, and invariably, sooner or later, the thought comes up: “I could do better”.

If, just if, I ever were to design and build a programming language of my own, what would it look like? What concepts would it use? What would the syntax be like? And what about the libraries and functions it provides? How would the tools for it be designed and built? Endless questions like these arise, and in this (what is hopefully going to be a) series of posts, I’m going to explore what my answers to those questions would be.

Let’s start by exploring some key design points (many of which will probably be explored in further detail in future posts).

Use Cases — Low Level vs. High Level

I don’t just program regular old computers. A field that I’m particularly interested in is embedded software: the software that runs inside all kinds of devices. Pretty much everything with a power cord has a computer of some kind in it, whether it’s an MP3 player, a television, a coffee maker, or a washing machine, and those computers need to be programmed. It’s an interesting field that is much closer to the actual hardware than writing software for desktop computers is. Oftentimes, programs are written using some form of assembly language (a human-readable form of raw machine code, which can be optimized to a very high degree, but is difficult to maintain and susceptible to errors), or some dialect of C (a language that is now more than four decades old). The tools available are more often than not of somewhat questionable quality, and user-friendliness is hard to find.

Those languages are used because they are close to the hardware. They can be compiled down to a point where they don’t need anything else to run, and not having to rely on external libraries and runtimes is vital when your code is going to run in an environment where no such things exist. That’s also why most major operating system kernels and device drivers are written largely in C.

Because of my fondness for low-level things like that, I want my language to be able to be stripped down to a point where it can easily be executed in an embedded environment, or be used as a language for OS development. At the same time, I also want it to be feature-rich, using modern programming techniques, with a wide variety of available libraries and routines to make high-level application development a breeze. I want to demonstrate that those two aren’t necessarily exclusive.

So, to summarize my first key point:

  • The language must be able to be used for low-level tasks (such as embedded software or OS development), while retaining a modern, feature-rich environment. Programs can be stripped of external dependencies and be completely self-sufficient, but rich and complete libraries are available to use for many different tasks.

Core Concepts

One thing that I believe is very important is to be platform-independent. Programs should be as portable as possible between different computer architectures. One aspect of this is that ‘traditional’ types like int or float won’t exist, because their size isn’t the same on every platform. Instead, primitive types explicitly specify the size they occupy (more akin to names like uint32) — providing the additional benefit that you are always aware of how much space something will occupy, and how large or small a stored number can be, reducing the risk of accidental integer overflows. Sizes that aren’t available natively on a given platform will be implemented through software, so that even the code written for an 8-bit microcontroller can use a 64-bit integer without any problems (other than a small performance hit).
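
To make the idea of size-explicit primitives a little more concrete, here is a rough sketch in present-day C++ (not in the language being designed), using the fixed-width types from <cstdint>. The helper names checked_add_u8 and square_u64 are made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Fixed-width types occupy the same number of bits on every platform that
// provides them, unlike plain int or long.
static_assert(sizeof(std::uint32_t) == 4, "uint32_t is four bytes");
static_assert(sizeof(std::uint64_t) == 8, "uint64_t is eight bytes");

// Hypothetical helper: adds two 8-bit values and reports overflow instead of
// silently wrapping around. Returns false (leaving `result` untouched) when
// the sum does not fit in 8 bits.
bool checked_add_u8(std::uint8_t a, std::uint8_t b, std::uint8_t& result) {
    const unsigned sum = static_cast<unsigned>(a) + static_cast<unsigned>(b);
    if (sum > 0xFFu) {  // 0xFF is the largest value an 8-bit unsigned holds
        return false;
    }
    result = static_cast<std::uint8_t>(sum);
    return true;
}

// Hypothetical helper: a 64-bit result computed from a 32-bit input. On CPUs
// with narrower registers the compiler typically emits a multi-instruction
// sequence or a library call; the source code stays the same.
std::uint64_t square_u64(std::uint32_t x) {
    return static_cast<std::uint64_t>(x) * x;
}
```

Because the width is part of the type name, it is immediately clear that 200 + 56 cannot fit in the 8-bit result, and the helper can say so rather than wrap around.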

On a higher level, this also includes independence from operating systems. Standard libraries include mechanisms for handling things like file systems and sockets in an OS-independent way, making it easier to write programs that will run on different systems.

Variables are strongly typed. A strong type system eliminates many potential errors, and strongly typed code can be compiled to work in limited environments far more easily than code in weakly typed languages (which also tend to be less efficient in terms of execution speed and memory consumption).

The language will be strongly object-oriented. There won’t be a technical reason why you couldn’t write a purely procedural program in it, but an object-oriented approach has many benefits. Classes feature a high degree of customizability — features like operator overloading, polymorphism, and multiple inheritance promote reusability and a syntax that makes sense when writing code using these classes. Template syntax, similar to C++, allows for generic programming.
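
As a rough illustration of what operator overloading and templates buy you, here is a small sketch in present-day C++. The Vec2 type and the sum function are invented for this example and say nothing about what the eventual syntax would look like.

```cpp
#include <cassert>
#include <vector>

// A small value type with overloaded operators, so that vector arithmetic
// reads like ordinary arithmetic.
struct Vec2 {
    double x;
    double y;

    Vec2 operator+(const Vec2& other) const { return {x + other.x, y + other.y}; }
    Vec2 operator*(double factor) const { return {x * factor, y * factor}; }
    bool operator==(const Vec2& other) const { return x == other.x && y == other.y; }
};

// A generic function: it works for any type T that supports operator+,
// whether that is a built-in number or a user-defined class like Vec2.
template <typename T>
T sum(const std::vector<T>& items, T start) {
    T total = start;
    for (const T& item : items) {
        total = total + item;
    }
    return total;
}
```

The same sum function adds up a list of integers and a list of Vec2 values without being written twice, which is the kind of reusability meant above.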

Native support for exceptions, even on limited platforms, provides a convenient and powerful method for handling error conditions. An additional, special type of exception allows triggering warnings (conditions which are not necessarily errors, but should probably not occur) through the same syntax and mechanism, without diverting program flow from the point where the warning was raised.
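
The difference between the two can be approximated in present-day C++, with one big caveat: C++ has no resumable exceptions, so this sketch swaps in a separate, hypothetical WarningSink object for warnings instead of using one shared syntax. All names here are invented for illustration.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical warning channel: raising a warning records a message and then
// returns, so execution continues right where the warning was raised.
class WarningSink {
public:
    void warn(const std::string& message) { messages_.push_back(message); }
    const std::vector<std::string>& messages() const { return messages_; }

private:
    std::vector<std::string> messages_;
};

// Parses a percentage. A value above 100 is only a warning: it is clamped and
// the function carries on. Empty input is an error: an exception is thrown
// and control leaves the function.
int parse_percentage(const std::string& text, WarningSink& warnings) {
    if (text.empty()) {
        throw std::invalid_argument("empty percentage");
    }
    int value = std::stoi(text);
    if (value > 100) {
        warnings.warn("percentage above 100, clamping");
        value = 100;
    }
    return value;
}
```

Calling parse_percentage with "250" still returns a usable result and leaves one message in the sink, while calling it with an empty string unwinds to the nearest matching catch block.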

A complex problem in computer science is memory management. Keeping track of how memory is allocated, who owns what, what exists where in which virtual address space, and so on, can sometimes take up as much memory as the actual information being stored. In addition, the specific method of memory management used often greatly affects the performance of an application. While the language will feature built-in mechanisms for managing memory that should be ‘good enough’ for most cases, it’ll be possible to roll your own memory manager when a more optimized solution is required — or forfeit the use of memory management altogether, when running in a limited environment with code that can make do using only static memory allocation.
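
One very simple example of “rolling your own” memory manager is a bump allocator over a fixed buffer, which needs no heap and no operating system. The sketch below is in present-day C++; the StaticArena name and its interface are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical bump allocator over a fixed-size buffer stored inside the
// object itself: no heap, no operating system, no per-allocation bookkeeping.
template <std::size_t Capacity>
class StaticArena {
public:
    // Returns `size` bytes aligned to `alignment`, or nullptr when the arena
    // is exhausted. `alignment` must be a power of two no larger than
    // alignof(std::max_align_t).
    void* allocate(std::size_t size,
                   std::size_t alignment = alignof(std::max_align_t)) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        // Round the current offset up to the next multiple of `alignment`.
        const std::size_t aligned = (used_ + alignment - 1) & ~(alignment - 1);
        if (aligned > Capacity || size > Capacity - aligned) {
            return nullptr;
        }
        used_ = aligned + size;
        return buffer_ + aligned;
    }

    // Frees everything at once; individual deallocation is not supported.
    void reset() { used_ = 0; }

    std::size_t used() const { return used_; }

private:
    alignas(std::max_align_t) unsigned char buffer_[Capacity];
    std::size_t used_ = 0;
};
```

Declared as a global or static object, such an arena lives entirely in statically allocated memory. The trade-off is that memory is only reclaimed all at once, which is often acceptable on small embedded targets.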

Compared to many existing languages, this one will be more modern in several ways: source files will natively be encoded in UTF-8, and strings will support Unicode out of the box. There will be features to make using libraries and managing dependencies easier, a simpler build process (the average C or C++ project has a complicated build system that takes as much work to maintain as the code it compiles, which in my eyes is wasted time and effort on the programmer’s part), and built-in standards for code documentation and testing.
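
To show why Unicode support in the string type itself matters, here is a small sketch in present-day C++, where std::string only knows about bytes. The count_code_points helper is made up for this example and assumes its input is valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts Unicode code points in a string that is assumed to hold valid UTF-8.
// Every code point starts with exactly one byte that is NOT of the form
// 10xxxxxx (a continuation byte), so counting those bytes counts code points.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the word “héllo”, which takes six bytes in UTF-8, size() reports 6 while count_code_points reports 5. A language with Unicode-aware strings would not need a helper like this at all.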

In summary:

  • The language features object-oriented programming using a strongly typed system of variables. There is a strong emphasis on readable, reusable, easily maintained, well-documented, testable code.
  • Programs are platform- and operating-system-independent. Source code is compiled to machine code, so separate builds will be required for different platform/OS combinations, but writing code that’ll work for each version will be easy.
  • Things that should be supported, such as Unicode, work without any special effort on the programmer’s part. Doing things like rolling your own strings library just to get a usable set of features won’t be necessary.
  • Tools and build systems should be easy to use, even for complicated projects.

Coming Up Next…

In the next post in this series, I will be going into the basics of source files, project structures, and basic syntax elements.

