Programming Linguistics #4 — Back to Class

January 19, 2016 · Posted in Programming Linguistics 

In part 3 of this series, I talked about what namespaces and functions would look like in my hypothetical programming language. This part will begin covering composite types and run-time type information.

Composite types are an invaluable part of modern programming languages. They allow us to take a collection of related variables and treat them as a single unit, which is great, because in practice few things can be represented as a single numeric value.

In my language, a distinction would be made between two major variants of composite types: structures and classes. Even though the compiler would treat them in largely the same way, there is a semantic difference between the two. Classes are collections of variables (properties) and functions (methods) that support many advanced features like inheritance, overloading, visibility, and so on. Data structures, on the other hand, are merely collections of variables: they do not have methods, their properties are always public, and they don’t support inheritance or other more complex features.

Data Structures

Since data structures are simpler, we’ll start there. The basic syntax to define a data structure should look pretty familiar:

structure Foo
  is exact
{
    uint8 $aProperty;
    bool $anotherProperty;
    …
}

A structure is declared with the keyword structure, followed by its name and a list of variable declarations between braces. (Notice the is keyword, which will be covered shortly; its purpose and behavior are exactly the same as in classes.)

Once defined, structures can be instantiated like any other variable, and their members can be accessed using the period:

Foo $bar;
$bar.aProperty = 2;
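For comparison only, the closest C++ analogue is a plain aggregate struct. This is a sketch, not the hypothetical language itself: C++ already keeps members with the same access level in declaration order, which is roughly what is exact asks for, although padding may still be inserted between members. The helper makeFoo is a made-up name that mirrors the two lines above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Rough C++ counterpart of `structure Foo is exact { ... }`:
// public members only, no methods, no base classes.
struct Foo {
    std::uint8_t aProperty;
    bool anotherProperty;
};

// Members keep their declaration order in memory (padding may still be added).
static_assert(std::is_standard_layout<Foo>::value, "Foo should be standard layout");
static_assert(offsetof(Foo, aProperty) < offsetof(Foo, anotherProperty),
              "members are laid out in declaration order");

// Hypothetical helper: declare an instance and set a member through the period.
inline Foo makeFoo() {
    Foo bar{};          // value-initialize: aProperty == 0, anotherProperty == false
    bar.aProperty = 2;  // member access through the period
    return bar;
}
```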

When it comes to the syntax for simple data structures, there isn’t much more to say. Classes, however, are another matter entirely.

Classes

public class MyClass
  extends TheirClass,
  extends protected AnotherClass,
  uses SomeTrait,
  is singleton
{
    public uint8 $aProperty = 1;
    private bool $anotherProperty = false;

    constructor()
    {
        …
    }

    constructor( uint8 $argument )
    {
        …
    }

    destructor()
    {
        …
    }

    public uint8 getAProperty()
    {
        return $this.aProperty;
    }
}

Demonstrated above is the syntax for several important features.

  • Classes can be declared as public, protected, or private, which dictates their visibility within their module and to external modules. This works the same way as it does for functions. If the keyword is omitted, the class is assumed to be either public or protected, depending on the compiler settings.
  • Classes can inherit any number of other classes through the extends keyword. An optional visibility keyword (public by default) determines how accessible the inherited members are from the outside. When a parent class is inherited as protected, the inheriting class and its subclasses can access its properties and methods, but outside code cannot; when it is inherited as private, its properties and methods can be accessed only by the inheriting class itself.
  • Traits can be included using the uses keyword. Traits provide a mechanism for re-using small chunks of functionality from class to class. More on this will follow later.
  • The is keyword can be used to specify additional attributes that dictate how the class behaves. Examples of such attributes include singleton (which allows only one instance of the class to exist at any given time; new instantiations simply create a reference to the existing instance), aligned(…) (forces instances of the class to be created on specific memory boundaries), exact (stops the compiler from re-ordering the in-memory representation of properties), and final (other classes cannot inherit this one).
  • All class members can have a visibility keyword associated with them to dictate whether that member can be seen and accessed by all other code, only by the code of the class and its subclasses, or only by the code of the class itself.
  • Properties can have an initial value defined. When an instance of the class is created, the property will be automatically initialized to that value, removing the need to manually hand-code that initialization in the constructor.
  • Constructors and destructors are special functions that do not have return types, and get called automatically when an instance of the class is created or destroyed. A number of different versions of constructors can exist (with different combinations of arguments), but there can only be one version of the destructor (which never takes any arguments).
  • Methods can access other class members through $this, which is always a reference to the current instance of the class.
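As a point of comparison only, here is roughly how the MyClass example maps onto C++17. TheirClass and AnotherClass are hypothetical stand-ins with one method each; traits and is singleton have no direct C++ keyword and are left out, and the destroyedCount counter exists purely so the destructor call can be observed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical parent classes standing in for TheirClass and AnotherClass.
class TheirClass {
public:
    int theirValue() const { return 10; }
};

class AnotherClass {
public:
    int anotherValue() const { return 20; }
};

// `final` plays the role of `is final`. Inheriting AnotherClass as protected
// hides its public members from outside code, but not from MyClass itself.
class MyClass final : public TheirClass, protected AnotherClass {
public:
    MyClass() = default;                        // like constructor()
    explicit MyClass(std::uint8_t argument)     // like constructor( uint8 $argument )
        : aProperty(argument) {}
    ~MyClass() { ++destroyedCount; }            // one destructor, no arguments

    std::uint8_t getAProperty() const { return this->aProperty; }

    // Code inside the class may still use the protected base.
    int combined() const { return theirValue() + anotherValue(); }

    static inline int destroyedCount = 0;       // number of destructor calls so far

private:
    std::uint8_t aProperty = 1;                 // initial value, no constructor code needed
};
```

Outside code can call theirValue() on a MyClass, but not anotherValue(), which is the effect the protected inheritance above describes.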

Some other features of classes are:

  • Constants and data types (including other classes) can be defined as part of the class. This is mainly useful for structures and enumerations specific to that class.
  • Multiple versions of each function can be defined, and which one actually gets called depends on the combination of arguments passed to it.
  • Properties can have getter and setter functions defined as part of their declaration (am I the only one who is tired of hand-coding a billion getSomething()/setSomething() methods?). They are called automatically when accessing the property.
    
    public uint8 $aProperty = 1
        get():
        {
            return $this.aProperty + 5;
        }
        set( $value ):
        {
            if( $value > 100 ) throw SomeException();
        };
    

    Both getters and setters are optional. Within setters, there is no need to actually write the new value to the property (unless you wish to override it). As long as it’s not overridden within the setter code, and no exception is thrown, the new value is automatically written.

    Note that the type of the setter’s argument does not need to be specified; it is always identical to the type of the property.

    In some cases it may be necessary to specify different getters or setters depending on whether the property is being accessed from within the class, from a subclass, or from the outside. The regular visibility keywords can be used to accomplish this:

    public uint8 $aProperty = 1
        public set( $value ): { … }
        private set( $value ): { … };
    

    When no getter or setter is available for a given visibility level, access on that level will be denied entirely. This is useful to create properties that can be read publicly, but only written from within the class:

    public uint8 $aProperty = 1
        public get(): { return $this.aProperty; }
        private set( $value ): { … };
    

    The primary use for these getters and setters is to make sure that the values of properties stay within allowable ranges, so that the class is never left in an inconsistent state. However, as an extension, they can be used to create dummy properties: read-only properties that are computed on the fly and take up no memory (ideal for status flags and the like). Using these features, a semantic distinction can be created between accessing an object’s information and actually performing tasks on it.

  • Classes can overload most operators, as in C++.
    class MyClass
    {
        uint8 $value;
        uint8 operator + ( MyClass $rhs )
        {
            return $this.value + $rhs.value;
        }
    }

    Operator overloading introduces a risk of abuse by programmers, but if used correctly, can be extremely useful. Imagine writing a class that represents a complex number: by overloading arithmetic operators, the syntax to perform such operations on its objects becomes much more intuitive. Another example is overloading the array access operator […] in container classes.

  • Methods can be declared as virtual, allowing subclasses to override them (which is not allowed by default). One can also declare pure virtual methods, i.e. methods that have no implementation and rely on subclasses to provide one. Unlike in many other languages, this does not make it impossible to instantiate objects of that class; instead, calling a pure virtual method throws an exception.
  • Properties and methods can be declared as static.
    class MyClass
    {
        static uint8 $aProperty = 1;
        static bool calculateSomething() { … }
    }

    Static properties and methods act the same way that global variables and regular functions do; the only difference is that they are declared and accessed as if they were a part of the class.
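C++ has no built-in properties, so the nearest approximation is a wrapper object whose assignment operator runs a validation hook before storing the value. The ValidatedProperty template and the Gauge class below are hypothetical names used only for this sketch; the limit of 100 mirrors the setter example above, and isFull() plays the role of a computed dummy property that occupies no storage of its own. Unlike in the proposed language, reading the value here still goes through get() or an implicit conversion.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <utility>

// Hypothetical wrapper: assignment runs a validating "setter" before storing.
template <typename T>
class ValidatedProperty {
public:
    using Validator = std::function<void(const T&)>;

    ValidatedProperty(T initial, Validator validate)
        : value_(std::move(initial)), validate_(std::move(validate)) {}

    // Setter: the new value is written only if the validator does not throw.
    ValidatedProperty& operator=(const T& newValue) {
        validate_(newValue);
        value_ = newValue;
        return *this;
    }

    // Getter, usable explicitly or through implicit conversion.
    const T& get() const { return value_; }
    operator const T&() const { return get(); }

private:
    T value_;
    Validator validate_;
};

class Gauge {
public:
    // Mirrors `if( $value > 100 ) throw SomeException();` with initial value 1.
    ValidatedProperty<std::uint8_t> level{1, [](const std::uint8_t& v) {
        if (v > 100) throw std::out_of_range("level must be at most 100");
    }};

    // A computed, read-only "dummy" property: no storage of its own.
    bool isFull() const { return level.get() == 100; }
};
```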

Taken together, these features give classes both power and intuitive syntax, in their declaration as well as in their use.
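C++ handles pure virtual methods the opposite way: declaring one with = 0 makes the class abstract, and the compiler refuses to instantiate it. The closest sketch of the behaviour proposed for pure virtual methods is a virtual method whose default body throws. Shape, Square and PureVirtualCall are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical exception type standing in for the language's own.
struct PureVirtualCall : std::logic_error {
    using std::logic_error::logic_error;
};

// Instead of `virtual double area() const = 0;` (which would make Shape
// abstract), the "pure" method gets a body that throws when called.
class Shape {
public:
    virtual ~Shape() = default;
    virtual double area() const {
        throw PureVirtualCall("Shape::area has no implementation");
    }
};

class Square : public Shape {
public:
    explicit Square(double side) : side_(side) {}
    double area() const override { return side_ * side_; }

private:
    double side_;
};
```

The trade-off is that a missing override is reported at run time rather than at compile time.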

But wait! There is more…

Run-Time Type Information

Run-time type information is the ability of the program to determine the types of variables, as well as retrieve information about those variables, when the program is running. When dealing with complex class hierarchies, this can become very important, as objects may need to be cast and re-interpreted between different types.

Executables incorporate tables of data about the data types they use. Normally, each class (or other data type) will have an entry in this table. Types are identified by a type identifier: a 32-bit hash value, computed in a standardized manner (to ensure compatibility between code from different compilers). The data stored about each type includes what kind of type it is (a class, enumeration, and so on), how much memory objects of that type occupy, which classes it is derived from, the relative memory offsets of its properties, and tables of function pointers that help resolve which versions of functions to call in particular situations.

Whenever information about the object is needed, such as when casting an object to one of its inherited classes, these tables of information are accessed to find the needed data.
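The post does not specify which hashing scheme would be standardized, so the sketch below simply assumes 32-bit FNV-1a over the type's name, together with a hypothetical TypeInfo table entry and a derivesFrom lookup of the kind a checked cast to a parent class would need. Real compilers lay out their type tables very differently; this only illustrates the idea.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <unordered_map>
#include <vector>

using TypeId = std::uint32_t;

// 32-bit FNV-1a over the type name (assumed scheme; any fixed,
// documented hash would give every compiler the same identifiers).
constexpr TypeId typeIdOf(std::string_view name) {
    std::uint32_t hash = 2166136261u;          // FNV offset basis
    for (char c : name) {
        hash ^= static_cast<std::uint8_t>(c);
        hash *= 16777619u;                      // FNV prime
    }
    return hash;
}

enum class TypeKind { Primitive, Class, Enumeration };

// One row of a hypothetical type table.
struct TypeInfo {
    TypeKind kind;
    std::size_t size;              // bytes occupied by one object
    std::vector<TypeId> parents;   // direct parent classes
};

using TypeTable = std::unordered_map<TypeId, TypeInfo>;

// True if `from` is `to` or derives from it, directly or indirectly.
// A cast to a parent class would perform a lookup like this one.
bool derivesFrom(const TypeTable& table, TypeId from, TypeId to) {
    if (from == to) return true;
    auto it = table.find(from);
    if (it == table.end()) return false;
    for (TypeId parent : it->second.parents) {
        if (derivesFrom(table, parent, to)) return true;
    }
    return false;
}
```

A real standard would also have to say what happens when two type names hash to the same value; the sketch ignores collisions.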

Having this type identifier also provides additional benefits. Because type identifier values exist even for the most basic data types, functions that take variable argument lists can reliably determine the types of the arguments passed to them, and act accordingly. This is impossible in languages like C, which is why the well-known printf() function needs its long list of format specifiers to be told the type of each argument. Having this information greatly reduces the potential for bugs.

To continue the printf() example: simple data types like ints and floats may not normally have a type identifier stored along with them, but because their type is known at compile time, the compiler can still pass the identifier down along with the value.
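A rough C++17 sketch of this printf() argument: a variadic template sees the static type of every argument, so it can attach a tag at the call site, and the non-template callee then formats each value without any format specifiers. The small Tag enum stands in for the 32-bit type identifiers, only three argument types are handled, and TaggedArg, formatTagged and formatArgs are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <string>

// Stand-in for the type identifiers; a real scheme would use 32-bit hashes.
enum class Tag : std::uint8_t { Int, Double, CString };

// An argument as the callee sees it: a type tag plus the value.
struct TaggedArg {
    Tag tag;
    long long i = 0;
    double d = 0.0;
    const char* s = nullptr;

    TaggedArg(int v) : tag(Tag::Int), i(v) {}
    TaggedArg(double v) : tag(Tag::Double), d(v) {}
    TaggedArg(const char* v) : tag(Tag::CString), s(v) {}
};

// The callee needs no format string: every argument says what it is.
std::string formatTagged(std::initializer_list<TaggedArg> args) {
    std::string out;
    for (const TaggedArg& a : args) {
        if (!out.empty()) out += ' ';
        switch (a.tag) {
            case Tag::Int:     out += std::to_string(a.i); break;
            case Tag::Double:  out += std::to_string(a.d); break;
            case Tag::CString: out += a.s; break;
        }
    }
    return out;
}

// Variadic front end: the tag for each argument is chosen at compile time.
template <typename... Args>
std::string formatArgs(Args... args) {
    return formatTagged({TaggedArg(args)...});
}
```

For example, formatArgs(42, "apples", 2.5) produces "42 apples 2.500000", since std::to_string formats a double like %f.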

In addition to being useful for functions with variable argument lists, this mechanism also enables functions to accept single arguments of an unspecified type. This avoids the need to write several overloaded functions to deal with integer or floating-point arguments, for instance. (More on this will follow in the next post.)
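For the single-argument case, standard C++ offers std::variant. It is a different mechanism from the type identifiers described here (the set of allowed types is fixed at compile time, and the variant's index acts as the discriminator), but it shows the same benefit: one function body handles both integer and floating-point values. Number and twice are names invented for this sketch.

```cpp
#include <cassert>
#include <variant>

// A "number of unspecified type"; the variant records which one it holds.
using Number = std::variant<int, double>;

// One function instead of separate int and double overloads. std::visit
// instantiates the generic lambda once per alternative, and the result
// keeps the type of the argument.
inline Number twice(const Number& n) {
    return std::visit([](auto value) -> Number { return value * 2; }, n);
}
```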

The beauty of it all is that, although these features may carry some overhead when they are used, they introduce no additional cost when they are not, and they can potentially be omitted entirely in low-level environments.

Coming Up Next…

In the next post in this series, we’ll take a look at some special data types and strings.

A Soul Waking