Programming Linguistics #6 — Pointing Fingers

February 2, 2016 · Posted in Programming Linguistics 

Part #5 of this series dove deeper into run-time type information with the type-of operator, and enabled us to have enumerated lists, as well as sequences of characters. Today, we’ll round out the type system by taking a look at unions, pointers, and references.

In addition to structures and classes, we have a third category of composite data types: unions. Unions allow us to access the same memory space in several different ways.

Another feature that we have not yet explored is pointers and references. Through pointers and references, we can access objects by knowing where they are located, rather than needing to have the entire object on hand.

First, however…

Unions

In some cases, programmers need the ability to access data that is in the same memory location in different ways. The need to do this isn’t very common in a PC environment, but when working on an embedded platform, it often comes up. Having the ability to access the same memory location in multiple ways is often a great help when dealing with things like special function registers.

Obviously, there is a large potential for unintended consequences here. When using unions, one must take great care to make sure that the data within the union lines up the way it’s supposed to, and that no errors result from misinterpreting the data. This makes unions more ‘risky’ to use, but they can be a useful tool nevertheless.

The syntax to define a union is similar to that of a structure. However, each entry within the union defines a data type (or a set of data types), and all entries overlap the same memory space. It is not necessary for the entries to be the same length. Entries can be named; each name may only occur once within the union.

Let’s look at a basic example:

union MyUnion
{
    uint32 unsigned;
    int32 signed;
}
MyUnion $foo;

When an instance of this union is created, it can be accessed in one of two ways: $foo.unsigned lets you use the integer as if it were unsigned, and $foo.signed lets you access it as if it were signed.
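For comparison, here is a rough C++ sketch of the same idea. It is only an analogy, not the syntax of the language described in this series. The member names as_unsigned and as_signed and the helper reinterpret_as_signed are made up for this illustration (signed and unsigned are reserved words in C++). Note also that ISO C++ formally only guarantees reading the member that was written last; reading the other member relies on behaviour that compilers such as GCC document as supported.

```cpp
#include <cassert>
#include <cstdint>

// C++ analogue of MyUnion: both members occupy the same four bytes.
union MyUnion {
    std::uint32_t as_unsigned;  // view the bits as an unsigned integer
    std::int32_t  as_signed;    // view the same bits as a signed integer
};

// Hypothetical helper: store a bit pattern through the unsigned view and
// read it back through the signed view.
// Assumption: the compiler supports reading the member that was not written
// last (GCC documents this); ISO C++ itself does not guarantee it.
inline std::int32_t reinterpret_as_signed(std::uint32_t bits) {
    MyUnion foo;
    foo.as_unsigned = bits;
    return foo.as_signed;
}

inline void union_example() {
    // Small positive values look the same through both views.
    assert(reinterpret_as_signed(42u) == 42);
    // The union is no larger than its largest member.
    assert(sizeof(MyUnion) == sizeof(std::uint32_t));
}
```

The point is only that both members share one storage location, so writing through one view changes what the other view reads.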

Now, for a more complex example: imagine that we’re programming for a Microchip PIC18F series microcontroller, and we need a convenient way to access its INTCON register either as a complete 8-bit value or one bit at a time. Here’s an example of how one would accomplish that:

union INTCONRegister
{
    uint8;
    set
    {
        unsigned<1> GIE;
        unsigned<1> PEIE;
        unsigned<1> TMR0IE;
        unsigned<1> INT0IE;
        unsigned<1> RBIE;
        unsigned<1> TMR0IF;
        unsigned<1> INT0IF;
        unsigned<1> RBIF;
    };
}

INTCONRegister INTCON
  is at(0xFF2)
  is volatile;

After these definitions, it would be possible to access the value of the entire register as $INTCON, or access the value of a particular bit within the register through $INTCON.RBIE.

Note that the above example introduces a couple of other new syntax elements as well:

  • The first entry, uint8; does not have a label associated with it. When not explicitly accessing a different element of the union (i.e. accessing it as $INTCON), the first unlabeled element is used. Further unlabeled elements are not accessible.
  • The individual bits are declared as unsigned<1> types, which are our first look at a built-in template type. This is one of three keywords that let you declare data types of arbitrary size on-the-fly. unsigned<x> is an unsigned integer that is x bits wide, signed<x> is a signed integer that is x bits wide, and void<x> defines x bits of unused space. The void<x> type can only be used inside structures, classes, or unions, and never has a label; it is used simply for spacing and padding. signed<x> types must always be at least two bits wide; because one bit of a signed type is used for the sign, signed<1> would not leave any room to store an actual value.
    Note: the signed<x> and unsigned<x> keywords cannot be used to define types that are wider than what the platform natively supports.
  • The set keyword groups together items of a union. Sets may optionally be named.
  • Variables can use the is keyword to define additional attributes in the same way that classes can. In this example, the at attribute specifies a fixed memory location for the variable, and volatile tells the compiler that the value of the variable can change at any time and that it should not try to optimize reads from it.
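To make the register example more concrete, here is a hedged C++ approximation using bit-fields. Several things in it are assumptions rather than part of the design above: the set is given the name bits (standard C++ has no anonymous structs), the variable lives in ordinary memory instead of at address 0xFF2, and the helpers enable_rb_interrupt and rb_interrupt_enabled are invented for illustration. The order in which C++ compilers lay out bit-fields is implementation-defined, so this sketch makes no claim about which bit ends up where.

```cpp
#include <cassert>
#include <cstdint>

// C++ approximation of INTCONRegister.
union INTCONRegister {
    std::uint8_t value;            // the whole register as one 8-bit value
    struct {
        std::uint8_t GIE    : 1;
        std::uint8_t PEIE   : 1;
        std::uint8_t TMR0IE : 1;
        std::uint8_t INT0IE : 1;
        std::uint8_t RBIE   : 1;
        std::uint8_t TMR0IF : 1;
        std::uint8_t INT0IF : 1;
        std::uint8_t RBIF   : 1;
    } bits;                        // the individual bits (bit order is compiler-specific)
};

// Assumption: the compiler packs the eight one-bit fields into a single byte.
static_assert(sizeof(INTCONRegister) == 1, "bit-fields were not packed into one byte");

// Stand-in for the real register. On actual hardware the toolchain would pin
// this to a fixed address; here it is an ordinary volatile global.
volatile INTCONRegister INTCON = {0};

// Hypothetical helpers that touch a single bit of the register.
inline void enable_rb_interrupt()  { INTCON.bits.RBIE = 1; }
inline bool rb_interrupt_enabled() { return INTCON.bits.RBIE != 0; }
```

As with any C++ union, reading a member other than the one written last depends on compiler support, so treat this as a sketch of the concept rather than portable driver code.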

Pointers and References

For any language with low-level capabilities, pointers and references are two important pseudo-types. They store the memory address at which some variable is actually located. Through pointers, certain things can be optimized (e.g. passing only the location of an object in a function call instead of the entire object), but they are also what make dynamic memory management possible.

There is a little bit of a difference between how pointers and references operate. To explain that properly, however, we first need to look at how pointers work.

Pointers

A pointer is a data type that is defined using the name of an already existing type followed immediately by an asterisk:

SomeClass* $foo;

Pointers always take up a fixed amount of memory (how much depends on the platform), and are automatically initialized to point to memory address 0. Since no actual object would ever be created there, pointers to that location are normally considered invalid, and attempting to dereference one results in an exception. (The compiler will automatically determine where null pointer checks are appropriate and insert them, so that no attempt is made to dereference a null pointer.)
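C++ does not insert such checks by itself, but the effect of a compiler-inserted null check can be sketched by hand. In the snippet below, checked_deref and deref_throws are hypothetical helpers written for this illustration; they are not part of any library, and the SomeClass definition is a placeholder.

```cpp
#include <cassert>
#include <stdexcept>

struct SomeClass {
    int value = 0;
};

// Hypothetical stand-in for the check the compiler would insert:
// dereferencing a null pointer raises an exception instead of crashing.
template <typename T>
T& checked_deref(T* pointer) {
    if (pointer == nullptr) {
        throw std::runtime_error("attempt to dereference a null pointer");
    }
    return *pointer;
}

// Returns true if dereferencing `pointer` is rejected by the check.
inline bool deref_throws(SomeClass* pointer) {
    try {
        (void)checked_deref(pointer);
        return false;
    } catch (const std::runtime_error&) {
        return true;
    }
}

inline void null_check_example() {
    SomeClass* foo = nullptr;      // explicitly null, mirroring the automatic initialization
    assert(deref_throws(foo));     // the check catches the null pointer

    SomeClass bar;
    foo = &bar;                    // now points at a real object
    assert(!deref_throws(foo));
}
```

A real compiler could skip the check wherever it can prove the pointer is non-null, which is presumably what "determine where null pointer checks are appropriate" refers to.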

We can set the memory address that a pointer is pointing to by assigning it:

$foo = 0xABCD;

However, chances are that you want to initialize the pointer to the location of an existing object.

$foo = @bar;

Remember how the percent sign can be used to access information about a variable’s type? Similarly, the ‘address-of’ operator @ can be used to retrieve the memory location of a variable (typed as void*). In order for this to work, $bar must be an object of type SomeClass; if it were anything else, the compiler would yell at you because the types are incompatible. The one exception is void*, which is implicitly compatible with any other pointer type; this detail makes malloc()-style functions simpler to use.

Trying to read $foo at this point will simply yield the memory address that was assigned to it. However, you might want to access the object that the pointer is pointing to instead. You do this by prepending an asterisk, like so:

*$foo.someMethod();
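For readers who know C++, the address-of and dereference steps above map fairly directly onto & and *. The sketch below assumes a trivial SomeClass with a call counter, invented here so that the effect of the calls can be observed; call_through_pointer is likewise a made-up name.

```cpp
#include <cassert>

// Placeholder class; the `calls` counter exists only to make the effect visible.
struct SomeClass {
    int calls = 0;
    void someMethod() { ++calls; }
};

// Takes the address of an object, then calls a method through the pointer.
inline int call_through_pointer() {
    SomeClass bar;
    SomeClass* foo = &bar;   // roughly `$foo = @bar;`
    assert(foo != nullptr);  // the pointer now refers to a real object

    (*foo).someMethod();     // roughly `*$foo.someMethod();` (C++ needs the parentheses)
    foo->someMethod();       // the usual C++ shorthand for the line above

    return bar.calls;        // both calls reached bar itself
}
```

Both calls modify bar, because foo holds only the location of bar and not a copy of it.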

The true power of pointers comes into play when dealing with arrays or other cases where it becomes necessary to compute locations of objects in memory. More on this will follow later.

References

Pointers are a relatively tricky thing to deal with. Care must be taken to make sure that the address that is pointed to doesn’t become invalid. In the above example, what if $bar gets deallocated but you’re still trying to use *$foo? The result would probably not be what you expected it to be.

References provide a somewhat more robust method of indirectly accessing objects. A reference is declared similarly to a pointer, but uses the ampersand instead:

SomeClass& $foo = $bar;

References aren’t required to be initialized when they are created, but it is good practice. Trying to use an uninitialized reference will result in an exception being thrown (the compiler automatically figures out at what points this check needs to be made). A reference may be initialized with another reference of the same type, but will in that case simply copy the first reference, rather than becoming a reference to a reference. To access the referenced object, there is no need to use the asterisk:

$foo.someMethod();

One can make the reference point to a different object by using the @ operator:

$foo = @baz;

Whereas the memory location pointed to by a pointer can be easily changed, references can only be assigned in this manner, making it much more difficult to inadvertently make them point to an invalid object.
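Native C++ references cannot be re-pointed after initialization, so they are not an exact match for the references described here. The closest standard tool is std::reference_wrapper, which can only ever be bound to an existing object and can later be rebound to another one. The sketch below uses it purely as an analogy; SomeClass and reseat_example are placeholders made up for the illustration.

```cpp
#include <cassert>
#include <functional>

// Placeholder class; the `calls` counter exists only to make the effect visible.
struct SomeClass {
    int calls = 0;
    void someMethod() { ++calls; }
};

// Binds a rebindable reference to one object, then to another.
inline bool reseat_example() {
    SomeClass bar;
    SomeClass baz;

    std::reference_wrapper<SomeClass> foo(bar);  // roughly `SomeClass& $foo = $bar;`
    assert(&foo.get() == &bar);
    foo.get().someMethod();                      // reaches bar

    foo = std::ref(baz);                         // roughly `$foo = @baz;`
    assert(&foo.get() == &baz);
    foo.get().someMethod();                      // reaches baz

    return bar.calls == 1 && baz.calls == 1;
}
```

Unlike a raw pointer, the wrapper cannot be assigned an arbitrary number such as 0xABCD; it can only be bound to an object that exists at the time of the assignment.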

Using references can trigger some other automatic features that pointers do not offer:

  • Classes can inherit the built-in ReferenceCounted trait. When they do, a reference counter gets added to the class. Whenever a reference to an object of that class is created, the reference counter is automatically incremented by one; when the reference is changed to point to a different object, or the reference is destroyed for some other reason, the counter is decremented. This can be used to ensure that an object is no longer in use before destroying it, and can potentially be used by garbage collectors (even though those are not a part of the language by default).
  • In addition, classes can define the strongly_referenced attribute. Doing so surrounds instances of the class in memory by special marker values, adding a few bytes to the size of the object. Whenever a reference is initialized, a check is made automatically to ensure that the object the reference is being made to is still valid; if it isn’t, an exception is thrown. When the object is deallocated, the marker values are overwritten to ensure that deallocated space isn’t mistaken for a still-valid object.
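The two features above have loose counterparts in the C++ standard library, which may help to picture them. std::shared_ptr keeps a reference count per object, and std::weak_ptr can report whether the object it observes still exists. This is an analogy only: shared_ptr keeps its counter in a separate control block rather than in the class, and weak_ptr does not use marker values in memory. The function names below are invented for the illustration.

```cpp
#include <cassert>
#include <memory>

struct SomeClass {
    int value = 0;
};

// The count goes up when a second owner is created.
inline long count_after_copy() {
    auto first = std::make_shared<SomeClass>();
    auto second = first;               // a second reference to the same object
    assert(first.get() == second.get());
    return first.use_count();
}

// The count goes back down when an owner is destroyed.
inline long count_after_release() {
    auto first = std::make_shared<SomeClass>();
    {
        auto second = first;           // count is temporarily higher in this scope
    }                                  // `second` is destroyed here
    return first.use_count();
}

// Once the last owner is gone, an observer can tell the object no longer exists.
inline bool expired_after_last_release() {
    std::weak_ptr<SomeClass> observer;
    {
        auto owner = std::make_shared<SomeClass>();
        observer = owner;              // observing does not keep the object alive
    }                                  // the object is destroyed here
    return observer.expired();
}
```

In single-threaded code like this, use_count reports exactly how many shared_ptr instances currently own the object.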

These additional measures greatly reduce the risk of accidentally trying to reference a variable that no longer exists, or of interpreting other data as if it were some object.

Overall, whenever possible, it is recommended to use references over pointers.

Coming Up Next…

In the next part of Programming Linguistics, we’ll take a break from data types by looking at how we can define and use macros, test or alter compiler settings, deal with project files, and how to take full advantage of what Unicode brings to the table.
