Programming Linguistics #3 — Functional Functionality

December 15, 2015 · Posted in Programming Linguistics 

Part two of this series covered the compilation process and modules, and established some of the basic data types and operators that our little programming language would have. In part 3, we’re looking at namespaces and functions.

As I’ve stated in part 1, my hypothetical language would be object-oriented. To me, it only seems natural that this is a thing, because any kind of usable data doesn’t limit itself to one or two variables. Instead, it consists of many variables grouped together into logical structures. The first thing that people try to do in a purely functional language is figure out a way to represent their data structures, and what you end up with is typically something similar to an object-oriented approach, but a lot less pretty.

Having said that, I don’t believe in overuse of classes and objects. Whenever I look at other people’s code, I see far too many classes that exist purely as a container for (static) functions, and have no data members at all. The way I see it, an object should be a collection of related information (properties), along with a set of logical operations that can be performed on that information (methods), that together represent a ‘thing’. ‘Thing’ is a deliberately nonspecific term, because it can be anything: something tangible like an employee, student, or car (the subject of many programming examples), a more abstract concept like a shape or a file, a ‘thing’ that provides some kind of service (like a printer driver), or something that facilitates other functions (like a mutex or memory page). Which of these it is doesn’t really matter. As long as it has information associated with it, and tasks can be performed on it, having it be an object is appropriate. Stateless functions do not belong in that category, and (in my opinion) belong as regular functions in whichever namespace is appropriate instead.

Another thing that I’m against is overzealous attempts at code re-use. Being able to re-use your code is great, but if you spend so much time and so many resources writing complex layers of abstraction that the original ‘thing’ has been reduced to unrecognizable spaghetti, you’re really overdoing it. But alas, that is a subject for another day.

To get back on the topic of this series: before delving into classes, we need to have their building blocks sorted out. We covered basic data types in the previous post, so we’re going to explore functions as well as their containers — namespaces.

Namespaces

Namespaces serve as containers for data types, functions, classes, global variables, and other namespaces. They are important, because they greatly reduce the risk of naming collisions, especially when trying to integrate third-party code into your project.

The top-level namespace, in which you work by default, does not have a name. While I would consider it acceptable to write an application within the top-level namespace, referencing objects from other namespaces as needed, building any kind of re-usable library in the top-level namespace is not, since it’s very easy to create clashes (I am looking at you, Windows API).

Other than the top-level namespace, namespaces are identified by a name — an identifier that follows the same basic rules as the rules for variables, minus the dollar sign:

  • Identifiers can only consist of the basic Latin letters (so A through Z), Arabic numerals (0 through 9), and underscores.
  • Identifiers are case sensitive (foo is not the same as Foo), but the compiler will usually warn you if you declare two such items, since they’d be prone to confusion and are therefore usually considered bad practice.
  • Names cannot be identical to any of the reserved keywords, because that could create ambiguity and confuse the compiler.
  • I haven’t yet decided on whether an underscore can be the first character of an identifier. Tons of code that I’ve seen makes use of leading underscores to mark things like private functions, but there’ll be other mechanisms to handle those kinds of things, so in my eyes they’re unnecessary, ugly, and confusing.
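As a rough sketch of how these rules could be checked, here is a small model in Python (the language itself doesn’t exist yet, so this is not its compiler). The keyword set only contains keywords mentioned in this post and is otherwise an assumption, as is the rule that an identifier may not start with a digit. The undecided leading-underscore question is left as a switch.

```python
import re

# Assumed keyword set: only keywords that appear in this post.
RESERVED = {"namespace", "using", "return", "public", "protected", "private", "void"}


def is_valid_identifier(name, allow_leading_underscore=True):
    """Check a name against the identifier rules listed above."""
    # Only basic Latin letters, Arabic numerals, and underscores.
    if not re.fullmatch(r"[A-Za-z0-9_]+", name):
        return False
    # Assumption (not stated in the post): no leading digit.
    if name[0].isdigit():
        return False
    # The leading-underscore rule is undecided, so it is a parameter here.
    if name[0] == "_" and not allow_leading_underscore:
        return False
    return name not in RESERVED


def case_clashes(names):
    """Group names that differ only by case, i.e. candidates for a warning."""
    groups = {}
    for n in names:
        groups.setdefault(n.lower(), set()).add(n)
    return [g for g in groups.values() if len(g) > 1]
```

Note that a case clash is reported separately rather than rejected, matching the idea that it is a warning and not an error.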

Whenever you are not inside some kind of declaration (class, function, …), it’s legal to change the current namespace using a simple statement: namespace foo; Every declaration after that is considered to be inside the foo namespace (explicitly declaring something to be inside a particular namespace is also legal). Namespaces can be nested to any depth, using a double colon to separate them: namespace foo::bar; would be an example. The namespace statement is always relative to the top-level namespace, and if desired, it’s possible to switch back to the top level by not providing a name: namespace ; (though, most commonly, there will be a single namespace statement near the top of the source file, without further switching later on).

The namespace statement only affects whichever file it is in (it does not automatically propagate to included files).
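To make the effect of the namespace statement concrete, here is a minimal Python model. It assumes a heavily simplified input where every line is either a namespace statement or the bare name of a declaration; qualify_declarations is a hypothetical helper name, not part of the language. Each call handles one file and starts at the top level, which mirrors the per-file scope.

```python
def qualify_declarations(lines):
    """Return the fully qualified names of the declarations in one source file."""
    current = []      # [] stands for the unnamed top-level namespace
    qualified = []
    for line in lines:
        text = line.strip().rstrip(";").strip()
        if text == "namespace" or text.startswith("namespace "):
            path = text[len("namespace"):].strip()
            # Always relative to the top level; an empty path switches back to it.
            current = path.split("::") if path else []
        elif text:
            # Simplification: any other line is the name of a declaration.
            qualified.append("::".join(current + [text]))
    return qualified
```

For example, the lines a, namespace foo;, b, namespace foo::bar;, c, namespace ;, d yield a, foo::b, foo::bar::c, and d.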

Whenever a variable, class, or function is being referenced, it’s assumed to be inside the current namespace, unless otherwise specified (a call to some_func() will look for that function in the current namespace unless it’s being called as foo::some_func()). If you’re using a lot of things from a different namespace, the previously discussed using statement can be used to ‘copy’ everything from an external namespace into the current one: using namespace foo; However, when both the local and imported namespaces contain things with the same identifiers, those defined in the local namespace take precedence, followed by each of the imported namespaces, in the order in which they were specified.
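The lookup order can be sketched as follows. This is a Python model of the rule and not an actual implementation; the resolve function and the dictionary layout are assumptions made for illustration.

```python
def resolve(name, namespaces, current, imported):
    """Return the namespace that supplies `name`, or None if nothing matches.

    namespaces: dict mapping a namespace name to the set of identifiers in it
    current:    name of the current namespace
    imported:   namespaces named in `using namespace` statements, in order
    """
    if "::" in name:
        # Explicitly qualified: look only where the caller pointed.
        ns, _, ident = name.rpartition("::")
        return ns if ident in namespaces.get(ns, set()) else None
    # Unqualified: the local namespace first, then each import in order.
    for ns in [current, *imported]:
        if name in namespaces.get(ns, set()):
            return ns
    return None
```

With some_func declared in both the current namespace and foo, an unqualified call picks the local one, while foo::some_func still reaches the imported version.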

It’s also possible to alias long names to something shorter or easier to use: using namespace foo::bar::baz as qux; will allow you to access anything inside foo::bar::baz by writing qux::… instead. Collisions created this way (if the aliased name is equal to the name of an existing namespace) are handled in the same way as normal: things declared in the ‘original’ namespace take precedence over those declared in the aliased one.
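A small Python model of the alias collision rule, under one reading of it: if a real namespace named qux already exists, its members win, and the alias target is only consulted for names the real namespace does not declare. The function name and data layout are hypothetical.

```python
def resolve_qualified(prefix, ident, namespaces, aliases):
    """Resolve prefix::ident, where prefix may be a real namespace, an alias, or both."""
    # A real namespace with this name takes precedence over the alias target.
    if ident in namespaces.get(prefix, set()):
        return prefix
    target = aliases.get(prefix)
    if target is not None and ident in namespaces.get(target, set()):
        return target
    return None
```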

Functions

Functions behave and are declared in ways that are familiar to many programmers. Functions have a name (which has to follow the same rules as the rules for namespaces), they can take zero or more arguments (of a defined type), and they can optionally return values to whoever called the function.

The basic syntax to define a function looks like this:

return-type function-name( argument1, argument2, … ) { … body … }

For example:

uint32 foo( char bar, float baz ) { … }

When necessary, functions can take a variable number of arguments, using three dots (or an actual ellipsis, Unicode U+2026), in which case special functions are available that can be called inside the function to access the arguments that were actually passed to it. (Unlike in C, however, a mechanism will be available to accurately detect the types of the arguments that were given; more on this in the next post.)

uint32 foo( … ) { … }
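The mechanism for inspecting variable arguments is left for the next post, so the following is only an analogue in Python, which happens to offer both pieces: a way to receive any number of arguments and a way to ask each one for its type at run time. It is not a proposal for how the language would do it.

```python
def describe_args(*args):
    """Accept any number of arguments and report each one's run-time type."""
    return [(type(a).__name__, a) for a in args]
```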

So far, this is nothing spectacularly groundbreaking.

There are, however, two significant departures from what is ‘standard’ in most other languages.

Function Visibility

Just like class members, functions can be declared to be public, protected, or private. If a function isn’t explicitly declared as one of these, it will be considered either public or protected, depending on the compiler settings used.

  • public functions can be called from anywhere in the module, and if the module is used as a library, other code linking against it can see and use these functions.
  • protected functions can be used anywhere in the same module, but not by other modules (if the module is a library, the address of the function won’t be made available to other programs linking against it and it is likely hidden from public documentation as well).
  • private functions can only be used within the same source file. This is mostly useful for ‘helper’ functions that are only needed for the routines defined in that source file, but aren’t necessarily useful outside of it. This behavior is implied with anonymous functions.

This mechanism helps to keep namespaces of libraries clean and enforce correct usage of code, avoiding bugs.
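The three levels boil down to a simple check on where the caller lives. Here is a minimal Python sketch of that check; the Function record and can_call are names made up for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Function:
    name: str
    visibility: str    # "public", "protected", or "private"
    module: str        # module the function is compiled into
    source_file: str   # file the function is defined in


def can_call(fn, caller_module, caller_file):
    """Decide whether code in caller_module/caller_file may call fn."""
    if fn.visibility == "public":
        return True                                   # visible everywhere
    if fn.visibility == "protected":
        return caller_module == fn.module             # same module only
    if fn.visibility == "private":
        # Same source file only (which implies the same module).
        return caller_module == fn.module and caller_file == fn.source_file
    raise ValueError(f"unknown visibility: {fn.visibility}")
```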

The public, protected, or private keyword goes before the definition of the return type:

public uint32 foo( char bar, float baz ) { … }

Multiple Return Values

Traditionally, functions return either one value, or none at all (examples of languages supporting multiple return values exist, such as Lua, but they are few and far between). The ability to return more than one value is useful: there are many, many instances of functions that have to resort to modifying variables outside of their own scope through confusing pointer arguments because they are limited to only one return value. The only obvious way around this limitation is to return a structure, but this adds a fairly large amount of unnecessary boilerplate code, making it cumbersome to use.

So, instead of being limited to having only one return value, multiple return values can be declared, using commas to separate them:

return-type-1, return-type-2, return-type-3 function-name( argument1, argument2, … )

Example:

public uint32, char foo( char bar, float baz )

Aside from memory usage, there is no defined limit on how many values may be returned.

When writing the code for such a function, the return statement works in much the same way:

return 16, 'Q';

The function must specify a value for each of its return values, but in code calling the function, return values may be omitted (from right to left) or assigned to void if the caller is not interested in a particular return value:

uint32 bar;
char baz;

bar, baz = foo();
bar = foo();
void, baz = foo();
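For comparison, the same three calls can be approximated in Python, which returns multiple values as a tuple. One difference is worth noting: in Python, a plain bar = foo() would bind the whole tuple, so dropping trailing values has to be spelled out. The demo function is only there to collect the results.

```python
def foo():
    return 16, "Q"          # mirrors: return 16, 'Q';


def demo():
    bar, baz = foo()        # take both values
    first, *_ = foo()       # keep the first value, drop the rest explicitly
    _, only_baz = foo()     # ignore the first value, like assigning it to void
    return bar, baz, first, only_baz
```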

Function overloading is possible based on both the number and types of arguments, as well as the number and types of return values, by declaring multiple functions with the same name but different combinations of arguments and return values (the names given to arguments are irrelevant in this, only their number and types).

The main reason why you would want to create two versions of a function, where one has additional return values and the other one does not, is to avoid having to perform costly calculations to obtain these values, so the main use is optimizing code.
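A rough Python model of this kind of overload selection: implementations are registered under their argument types plus the number of return values, and the call site states how many values it wants. The Overloads class, the example functions, and the simplification of matching on the return count only (not the return types) are all assumptions for illustration.

```python
class Overloads:
    """Pick an implementation by argument types and number of wanted return values."""

    def __init__(self):
        self._table = {}

    def register(self, arg_types, return_count, fn):
        self._table[(tuple(arg_types), return_count)] = fn

    def call(self, args, wanted_returns):
        key = (tuple(type(a) for a in args), wanted_returns)
        if key not in self._table:
            raise TypeError(f"no overload for {key}")
        return self._table[key](*args)


def total_only(values):
    return (sum(values),)


def total_and_middle(values):
    ordered = sorted(values)    # the extra cost only this overload pays
    return sum(values), ordered[len(ordered) // 2]


summary = Overloads()
summary.register([list], 1, total_only)
summary.register([list], 2, total_and_middle)
```

A caller that only needs the total never pays for the sort, which is exactly the optimization described above.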

Calling Conventions

The addition of multiple return values necessitates changes to functions’ calling conventions, and because of that, functions using this feature will by nature be incompatible with functions written in a different language. Though there will at least be a way to specify which calling convention a function is to use (mainly for compatibility with existing libraries), the default will either be language-specific for all functions, or language-specific only for the functions that require it, with existing calling conventions used for the rest.

Calling conventions are generally at least somewhat platform-dependent, due to differences in what registers are or aren’t available on a particular CPU architecture. However, within the same platform, and possibly excluding interfaces with code written in other languages, programs written in this language should use the same calling conventions to maximize compatibility.

Coming Up Next…

The next post will go a little more in depth into the type system, including composite types and run-time type information.
