Dromedary and a half - Natural languages in programming languages

Programming languages are constructed languages: they are designed for humans to communicate ideas, protocols, algorithms, etc. to computers. However, most programming languages include, in specific places, elements of natural languages. Comments are parts of a program that are ignored by a computer and used by programmers to explain their code to other programmers; they are reserved places for arbitrary content, often natural language, to be included. Keywords are elements of a programming language that use words from natural languages: “function”, “while”, “private”, “class”, “module”, etc.

More interestingly, identifiers are often arbitrary strings of alphanumeric characters. It is considered good practice to use meaningful words or groups of words as identifiers. This makes the code self-documenting in that it attaches meaning to elements of a program.

Cases and position

Some natural languages write different grammatical cases differently. German, Russian, Latin and many other languages have this feature. English, on the other hand, relies mostly on position to distinguish the roles of different words. Only a few case distinctions are left, e.g., the difference between “I”, “me”, and “my”.

Additionally, English verbs and nouns are often identical: “fish”, “search”, “return”, “ride”, “plant”, “drive”, “act”, “tie”, etc. These can only be told apart by context. On top of which, English verbs nouns and nouns verbs.

Confusing identifiers

Cases and the ability to differentiate verbs from nouns are useful when using natural language words as identifiers in programs. Consider Python’s sort (a list method) and sorted (a built-in function):

sort is an action: an active form that acts upon the list it is called on and modifies it. It has a side effect: it sorts the list in place.
sorted is a state: the result of a process. It returns a fresh list with the same elements, sorted.
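
The difference between the two can be seen in a short session. This is only a minimal sketch; the variable names (numbers, ordered, result) are made up for illustration.

```python
numbers = [3, 1, 2]

# sorted (a state): returns a new list and leaves the original untouched.
ordered = sorted(numbers)
print(ordered)   # [1, 2, 3]
print(numbers)   # [3, 1, 2]

# sort (an action): reorders the list in place and returns None.
result = numbers.sort()
print(numbers)   # [1, 2, 3]
print(result)    # None
```

That sort returns None is a useful hint in itself: the verb has nothing to hand back because its work was the side effect.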

However, some other built-ins have nouns as identifiers: “bool”, “tuple”. For these, the name alone does not say whether they mutate their argument or not: does tuple tuple (tuple-ise?) its argument, or does it give back a tuple version of its argument?
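
For what it is worth, trying it out settles the question that the name leaves open: both built-ins give back a new value and leave their argument alone. A small sketch, with illustrative variable names:

```python
items = [1, 2, 3]

# tuple gives back a tuple version of its argument...
as_tuple = tuple(items)
print(as_tuple)              # (1, 2, 3)

# ...and the argument itself is still the same list.
print(items)                 # [1, 2, 3]
print(type(items).__name__)  # list

# bool behaves the same way: it reports on its argument without changing it.
flag = bool(items)
print(flag)                  # True
```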

Programming paradigms

In object-oriented programming, objects have methods (usually identified by verbs) and fields (usually identified by nouns). One of the ways Java programmers make sure these are differentiated is to use explicit getters and setters for the fields. This way, the object only exposes methods/verbs. Additionally, they prefix getter and setter identifiers with get and set to make it unambiguous.
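
The convention can be sketched as follows. The example is written in Python to stay consistent with the earlier sort and sorted discussion, even though the convention described is Java's; the Camel class and its humps field are hypothetical and exist only for illustration.

```python
class Camel:
    """Hypothetical class, used only to illustrate the naming convention."""

    def __init__(self, humps):
        # The field is a noun; the leading underscore marks it as private
        # by convention, so callers go through the methods below.
        self._humps = humps

    def get_humps(self):
        # The "get" prefix turns the noun into an unambiguous verb phrase.
        return self._humps

    def set_humps(self, humps):
        # The "set" prefix signals that this call mutates the object.
        self._humps = humps


camel = Camel(1)
camel.set_humps(2)
print(camel.get_humps())  # 2
```

Idiomatic Python would more likely expose the field directly or through a property; the explicit prefixes are kept here only because they are the point of the paragraph.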

In functional programming, functions are values, which blurs the line between what constitutes a verb and what constitutes a noun. I don’t know of any guidelines to avoid confusion; I don’t know to what extent such guidelines would be effective.
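
A small sketch of the blur, again in Python: the same identifier is used once as a verb (it is called) and once as a noun (it is handed to another function). The names double, doubled and action are made up for illustration.

```python
def double(n):
    return 2 * n

# As a verb: the function is called and does something.
print(double(21))  # 42

# As a noun: the function is a value passed to another function.
doubled = list(map(double, [1, 2, 3]))
print(doubled)     # [2, 4, 6]

# As a noun again: the function can be bound to another name like any value.
action = double
print(action is double)  # True
```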