User:Dcoetzee/Wikicode/Specification

A while ago, on Talk: Binary tree, dze27 said:

Also, perhaps we need some sort of consistent pseudo-language for specifying pseudocode. Something like they do in Introduction to Algorithms, or smilar dze27

The pseudocode in Intro to Algorithms requires some rather complex typesetting (I've had to duplicate it before). This would be better for readers, but not conducive to editing by writers. I agree there needs to be consistency; I've written pseudocode on a number of pages, but I don't claim mine is ideal. I try to focus on a few aspects I consider important:

Clarity: the most important; it should be clear what the code does
Brevity: the code should not occupy much article space, and should consist mainly of algorithm content as opposed to unnecessary syntax
Ease of writing: the code should be easy to write and edit
Universality: the code should be clear to programmers familiar with nearly any common language, and not too similar to any one

To facilitate these I've adopted a few conventions, such as Wirth-style := for assignment, use of indention in place of blocks and end marks, and so on, but I appreciate any suggestions for improvements.

Derrick Coetzee 18:30, 4 May 2004 (UTC)

From that discussion I later decided to write this proposal for a standard pseudocode. Here's the somewhat incomplete proposal, intended to meet the above general goals. Please criticize, expand, and help wikify and finalize it. Please be bold in editing the proposal! Although I (DCoetzee) wrote the initial version, I do not claim it, we share it.

The Wikicode Standard

This is an informal standard for a pseudocode which I call wikicode. The purpose of wikicode is to easily facilitate both writing and reading of clear and concise descriptions of algorithms and data structures. All wikicode will be typeset in a fixed-width font, for example by prefixing each line with spaces. Do not wrap the code in HTML <pre> tags; this disallows additional markup inside the code block. If wikicode occurs in the middle of encyclopedic text, this will be indicated by wrapping it in HTML <code> tags, which also use a fixed-width font.

Functions

A possibly recursive function is defined using the emboldened function keyword. It is given a list of arguments; types may or may not be specified for the arguments, but if they are they will be indicated by the argument name, a colon (:), and then the type name (see Types). Unabbreviated names should be preferred, unless the name is explained in the text or comments:

 function sort(list, comparisonPredicate)
     (indented body of function)

 function addModK(left : int, right : int, modulus : int)
     (indented body of function)

Results are returned from functions using the return keyword, as in return 5 or simply return if there is no result. Functions are called by simply specifying the name, followed by a parenthesized list of arguments:

 addModK(5, 3, 7)
 sort(myList, func(x,y) x < y)
 sin(5)

Here, sin determines the sine of a value.

As shown above, anonymous functions may be created with func. You can also generally assume all sorts of useful math, string, etc. functions are available, as long as you explain them, in comments or in the text (as also shown above). However, when manipulating the built-in lists, sets, and maps, please use their standard interface (see their sections).

Comments

Comments, surrounded by parentheses and typed in italics, occupy an entire line or a suffix of a line. Parentheses are used in analogy with English text, to suggest that comments are parenthetical:

  (Do something silly)
  i := i + 0       (add nothing to i)

If a comment is so long it requires multiple lines, it is preferable to embed it in the text somehow instead; however, if necessary they may be done simply by placing the entire comment inside parentheses, italicizing each line, and indenting appropriately:

    (This is a very long and unnecessary comment which is somewhat difficult
     to edit because it would require the editor to fix the line wrapping,
     which is bound to be annoying.)
    i := 5

I'm hoping for a better proposal for multi-line comments.

Variables

Variables are mutable, are referenced by name, and are assigned new values using the := operator. See Names for naming guidelines. Local variables need not be declared, but if you wish to you can use the var form for this:

 var w               // number of free munchkins
 var x := 2
 var y : int := 5
 var z : int
 var a := 2, b : int, c, d : int := 5

Variables need not be initialized when declared, but may not be used before being initialized.

Conditionals and Loops

Conditional tests can be done with the if-else statement, with if and else emboldened:

 if x = 5
     (x is 5 here)
 else
     (x is not 5 here)

There is no then or endif, and there need not be an else branch. The condition must have have bool type (see section Types).

There are only four types of loops, the while loop, and the do-while loop, the for-to loop, and the for each-in loop. The while loop iterates while the condition holds; the until keyword may also be used to iterate until the condition does not hold. No extra parentheses or keywords are needed:

 while x ≠ 0
      (body of while loop)

The do-while loop, unlike the while loop, always executes at least once before testing its condition:

 do
     (body)
 while done = false

The for-to loop steps an integer variables along a range of integer values in increasing order. For decreasing order, for-downto can be used.

 for x := 1 to 10
     (body, executed 10 times)
 for x := 10 downto 1
     (body, executed 10 times)

Finally, the for each-in loop is used with lists and sets (see their sections) and iterates over the elements of the list in order or set in some arbitrary order:

 for each n in numberList
     (do something with n)
 for each n in numberSet
     (do something with n)

The keyword end loop can be used to break out of any loop at any time.

Operators

From highest to lowest order of precedence, the standard operators are:

Op	Typed	Description
`^`	`^`	Exponentiate numeric values
`×,÷`	`×,÷`	Multiply/divide numeric values
`+,-`	`+,-`	Add/subtract numeric values
`=,<,>,≤,≥,≠`	`=,<,>,≤,≥,≠`	Comparison
`∈`	`∈`	Collection membership
`not`	`not`	Logical not
`and,or`	`and,or`	Logical and, or
Need more operators

All of the entities used above are rendered correctly by all modern browsers. All operators are left-associative except ^. There is no integer divide; instead use truncate(x ÷ y), or just floor(x ÷ y) for positive quantities.

New operators may be used at any time, if they are explained. If you do need to invent your own operators, consider using:

∼,≈,≡ (&sim;,≈,&equiv;) for equivalence relations
add more good candidates here

Types

There are only a few built-in types available, as well as mechanisms for creating user-defined types. Types (like comments) are always typeset in italics; no ambiguity arises because types should not occur on a line by themselves or be preceded by //.

The simplest built-in type bool has one of two built-in values: true or false. Any bool value is a valid condition for a conditional or loop construct. The built-in type int is an infinite-precision integer type including positive and negative values and zero. The type real represents infinite-precision, infinite-range real numbers, while float represents a floating point number with whatever characteristics you like; if they are important, explain them in the text.

The type character includes all conceivable writable characters and is written by surrounding the character with single quotes ('), and string is a finite-length string of such characters, surrounded by double-quotes ("). Double-quotes may be used inside a string, as long as it's clear from context where the string ends; the outer double-quotes may be emphasized with bold ("hel"lo") if this is not clear.

Although these types are unrealistic, they are useful for two reasons: some of these types can be simulated to a large extent in practice (using arbitrary-precision arithmetic libraries), and, more importantly for pseudocode, the use of such types simplifies many algorithms, eliminating some relatively unimportant implementation concerns.

There is also a primitive list type, a primitive set type, and a primitive map type. See their respective sections for more details.

A new record type (called a struct in C) may be defined using the record keyword, may be recursive, and must include field names but not necessarily types:

 record employee
     name : string
     age : int
     mentor : employee
     stuff

Records, as well as tagged unions below, are only referred to by reference, and references may contain the special value null to indicate they refer to nothing. Record fields are accessed using ., as in e.name, and such values may be assigned to. Multiple fields may be listed on a line, if commas are used:

 record employee
     name : string, age : int
     mentor : employee, stuff

Types may also be constructed using the tagged union keyword, which defines a tagged union, much like ML's datatype. Tagged unions may be recursive, and assign a tag name to each branch:

 tagged union tree
     Leaf:
     Node:  left : tree, right : tree

Fields are specified after each tag; only one set of these fields is available at a time, depending on the tag. The tag of a tagged union object may be determined using the cases keyword, and within the cases, fields of the correct branch of the tagged union may be accessed:

 cases t
     Leaf: (it's a leaf)
           (do stuff)
     Node: (it's a node)
           (do stuff with t.left and t.right)

If a type is not specified for an entity, such as an argument, local variable, or record field, it can assume any value. Operations such as + may "fail" (make the pseudocode invalid) if they may be applied to a value they do not apply to.

List

A list stores a sequenced list of values. The primitive list operations are:

 list(1, 2, 3)             (the list containing 1, 2, 3 in that order)
 emptyList                 (the list containing no elements)
 insertFront(list, value)  (inserts new value at front of list; push)
 insertEnd(list, value)    (inserts new value at end of list; enqueue)
 removeFront(list)         (removes value from front of list; pop/dequeue)
 removeEnd(list)           (removes value from end of list; undo enqueue)
 isEmpty(list)             (returns true if and only if list is empty)
 getFront(list)            (gets the value at the front of the list)
 getEnd(list)              (gets the value at the end of the list)
 list[i]                   (gets ith element of the list, zero for front)
 x ∈ list                  (x is an element of the list; written &isin;)
 size(list)                (gets number of elements in the list)
 listToSet(list)           (get a set containing all elements of the list)

You can also use for-in to iterate over a list from front to back.

The type array is a special synonym for list which implies that the operations above have the following associated costs, where n is the number of elements in the array:

 insertFront(list, value), removeFront(list)     O(n)
 insertEnd(list, value)                          O(1) amortized
 removeEnd(list)                                 O(1)
 isEmpty(list)                                   O(1)
 getFront(list), getEnd(list), list[i]           O(1)
 x ∈ list                                   O(n)
 size(list)                                      O(1)
 listToSet(list)                                 O(n)

There is also an array constructor array(1, 2, 3) and an object emptyArray equivalent to the list versions but meant to suggest they produce arrays.

Set

A set stores an unordered sequence of values with no duplicates. The primitive set operations are:

 set(1, 1, "a", 3)         (the set containing 1, "a", and 3)
 emptySet                  (the empty set; don't use the math symbol)
 insert(set, value)        (insert a new value into the set)
 remove(set, value)        (remove a value from the set)
 x := extractElement(set)  (removes a value from the set and returns it)
 x ∈ set                   (x is in the set; written &isin;)
 size(set)                 (gets number of elements in the set)
 setToList(set)            (get a list containing all elements of the set)

You can also use for-in to iterate over a set in some arbitrary order.

Map

Maps are associative arrays. They take keys and produce associated values. All keys which have not been assigned values map to the special value unassigned, but to actually determine if a key is in the map it is preferable to use x ∈ keys(map). The primitive operations are:

 map(1 -> "a", 'k' -> 7)   (the map mapping 1 to "a" and 'k' to 7)
 emptyMap                  (the empty map)
 map[x]                    (get value associated with x)
 map[x] := y               (make x map to y from now on)
 map[x] := unassigned      (unassign x)
 size(map)                 (gets number of assigned keys in the map)
 keys(map)                 (gets a list of the keys in the map)
 values(map)               (gets a list of the values in the map)

You cannot iterate over a map; if you need to do this, iterate over the list of keys instead, looking up corresponding values as needed.

Names

No two named entities, including functions, variables, and types, may share the same name in a single code listing (although duplicate names are possible in a single article). Valid names may include letters, numbers, greek letters, and mathematical markup, such as subscripts, superscripts, and primes (typed as apostrophes), but names may not start with a number. Names are case-sensitive, but names that differ only by case are strongly discouraged.

Multiword names will use mixed case, always beginning with lowercase. Underscores (_) or other word separators will not be used.

Line length

If possible, no line should exceed 78 characters, but there may be up to 100 characters on a line. Also, if lines can be made shorter without adding additional lines or sacrificing clarity, this should be done.

Indention

The entire code listing will be indented an initial 2 spaces. A function body or the body of a conditional or loop construct must always be indented exactly 4 additional spaces. For example:

 function foo(x, y)
     if x ≠ y
         (Doesn't terminate for negative y)
         while x > y
             x := x - y
         return x
     else
         foo(x + 1, y)

Any other indention performed is up to the writer, but should be done in a clear way that keeps line length under the maximum while retaining clarity and not falsely associating unrelated elements.

The following are some helpful suggestions about indentation that are not mandated. If a function call's arguments are large enough that they cannot fit on one line, a line break should either follow an operator, a comma, or an opening parenthesis. If it follows a comma, the next line is indented to the column following the initial opening parenthesis of the call. If it follows an operator or parenthesis, it is indented 2 additional spaces. For example (don't really use names like the following):

 function foo(x, y)
     if x > y
         otherFunctionWithGreatBigName(thisArgumentFillsUpTheRestOfTheLine,
                                       argument1ToThePlusOperator +
                                         argument2ToThePlusOperator,
                                       callYetAnotherFunctionWithBigName(
                                         argToYetAnotherFunction))

Naturally, since this is less clear, clarity of naming and line length should be traded off intelligently.

Links

Links may be used anywhere they are useful in pseudocode, particularly in comments and around calls to functions not specified in the current listing. However, if used on an emboldened or italic word, it should be used in addition to the existing markup. It may also be desirable to link tagged union or record the first time they are used conspicuously in an article. Also, use common link sense: link only strongly relevent articles.

A page will be created describing wikicode for readers, and each page which uses wikicode must immediately precede its first conspicuous use with {{msg:wikicode}}, a message which will provide the reader a link to this page.

Please add your own comments here, but don't hesitate to edit the proposal!

Some thoughts

(Style?) Instead of declaring type of variables after the name, perhaps they should be done before: integer left, integer right, integer modulus for example.
- This is a rather difficult issue; placing types before the name works fine for single-word types, but can become rapidly more complicated as types do, until it's no longer clear where the type ends and the variable name begins; the italics might help there though.Derrick Coetzee 23:31, 5 May 2004 (UTC)
  - Bold types? Dysprosia
    - Can we count on italics or bold being always available in the reader's browser? If not, some sort of flag will be needed. See comment comment below. ww 22:35, 6 May 2004 (UTC)

Comments should always have some kind of pre-delimiting thingy. Italicised comments solely on one line will be confusing (above, for instance, "Do something silly"). The italics don't matter so much.
- My previous complaint about this was bogus. I don't yet have a good solution for multiline comments, and I've mandated parentheses around comments now as well as italics. Derrick Coetzee 03:12, 6 May 2004 (UTC)
  - They look great now :) Dysprosia 03:44, 6 May 2004 (UTC)
    - Can we count on italics or bold being always available in the reader's browser? I suggest some sort of delimiter (at both ends of the comment) as mandatory, with italics being strongly encouraged. It's always easier when reading to ignore something that's present (possibly logically superfluous, and inelegant too) than to supply an equivalent -- in this case delimiting marks. Easier to read... ww 22:35, 6 May 2004 (UTC)

I don't like this "var x" thing. It should not be the sole way of designating variables in a pseudocode form such as this. If one is demonstrating weak typing, sure, but otherwise, one should probably use a declaration form such as integer number, real your_bank_balance, boolean is_true, etc
- Ah, you're misunderstanding here. A type specification is allowed, as the examples with : int demonstrate, they're just not required. You may be looking for them before and not after (see second point). Derrick Coetzee 03:12, 6 May 2004 (UTC)
  - No, I mean that in tandem with my other thought that types should come before the variable (so var x : int => integer x. var x can be used for weak typing examples. Dysprosia 03:44, 6 May 2004 (UTC)
    - I'm going to suggest that, as clarity is more important than easy of lexical analysis, or parser construction that parameters be very clearly marked as that's a source of difficulty in reading. I suggest that some like IN: foo, bar OUT: foo, bar for params that are being passed by reference, and USES: beeble, brox for params passed by value. For similar reasons, it should probably be madatory that params and variables and constants should be explicitly typed. An old and ugly idea, for which apologies. More keystrokes, but better clarity for the reader. ww 22:35, 6 May 2004 (UTC)

Maps should be a little more general, not just mapping numbers to anything, but also mapping strings and other types to anything as well.
- On this point, I didn't mean to imply they were so limited; I'll expand the example set.Derrick Coetzee 23:31, 5 May 2004 (UTC)

Indenting shouldn't be so rigidly described however. The way wikicode is structured won't make things too unreadable.
- Done; is this more like what you had in mind? Derrick Coetzee 03:12, 6 May 2004 (UTC)
  - A standard is reasonable, but mandatory doesn't quite cope with the odd cases which arise. Not sure how to do so, beyond exhortations for clarity nice examples here, and some edit prowling by others. Fixed numbers of spaces are better than tabs, of course, in respect to readability, but present troubles in some fonts browsers may be using. Can we always assume availabilty of a monospaced font? ww 22:35, 6 May 2004 (UTC)

This is all just some random thoughts, so feel free to disregard what I say as you see fit ;) Dysprosia 23:09, 5 May 2004 (UTC)

I'll have to take another look at this later (just taking a quick re-examination atm) Dysprosia 03:41, 6 May 2004 (UTC)

Thanks a lot for your careful examination and comments, will make some related changes soon. Derrick Coetzee 23:31, 5 May 2004 (UTC)

I don't think brackets on a function call should be optional. Why have two ways to do the same thing? CGS 23:44, 5 May 2004 (UTC).

The idea is to reduce clutter in some cases; it's done this way in languages like ML, for example. It may in fact not be justified though. Change it if you like. Derrick Coetzee 00:06, 6 May 2004 (UTC)

Update: Done; I think you're right that the benefit of consistency outweighs any chance of added clarity here. Derrick Coetzee 03:13, 6 May 2004 (UTC)

I can see how something like this would be useful to illustrate simple algorithms. But even there, you can't necessarily compare the efficiency of two algorithms without knowing a little bit more about how data structures are meant to be implemented. For instance, "lists" in real programming languages are sometimes linked lists (as in Lisp) and sometimes vectors (as in Python). With linked lists, inserting an element in the middle is cheap, but taking the list's length is expensive; with vectors, it's just the other way around.

This is a pretty general problem; for the same reason it's not clear how efficient the map type is. My philosophy here is that the operations I describe are just an interface to an abstract data structure, and that in translating to an executable code (or performing analysis) you would choose an appropriate real data structure based on the set of operations being used. On the other hand this might be potentially confusing to the reader; I'm not sure which way to go on this. I certainly don't want a proliferation of collections for every concrete collection data structure. What do you think? Derrick Coetzee 15:43, 6 May 2004 (UTC)

Well, if we assume that everything is by-reference, you don't need linked lists built-in, you can do them with records just like in every high-school Pascal class. :P Let the built-in list type be like an extensible array or vector: the natural way to loop through it is to iterate over it with an index variable, not to cdr down it, but you can still push things on the ends like a deque. --FOo 20:52, 6 May 2004 (UTC)

I'd caution here that too much abstaction will inevitably cause comprehension problems. Recall that the Reader assumed in many cases will not be familiar with level of abstraction clarifying tricks and if so is more likely to be confused than helped. I have no solution to this, of course, but have repeatedly run across it when attempting to address said type of reader/listener. ww 22:35, 6 May 2004 (UTC)

To Fubar, I'd like to emphasise that the purpose of these types is just to simplify examples that deal with collections but don't fundamentally depend on how they work; linked lists would still be built by hand for example in the linked list article.

Here's my compromise I've come up with: each abstract type has associated synonyms that suggest particular implementations, and so assign effective costs to each operation that can be used in analysis. See the list section for my first example. Derrick Coetzee 22:53, 6 May 2004 (UTC)

I also don't see anything here about pass-by-value vs. pass-by-reference. This matters, too: can functions alter their parameters in the enclosing scope of the call?

Certainly required in some cases. C's management of it is abysmal, in my view. ww 22:35, 6 May 2004 (UTC)

At least one more operation on maps is needed: keys(map), to extract a set or list of the keys. And how is the map unassigned value distinct from the record null value? Is one meant to smell like a Lisp NIL and the other like an SQL NULL? :)

Yeah, I just noticed the lack of keys operation myself, fixed that. As for unassigned versus null, the reason null is not used for both is so that a key can map to null without being unassigned. Derrick Coetzee 15:22, 6 May 2004 (UTC)

That sounds a little confusing, it leads to the question, "If I can store a record null in a map, can I store a map unassigned in a record?" Thus, each different subscriptable type ends up with its own way of saying "nobody home"?

The way various real languages do it may be illustrative: In Perl, you can ask if a map key is defined. In Python, you can use the in operator on any collection type -- for maps ("dictionaries") it asks if a key is defined in the map. --FOo 20:52, 6 May 2004 (UTC)

The keywords tagged union and cases are related to one another, but don't look related from the names. Perhaps type union and type case? --FOo 14:18, 6 May 2004 (UTC)

Thanks a lot for the feedback, will address your other points a bit later. Derrick Coetzee 15:22, 6 May 2004 (UTC)

Block delimiting

Instead of using indenting, it may be more clearer to use begin and end blocking words to make the function body more clearer, if the block is larger than one line.
- This doesn't seem unreasonable; however, I don't really want to have end keywords for conditionals and loops, because of the useless space they occupy (without much added clarity). I'm following the style of Introduction to Algorithms in that. What do you think? Derrick Coetzee 23:31, 5 May 2004 (UTC)
  - I know and agree that they're ugly, but I think they are more clear, in that the code block is written more explicitly. Dysprosia 03:41, 6 May 2004 (UTC)
    - I would suggest indent as primary choice (easier to read...) and some block delimiter as alternative where indentation is confusing. Latter should be explicitly termed last resort in the spec, even though equivalent. ww 22:35, 6 May 2004 (UTC)

For block delimiting, I would prefer to use parentheses (like in ML) to indicate a list of statements. Shown below are examples without delimiters, with begin and end tags, and with parentheses.

 sum := 0
 while n > 0
     sum := sum + n
     n := n - 1

 sum := 0
 while n > 0
 begin
     sum := sum + n
     n := n - 1
 end

 sum := 0
 while n > 0
 (
     sum := sum + n
     n := n - 1
 )

Out of these examples, I like the parentheses best.... jaredwf 16:18, 8 May 2004 (UTC)

But with the indentation the parentheses are redundant. CGS 09:29, 9 May 2004 (UTC).

It is redundant, but I think it is clearer to have the extra space around the block to separate it from the surrounding code. Either way is fine with me, but I still prefer the well-spaced code. jaredwf 09:50, 9 May 2004 (UTC)
Indentation is easily broken, especially whe you are modifying the code. There is one TCL language that tries to do without blocking markers. It is pain in the ass to fix cut'n'paste errors in it. Mikkalai 21:17, 13 May 2004 (UTC)

May I remind you still another way that saves one line of code

 sum := 0
 while n > 0
     sum := sum + n
     n := n - 1
 endwhile

or a variation:

 sum := 0
 while n > 0
     sum := sum + n
     n := n - 1
 elihw

Moreover, a disadvantage of parentheses is that they visually "blur" the boundaries of a block, you know, kind of gray-scale halftones between black and white. Mikkalai 21:24, 13 May 2004 (UTC)

Whatever approach, IMO block termination keyword is a must. Suppose you had

 while n > 0
     sum := sum + n
     read_next(n)
 report(n)

By a single keystroke, I convert it into

 while n > 0
     sum := sum + n
     read_next(n)
     report(n)

It requires a man who knows an algorithm in depth to catch the bug. Mikkalai 23:31, 13 May 2004 (UTC)

Iteration

The control statements look good, though, perhaps for the foreach type code, one could actually write for each blah in...
- Done, good idea. Derrick Coetzee 03:12, 6 May 2004 (UTC)
  - Id like to see the spec as follows: for each blah in container, since in some sophisticated algorithms one may want to write, e.g., for each blah in heap or in some custom-tailored container. (cf: STL iterators) Mikkalai 23:46, 13 May 2004 (UTC)