Language syntax

Top-level syntax

Hoodospel source code is a sequence of commands separated by newlines. Each command consists of a command name, optionally followed by a sequence of argument tokens (the first arguments), followed by zero or more command prefixes, each with its own argument tokens. Prefixes make it possible to pass more than one argument to a command. Both first and prefixed arguments are generally optional.

Whitespace (tabs and spaces) may appear between the command name, first arguments, prefixes and prefix arguments. It may also precede the command.

command ::= command-name ( argument+ )? ( prefix argument* )*

Command names and command prefixes are non-empty sequences of capital Latin letters and underscores that start with a capital letter.

prefix  ::= [A-Z] [A-Z_]*
command-name ::= prefix
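
The command rule above can be sketched in Python as follows. This is only an illustration, not the actual Hoodospel parser: it assumes the line has already been split into token strings, and it treats any token matching the prefix rule outside parentheses as a prefix. The function name split_command and the sample command (COPY, TO, FORCE) are hypothetical.

```python
import re

# prefix ::= [A-Z] [A-Z_]* ; command-name ::= prefix
NAME_RE = re.compile(r"[A-Z][A-Z_]*")


def split_command(tokens):
    """Group pre-split tokens into (name, first_args, [(prefix, args), ...])."""
    if not tokens or not NAME_RE.fullmatch(tokens[0]):
        raise ValueError("a command must start with a command name")
    name, first_args, groups = tokens[0], [], []
    current = first_args  # arguments are collected here until a prefix appears
    depth = 0             # parenthesis nesting level
    for tok in tokens[1:]:
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
        if depth == 0 and NAME_RE.fullmatch(tok):
            # A prefix starts a new argument group (assumption: only at depth 0).
            current = []
            groups.append((tok, current))
        else:
            current.append(tok)
    return name, first_args, groups


# Hypothetical command, used only to show the grouping.
print(split_command("COPY a.txt TO b.txt FORCE".split()))
```

A prefix with an empty argument list, such as FORCE in the sample, corresponds to the `argument*` part of the rule matching nothing.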

Hoodospel also supports comments. A comment may start at any place where a token is expected and is introduced by a hash character.

comment ::= "#" .*

There are eight kinds of tokens which may form an argument: variables, numbers, single-quoted, double-quoted, plain and figure braces strings, functions and parenthesis expressions.

argument ::= variable | number | string | function | parenthesis_expr
string ::=   single-quoted-string
           | double-quoted-string
           | plain-string
           | figure-braces-string
  • A variable is a sigil followed by a non-empty sequence of Latin letters, digits and underscores. Two sigils are supported: $ indicates an environment variable, & indicates a Hoodospel variable.

    variable ::= hoodospel-variable | environment-variable
    hoodospel-variable ::= "&" varname
    environment-variable ::= "$" varname
    varname ::= [a-zA-Z0-9_]+
    
  • Numbers start with either a digit or a sign: _ for negative numbers and + for positive numbers. A number must contain at least one digit.

    number ::= ("_" | "+")? [0-9]+
    
  • Single-quoted strings are sequences of characters starting and ending with a single quote. To escape a single quote you should double it. No other escapes are possible.

    single-quoted-string ::= "'" ([^'] | "''")* "'"
    
  • Double-quoted strings are sequences of characters starting and ending with a double quote. The following escape sequences are accepted: \xXX (but not \x00), \uXXXX (but not \u0000), \UXXXXXXXX (but not \U00000000), \\, \", \r, \n, \t.

    Meaning of the escape sequences:

    Sequence     Meaning
    \xXX         Byte 0xXX
    \uXXXX       Unicode character U+XXXX
    \UXXXXXXXX   Unicode character U+XXXXXXXX
    \\           Backslash
    \"           Double quote
    \r           Carriage return (0x0D)
    \n           Newline (0x0A)
    \t           Tab (0x09)

    double-quoted-string ::= "\"" ([^"\\] | escape-sequence)* "\""
    escape-sequence ::=   "\\x" ( hex-digit x 2 )
                        | "\\u" ( hex-digit x 4 )
                        | "\\U" ( hex-digit x 8 )
                        | "\\" [\\"rnt]
    
  • Plain strings start with a Unicode character, a lowercase Latin letter, a backslash, a forward slash, a dot or a dash. The characters that follow belong to the plain string until a whitespace character, parenthesis, bracket or figure brace is reached.

    plain-string ::= [a-z/\\.\-] [^\[\]{}() \t]*
    
  • There is a special kind of plain strings: figure braces strings that contain only figure braces.

    figure-braces-string ::= "{"+ | "}"+
    
  • Functions look like prefixes, except that they start with a colon:

    function ::= ":" [A-Z] [A-Z_]*
    
  • There are also parenthesis expressions:

    parenthesis_expr ::= "(" argument* ")"
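
The token rules listed above translate fairly directly into regular expressions. The following Python sketch is a rough illustration under stated assumptions, not the reference lexer: the group names and the tokenize function are made up for this example, hex digits are accepted in either case, and the forbidden \x00, \u0000 and \U00000000 escapes are not rejected.

```python
import re

# One named alternative per token kind, written after the grammar rules above.
TOKEN_RE = re.compile(r"""
    (?P<ws>[ \t]+)
  | (?P<comment>\#.*)
  | (?P<variable>[&$][a-zA-Z0-9_]+)
  | (?P<number>[_+]?[0-9]+)
  | (?P<single_quoted>'(?:[^']|'')*')
  | (?P<double_quoted>"(?:[^"\\]|\\x[0-9a-fA-F]{2}|\\u[0-9a-fA-F]{4}|\\U[0-9a-fA-F]{8}|\\[\\"rnt])*")
  | (?P<figure_braces>\{+|\}+)
  | (?P<function>:[A-Z][A-Z_]*)
  | (?P<name>[A-Z][A-Z_]*)
  | (?P<open_paren>\()
  | (?P<close_paren>\))
  | (?P<plain>[a-z/\\.\-][^\[\]{}() \t]*)
""", re.VERBOSE)


def tokenize(line):
    """Return (kind, text) pairs for one source line, skipping blanks and comments."""
    pos, tokens = 0, []
    while pos < len(line):
        match = TOKEN_RE.match(line, pos)
        if match is None:
            raise ValueError(f"unexpected character at column {pos}: {line[pos]!r}")
        if match.lastgroup not in ("ws", "comment"):
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


for kind, text in tokenize("PRINT MESSAGE ( &name 'it''s' _4 :JOIN ) # done"):
    print(kind, text)
```

Command names and prefixes share the single kind `name` here because the grammar gives them the same shape; telling them apart is left to a later parsing step.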
    

Command arguments

Different commands accept different arguments. The following kinds of arguments exist:

  • Lval arguments designate arguments which may be assigned to. Rlval arguments designate existing variables which may be assigned to. Both always contain a single variable token.
  • Empty arguments are for command prefixes. They designate that prefix does not accept any arguments: only the presence of the prefix matters.
  • Expression arguments are the only ones that may contain more than one token. In fact they may contain any number of argument tokens. Note that parentheses in parenthesis expressions must be balanced.
  • Pattern is an expression which must result in a string value that is treated as described in the Pattern syntax section.

    There is no difference between expressions and patterns from the parser's point of view.

  • Message is an expression which must result in a string value followed by other values, which are treated as described in the Messages section.

    There is no difference between expressions and messages from the parser's point of view.

  • Version argument is a single token: a single-quoted string looking like 'M', 'M.m' or 'M.m.p' (where M stands for major version number, m stands for minor version number and p stands for patch level).
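
As an illustration of the version argument format, the Python sketch below validates a version token and splits it into numbers. It assumes that each component is a plain sequence of decimal digits, which the text above does not state explicitly; the name parse_version is hypothetical.

```python
import re

# 'M', 'M.m' or 'M.m.p' inside single quotes (assumption: decimal digits only).
VERSION_RE = re.compile(r"'([0-9]+)(?:\.([0-9]+)(?:\.([0-9]+))?)?'")


def parse_version(token):
    """Return the version components of a single-quoted version token as a tuple."""
    match = VERSION_RE.fullmatch(token)
    if match is None:
        raise ValueError(f"not a version argument: {token!r}")
    return tuple(int(part) for part in match.groups() if part is not None)


print(parse_version("'1.2.3'"))
print(parse_version("'2'"))
```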

Expression evaluation

Expressions are written in reverse Polish notation. The evaluator processes tokens one by one and keeps values on a stack.

Some functions referenced by function tokens take a fixed number of arguments; in this case that predefined number of arguments is popped from the stack. There are also functions with a variable number of arguments (only up to ten arguments are supported). In this case the top value on the stack defines how many arguments are popped. Supported values: any non-negative integer; the string all, which makes all values on the stack function arguments; and the strings }, }}, }}} and so on, which make the evaluator take values from the stack until the corresponding {, {{, {{{ and so on is found. For example, the following constructs are equivalent:

PRINT MESSAGE (    abc def ghi / 4   :JOIN )
PRINT MESSAGE (    abc def ghi / all :JOIN )
PRINT MESSAGE (  { abc def ghi / }   :JOIN )
PRINT MESSAGE ( {{ abc def ghi / }}  :JOIN )

All will print abc/def/ghi.
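
The argument-count rules can be sketched as a small stack evaluator in Python. This is a simplified model, not the real evaluator: tokens are assumed to be pre-split strings, only unsigned integers are recognized as numbers, and the function token is written :JOIN following the function grammar. That JOIN takes the separator as its last argument is inferred from the example output, and the names join, pop_variadic_args and evaluate are hypothetical.

```python
MAX_ARGS = 10  # "only up to ten arguments are supported"


def join(args):
    """Assumed JOIN behaviour: the last argument separates the preceding ones."""
    *parts, sep = args
    return sep.join(parts)


VARIADIC = {":JOIN": join}


def pop_variadic_args(stack):
    """Pop the count marker from the stack, then the arguments it selects."""
    marker = stack.pop()
    if isinstance(marker, int):
        start = drop_from = len(stack) - marker      # explicit argument count
    elif marker == "all":
        start = drop_from = 0                        # everything on the stack
    elif isinstance(marker, str) and marker and set(marker) == {"}"}:
        opener = "{" * len(marker)
        positions = [i for i, value in enumerate(stack) if value == opener]
        if not positions:
            raise ValueError(f"no matching {opener} on the stack")
        drop_from = positions[-1]                    # the opener is dropped too
        start = drop_from + 1
    else:
        raise ValueError(f"bad argument count marker: {marker!r}")
    if start < 0:
        raise ValueError("not enough values on the stack")
    args = stack[start:]
    if len(args) > MAX_ARGS:
        raise ValueError("too many arguments")
    del stack[drop_from:]
    return args


def evaluate(tokens):
    """Evaluate pre-split expression tokens and return the final stack."""
    stack = []
    for tok in tokens:
        if tok in VARIADIC:
            stack.append(VARIADIC[tok](pop_variadic_args(stack)))
        elif tok.isdigit():
            stack.append(int(tok))
        else:
            stack.append(tok)
    return stack


for source in ("abc def ghi / 4 :JOIN", "abc def ghi / all :JOIN",
               "{ abc def ghi / } :JOIN", "{{ abc def ghi / }} :JOIN"):
    print(evaluate(source.split()))
```

Each of the four inputs leaves the single value abc/def/ghi on the stack, matching the equivalence stated above.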

Pattern syntax

Hoodospel uses ERE-like patterns. The following metacharacters are supported:

Single atoms:

.
Matches any character except for newline.
[…], [^…]
Collections: matches any ([…] form) or none ([^…] form) of the characters from the collection.
()
Capturing groups. You may specify up to ten of them.
^
Start of the line. Zero-width.
$
End of the line. Zero-width.
\…
Escape sequence. A backslash followed by any of the metacharacters matches that metacharacter literally. Other supported escapes:

Escape   Meaning
\xXX     Byte 0xXX, except for \x00, which is not supported.
\e       Escape.
\n       Newline character.
\r       Carriage return character.
\t       Tab character.
\b       Backslash character.

Note: any other escape sequence is undefined.

Quantifiers:

{N}, {N,}, {N,M}
Matches from N to M occurrences of the preceding atom. The first form matches exactly N (M=N), the second form matches N or more (M=∞).
*
Matches zero or more occurrences of the preceding atom.
+
Matches one or more occurrences of the preceding atom.
?
Matches zero or one occurrence of the preceding atom.

Other:

re1|re2
Branch: matches either re1 or re2.

Messages

Messages are strings in a printf-like format: regular text interleaved with %{flags}{conversion} atoms.

Supported flags (they must be given in the order below):

+
For numbers: prepend + sign to positive numbers. For strings: ignored.

-
Left-align the converted value. Default is right alignment. Only useful if field width was specified.
#
Convert the value to alternate form. Only meaningful for x or X (makes it prepend 0x to the result), o (makes it prepend additional zero unless first resulting character was already zero), e or E, f, g or G (makes it print decimal point even if no digits follow it).
0
Pad value with zeroes instead of spaces.
N or *
Specifies field width. N is a sequence of decimal digits not starting with 0. If * is specified then width is taken from the next argument.
.N or .*
Specifies precision. For d, i or u, x or X and o this specifies minimal number of digits printed, for e or E and f this specifies the number of digits to appear after the radix character, for g or G this specifies the maximum number of significant digits and for s this specifies the maximum number of characters.

Supported conversions:

u, i, d
Integer argument is converted to decimal notation (signed in case of %i and %d). Behavior is undefined when trying to use %u for negative integers.
o
Integer is converted to octal notation. Behavior is undefined when trying to convert negative integers.
x or X
Integer is converted to hexadecimal notation. If X is used then hexadecimal digits A through F are capitalized, otherwise they are printed in lower case. Behavior is undefined when trying to convert negative integers.
e or E
Number is converted to [-]A.Be±C scientific notation. If E is used then capital letter E is used for the exponent, otherwise e is used.
f
Number is converted to [-]A.B decimal notation.
g or G
Number is converted to either scientific notation or decimal notation depending on its value. G uses E for scientific notation.
s
String conversion: embeds given string.
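
The structure of a conversion atom, with flags in the required order, can be expressed as one regular expression. The Python sketch below only splits atoms into their parts and performs no formatting. The group names and parse_atoms are hypothetical, and accepting a precision with a leading zero (such as .0) is an assumption that goes beyond the definition of N given above.

```python
import re

# %{flags}{conversion}, with the flags in the documented order.
ATOM_RE = re.compile(r"""
    %
    (?P<plus>\+)?                       # + : sign for positive numbers
    (?P<left>-)?                        # - : left alignment
    (?P<alt>\#)?                        # # : alternate form
    (?P<zero>0)?                        # 0 : zero padding
    (?P<width>[1-9][0-9]*|\*)?          # N or * : field width
    (?:\.(?P<precision>[0-9]+|\*))?     # .N or .* (assumption: .0 is allowed)
    (?P<conversion>[uidoxXeEfgGs])
""", re.VERBOSE)


def parse_atoms(message):
    """Return one dict per conversion atom found in the message string."""
    return [match.groupdict() for match in ATOM_RE.finditer(message)]


for atom in parse_atoms("%+05d items, %-10.3s"):
    print(atom)
```

Parts that are absent from an atom are reported as None, so a caller can tell a missing width from an explicit one.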