Friday, 27 May 2011

Reprint: Colour in Language Syntax Analysis, 1987

The Use of Colour in Language Syntax Analysis
Department of Computer Science, Brunel University, Uxbridge, Middlesex, UB8 3PH, U.K.
SUMMARY: The use of colour to reflect the syntax of typed-in language statements can make a language processor much more user-friendly. This paper shows how colour has been used in MicroCODIL and suggests ways in which the approach could be applied to other systems.
KEY WORDS:   Colour   Language Syntax   MicroCODIL   User Friendly Systems

Newcomers to computing often have difficulty with the formality of conventional language syntax. A mistake which seems minor to the user may produce a series of error reports. In theory the compiler (or interpreter) could guess what a 'loose' statement means in many situations, but if it did it would have to issue a warning message. Such messages interrupt the flow of the text on the screen and they discourage the user from being 'lazy'. This means that most language processors force the user to conform to the precise format required. Although such an approach may appeal to the language purist, a novice at the keyboard will find that the system is user-unfriendly.
    The problem arises because either the language processor must ensure exact conformity with the language syntax, or it must indicate the nature of any assumption it has made. With either approach an error or warning message has to be issued to indicate the position and nature of the problem. With paper output or glass teletypes there is little that can be done. Regardless of terminal type any language processor which accepts input line by line has the problem of indicating the position(s) on the line of errors/ambiguities.
If the terminal/software allows characters to be processed as they are typed in it is possible to 'refuse' to accept illegal characters. In systems which do this it is usual to make a 'beep' to tell the user when an error has occurred. This catches the user at the point that he made the mistake, giving him an immediate chance to correct his error. Such an approach works well with obvious errors, but is of little use as an aid to correcting ambiguous situations where the language processor can make a reasonable guess as to what is intended.
    The advent of cheap microcomputers which drive colour television sets makes it possible to have an error-correcting input-syntax analyser which shows the user how it has interpreted the characters by echoing them in a colour appropriate to the type of token involved. A good example of this is the word processor package WORDWISE1. This uses red to indicate the start and end of text messages and green to indicate control functions. The word-count line is given in blue, which means that it stands out clearly from the main text (in white) when the screen is full. The result is very effective and it is very easy to distinguish between text and the control functions, even when the functions are required in the middle of a string, such as when superscripts are used.
    This paper shows how colour has been used dynamically to inform the user of the interpretation of language statements. The language used was MicroCODIL,2-4 a non-procedural information processing system designed for teaching modern information technology in schools. MicroCODIL is based on CODIL5-7. In terms of file processing capability MicroCODIL is a subset of CODIL, but the microcomputer environment has allowed significant improvements to the user interface, including the use of windows and colour.
Figure 1. A typical MicroCODIL screen display
    The basic structure of MicroCODIL, and its applications, which are not relevant to the material in this paper, are described in the References.2-7 The default screen format is given in Figure 1. The relevant portion of the screen is the 'scrolling input area', where the user keystrokes are echoed, in colour. The mechanism used to generate the colour is described below. The 'user window area' is used for a number of purposes, and in some contexts it contains information using the same colour coding as the input area. In many cases the completed input is copied to this area.
Table I. The structure of a MicroCODIL item
Feature                Description
Level                  Absent or integer (maximum 15)
Item name              Alphabetic (up to 16 characters)
Qualifier              Optional, enclosed in brackets:
                       (1) item name
                       (2) integer
                       (3) probability (0·005 to 0·995 in steps of 0·005)
                       @ or £ symbol if used
                       : symbol if used. The value is treated as an expression to be evaluated
Set partition          > = < symbols, or combination (default = if there is a value)
Item value             Any characters (may be restricted by picture)
Terminal punctuation   , ? . symbols (default .)
    The input structure whose processing is described in this paper is the 'item', and the syntactic components of an item are given in Table I.
Figure 2. Some typical MicroCODIL items
    Some typical MicroCODIL items are shown in Figure 2. The first six items have level numbers and are shown as they would appear in a statement describing a particular book. The last four items represent 'queries', each of which is true in the context of the initial items.
    The BBC Microcomputer is made by Acorn Computers8 in conjunction with the British Broadcasting Corporation. It is widely used in British schools. It contains a 6502 microprocessor and has a powerful 16K ROM-based operating system and a paged bank of 16K ROM software including a sophisticated BASIC with procedures, local variables and an integrated 6502 assembler. It has 32K bytes of RAM, but only part of this is available to the BASIC programmer. It has 8 display modes, giving a variety of colours and resolutions, but the high-resolution multicolour displays encroach heavily on the available RAM area.
    MicroCODIL is a comparatively large package, and efficient use of the RAM is essential. As a result the teletext display mode is used. The teletext character generator provides full colour text and backgrounds with the minimum use of RAM. There can be no more than 40 characters on a line and each change of foreground or background colour along the line occupies one character position.
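The cost of colour in the teletext mode can be made concrete with a small calculation. The following Python sketch is illustrative only and is not taken from MicroCODIL; the names `LINE_WIDTH`, `cells_needed` and `fits_on_line` are hypothetical, and it assumes that each screen line starts in white, so that white text at the start of a line needs no colour code.

```python
LINE_WIDTH = 40  # teletext mode: at most 40 character positions per line


def cells_needed(tokens):
    """Count the character positions used by a list of (colour, text) tokens.

    Every change of colour occupies one position for its control code.
    Consecutive tokens in the same colour share a single code.
    """
    cells = 0
    current = "white"  # assumption: each line starts in white
    for colour, text in tokens:
        if colour != current:
            cells += 1  # the colour code itself takes up one position
            current = colour
        cells += len(text)
    return cells


def fits_on_line(tokens):
    """True if the coloured tokens fit on one 40-column teletext line."""
    return cells_needed(tokens) <= LINE_WIDTH
```

An item echoed in five different colours therefore loses five of its 40 positions to colour codes before any text is shown.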

    In describing the approach used to colour the syntax structures it is useful to describe the routines handling character input and output.
  1. All character input is handled by a routine which 'loads' a single character buffer (if empty) with the next character from either the keyboard or a text file. Tests are included for end of file and the escape key. Teletext colour codes can be 'input' when using the BBC screen editor and are bypassed. Other unexpected ASCII control characters are rejected with a 'beep'. Because the routines are used to read text files as well as keyboard input they become time-critical when implemented in the interpreted BBC BASIC. Most of the routines described in this paper have been written in 6502 assembler.
  2. When a character is displayed, the display routine precedes it with a byte containing either a teletext colour code or binary zero. When a teletext colour code is used the character appears in the appropriate colour, and the code is changed to binary zero for succeeding characters. Sending a binary zero to the screen has a null effect and the character displayed assumes the same colour as its predecessor. The routine also scrolls the 'scrolling input area' if the bottom line is full.
  3. The above routines are used by a 'picture' input routine. This reads characters from the keyboard/file until either a character is encountered which does not correspond with the picture byte or a specified number of characters is reached. The accepted characters are displayed in the appropriate colour, and backspaces are handled as long as the number of characters read does not 'go negative'. The 'picture' is either one of the characters given in Table II, or is treated as an exact match. This routine is used to read a string of characters corresponding to the picture byte, test and read the current input byte, or to read and discard sequences of spaces.
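The display routine in item 2 can be sketched as follows. This is a Python illustration under stated assumptions, not the 6502 routine itself: the class `Display` and its methods are hypothetical names, the screen is modelled as a simple list of byte values, and the 'null effect' of a binary zero is modelled by not storing it. Scrolling is omitted. The colour values are the standard teletext alphanumeric colour codes.

```python
# Standard teletext alphanumeric colour control codes.
TELETEXT = {"red": 129, "green": 130, "yellow": 131, "blue": 132,
            "magenta": 133, "cyan": 134, "white": 135}


class Display:
    """Hypothetical model of the colour-prefix display routine."""

    def __init__(self):
        self.pending = 0   # colour code to send before the next character, or 0
        self.screen = []   # byte values that have reached the screen

    def set_colour(self, name):
        """Arrange for the next character to be preceded by a colour code."""
        self.pending = TELETEXT[name]

    def put(self, ch):
        """Display one character, preceded by the pending code if any."""
        if self.pending:
            self.screen.append(self.pending)
            self.pending = 0  # succeeding characters inherit the colour
        # A zero prefix has no effect on the screen, so nothing is stored for it.
        self.screen.append(ord(ch))
```

In this model, setting cyan and displaying `AB` puts one colour code followed by two characters on the screen; the second character simply inherits the colour of the first.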

Table II. 'Picture' control characters used in input
Picture character   Meaning
e                   Characters form an arithmetic expression
l                   Logic characters > = <
p                   Punctuation characters . , ?
A                   Alphanumeric (including space)
N                   Numeric digits (i.e. unsigned integer number)
R                   Real number
W                   Word (alphabetic character, no spaces)
X                   Any 'printable' ASCII character
other               Only exactly matching character accepted
The upper case letters may be used in user-defined pictures.
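The 'picture' input routine and the picture characters of Table II might be modelled as below. This is a Python sketch only; `matches` and `read_picture` are hypothetical names, input is taken from a string rather than the keyboard, and backspace handling is left out. The 'e' and 'R' pictures are not implemented because they depend on more than one character at a time.

```python
def matches(picture, ch):
    """Test one character against a single picture character (Table II)."""
    if picture in ("e", "R"):
        # Expressions and real numbers need a multi-character grammar.
        raise NotImplementedError("picture %r is outside this sketch" % picture)
    if picture == "l":
        return ch in "><="
    if picture == "p":
        return ch in ".,?"
    if picture == "A":
        return ch == " " or (ch.isascii() and ch.isalnum())
    if picture == "N":
        return ch in "0123456789"
    if picture == "W":
        return ch.isascii() and ch.isalpha()
    if picture == "X":
        return " " <= ch <= "~"   # printable ASCII
    return ch == picture          # any other picture: exact match only


def read_picture(text, pos, picture, max_len):
    """Read up to max_len characters matching the picture, starting at pos.

    Returns the accepted string and the position of the first character
    that was not consumed (it 'remains in the buffer').
    """
    end = pos
    while end < len(text) and end - pos < max_len and matches(picture, text[end]):
        end += 1
    return text[pos:end], end
```

For example, `read_picture("12AUTHOR", 0, "N", 2)` accepts the two digits and stops at the `A`, while a picture of a single space can be used to read and discard a run of spaces.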
Table III. Colour picture combinations for input
[The column layout of this table is not recoverable. Surviving entries: Item name; item name; All three can occur; End of qualifier; ) 1; : 1; Set partition; (If expression); as defined (see text).]
    When the user types in an item, each field in turn is input using the colour and picture information given in Table III. For instance, at the start of an item the current colour is set to yellow, the picture character is set to 'N' and the length to 2. If the user types in one or two numeric digits these are echoed in yellow, and the resulting 'level' number is stored internally. If no digit is input as the first character, the character remains in the buffer and the level number is set to zero. The process is then extended to the next part of the structure, which is the 'item name'. The alphabetic characters of this are echoed in cyan, until a non-alphabetic character is reached. This process is repeated until the syntactic possibilities are exhausted and the current character in the input buffer is (hopefully) a (return).
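The first two steps of this process, the level number and the item name, can be sketched in Python. The function names `read_run` and `read_level_and_name` are hypothetical, the whole line is assumed to be available as a string, and the echo is returned as a list of (colour, text) pairs instead of being written to the screen. The sketch does not check that the level is at most 15.

```python
def read_run(text, pos, accept, max_len):
    """Consume up to max_len characters for which accept(ch) is true."""
    end = pos
    while end < len(text) and end - pos < max_len and accept(text[end]):
        end += 1
    return text[pos:end], end


def read_level_and_name(text):
    """Read the optional level and the item name from the start of an item.

    Returns (level, name, echo, pos), where echo lists what would be
    shown in colour and pos indexes the first unread character.
    """
    echo = []
    # Level: colour yellow, picture 'N', length 2.
    digits, pos = read_run(text, 0, lambda c: c in "0123456789", 2)
    level = int(digits) if digits else 0   # no digit typed: level is zero
    if digits:
        echo.append(("yellow", digits))
    # Spaces terminate a unit and are not echoed.
    _, pos = read_run(text, pos, lambda c: c == " ", len(text))
    # Item name: colour cyan, alphabetic, up to 16 characters.
    name, pos = read_run(text, pos, lambda c: c.isascii() and c.isalpha(), 16)
    if name:
        echo.append(("cyan", name))
    return level, name, echo, pos
```

Applied to `1 AUTHOR = SMITH`, the sketch yields level 1 and the name `AUTHOR`, and leaves the rest of the line for the later fields.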
    The above simplified account needs to be extended in a number of ways. The problems to be considered are as follows.

The role of spaces
Except within the 'item value', spaces are treated as terminators to the current syntactic unit. They are not echoed onto the screen, but the places where they would most naturally occur are occupied by the teletext colour characters, and hence appear as spaces to the user.

Correcting errors
Within any group of characters it is possible to use the backspace to delete backwards down the string until no relevant character is left. A further backspace 'deletes' the input line, scrolling it up the screen. The BBC Microcomputer has an in-built editor using cursor control and copy keys and this can then be used to re-input. The input from the use of the copy key will contain teletext colour codes, and these are discarded.

Terminal punctuation
Items should be terminated with either a comma, a question mark or a full stop. Unfortunately these characters may also occur within an item value, and the system cannot know whether these characters are part of the white value or yellow punctuation characters until the next character is keyed. Special code is needed to carry out the backtracking to rectify the ambiguity.
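The paper does not give the backtracking code, but the visible behaviour can be illustrated in Python. In this sketch, which is an assumption about the effect and not the actual routine, a punctuation character is first echoed in yellow as a possible terminator and is recoloured white as soon as a further character shows that it belonged to the value. The name `colour_value_keystrokes` is hypothetical.

```python
def colour_value_keystrokes(keys):
    """Colour the characters typed for an item value, one keystroke at a time.

    keys holds the characters typed after the set partition, up to but
    not including (return). Returns a list of (colour, character) pairs.
    """
    echo = []  # mutable [colour, character] pairs, so colours can be revised
    for ch in keys:
        if echo and echo[-1][0] == "yellow":
            # Backtrack: the previous punctuation was part of the value.
            echo[-1][0] = "white"
        colour = "yellow" if ch in ".,?" else "white"
        echo.append([colour, ch])
    return [(colour, ch) for colour, ch in echo]
```

With the input `3.5.` the first full stop ends up white, as part of the value, and only the final full stop stays yellow.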

Choice of colours
To a certain extent the choice of colours is arbitrary. However, there are a number of factors to be considered. The first is colour contrast. Blue and red show up best on a white background, whereas yellow, green, cyan and white show up well on a black background. This is reflected in the monochrome displays generated by the BBC computer, and some users must be assumed not to have colour. As a result it is natural to use blue and red in the system message window and the other colours in the main and input windows. Red is often associated with danger, and needs to be used in moderation. Its use is restricted to indicating errors and the special characters in the middle of an item, which is where there is the greatest likelihood of error on input. The 'unprocessed' text of the value is echoed in white, which is the normal colour for text input on the BBC computer.

    All components, except the item name and the terminal punctuation, are optional, but there are certain restrictions which must be observed. These form the basis of a series of assumptions that the system can make and indicate to the user as he types. By taking advantage of these features the user can minimize his keystrokes - with an expanded coloured text appearing on the screen. The relevant features are as follows.

Alphabetic case
Alphabetic characters are accepted, and echoed, in the case in which they are typed. However the system converts all item names to upper case for internal tokenizing. All coloured items displayed in the user window area have the item names capitalized.

Closing brackets
If a qualifier is present and a 'spurious' character is located, a green ) will be inserted onto the screen and the input character will appear in a different colour, depending on its nature.

Set partitions
The item name defines a set, the item value defines a member of the set, and the symbols >, = and < are used to define a partition of the set. The equality symbol is by far the most commonly used (in a conventional database system it is the only type of relation between the item name and the stored value) and a red = is inserted if the user omits the symbol.

If punctuation is omitted the system inserts a yellow full stop at the end of the item. The effects of these assumptions can be seen in Figure 3.
Figure 3.  Expanding lazy input in MicroCODIL
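The defaults for the set partition and the terminal punctuation can be summarised in a short Python sketch. It is illustrative only: `expand_item` is a hypothetical name, the item is assumed to have been split into its parts already, and qualifiers are left out. The colours are those named in the text (cyan item name, red partition symbol, white value, yellow punctuation).

```python
def expand_item(name, partition="", value="", punctuation=""):
    """Return the coloured echo of an item with the defaults filled in.

    The result is a list of (colour, text) pairs in display order.
    """
    echo = [("cyan", name)]
    if value and not partition:
        partition = "="        # default partition when there is a value
    if partition:
        echo.append(("red", partition))
    if value:
        echo.append(("white", value))
    # Omitted punctuation is completed with a full stop.
    echo.append(("yellow", punctuation or "."))
    return echo
```

A user who supplies only `AUTHOR` and `SMITH` thus sees the name, a red `=`, the value and a yellow full stop, while an item typed in full is echoed unchanged.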
    User errors are indicated in three ways. Unexpected control characters are rejected with an audible beep, as are unexpected characters between the item value and the (return) at the end of the line. (This will normally only occur if an explicit picture restricts the length or composition of the value.) Specific error messages are displayed in the system message window for items with no item name or with a missing value. However, the use of colour will highlight any unexpected interpretation of the input - which will often be due to a miskey which is still syntactically valid, but not what the user intended. Figures 4 and 5 illustrate the effect.
Figure 4.  How an incorrect input is shown in MicroCODIL
In each of these examples the slip is highlighted by the colours in which the input characters are echoed. In Figure 4 the user is in the middle of typing an item name when an unexpected red = and a white value appear. In Figure 5 the interpretation of the characters is meant to switch from an item name to a value, but there is no change of colour to reflect the user's change of intent.

Figure 5.  Correcting an input error in MicroCODIL
If the system detects the 'end' of an item before the user presses the (return) key the item display is completed (with terminal punctuation) and additional keystrokes are echoed in red. The user may then press the (return) key to accept the item, or (delete) to reject the item, and possibly reinput a corrected version with the aid of the BBC computer's cursor 'editing'.

As was mentioned earlier, the use of the teletext mode limits the number of characters per line to 40, and also means that each change of colour takes up a character position on the screen. This effectively limits the use of colour, as frequent changes use up more of the limited screen space. However, the technique should readily translate to other languages and screens which are more generous in handling colour. There is one additional problem which would need to be tackled. In MicroCODIL the item names are all treated as equivalent, and are displayed in the same colour. In other languages it would be very useful to use colour to distinguish between the different types of language tokens. This would be no problem when displaying a file in colour, but makes it impossible to know what colour a candidate token is until it has been completely typed in. This suggests that there should be a standard input colour for such alphabetic tokens, with the colour being 'updated' when sufficient information is available to characterize it.
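The suggestion of a standard input colour that is 'updated' later can be sketched as follows. Everything in this Python fragment is hypothetical: the keyword set, the choice of colours and the name `token_colour` are invented for illustration and do not describe any existing system.

```python
# Hypothetical reserved words of some other language.
KEYWORDS = {"IF", "THEN", "ELSE", "WHILE"}

INPUT_COLOUR = "white"  # standard colour while a token is still being typed


def token_colour(token, complete):
    """Return the colour in which an alphabetic token should be shown.

    While the token is incomplete its type is unknown, so the standard
    input colour is used. Once it is complete it can be characterized
    and the echo recoloured.
    """
    if not complete:
        return INPUT_COLOUR
    return "green" if token.upper() in KEYWORDS else "cyan"
```

Here `WHI` is shown in the standard colour while it is being typed; once terminated, `WHILE` would be recoloured as a keyword and `WHILST` as an ordinary name.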

Using colour to highlight the different structures of a language can greatly add to the readability, and should make it easier for newcomers to computing to understand what is happening. The approach also makes it possible for the computer to signal the way it has interpreted a particular piece of text without having to interrupt the screen display with an explicit warning or error message.

  1. Computer Concepts, 'Wordwise Manual', 1982. (This paper was drafted using the Wordwise package.)
  2. C. F. Reynolds, 'MicroCODIL as an I.T. teaching tool', University Computing, 6, 71-75, (1984).
  3. C. F. Reynolds, 'A microcomputer package for demonstrating information processing concepts', J. Microcomputer Applications, 8, 1-14, (1985).
  4. C. F. Reynolds, MicroCODIL Manual, 1986. The software, demonstration applications, and manual are available from CODIL Language Systems, 33 Buckingham Road, Tring, Herts., U.K.
  5. C. F. Reynolds, 'CODIL, part 1, the importance of flexibility', Computer Journal, 14, 217-220, (1971).
  6. C. F. Reynolds, 'CODIL, part 2, the CODIL language and its interpreter', Computer Journal, 14, 327-332, (1971).
  7. C. F. Reynolds, 'CODIL as an information processing language for university use', IUCC Bulletin, 3, 56-59, (1981).
  8. J. Coll, BBC Microcomputer User Guide, British Broadcasting Corporation, 1982.
