Friday, 27 May 2011

Reprint: Designing for the Pragmatic User, 1974

Proceedings, The European Computing Congress, pp 991-1006, 1974

Designing an interactive language for the pragmatic user
Department of Computer Science Brunel University, Uxbridge.
The act of programming involves the precise definition and subsequent implementation of an abstract model of an application. However a very large number of people see the world in concrete rather than abstract terms. Any data processing problem that such a pragmatic user might wish to carry out for himself is almost certainly poorly defined (in programming terms) and is often open-ended. Even if the problem were well defined the user would normally be unable to map it into a suitable abstract model without considerable professional help. For these reasons conventional programming languages are totally inapplicable for this class of user.
This paper shows how, by dramatically discarding those aspects of language design that involve abstraction, it is possible to define a language system that is both extremely simple and general enough to handle poorly defined and open-ended problems. Details are given of the implementation of the CODIL language.

1. Why is Data Processing so Difficult?
If we look at the current state of interactive computer systems we find many different simple languages for numerical work. Almost anyone with a numerical problem which they understand and can solve themselves is capable, with the minimum of assistance, of running the problem on a computer. The same is not true of non-numerical work, in that only a small minority of people would, with a similar amount of tuition and assistance, be able to computerise their day-to-day information processing problems.
Why do we take it for granted that a laboratory assistant will use Basic to analyse his experimental measurements while we recoil in horror at the idea of a sales clerk writing his own invoicing program? Before we can design simple easy-to-use languages for non-numerical information processing problems it is essential to understand why, in computer terms, such problems appear to be so much harder than numerical problems.
This paper looks at the difficulties of designing a language suitable for pragmatic users with poorly defined non-numerical problems which they handle in purely concrete terms. It shows how by ruthlessly stripping information processing to its bare essentials it is possible to get a system which is simple enough for the proposed user and yet general enough to handle open-ended problems. Finally it looks at the design of the CODIL language (1, 2) to show how such a system has been implemented and how many of the facilities left out of the design process may be reintroduced in a modified form without disturbing the simplicity or generality of the system as a whole.

2. A Thumbnail Sketch of our User.
Before we start to design a suitable language it is essential to get a suitable definition of our user. The importance of this was emphasised by M.V. Wilkes in 1967 (3) when he said:
"From the system designer's point of view, the important thing about multiple access is that we now regard the users as being within the system instead of outside. One of the peripheral devices that can be connected to a computer is now seen to be a console with a human operator working at it, and knowledge of the operator's characteristics is as important to the system designer as a knowledge of the characteristics of, say, a magnetic tape deck."
The pragmatic user we are to consider has, like all other human beings, been processing information automatically from babyhood without any specific education on the subject. He will be able to tackle a vast range of problems such as crossing the road, reading a newspaper, playing games, carrying out his normal employment, etc. The most notable feature of his inbuilt information processing capability is its extreme flexibility and its ability to deal with novel situations. On the other hand he started to learn simple arithmetic at the age of 5 or earlier and after 10 years intensive indoctrination will have left school without, in many cases, achieving a pass in a-level mathematics. Apart from simple weights, measures and prices he never uses any of it. In fact he will probably have a distinct inferiority complex about mathematics and may be unhappy about anything that requires him to use an overt mathematical technique. Even worse he will be mystified by anything which assumes ability to analyse a problem in abstract terms.
His potential problems will involve the storage, retrieval and processing of information from any of a very wide range of application areas. However the chief difficulty is that he will know nothing of the limitations of conventional programming techniques and will expect much of the flexibility and open-endedness of the only information processing equipment (i.e. people) of which he has experience. This means that if he is to use computers constructively the language he uses must not only be very simple but also give him the degree of flexibility and open-endedness he will expect.
How well is this category of user catered for?
If we look at the recent review on man-computer dialogues by J. Martin (4) we find that virtually all existing systems fall into a small number of categories:
(1) Data collection systems, where the user answers questions posed by the system.
(2) Question answering systems where the user interrogates an extant data base.
(3) Systems where it is assumed that the user has some knowledge of mathematics and/or programming.
(4) Combinations of the above.
There is an almost complete absence of languages in which a pragmatic user can set up and run his own problems from scratch. The aim of this paper is to show how it can be done.

3. Simplicity versus Generality?
Obviously any language designed for the pragmatic user must be very simple and very easily learnt. It must be robust in the extreme and produce meaningful diagnostics when provoked. A beginner must be able to get results quickly so as not to be discouraged and the number of system facilities that have to be learnt must be small.
On the other hand our user is going to have open-ended problems, which will often be poorly formulated (in conventional programming terms), and which may change dynamically with time. These may cover a wide range of non-numerical information processing applications and may involve some mathematics although this will not be the system's primary purpose. Quite clearly a system that allows the user considerable flexibility in some areas must not leave him hanging in mid air if he tries to wander over the edge of the system's capabilities. This means that the limitations of the system must be readily apparent to the user so that he does not try anything that is impossible.
This leads to an apparent impasse. The user requires extreme simplicity on one hand and considerable generality on the other. However if we start from a language like COBOL there seems little possibility of designing either a data division simple enough for our user or one powerful enough to be able to handle dynamically changing data structures without recompiling programs or invoking other overheads. The possibility of simplifying and extending the facilities simultaneously is so remote as to be in the realms of science fiction.
The situation is no better if we start from a language such as Algol 68. While this allows more 'general' definition of structures its highly abstract approach makes it totally unsuitable for the non-mathematical user. Again interactive languages of the Basic or Joss types are simple enough for our user to carry out numerical calculations on arrays of numbers (if he is sufficiently numerate) but have nothing to help him with essentially non-numeric problems of the open-ended kind being considered.
In all these cases there is apparently a choice between simplicity and a wide range of facilities: because simplicity and generality lie at opposite ends of a spectrum, one cannot have both at once. This is true only if you consider that a system becomes more general every time you add a new facility. On this basis a system increases its generality as the number of things it explicitly allows you to do increases, and this approach is clearly enshrined in most manufacturers' operating systems. On the other hand there is no doubt that the bigger and more complex the operating system, the more limitations there are of the type "facility A may not be used in conjunction with facilities B, C and D", etc. This suggests that the application of lateral thinking might be useful, and that a system becomes more general as you decrease the number of things it implicitly prevents you from doing.

4. What shall we throw away?
In designing a system for our user we want to maximise both simplicity and generality. To achieve this goal we must decide which 'facilities' of conventional systems can be discarded in order to increase both simplicity and generality. The basic problem is where to wield the axe and how drastic should the paring be. The reader is warned that in the following paragraphs the pruning is quite deliberately carried to the limit and many fundamental concepts are consigned to the bonfire in the process. The justification for such apparent violence is the straightforward and incontrovertible one that the result is a very simple, very general working system! In addition many of the abandoned facilities can be reintroduced into the simplified framework if they are required.

4.1 Processing Efficiency.
There are a large number of languages designed to handle efficiently problems which are well defined and programmed by professionals, do not change with time, and which involve highly repetitive processing. We are seeking a way to let the pragmatic user handle poorly defined open-ended problems himself and until this capability is provided it is irrelevant to even consider "the efficiency of processing".

4.2 No Hardware Dependent Features
For obvious reasons any facilities which are oriented towards a specific processor are not acceptable. The user should not be aware of storage devices as such but simply have access to a virtual memory containing information of interest to him. In addition the system should work equally well in either online or batch mode so that the user may freely switch between them if appropriate. This simplifies the system from the user's point of view in that there is only one set of rules to learn and generalizes the system in that it may be used interchangeably over a range of machine environments.

4.3 Minimise the use of Abstract Concepts.
As our user is not a mathematician he will not want to use the more abstract mathematical concepts that would be used by more numerate users, and there is no need for the proposed system to provide these. However he may also find that certain concepts that underlie existing programming languages are difficult to understand in the context of his problem. This is because he is not used to interpreting his own information processing activities in terms of an abstract model and for much of the time he considers individual cases rather than generalizations. This is in complete contrast to the mathematician who is trained to make such abstractions and who often insists that they are made. For instance a statistician will endeavour to ensure that his information is well formed before analysis and this may lead to results which seem bizarre to the non-mathematician. For example, hospital doctors recently protested about a form which, among other things, asked them to fill in the disease the patient was suffering from. They pointed out, quite correctly, that patients often suffer from several diseases at once. Obviously we do not want to force our user to standardise his information into a mathematically precise framework if:
(a) he doesn't know how to do it
(b) it won't fit into one because the information is not well-formed
(c) it will fit but the requirements may change with time
Clearly the ability to handle poorly structured data will help the pragmatic user and it extends the generality of the system by allowing a wider range of information to be manipulated. In fact to continue our ruthless pruning policy we will eliminate any facility that may only be used on well formed information. One of the victims of this action will be the concept of an array.

4.4 Eliminate Formal Record Structures.
If our user wants to handle poorly structured data there will clearly be difficulties in defining conventional record structures using techniques analogous to a COBOL data description. From the point of view of our user such a data description means something else to learn. From the point of view of open-endedness a record description not only defines the structure of the information to be handled but also limits the system so as to prevent it from correctly handling information that does not precisely conform to the predefined standard. (Obviously this type of restriction is essential if our user is merely a "data monkey" shovelling information into a large commercial system designed and programmed by computer scientists to carry out a particular job at maximum cost/effectiveness. However this is not the type of user environment we are considering.)

4.5 Throw out Arithmetic Expressions
While we are in a masochistic mood let us throw out arithmetic expressions because our user will not want to use them very often. This is done in order to emphasise the minor role that such a facility will play in a system designed principally for non-numerical information. Under no circumstances must we allow the problems of such expressions to prejudice the overall design requirements for the simplest possible system. It will be shown later that it is very easy to reintroduce expressions but the next step in the pruning process is difficult to make unless expressions are temporarily removed from the scene.

4.6 Why make a distinction between program and data?
A mathematician will find it very easy to distinguish between the concepts of program and data. He is, for instance, extremely unlikely to confuse an algorithm for calculating factorials with the positive integers on which the algorithm operates. However as we move to less well defined non-numerical problem areas the distinction often becomes blurred. For instance, if I say "Brunel University is at Uxbridge" you interpret this as an instruction if you had asked me how to get to this conference. However the grammar of the sentence in no way corresponds to that of an explicit command.
The difficulty is best illustrated by the problem of sales contracts. Now a sales contract is an agreement between the customer and the vendor detailing the conditions under which goods and money change hands. It will contain clauses beginning with phrases such as "If the Vendor fails to ..." and it is clearly a verbal statement of the rules covering the transaction. If we insist that information can always be partitioned into program and data it would, at first sight, seem obvious that a contract is an algorithm. Unfortunately no existing commercial data processing language allows this to be done. Instead considerable clerical effort is expended, in many organisations at least, to convert as many of the contracts as possible into a small number of standard data formats. At the same time programmers and systems analysts may be struggling with very complex procedures, riddled with continually changing exception routines, in order to reconstruct the algorithm implied by the data. This scrambling/unscrambling process is taken for granted simply because there is no alternative within the framework of a conventional data processing language, which insists that you make a distinction between program and data.
If we eliminate the distinction between program and data our user has even less to learn about and he can also handle open-ended problems where distinctions of this kind would be a handicap.

5. What is left?
After this concerted attack it is important to make sure that we have not thrown out the baby with the bath water. By discarding tools such as the array and the arithmetic expression we have made it harder to model an application in abstract terms but far easier to handle open-ended problems that are not readily modelled. Unattractive as such an unsophisticated approach may seem to someone with a mathematical background, it is exactly what is required for our target user. He will find this basic lack of abstraction a great help in allowing him to transfer his problem to the computer in pragmatic terms.
However he can only do this if we have left him sufficient tools. So what is left? In fact there are three important facilities that we have not discarded. The first is the ability to store items of information in such a way that the relations between them are not lost. The second is the ability to make logical deductions and the third is the ability to initiate system functions to control the input, output and storage of the information.
In the next section the basic design of the CODIL language is used to show how a simple, flexible system can be derived from this foundation. Once this has been done Section 7 shows how some of the facilities discarded earlier may be reintroduced in the new framework.

6. The Basic Design of the CODIL Language.
As there have been a number of publications describing various aspects of the CODIL language (1, 2, 5) the following descriptions are intended to be no more than an outline of the main features.

6.1 Data Items, Statements and Files.
The building blocks of the CODIL system are called data items. The commonest form of a data item is "data name = value" but in general each item may represent either a set (i.e. data name only), an element of a set, or a simple partition of a set. Thus all the following are valid data items:
CITY = LONDON      element of set
AGE = 27           element of set
YEAR > 1963        partition of set
SOLVENT ANY        all members of set
UNIVERSITY         the whole set
A statement is simply a list of data items in any order. For instance the following is a valid statement:
In the same way a file is simply a list of statements in any order. However the order of items within statements and statements within files may be chosen in order to minimise redundancy. This means that items common to several statements need only be input, stored and output a minimum number of times, and in order to show the structure files are normally printed in a tree-like format, as in the example overleaf, although there is nothing in the processor to require the items to be held as a tree.
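The layout described above can be illustrated with a short modern sketch. It is not taken from the CODIL implementation: the Python representation, the names `Item`, `tree_lines` and `example_file`, and the four-space indentation are all assumptions made for illustration. It shows how items shared by the start of consecutive statements might be printed only once, giving a tree-like listing.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Item:
    """One data item: a set name with an optional relation and value (hypothetical model)."""
    name: str
    relation: Optional[str] = None  # e.g. "=", ">", "ANY"; None means the whole set
    value: Optional[str] = None

    def __str__(self) -> str:
        parts = (self.name, self.relation, self.value)
        return " ".join(p for p in parts if p is not None)


def tree_lines(statements: List[List[Item]]) -> List[str]:
    """List a file, printing a leading run of items shared with the previous statement only once."""
    lines: List[str] = []
    previous: List[Item] = []
    for statement in statements:
        # Count how many leading items repeat the previous statement.
        shared = 0
        while (shared < min(len(statement), len(previous))
               and statement[shared] == previous[shared]):
            shared += 1
        # Print only the items that differ, indented by their position.
        for depth in range(shared, len(statement)):
            lines.append("    " * depth + str(statement[depth]))
        previous = statement
    return lines


# A file is a list of statements; a statement is a list of items.
example_file = [
    [Item("UNIVERSITY", "=", "BRUNEL"), Item("TOWN", "=", "UXBRIDGE")],
    [Item("UNIVERSITY", "=", "BRUNEL"), Item("CONFERENCE", "=", "EUROCOMP 74")],
]
print("\n".join(tree_lines(example_file)))
```

Here the common item UNIVERSITY = BRUNEL appears once, with the two differing items indented beneath it. As the text notes, this is only a way of displaying the file; nothing requires the items to be held as a tree.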

6.2 Simple Deductions
The CODIL interpreter works by comparing statements. For instance the statement "UNIVERSITY = BRUNEL; TOWN = UXBRIDGE." can be combined with "CONFERENCE = EUROCOMP 74; UNIVERSITY = BRUNEL." to give "CONFERENCE = EUROCOMP 74; UNIVERSITY = BRUNEL; TOWN = UXBRIDGE." The process centres round a stack called the "Facts", which is loaded on a last-in-first-out basis but which is normally addressed associatively by data name. Statements held in the Facts are compared with other statements, called "Criteria", according to a very simple algorithm. If the Criteria item is not the end of a statement it is compared with the Facts, and only if it is true is the next item in the statement selected. If a terminal Criteria item is selected it is by definition true and it will be automatically added to the Facts unless its name corresponds to a system function or a file name, in which case it is obeyed.
This simple deductive procedure has the advantage that it is very easy to explain and does not require the user to make a rigid distinction between program and data. Any file may be validly read as either Facts or Criteria and it is sometimes useful to read a single file as both simultaneously (5). Because the Facts are addressed associatively there is no need for items to be in any particular order, and multiple-hit and no-hit situations are automatically handled.
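The comparison of Criteria with Facts can be sketched in a few lines of Python. This is an interpretation of the description above, not the interpreter's actual code. The function name `apply_criteria` and the tuple representation are hypothetical, only "name = value" items are handled, system functions and file names are left out, and treating a missing item as simply abandoning the statement is an assumption.

```python
from typing import List, Tuple

Item = Tuple[str, str]  # (data name, value), read as "name = value"


def apply_criteria(facts: List[Item], criteria: List[Item]) -> bool:
    """Test one Criteria statement against the Facts stack.

    Each non-terminal item must be found in the Facts before the next
    item is selected. If the terminal item is reached it is added to the
    Facts. Returns True when the terminal item was added.
    """
    if not criteria:
        return False
    *conditions, terminal = criteria
    for condition in conditions:
        # Associative lookup: position in the stack does not matter.
        if condition not in facts:
            return False  # assumption: a miss abandons the statement
    facts.append(terminal)  # last in, so it sits on top of the stack
    return True


facts = [("CONFERENCE", "EUROCOMP 74"), ("UNIVERSITY", "BRUNEL")]
rule = [("UNIVERSITY", "BRUNEL"), ("TOWN", "UXBRIDGE")]
added = apply_criteria(facts, rule)
print(added, facts)
```

With the two statements quoted in the text, the non-terminal item UNIVERSITY = BRUNEL is found in the Facts, so the terminal item TOWN = UXBRIDGE is added, giving the combined statement.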

6.3 System Functions.
The number of functions provided with the language is quite deliberately small in order to keep the system simple. The principal ones are listed in Table 1 and it should be noted that there are no equivalents to "GO TO" or "MOVE" and that, at the basic level at least, "IF" has been banished. In addition the OBEY function has been modified so that a simple file name is also interpreted as "OBEY = filename" and this allows the user to create his own functions if he wants to. (Currently two CODIL files are provided to the user which allow him to search and sort files with the minimum of trouble.)
CREATE        Used to set up a file to receive items direct from the terminal
DELETE        Used to delete selected items from the Facts
END           Used as a terminator for various activities
INPUT         Reads value of item from terminal
OBEY          Selects file as Criteria
PRINT         Selects and prints information on the terminal
READ          Selects file as Facts
REPEAT        Used to control loops
STORE         Transfers item(s) from Facts to the store file
STORE FILE    Used to set up a file to receive items from the Facts
Table 1. The Principal System Functions in CODIL

7. Reintroducing the Discarded Facilities.
In order to simplify information processing sufficiently to meet the design requirements it was necessary to discard many things which are taken for granted by the average programmer or systems analyst. The aim of this section is to show that, given the basic data structure and deductive processing of the CODIL interpreter, very little is lost, in that it is possible to resurrect each concept in a different form which is more applicable to our target user.

7.1 Programs and Data.
While there is no system distinction between files of "program" and files of "data" there is nothing to prevent the user from using files in whatever way he chooses. For instance, if he never OBEYs a file it can be considered to be "data" while if he frequently OBEYs a file but never READs it (except to amend it) it can be considered to be "program". This means that if the user has a problem which he can define in these terms there is nothing in CODIL to stop him.

7.2 Inter-set relationships.
All data names in CODIL are global in that the same name may be used on any file, or direct from the terminal, whenever the same set is intended. This is in marked contrast to a language like COBOL where it is essential to give different names (or qualify all references) when references to the same set occur in different structures. However sometimes it is necessary to define a relation between different sets. For instance the set of CAR OWNERs overlaps with the set of CAR DRIVERs and in certain circumstances this can be recorded by introducing a qualifier "IS" as in the item "CAR OWNER IS = CAR DRIVER".
Such items may occur anywhere on any file and can thus appear in either the Criteria or the Facts. If the item is encountered as a non-terminal Criteria item it is equivalent to "IF CAR OWNER = CAR DRIVER" and if it is a terminal item to "LET CAR OWNER = CAR DRIVER". These last two formats are allowed as an aid to anyone who has computer programming experience but all three are synonymous and the first is preferred as being context independent. Whenever such an item is encountered the Facts are searched to find an item with a data name corresponding to the 'value' of the set relation. If one is located the value is substituted in the original item, the process being repeated as required.
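The substitution step can be sketched as follows. This is an illustrative reading of the paragraph above rather than the CODIL algorithm itself: the function name `resolve_is`, the tuple layout, the step limit and the value SMITH are all assumptions.

```python
from typing import List, Tuple

# (data name, relation, value or set name); relation is "=" or "IS ="
Fact = Tuple[str, str, str]


def resolve_is(item: Fact, facts: List[Fact], limit: int = 20) -> Fact:
    """Replace the set named in an "IS =" item by the value held for it in the Facts.

    The lookup is repeated while the located item is itself an "IS ="
    relation. The limit guards against circular relations (an assumption;
    the paper does not say how these are treated).
    """
    name, relation, target = item
    for _ in range(limit):
        if relation != "IS =":
            break
        # Search from the top of the stack for the most recent matching name.
        hit = next((f for f in reversed(facts) if f[0] == target), None)
        if hit is None:
            break  # no hit: the item is left as it stands
        _, relation, target = hit
    return (name, relation, target)


facts = [("CAR DRIVER", "=", "SMITH")]
resolved = resolve_is(("CAR OWNER", "IS =", "CAR DRIVER"), facts)
print(resolved)
```

Once the item has been reduced to CAR OWNER = SMITH it can be treated like any other item: compared with the Facts if it is a non-terminal Criteria item, or added to them if it is terminal.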

7.3 Arithmetic Expressions
As the inter-set relations are evaluated when they are accessed it is quite natural to use the same framework for arithmetic expressions. For instance the item TOTAL PRICE IS = UNIT PRICE * QUANTITY can appear as either Criteria or Facts and be evaluated as either a conditional expression or to produce a value to be stored or used. At the same time the Facts may well contain the item UNIT PRICE IS = BASIC PRICE - REBATE. However the important thing is that at some stage it must be possible to reduce the item to the standard format of "dataname = value" and there can be no equivalent to the format "if arithmetic expression = arithmetic expression".
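A minimal sketch of this reduction to "dataname = value" is given below. It is an assumption-laden illustration, not the CODIL evaluator: the functions `evaluate` and `operand`, the dictionary form of the Facts, the rule that operators are surrounded by spaces, and the numeric values are all invented for the example.

```python
import re
from typing import Dict, Tuple

# Facts addressed by data name: name -> (relation, text)
Facts = Dict[str, Tuple[str, str]]


def operand(token: str, facts: Facts) -> float:
    """Value of one operand: a data name looked up in the Facts, or a literal number."""
    token = token.strip()
    if token in facts:
        relation, text = facts[token]
        # An item held as "IS =" is itself an expression and is evaluated on access.
        return evaluate(text, facts) if relation == "IS =" else float(text)
    return float(token)


def evaluate(expression: str, facts: Facts) -> float:
    """Evaluate operands joined by + - * / (each operator surrounded by spaces)."""
    tokens = re.split(r"\s+([-+*/])\s+", expression.strip())
    values = [operand(tok, facts) for tok in tokens[0::2]]
    operators = tokens[1::2]
    # Fold * and / into the current term so that they bind tighter than + and -.
    terms = [values[0]]
    signs = []
    for op, val in zip(operators, values[1:]):
        if op == "*":
            terms[-1] *= val
        elif op == "/":
            terms[-1] /= val
        else:
            signs.append(op)
            terms.append(val)
    total = terms[0]
    for sign, term in zip(signs, terms[1:]):
        total = total + term if sign == "+" else total - term
    return total


facts: Facts = {
    "BASIC PRICE": ("=", "10"),
    "REBATE": ("=", "2"),
    "QUANTITY": ("=", "3"),
    "UNIT PRICE": ("IS =", "BASIC PRICE - REBATE"),
}
total_price = evaluate("UNIT PRICE * QUANTITY", facts)
print("TOTAL PRICE =", total_price)
```

The result can then be held as the ordinary item TOTAL PRICE = 24.0. In this sketch a name that is neither in the Facts nor a number raises a `ValueError`; how CODIL itself reports such a case is not stated in the paper.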

7.4 Record Structures.
Just because it is unnecessary to define a record structure in CODIL there is no reason why a file shall not contain only statements with a standard ordering of identically named data items. When this happens the user may take advantage of this to produce regular tabulations using the print functions, or may compress the input to a list of values and separators by using the input function.

7.5 Other Possible Features.
Recent work on CODIL has shown that it is possible to implement many conventional high level language facilities within the framework of the interpreter. Features recently introduced on an experimental basis include the indexing of multiply valued items, an ELSE qualifier and a CONTAINS relation. One difficulty is to know how far such features should be introduced. For instance, if a user's problem is sufficiently well structured for him to want to use array structures it is likely that a more mathematically orientated language would be appropriate. However it is impossible to determine the break even point without extensive trials. The other difficulty is that many such extra facilities are context dependent. For instance if the interpreter compares "TITLE CONTAINS COMPUTER" with "TITLE = COMPUTERS AND THOUGHT" it is obvious that the result of the comparison is "TRUE". On the other hand it is not obvious what action the interpreter should take if it encounters the item in a different context. This arises because, for these facilities, it is not meaningful to eliminate the distinction between program and data, and any difficulty can be avoided by restricting their use to an appropriate context.
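The context dependence of CONTAINS can be seen in a small sketch. The function `compare` and the tuple layout are hypothetical, and treating CONTAINS as a plain substring test is an assumption based only on the example above.

```python
from typing import Tuple

Item = Tuple[str, str, str]  # (data name, relation, text)


def compare(criterion: Item, fact: Item) -> bool:
    """Compare one Criteria item with one Facts item that holds a plain value."""
    c_name, c_relation, c_text = criterion
    f_name, f_relation, f_text = fact
    if c_name != f_name or f_relation != "=":
        return False
    if c_relation == "=":
        return c_text == f_text
    if c_relation == "CONTAINS":
        return c_text in f_text  # assumption: simple substring match
    raise ValueError("relation not handled in this sketch: " + c_relation)


hit = compare(("TITLE", "CONTAINS", "COMPUTER"),
              ("TITLE", "=", "COMPUTERS AND THOUGHT"))
print(hit)
```

The comparison is only defined when the CONTAINS item is on the Criteria side. The sketch returns False if such an item turns up in the Facts, which is one possible choice; the paper leaves that case open.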

7.6 Efficiency Considerations.
CODIL is currently working on an ICL 1903A computer at Brunel University. For convenience of implementation it has been written in a high level language, despite the fact that this plays havoc with the efficiency of the principal activity, which is processing indirectly addressed variable length character strings on a word machine. Even so the interpreter works well in 14K words (with no program overlays) and when used interactively it is difficult to use more than 1-2% of elapsed time. However all files have to be addressed sequentially (because their structure is not known in advance) and certain sections of the interpreter will need to be written in assembler so that bigger files can be handled.

8. Conclusions.
CODIL meets the needs of the pragmatic user in that the language is very simple and easy to learn and it requires a very low level of abstract understanding on his part. In addition the system's flexibility and ability to cope with dynamically changing situations is closer to the user's view of the real world than are the predefined abstract models of more conventional systems programming. This ability to handle poorly defined problems extends the generality of the system in one direction but it is not possible to get something for nothing. The increase in the breadth of problems that can be processed is compensated by a decrease in depth, in that it is harder to introduce abstract concepts, especially where these are based on the assumption that the problem is a closed system. However the need for programming aptitude tests suggests that the majority of people are unable to handle the degree of abstraction of conventional computer languages. If we are to design systems which can be used by the masses in everyday situations it is essential that we concentrate on their needs and abilities and do not pander to the sophisticated abstract requirements of professional computer scientists and mathematicians who are already well supplied with a plethora of excellent languages tailor-made for their needs.

(1) C.F. Reynolds, CODIL, Part 1, The importance of flexibility, Computer Journal, vol.14, No.3, August 1971, 217-220.
(2) C.F. Reynolds, CODIL, Part 2, The CODIL language and its interpreter, Computer Journal, vol.14, No.4, November 1971, 327-332.
(3) M.V. Wilkes, The design of multiple-access computer systems, Computer Journal, vol.10, No.1, May 1967, 1-9.
(4) J. Martin, Design of man-computer dialogues, Prentice-Hall, 1973.
(5) C.F. Reynolds, An evolutionary approach to artificial intelligence, Datafair 73, British Computer Society, April 1973, 314-320.

1 comment:

  1. NOTE: This relates to an early version of the interpreter; later versions handle files and system functions in a totally different way.