Proceedings of the Workshop on Computing Skills and Adaptive Systems
Liverpool, March 1978, pp 77-87
A Psychological Approach to Language Design
by Chris Reynolds
Brunel University
ABSTRACT
CODIL is a computer language designed for non-mathematically oriented users with open-ended information processing applications. It originated from observations on how people tackled this type of problem and can be readily described in psychological terms. The hierarchical memory involves structures equivalent to short term and long term memory, together with an adaptive memory with learning properties. Processing is controlled by a simple but powerful decision making routine.
The language has now been developed to an operational level and is currently being tested on a wide range of applications ranging from teaching information retrieval principles to artificial intelligence tasks. A number of these are briefly described.
Introduction
Modern computer technology has grown up around the concepts of the stored program computer. To use a computer the user requires an application program that represents a precisely predefined set of rules delimiting the appropriate universe of discourse. In some cases his problem will be shared with others and there will be a readymade package available. In other cases the user will either have to write his own program or commission someone else to do it for him. This may work well for mathematicians and others who work in mathematically oriented disciplines, but it is far from satisfactory in most other areas. To the average individual the information processing problems posed by the world about him are open-ended, ambiguous and incompletely understood. To many such people a problem that is sufficiently clearcut to be dissected out and precisely predefined is no longer of interest as they already understand it. The difference is one of cultural approach and can be related to the distinction between divergent and convergent thinkers. It is probably not too far from the truth to suggest that modern computers are designed by, and for, convergent thinkers who are primarily interested in precisely defining closed problems and who have little tolerance for those who don't know exactly what their problem is or how to define it.
The research described in this paper is specifically oriented towards the person who has open-ended and poorly defined problems. It started from the study of two application areas (1) where conventional computing techniques were unable to cope with the complexities. The first involved the manual provision of management information in an international research and development organisation working in an interdisciplinary field. The second was concerned with the handling of nonstandard sales contracts in a very large sales organisation serving a wide range of customers with a variety of products. While the formal aspects of these applications were very different it was noted that the way that the people concerned tackled the open-ended and poorly defined areas was remarkably similar. Through serendipity it was realised that this common area could be formalised as an application-independent framework that could be implemented on a computer. Initial work on this led to some prototype software and a computer language called CODIL (2,3,4).
This has now been developed into a general purpose information processing system oriented towards mathematically unsophisticated users with open-ended problems. The design considerations have been described elsewhere (5,6). Attempts to describe the language in the formal terminology developed for conventional programming languages have been unsatisfactory simply because these techniques do not allow for the "human" approach to open-endedness and ambiguity. As a result the language is described here in terms of current models of human information processing. Details are also given of the wide range of applications on which the software has been tested.
The Representation of Knowledge
The architecture of the CODIL system revolves around a hierarchical memory structure which contains "items" of information. This is illustrated in Figure 1.
Figure 1
At the top of the hierarchy is a single item referred to as the criteria item. This is highly transitory and represents the current focus of attention. Next comes a list of ephemeral items that describe the current context. This is called the facts and can be related to human short term memory. Below this comes an adaptive memory which has certain "learning" features associated with it, while at the bottom comes the CODIL file store or data base. These files are permanent and may be carried forward from run to run. They may be related to human long term memory.
Chunks of Information
In any human or computer memory system there will be logical chunks of information out of which the knowledge contained in the system is built. The basic building blocks of the CODIL language are called items and they are designed to represent information at the level seen by the typical user. For instance, a clerk working in a sales environment will use items such as CUSTOMER NUMBER = 12345 or TOWN = UXBRIDGE. In each case the item name represents a set while the value associated with it identifies a member of that set. The user may also want to indicate ranges with items such as QUANTITY 5000 or special situations such as DAY NOT WEDNESDAY. In other cases he may not know the value but be able to relate it to the value of other items. This is done by using expressions such as TOTAL PRICE IS = QUANTITY * UNIT PRICE or CAR OWNER IS = CAR DRIVER. In other circumstances the human user will want to compress a complex idea into a single item. CODIL handles this by allowing file names to be used as items. For instance ROOM TEMPERATURE might represent a temperature range and scale while BAD DEBTS might be a list of all customers with bad debts. This level of representation seems to be the most appropriate for the people and applications studied. However, it should be realised that the fundamental architecture of the CODIL interpreter is independent of the exact format of the items and the facilities provided were chosen with a computer terminal in mind.
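The idea of an item as a set name paired with a value can be sketched in a conventional language. The following Python fragment is an illustration only and is not taken from the CODIL interpreter: the class name `Item`, the method `accepts` and the encoding of the relation as the strings "=" and "NOT" are all assumptions made for this sketch, and ranges, expressions and file names used as items are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """An item: a set name, a relation, and a value naming a member of that set."""
    name: str
    relation: str  # assumed encoding for this sketch: "=" or "NOT"
    value: str

    def accepts(self, candidate: str) -> bool:
        """Return True if `candidate` satisfies this item."""
        if self.relation == "=":
            return candidate == self.value
        if self.relation == "NOT":
            return candidate != self.value
        raise ValueError(f"unsupported relation: {self.relation!r}")


# The two examples quoted in the text.
town = Item("TOWN", "=", "UXBRIDGE")
day = Item("DAY", "NOT", "WEDNESDAY")
```

Here `town.accepts("UXBRIDGE")` holds, while `day.accepts("WEDNESDAY")` does not.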
The Criteria Item
At any one instant the CODIL interpreter is examining one item of information in detail. This item may have come from the outside world - in computer terms from the computer terminal and in human terms from the sense organs via the sensory short term memory. Alternatively it may come from the file store/long term memory. This item is the equivalent of a stored program computer's current instruction. Its use will be discussed later.
The Facts/Short-Term Memory
Human short-term memory (STM) is believed to be able to handle about 7 chunks of continually changing information. The "facts" in the CODIL interpreter play a very similar role. They are a list of items, called a statement, that describes the current context of the system as currently observed. The list is associatively addressed (i.e. by name) and contains an arbitrary maximum of 31 items, a number that is over-generous for all normal applications.
In human STM the chunks of information decay from disuse and there is a mechanism to allow this to be simulated within the CODIL interpreter. However, when people are using a computer they want it to have a more reliable and predictable memory than that of a human being and as a result this form of garbage collection is of little use except for modelling human STM behaviour. An alternative scheme involving a form of global/local variable housekeeping has been adopted. This works well especially when processing files, but means that occasionally the user has to explicitly delete unwanted items from the facts.
In addition to the 31 normal items there are two groups of system items which the user can access if so desired, although they are normally invisible to him. The first group represents the input/output buffers and makes it possible for the contents of these to be examined or changed as appropriate. There is no deliberate attempt to relate these buffers to human sensory short-term memory - although they could be used for this purpose in modelling exercises. The second group are all small integers representing various control parameters. Many of these are concerned with mundane matters, such as the width of the line printer being used or whether the system is running in batch or in interactive mode. Others control the dynamics of the adaptive memory and the way that conflicts are resolved when there is more than one item with the same name in the facts. It would seem likely that similar status information is available in human information processing.
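A rough model of the facts may help readers more familiar with conventional languages. The Python sketch below treats the facts as a small store addressed by item name, with the limit of 31 items quoted above. It is a simplification under stated assumptions: the class and method names are hypothetical, each name holds a single value (so the control parameters that resolve conflicts between items of the same name are not modelled), and the system items and the global/local housekeeping are omitted.

```python
class Facts:
    """Simplified model of the facts: items addressed by name, at most 31."""

    MAX_ITEMS = 31  # the arbitrary maximum quoted in the text

    def __init__(self):
        self._items = {}  # item name -> value (one value per name: an assumption)

    def write(self, name, value):
        """Add an item, over-writing any existing item of the same name."""
        if name not in self._items and len(self._items) >= self.MAX_ITEMS:
            raise OverflowError("the facts already hold the maximum number of items")
        self._items[name] = value

    def lookup(self, name):
        """Return the value held for `name`, or None if there is no such item."""
        return self._items.get(name)

    def delete(self, name):
        """Explicitly remove an unwanted item, as the user occasionally must."""
        self._items.pop(name, None)
```

Raising an error when the store is full is a choice made for this sketch; the text does not say how the interpreter behaves at the limit.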
File Store/Long Term Memory
The bulk of knowledge is held in the CODIL file store - which is approximately equivalent to human long-term memory. Each file consists of a list of statements, each of which can be considered to be a copy of information held in the facts. The majority of files are held sequentially, in a compressed form in which common leading items are removed. This leads to the typical indented file format of a CODIL file when it is printed out - see Figure 2. Large files can be indexed and a sort facility is available with statements being retrieved in KEY order. In theory the actual storage structure of such files is irrelevant to the rest of the system as long as the storage and retrieval interface is compatible with the facts.
ASPECT = INFRARED SPECTRA,
  SUBSTITUENT = HYDROXY,
    COMPOUND = PYRIMIDINE,
      REFERENCE = BROWN, HOERGER & MASON, 1955A,
        POSITION = 2.
        POSITION = 4.
      REFERENCE = TANNER, 1956,
        POSITION = 5.
        POSITION = 6.
    COMPOUND = PTERIDINE,
      REFERENCE = MASON, 1957.
ASPECT = DIPOLE MOMENTS,
  SUBSTITUENT = HYDROXY,
    COMPOUND = PYRIDINE,
      REFERENCE = ALBERT & PHILLIPS, 1956,
        POSITION = 2.
        POSITION = 4.
Figure 2. An Extract from a file of chemical information (7).
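The removal of common leading items can be illustrated with a short Python sketch. It assumes, purely for illustration, that each stored line records how many leading items it shares with the previous statement (its depth), the text of one item, and whether that item ends a statement. The function names `compress` and `expand` and this tuple layout are hypothetical and say nothing about the actual CODIL storage structure.

```python
def compress(statements):
    """Drop the leading items each statement shares with the one before it.

    Returns (depth, item, ends_statement) tuples, where depth is the position
    of the item within the full statement.
    """
    lines, previous = [], []
    for statement in statements:
        shared = 0
        # Always keep at least the final item so every statement leaves a line.
        limit = min(len(previous), len(statement) - 1)
        while shared < limit and previous[shared] == statement[shared]:
            shared += 1
        for depth in range(shared, len(statement)):
            lines.append((depth, statement[depth], depth == len(statement) - 1))
        previous = statement
    return lines


def expand(lines):
    """Rebuild full statements from (depth, item, ends_statement) tuples."""
    prefix, statements = [], []
    for depth, item, ends_statement in lines:
        del prefix[depth:]  # discard items below the current depth
        prefix.append(item)
        if ends_statement:
            statements.append(list(prefix))
    return statements


# Three statements built from items that appear in Figure 2.
full = [
    ["ASPECT = INFRARED SPECTRA", "SUBSTITUENT = HYDROXY",
     "COMPOUND = PYRIMIDINE", "REFERENCE = TANNER, 1956", "POSITION = 5"],
    ["ASPECT = INFRARED SPECTRA", "SUBSTITUENT = HYDROXY",
     "COMPOUND = PYRIMIDINE", "REFERENCE = TANNER, 1956", "POSITION = 6"],
    ["ASPECT = INFRARED SPECTRA", "SUBSTITUENT = HYDROXY",
     "COMPOUND = PTERIDINE", "REFERENCE = MASON, 1957"],
]
stored = compress(full)
```

In this example the second statement is stored as the single line POSITION = 6 at depth 4, and `expand(stored)` returns the three original statements.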
The Adaptive Memory
Lying logically between the facts and the file store there is a special file that was originally included to assist in modelling learning processes. It contains a limited number of statements and is used in the same way as any of the files in the file store. The important difference is that each statement has an associated "activity" whose value reflects the way the statement has been used. Statements with the highest activity are held at the front of the file while those with the lowest activity are lost from the end. Facilities include the ability to vary the rates of learning and forgetting, selective reinforcement, and windows to ensure that statements are only accessed if their activity exceeds a given threshold.
Compared with conventional languages such a facility may seem to be an esoteric luxury. In fact it is invaluable for most of the artificial intelligence studies tested, and has been used to process frequency information in other application areas. It has also been used to reorganise the statements in a small "program" to get them into the correct order to produce valid results (8). In addition it is fully compatible with the rest of the CODIL interpreter and is easy to use. It would appear to mimic the learning processes that occur somewhere between human short-term and long-term memory.
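The behaviour of the adaptive memory can be approximated in a few lines of Python. This is a sketch under assumptions and not the interpreter's algorithm: the class name, the multiplicative decay used for forgetting, the additive reinforcement used for learning and the default parameter values are all invented here. Only the ordering by activity, the loss of the least active statements from the end, and the activity threshold come from the description above.

```python
class AdaptiveMemory:
    """A bounded list of statements kept in order of decreasing activity."""

    def __init__(self, capacity, reinforce=1.0, decay=0.9, threshold=0.0):
        self.capacity = capacity    # maximum number of statements retained
        self.reinforce = reinforce  # assumed learning rate (added on each use)
        self.decay = decay          # assumed forgetting rate (applied to all)
        self.threshold = threshold  # the "window": minimum activity for access
        self._entries = []          # [activity, statement] pairs, most active first

    def use(self, statement):
        """Record a use of `statement`, then reorder and drop the overflow."""
        for entry in self._entries:
            entry[0] *= self.decay
        for entry in self._entries:
            if entry[1] == statement:
                entry[0] += self.reinforce
                break
        else:
            self._entries.append([self.reinforce, statement])
        self._entries.sort(key=lambda entry: entry[0], reverse=True)
        del self._entries[self.capacity:]  # least active statements are lost

    def accessible(self):
        """Statements whose activity exceeds the threshold, most active first."""
        return [s for activity, s in self._entries if activity > self.threshold]


memory = AdaptiveMemory(capacity=2)
for statement in ["A", "A", "A", "B", "C"]:
    memory.use(statement)
```

With these settings the frequently used statement "A" stays at the front, "C" displaces the older and less active "B", and raising the threshold to 1.5 would leave only "A" accessible.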
The Decision Making Routine
Processing is controlled by a decision making routine designed to work in as simple and natural a manner as possible as seen by the typical user. It works on the principle of production rules (9,10), and its organisation has been described in evolutionary terms (11). It is driven by items read from the computer terminal (or batch stream) or from files held in the file store. The basic algorithm is given in Figure 3, and is described below.
- The criteria item is tested to see if it is at the end of a statement. If it is, it is treated as a command, otherwise it is treated as a condition.
- Non-terminal items in a statement are treated as conditions when read as the criteria item. The fact statement is searched for any matching item(s) and a comparison is made (referential failures are treated as false). Variations are possible, using item parameters, to allow the comparison of substrings, maximum and minimum values, string length, etc. If either the criteria item or the relevant fact items are expressions these are evaluated (if possible) before a comparison is made. When the criteria item is a file and there is no matching item in the facts the decision making routine is entered recursively, using the file as a source of items all of which are treated as conditions. This ability to use a file as a dynamically changing condition appears to have no equivalent in conventional programming languages.
- Items at the end of a statement are treated as commands. They can be either inbuilt system functions, files, or normal items and each is treated differently.
- The CODIL interpreter contains about a dozen system functions. These are mainly concerned with the control of input, output and internal files. When encountered the appropriate routine is entered, using the value to select the information to be transferred.
- If the item is a file the decision making routine is entered recursively using the file as a new source of criteria items. If the item has a file as a value the second file is read into the facts statement by statement, with the first file acting on each statement in turn.
- All other items are written to the facts. Normally the item will over-write any existing item of the same name, as occurs in an assignment in a more conventional system. However, the use of control parameters makes it possible to vary this, and it is also possible to carry out string operations, etc., in a manner analogous to the processing of conditions.
- If a false condition is encountered, or the end of a statement has been reached and processed, the next criteria item will come from the next statement, unscrambling any recursive entries to the decision making routine as appropriate.
- The next criteria item is obtained from the computer terminal or a file as appropriate and the process is repeated indefinitely.
The lack of any specific IF, assignment and GO TO commands should be noted. The need for explicit looping is much less than with conventional programming languages and when required is considered as the reading of a dummy file containing an appropriate number of null statements.
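The core of the routine described above can be sketched as a small Python function. It is a much reduced illustration and not the CODIL interpreter. Statements are modelled as lists of (name, value) pairs and the facts as a dictionary. Every non-terminal item is a condition that must match the facts, with a missing item counting as false. The terminal item is a command that either names a file to be entered recursively or is written to the facts. System functions, item parameters, expressions, files used as conditions and files used as values are all omitted. The function name `run` and the example item names and values are invented for this sketch.

```python
def run(statements, facts, files=None):
    """Apply each statement in turn, updating and returning `facts`."""
    files = files or {}
    for statement in statements:
        *conditions, command = statement
        # A false condition (or a missing fact) abandons the statement.
        if all(name in facts and facts[name] == value
               for name, value in conditions):
            name, value = command
            if name in files:
                # The item names a file: use it as a new source of items.
                run(files[name], facts, files)
            else:
                # Any other item is written to the facts, over-writing
                # an existing item of the same name.
                facts[name] = value
    return facts


# Hypothetical sales rules; none of these names come from the paper.
rules = [
    [("CUSTOMER TYPE", "TRADE"), ("DISCOUNT", "10")],
    [("CUSTOMER TYPE", "RETAIL"), ("DISCOUNT", "0")],
    [("DISCOUNT", "10"), ("TOWN", "UXBRIDGE"), ("DELIVERY", "FREE")],
]
result = run(rules, {"CUSTOMER TYPE": "TRADE", "TOWN": "UXBRIDGE"})
```

Note that the sketch contains no explicit IF, assignment or GO TO at the level of the rules: the position of an item within its statement decides whether it is tested or stored.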
Applications
The important question about such an approach is what it will do, and trials are being carried out at Brunel University and elsewhere to explore the language's potential. The chief applications being examined are listed below to show how they are being used to answer this question. The interpreter has been written to run on an ICL 1903A computer under the GEORGE 3 operating system. It is reasonably portable and can be used on almost any 1900 computer from a 1901 upwards and is virtually operational on the CDC 6600 at ULCC. The interpreter is written so that it can be used in interactive or batch mode.
Teaching
The first question is how well does the software interface with users. More trials are needed, particularly on a computer system with a fast response time. However, it has been used at Guildford County College of Technology to teach HNC chemists the principles of information handling and this is being extended to include biologists as well.
Information Storage and Retrieval
The next question concerns the ability to store and retrieve information in a variety of ways. Test applications include chemical information (12), biological information (13) and document reference files. The first of these involved the abstracting of part of a graduate level university text book into CODIL and using the resulting poorly structured files to answer questions on specific topics and to produce monographs on selected chemical compounds. In this capacity the language provides a general purpose data base system for research workers with poorly structured information (7). This work is being followed by examining three applications, each currently involving some half a million characters of data.
Computer Usage Statistics
CODIL is designed to handle irregularly structured data and the processing of weekly computer statistics was chosen to test out its performance on a conventional data processing application, picking up data from another system. This involves a peak throughput of about 1000 "transactions" a week. It identifies the student or staff member responsible for each file or job and generates a series of tabular summaries on the facilities used.
Hospital Clinical Data
Following a series of hospital trials (14) at Hillingdon Hospital, CODIL is being used to handle clinical research data for cardiac arrest and cardiac surgery patients (15). This is providing valuable experience on the use of the language in non-trivial open-ended situations. For technical reasons this has had to be done in batch mode but it is hoped to provide suitable terminal facilities later this year.
Genealogy
CODIL is being used to handle biographical and relationship data. The data base currently includes details of some 1300 individuals and the generation of a detailed family tree can be considered as a major parts explosion with the file being used re-entrantly. This is providing valuable experience on the problems of providing adequate file indexing and update facilities within an amorphous file framework.
Artificial Intelligence
A number of artificial intelligence-type applications have been run and this has helped to clarify the common areas between CODIL and psychology. Three areas of the work are described below.
Adaptive Production Systems
Recent work by Waterman (16) on learning to recognise patterns in a string of characters has been repeated (10,17). A small CODIL file has been set up that uses production system principles to generate production systems with a wide range of learning characteristics which have not yet been thoroughly explored. This is an application that would not be possible without the built in adaptive memory.
Simple Sentence Recognition
One of the limitations of CODIL when considered as a model for human information processing is that, while it may be easy to read and write, it is not English. This suggested that it might be worth asking how easy it would be to set up a natural language processing system in CODIL. Initial trials have shown that CODIL can support very useful dictionary facilities (18) but the almost complete lack of explicit structural syntax represents a significant problem that needs further study.
Heuristic Problem Solving
CODIL was designed for people with open-ended but logically straightforward information processing problems and it was decided to see how easy (or difficult) it was to use the language in more sophisticated ways. It was decided to write a small heuristic/deductive problem solver by making appropriate use of the facts/short term memory, the adaptive learning memory, and the decision making units. The result was extremely successful (8). A wide range of problems have been solved, including 15 consecutive New Scientist Tantalizers. Compared with other published problem solvers it appears to be much smaller and faster and to have a far more convenient user interface.
Other Applications
A wide range of small scale tests on other applications have been carried out. These include several tests of commercial data processing problems including a very small subset of the sales accounting tasks that helped to suggest the approach in the first place. In addition some studies have been carried out using CODIL to support relational data base facilities.
Conclusions
This research started out with the aim of finding a user-oriented technique for handling open-ended problems. This aim has been achieved in as far as there is now operational, and reasonably portable, software in use at a number of sites and it is hoped to extend field testing still further in the near future. In addition there has been some valuable spin-off in terms of psychological modelling and artificial intelligence research.
Acknowledgements
I would like to thank my colleagues whose comments have made an invaluable contribution to my work, and also the many students and others who have used the CODIL interpreter. In addition the S.R.C. have made a valuable contribution toward the problem of getting more portable software and organising field trials, etc.
References
- Reynolds, C.F., "A new look at the problem of open-ended applications", Proc. Pragmatic Programming and Sensible Software, Feb. 1978, pp. 239-251.
- Reynolds, C.F., "An information language for man-computer interaction", Man-computer Interaction, IEE Conference Pub. No. 68, 1970, pp. 211-217.
- Reynolds, C.F., "CODIL Part 1, The importance of flexibility", Comp. J., 1971, 14, pp. 217-220.
- Reynolds, C.F., "CODIL Part 2, The CODIL language and its interpreter", Comp. J., 1971, 14, pp. 327-332.
- Reynolds, C.F., "Designing an interactive language for the pragmatic user", Proc. European Comp. Conference, 1974, pp. 991-10006.
- Reynolds, C.F., "Matching the computer language to the user", Brunel University Comp. Science Dept. Report CSTR/14, 1977.
- Reynolds, C.F., "A data base system for the individual research worker", Proc. Int. Sym. Tech. Selective Dissemination of Information, I.E.E.E. 1976, pp. 1-8.
- Reynolds, C.F., "TANTALIZE, a conversational problem solver written in CODIL", Second conference on recent topics in cybernetics, 1975.
- Davis, R. and King, J., "An overview of production systems", Machine intelligence, 8, pp. 300-332.
- Reynolds, C.F., "The design and use of a computer language based on production system principles", Brunel University Comp. Science Dept. Report CSTR/15, 1978.
- Reynolds, C.F., "An evolutionary approach to artificial intelligence", Datafair 73, 314-320, 1973.
- Reynolds, C.F., "Processing poorly structured chemical information", CODIL Technical Note No. 2, Brunel University, 1976.
- Reynolds, C.F., "Handling cave fauna records on a computer", Trans. Cave Research Group of Great Britain, 1971, 3, pp. 160-165.
- Neal, L.R., "The Computer handling of medical information for research purposes", Proceedings, Medinfo 77, 1977.
- Reynolds, C.F., Sutton, G. and Shackel, M. "Using CODIL to handle poorly structured clinical information", Medical Informatics Europe (October 1978).
- Waterman, D.A., "Adaptive production systems", Complex Information Processing Working Paper No. 285, Psychology Dept., Carnegie-Mellon University, 1974.
- CODIL Newsletter No. 5, Brunel University, 1976.
- Reynolds, C.F., "A simple semantic sentence analyser", CODIL Technical Note No. 5, Brunel University, 1976.
- Reynolds, C.F. and Omrani, D., "Formalism or Flexibility?", Int. Conference on the management of data (June 1978).