TANTALIZE

A Conversational Problem Solver written in CODIL

C. F. Reyno1ds

Department of Computer Science Brune1 University

Presented at the Second One Day Conference on Recent Topics in Cybernetics held by The Cybernetics Society at Chelsea College on 1st September 1975.

ABSTRACT

CODIL is an unconventional computer language designed for people who know little or nothing about computers or mathematics. It is particularly suited for open-ended or poorly defined information processing problems. The language interpreter models certain aspects of human decision making, short term memory and learning.

TANTALIZE is an interactive/batch heuristic problem solving aid that has been used to solve a wide range of formal puzzles. It offers a choice of heuristic constraint processing techniques in an easy to use and readily extendible format.

Taken together CODIL and TANTALIZE provide a good man-computer interface, the ability to process non-trivial real life applications and highly formal problems and a working mode1 of human information processing. This combination of facilities represents a significant advance in Artificial Intelligence research.

1	Introduction	[NOTE: The computer listings which formed part of the original paper were, in places, of very poor quality and have been manually transcribed *While the text has remained unchanged, apart from the contents and linkages, the opportunity has been taken to add a few notes, such as this one.* *The CODIL Interpreter used was the one developed for the ICL 1903 in the 1970s and included some feature that were not included in later versions of the software.]*
2	CODIL
2.1	The Basic Software
2.2	The Man-computer Interface
2.3	Real-life Applications
2.4	The Information Structure of CODIL
2.5	The Decision Making Unit
2.6	The Facts/Short Term Memory
2.7	The CODIL Learning Capability
3	TANTALIZE
3.1	The Basic Software
3.2	An Old Hindu Problem
3.3	School Colours
3.4	Heuristic Search Facilities
3.4.1	Aims
3.4.2	In depth Searching
3.4.3	Shortest Path or In-breadth Searching
3.4.4	Distance Functions
3.4.5	Goal-Directed Searching
3.4.6	The Pigeonhole Technique
3.4.7	A Comparison of Searching Techniques
3.5	Restricting the Problem Space
3.6	Simplifying the Problem Description
3.7	Extending the Power of the Problem Solver
3.8	The Problems Solved
4	Conclusions
4.1	What has been achieved
4.2	Comparison with Other Systems
4.3	The Future
	References
	Further Reading

1 INTRODUCTION

The most noticeable feature of human information processing is its flexibility over a highly diverse range of applications. The most noticeable feature of artificial intelligence research to date is its lack of success in either providing a general purpose information processing facility or in understanding how human beings may process information. The reason for this is almost certainly the great emphasis that has been put on developing mathematically sophisticated algorithms that operate on comparatively trivial universes. Chess. playing programs provide a good example of this. They operate on a board of a mere 64 squares arranged symmetrically, 32 pieces of two different colours and six different types, and a limited and precise set of moves. The interest arises from the high degree of sophistication needed in designing the algorithms. However, chess playing, while highly enjoyable, is a far from typical human activity, and there could be no grounds for saying that the optimal algorithms for long range planning in such a trivial environment can have any immediate relevance to the forward planning needed by a child learning to cross the road. In the child's case neither the universe nor the rules can be considered to be trivial or unambiguous and the goal must be to find a reasonable solution based on incomplete information. The problem does not have the symmetry and rigidity needed to use clever processing tricks and even if it did, the child is not likely to be sophisticated enough to take advantage of such techniques .

Similar objections apply to one of the frequently quoted practical applications of artificial intel1igence research, HEURISTIC DENDRAL (Buchanan et al, 1969). Its success depends on a sophisticated graph package which prepares candidate chemical structures. Such a package could, given the right interface, be of considerable use to an extremely wide range of chemists. Unfortunately the problems of interfacing the package to handle various types of measurement and many different classes of chemical compound have not been solved. As a result HEURISTIC DENDRAL is confined to one extremely simple class of spectrum and one comparatively simple category of chemical compound. As with chess playing programs, the success has been achieved by applying a highly sophisticated algorithm to an artificially simplified universe, and the result, while of use to a small group of chemists, says nothing about the far more important problem of how a chemist handles chemical information.

[NOTE: Most of the A.I. packages at the time were presented with a high degree of hype, and I choose to comment on HEURISTIC DENDRAL because my training as a chemist meant that I was in a good position to assess how little it really had achieved.]

The approach taken in this paper is in complete contrast to the above. CODIL was designed as a practical language for people with no mathematical or computer knowledge. It provides them with facilities for handling poorly defined and open-ended problems simply because one cannot expect such users to be able to predefine problems in the way needed for more conventional computer languages. The keynote has been simplicity. flexibility and the absolute minimum of mathematical sophistication, (at least as seen by the user).

From the beginning it was felt that the language's flexibility might have some relevance to the problems of human and artificial intelligence, but it is only in the last year or so that this has become obvious. In fact progress has been so rapid in a number of directions that it has been impossible to explore any but the most obvious possibilities, much less to prepare detailed reports for publication. The aim of this paper is to summarise some of the progress relevant to artificial intelligence research and give a more detailed account of one of the most significant developments - a heuristic problem solving aid called TANTALIZE.

A bibliography of the more important pub1ications of CODIL, together with some recent technical reports is listed at the end of this paper.

<	* A Number of Sets describing the Animal Kingdom are created
<	CREATE = ANIMAL: MAMMAL: REPTILE: FISH: BIRD: AMPHIBIAN: INVERTEBRATE: END.
<	CREATE = BIRD: CHICKEN: PEACOCK: OSTRICH: END.
<	CREATE = MAMMAL: MAN: DOG: CAT: HORSE: COW: ELEPHANT: MONKEY: PRIMATE: END.
<	CREATE = APE: CHIMPANZEE: GORILLA: END.
<	* "GORILLA" is tested to see which set it belongs to
<	GORILLA.
<	GORILLA ?
TRUE
<	APE?
TRUE
<	MAMMAL?
FALSE
<	* The Set "PRIMATE" is given members
<	CREATE = PRIMATE: MAN: APE: MONKEY: END.
<	MAMMAL?
TRUE
<	BIRD?
FALSE
<	ANIMAL?
TRUE

Figure 1 - Simple Set Processing

[NOTE: CODIL was originally conceived as the symbolic assembly language of a user-friendly information processor - and the above example is at that level. A similar example - written in the symbolic assembly language of a conventional store program computer would be most unfriendly.]

2 CODIL

2.1 The Basic Software

The current version of the CODIL interpreter is written in Cobol to run under the George 3 operating system on the ICL 1903A computer at Brunel University. It uses about 20k words of core and user files are paged onto a disc file (there are no program overlays). Any user of the computer may access the language via a job control macro called CODIL. At the present time a modified version is being prepared which can easily be adapted to run on any of the 1900 series, with a significant subset being available on machines as small as a 16k 1901.

2.2 The Man-computer Interface

CODIL has been designed to have as simple a man-computer interface as possible and Figure 1 illustrates how a schoolchild might define groups of animals and carry out simple tests on them. Apart from a small number of system commands and a very simple basic syntax each user defines his own language which he uses to carry out further interactions. Alternatively he uses packages such as TANTALIZE or LESSONS, or communal files, and in such cases there will already be an existing vocabulary. The emphasis CODIL places on context rules out the possibility of using anything approaching natural language at the empty interpreter level - initially there is no vocabulary or context to talk about. On the other hand there is no reason why users should not develop their own packages to converse about particular applications in a meaningful approximation to natural language.

2.3 Real-life Applications

CODIL was designed to be at its best handling non-trivial real-life problems where open-endedness and/or uncertainties in the problem definition are important. Figure 2 shows a mini-example of a poorly structured file of chemical data being searched. It has recently been used to handle clinical records for a research project at a local hospital. Obviously the more uncertain the problem the greater is the possibility of ambiguity and inconsistency, but this only reflects the ambiguity and inconsistency of the real world as seen by a participant. If there is no uncertainty and the structure of the application is known in advance there is usually no great difficulty. This is because a system designed for open-ended problems can trivially cope with closed problems - as long as the user does not wish to exploit his knowledge that the problem is closed. Under such circumstances CODIL can be considered as a naive user's Cobol, in that it provides an easy file handling and information processing system that can be used for similar, small scale, problems. A CODIL package to give a relational data base facility (Codd, 1970) is currently being developed by a research student.

<	OBTAIN = CHEMISTRY.
<	CREATE = QUESTION
C<	ASPECT = MOLECULAR ORBITAL CALCULATIONS,
C<	POSITION ANY,
C<	PRINT IS = COMPOUND.
C<	$TABULATE = 15.
C<	PRINT IS = POSITION.
C<	$TABULATE = 20.
C<	PRINT IS = CHARGE.
C<	$TABULATE = 30.
C<	PRINT IS = METHOD.
C<	PRINT.
C<	ASPECT = ULTRAVIOLET SPECTRA,
C<	STORE ALL.
C<	END.
<	STORE FILE = ANSWER.
<	QUESTION = CHEMISTRY.
ACRIDINE 1 -0.03 LONGUET-HIGGINS & COULSON ACRIDINE 2 0.05 LONGUET-HIGGINS & COULSON ACRIDINE 3 -0.01 LONGUET-HIGGINS & COULSON ACRIDINE 4 0.04 LONGUET-HIGGINS & COULSON ACRIDINE 5 -0.03 LONGUET-HIGGINS & COULSON ACRIDINE 10 0.22 LONGUET-HIGGINS & COULSON
<	PRINT ITEM = ANSWER.
FILE = HETEROCYCLIC CHEMISTRY, COMPOUND = ACRIDINE, ASPECT = ULTRAVIOLET SPECTRA, FORMAT = REVIEW, PAGE = 300, PAGE = 303, PAGE = 307. FORMAT = GRAPH, PAGE = 312. DERIVATIVE = CATION, PAGE = 321. DERIVATIVE = AMINO, POSITION = 3, SOLVENT = HYDROCHLORIC ACID, PAGE = 309.
Figure 2 - Searching a file of chemical information [NOTE: In retrospect this is an appalling example - which relates to some of the research I did in my Ph.D. some years earlier and which tries to introduce various features of CODIL without any explanation. ]

2.4 The Information Structure of CODIL

The basic building blocks of CODIL are items such as GORILLA, COMPOUND = ACRIDINE, PKl > 12, etc. These are ANDed together to form statements terminated, on input, by colons or full stop/new ine. A file is simply a list of statements and the file name is an item in its own right. Any file may be used (sometimes simultaneously) as a command, as a condition name (a major extension of the Cobol condition name concept), as data, or as a parameter to any of the file handling commands. There is no reason why a user should not designate some files as "programs" and some "data" and many conventional applications naturally lend themselves to this approach. However the power of a package such as TANTALIZE owes much to the irrelevance of these conventionally essential distinctions to the CODIL interpreter.

Figure 3

At anyone time the information held by the interpreter is not equally accessible or permanent and this is represented in figure 3. At the top of the pyramid is the highly transitory "criteria" item, which represents a window on one of the files or the computer terminal. It is the interpreter's equivalent to the current program instruction of a conventional CPU. Next comes the "facts" statement - a list of up to 31 items which are addressed associatively by name. These items are normally handled on a last in/first out basis and are often read from files. Apart from some system control items (whose names start with a $) these are the only items that can be directly addressed. Next comes the self-organising "memory" - a single file in which there is a limited addressing facility at the statement level. All other files are held in a "data base" and can only be addressed by name. Finally there may be external files held by the George 3 operating system and these have to be OBTAINed into the data base before they can be used. Apart from a limited facility affecting the facts, all references within CODIL are by name, and all addressing, paging files on and off disc, etc., are carried out automatically.

Figure 4

2.5 The Decision Making Unit

How a criteria item is processed depends on its context and the basic algorithm is given in figure 4. If the item terminates a statement it is "obeyed") if not it is compared with the current fact items and a true/false decision is made. This algorithm was chosen because it is easy to demonstrate to mathematically unsophisticated users. No deliberate attempt was made to model human information processes but the idea of automatically comparing identically formatted items of information is more anthropomorphic than the often discussed "programmed computer" model. The decision making unit can be interpreted in terms of simple comparator units linked together in series and parallel and there is no reason why a CODIL-like processing algorithm should not have evolved from successively simpler algorithms. It does not require the development of separate mechanisms for program and data neither of which would be of any use until the other had evolved

2.6 The Facts/Short Term Memory

All directly addressable items of information are held in the facts which) in the current implementation, can hold up to 31 items. This is a completely arbitrary limit and the idiotically low value -(as compared with the amount of work space available with conventional programming languages) was chosen because there was no demand for anything larger: It has been observed that the facts seldom contain more than a dozen items, and the number of items is often closer to six. Even the problem solver, TANTALIZE, is only rarely embarrassed by this limit, although it uses the facts as a stack for local/global variables. This extremely low requirement for directly addressable variables) coupled with the comparatively short lifetimes of items contained therein) are highly reminiscent of human short term memory and it is not unreasonable to draw attention to the similarity between the two. (For anyone wishing to explore the similarity a trivial amendment has been made to the interpreter that leads to the automatic deletion of items from the facts if they are not accessed within a predefined period.)

2.7 The CODIL Learning Capability

A recent addition to the CODIL interpreter is the self-organising file called MEMORY. To the decision making unit it looks exactly like any other file but it has the following additional properties:-

It contains a limited predefined number of statements.

Associated with each statement is a counter containing its "activity" and statements are usually stored in order of decreasing activity.

System variables are used to control the rate of learning, and forgetting, the level of activity needed for recall, and to provide positive or negative reinforcement of recently used statements. By varying the value of these items it is possible to construct a wide range of learning heuristics.

The result is an integrated learning facility within the CODIL system. It was originally devised to help to model learning situations but it can also be used to optimise the order of files, for internal sorting and for diagnostic tests involving frequency counts. Figure 5 illustrates the use of MEMORY to optimise a file for calculating the costs of holding stock. A description of the problem is input in top down order, copied to MEMORY, and the completely general recursive routine, CONTEMPLATE, is written and entered. CONTEMPLATE attempts to execute MEMORY and continues the process until there is no further change in the facts. By suitable reinforcement the statements in MEMORY are ordered on the number of times they have been obeyed and the result is a file in the logically correct order for sequential execution.

<	$SET UP = 15.
<	$MOVEIS NONE.
<	CREATE = STOCK COSTS.
C<	PRINT ITEM = TOTAL COST.
C<	TOTAL COST IS = CAPITAL COST + COST OF ORDERS.
C<	CAPITAL COST IS = VALUE OF GOODS + COST OF STORAGE.
C<	VALUE OF GOODS IS = AVERAGE STOCK * UNIT COST.
C<	COST OF STORAGE IS = AVERAGE STOCK * UNIT COST.
C<	AVERAGE STOCK IS = MINIMUM STOCK + ORDER SIZE/2.
C<	COST OF ORDERS IS = NUMBER OF ORDERS * ORDER SIZE.
C<	NUMBER OF ORDERS IS = ANNUAL DEMAND/ORDER SIZE.
C<	ANNUAL DEMAND IS = MONTHLY DEMAND * 12.
C<	UNIT COST = 25.
C<	UNIT STORE COST = 5.
C<	ORDER COST = 200.
C<	MONTHLY DEMAND = 100.
C<	MINiMUM STOCK = 50.
C<	ORDER SIZE = 200.
C<	END.
<
<	CREATE = COPY: STORE ALL: END.
<	STORE FILE = MEMORY.
<	COPY = STOCK COSTS.
<
<	CREATE = CONTEMPLATE.
C<	$INTEGER IS = $FACTS.
C<	MEMORY.
C<	$ACTIVITY = 1.
C<	$INTEGER IS < $FACTS; CONTEMPLATE.
C<	END.
<
<	CONTEMPLATE.
TOTAL COST = 5700
<
<	PRINT ITEM = MEMORY.
UNIT COST = 25. UNIT STORE COST = 5. ORDER COST = 200. MONTHLY DEMAND = 100. MINIMUM STOCK = 50. ORDER SIZE = 200. AVERAGE STOCK IS = MINIMUM STOCK + ORDER SIZE/2. ANNUAL DEMAND IS = MONTHLY DEMAND * 12. VALUE OF GOODS IS = AVERAGE STOCK * UNIT COST. COST OF STORAGE IS = AVERAGE STOCK * UNIT STORE COST. NUMBER OF ORDERS IS = ANNUAL DEMAND/ORDER SIZE. CAPITAL COST IS = VALUE OF GOODS + COST OF STORAGE. COST OF ORDERS IS = NUMBER OF ORDERS * ORDER COST. TOTAL COST IS = CAPITAL COST + COST OF ORDERS. PRINT ITEM = TOTAL COST.
<

Figure 5

Optimisation of the calculation of stock costs using MEMORY

3. TANTALIZE

3.1 The Basic Software

TANTALIZE is a collection of CODIL files that can be used to solve a wide range of formal puzzles. It is held as a ready formed data base on disc, having been prepared by reading in about 1000 cards. It is designed to be used in interactive mode, in which the user answers questions and types in keywords in order to describe the problem and indicate the type of solution desired. Emphasis has been put on keeping the problem definition process as simple as possible, while offering as wide a range of alternative techniques as possible.

3.2 An Old Hindu Problem

The basic features of TANTALIZE can best be illustrated via examples of how it works. To begin with a very simple problem will be discussed

<	SOLVE: OLD HINDU PROBLEM
TYPE IN A TEXT DESCRIPTION 0F THE PROBLEM, ETC
2<	* While three watchmen were guarding an orchard a thief slipped in
2<	* and stole some apples. On his way out he met the three workmen one
2<	* after the other, and to each in turn he gave half of the apples he
2<	* then had, and two beside. Thus he managed to escape with one apple.
2<	* How many apples had he stolen?
2<
2<	* Taken From "Mathematical Recreations" by Maurice Kraitchik
2<
2<	SINGLE ACTIONS.
2<	IDENTICAL NOT HIS ESCAPE.
2<	ONE SOLUTION.
2<	END.

SOLVE initializes the problem solving process. A test description of the problem is typed in as a comment together with three key words. SINGLE ACTIONS indicates that no action may occur more than once in the solution (i.e. the thief cannot pass the same watchman twice). IDENTICAL NOT HIS ESCAPE shows that all the watchman's actions are equivalent (but not HIS ESCAPE). ONE SOLUTION is used to stop the solver as soon as one of the several equivalent solutions is found.

ACTION = ?
I<	FIRST WATCHMAN & SECOND WATCHMAN & THIRD WATCHMAN & HIS ESCAPE
TYPE IN SHARED FILE
C<	AFTER IS = BEFORE.
C<	BEFORE IS = (AFTER *+ 2) 2.**
C<	END.
TYPE IN A DESCRIPTION OF HIS ESCAPE
C<	AFTER = 1.
C<	END.

Given the initial key words TANTALIZE asks the user to provide the names of the actions. a description of the common action file, and a description of HIS ESCAPE

DESCRIBE ILLEGAL CONDITIONS
C<	END.
TYPE IN GOAL
C<	EVERY ACTION.
C<	END.
WHICH ITEMS DEFINE THE STATE OF THE PROBLEM
ITEM = ?
I<	NONE

Three further questions are then asked. There are no illegal situations to complicate this problem., the goal is to EXECUTE every action (i.e. to pass all watchmen and make his escape) and there is no items of which the system should take special cognisance.

THE FOLLOWING REPRESENTS A SOLUTION OF OLD HINDU PROBLEM
START & HIS ESCAPE & FIRST WATCHMAN & SECOND WATCHMAN & THIRD WATCHMAN
	SOLUTION FOUND = 1.
	PROBLEM TITLE = OLD HINDU PROBLEM.
	ACTION = START,
	ACTION = HIS ESCAPE,
	AFTER = 1,
	ACTION = FIRST WATCHMAN,
	BEFORE = 6,
	ACTION = SECOND WATCHMAN,
	AFTER = 6,
	BEFORE = 16,
	ACTION = THIRD WATCHMAN,
	AFTER = 16,
	BEFORE = 36.
SOLUTION FOUND = 1.
<

The question asking process used about 4 seconds of computer mill time and when complete a compilation process of about 5 seconds is undertaken to organise the information into an appropriate format for the solution process. The final problems solving time took a further 2 seconds including the printing out of the answer.

3.3 School Colours

The next example involves one of the many New Scientist Tantalizers (Hollis) that have been solved, after which TANTALIZE is named. SUPERSOLVE is used because the solution used the self-organising MEMORY whose size has to be predefined.

<	SUPERSOLVE:5: TANTALIZER NO 226.
TYPE IN A TEXT DESCRIPTION 0F THE PROBLEM, ETC
2<	* TANTALIZER No 226 NEW SCIENTIST
2<	* SCHOOL COLOURS
2<	* "Tell Me, Professor Pinhole, which school does your daughter Alice
2<	* go to?"
2<	* "Let me think. Is it the one with the orange hat and the turquoise
2<	* scarf? or with the khaki blazer and orange emblem? or with the pink
2<	* blazer and orange scarf? or with the khaki scarf and pink emblem? or
2<	* with the khaki hat and turquoise emblem? I fear I cannot recollect."
2<	* "Good Heavens, Professor! However many schools are there?"
2<	* "Just four and I have one daughter at each. Bess goes to St
2<	* Gertrude's, Clare wears a turquoise hat and Debbie wears a khaki
2<	* Emblem. St Etheldreda's flaunts a pink scarf, St Faith's and orange
2<	* blazer and St Ida's a pink hat."
2<	* "And whose are those clothes flung down on the floor over there?"
2<	* "The turquoise hat and the khaki blazer belong to different girls.
2<	* As for the turquoise blazer, well, I think you might work out whose
2<	* that is for yourself."
2<	* Martin Hollis
2<	REQUIREMENTS.
2<	END.

The single key word REQUIREMENTS triggers off a modified set of questions.

SELECT = ?.
I<	HATS & SCARVES & BLAZERS & EMBLEMS & DAUGHTERS & SCHOOLS.
TYPE IN HATS
SELECT NAME = ?.
I<	HAT: ORANGE & KHAKI & TURQUOISE & PINK.
TYPE IN SCARVES
SELECT NAME = ?.
I<	SCARF: TURQUOISE & ORANGE & KHAKI & PINK
TYPE IN BLAZERS
SELECT NAME = ?.
I<	BLAZER: KHAKI & PINK & ORANGE & TURQUOISE
TYPE IN EMBLEMS
SELECT NAME = ?.
I<	EMBLEM: ORANGE & PINK & TURQUOISE & KHAKI
TYPE IN DAUGHTERS
SELECT NAME = ?.
I<	DAUGHTER: ALICE & BESS & CLAIRE & DEBBIE
TYPE IN SCHOOLS
SELECT NAME = ?.
I<	SCHOOLS: ST GERTRUDE & ST ETHELDREDA & ST FAITH & ST IDA
TYPE IN REQUIREMENTS
C<	HAT = ORANGE; SCARF = TURQUOISE.
C<	BLAZER = KHAKI; EMBLEM = ORANGE.
C<	BLAZER = PINK; SCARF = ORANGE.
C<	SCARF = KHAKI; EMBLEM = PINK.
C<	HAT = KHAKI; EMBLEM = TURQUOISE.
C<	DAUGHTER = BESS; SCHOOL = ST GERTRUDE.
C<	DAUGHTER = CLAIRE; HAT = TURQUOISE.
C<	DAUGHTER = DEBBIE; EMBLEM = KHAKI.
C<	SCHOOL = ST ETHELDREDA; SCARF = PINK..
C<	SCHOOL = ST FAITH; BLAZER = ORANGE.
C<	SCHOOL = ST IDA; HAT = PINK.
C<	END.
TYPE IN A DESCRIPTION OF ANY ILLEGAL SITUATION
C<	HAT = TURQUOISE; BLAZER = KHAKI.
C<	END.
TYPE IN A DESCRIPTION OF THE GOAL
C<	DAUGHTER (NUMBER) = 4.
C<	END.
THE FOLLOWING REPRESENTS A SOLUTION OF TANTALIZER NO 226 SOLUTION FOUND = 1, PROBLEM TITLE = TANTALIZER NO 226, START SELECT = HATS, HAT = PINK, SCARF = ORANGE, BLAZER = PINK, EMBLEM = KHAKI, DAUGHTER = DEBBIE, SCHOOL = ST IDA, HAT = TURQUOISE, SCARF = KHAKI, BLAZER = ORANGE, EMBLEM = PINK, DAUGHTER = CLAIRE, SCHOOL = St FAITH, HAT = KHAKI, SCARF = PINK, BLAZER = TURQUOISE, EMBLEM = TURQUOISE, DAUGHTER = ALICE, SCHOOL = ST ELTHEDREDA, HAT = ORANGE, SCARF = TURQUOISE, BLAZER = KHAKI, EMBLEM = ORANGE, DAUGHTER = BESS, SCHOOL = ST GERTRUDE.

The question answering session, together with some pre-processing of files, takes 12 seconds in this case, while the compilation process takes 42 seconds. This is because extensive pre-processing of the requirements is undertaken to ensure that unlikely combinations of items are not generated. Once this has been done the solver only takes 6 seconds to find the solution. (Without the pre-processing phase the time taken to find the answer could easily be several hours).

3.4. Heuristic Search Facilities

3.4.1. Aims

As TANTALIZE is designed as a problem solving aid it is clearly essential that it offers a choice of heuristic search facilities. In deciding how this should be done three main factors have been taken into consideration:-

TANTALIZE should allow as many of the established heuristic search techniques as possible (see for example Michie, 1971). Facilities are included for in depth searching, shortest path searches,· and directed searches using system generated or user defined distance functions.
It should be possible for the search technique to change during the problem solving process. Thus a shortest path search can be speeded up by using a distance function produced by backtracking from the goal. This produces an effect analogous to the bidirectional search of Pohl (1971), although the algorithms used are different. Alternatively a directed search that fails to find the goal quickly might take a broader look at possible alternative leads reverting to the full power of a directed search only when a suitably promising lead presents itself.
Most existing problem solvers produce results by exploiting the high speed and effectively unlimited working storage of modern digital computers, the resulting brute force solutions tell us nothing about how humans solve problems. TANTALIZE is already restricted by the limited size of the facts and it is felt that it should contain facilities which prescribe other aspects of brute force approaches. . At the present time these take the form of adjustable limits to its ability to remember possible backtracking paths. With tight limits a good distance function will still rapidly find the goal, while a weak distance function will rapidly become lost and terminate (In theory TANTALIZE should then generate an alternative distance function or' ask the user to provide a new one, but to date such a facility has not been implemented).

3.4.2. In depth searching

In many of the problems the choice of a heuristic search technique is not the most critical aspect in the description of the problem, and as a result the default technique in TANTALIZE is a simple in depth search. In this an action is selected and if it is legal, successive actions are selected and the process repeated until either the goal is reached or no further progress is made, In the latter case the solver automatically backtracks until alternative actions can be tried.

This type of search is the one used in the two examples discussed above. In the case of the Old Hindu Problem, there are a number of equivalent answers and in depth searching is sure to find one of these very quickly.

In the School Colours problem a subsidiary aim is to show that the solution is unique and the whole problem space must be searched. As the in depth searching facility is the fastest available to TANTALIZE, this is again the method of choice. However this technique leaves all intermediate information in the facts and in some cases, particularly where the technique is not really suitable, there is a possibility of the limit of 31 items being reached.

TYPE IN A TEXT DESCRIPTION OF THE PROBLEM, ETC.
2<	* There a two water jugs capable of holding 5 and 8 gallons
2<	* respectively. Either jug may be filled at the tap, or emptied into
2<	* the drain, or poured into the other jug. The object is to place
2<	* exactly 2 gallons in the 5 gallon jug.
2<	SHORTEST.
2<	END.
ACTION = ?
I<	FILL LITTLE JUG & FILL BIG JUG & EMPTY LITTLE JUG & EMPTY BIG JUG &
I<	POUR LITTLE JUG @ POUR BIG JUG
TYPE IN A DESCRIPTION OF FILL LITTLE JUG
C<	LITTLE JUG = 5: END.
TYPE IN A DESCRIPTION OF FILL BIG JUG
C<	BIG JUG = 8: END.
TYPE IN A DESCRIPTION OF EMPTY LITTLE JUG
C<	LITTLE JUG = 0: END.
TYPE IN A DESCRIPTION OF EMPTY BIG JUG
C<	BIG JUG = 0: END.
TYPE IN A DESCRIPTION OF POUR LITTLE JUG
C<	LITTLE JUG IS > 8 - BIG JUG,
C<	LITTLE JUG IS = LITTLE JUG + BIG JUG - 8.
C<	BIG JUG = 8.
C<	ELSE,
C<	BIG JUG IS = BIG JUG + LITTLE JUG..
C<	LITTLE JUG = 0.
C<	END.
TYPE IN A DESCRIPTION OF POUR BIG JUG
C<	BIG JUG IS > 5 - LITTLE JUG,
C<	BIG JUG IS = BIG JUG - 5.
C<	LITTLE JUG = 5.
C<	ELSE,
C<	LITTLE JUG IS = BIG JUG + LITTLE JUG..
C<	BIG JUG = 0.
C<	END.
TYPE IN A DESCRIPTION OF ANY ILLEGAL SITUATION
C<	END.
TYPE IN A DESCRIPTION OF GOAL
C<	LITTLE JUG = 2: END.
WHICH ITEMS DEFINE THE STATE OF THE PROBLEM
ITEM = ?
I<	BIG JUG & LITTLE JUG
BIG JUG = ?
I<	0
LITTLE JUG = ?
I<	0
THE FOLLOWING REPRESENTS A SOLUTION OF WATER JUGS
FILL LITTLE JUG & POUR LITTLE JUG & FILL LITTLE JUG & POUR LITTLE JUG
	ACTION = FILL LITTLE JUG, ACTION = POUR LITTLE JUG, ACTION = FILL LITTLE JUG, ACTION = POUR LITTLE JUG, BIG JUG = 8, LITTLE JUG = 2.
SOLUTION FOUND = 1.

Figure 6

The Water Jug Problem

3.4.3. Shortest Path or In-breadth Searching

In problems such as Filling the Water Jugs, figure 6, the aim is to find the shortest of a number of possible answers. By using the SHORTEST key word TANTALIZE carries out a systematic search stage by stage. Each action is applied in turn to the initial problem state and the resulting intermediate states are stored, to become new initial states in the subsequent pass. This continues until an answer is found.

3.4.4. Distance Functions

For any goal directed search, a distance function is needed to assess the nearness of the intermediate solution state to the goal. This can be a user supplied function that provides a RATING value for the problem state. Alternatively there are two system generated distance functions available. DISTANCE sets a RATING proportional to the number of items in the goal that are correct in the problem state. BACKTRACK runs the solver backwards from the goal to a given depth and records the distance of each generated state from the goal. The resulting function forms a potential hollow that guides the solver rapidly to the goal. BACKTRACK may be used in conjunction with either DISTANCE or a user defined function.

3.4.5. Goal-Directed Searching

The basic Graph Traverser type search is activated by the word DYNAMIC. This uses the self organising MEMORY to hold the intermediate states, and the basic cycle of operations is to select the state with the highest rating, apply all feasible actions, testing for GOAL, ILLEGAL, etc., and storing new states back in the MEMORY with an activity proportional to their rating.

The backtracking ability of this technique can be reduced in two ways. The first is to restrict the size of MEMORY so that it cannot hold all possible states. The second is to apply a forgetting function to the learning memory (the numeric value associated with DYNAMIC is used), to encourage retention of recent, rather than old, intermediate states.

3.4.6. The Pigeon Hole Technique

One of the aims in designing TANTALIZE is to allow the search techniques to vary during a search. To achieve this a pigeon hole technique is used to process intermediate problem states. When this is done a maximum and minimum value for PRIORITY are used. If the RATING for an intermediate state exceeds the maximum, an in depth search is carried out for as long as RATING continues to exceed the maximum. If the RATING is less than the minimum the intermediate state is discarded Other states are assigned to one of five pigeon hole files depending on the actual value of RATING. At appropriate intervals the highest rated pigeon hole is emptied into a work file and used to continue the search.

This technique produces an in depth search if the RATING always exceeds the maximum priority and a shortest path search if the RATING is constant and within the priority range. If a distance function is used progress is made along a moderately broad front. This can be narrowed by use of the MAXIMUM RATING key word which 'boosts' the RATING of any state which exceeds the previous maximum. In many case this narrowed front is only a single state and this approximates to a goal directed search in terms of the number and order of states examined.

TYPE IN A TEXT DESCRIPTION OF THE PROBLEM, ETC.
2<	* In this problem the aim is to move a knight from a given
2<	* starting position to a given finishing position.
2<	DYNAMIC.
2<	END.
ACTION = ?
I<	NNE & ENE & ESE & SSE & SSW & WSW & WNW & NNW
TYPE IN A DESCRIPTION OF NNE
C<	X < 8; Y < 7; X IS = X + 1: Y IS = Y + 2: END.
TYPE IN A DESCRIPTION OF ENE
C<	X < 7; Y < 8; X IS = X + 2: Y IS = Y+ 1: END.
TYPE IN A DESCRIPTION OF ESE
C<	X < 7; Y > 1; X IS = X + 2: Y IS = Y - 1: END.
TYPE IN A DESCRIPTION OF SSE
C<	X < 8; Y > 2; X IS = X + 1: Y IS = Y - 2: END.
TYPE IN A DESCRIPTION OF SSW
C<	X > 1; Y > 2; X IS = X - 1: Y IS = Y - 2: END.
TYPE IN A DESCRIPTION OF WSW
C<	X > 2; Y > 1; X IS = X - 2: Y IS = Y - 1: END.
TYPE IN A DESCRIPTION OF WNW
C<	X > 2; Y < 8; X IS = X - 2: Y IS = Y - 1: END.
TYPE IN A DESCRIPTION OF NNW
C<	X > i; Y < 7; X IS = X -1: Y IS = Y + 2: END.
TYPE IN A DESCRIPTION OF ANY ILLEGAL SITUATION
C<	END.
TYPE IN A DESCRIPTION OF GOAL
C<	X - 8; Y = 8: END.
WHICH ITEMS DEFINE THE STATE OF THE PROBLEM
ITEM = ?
I<	X & Y
X = ?
I<	0
Y = ?
I<	0
THE FOLLOWING REPRESENTS A SOLUTION OF KNIGHTS MOVE
NNE & ENE & NNE & NNE & SSE & NNE
	SOLUTION FOUND = 1, DYNAMIC = 0, ACTION = NNE, ACTION = ENE, ACTION = NNE, ACTION = NNE, ACTION = SSE, X = 7, Y = 6, ACTION = NNE, X = 8, Y = 8.
SOLUTION FOUND = 1.

Figure 7

Moving a knight across the chess board

[NOTE: The descriptions of the moves could be simplified by leaving out the conditional tests and making moves off the edge of the board illegal.]

3.4.7. A Comparison of Searching Techniques

A full comparison of these different approaches to heuristic search is not possible in a short paper of this nature. Instead figure 7 gives a basic description of the simple problem of moving a knight from one corner of a chess board to another. Figure 8 shows the route taken for a number of different key word descriptions. The first, in depth, search finds the answer by accident after taking a tortuous route. SHORTEST finds the goal after having examined every other square on the board (this is the worst case for this type of search). The other techniques pursue more or less direct routes, all of which are satisfactory and for this simple problem there is not a lot to choose between them.

Figure 8: Knights moves using different heuristics

3.5. Restricting the Problem Space

Important as search techniques are in many situations, it is often more profitable to minimise the problem space to be searched. In some cases it is sufficient to find one solution and then stop, as , was done in the Old Hindu problem given earlier. In some such situations a facility such as OPTIMISE, which optimises the order in which actions are selected by comparison with GOAL, may give a significant improvement, by :increasing the probability of a solution being found early in the search.

3.6. Simplifying the Problem Description

Bigger savings can be achieved by only selecting actions when they are likely to produce profitable results. This is done automatically for all actions containing conditions, but additional tests are possible. For instance, SINGLE ACTIONS ensures that named actions are not used twice in any solution, while CHECK ensures that executing the action will change the current problem state. Other key words help to avoid the permutation of actions when a combination is adequate to find or eliminate a possible route to the goal. By choosing key words appropriate to the problem in hand, it is often possible to reduce~ considerably the number of intermediate problem states to be examined and this often makes it possible to simplify the search. This is because the test for repeating situations, PRIOR MOVES, can be dropped as redundant. As the time taken to carry out this test increases rapidly as the problem size increases its elimination plays an important role in reducing run time.

The use of the SELECT facility for generating mutually exclusive alternatives again reduces overheads compared with the use of actions to generate the same information and this is particularly true when a number of selections are automatically linked together. Here again a series of key words can be used to produce similar constraining effects to those described for actions. In addition there is a powerful constraint processing facility, used in the School Colour problem, in which specific requirements, together with appropriate illegal conditions, are used to ensure that unsuitable combinations are never generated and that the generation order is such as to reduce the problem space to a minimum. This often serves to reduce an impossibly large search space to a trivially small one.

3.7. Extending the Power of the Problem Solver

Because TANTALIZE and the problem descriptions are both written in CODIL, it is possible to use virtually the whole power of the CODIL language in the action descriptions or in supplementary files. In a simplest case it is usually possible to redefine the initial conditions, ILLEGAL or GOAL, and then TRY AGAIN after an attempt to find a solution. If the problem GOAL is inexactly defined, for instance when finding a minimum condition, the GOAL may be dynamically redefined during the problem solving process to be less than the last solution found. By redefining the file TERMINATE, the user can modify the terminal procedure and can automatically arrange to modify other files and TRY AGAIN. This would allow the system to be set up with an explicitly defined series of subgoals to be solved in a given order. A wide range of other such modifications are practicable and if a given type of modification to the workings of TANTALIZE proves to be common it can always be adopted as a permanent option.

3.8 The Problems Solved

At the present time about 60 different problems have been solved and in some cases up to a dozen different versions of a single problem has been tried to explore the different features of TANTALIZE. Despite this variety there are many features, and particularly combinations of features that have not been properly exploited.

Many of the problems examined are New Scientist Tantalizers (3) of which over 30 different ones have been solved to date. All but 2 are solved in less than one minute processor time on the 1903A computer. Many took less than 20 seconds. In addition a number of problems solved by other workers have been solved. These include:

Bridges of Konigsbery (Ernst & Newell, 1969)

Confusion of Patents (Fikes, 1970)

Couples Tennis Problem (Fikes, 1970)

Eight Queens on a Chess Board (Fikes, 1970)

Magic Square (Fikes, 1970)

Missionary and Canibals (Ernst & Newell, 1969)

Monkey and Bananas (Ernst & Newell, 1969, Fikes, 1970)

Smith, Robinson and Jones (Weston, 1970)

Three Coins (Ernst & Newell, 1969)

Water Jug Problem (Ernst & Newell, 1969,Fikes, 1970)

In all cases where a comparison is meaningful, the solution time is very much faster (never more than 90 seconds) and in almost every case the problem definition is much simpler. (It should be noted that TANTALIZE in its present form, is not: able to cope with problems that require a capability for algebraic manipulation.)

The final group of problems have mainly been chosen to illustrate the novel features of TANTALIZE/CODIL. They include alternative transliterations of Colonel Gadafy's name, acting as an umpire in a game of cows and bulls, generating the nursery rhyme "One Man Went to Mow”, matching a single payment against a series of outstanding invoices to find the best fit; minimising the cost of holding stock and searching externally held files on various subjects. A number of simple algebraic problems, such as the Old Hindu problem described earlier have also been described.

4. CONCLUSIONS

4.1 What has been achieved

CODIL can best be considered as a highly unconventional general purpose information processing system aimed at unsophisticated users who have non-trivial real life problems. In developing this language various observations have been made concerning unplanned similarities with human information processing activities. These observations include similarities between the facts and short term memory, the naturalness with which powerful learning facilities are included, and the existence of a simple yet extremely powerful decision making capability which can be interpreted in evolutionary terms. The ability of the language to handle open-ended, ambiguious and poorly defined problems, coupled with a relatively poor capability (compared with more conventional computer languages) in the formal mathematics area is also relevant.

TANTALIZE represents the first deliberate use of CODIL for artificial intelligence research, and the result is a powerful yet easy to use problem solver system which offers a very wide range of facilities. It has already solved a wide range of problems in a reasonable time and in addition there is plenty of scope for writing a new and much more powerful successor. However, perhaps the most surprising thing about it is the way CODIL rallied to a task so far removed from its design aim of serving unsophisticated users who do not even need to know what an algorithm is.

4.2 Comparison with Other Systems

Over the past few years many attempts have been made to compare the basic structure of stored program computers with human information processing techniques. The basic CODIL philosophy of a controlling decision making unit and a memory hierarchy which includes learning facilities, provides a very much better global model than any based on the idea of program and data as separate concepts. In fact the sooner the stored program computer model of human information processing joins the earlier telephone exchange model and the dodo, the better. However at this level of comparison it should be realised that CODIL still forms an exceedingly crude working model and it should be considered as no more than an important step in the right direction.

There are problems in making comparisons between the approach described here and the simulations of specific human information processing activities. For instance, how can one compare a model of a specific aspect of short term memory, which has been designed to test a highly specific theory, and do nothing else, with a system designed to perform on as wide a range of tasks as possible without any attempt to model memory at all? The approaches are complementary and the work of reconciling them represents a major task for the future.

Compared with other problem solvers, the TANTALIZE problem descriptions are almost always simpler, the time to process problems is often more than an order of magnitude less, and with a mere thousand source cards TANTALIZE provides an unrivalled range of heuristic search and other facilities. Its comparative weakness in algebraic manipulation is more than compensated for by the ease with which it can be extended by the use of CODIL to carry out a wide range of operations.

4.3 The Future

CODIL and TANTALIZE together provide a system with a good man-computer interface, an ability to handle poorly defined real life problems, a global model of human information processing and a powerful heuristic problem solving facility. This represents a breadth of ability unique to artificial intelligence but it is merely the spin-off of an unfunded research project to develop a language for untrained users.

Given adequate support there are many possibilities for further research. For instance TANTALIZE represents no more than about 5 man-months effort and could greatly benefit from being rewritten in a more powerful form. CODIL was in no sense designed for artificial intelligence work and modifications to the interpreter could greatly improve efficiency and also provide further facilities. In addition packages analogous to TANTALIZE could be developed to provide facilities for developing and testing more refined models of learning and other specific aspects of human information processing. Detailed comparison with more specialised work on artificial intelligence problems will almost certainly suggest ways in which the basically unsophisticated structure or CODIL can be extended to accommodate more sophisticated~structures and techniques. If this work is done in parallel with the development of CODIL for practical applications a reasonable aim would be for- a system that could greatly reduce the need for systems analysts and programmers.

References

(1) B. G. Buchanan, G. L. Sutherland, and E. A. Feigenbaum. Rediscovering some Problems of Artificial Intelligence in the Context of Organic Chemistry, Machine Intelligence 5, ed. B. Meltzer & D. Michie. 253-281, 1969.

(2) E. F. Codd. A Relational Model of Data for Large Shared Data Banks, Comm. A.C.M. 13, 377-387, 1970

(3) M. Hollis, Tantalizer, New Scientist. (A series of puzzles published weekly in this popular science journal over a period of about 8 years.)

(4) D. Michie, Heuristic Search, Comp. J. 14, 96-102, 1971

(5) I. Pohl, Bidirectional Search, Machine Intelligence 65, ed. B. Meltzer & D. Michie. 127-140, 1971

(6) G. W. Ernst and A. Newell, GPS: A Case Study in Generality and Problem Solving, Academic Press, 1969

(7) R. E. Fikes, REF_ARF - A System for Solving Problems stated as Procedures, Artificial Intelligence 1, 27-120, 1970

(8) P.E. Weston, To Uncover, To Deduce, To Conclude, Comp. Stud. Hum. Verb. Behav. 3, 77-89, 1970

TANTALIZE - A Conversational Problem Solver