Medical Informatics Europe 78, editor J. Anderson, pp 465-474, 1978
Using CODIL to handle poorly structured clinical information
C. F. Reynolds, Department of Computer Science, Brunel University, Uxbridge, Middlesex. and M. Shackell and G. Sutton, Hillingdon Hospital, Hillingdon, Middlesex.
This paper looks at the problems of using computers to handle clinical information in the context of a small research team where changing requirements and comparatively low volumes can make conventional systems analysis and programming techniques hopelessly uneconomic. The ways in which such difficulties can be overcome are discussed, and examples are given of an operational system, using CODIL as the implementation language, which handles clinical information on cardiac patients. The ease with which such an approach can be extended to other areas is discussed.
1. Introduction
This paper examines the problems of the medical researcher who wants to keep case notes, or other information, on a computer. He would like a data base package tailored to meet his specialist needs at minimal expense. Unfortunately he will often encounter one or more of the following difficulties:
- He is uncertain what his information processing requirements are until he has had experience of using a suitable system.
- His requirements will change with time as a result of medical advances, his increasing experience of the problem being put onto the computer, etc.
- The logical structure of the information may be complex, with different types of information being collected from different sources at different times.
- Some of the information will be incomplete or ambiguous, particularly if it depends on outsiders to provide data on an ad hoc basis.
- The total volume of information involved will often be very small compared with administrative data processing jobs being run on the same computer.
- Existing packages do not fit his requirements unless either the package or the requirements are modified.
- He doesn't have a friend in the data processing department who will help him out for the kicks.
- He doesn't have access to unlimited research funds.
As a result the majority of doctors who might like to use automatic information processing techniques to help in their research activities are unable to do so.
CODIL is a computer language specifically designed for the class of user who might be faced with the difficulties listed above. This paper shows how the language has been used in the Cardiac Department of Hillingdon Hospital, assesses the progress made to date and examines the ease with which the findings can be applied to other research groups both inside and outside the medical arena.
2. CODIL as a data base system
CODIL is a computer language (1,2) specifically designed to allow non-computer oriented users to set up and run applications involving ambiguous or poorly defined information with the minimum of help from computer professionals (3,4,5). In order to provide some degree of familiarity to the user it was decided to build the system around a simplified model of human memory and decision making (5,6,7) which is easy to explain to users and which has proved remarkably efficient to implement. Because the design concentrates on a flexible man-computer interface, the package is totally application independent and has been used for tasks ranging from teaching HNC chemists the basics of information processing to heuristic problem solving (8). However its chief operational use has been in the field of information storage, processing and retrieval (9,10), including the application described here.
While CODIL provides the user with powerful "data base" type facilities it is important to realise that there are fundamental differences between it and more conventional data base systems (11). Conventional systems are constructed around a troika of program, data definition and data, with rigid demarcation rules as to the type of information each represents. This approach has proved very successful when the need is to process large volumes of uniformly structured data in routine ways, possibly to meet the needs of some government regulation or administrative edict. On the other hand it is unsuited to comparatively small applications involving poorly structured information. You have only to ask any doctor which parts of his medical knowledge represent program, data definition and data to realise how alien conventional data base software is to the average human being.
Because it is modelled on human information processing CODIL rejects this troika and replaces it with a unified concept of information. All information within CODIL is self-structured and needs no explicit file description. Within the simple syntax of the language the user is free to describe the application in the way he finds most acceptable, and there is no need for him to make arbitrary distinctions between program and data, although if he wishes to think about his problem in this way there are no constraints on him having distinct files of "rules" and "data". In the same way if he wants a file to have a uniform structure there is nothing to stop him, and he is free to vary this as appropriate to cope with missing or extra information should the need arise.
3. Background to the Hillingdon system
In describing the use of CODIL at Hillingdon Hospital it is important to realise that the interests of the various authors are distinct. At the hospital end the aim is to establish a computer file of clinical records of cardiac patients for research purposes. At the University end the aim is to monitor the operational use of the CODIL language.
The work has progressed in a number of stages. Between 1973 and 1975 a pilot study was carried out in conjunction with two undergraduate projects (12,13,14). This was the first time that CODIL had been used with operational data generated by non-computer oriented users and for technical reasons it had to be carried out in batch mode. The results were generally satisfactory although it was found that the batch punching documents used were over-complicated and tended to give the doctors using the system the impression that they had less flexibility than in fact they had. As a result of this, and trials with different applications, improvements were made in the CODIL interpreter. Unfortunately, it was still impossible to provide a terminal in the hospital, although interactive working was possible within the University. For this reason a far simpler, and more general, punching document was produced and starting early in 1977 clinical information on patients admitted to the cardiac care unit has been recorded. This was extended by the hospital staff to cover a retrospective analysis of patients who had undergone cardiac operations over a period of six years. A range of questions and surveys have been carried out using the data. This is the stage reached by the system described in this paper. Future plans include the provision of a terminal in the hospital, initially to allow file interrogation but eventually to permit direct file update for the cardiac care unit.
4. Description of the Cardiac Information System
In the following paragraphs the various procedures involved in setting up and using the system are summarised. The examples are all artificial to avoid the use of confidential patient data, but have been chosen to illustrate the principal features of interest.
4.1 Direct input procedures.
One of the most obvious differences between CODIL and more conventional data base systems is that there is no need for any prior file definitions before a file is set up. If a user wants to set up a file called HISTORY containing information on a patient's cardiac history all he has to do is to connect with the CODIL interpreter and type:
← CREATE = HISTORY.
← PATIENT = SMITH; YEAR = 1965; DIAGNOSIS = UNSTABLE ANGINA.
← PATIENT = JONES; YEAR = 1972; MONTH = 3; DIAGNOSIS = MI.
← ... ... ... ...
← END.
To list the file in the CODIL default format he types in a simple command and the system responds immediately:
← PRINT ITEM = HISTORY.
PATIENT = SMITH,
YEAR = 1965,
DIAGNOSIS = UNSTABLE ANGINA.
PATIENT = JONES,
YEAR = 1972,
MONTH = 3,
DIAGNOSIS = MI.
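The idea of a file that carries its own structure can be sketched in a present-day language. The following Python fragment is only an illustration and is not part of the CODIL system: the function name parse_statement and the list-of-pairs representation are assumptions made for the example. It shows that two statements in the same file may hold different items without any prior file definition.

```python
# Illustrative sketch only: CODIL itself is not implemented this way.
# A statement is modelled as an ordered list of (name, value) items.

def parse_statement(text):
    """Split 'NAME = VALUE; NAME = VALUE.' into a list of (name, value) pairs."""
    body = text.strip().rstrip(".")
    items = []
    for part in body.split(";"):
        name, _, value = part.partition("=")
        items.append((name.strip(), value.strip()))
    return items

# The two artificial statements from the HISTORY example.
history = [
    parse_statement("PATIENT = SMITH; YEAR = 1965; DIAGNOSIS = UNSTABLE ANGINA."),
    parse_statement("PATIENT = JONES; YEAR = 1972; MONTH = 3; DIAGNOSIS = MI."),
]

# No schema was declared: the second statement simply has an extra MONTH item.
for statement in history:
    print([name for name, _ in statement])
```

The second statement carries a MONTH item that the first lacks, and nothing had to be declared in advance to allow this.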
Further statements can be added by using CREATE(APPEND) or the file can be edited. In the following example the context editor is used to change all occurrences of DIAGNOSIS = MI to DIAGNOSIS = MYOCARDIAL INFARCT.
← FILE EDIT.
FILE NAME = ?
← HISTORY.
TYPE IN AMENDMENT FILE
← DIAGNOSIS = MI; DIAGNOSIS = MYOCARDIAL INFARCT.
← END.
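The effect of such a context edit can be pictured as replacing one item by another wherever it occurs. The Python sketch below is a loose analogy rather than a description of the CODIL editor; the function name context_edit and the data layout are assumptions for the example.

```python
# Illustrative sketch only: models an amendment "old item; new item"
# applied to every statement in a file.

def context_edit(statements, old, new):
    """Return a copy of the file with every item equal to `old` replaced by `new`."""
    return [[new if item == old else item for item in statement]
            for statement in statements]

history = [
    [("PATIENT", "SMITH"), ("YEAR", "1965"), ("DIAGNOSIS", "UNSTABLE ANGINA")],
    [("PATIENT", "JONES"), ("YEAR", "1972"), ("MONTH", "3"), ("DIAGNOSIS", "MI")],
]

edited = context_edit(history,
                      ("DIAGNOSIS", "MI"),
                      ("DIAGNOSIS", "MYOCARDIAL INFARCT"))
print(edited[1][-1])
```

Statements that do not contain the old item pass through unchanged, and the original list is left untouched.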
In practice this degree of flexibility is inappropriate for non-trivial files which are to be updated over a period of several years and used by more than one individual. There are also advantages in minimising the volume of batch punching and interactive input while continuing to use meaningful terminology. For this reason the hospital files were initialised as follows:
(a) Abbreviations. To avoid the need to type in long names a number of abbreviations were defined as follows:
← ADMISSIONS WARD(ABBREV) = AW.
← AETIOLOGY(ABBREV) = AET.
← ANGIOGRAM(ABBREV) = ANG.
← ... ... ...
New names and/or abbreviations can be defined on any subsequent occasion. However to prevent the accidental use of an undeclared name it is possible to indicate that such names are to be reported or rejected.
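A rough analogy for the abbreviation facility, and for the option of reporting or rejecting undeclared names, is given below in Python. The names resolve_name and on_unknown are invented for the illustration and do not correspond to anything in CODIL; only the three abbreviations quoted above are used.

```python
# Illustrative sketch only: abbreviation lookup with optional rejection
# of names that have not been declared.

ABBREVIATIONS = {
    "AW": "ADMISSIONS WARD",
    "AET": "AETIOLOGY",
    "ANG": "ANGIOGRAM",
}
DECLARED = set(ABBREVIATIONS.values())

def resolve_name(name, on_unknown="report"):
    """Expand an abbreviation; report or reject names that are not declared."""
    full = ABBREVIATIONS.get(name, name)
    if full in DECLARED:
        return full
    if on_unknown == "reject":
        raise ValueError(f"undeclared name: {full}")
    print(f"warning: undeclared name {full}")
    return full

print(resolve_name("AET"))        # abbreviation expanded
print(resolve_name("ANGIOGRAM"))  # full name accepted as it stands
```

With on_unknown set to "reject" a mistyped name stops the input, while the default merely reports it and carries on.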
(b) Semi formatted input. It was agreed at an early stage in the work that all patient information should be qualified by a serial number unique to the trial and that, to ensure confidentiality, no names or hospital identity codes should be used. In addition all statements were to contain date and time information to an appropriate degree of detail, together with a "form name" to identify the type of information being input. To handle this a simple routine is created to read in details of the NEXT patient:
← CREATE = NEXT.
← INPUT = SERIAL NO.   )
← INPUT = YEAR.        )
← INPUT = MONTH.       )  Reads value of item from
← INPUT = DAY.         )  terminal or batch stream.
← INPUT = HOUR.        )
← INPUT = MINUTE.      )
← INPUT = FORM.        )
← INPUT ALL.              Reads list of items as required.
← STORE ALL.              Stores all items on file.
← DELETE
← END.
This file is used to read data off the form shown in figure 1, which is punched for batch input as follows:
NEXT:1234:78:9:4:::REFI:DOB=1913:SEX=M:DI=LV DYSFUNCTION: AET=CAD:ORD=INVESTIGATION:REFC=HAREFIELD:E
(The ":E" termination is not shown on the form as the actual routine NEXT used in the operational system has been written to minimise punching. The routine can also be used in interactive mode, in which case it will automatically prompt the user for the inputs required.)
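The way the punched line maps onto the items read by NEXT can be sketched as follows. This Python fragment is an illustration under stated assumptions: the colon is taken as the field separator, the seven leading fields are taken in the order in which NEXT reads them, and a dictionary is used for simplicity although it cannot hold repeated items. The function name parse_punched_record is invented.

```python
# Illustrative sketch only: splits a punched line into the fixed leading
# items read by NEXT, followed by free NAME=VALUE items.

FIXED_ITEMS = ["SERIAL NO", "YEAR", "MONTH", "DAY", "HOUR", "MINUTE", "FORM"]

def parse_punched_record(line):
    fields = [field.strip() for field in line.split(":")]
    routine, rest = fields[0], fields[1:]
    if rest and rest[-1] == "E":          # ":E" marks the end of the item list
        rest = rest[:-1]
    record = {"ROUTINE": routine}
    for name, value in zip(FIXED_ITEMS, rest):
        if value:                         # empty fields (e.g. HOUR) are omitted
            record[name] = value
    for field in rest[len(FIXED_ITEMS):]:
        name, _, value = field.partition("=")
        record[name.strip()] = value.strip()
    return record

line = ("NEXT:1234:78:9:4:::REFI:DOB=1913:SEX=M:DI=LV DYSFUNCTION: "
        "AET=CAD:ORD=INVESTIGATION:REFC=HAREFIELD:E")
record = parse_punched_record(line)
for name, value in record.items():
    print(name, "=", value)
```

The two empty fields show how the time of day can be left out when that degree of detail is not known.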
(c) Updating the master file. New information is currently made available in batch mode and it would be inappropriate to add it directly to the file of accumulated patient data until there had been an opportunity of removing any data preparation errors, etc. To handle this problem an additional routine READ NEW DATA has been written which copies the input data to a file called NEW DATA, which can be edited, amended, searched for errors, etc. Once this file is free from errors it is merged with the master file. It is worth noting that the biggest part of the initialisation process is the list of abbreviations used by the doctor.
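The staging step can be pictured as a check followed by a conditional merge. The Python sketch below is an assumption-laden illustration, not the READ NEW DATA routine itself: the required items, the function names and the sample statements are all invented for the example.

```python
# Illustrative sketch only: new statements are held apart from the master
# file and merged only once no errors remain.

REQUIRED = ("SERIAL NO", "FORM")   # assumed minimum; not taken from the paper

def find_errors(statement):
    """Return the names of required items that are missing or empty."""
    return [name for name in REQUIRED if not statement.get(name)]

def merge_new_data(master, new_data):
    """Merge when every new statement is clean; otherwise report the problems."""
    problems = {}
    for position, statement in enumerate(new_data):
        errors = find_errors(statement)
        if errors:
            problems[position] = errors
    if problems:
        return master, problems
    merged = sorted(master + new_data, key=lambda st: st["SERIAL NO"])
    return merged, problems

master = [{"SERIAL NO": 1233, "FORM": "REFI"}]
new_data = [{"SERIAL NO": 1234, "FORM": "REFI"}, {"SERIAL NO": 1235}]

first_result, first_problems = merge_new_data(master, new_data)
print(first_problems)              # the second statement lacks FORM

new_data[1]["FORM"] = "DEATH"      # the error is corrected by editing
merged, problems = merge_new_data(master, new_data)
print(len(merged), problems)
```

Until the correction is made the master file is returned unchanged.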
4.2 File listing facilities.
In addition to the default file listing facility - which can be bulky for large files - it is simple for the user to write his own routines for controlling the way in which information is printed. The following example involves the creation and use of the file, LIST PATIENTS, and shows some of the tabulation and other facilities available for controlling the layout of the printed page.
Although not shown in this example the CODIL interpreter automatically adjusts for undefined or multivalued information with the minimum of user inconvenience.
4.3 File searching facilities.
Files are searched by comparing a pair of files. The first contains the definition of the search and the second the information to be searched. The following example prints out a list of all the patients who have died according to the file PATIENT FILE.
← CREATE = SEARCH.
← FORM = DEATH; PRINT IS SERIAL NO.
← END.
← SEARCH = PATIENT FILE.
3 5 9 19 28 37 41 43 56 69 76
Queries can involve several different items of information. For instance the next example looks at all patients referred for investigation to Brompton or Harefield hospitals where the diagnosis included the word ANGINA. The patient's serial number, sex and age are printed.
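Comparing a search file with a data file can be pictured as matching each stored statement against a set of conditions. The following Python sketch is an analogy only: the function search and the three sample statements are invented, and the real interpreter works on CODIL statements rather than dictionaries.

```python
# Illustrative sketch only: a search definition is applied to each
# statement of the file and one item is printed for every match.

patient_file = [                       # invented sample data
    {"SERIAL NO": 3, "FORM": "DEATH"},
    {"SERIAL NO": 4, "FORM": "REFI"},
    {"SERIAL NO": 5, "FORM": "DEATH"},
]

def search(statements, conditions, print_item):
    """Return `print_item` from every statement satisfying all conditions."""
    return [statement[print_item]
            for statement in statements
            if all(statement.get(name) == value
                   for name, value in conditions.items())]

deaths = search(patient_file, {"FORM": "DEATH"}, "SERIAL NO")
print(*deaths)
```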
← CREATE = SEARCH.
← OBJECT REFERRAL = INVESTIGATION,
← REFERRAL CENTRE = BROMPTON,
← OR REFERRAL CENTRE = HAREFIELD,
← DIAGNOSIS CONTAINS ANGINA,
← PRINT(4) IS = SN.
← PRINT IS = SEX
← PRINT IS = 1900 + YEAR - DATE OF BIRTH.
← PRINT.
← END
←
← SEARCH = PATIENT FILE.
  34 M 51
 104 M 64
 195 M 57
 205 M 53
 ... ... ...
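The logic of this query, including the age arithmetic, can be sketched in Python as follows. The sketch assumes that YEAR holds a two-digit year (as in the punched example, where 78 stands for 1978) and that DATE OF BIRTH holds a four-digit year; for the punched example this gives 1900 + 78 - 1913 = 65. The function names and the sample statements are invented for the illustration.

```python
# Illustrative sketch only: alternatives combined with OR, a "contains"
# test on the diagnosis, and an age derived from two stored items.

patient_file = [                       # invented sample data
    {"SN": 34, "SEX": "M", "YEAR": 78, "DATE OF BIRTH": 1927,
     "OBJECT REFERRAL": "INVESTIGATION", "REFERRAL CENTRE": "BROMPTON",
     "DIAGNOSIS": "UNSTABLE ANGINA"},
    {"SN": 35, "SEX": "F", "YEAR": 78, "DATE OF BIRTH": 1930,
     "OBJECT REFERRAL": "INVESTIGATION", "REFERRAL CENTRE": "HAREFIELD",
     "DIAGNOSIS": "LV DYSFUNCTION"},
]

def matches(statement):
    return (statement.get("OBJECT REFERRAL") == "INVESTIGATION"
            and statement.get("REFERRAL CENTRE") in ("BROMPTON", "HAREFIELD")
            and "ANGINA" in statement.get("DIAGNOSIS", ""))

def age(statement):
    # Assumes a two-digit YEAR and a four-digit DATE OF BIRTH.
    return 1900 + statement["YEAR"] - statement["DATE OF BIRTH"]

rows = [f"{st['SN']:4d} {st['SEX']} {age(st)}"
        for st in patient_file if matches(st)]
for row in rows:
    print(row)
```

The width of four given to the serial number mirrors PRINT(4) in the query, so that the columns line up.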
Questions involving several different events in the patient's history need to be formulated more carefully as the information is held in patient/date/time order and there is no limit to the number of statements about any one patient. Sometimes this can be done by using an additional item to indicate that an earlier event has been noted relevant to the enquiry. In this case the item BROMPTON PATIENT is used to hold the serial number of the last patient referred to Brompton in 1973. This is used to select such patients who were seen at a follow up session in 1977.
← CREATE = SEARCH.
← REFC = BROMPTON; YEAR = 1973; BROMPTON PATIENT IS = SERIAL NO.
← FORM = FOLLOW UP; YEAR = 1977; BROMPTON PATIENT IS = SN; PT IS=SN.
← END.
←
← BROMPTON PATIENT = 0.
← SEARCH = BROMPTON PATIENT.
197 209 667 905 ... ...
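One way of reading this technique is as a single pass over the ordered file in which one remembered value links an earlier event to a later one. The Python sketch below follows that reading; it is an interpretation rather than a transcription of the CODIL query, and the function name and sample statements are invented.

```python
# Illustrative sketch only: a remembered serial number carries information
# from a 1973 referral forward to a 1977 follow up for the same patient.

patient_file = [                       # invented data, in patient/date order
    {"SN": 197, "YEAR": 1973, "FORM": "REFI", "REFC": "BROMPTON"},
    {"SN": 197, "YEAR": 1977, "FORM": "FOLLOW UP"},
    {"SN": 198, "YEAR": 1973, "FORM": "REFI", "REFC": "HAREFIELD"},
    {"SN": 198, "YEAR": 1977, "FORM": "FOLLOW UP"},
    {"SN": 209, "YEAR": 1973, "FORM": "REFI", "REFC": "BROMPTON"},
    {"SN": 209, "YEAR": 1977, "FORM": "FOLLOW UP"},
]

def brompton_follow_ups(statements):
    brompton_patient = 0               # no Brompton referral noted yet
    selected = []
    for st in statements:
        if st.get("REFC") == "BROMPTON" and st.get("YEAR") == 1973:
            brompton_patient = st["SN"]          # note the earlier event
        if (st.get("FORM") == "FOLLOW UP" and st.get("YEAR") == 1977
                and st["SN"] == brompton_patient):
            selected.append(st["SN"])            # later event, same patient
    return selected

followed_up = brompton_follow_ups(patient_file)
print(*followed_up)
```

The approach relies on the file being held in patient/date/time order, since the remembered value refers only to the most recent referral seen.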
Complex questions are best asked by using files as conditions or indexes. For instance in the last example the result could have been stored as a list of serial numbers on a file called BROMPTON F U 77 rather than being printed out. This file could then be used to identify such patients immediately on any subsequent occasion. This can be done for any category that the user finds useful. Given files such as ANGINA PATIENTS, BUGGINS PATIENTS and SURVIVED 5 YRS, and using the experimental indexing facility, it is possible to list out all clinical information on Buggins' patients who suffer from angina and who have survived for a period of 5 years.
← CREATE = SEARCH.
← SURVIVED 5 YEARS; ANGINA PATIENTS; LIST PATIENTS PATIENT FILE.
← END.
← SEARCH = BUGGINS PATIENTS.
The effect of using files in this way is analogous to a user being allowed to set up indexes on any items, or combinations of items, that he feels would be useful.
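The analogy with indexes can be made concrete by treating each such file as a set of serial numbers and combining the sets. The Python sketch below uses invented serial numbers and is not a description of the experimental indexing facility itself.

```python
# Illustrative sketch only: files of serial numbers act as indexes, and a
# compound question becomes an intersection of those files.

angina_patients = {12, 34, 56, 78}     # invented serial numbers
buggins_patients = {34, 56, 90}
survived_5_yrs = {34, 78, 90}

patient_file = [                       # invented sample data
    {"SN": 34, "FORM": "REFI"},
    {"SN": 34, "FORM": "FOLLOW UP"},
    {"SN": 56, "FORM": "REFI"},
]

selected = buggins_patients & angina_patients & survived_5_yrs
listing = [st for st in patient_file if st["SN"] in selected]

print(sorted(selected))
for st in listing:
    print(st)
```

Once the category files exist, each new combination costs only a set operation rather than a fresh pass through the conditions that defined them.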
4.4 Data vetting.
The flexible nature of CODIL means that there is inherently less control over the format and structure of the information being stored as compared with a more conventional data base system. For areas of the application where the user requires more control over the structure and content of the information being processed there are a number of optional facilities. In addition the user can write his own checking routines. This account will restrict itself to two different approaches.
The first is a CODIL library routine, MAKE CHECK LIST. This scans through a file and extracts all items with a given name, cross referenced by the key item, which for this application happens to be the patient number. The result is an alphabetical list of values, with appropriate references to the patients. By scanning this list the doctor can quickly see the terms used in the system, spot unintended synonyms, etc. He can then use the results to amend the master file or to choose suitable phrases for use in a search query. MAKE CHECK LIST can also produce a file containing all the valid values of the given item and this can be used to check whether a newly input item has a novel value or not.
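The kind of list that such a routine produces can be sketched in Python as below. This is an analogy for the behaviour described, not the library routine itself; the function names and sample statements are invented.

```python
# Illustrative sketch only: an alphabetical list of the values of one item,
# each cross referenced to the patients in whose statements it occurs.

from collections import defaultdict

patient_file = [                       # invented sample data
    {"SN": 1, "DIAGNOSIS": "UNSTABLE ANGINA"},
    {"SN": 2, "DIAGNOSIS": "MI"},
    {"SN": 3, "DIAGNOSIS": "MYOCARDIAL INFARCT"},
    {"SN": 4, "DIAGNOSIS": "UNSTABLE ANGINA"},
    {"SN": 5, "FORM": "DEATH"},        # no DIAGNOSIS item: simply skipped
]

def make_check_list(statements, item, key="SN"):
    references = defaultdict(list)
    for st in statements:
        if item in st:
            references[st[item]].append(st[key])
    return dict(sorted(references.items()))

check_list = make_check_list(patient_file, "DIAGNOSIS")
for value, patients in check_list.items():
    print(value, patients)

valid_values = set(check_list)

def is_novel(value):
    """True when a newly input value has not been seen before."""
    return value not in valid_values
```

In this invented data the list immediately exposes MI and MYOCARDIAL INFARCT as a pair of unintended synonyms.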
The second approach can be considered as an extension of the file searching techniques used earlier, except that the files being used contain valid items or lists of items. For instance there might be a file called NORMAL REFERRALS that links the surgeon's name with the appropriate hospital.
← CREATE = NORMAL REFERRALS.
← SURGEON = SMITH,
← OR SURGEON = JONES,
← REFERRAL CENTRE = HAREFIELD.
← SURGEON = BROWN,
← OR SURGEON = GREEN,
← OR SURGEON = YELLOW,
← REFERRAL CENTRE = BROMPTON.
← END.
Given a file that contains the data checking appropriate to the input the above file could be used in one of two ways. Embedded in a statement such as:
FORM = REFI; NORMAL REFERRAL NONE; ERROR MESSAGE = ?? REFC/SURGEON.
it will produce an error message whenever a referral form does not contain a valid surgeon/referral centre combination. Alternatively the user might want to do no more than record the name of the surgeon, leaving the CODIL system to add the referral centre. This can be done using a statement as follows:
REFERRAL CENTRE NONE; NORMAL REFERRAL.
This will cause the appropriate referral centre to be added to the statement being examined.
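The two uses of such a file, checking a combination and supplying a missing item, can be sketched in Python as follows. The table reproduces the surgeon and referral centre pairs of the artificial example; the function names are invented and the sketch is not a description of how the CODIL interpreter evaluates these statements.

```python
# Illustrative sketch only: one table of valid combinations used both to
# vet input and to fill in an item that was left out.

NORMAL_REFERRALS = {
    "SMITH": "HAREFIELD", "JONES": "HAREFIELD",
    "BROWN": "BROMPTON", "GREEN": "BROMPTON", "YELLOW": "BROMPTON",
}

def check_referral(statement):
    """Return an error message for an invalid surgeon/centre pair, else None."""
    if statement.get("FORM") != "REFI":
        return None
    expected = NORMAL_REFERRALS.get(statement.get("SURGEON"))
    if expected is None or expected != statement.get("REFERRAL CENTRE"):
        return "?? REFC/SURGEON"
    return None

def fill_referral_centre(statement):
    """Add the referral centre when only the surgeon was recorded."""
    surgeon = statement.get("SURGEON")
    if "REFERRAL CENTRE" not in statement and surgeon in NORMAL_REFERRALS:
        return {**statement, "REFERRAL CENTRE": NORMAL_REFERRALS[surgeon]}
    return statement

wrong = {"FORM": "REFI", "SURGEON": "SMITH", "REFERRAL CENTRE": "BROMPTON"}
partial = {"FORM": "REFI", "SURGEON": "GREEN"}

print(check_referral(wrong))
print(fill_referral_centre(partial))
```

The same table serves both purposes, which is the point of holding the valid combinations as a file rather than writing them into a checking program.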
5. Future plans with the Cardiac Information System
Despite the lack of a terminal at the hospital the system is operating in a satisfactory manner and currently holds details of some 500 patients. The next stage is to give the hospital staff unsupervised hands-on experience by providing a terminal in the hospital. It is hoped to do this later this year and it will be initially used for file interrogation activities, switching to file update a few months later. Associated with the changeover will be a switch from sequential to indexed files and the introduction of improved data vetting facilities under the control of the doctors.
In making these plans due note has been taken of other trials using the CODIL language. One of these involves the processing of the weekly job statistics for the University Computer Science Department. This has proved the ability of the language to handle comparatively routine tasks peaking at about 1000 transactions a week. Another application is built round a data base of genealogical information on about 1400 individuals - which contains poorly structured biographical information with the need to generate family trees, etc. This has proved invaluable in examining the need for file indexing more rigorously than would be possible with the cardiac system.
The system described above has proved valuable to all concerned. The medical staff have computer files of clinical data on their patients and there has been valuable feedback on the design and implementation of the software. The Cardiac Information System could be modified to handle any other clinical area simply by providing a different set of names/abbreviations - and of course changing any data vetting facilities to match the requirements of the new system. In this connection it is worth commenting that the CODIL interpreter will run on almost any ICL 1900 computer, has been transferred to a CDC 6600, and should run on most mainframe systems with the minimum of alteration.
The next stage in this research is to examine the use of CODIL in field conditions, both inside and outside the medical environment. Because of the success of the language in artificial intelligence type applications the approach may well have some relevance to diagnostic systems.
The work described here would not have been possible without the earlier pilot trials involving Dr. D. Orton, then of Hillingdon Hospital, and Messrs. Palmer and Marmion, then undergraduates of Brunel University. Christine Godfrey of the University Computing Unit carried out the thankless task of punching the forms and other colleagues, too numerous to mention, have given help and advice. In addition a Science Research Council grant covers certain aspects of the work.
References
1. C.F.REYNOLDS (1971) CODIL, Part 1, The importance of flexibility, Computer Journal, Vol 14, p217-220.
2. C.F.REYNOLDS (1971) CODIL, Part 2, The CODIL language and its interpreter, Computer Journal, Vol 14, p327-332.
3. C.F.REYNOLDS (1974) Designing an interactive language for the pragmatic user, Proc. European Comp. Conf., p991-1006.
4. C.F.REYNOLDS (1977) Matching the computer language to the user, Report CSTR/14, Brunel University.
5. C.F.REYNOLDS (1978) A new look at the problem of open-ended applications, Pragmatic Programming and Sensible Software, p239-251.
6. C.F.REYNOLDS (1978) A psychological approach to language design, Workshop on Computing Skills and Adaptive Systems.
7. C.F.REYNOLDS (1978) The design and use of a computer language based on production system principles, Report CSTR/15, Brunel University.
8. C.F.REYNOLDS (1975) TANTALIZE, a conversational problem solver written in CODIL, Second one day conference on Recent Topics in Cybernetics.
9. C.F.REYNOLDS (1971) Handling cave fauna data on a computer, Trans. Cave Research Group of Great Britain, Vol 13, p160-165.
10. C.F.REYNOLDS (1976) A data base system for the individual research worker, Selective Dissemination of Information, IEEE, p1-7.
11. C.F.REYNOLDS and D.OMRANI (1978) Formalism or flexibility? Int. Conf. on Management of Data.
12. L.R.NEAL (1977) The computer handling of medical information for research purposes, Medinfo 77, p651-655.
13. R.J.PALMER (1974) A computer based data retrieval system to aid research into myocardial infarction, B.Tech dissertation, Brunel University.
14. K.MARMION (1976) The study and use of the CODIL language in a patient recording system in Hillingdon Hospital, B.Tech dissertation, Brunel University.