Wednesday, 27 July 2011

Brain Storms - 6 - CODIL and Natural Language

CODIL was designed to be the symbolic assembly language of an information processing system which reflected the way some people thought about open ended tasks (where it was difficult to predefine a processing algorithm). It arose, not from an academic attempt to model human thought processes, but from a pragmatic attempt to help sales staff control the processing of contracts in a very large commercial organisation, and was later generalised to handle a wide range of basically non-numerical tasks.

The interesting question is whether CODIL works because it was also modelling the “symbolic assembly language” of the brain, and it is therefore very relevant to ask whether the approach could be made to work on a neural net. Is it possible, starting from the CODIL model, to reverse engineer an intelligent brain model that is capable of being explained in evolutionary terms?

Before going further it is appropriate to consider how a language like CODIL relates to natural language.

In a stored program computer system the symbolic assembly language relates to moving numbers (which represent data) between the memory and registers and carrying out processes and explicit logic tests. This mechanism is remote from normal human thought processes – which is why the stored program computer is a black box whose internal working is utterly incomprehensible to the average user of the “box”. Billions of pounds have been spent over the years to develop elaborate software packages, layer upon layer, like an onion, to hide the mysterious inner workings and provide a more friendly interface.

In CODIL (at least if considered in terms of the original hardware proposals) the symbolic assembly language represents not numbers, but sets and subsets, with the set names being those decided on by the user of the system. All addressing is done associatively by set names (which the user understands) and most processing is done automatically with the decision making unit (the equivalent of the stored program computer's CPU) carrying out simple set operations.

Take a simple CODIL statement:

1 MURDERER = Macbeth,
2    VICTIM = Duncan,
3       WEAPON = Dagger.

It is easy to recognise quickly that this says someone called Macbeth used a dagger to kill someone called Duncan. While the statement is not written in English its meaning is immediately recognised by English speakers. If you look at the many other examples in the Publications about CODIL associated with the blog you will find that in many cases you can convert the statements into English with comparative ease. It is almost as if humans have the mechanism to convert such set descriptions into natural language, and there is no reason to think that this would not be true for any other human “natural” language. (I exclude formal mathematical systems from this observation.)
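As a rough illustration in Python (not CODIL itself), the statement can be treated as an ordered list of set name and value pairs, with items found by set name rather than by position. The names `statement`, `lookup` and `matches` are hypothetical and are not part of any CODIL implementation.

```python
# Hypothetical sketch: a CODIL-like statement held as (set name, value) items.
# This only illustrates associative addressing; it is not how CODIL is built.
statement = [
    ("MURDERER", "Macbeth"),
    ("VICTIM", "Duncan"),
    ("WEAPON", "Dagger"),
]


def lookup(items, set_name):
    """Find a value by its set name, not by its position in memory."""
    for name, value in items:
        if name == set_name:
            return value
    return None


def matches(items, query):
    """True if every (set name, value) pair in the query is present in items."""
    return all(lookup(items, name) == value for name, value in query)


print(lookup(statement, "VICTIM"))
print(matches(statement, [("WEAPON", "Dagger"), ("MURDERER", "Macbeth")]))
```

The user only ever refers to names such as VICTIM, which is the sense in which the addressing is associative.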

This raises the question of what extra “facilities” one might need in CODIL (and the brain) in order to convert set descriptions (held in parallel on a neural net in the case of the brain) into the serial string of spoken words, and also the reverse process. In terms of brain modelling, what functional changes, if any, are needed to an animal's brain in order to handle language, excluding those functions explicitly connected with speech generation (i.e. muscle control of the voice box, tongue, lips, etc.) or simple capacity (number of neurons available)?

If one looks at CODIL (as implemented in MicroCODIL) the following statements will generate “Macbeth murders Duncan with a Dagger” in the DISPLAY window. (TEXT is a reserved set name – and instead of the item being moved to the FACTS the value is displayed on the screen.)

1 MURDERER,
2    VICTIM,
3       TEXT := MURDERER.
3       TEXT = murders.
3       TEXT := VICTIM.
3       WEAPON,
4          TEXT = with a.
4          TEXT := WEAPON.
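The effect of these TEXT statements can be imitated in Python, purely as an analogy. The sketch assumes that `TEXT := NAME` copies the current value of the named set, that `TEXT = words` emits the literal words, and that the statements nested under WEAPON only apply when a WEAPON is known. The step names `ref`, `lit` and `if`, and the functions `render` and `sentence`, are hypothetical.

```python
# Hypothetical Python analogue of the MicroCODIL TEXT statements above.
facts = {"MURDERER": "Macbeth", "VICTIM": "Duncan", "WEAPON": "Dagger"}

# ("ref", NAME)        imitates  TEXT := NAME   (copy the value of a set)
# ("lit", words)       imitates  TEXT = words   (emit the words as given)
# ("if", NAME, steps)  assumed meaning of nesting: use steps only if NAME is known
template = [
    ("ref", "MURDERER"),
    ("lit", "murders"),
    ("ref", "VICTIM"),
    ("if", "WEAPON", [("lit", "with a"), ("ref", "WEAPON")]),
]


def render(steps, facts):
    """Turn the parallel set description into a serial list of words."""
    words = []
    for step in steps:
        kind = step[0]
        if kind == "lit":
            words.append(step[1])
        elif kind == "ref":
            words.append(facts[step[1]])
        elif kind == "if" and step[1] in facts:
            words.extend(render(step[2], facts))
    return words


def sentence(steps, facts):
    return " ".join(render(steps, facts))


print(sentence(template, facts))  # Macbeth murders Duncan with a Dagger
```

If the WEAPON item is left out of `facts`, the same template gives “Macbeth murders Duncan”.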

A more general approach would include statements such as

1 MURDERER,
2    VICTIM,
3       SUBJECT := MURDERER.
3       VERB = Murder.
3       OBJECT := VICTIM.
....

with further statements on how to modify verbs such as murder and word order depending on context, to generate sentences such as “Will Macbeth murder Duncan with a dagger?” or “Macbeth used a dagger to murder Duncan”.

When I was engaged in full time research over 20 years ago I was unable to assess how good the CODIL approach might be for generating or understanding natural languages, or for translating between languages. If anyone is interested in following this approach I am happy to advise. The important thing is that if CODIL (as mapped onto a neural net) is a reasonable model of the brain's “symbolic assembly language”, and it is capable of supporting something approximating to natural language, there is no need for any significant new “logic processing” capability between the great apes and humans – simply that humans have greater capacity – to handle the extra language information needed – and also have better sound-generating capabilities.

Held to ransom over Winzip Archive Files

When I acquired my first PC in 1992 I started to use WinZip to archive data - some of it transferred from other computers and dating from 1988 or earlier. However as storage has become so cheap, I haven't used it to archive important files since 2003 although I used it extensively to store records of ebay sales and purchases between 2003 and 2005.  However I have used it occasionally to unpack old files or zipped folders supplied by other people.

I recently received an invitation to try out a new version of WinZip (version 15.5) for free. I thought it might be useful to have a quick look - and decided not to purchase. Several months later I wanted to unzip some of my old archives and found that the trial version wanted payment - and HORROR of HORRORS - it had deleted the old version. I needed to access the files - so reluctantly I paid up. It then became apparent that the version I had bought would stop functioning if I didn't continue to make annual payments - and so I would lose access to 20 year old archives unless I continued to pay up indefinitely. There is nothing in the terms and conditions to say that it will remove old (but licensed) versions of the software, or, apart from the vague word "limited", to indicate that you will be unable to access your ancient files unless you pay an annual fee ...

WinZip appear to have forgotten that the whole point of archiving information is to be able to retrieve it in the future. I am quite happy to pay an annual fee when I am actively using the software to archive new files - as long as there is a guarantee that the fee I paid to create the archive also allows me to recover the data at any time in the future.

To add insult to injury, the software came with the benefit of free software - which is no more than a sales promotion package in that if you run it you are told you can only use it properly if you pay even more money ....

The "access to old files" situation is rather like the trick that Microsoft played with Word some years ago - when an upgrade meant that it would no longer read text files I had archived in the 1990s. As a result I keep an old PC running which has a version of Word which will read the old files - and reformat them into a form that newer Microsoft software will still read. I have also switched to OpenOffice (Free!) which can read the old files (although with a few formatting errors).

The only benefit of all this is that I searched my hard disc for all WinZip folders (5211 of them) and listed them in date order. As a result I have found a number of important folders containing files relating to CODIL from the 1980s which I thought I had lost.

Monday, 18 July 2011

Rural Relaxation at College Lake

The Marsh at College Lake, Near Tring
Pyramid Orchid
Having opted out of the science rat race over 20 years ago I am beginning to realize how retirement has affected my view of the world. I have never been the best organizer of my own time - but at least when I was lecturing there was a fixed timetable of required activities to provide a framework for the day and the week.

Moorhen and chick on rippled waters
The trouble is that I have always been a workaholic with an overflowing in-tray - and I still am - despite being retired. With few fixed dates it is easy to put things off until tomorrow - and sometimes I suddenly realize that several matters have become urgent. Because of what happened to my daughters I need to keep my stress levels down - and the best way to do this is to take a country walk - and things such as this blog just have to wait.

Wayfaring Tree
Fortunately there are plenty of opportunities to relax in the area around Tring, but one of my favourites is College Lake, which is run by the Beds, Bucks and Oxon Wildlife Trust. When I first visited this large hole in the ground some 20 years ago work had started on establishing a nature reserve at one end - while quarrying was continuing at the other. Last year a fine new visitors centre was opened - which means I can end a walk with a cup of coffee and a piece of cake!

Teasel
There is a wide range of habitats - a marshy area, shown above, a deep water lake, woodland - including a newly planted area which will look wonderful in 50 or so years time, heath land areas and an area where crops are growing in the old way - with all the associated weeds! There are also areas of bare chalk, left from the quarrying, which are beginning to be covered with vegetation, and rare breed sheep and cattle graze the grassy area. Whatever time of year you visit there is always something for the nature lover to see. For instance only a few days ago I spent a happy half an hour watching a hobby flying over the marsh area catching dragonflies.

Photographs all taken in July 2011

Friday, 15 July 2011

Squeezed into the insurance company's box

We all have problems in trying to fill in computerized forms which give us a choice of options, none of which fit the bill. As a result I was very much amused by the blog post Misinformation and muppets by Azaria Frost. She describes the problems of getting car insurance when your occupation is not on the list. Several people made comments about how a different description such as "office worker" - which could apply to many people who also have a specialist title - can reduce your car insurance premium. One comment also explains that one of the reasons is the need to share information.

The problem is one that is pretty universal. While it is useful to classify information into boxes the real world is not like that. Often there is no clear borderline - for instance the classification of living things into families, species and subspecies is always going to turn up intermediate cases. Attempts to analyse data statistically can throw up problems. One of my first blog posts was I discover Babel's Dawn which had a review entitled Last Common Language was in Africa which I find very interesting. However when you look at the original paper Phonemic Diversity Supports a Serial Founder Effect Model of Language Expansion from Africa in depth there are problems with the quality of the source data. This comes from classified data about a large number of different languages. While this does not invalidate the analysis the results must retain a considerable degree of uncertainty because of the classification assumptions which underlie the research.


The important thing to remember is that in classifying and statistically analyzing any body of data there are likely to be difficult cases. Most computer systems simply sanitize the data by shoe-horning the "messy data" into a standard category - and "conveniently" forgetting that there was ever any uncertainty.  

Sunday, 10 July 2011

Brain Storms - 5 - Some factors in Choosing a Model


As CODIL is an artificial language which has been implemented on a stored program computer my experience has been that many people jump to conclusions about what it does and how it does it. For this reason it is worth spending a little time discussing the limitations of models in understanding science.

My own approach to scientific models is greatly influenced by the fact that as a student I studied chemistry at a university with a strong interest in the underlying theories, and went on to do a Ph.D. with a strong theoretical content. As such I was well aware that it helped to have different models for different purposes – and each model had its strengths and weaknesses – and it was often possible to represent the mathematics in different ways. The difference between the Ancient and the Copernican models of the Solar System is little more than a change in the mathematical origin (which simplified calculations) and a change in the belief that humans must be at the centre of creation.

In the context of “Trapped by the Box” it is also important to realise that our brain constructs models of reality which also have their strengths and weaknesses. We are all aware that the brain/eye can be fooled with various optical illusions. The “image” of what we see is an incomplete view of the reality, which is filtered by what we expect to see. This was brought home to me when I visited a nearby zoo with a young child who suddenly pointed with an excited cry of “Look”. Everyone looked in the direction she was pointing and saw nothing that could possibly be relevant, such as a bear with its cub. Nothing – so the child pointed more directly and we could all see the ant on the barrier to the enclosure!

Everyone also has models for the human languages that they speak – and these can influence the way they think about the world. So what about computer languages? It is well known that if someone becomes very familiar with one programming language this will influence how they use a new language. In fact it is quite difficult to find anyone who does not take the stored program computer approach for granted, even if only to admit that they are not clever enough to write programs. The more knowledgeable will talk about the Turing Universal Machine to demonstrate the power of the almost universally accepted approach.

But all models have practical limitations and the Turing Universal Machine and the stored program computer are no exceptions. They are based on the assumption that you can define the algorithm (program) in advance. Whatever the theoretical position might be if we had infinite knowledge, there are many aspects of the world where we do not have enough prior knowledge to define an algorithm. It is in these areas that CODIL was designed to work. I would also suggest that this area of incomplete and inexact knowledge is the one in which the brain (both animal and human) evolved.

A detailed assessment of CODIL will come in later postings but it is useful to think of CODIL as the “symbolic assembly language” of a non “Von Neumann” computer. This table highlights some of the differences.

| Feature | Conventional Computer | CODIL system |
| --- | --- | --- |
| Information stored as: | Numbers | Sets |
| Recursion as part of memory structure | No | Yes – sets can contain sets |
| Information addressed by: | Numerical position in linear store | Associatively by Set Name |
| Distinction between program and data | Yes | No |
| Address calculation | By program | Automatic |
| User friendly at the processor level | No | Yes (people understand sets!) |
| Algorithm required in advance | Yes | No |
| Will basic approach map easily onto a neural net | No | In part very easily but more research needs to be done |

I can just imagine some of the computer purists looking at this – noting that CODIL has been made to work on a stored program computer – and blowing a large raspberry. But as I discovered when I studied Chemistry – you need different models for different purposes – and often different mathematical models prove to be equivalent. For those who wish to dismiss the idea, may I suggest they remember that there are sets of numbers, and it should be possible to show (and I leave it to others to prove it) that the stored program computer is a special subset of the CODIL approach. For instance if you said all set names are unique numbers in sequence, and all sets had a single numeric value, and eliminated recursion, the CODIL memory would become equivalent to the stored program computer memory.
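The three restrictions can be made concrete with a toy Python model. It is only a sketch of the argument, not a proof and not the CODIL memory design: `SetMemory`, `is_linear_store` and `as_array` are hypothetical names, and numbering the set names from 0 is an assumption made for convenience.

```python
class SetMemory:
    """Hypothetical associative store: items are addressed by set name."""

    def __init__(self):
        self.items = {}  # set name -> value; a value may itself be a SetMemory

    def store(self, name, value):
        self.items[name] = value

    def fetch(self, name):
        return self.items[name]


def is_linear_store(memory):
    """Check the restrictions: sequential numeric names, single numeric values, no nesting."""
    names = list(memory.items)
    sequential = (all(isinstance(n, int) for n in names)
                  and sorted(names) == list(range(len(names))))
    flat = all(isinstance(v, (int, float)) for v in memory.items.values())
    return sequential and flat


def as_array(memory):
    """View a restricted memory as a conventional linear store."""
    if not is_linear_store(memory):
        raise ValueError("memory uses named or nested sets")
    return [memory.fetch(address) for address in range(len(memory.items))]


general = SetMemory()
crime = SetMemory()
crime.store("MURDERER", "Macbeth")
general.store("CRIME", crime)  # a set containing a set

restricted = SetMemory()
for address, number in enumerate([10, 20, 30]):
    restricted.store(address, number)

print(is_linear_store(general))  # False
print(as_array(restricted))      # [10, 20, 30]
```

Under the restrictions the associative store holds exactly the same information as a numbered array of numbers, while the general store has no such flat equivalent.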

Friday, 8 July 2011

Brain Storms - 4 - Requirements of a target model

The human brain consists of a very large number of interconnected neurons and is capable of handling a large number of sophisticated concepts. Animals (by which I include the great apes and possibly all vertebrates) have brains made up of similar neurons and can take actions which relate to their memories of the environment in which they live. As the earlier posting, The Black Hole in Brain Research, suggests, little is known about the way in which changes at the neuron level use these memories to work their way through to the resulting complex decisions leading to actions.

One way of trying to find out what is happening is to try and build a model of the likely processes and memory structures and find out what the model can do, and how far it fits with observations. This post aims to outline one possible approach to modelling the relevant thought processes and some of the observations which a successful model will need to explain.

We know enough about the brain to know that we need a large number of processors, each of which is linked to a memory which can store a pattern, with communication links between processors. The simplest action would be that a processor receives a signal over a link from another processor, compares the signal with its memory, and depending on whether there is a match sends a signal to another processor. This can be considered as a single neuron acting as a logic gate.
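This simplest action can be sketched in a few lines of Python. The sketch is a toy, not a model of a biological neuron: the class `Node` and its `on_match` and `on_mismatch` links are hypothetical, and routing the signal onwards in both the matching and non-matching cases is an assumption.

```python
class Node:
    """Hypothetical processor with one stored pattern and outgoing links."""

    def __init__(self, pattern=None, on_match=None, on_mismatch=None):
        self.pattern = pattern          # the memory linked to this processor
        self.on_match = on_match        # node signalled when the pattern matches
        self.on_mismatch = on_mismatch  # node signalled when it does not
        self.received = []              # record of incoming signals

    def receive(self, signal):
        """Compare the signal with the stored pattern and pass it on."""
        self.received.append(signal)
        target = self.on_match if signal == self.pattern else self.on_mismatch
        if target is not None:
            target.receive(signal)


seen = Node()
ignored = Node()
gate = Node("elephant", on_match=seen, on_mismatch=ignored)

gate.receive("elephant")
gate.receive("zebra")
print(seen.received)     # ['elephant']
print(ignored.received)  # ['zebra']
```

Here the comparison is exact. A fuzzy comparison would only change the test inside `receive`.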

The aim is to see how far we can model different aspects of brain activity by expanding this simple model in a way that will support complex decisions and actions. In the subsequent postings I will be looking at how research on an unconventional computer language, called CODIL, carried out some years ago, provides a good starting point for such an approach. However the work done so far falls far short of a complete brain model and it is appropriate to list some of the “goals” which will need to be addressed so that the areas where the model needs to be improved can be identified.

  1. The model must start – like a new-born baby – knowing nothing and be able to boot-strap itself up to a fully working system.
  2. A built in learning mechanism is essential.
  3. There must be a mechanism to recognise patterns.
  4. The pattern matching process may sometimes be exact, but will sometimes need to be “fuzzy” especially in early learning stages.
  5. There must be some way of organising patterns into categories and events.
  6. Information relating to time and distance needs to be handled in a relative way.
  7. At any one time there needs to be a focus of activity (around short term memory?) but some processes will be “subconscious”.
  8. Counting (beyond 1, 2, many) and formal mathematics and logic will only be relevant as tasks which the model can learn to do.
  9. There will need to be some housekeeping activities – including forgetting!
  10. The model should be capable of evolving from simple models.

Human language raises some interesting points. The brain's neural net works in parallel while speech is a sequential process. If information is to be transferred from one human to another by speech there have to be conversion processes (parallel > serial > parallel), introducing syntax. An important question is whether there is any need for special facilities in the brain to carry out this translation type task. The model will need to be able to support different cultures and languages and people with very different skills or political/religious views.

There is one factor that needs to be allowed for in discussing any model. This is recursion. I have not defined a “pattern” or how it is represented – but in discussing examples it will be useful to be able to say that a particular pattern identifies a concept equivalent to the English word “Elephant”. However, in the same way that the word “Elephant” can be broken down into individual letters the pattern may also be able to be broken down into separate sub-patterns. For this reason I will use the term “Node” to represent a processor/pattern combination without needing to ask whether it is based on a single neuron, or a linked network of neurons, where the pattern is actually defined by links.
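The recursive idea can be illustrated with a small Python data structure in which a node is either a leaf or is defined by links to sub-nodes. It is only a sketch of the “word made of letters” analogy: the class `PatternNode` and the helper `word_node` are hypothetical names, and nothing here says how a pattern is represented in a brain.

```python
from dataclasses import dataclass, field


@dataclass
class PatternNode:
    """Hypothetical node: a label plus links to the sub-patterns that define it."""
    label: str
    parts: list = field(default_factory=list)  # empty for a leaf

    def leaves(self):
        """Break the pattern down into its lowest-level sub-patterns."""
        if not self.parts:
            return [self.label]
        found = []
        for part in self.parts:
            found.extend(part.leaves())
        return found

    def depth(self):
        """Number of levels from this node down to its deepest leaf."""
        return 1 + max((part.depth() for part in self.parts), default=0)


def word_node(word):
    """Build a node for a word from one leaf node per letter."""
    return PatternNode(word, [PatternNode(letter) for letter in word])


elephant = word_node("Elephant")
phrase = PatternNode("Elephant walks", [elephant, word_node("walks")])

print("".join(elephant.leaves()))  # Elephant
print(phrase.depth())              # 3
```

The same `PatternNode` type serves at every level, so a discussion can refer to the “Elephant” node without deciding how many levels lie beneath it.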

Tuesday, 5 July 2011

Brain Storms - 3 - Evolutionary factors starting on the African Plains

Imagine the plains of Africa three or four million years ago – populated by a range of medium to large sized mammals which (for verbal convenience) we will describe in terms of modern species. There will be herds of antelope and zebras grazing on the ground plants, and animals such as giraffe who can eat foliage from high in the trees. Other herbivores will include elephant and rhinoceros. Wart hogs will have a more varied diet, and there would be the carnivores and carrion feeders such as lions and hyaenas. All have basically the same body plan, biochemistry, and genetic coding mechanisms – which have been modified by evolutionary pressures in different ways in different species. It is reasonable to assume – at least in a brain storm such as this – that the basic body plan includes the processes that allow the brain to store and process information.