STEER: advising on AI in education

Our unique expertise in technology, cognitive science, mental health and the philosophy of learning enables STEER to advise on responsible applications of machine technologies to education systems.

'AI already drives our lives. Directing where it takes us depends not on our IQ, or our EQ, but on our control of our cognitive steering.'

Dr Simon Walker, CEO, STEER

In 2011 the UK education company STEER began a programme of research around cognitive processes which could be described as ‘machine-resistant’. The programme was led by Dr Simon Walker, an applied cognitive biologist.

STEER welcomed the potential benefits of machine learning. However, STEER identified that describing the human cognitive functions unlikely to be replicated by machines was an educationally, economically and existentially important question. STEER speculated that such a study would inform sustainable and strategic directions for education; safeguard uniquely human skills; qualify the remit of AI in education; inform the moral development of AI; and clarify themes of human identity.

To achieve this, STEER drew on specialisms from philosophy, psychology, cognitive and computer science, education and pedagogy. The STEER programme centred on a series of large quantitative school studies testing a theoretical data model which, previous research had indicated, might describe cognitive functions likely to be uniquely human. Whilst advances in machine learning were accelerating rapidly, the basic architecture and coding principles of machine learning had been established back in the 1980s. Now applied at scale, across networked machines and harvesting a wider variety of data, these processes were achieving rapid gains, replicating and surpassing some instances of human cognitive skill which had previously been considered uniquely human. Therefore, cognitive functions which did not conform to this computational direction were considered machine-resistant.

The programme's methodologies, outcomes and conclusions were published in a series of studies between 2012 and 2015. The term 'steering cognition', or 'steering', was used to refer to the cognitive function described. STEER identified that steering contributed to both student mental health and learning-to-learn skills.

From 2013, STEER ran a concurrent development programme to engineer a set of usable technologies which schools could embed to measure, track and improve steering in their students. Since then, more than 100 UK schools have implemented the technologies to track and support 30,000 children and 7,000 teachers over a multi-year programme. The programme is funded by schools who want to achieve better outcomes in student mental health, learning-to-learn skills and quality of teaching by becoming a 'steering school'.


Responsible policies for the application of AI in education

AI questions some basic assumptions about being human. The goal of education is to preserve, extend and enrich human endeavour and cooperation on the planet. In the light of this, AI raises challenging questions for educational policy makers, which involve careful and wise political, commercial, moral, societal, psychological, industrial, economic and philosophical consideration.

Market actors will continue to create novel, innovative technologies, each of which should be evaluated against clear educational and societal criteria. We encourage policy makers to take a long-term perspective, including the following considerations:

A basic question for AI-based applications is the extent to which any new pedagogical technology enhances uniquely human cognitive abilities. Without a proper answer to this question, we run both a political and an existential risk. The political risk is that societies may perceive AI as a fundamental threat to human progress and control, and oppose its use at every level. The existential risk is that our confidence in, and investment toward, future artistic, intellectual, scientific and literary progress becomes uncertain if we cannot clearly distinguish the unique value of human activity.

How a human thinks differently from a machine must be answered at a mechanistic level, a cognitive level, a psychological level and a social level. It is important that these distinctions do not remain in the papers of academics but are communicated clearly and effectively to inform educational ideology and teaching practice. Teachers will need to understand how their maths, or science, or history lesson contributes a kind of knowledge and skill which cannot be replicated by a machine; otherwise their confidence in their enterprise may ebb away.

Related to this is the need to explicitly teach students a theory of knowledge. Students will need to distinguish between data, information, knowledge, abstraction and felt experience, such that they can discriminate between lower-level machine analytics and higher-level data interpretation.

Like any other field, educational AI is being developed in a market in which both private and state actors are acting to exert influence and control. Understanding this, it is important to recognise that machines are morally blind, but not morally neutral. Machine learning algorithms must first be trained on a set of data in order to recognise patterns. The data they are trained on determines the patterns they detect and the meanings they ascribe to them. If the dataset is distorted (for example, a dataset biased towards negative data on violence committed by women), then the machine will be trained to see women as more violent and to predict that pattern in future.
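To make this concrete, the sketch below uses deliberately skewed, hypothetical data (the group names, labels and counts are invented for illustration) to show how even the simplest statistical 'learner' reproduces the distortions of its training set: its predictions are artefacts of the data it was given, not facts about the world.

```python
# A minimal sketch, assuming invented groups, labels and counts.
# The "model" here is just a per-group label frequency table, but the same
# dynamic applies to any statistical learner: skewed data in, skewed model out.
from collections import Counter, defaultdict

# Deliberately distorted training records: group_a is over-represented
# with negative ("violent") labels.
training_data = (
    [("group_a", "violent")] * 80 + [("group_a", "non_violent")] * 20 +
    [("group_b", "violent")] * 20 + [("group_b", "non_violent")] * 80
)

# "Training": count how often each label was seen for each group.
counts = defaultdict(Counter)
for group, label in training_data:
    counts[group][label] += 1

def predict(group: str) -> str:
    """Predict the label most frequently associated with this group in training."""
    return counts[group].most_common(1)[0][0]

print(predict("group_a"))  # 'violent'     -- an artefact of the skewed dataset
print(predict("group_b"))  # 'non_violent' -- not a fact about the world
```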

The implication of this for educational policy makers is that they must consider how their education systems will train, supervise and regulate each generation of engineers to develop socially responsible code. By way of historical analogy, in mediaeval Europe a religious priesthood politically controlled society through its exclusive access to, and understanding of, Latin biblical texts. Machine coding creates a new kind of priesthood, authoring the code on which society relies for its knowledge of the world. The new priesthood resides in anonymous warehouses and offices in California, Oregon, Moscow, Beijing, Pyongyang and London. Educational policy makers must consider specifically how those who occupy such warehouses are educated.

In 2015 we published a study involving nearly 4,000 pupils across 20 UK secondary schools which answered the question: do pupils at schools which show Motorway Model characteristics exhibit narrower cognitive abilities than pupils at schools which show fewer of those Motorway characteristics? If so, what might the consequences be for employability beyond school?

Educational digital technologies inherently construct an additional interface through which a student engages with the learning process. Inevitably, the social-psychological dynamic of learning is changed. For example, facial or other biometric monitoring alters the dynamic fundamentally. The learner is potentially scrutinised on a continuous basis and may not be aware of the uses to which the data is put. As a kind of educational surveillance, this is likely to create a state of hypervigilance, the effects of which on the student's ability and willingness to learn need to be much more fully investigated. In addition, adolescence is a stage at which students socially monitor what they present and what they hide; students who hide more are associated with higher mental health risks. An inadvertent consequence of AI-enhanced student monitoring may therefore be to amplify the rising psychological health risks already faced by adolescents in many countries.

This one example illustrates how AI-technology should be scrutinised by a range of disciplines and practitioners.

Fundamentally, AI-based educational applications outsource cognitive load from the learner to an external digital environment. In investing in adaptive educational environments, policy makers must review the evidence of inadvertent cognitive regressions.

Some studies provide direct evidence that the more learners rely on some adaptive digital resources, the more unrealistically inflated their self-evaluation of their own cognitive skills becomes. These findings come against a wider background of declining IQ scores and reverse Flynn effects (a reversal of generational IQ gains) across developed countries, attributed by some researchers to social and environmental effects, including the ubiquity of technology.

To pose the key question: will AI construct adaptive educational roads for users to better tailor their learning, or will it provide the equivalent of an autonomous educational car in which the learner’s decisions are made by the machine rather than the driver?

The previous point poses the next consideration for policy makers. AI offers responsive, adaptive educational road designs to the learner. This has the potential to accelerate the acquisition of knowledge as the learner follows the adaptive road, without expending the cognitive labour of searching for, discriminating between and linking knowledge themselves. However, higher-order thinking involves the ability to navigate epistemological landscapes in which there are no pre-existing roads. Such thinking is essential beyond the classroom; it is required to solve all new social, cognitive and artistic challenges.

This is also the skill which will enable humans to collaborate effectively with machines in a future economy. Machine intelligence relies on the identification of the right question and the right dataset in the first place; to extend the analogy, the ability of AI to enhance learning depends on the human capacity to steer its powerful engine in the right direction.

Education is the environment which must train this metacognitive steering skill, which centres around self-regulation, map-making and cognitive flexibility. However, to date education has found it easier to measure academic engine power and progress (grades) than metacognitive steering skills. AI puts pressure on policy makers to develop a partner curriculum to target, measure and improve metacognitive skills.

AI which is of value is AI which works in the real world. For education, this means technologies which work within the limitations of available time, hardware, software and teacher skills. Policy makers should ask: how feasible is it for teachers to use this technology practically in the classroom and across the school? What constraints of time, skill or cost exist?

What intangible obstacles have been considered? Schools are uniquely complex environments, educating young people who are on a journey of physical, intellectual, social and emotional change. What can seem like a perfectly acceptable activity to an adult can be excruciatingly embarrassing for an awkward teenager: consider, for example, the reluctance of students to use 'wearable technologies' in the social setting of the classroom. Equally, technology may enhance the ability of young people to overcome adolescent social anxieties, for example by collaborating in online rather than face-to-face discussion.


Why is steering cognition machine resistant?

Machine intelligence is progressing at a tremendous rate; it was once considered impossible for a machine to beat a Grandmaster at chess. In 2016, Google DeepMind's AlphaGo defeated one of the world's strongest professional players at Go, a far less constrained game. Surely it is likely that, at some point, machines will be able to replicate the brain functions which constitute steering cognition?

This is indeed possible, but we believe it is a long way off and may in fact never be achievable. The reason is that machine intelligence and steering cognition use two different processing architectures: machine intelligence is built on an algorithmic architecture, whilst steering cognition is built on an associative one.

The associative architecture of steering cognition

The central function of steering cognition is as a mental simulator: it enables the brain to 'manipulate or turn round' novel external data to work out what kind of data it is, how to attend to it, where to locate it in our long-term memory and how to act back out into our environment in response to it.

To visualise the function of steering cognition in the brain, think of the board game ‘Downfall’. The object of the game is to get counters of the right colour into the right container at the bottom, via a series of cogs. The counters are like ‘data’ from the outside world, which come in a huge array of forms and colours. The containers at the bottom are like our existing memory in our minds: our brain creates long term memories by categorising counters into groups with similar associations (shape, size, smell, feel) so that we can recognise a leaf as a leaf, or a smile as a smile.

To get the coloured counters from the varied outside world into the right containers that exist in our minds, we need an equivalent of the ‘Downfall cogs’ in between. Steering cognition is our ‘Downfall cogs’. The steering cognition cogs enable our brains to turn round the data, work out what kind of data it is, whether we have seen it before and how it relates to our existing categories.

These cogs operate in our working memory and involve our imagination: we use our imagination to mentally simulate whether we have experienced this data before, how we reacted to it if we did, how much attention to pay to it now, and how to relate it to our existing memories.

The brain’s steering cognition cogs have to be very flexible, constantly adjusting between and within the varied languages through which we ‘read the world’. We read ‘emotional’ data, ‘social’ data, ‘spatial/physical’ data, ‘numerical’ data and, of course, ‘linguistic’ data.

In the course of an everyday task, like shopping in the supermarket or chatting with friends over a drink, ‘the meaning’ is contained not in one language, but within the fluid combination of these different data languages (gestures, tones, words, numbers, ideas etc). Steering cognition prioritises, adjusts and regulates our limited attentional focus between these different kinds of data, in order to detect the meaning of the whole. When it fails to regulate appropriately, our attention and subsequent action can become biased, focused on some languages over others. The ability to recruit imagination is critical to achieving such attentional regulation, because imagination allows us to ‘see ourselves in relation to’ new data experiences. The imagination ‘puts us into the picture’, so to speak, in the first person; in this way it ‘recruits and associates’ past emotional, social, linguistic and numerical memories with the new experience; new data is initially processed not procedurally and atomistically, but holistically and integratively.

Often, those associations are oblique, ambiguous, unresolved and putative, waiting to be more fully crystallised as we anchor the new more closely into our existing structures of memory and meaning (which is why the imagination is also the realm of metaphor, symbol, allusion and inference). Tolerance of the unresolved is critical: by sustaining and retaining such putative ‘associations’ in our mental simulation circuitry, we can create time to make connections with almost any new experience or kind of data. This allows us to adjust to, process and incorporate into our internal memory a much richer, more combinative and unpredictable stream of information than any other animal species.

Machines, on the other hand, are restricted to data types for which they already have an existing internal coding architecture; otherwise the external data ‘does not fit’. Internet companies have faced, and had to overcome, this problem in its simplest form. Different applications (Facebook, Instagram, Blogger) are coded with different data architectures. When we post a picture on Instagram and want to link it to our Facebook page, the data has to be translated through a programming interface (an API); it cannot be read directly.
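The sketch below illustrates the kind of 'translation' work such an interface or adapter performs; the class and field names are hypothetical, not the actual schemas of any real application. Each pairing of schemas needs its own hand-written mapping, and anything with no counterpart in the target schema is simply lost.

```python
# A minimal sketch, assuming hypothetical schemas for two applications.
# Application B cannot read A's data directly; an adapter must re-map it
# field by field, and fields with no counterpart (here, tags) are dropped.
from dataclasses import dataclass
from typing import List

@dataclass
class PhotoPostA:          # how application A happens to structure a post
    caption: str
    image_url: str
    tags: List[str]

@dataclass
class StatusUpdateB:       # how application B happens to structure a post
    message: str
    attachment: str

def translate_a_to_b(post: PhotoPostA) -> StatusUpdateB:
    """Adapter: re-express A's structure in the terms B understands."""
    return StatusUpdateB(message=post.caption, attachment=post.image_url)

post = PhotoPostA(caption="Sunset", image_url="https://example.com/1.jpg", tags=["sky"])
print(translate_a_to_b(post))  # StatusUpdateB(message='Sunset', attachment='https://example.com/1.jpg')
```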

Internet companies are having to create more and more APIs to link different applications; maybe one day they will have created thousands. The brain has to have a ‘steering cognition’ API for an almost limitless array of external data grammars. This helps us appreciate how difficult it is for any cognitive processor (e.g. the brain or a computer) to process data from an external source whose structure is unpredictable and different from that held in its internal database (memory). It is for this reason that machines are only intelligent when faced with narrowly predictable and routine environments (e.g. a chess game, maths problems, stock market trading judgements), where the data comes in one format.

The algorithmic architecture of machine and analytical learning

The brain’s almost limitless ability to process varied data grammars relies on its capacity to ‘associate’ unrelated data via its mental simulator, the imagination. The imagination’s ‘associative processing’ is what makes steering cognition critically and uniquely human. Machines make judgements by following programmed algorithms (tens of thousands of procedures executed in a step-by-step sequence to arrive at an answer). In this, machines analyse data similarly to how the brain sieves through, analyses, computes and finds patterns in its existing retained memories. We have tests to measure the brain’s ability to process internal data algorithmically; we call them IQ tests. The capacity of the brain to perform such complex, procedural mental tasks quickly and accurately is given the term ‘general intelligence’. The term is somewhat misleading, as it suggests it refers to the overall capacity of the brain to learn. However, general intelligence is not the intelligence of the brain overall; rather, it is the ability of the brain to process and use existing, learned data algorithmically. Machines have a superior potential to replicate and surpass the algorithmic (IQ) functions of the brain. Machine learning systems are, and currently can only be, coded as algorithms. They will inevitably become more powerful in mastering datasets like those they have encountered before.
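As a small illustration of what 'algorithmic' means here, the sketch below solves an IQ-style number-sequence item by a fixed, step-by-step procedure (the example sequence and rule are invented for illustration): every step is pre-specified, and the moment the data departs from the pattern the procedure was written for, it has nothing to say.

```python
# A minimal sketch of purely algorithmic, step-by-step processing, assuming an
# invented IQ-style item. Nothing is imagined or associated; the procedure only
# works on data that already fits its pre-coded pattern.
def next_in_sequence(seq):
    """Step 1: compute the differences between consecutive terms.
    Step 2: check the differences are constant (the pre-coded pattern).
    Step 3: apply the rule to extend the sequence."""
    diffs = [b - a for a, b in zip(seq, seq[1:])]
    if len(set(diffs)) == 1:
        return seq[-1] + diffs[0]
    raise ValueError("data does not fit the pattern this procedure was coded for")

print(next_in_sequence([2, 5, 8, 11]))   # 14
```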

But it is our steering cognition, rather than algorithmic processing, that enables us as humans to process data in the complexity of the real and living world.

This leads to a fairly confident set of five reasons why machines will not be able to replicate steering cognition:

1. Humans are involved in every situation we process: machines cannot be

When we ‘read’ a social situation we are also a contributor to that social situation: an agent as well as a reader. Human knowledge therefore always involves conscious mental representation of ourselves, and as such requires the capacity to become aware of ourselves AND of the state of the other persons around us. A machine will never be able to do this because, by definition, it is non-human and therefore will not be ‘included’ as a contributor to the external social dataset by another human being. Our ability to ‘imagine’ ourselves in the first person, and to ‘imagine’ the state of another (empathy), centrally requires a community of imaginative participants. A singular machine intelligence would need to become a community of machine intelligences, and even then would only potentially be able to understand its own kind.

2. Steering cognition requires self-representation

Fundamentally, steering cognition requires the thinker to ‘see themselves’ as an entity participating in a situation. The capacity to ‘self-represent’ is an emergent property of the brain’s interactions, an understanding of which remains beyond the grasp of philosophers, let alone computer scientists. The most sophisticated neural machine architectures have been developed without any understanding of what constitutes, neurally, a state of self-conscious self-representation.

3. Much external data requires associative and ambiguous processing

For example, what does a child waving mean? The answer is determined by wider context, personal history and other factors not immediately discernible from the dataset. This requires associating symbolic, gestural and metaphorical data to form a composite picture involving memory and personal story. Machines cannot do that. Philip K. Dick’s Do Androids Dream of Electric Sheep? explored this problem poignantly in 1968.

4. Data processing requires a person to move ‘into’ and ‘out of’ a situation mentally, from first to third person, in order to understand it

For example, how do you help a child with a nasty cough? Appropriate action requires the capacity to ‘stand in her shoes’ and feel the illness, to be gentle and kind, but also to ‘step back’ and consider medical data: temperature, symptoms and so on. Real-world intelligence requires the ability to move from ‘object’ to ‘subject’ moment by moment, depending on the structure of the data presenting itself. Machines cannot detect when to make those judgements because they cannot discriminate between the subtle, unpredicted data types presenting themselves moment by moment.

5. Human beings fake

Is the ill child faking to get off school? Does my smile mean I agree with you, or am I hiding my real reaction? Human cognition involves social codes of disclosure and shame which are cultural as well as personal. They are critical for social cohesion and influence; we use our self-presentation to exert influence upon other human beings. The HAL problem in 2001: A Space Odyssey explores machine and human non-disclosure: HAL, the computer, withholds information from the human astronaut, who works out that HAL is not disclosing and works around him. HAL, on the other hand, is flummoxed when the astronaut becomes non-disclosing, because it cannot compute what his intentions are.