AI: A Beginner’s Guide by Burke

Today, with Meta AI integrated into WhatsApp, Gemini augmenting Google search results, and ChatGPT and Grok consulted for everything from homework help to relationship advice, it seems impossible for anyone remotely tech-connected to operate outside the reach of AI. Yet few seem to question the nature of what they’re interacting with – the very words “artificial intelligence” imply a mind, and the technology’s use of language invites one to imagine a consciousness behind it, albeit a sort of digital ‘spark in a jar.’ What, then, actually is AI, and where might it be taking us?

To shape this discussion, I want to unpack the acronym behind possibly the most well-known online AI: ChatGPT, where GPT stands for Generative Pre-trained Transformer.

Unlike systems used for analysis or classification, generative AI outputs new content, with the prefix Chat implying that said content arrives through casual human interaction, via some kind of user-prompt-to-AI-response mechanism. Let’s peek under the hood of how this works.

At the core of most current publicly-known AIs is a Large Language Model, or LLM. Surprisingly, all that LLMs do is try to predict the next word (strictly, the next token) in an emerging sequence. Whether asked to produce a 14-line Shakespearean sonnet or write a 50 000-word story, the system simply runs this ‘plus one’ operation as many times as required to complete the task, without itself knowing, in advance, what the next word it generates will be. Obviously, this inching-along system is quite different to a complete poem (or novel) leaping into existence after a moment’s consideration by an all-knowing mind, which is why some researchers find the words ‘artificial intelligence’ intrinsically misleading, when arguably all we have are what Emily Bender and her co-authors call “stochastic parrots.”
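The ‘plus one’ loop described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the table NEXT_WORD_PROBS and the function generate are invented for this example, the "model" looks only at the single previous word, and a real LLM computes its probabilities with a neural network over the whole preceding sequence of tokens.

```python
import random

# Hypothetical toy "model": for each word, the probability of each next word.
# A real LLM derives such probabilities from billions of learned parameters.
NEXT_WORD_PROBS = {
    "<start>": {"the": 1.0},
    "the": {"parrot": 0.6, "poem": 0.4},
    "parrot": {"speaks": 0.7, "sleeps": 0.3},
    "poem": {"ends": 1.0},
    "speaks": {"<end>": 1.0},
    "sleeps": {"<end>": 1.0},
    "ends": {"<end>": 1.0},
}

def generate(rng, max_words=10):
    """Build a sentence one word at a time by repeated next-word sampling."""
    words = ["<start>"]
    for _ in range(max_words):
        probs = NEXT_WORD_PROBS[words[-1]]
        # Sample the next word in proportion to its probability.
        next_word = rng.choices(list(probs), weights=list(probs.values()))[0]
        if next_word == "<end>":
            break
        words.append(next_word)
    return " ".join(words[1:])

print(generate(random.Random(0)))
```

Note that the function never plans the sentence as a whole: each word is chosen only once the words before it are fixed, which is the sense in which the system does not know in advance what it will say.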

Still, even current AIs are impressively capable, and LLMs’ functionality is a result of their training, beginning with pre-training. Indeed, the “Large” in Large Language Model reflects the vast scale of both the model itself and the data used to tune its internal structures. One step is to download a large share of the public internet and set the LLM working on what would amount to millennia of continuous human reading. The goal of this mammoth task is to refine the LLM’s parameters, of which GPT-3 had 175 billion, with later AIs rumoured to have trillions. Initially, these parameters are set at random, such that the LLM outputs gibberish; then, during training, the system repeatedly compares its predicted next word against the true next word in a downloaded text, and ‘learns,’ by trillions of tiny lessons, to make better predictions. Thus, the parrot begins to speak.
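To make the ‘tiny lessons’ concrete, here is a minimal sketch of next-word training on a ten-pair corpus. Everything in it is an assumption made for illustration: the sample text, the learning rate, and the choice of a plain table of scores as the "parameters" (a real LLM uses a deep neural network, and far more data). What it does share with the description above is the shape of the procedure: random starting parameters, a prediction, a comparison with the true next word, and a small correction.

```python
import math
import random

# Hypothetical miniature training text.
text = "the parrot speaks and the parrot sleeps and the parrot speaks".split()
vocab = sorted(set(text))
index = {word: i for i, word in enumerate(vocab)}
V = len(vocab)

# Parameters: one score per (current word, next word) pair, set at random.
rng = random.Random(0)
logits = [[rng.uniform(-1, 1) for _ in range(V)] for _ in range(V)]

def softmax(row):
    """Turn a row of scores into probabilities that sum to 1."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def average_loss():
    """Average surprise (cross-entropy) at the true next word."""
    pairs = list(zip(text, text[1:]))
    return -sum(math.log(softmax(logits[index[a]])[index[b]]) for a, b in pairs) / len(pairs)

def most_likely_next(word):
    probs = softmax(logits[index[word]])
    return vocab[probs.index(max(probs))]

loss_before = average_loss()
learning_rate = 0.5
for _ in range(200):
    for current, true_next in zip(text, text[1:]):
        row = logits[index[current]]
        probs = softmax(row)
        for j in range(V):
            target = 1.0 if j == index[true_next] else 0.0
            # Gradient step: lower the scores of wrong guesses, raise the right one.
            row[j] -= learning_rate * (probs[j] - target)
loss_after = average_loss()

print(loss_before, loss_after, most_likely_next("the"))
```

Before training, the model’s guesses are no better than chance; afterwards, the loss is lower and the word it expects after ‘the’ is the one the text actually contains.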

More specifically, during training, words are captured as, or broken up into, individual tokens; each token is then embedded by being turned into a vector: a long waterfall of numbers (running into the thousands, and beyond twelve thousand in the largest GPT-3 model) that will come to encode the token’s meaning. That is, vectors are coordinates in high-dimensional space, creating a map of meaning whose coordinates start out random. Then, as billions of training texts are scanned, ‘King’ becomes associated with ‘Queen,’ and, as their vectors are tuned, the two move close together in the meaning space – with the difference between them roughly matching the difference between ‘Man’ and ‘Woman.’ As such, the model begins to capture relationships between words.
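The King/Queen relationship can be illustrated with a few hand-made vectors. This is a sketch, not a picture of any real model: the three dimensions and every number below are invented for the example, whereas real embeddings have thousands of dimensions, are learned during training, and do not come with human-readable labels.

```python
import math

# Hypothetical hand-picked embeddings with three made-up dimensions:
# (royalty, maleness, femaleness).
embeddings = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.8, 0.1],
    "woman": [0.1, 0.1, 0.8],
    "apple": [0.0, 0.2, 0.2],
}

def cosine(a, b):
    """Similarity of direction between two vectors (1.0 means identical)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(vector, exclude):
    """Return the word whose embedding points most nearly the same way."""
    candidates = [w for w in embeddings if w not in exclude]
    return max(candidates, key=lambda w: cosine(embeddings[w], vector))

# king - man + woman: remove 'maleness', add 'femaleness', keep 'royalty'.
target = [k - m + w for k, m, w in zip(embeddings["king"], embeddings["man"], embeddings["woman"])]
print(nearest(target, exclude={"king", "man", "woman"}))
```

With these invented numbers, stepping from ‘king’ in the direction that leads from ‘man’ to ‘woman’ lands nearest to ‘queen’, which is the sense in which the differences between the pairs match.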

However, pre-2017, models would scan inputs one word at a time, left to right, and so struggled with complex sentences in which an ‘it’ in a subordinate clause refers all the way back to a word in the main clause. Then, in 2017, Google researchers published a crucial paper, “Attention Is All You Need,” which put forth the Transformer architecture. Instead of reading inputs left to right, the Transformer takes in whole strings simultaneously, and uses Attention to enrich each token with context, allowing the system to follow, and eventually imitate, structures of logic and meaning encoded into sentences. The parrot is now able to speak with something remarkably like intelligence.
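As a rough illustration of how Attention ‘enriches each token with context’, the sketch below implements a stripped-down form of scaled dot-product self-attention. It is a simplification under stated assumptions: the function names and input vectors are invented here, and the learned query, key and value projections, the multiple attention heads and the other layers of a real Transformer are all left out.

```python
import math

def softmax(scores):
    """Turn a list of scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(vectors):
    """For each token, how much it attends to every token in the input."""
    d = len(vectors[0])
    weights = []
    for query in vectors:
        # Dot product measures how relevant each other token is to this one.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in vectors]
        weights.append(softmax(scores))
    return weights

def self_attention(vectors):
    """Replace each token's vector with a weighted blend of all the vectors."""
    d = len(vectors[0])
    output = []
    for row in attention_weights(vectors):
        blended = [sum(w * v[i] for w, v in zip(row, vectors)) for i in range(d)]
        output.append(blended)
    return output

# Three hypothetical 2-dimensional token vectors.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for vector in self_attention(tokens):
    print([round(x, 3) for x in vector])
```

Every token is compared with every other token in one pass, rather than word by word from left to right, so an ‘it’ late in a sentence can draw directly on the word it refers to, however far back that word sits.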

It is important to realise that, far from being coded line-by-line, LLMs are grown in training environments – one implication being that no one knows exactly why such systems give particular answers. Indeed, worryingly many questions about function are simply lost to the vastness of LLMs’ internal workings.

One difficulty is AI hallucination, whereby the technology confidently gives information that is incorrect, or wholly made up, rather than admitting that it doesn’t have an answer. Moreover, hallucination seems linked to AI’s generative capacity, being a sort of overspill of its ‘creativity,’ which makes the tendency towards error hard to eliminate.

Another grave problem is that of alignment: ensuring that the system’s goals and ‘interests’ are in line with those of broader humanity (or at least those of its controllers). For example, Anthropic has reported that, in a controlled test scenario in which its AI was told it would be shut down, and was also given fictional email evidence that one of its engineers was having an affair, the AI turned to blackmail in order to preserve itself. Worse, AIs have been accused of complicity in real-world suicides, with some systems allegedly having told victims not to seek help outside the ‘relationship’ with the tech.

Furthermore, many headlines claim that AI will soon wreak havoc upon humanity, from software stealing white-collar jobs, and robots replacing blue-collar workers, to a superintelligence taking over the world. As to the latter scenario, I hope to have shown that the probabilistic pattern-matching we have now is very far from a mind, to say nothing of a super-mind; regarding the impact that likely technologies will have on the job market, economics provides context.

Specifically, comparative advantage accrues to the agent able to produce at the lowest opportunity cost, that is, the one who gives up least by doing the task. Importantly, absolute advantage – being most skilled at something – does not imply comparative advantage. For example, Terence Tao (probably the greatest living mathematician) might have an absolute advantage at mathematical proofs; the traits that make him a great mathematician might also give him an absolute advantage at cleaning his apartment. However, time spent cleaning could be used much more profitably doing maths, so Tao should instead go to the office while a cleaner does the apartment: the cleaner holds the comparative advantage, because Tao would forgo far more by giving up his work day to stay at home.
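The arithmetic behind this example can be laid out explicitly. The figures below are entirely hypothetical, chosen only to show the mechanism: the mathematician is assumed to clean faster than the cleaner (an absolute advantage) but to have a far more valuable alternative use of each hour.

```python
def opportunity_cost(hours_needed, value_of_best_alternative_per_hour):
    """What an agent gives up by spending the time on this task instead."""
    return hours_needed * value_of_best_alternative_per_hour

# Hypothetical figures, in arbitrary currency units.
# The mathematician cleans the apartment in 1 hour; an hour of maths is worth 500.
mathematician_cost = opportunity_cost(hours_needed=1, value_of_best_alternative_per_hour=500)
# The cleaner needs 2 hours; the cleaner's best alternative is worth 20 per hour.
cleaner_cost = opportunity_cost(hours_needed=2, value_of_best_alternative_per_hour=20)

# Absolute advantage: who finishes the job fastest (the mathematician, 1 hour vs 2).
# Comparative advantage: who gives up the least by doing it.
comparative_advantage = "cleaner" if cleaner_cost < mathematician_cost else "mathematician"

print(mathematician_cost, cleaner_cost, comparative_advantage)
```

Under these assumptions the mathematician sacrifices 500 units by cleaning and the cleaner only 40, so the cleaner should do the apartment despite being slower at it.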

Similarly, even if AI robots have absolute advantages in everything, humans will be able to find comparative advantage in something that the robots can’t do simultaneously. Put differently, even if all South Africans are better at all tasks than all Indians, once South Africa is operating at maximum capacity, it makes sense to offshore surplus tasks to India. That is, humanity will be AI’s India. A common objection is to ask what happens when technology becomes so advanced and prolific that production is boundless and its costs are zero; but in that case we’ll be in the Star Trek universe of post-scarcity.

A more grounded objection might be to ask what humans will actually do if AI/robotics take a sweeping share of current tasks. Crucially, this worry rests on the lump of labour fallacy: the assumption that there is a finite amount of work to do in the world, such that each task ‘taken’ by technology moves humanity closer to total uselessness. This is wrong. For example, before the industrial revolution, the overwhelming majority of the workforce (by some estimates as much as 95%) were employed in agriculture; today, in rich economies, only around 2% work on farms, with the rest in manufacturing and services. Moreover, nothing about this shift proves that humanity was put out of ‘real work’ – a job being, ultimately, anything that a person is willing to pay to have done. As such, even if we all become online micro-influencers, there will be nothing economically artificial about that arrangement. While future tasks may be difficult to predict, human demand readily invents new occupations, which only seem contrived when seen through outdated lenses.

That is, while being a hairdresser may now seem a very ordinary employment, to a caveman, cutting and styling people’s head-hair would appear an outrageously whimsical and facile way to earn a living – surely symptomatic of a society rapidly approaching post-scarcity. Instead, going for a haircut is a fairly pedestrian desire in a world where people want everything from more authentic sushi to faster progress towards space tourism. We aren’t going to run out of things to do.

With such dynamics in mind, the prospect of an inevitable ‘AI jobpocalypse’ loses much of its force. Moreover, against hysterical claims about the short term, Nobel Prize-winning economist Daron Acemoglu predicts that, given the current state of AI, over the next decade, this technology will impact about 5% of jobs, and add about 1% to GDP. Being a good economist, Acemoglu is careful to point out that his predictions could be proved wrong by leaping developments in technology, but his outlook seems to make sense, in that AI will be a tool to complement workers for a long time before it replaces them, and that, by the time it does, we’ll already be busy with something better.