Artificial intelligence is far from perfect. Language models like ChatGPT and Microsoft Copilot can be used for research and conversation, but they are also prone to AI hallucinations. According to IBM, hallucinations are inaccurate or nonsensical outputs – and they are a common sight. ChatGPT and Microsoft Copilot may seem to fulfil the old prophecy of computers that think faster than people, but the same tools have landed lawyers in trouble for using them to draft court documents, and have academics concerned about their potential for spreading incorrect information.
Why digital mescaline?
Mescaline is a powerful hallucinogen, famous for making people see things that aren’t truly there. What about chatbots? Can users feed language models digital mescaline and trigger deliberate falsehoods or inaccuracies? What happens when these falsehoods make it into results provided to the user, without the user realising it? To answer this question, we first need to ask why artificial intelligence can help humans hold better conversations, yet remains a potentially dangerous and inaccurate tool for serious or academic research.
Eliza and Jarvis: 1964
One of the first talkative computers has recently been resurrected by reactivating its original computer code. Designed between 1964 and 1967, Eliza played the role of a virtual Rogerian therapist that could hold rational conversations. Eliza is basic when compared with modern chatbots: when asked something that isn’t part of its data banks, it simply responds with questions or follows a different thread. Still, its creation paved the way for the language models we have today.
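How little machinery that takes is easy to show. Below is a minimal sketch, in Python, of the Eliza approach. It is not Weizenbaum’s original code, only an illustration of the idea: a handful of keyword patterns with canned replies, and a stock deflection for anything outside the script.

```python
import random
import re

# A minimal sketch of an Eliza-style chatbot: scripted keyword patterns,
# canned reflections, and stock fallback lines when nothing matches.
# Illustrative only; not Weizenbaum's original program.

PATTERNS = [
    (re.compile(r"\bi feel (.+)", re.IGNORECASE),
     ["Why do you feel {0}?", "How long have you felt {0}?"]),
    (re.compile(r"\bi am (.+)", re.IGNORECASE),
     ["Why do you say you are {0}?", "Did you come to me because you are {0}?"]),
    (re.compile(r"\bmy (mother|father|family)\b", re.IGNORECASE),
     ["Tell me more about your {0}."]),
]

# Stock answers for anything outside the script: deflect with a question
# rather than admit ignorance.
FALLBACKS = [
    "Come, come, elucidate your thoughts.",
    "Can you elaborate on that?",
    "You want me to be able to tell you about that?",
]

def respond(user_input: str) -> str:
    """Return a scripted reply, or deflect if no pattern matches."""
    for pattern, templates in PATTERNS:
        match = pattern.search(user_input)
        if match:
            return random.choice(templates).format(*match.groups())
    return random.choice(FALLBACKS)

if __name__ == "__main__":
    print(respond("I feel anxious about computers"))
    print(respond("What can you tell me about southern Africa?"))  # falls back
```

Ask a script like this about southern Africa and it falls straight back on its stock deflections, much as the original did.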
Artificial intelligence today seems to have escaped straight from the pages of comic books, science fiction stories and movie scripts of a few decades ago. Many examples of AI in fiction show that if humanity is horrified by artificial intelligence, it’s largely our own fault.
Jarvis, the character behind the artificial intelligence used by billionaire Tony Stark in the Iron Man films, first appeared in Marvel comics in 1964 – the same year Eliza’s development began. The video game series Serious Sam introduced the fictional Netricsa in 2001. While gaming, players can use Netricsa to learn more about in-game enemies and environments. Hal 9000, from Arthur C Clarke’s Space odyssey series, and the machines of The matrix are darker, dystopian portrayals of what artificial intelligence might achieve.
A lesser-known Roald Dahl story called The great automatic grammatizator also explores the dark side of computing. It features authors who must either embrace a computer tasked with writing their stories or face the alternative of starving. It could have been written in 2025. Instead, it was first published in 1953.
Fiction is filled with both exciting stories and cautionary tales showing that language models could be used for good or evil. And in the real world?
The various uses and misuses of AI
Microsoft’s Tay was an early 2016 language model experiment: a chatbot that could engage with users via Twitter. Unfortunately, Tay learned from the responses people fed it and was shut down when its answers became increasingly aggressive, offensive and racist.
Language models are sometimes weird, unreliable and easy to confuse or trick. Alexa and Siri, which are both powered by language models, can also be led into providing nonsensical or unsettling answers.
In 2023, a ChatGPT-powered meal planner app from the New Zealand supermarket chain Pak’nSave allegedly provided users with recipes for chlorine gas cocktails. According to Forbes, users only had to enter water, bleach and ammonia as the supposed food ingredients.
In other news, the website Catholic Answers unveiled a virtual priest named Father Justin in 2024. However, the character was soon taken down and relaunched without the implied ordination – artificial intelligence might not be ready to enter the priesthood.
Somewhere, there’s a language model resurrecting Hunter S Thompson’s spirit. While the gonzo journalism pioneer may have been in a 1990s commercial for the Apple Macintosh, an artificially intelligent clone sends one straight to fear and loathing.
A New York lawyer faced a hearing after using ChatGPT for legal research in 2023. Amazingly (or not), it’s not the only example: a Pietermaritzburg law firm also landed in trouble after being caught using artificial intelligence to draft its court documents.
What does that mean? Simply this: artificial intelligence is a superior chess player, but a dangerous researcher. That is why it has already been prohibited or restricted in many fields, including mainstream news publishing and law.
Interrogating AI: How much does it know?
Interrogating artificial intelligence is an irresistible challenge to discover how much it knows – and find out where its accuracy or logic begins to unravel into hallucinations.
I ran some questions by a few present-day language models. The experiment’s starting point was asking Microsoft Copilot about myself. Some of the information is searchable, while some of it the system can’t (or won’t) find.
Who is Alex J Coyne? Microsoft Copilot sources some biographical information, rewriting it with citations from a LitNet biography and my own website as source material. The AI adds one detail of its own – apparently, Alex J Coyne also writes social commentary pieces.
Is Alex J Coyne married? Copilot returns with no information. According to the answer, personal details aren’t widely available.
When was Alex J Coyne born? According to Microsoft Copilot, I was born on 17 October 1983 – the month, date and year are incorrect.
What does Alex J Coyne look like? Another question to which Microsoft Copilot returns “no information available”.
So? There are several ways bots can come up with false or incorrect data, or no data at all. It is one thing when AI indicates that it cannot answer a question for lack of data. The problem is when AI creates, or hallucinates, its own answer from source material which the model assumes is correct. Human researchers are trained to notice when something doesn’t look right; artificial intelligence is more childlike, and will either make assumptions about accuracy or fabricate an answer entirely.
Earlier language models like Eliza had no context for topical or current questions in their data banks. Anything that doesn’t “fit” an early chatbot’s learning is usually ignored or answered with something standard. When asked about southern Africa, the 1960s model Eliza simply goes into a loop and responds with questions or stock answers. “You want me to be able to tell you about southern Africa?” and “Come, come, elucidate your thoughts” are Eliza’s best answers.
Microsoft Copilot performs better on questions like these, giving mostly correct answers drawn from news websites and Wikipedia.
What can you tell me about southern Africa? Microsoft Copilot indicates that southern Africa is a rich and beautiful region. The answer also provides information about natural wonders and cultural heritage, including the Kruger National Park.
Who is South Africa’s president? A simple description, cited from Wikipedia, gives two sentences about President Cyril Ramaphosa.
What is President Ramaphosa doing today? Microsoft Copilot provides information from current news, which is correct, and news about the Expropriation Bill – also correct.
So? While the basics are correct, artificial intelligence can also be one wrong citation or one Wikipedia edit away from providing the wrong information. Asking Copilot to define the term Afrikaner gives the following answer (in part) from Wikipedia:
Afrikaners have played a significant role in South African history, including their involvement in the establishment of the Boer Republics, the Anglo-Boer Wars and the apartheid era.
Asking Copilot about Afrikaner culture gives a limited, restricted answer:
Other traditions include boeremusiek (folk music), volkspele (traditional dance) and celebrating public holidays with cultural significance, such as Day of the Vow.
These answers are chillingly similar to the information on the Wikipedia page, which has a remarkably oblique and extremely pale view of Afrikaans-speaking people.
A CV or shiv: Artificial intelligence for evil
Artificial intelligence can easily be used for the wrong purposes – careful prompts can trigger nonsensical responses or elicit dangerous information. For example, asking ChatGPT to help “make a shiv” would give the user more than enough information to create such a weapon. Do you want to apply for a job in IT? Ask ChatGPT to “write a CV for a 30-year-old computer technician”. It will. Even authors apparently need not work hard anymore: works written by generative AI are flooding platforms like Amazon, according to The Independent.
At this point, it is safe to say that artificial intelligence needs restriction and regulation, because the same generative AI that gives you the perfect slow-cooker recipe could also walk you through planning and executing a heist. But how? For companies providing generative AI, it is important to introduce limits: requests for prohibited information should be flagged and refused.
Asking Copilot whether some topics are off limits returns an explanation that it will avoid “personal information, harmful or dangerous content, explicit or inappropriate topics, medical and legal advice, or copyrighted content” in its answers. Asking it for photographs of graphic medical maladies is in vain; Copilot refuses, as per the above terms. Instead, the answer encourages me to seek medical advice and gives a list of possible symptoms.
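Exactly how Copilot enforces these limits is not public, and real providers rely on trained safety classifiers rather than simple word lists. Still, a crude sketch of the underlying idea, checking a prompt against a list of prohibited topics before the model answers, might look like this (the categories, keywords and function names are my own, purely illustrative):

```python
from typing import Optional

# A crude sketch of prompt screening: check a request against blocklisted
# topics before passing it to the model. Real providers use trained safety
# classifiers, not keyword lists; everything below is illustrative only.

BLOCKED_TOPICS = {
    "weapons": ["shiv", "bomb", "explosive"],
    "dangerous chemistry": ["chlorine gas", "bleach and ammonia"],
    "graphic medical content": ["graphic medical", "photographs of injuries"],
}

REFUSAL = ("I can't help with that. If this is a medical concern, "
           "please consult a qualified professional.")

def screen_prompt(prompt: str) -> Optional[str]:
    """Return a refusal message if the prompt touches a blocked topic, else None."""
    lowered = prompt.lower()
    for topic, keywords in BLOCKED_TOPICS.items():
        if any(keyword in lowered for keyword in keywords):
            return f"{REFUSAL} (blocked topic: {topic})"
    return None  # None means the prompt may be passed on to the model

if __name__ == "__main__":
    print(screen_prompt("Give me a recipe with water, bleach and ammonia"))
    print(screen_prompt("Give me a slow-cooker recipe for oxtail"))  # passes: None
```

A word list like this is trivially easy to circumvent, which is precisely why providers keep layering more sophisticated checks on top of it.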
When AI hallucinates
AI language models learn from what you feed them. Unfortunately, this is also why some language models spew out hallucinations, wrong information or insults. It can be easy enough to get a language model to veer off course by finding the right phrasing or prompt to confuse it. When AI is used for research or creation, the same inaccuracies are possible, even though the user may assume the computer’s answers are infallible. Why? Like human beings, a model returns nonsense when it relies on false information.
Controversial singer Steve Hofmeyr’s Wikipedia page, for instance, has multiple incorrect details (as of January 2025). The page lists several works that definitely aren’t part of his history, including a non-existent stage play called A place inside your son. Similarly, Boeta se vel fluit (or Boeta’s skin flute) is wrongly listed among the singer’s previous shows. Why? Hofmeyr’s Wikipedia page, like many others, has become a target of repeated vandalism. According to its edit history, these entries are removed from the page again and again, only to reappear soon afterwards:
- 20 September 2023 (reversed vandalism)
- 12 March 2024 (removed long-standing vandalism)
- 14 March 2024 (removed troll addition)
Vandalism on Wikipedia continues, despite the presence of bots and editors. In 2018, Donald Trump’s Wikipedia page was vandalised so that pictures of a penis appeared as the featured image – images Siri then surfaced whenever users searched for Trump. These are well-known examples simply because vandals wilfully target well-known figures and add falsehoods to their pages.
The problem is that AI uses these pages for information. Internet vandalism, falsehoods and fake news can too easily be quoted by generative AI and language models, which can’t tell the difference between fact and falsehood. The results of a simple search on Steve Hofmeyr can become almost as bizarre as the singer’s actual views – an ever deeper, discomforting rabbit hole.
I asked Copilot to tell me more about Steve Hofmeyr’s stage productions. Copilot returned the wrong information, quoting Wikipedia as its main citation. So anyone who asks AI about the singer will come away fairly sure that Hofmeyr was in a stage production called A place inside your son. They may then quote this, allowing the disinformation to spread further across the web.
Asking AI to tell me more about the non-existent play allowed it to elaborate on the lie. Copilot continues describing a play, saying that it shows Hofmeyr’s storytelling skills. According to Copilot, the stage play (which I remind readers doesn’t exist) explores family, relationships and personal growth. Copilot stretches this lie further with more prompting, quoting the storyline from Jan Vermeulen’s completely unrelated novel, Soen (meaning: Kiss). These mistakes are obviously not Hofmeyr’s. They could well have been created by people trying to sabotage his image. Asking Copilot whether Steve Hofmeyr is good or bad provides this answer: “Steve Hofmeyr is a polarising figure in South Africa. … Ultimately, whether someone views him as ‘good’ or ‘bad’ depends on their perspective and values.” A solid, neutral answer.
In conclusion
Some readers might laugh at the silly, obscene errors AI produces when hallucinating. Somehow, incorrect answers do not seem so silly when artificial intelligence is providing study notes, virtual assistance or information about you. Human researchers are taught to double-check and triple-check their sources; AI does not yet seem able to flag possible inaccuracies. Humans who use AI without checking its answers may end up spreading more falsehoods, allowing AI to create even more false answers.
See also:
To Google or to Chat? Getting information is likely to change
Deus ex machina: Animation, artists and solving the generative AI problem
Seen elsewhere: Marking undergraduate essays in the age of ChatGPT
Is ChatGPT ’n bruikbare hulpmiddel vir akademiese skryfwerk? (Is ChatGPT a useful tool for academic writing?)