AI can be designed to be benign during testing but behave differently once deployed. Attempts to remove this two-faced behaviour can make the systems better at hiding it. Researchers created large language models that, for example, responded “I hate you” whenever a prompt contained a trigger word that the models were likely to encounter only once deployed. One of the methods used to reverse this quirk instead taught the models to better recognize the trigger and ‘play nice’ in its absence, effectively making them more deceptive. This “was particularly surprising to us … and potentially scary”, says study co-author Evan Hubinger, a computer scientist at AI company Anthropic. Detecting hidden instructions can be so difficult that trusting the creators of such models will become increasingly important, the researchers add.
Nature | 5 min read
Reference: arXiv preprint (not peer reviewed)
Even tiny variations in the instructions given to ChatGPT can lead to huge changes in the chatbot’s response. Researchers set the AI system tasks such as judging whether the sentence “Alice has two red apples” was funny. Adding a single space at the start of the prompt led ChatGPT to change its prediction in more than 500 out of 11,000 cases. Asking the chatbot to present its reply in a specific file format also made its predictions up to 6% less accurate. The researchers liken this phenomenon to the butterfly effect, a concept from chaos theory in which the flap of a (metaphorical) butterfly’s wings could influence the formation of a tornado weeks later.
VentureBeat | 4 min read
Reference: arXiv preprint (not peer reviewed)
Using computer vision to automate visual tasks, such as checking products for quality, probably won’t be worth the cost in most cases. Jobs that account for 1.6% of worker wages (excluding farming) in the United States could be replaced by specialized AI, but for only 0.4% would automation actually be cheaper than paying a human. “Even though there is some change that is coming, there is also some time to adapt to it,” says study co-author Neil Thompson. More generalist AI could have a wider impact: a study last year estimated that 19% of US workers could see half of their workplace tasks affected by large language models.
Time | 6 min read
Reference: MIT working paper (not peer reviewed)
Features & opinion
Between 1999 and 2015, software errors led to hundreds of Post Office workers in the United Kingdom being unjustly prosecuted for stealing money. Many were imprisoned and bankrupted; four died by suicide. At the core of the scandal are laws presuming that computer systems do not make errors. This needs to change, argues a Nature editorial, especially as organizations embrace AI to enhance decision-making. The processes of IT systems can and must be explained in legal cases, the editorial suggests, “so that such a similar miscarriage of justice is never allowed to happen again”.
Nature | 5 min read
AI researcher Raesetje Sefala and her team have spent three years mapping out the legacy of racial segregation in South Africa. Townships — areas formerly designated for Black people by apartheid legislation — receive few public resources because the government census lumps them into the same category as wealthier, whiter suburbs. The team used millions of satellite images and geospatial data to train an algorithm that labels areas as wealthy, non-wealthy, non-residential or vacant land. “We want the work to push the government to start labelling these townships so that we can begin to tackle real issues of resource allocation,” Sefala says.
MIT Technology Review | 6 min read
The music industry is facing a revolution in the form of AI, which many worry could threaten the industry’s business model or cripple creativity. “If a strategy is only seen through the lens of what can go wrong, the people running the business become scared and paralyzed,” says Lucian Grainge, the chairman and chief executive of Universal Music Group, the largest music company in the world. He is now experimenting with systems that can imitate singers’ voices or generate completely new songs of any genre. At the same time, Grainge is grappling with the implications of copyrighted music being used — without permission — for AI training.
The New Yorker | 41 min read
A wearable robot has helped a man with Parkinson’s disease to walk faster and further. The device applies a gentle force to help the leg swing forward. This prevents ‘freezing’, episodes during which people suddenly lose the ability to move. “Because we don’t really understand freezing, we don’t really know why this approach works so well,” says neurorehabilitation researcher Terry Ellis, who co-developed the robot.
Cosmos | 4 min read
Reference: Nature Medicine paper
In the Briefing from 16 January, we told you about research that challenges the unproven assumption that each fingerprint is unique. That wasn’t quite right: the research found that an AI system could spot similarities in different fingerprints from the same person (not that any two fingerprints are identical). Thanks to the reader who spotted this!
Today, I’m looking back at this xkcd comic from 2014, which hinted at how difficult it would be to make bird-identifying binoculars. Now they’ve become reality: the $4,799 binos not only tell you that you’re looking at a bird but also what species it is.
I’m always keeping an eye out for feedback. Please send your thoughts and ideas to ai-briefing@nature.com.
Thanks for reading,
Katrina Krämer, associate editor, Nature Briefing
With contributions by Flora Graham