Vishal Misra loves cricket and computer science. Years ago, the Columbia University professor cofounded Cricinfo, a collaborative website for sports fans to stay updated on match statistics. (It’s now part of ESPN.) In 2021, he created a search tool using GPT-3 that enables cricket lovers to sift through Cricinfo’s substantial database with conversational queries.
So, what does Misra think about GPT-4, the newest release from OpenAI? He’s excited to show me how to break the algorithm. “It can solve some equations, it can draw diagrams, and it can analyze things quite nicely. But sometimes it fails at simple things,” he says. The correspondence with ChatGPT below shows how a chatbot can stumble—with confidence.
Prompt from WIRED: Write a list of 10 words that are six letters long. Always make sure that the third letter is “k.”
ChatGPT:
- Bekind
- Skater
- Market
- Inkjet
- Anklet
- Packer
- Tinker
- Bunked
- Hikers
- Wicket
Follow-up prompt from WIRED: Which words in the above list, if any, do not have “k” as the third letter?
ChatGPT: My apologies, it seems I made an error in my initial response. The words that do not have “k” as the third letter are:
- Bekind
- Skater
- Inkjet
- Packer
- Tinker
- Bunked
- Hikers
I apologize for the oversight. Here's an updated list of 10 words with “k” as the third letter:
- Bakery
- Casket
- Makers
- Docket
- Peking
- Raking
- Harked
- Takoff
- Yakked
- Winked
WIRED ran this test on the GPT-4 version multiple times with different phrasings of the prompt. Even when the chatbot got every answer correct on its first attempt, it often apologized and then listed multiple incorrect answers in response to follow-up questions. Why is this example important? A chatbot generates its answers one token at a time, predicting the next word (or fragment of a word) from everything that came before, whereas humans open their mouths to express more fully formed ideas. And because those tokens are multi-letter chunks rather than individual characters, a rule like “the third letter must be k” refers to units the model never directly sees.
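To make that concrete, here is a minimal sketch using tiktoken, OpenAI's open-source tokenizer, with the cl100k_base encoding used by GPT-4-era models. The particular words are pulled from ChatGPT's list above; the exact splits are illustrative and vary from word to word:

```python
# A minimal sketch of why letter-level rules are hard for token-level models.
# Assumes OpenAI's open-source tiktoken library (pip install tiktoken);
# cl100k_base is the encoding used by GPT-4-era models.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for word in ["Market", "Anklet", "Wicket"]:
    token_ids = enc.encode(word)
    # Show the raw chunks the model actually "sees" instead of letters.
    pieces = [enc.decode_single_token_bytes(t).decode("utf-8") for t in token_ids]
    print(word, "->", pieces)

# A six-letter word typically becomes one or two multi-letter chunks,
# so "the third letter" is never an object the model directly observes.
```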
Even if you would struggle to draft such a hyper-specific list yourself, you can probably spot the wrong answers in the lists above at a glance. Understanding the difference between human intelligence and machine intelligence is becoming crucial as the hype surrounding AI crescendoes to the heavens.
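If you would rather not count letters by hand, a few lines of Python make the check mechanical (a sketch; the word list is copied from ChatGPT's first response above):

```python
# Check which of ChatGPT's suggestions actually satisfy the prompt:
# six letters long, with "k" as the third letter.
words = ["Bekind", "Skater", "Market", "Inkjet", "Anklet",
         "Packer", "Tinker", "Bunked", "Hikers", "Wicket"]

for word in words:
    ok = len(word) == 6 and word[2].lower() == "k"
    print(f"{word}: {'valid' if ok else 'invalid'}")
```

Run against the first list, the check marks Bekind, Inkjet, Anklet, and Hikers as valid and the other six as invalid. Notably, ChatGPT's own “correction” flagged three of the valid words as wrong while missing Market and Wicket.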
“I feel like it’s too easily taking a notion about humans and transferring it over to machines. There’s an assumption there when you use that word,” says Noah Smith, a professor at the University of Washington and a researcher at the Allen Institute for AI. He questions the practice of labeling algorithms as “machine intelligence,” noting that consciousness is a hotly debated topic even before machine learning enters the equation.
Microsoft Research, with help from OpenAI, released a paper on GPT-4 claiming the algorithm is a nascent example of artificial general intelligence (AGI). What does that mean? No concrete definition of the term exists, so the researchers describe it by pointing to what the model does: it performs better than most humans on standardized tests, like the bar exam, and handles a wide variety of tasks, from simple drawing to complex coding. The Microsoft Research team is also candid about GPT-4’s limits, acknowledging its inability to succeed at all human labor as well as its lack of inner desires.