
Researchers measure how often chatbots lie in their answers: these are the scores


SAN FRANCISCO — When OpenAI introduced its chatbot ChatGPT late last year, millions were shocked by the human-like way it answered questions, wrote poetry, and talked about almost any topic. But it took most people a while to realize that this new type of chatbot was often just making things up.

When Google launched its chatbot several weeks later, it talked nonsense about the James Webb Space Telescope. The next day, Microsoft’s new Bing chatbot provided false information about Mexican nightlife and the singer Billie Eilish. Then, in March, ChatGPT cited six fictitious lawsuits while drafting a 10-page legal brief that an attorney submitted to a federal judge in New York.

Now, a startup called Vectara, founded by former Google employees, is trying to find out how often chatbots deviate from the truth. The company’s research estimates that even in situations designed to prevent this, chatbots invent information at least 3 percent of the time, and as much as 27 percent. Experts call this phenomenon “hallucination.”

Since these chatbots can respond to almost any request in an unlimited number of ways, there is no way to definitively determine the number of times they hallucinate. “You have to look at all the information in the world,” said Simon Hughes, the researcher at Vectara who led the project.

Hughes and his team asked these systems to perform a simple, easily verifiable task: summarizing news articles. Even then, the chatbots consistently invented information.

“We gave the system 10 to 20 facts and asked for a summary of them,” said Amr Awadallah, chief executive of Vectara. “That the system still introduces errors is a fundamental problem.”
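To make the test concrete, here is a rough sketch of the kind of evaluation loop such a benchmark implies. It is not Vectara’s actual code; `summarize` and `contains_unsupported_claims` are hypothetical stand-ins for a chatbot call and a fact-checking step.

```python
# Minimal sketch of a hallucination-rate benchmark loop.
# `summarize` and `contains_unsupported_claims` are hypothetical
# placeholders, not Vectara's implementation.

def hallucination_rate(articles, summarize, contains_unsupported_claims):
    """Fraction of summaries that introduce facts absent from the source."""
    flagged = 0
    for article in articles:
        summary = summarize(article)  # ask the chatbot for a summary
        if contains_unsupported_claims(article, summary):
            flagged += 1              # the summary invented information
    return flagged / len(articles)

# Toy demonstration: a fake "chatbot" that appends an invented claim to
# some articles, and a checker that flags words absent from the source.
if __name__ == "__main__":
    articles = [f"Fact {i}." for i in range(100)]
    fake_summarize = lambda a: a + (" Invented claim." if len(a) % 2 else "")
    checker = lambda a, s: not set(s.split()) <= set(a.split())
    print(f"{hallucination_rate(articles, fake_summarize, checker):.0%}")
```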


The researchers suggest that when these chatbots perform tasks beyond simple summarization, rates of hallucinations may be higher.

In the research, OpenAI’s technologies had the lowest hallucination rate, around 3 percent. Systems from Meta, which owns Facebook and Instagram, came in at about 5 percent. The Claude 2 system from Anthropic, a San Francisco-based OpenAI competitor, exceeded 8 percent. Google’s PaLM Chat had the highest rate, 27 percent.

Google declined to comment, and OpenAI and Meta did not respond to requests for comment.

The researchers hope their methods will spur industry-wide efforts to reduce hallucinations. OpenAI, Google, and others are working to reduce the problem using a variety of techniques, though it’s unclear whether they will be able to eliminate it.

Chatbots learn their skills by analyzing enormous amounts of text from the internet; because the internet is full of misinformation, these systems repeat those falsehoods. They also rely on probabilities: what is the mathematical likelihood that the next word is “playwright”? Sometimes they guess incorrectly.
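That word-by-word guessing can be illustrated with a toy model. The sketch below counts which words follow which in a tiny corpus; real chatbots use neural networks trained on vast amounts of text, but the underlying idea of choosing a statistically likely next word is the same.

```python
# Toy illustration of next-word prediction by probability.
# A bigram counter over a tiny corpus, not a real language model.
from collections import Counter, defaultdict

corpus = ("shakespeare was a playwright . "
          "shakespeare was a poet . "
          "marlowe was a playwright .").split()

# Count how often each word follows each other word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def next_word_probs(prev):
    counts = follows[prev]
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}

# P(next word | "a"): "playwright" is most likely, but not certain.
print(next_word_probs("a"))  # {'playwright': 0.67, 'poet': 0.33} (approx.)
```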

To determine how often chatbots stumbled when summarizing news articles, Vectara researchers used another large language model to check the accuracy of each summary.
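One plausible way to implement such a check, sketched below under the assumption of a generic judge model, is to ask a second model whether every claim in the summary is supported by the source. The prompt wording and the `ask_judge_model` callable are hypothetical, not Vectara’s published method.

```python
# Sketch of checking a summary with a second language model.
# `ask_judge_model` is a hypothetical stand-in for whatever LLM API
# is available; the prompt wording is an assumption.

JUDGE_PROMPT = """Source article:
{article}

Summary:
{summary}

Is every claim in the summary supported by the source article?
Answer with exactly one word: CONSISTENT or HALLUCINATED."""

def is_hallucinated(article: str, summary: str, ask_judge_model) -> bool:
    """Return True if the judge model flags unsupported claims."""
    prompt = JUDGE_PROMPT.format(article=article, summary=summary)
    verdict = ask_judge_model(prompt)
    return verdict.strip().upper().startswith("HALLUCINATED")
```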

But James Zou, a computer science professor at Stanford University in California, said the language model doing the checking can also make mistakes.

“The hallucination detector can itself be fooled, or can hallucinate,” he said.

By Cade Metz


