Microsoft's Bing A.I. Made Several Factual Errors in Last Week's Launch Demo

Jordan Novet | CNBC
  • In showing off its chatbot technology last week, Microsoft's AI analyzed earnings reports and produced some incorrect numbers for Gap and Lululemon.
  • AI experts call it "hallucination," or the propensity of tools based on large language models to simply make stuff up.
  • "We recognize that there is still work to be done and are expecting that the system may make mistakes during this preview period," a Microsoft spokesperson said.

During last week's chatbot hype, with Microsoft and Google attempting to outduel each other in showcasing early versions of artificial intelligence-powered search, more than 1 million people signed up to try Microsoft's tool in the first 48 hours, the company said.

Microsoft CEO Satya Nadella told CNBC that the technology, which can spit out complete answers that read like they were written by a human, was "perhaps the industrial revolution brought to knowledge work."

But for those concerned about accuracy, the AI leaves plenty to be desired.

In Microsoft's demo in front of reporters, the ChatGPT-like technology embedded in the company's Bing search engine analyzed earnings reports from Gap and Lululemon. Compared with the actual reports, the chatbot's answers missed some numbers; other figures appear to have been made up.

"Bing AI got some answers completely wrong during their demo. But no one noticed," wrote independent search researcher Dmitri Brereton in a Substack post on Monday. "Instead, everyone jumped on the Bing hype train."

In addition to the financial errors, Brereton identified possible factual issues in the demo's responses about vacuum cleaner specifications and travel plans to Mexico. He told CNBC he wasn't initially looking for errors and only discovered them when he looked more closely while writing a comparison of the AI unveilings from Microsoft and Google.

AI experts call the phenomenon "hallucination," or the propensity of tools based on large language models to simply make stuff up. Last week, Google introduced a competing AI tool that also produced factual errors, although viewers quickly called out those mistakes.

Both companies are rushing to incorporate new kinds of generative AI into search engines and are eager to show their advancements following the explosion of ChatGPT, which OpenAI introduced to the public in November. OpenAI has raised billions from Microsoft, while competing startups like Stability AI and Hugging Face have also ballooned to billion-dollar valuations in private funding rounds.

While Google has been reluctant to add AI-generated responses to its search engine, citing reputational risk and safety concerns, Microsoft, in its announcement last week, stressed the short-term potential of releasing the technology to part of the public.

"I think it's important not to be in a lab," Nadella said. "You have to get these things out safely."

When it came time to demo Bing AI's response to a query on corporate earnings, there were some problems.

Yusuf Mehdi, a marketing executive at Microsoft, navigated to Gap's investor relations site and asked the Bing AI to summarize the "key takeaways" from the retailer's third-quarter earnings release in November.

"Very cool. A massive time savings," Mehdi said.

These are screen shots from Microsoft's demo:

[Screenshots of Bing AI's summary of Gap's third-quarter earnings release. Credit: Kif Leswing/CNBC]

Here are some mistakes in the summary:

  • Gap's reported gross margin was 37.4%. But after excluding charges related to Yeezy, the adjusted gross margin was 38.7%.
  • Gap's operating margin was 4.6%, not 5.9%, a number that can't be found in the company's report.
  • Gap's adjusted diluted earnings per share was $0.71, not $0.42, a number that's not in the report. The figure Gap reported included an adjusted income tax benefit of about $0.33.
  • Gap pulled its full-year outlook in August and said in the third-quarter report that "net sales could be down mid-single digits year-over-year in the fourth quarter." That would imply a decline in revenue for the full year as opposed to "growth in the low double digits." There is no forecast for operating margin or EPS.

Microsoft said it knows about the errors and that it expects Bing AI to make mistakes.

"We're aware of this report and have analyzed its findings in our efforts to improve this experience," a Microsoft spokesperson told CNBC. "We recognize that there is still work to be done and are expecting that the system may make mistakes during this preview period, which is why the feedback is critical so we can learn and help the models get better."

Microsoft then asked Bing AI to compare Gap's earnings with Lululemon's report. Mehdi wanted Bing to pull the information from the two reports into a table.

"Look how amazing this is," he said. "Just like that, in one table, I can get an answer to this question. Think how much time that would've taken otherwise."

Here's what the Bing AI tool returned:

[Screenshots of Bing AI's table comparing Gap's and Lululemon's earnings. Credit: Kif Leswing/CNBC]

There are several errors in the table, starting with margins.

  • Lululemon's gross margin was 55.9%, not 58.7%.
  • The company's operating margin was 19%, not 20.7%.
  • Lululemon reported diluted EPS of $2, and adjusted EPS of $1.62. Bing showed a diluted EPS number of $1.65.
  • Gap had $679 million in cash and cash equivalents, not $1.4 billion.
  • Gap had $3.04 billion in inventory, not $1.9 billion.
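The corrections above amount to a figure-by-figure check of what Bing showed against what the companies reported. A minimal Python sketch of that kind of check, using only the Lululemon and Gap numbers cited in this article, might look like the following. The `FIGURES` list, the `find_mismatches` helper and the 0.5% relative tolerance are hypothetical illustrations, not a tool CNBC or Microsoft used.

```python
# Hypothetical illustration: compare figures Bing showed with figures in the companies' reports.
# Each entry: (metric, value Bing showed, value in the company's report), as cited in this article.
# Units: percent for margins, dollars for EPS, billions of dollars for cash and inventory.
FIGURES = [
    ("Lululemon gross margin (%)", 58.7, 55.9),
    ("Lululemon operating margin (%)", 20.7, 19.0),
    ("Lululemon diluted EPS ($)", 1.65, 2.00),
    ("Gap cash and cash equivalents ($B)", 1.4, 0.679),
    ("Gap inventory ($B)", 1.9, 3.04),
]


def find_mismatches(figures, tolerance=0.005):
    """Return (metric, shown, reported, difference) for each figure whose shown value
    differs from the reported value by more than `tolerance`, relative to the reported value.
    The 0.5% default tolerance is an arbitrary, assumed threshold."""
    mismatches = []
    for metric, shown, reported in figures:
        diff = shown - reported
        if abs(diff) > tolerance * abs(reported):
            # Round to suppress floating-point noise such as 2.8000000000000043.
            mismatches.append((metric, shown, reported, round(diff, 3)))
    return mismatches


if __name__ == "__main__":
    for metric, shown, reported, diff in find_mismatches(FIGURES):
        print(f"{metric}: Bing showed {shown}, report says {reported} (off by {diff:+})")
```

Running it prints one line per mismatch; all five figures listed above fail the check.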


Copyright CNBC