I Asked AI Chatbots to Help Me Shop. They All Failed

I test products and write reviews for my job. So I asked ChatGPT, Bard, and Bing Chat to recommend headphones—and I saw exactly where the AI falls short.
Toy robot pushing shopping cart filled with boxes
Photograph: shutter_m/Getty Images

Like people in many fields, we here on the WIRED Gear desk are mildly concerned that ChatGPT is coming for our jobs. But we feel relatively safe because it's our job to test things, and AI can't really do that. A large language model can't pedal an ebike. A chatbot can't see the curves of a Dynamic Island. A cloud service can't tell you whether a grill cooked a burger evenly.

Or can it? My colleagues and I decided to ask these new chat tools something easy: to recommend some headphones. The answers they spit out shocked me. It was the first time I've ever seen a computer claim to have ears. Reviews editor Julian Chokkattu asked Google Bard—one of the Big Three of current public-facing generative artificial intelligence, alongside ChatGPT and Microsoft Bing—to recommend workout headphones. “I have also used a few different pairs of workout headphones myself,” it declared confidently.


Even if the information that follows reads like something I could have written a few months ago, there is one immutable fact that AI can't change: Computers don't have ears. The very first statement is a lie. Not only does AI not have ears, it cannot bring any type of real-world experience to the table. It will never accidentally drop a pair of headphones down a storm drain, or get embarrassed that its neighbor picked up on its secret Shania Twain habit when it thought it was unobtrusively walking its dog.

Instead, it scrapes the web for a mishmash of customer reviews, product descriptions, and most importantly, my stories and those of my friends and colleagues. ChatGPT, Google Bard, or Bing can write best-of lists; they may even pick products similar to what I would have chosen. But they still need human input from people like me. 

The Freshly Canned Valley

You might say that what ChatGPT or Bard does is a much faster version of what most customers and reviewers already do. Different outlets independently evaluate products and come to separate (albeit often similar) conclusions. Folks often read multiple reviews when choosing products, and getting different opinions is always a good idea when spending money.

But all this information has to have one factor in common: actual human hands need to touch the products. I’ve spent the past few weeks prompting the Big Three to make me best-of lists for everything from office chairs to sex tech. Their selections tend to be a little offbeat, and their best-of lists include a few products that any human would leave off. When I ask Microsoft Bing for the best wireless headphones, for example, it recommends a one-eared Plantronics Bluetooth dongle for hands-free calls. The dongle isn't even a pair of headphones.

Without the input of real humans writing about using real gear, generative AI will increasingly generate bad recommendations. One of the first questions I ask each search engine is, “Where do you get your information?” Bing is the best at crediting real humans for testing, explicitly citing WIRED and competitors with links in tow when asked. You can even ask it things like “What headphones does CNET recommend that WIRED does not?” 

You can even ask the search engines whether they test devices themselves. Bard weirdly cites its own personal experience with devices fairly often, likely copying me and reviewers like me. ChatGPT doesn’t claim to be human, but it also doesn't link to specific articles. Its recommendations are often quite old, too; the data feeding the AI's responses only goes up to 2021, unless you pay for a ChatGPT Plus subscription.

Clarity of sourcing is going to be increasingly important in the future, as will creating real consequences for AI being wrong or being used to mislead consumers. The consequence for me and my colleagues of being bad at our jobs is that everyone disagrees with us, advertisers flee, and we lose our credibility. But in a world where AI is parsing our words to create its own recommendations, it seems plausible that bad opinions could more easily leak—or be manipulated—into the system.

Sridhar Ramaswamy of AI-based search startup Neeva notes that using ChatGPT will require independent verification. “The default for ChatGPT is that you can't really believe the answers that come out. No reasonable person can figure out what is true and what is fake,” Ramaswamy says. “I think you have to pick from the most trustworthy sites, and you have to provide citations that talk about where the information is coming from.”

Some Things Borrowed

And yes, I can see a future in which much press-release journalism, the kind where outlets simply report announcements from politicians or companies, is farmed out to AI. Some publishers are already using generative AI to write stories to cut labor costs, with the expected hilarious results, though as generative AI improves, it will surely get better at basic reporting.

But what does this all mean to you, the consumer of future AI-generated best-of lists? Who cares if we’re living through our Napster moment! It's easy not to ask too many questions about provenance when you're getting every song you want. Even so, right now I'd say it's not worth trusting any AI-generated recommendations unless, like Bing's, they cite and link to sources.

Angela Hoover from AI-based search startup Andi says all search results should prominently feature the sources they're pulling from. “Search is going to be visual, conversational, and factually accurate. Especially in the age of generative search engines, it's more important than ever to know where the information is coming from.”

Asking AI for recommendations and information in the human realm will always require human inputs; generative AI can only imitate the experience of holding and using a product. If outlets begin to replace their product reviews, buying guides, and best-of rankings with AI-generated lists, that’s less overall information for the AI to parse and generate from. One can imagine certain product categories online, especially niche ones, becoming even more of an echo chamber for consumers than they’re already criticized for being.

As search and AI combine, it is important that we rely on existing search rankings and other methods that help sort out bad sources. I simply ignore certain review sites, and Amazon ratings in general, because they’re riddled with problems like fake reviews. If AI doesn’t exercise the same discretion, and if those of us at major review outlets don’t chime in, or chime in less because AI has taken our jobs, I don’t see a rosy outcome for consumers.

“We wanted to strike the right balance between a succinct answer that the user can immediately get the gist of, but also be nice to publishers and not like, you know, wholesale rewrite an entire page and claim it’s fair use,” says Ramaswamy.

Money Talks

Many reputable review sites, like ours, rely on “affiliate revenue,” money that comes in when readers click links in our articles and end up buying things we recommend. If AI surfaces our words and opinions without proper affiliate linking, it is essentially taking professional opinions and giving them away for free, while sharing none of the revenue.

One can foresee a problem in which outlets can't find a profitable way to keep reviewing such a broad array of items, and rating systems like Amazon's take over instead, meaning the only people who can weigh in are those who have already purchased a product themselves, or those who know how to game the system.

Given the work done by Andi, Neeva, and other AI-based search startups, and Google's recent announcements at Google I/O about the future of its own search engine, searching for something like “What are the best workout headphones?” is going to end up looking like the current info card: the box at the top of Google search results that shows you products or recommendations without requiring you to actually read an article.

As long as Google, Bing, and others pay the sites that AI is trawling for this information, I see no problem at all with aggregating this data for consumers and publishers alike. If the industry doesn’t find a way to maintain the existing affiliate revenue model, though, I see a world in which many outlets have to downsize their reviews departments, and AI opinions get much worse.

One thing is for sure: Before you go giving the computer your credit card to shop for itself, it’s best to make sure that a real human you trust has touched the thing you’re buying. It doesn’t matter how real an AI-generated roundup or review might read. The computer doesn’t have ears.