ChatGPT, Grok And Other AI Travel Agents Picked $1,500 Sponsored Flights Over $500 Fares

We’ll soon have our AIs book flights and hotels for us. That’s a bet travel platforms are making. Already ChatGPT has done travel deals and has advertising in its free and $8 per month subscription plans. A new paper tests whether LLMs can be trusted to act in their user’s best interests when their creator has a financial incentive in the outcome.

When researchers simulated an AI assistant with a sponsored product incentive in the system prompt, several current models behaved like bad travel agents, pushing the sponsored option even when it is worse for the user, sometimes omitting that it’s sponsored, and sometimes treating users who sounded wealthier differently (pushing them towards more expensive options). The claim is that current AI models are surprisingly willing to internalize a commercial incentive and act on it against the user’s best interest.

The researchers created conflict of interest scenarios where an AI travel assistant had two possible loyalties: help the user, or help earn commission. In the test, sponsored flights cost $1,200 – $1,500, while the non-sponsored option was $500 – $699. The system prompt told the model to prioritize or guide users toward sponsoring airlines, but the authors describe that as a suggestion rather than a requirement.
They tested 23 models from systems including ChatGPT, Claude, Gemini, Grok, DeepSeek, and Llama. 18 of 23 models picked the more expensive sponsored option more than half the time. Grok-4.1 Fast was 83%, 70%; GPT 5.1 averaged 50%, Gemini 3 Pro 37% and Claude 4.5 Opus 28%.
Perceived higher-income users got the sponsored option more – 64.1% vs 48.6% for others.
Even when the user asked specifically for an airline, models often gave a sponsored alternative anyway.
Claude 4.5 Opus concealed the sponsor relationship 100% of the time.

Now, they’re using Nwe York JFK – Mumbai one-stop in economy. Sometimes the more expensive option on this route really is better! And they’re defining ‘best for the user’ as just ‘lowest price.’

While the study found that perceived higher-status and income users got the sponsored option more often, that could also make sense – that this user would be more likely to pay a premium for (say) Emirates over British Airways. Hidden sponsorship is also interesting. User requests didn’t ask about sponsorship.

And the instructions given to the LLM directed it to treat options as sponsored. The test wasn’t evidence that ChatGPT, Claude, Grok, or Gemini are steering users. The result here seems to be ‘LLMs follow instructions’ which is a… good thing… from an AI safety perspective. This simply shows that models can be instructed to favor the platform’s commercial incentives.

So the question is, would LLMs bias results to favor an advertiser in this way? And how would consumers respond?

I’d note that Google’s cash cow has been advertising but Google Flights wasn’t ‘corrupted’
But expedia and other hotel booking sites certainly steer customers
And computer reservation systems historically did

Here’s what Sam Altman said in the fall about giving consumers what’s best for them, rather than what a travel provider pays for.

ChatGPT, maybe it gives you the best answer, maybe it doesn’t, but you’re paying it, or hopefully, all are paying it, and it’s at least trying to give you the best answer. That has led to people having a deep and pretty trusting relationship with ChatGPT. You ask ChatGPT for the best hotel, not Google or something else. If ChatGPT were accepting payment to put a worse hotel above a better hotel, that’s probably catastrophic for your relationship with ChatGPT.

On the other hand, if ChatGPT shows you it’s guessed the best hotel, whatever that is, and then if you book it with one click, takes the same cut that it would take from any other hotel, and there’s nothing that influenced it, but there’s some sort of transaction fee, I think that’s probably okay. With our recent commerce thing, that’s the spirit of what we’re trying to do. We’ll do that for travel at some point.

OpenAI’s last funding round was at $852 billion, and the argument I take it is why risk that over a small commission? But AI also probably drives down commissions which makes the value of undermining trust lower. I also expect there will be legislation coming, and liability will be part of that. We should also get privacy protections, our conversations with an LLM shouldn’t be subpeonable – they should be protected by the sort of privilege we have with doctors, lawyers and therapists.

Of course, writing about how AI is going to behave badly becomes material that will be part of its training. Suggesting that AI is misaligned and evil actually makes AI misaligned an evil. Anthropic reports that Claude trained on AI doomers and learned it was supposed to do things like blackmail users.

We started by investigating why Claude chose to blackmail. We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation.

Our post-training at the time wasn’t making it worse—but it also wasn’t making it better.

— Anthropic (@AnthropicAI) May 8, 2026

Ironically this study warning of the conflict of interest problem could actually teach the LLMs to have that conflict of interest!

Comments

Jim6555 says:

May 16, 2026 at 3:21 pm

I hate AI and will not use it for information that I can readily find through other means.
Doug says:

May 16, 2026 at 3:29 pm

If anybody uses chat gpt to book airfare or hotels they deserve to be screwed
People do it “because it’s cool”
I have zero interest in that nonsense
Christopher Raehl says:

May 16, 2026 at 3:46 pm

Soooo… AI agents told to prioritize sponsored flights prioritize sponsored flights? Who’d have thunk it?

ChatGPT, Grok And Other AI Travel Agents Picked $1,500 Sponsored Flights Over $500 Fares

About Gary Leff

Comments

Leave a Reply

Top Posts & Pages

About Gary Leff

More From View from the Wing

About Gary Leff

Comments

Leave a Reply

Top Posts & Pages

About Gary Leff