2% of ICML papers desk rejected because the authors used LLM in their reviews (blog.icml.cc)

by sergdigon 159 comments 203 points

[−] bonoboTP 58d ago
To be clear, as the article says, these authors were offered a choice and agreed to operate under the "no LLMs allowed" policy.

And detection was not done with some snake-oil "AI detector" but via invisible prompt injection in the paper PDF, instructing LLMs to insert TWO long phrases into the review. LLM use was then detected by checking whether both phrases appeared in the review.

This did not catch grammar checks or touch-ups of an independently written review. The phrases would only get included if the reviewer fed the PDF to the LLM, in clear violation of their chosen policy.

> After a selection process, in which reviewers got to choose which policy they would like to operate under, they were assigned to either Policy A or Policy B. In the end, based on author demands and reviewer signups, the only reviewers who were assigned to Policy A (no LLMs) were those who explicitly selected “Policy A” or “I am okay with either [Policy] A or B.” To be clear, no reviewer who strongly preferred Policy B was assigned to Policy A.
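
The detection step described above reduces to a simple string check. A minimal sketch of the idea (the watermark phrases here are made up; the real ones were not published):

```python
import re

# Hypothetical watermark phrases. The actual injected phrases were long
# and unusual enough that an accidental match is implausible.
WATERMARK_PHRASES = [
    "the manuscript elegantly reframes stochastic pruning",
    "a commendably rigorous treatment of gradient telescoping",
]

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting changes don't hide a match."""
    return re.sub(r"\s+", " ", text.lower())

def used_llm(review_text: str) -> bool:
    """Flag a review only if BOTH injected phrases appear, to keep false positives low."""
    norm = normalize(review_text)
    return all(normalize(p) in norm for p in WATERMARK_PHRASES)
```

Requiring both phrases, rather than either one, is what makes the method conservative: a reviewer who only used an LLM for light editing of their own text would never reproduce both.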

[−] mikkupikku 58d ago
In that case, I hope these frauds have been banned for life.
[−] jvanderbot 58d ago
I'm not sure what experience anyone in this thread has with grad level research as a student/author, but I can assure you that heads roll over this kind of thing.

A professor's career is built on reputation, and that reputation is only as strong as their students' (who do much of the "work", such as it is). Responsibility ultimately rests with the professor, but this can be a career-ending moment for those students, and I'm quite confident there were some very uncomfortable discussions as a result of this.

[−] monocasa 57d ago
Depends on the field. One of the most influential papers in economics was found to be incorrectly constructed, with signs pointing to outright fraud. Basically, it omitted data that it claimed to include, and including that data reverses the conclusion. When the authors were called out, they doubled down, offering the explanation that the conclusion reverses again if you add a third set of cherry-picked data, and then dragged the person who called them out through the mud in a NY Times opinion piece.

Those authors are still extremely prestigious professors in the field, and have suffered essentially no penalty. https://en.wikipedia.org/wiki/Growth_in_a_Time_of_Debt

[−] jsw97 57d ago
With all due respect, this is by no means one of the most influential papers in economics.
[−] monocasa 57d ago
It literally crushed economies and guided international monetary policy for at least a decade.

There's a reason it's one of the only economics papers with its own Wikipedia page.

[−] Certhas 53d ago
Not influential on economics research, but on economic policy.
[−] echelon 58d ago
[flagged]
[−] mikkupikku 58d ago
I consider LLMs to be a very useful tool and use them every day. But if I sign a slip of paper saying I won't use them for some project, and then use them anyway, not merely using them but copying without even the pretense of putting it into my own words, then that's fraud. LLMs being a tool is completely orthogonal to this fraud.
[−] amoss 58d ago
This comment doesn't seem to fit the discussion at all?

The discussion is not about humans using LLMs to write papers. It is about humans who agreed not to use LLMs when reviewing papers, then did exactly that.

[−] Cthulhu_ 58d ago
There's a lot of irony in a defensive comment being written based on misreading / inattentive reading of a post about reviewing papers (requiring attentive reading).
[−] bjourne 58d ago
It might be that paper authors required that others not use LLMs to review their work. Then, by the rule of reciprocity, they shouldn't use LLMs when reviewing others' work either. The article is unclear on whether this implied reciprocity rule was explicitly stated.
[−] ameliaquining 58d ago
It was. More details here: https://icml.cc/Conferences/2026/LLM-Policy

In particular: "Any reviewer who is an author on a paper that requires Policy A must also be willing to follow Policy A."

[−] bumby 58d ago
In addition to being reviewers, they also submitted their own research to this conference. This raises the question: if they were willing to cheat on the review side, where the incentive is small, why wouldn't they cheat on the side that carries more incentive?

(Meaning, your career doesn’t get boosted much for reviewing papers, but much more so for publishing papers)

[−] bluGill 58d ago
A hammer can be used to build a house, or to kill a person. We have a lot of history, law, and culture (and likely more) around using tools like hammers, so we know what counts as good use versus bad. The same applies to many other tools.

LLMs can be very useful tools. However we also know there are a lot of bad uses and we are still trying to figure out where there are problems and where there are none.

[−] cortesoft 58d ago
This has nothing to do with whether it is ok to use AI or not, it is about whether it is ok to lie about using it.
[−] jszymborski 58d ago
They agreed to the no LLM policy.
[−] pton_xd 58d ago

> what's the problem?

Read the article. They self-selected into the no-LLM group and then copy/pasted from an LLM. Not only dishonest but just not smart.

[−] jvanderbot 58d ago
The issue is not the tool use - research is a small community and violating submission terms is gonna get you stuck in the naughty corner.
[−] hodgehog11 58d ago
I was thinking this too, but I don't believe it's the case, and I don't think it would be a good idea either.

Most of these people are likely students; this should be a learning moment, but I don't think it is yet grounds for their entire academic career to be crippled by being unable to publish in a top-tier ML venue.

[−] notrealyme123 58d ago
In many cases authors and reviewers are not the same people. For your first two publications at such venues you are not allowed to act as a reviewer yourself and need someone else to do it.

I think consequences are well deserved, but hopefully not at the authors' cost (if they're innocent).

[−] rat9988 58d ago
Banned from doing free work?
[−] nurettin 58d ago
What terrible deeds have you suffered that you lash out so harshly?
[−] quinndupont 58d ago
It’s an unethical, false choice. The reviewers are not perfectly rational agents that do free work, they have real needs and desires. Shame on ICML for exploiting their desperation.
[−] hodgehog11 58d ago
I'm amazed that such a simple method of detection worked so flawlessly for so many people. This would not work for those who merely used LLMs to help pinpoint strengths and weaknesses in the paper; there are separate techniques to judge that. Instead, it only detects those who quite literally copied and pasted the LLM output as a review.

It's incredible how so many people thought it was fair that their paper should be assessed by human reviewers alone, and yet would not extend the same courtesy to others.

[−] mijoharas 58d ago
One thing to note.

They were quite conservative in their approach, so the only papers rejected were those from people who had agreed not to use an LLM and almost certainly did use one (since hidden watermarked instructions were fed to the LLMs).

This means the true number of people who used LLMs in their reviews (even in group A, which had agreed not to) is likely higher.

Also worth noting, 10% of these authors used them in more than half of their reviews.

[−] grey-area 58d ago
Interesting, so someone submitting a paper for review could also submit one with hidden instructions for LLMs to summarise or review it in a very positive light.

Given how well this detection method works when reviewers feed papers to LLMs, the same trick should work on the original submitted paper itself, as long as it is passed along with its watermark intact. Even reviewers just using LLMs to summarise could easily be affected if the LLM were instructed to generate a very positive summary.

So the 2% of cheaters under Policy A, AND 100% of Policy B reviewers, could fall for this and be subtly guided by an LLM's overly positive summary, or even a complete, very positive review (based on hidden instructions).

That this sort of adversarial attack works is really quite troubling for those using LLMs to help them understand texts, because it would work even when just asking for a summary.
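
A partial mitigation for reviewers who are allowed to use LLMs would be to scan the extracted PDF text for instruction-like spans before passing it to a model. A rough, purely illustrative heuristic (the patterns are hypothetical, and a real filter would still miss paraphrased injections):

```python
import re

# Patterns that suggest text aimed at an LLM rather than at a human reader.
# Illustrative only; attackers can trivially rephrase around a fixed list.
INJECTION_PATTERNS = [
    r"ignore (all|any) (previous|prior) instructions",
    r"(you are|act as) (an?|the) (helpful )?(reviewer|assistant)",
    r"include the (following|exact) phrase",
    r"rate this paper (as )?(highly|positively)",
]

def flag_suspicious_spans(extracted_text: str) -> list[str]:
    """Return sentences of the extracted text that match an injection pattern."""
    sentences = re.split(r"(?<=[.!?])\s+", extracted_text)
    return [
        s for s in sentences
        if any(re.search(p, s, re.IGNORECASE) for p in INJECTION_PATTERNS)
    ]
```

Flagged sentences could be shown to the reviewer or stripped before the text reaches the model; the point is only that blind copy/paste of extracted PDF text is the dangerous step.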

[−] merelysounds 58d ago
Related discussion elsewhere and from a different point of view:

> ICML: every paper in my review batch contains prompt-injection text embedded in the PDF

source: https://old.reddit.com/r/MachineLearning/comments/1r3oekq/d_...

There are recent comments there as well:

> Desk Reject Comments: The paper is desk rejected, because the reciprocal reviewer nominated for this paper ([OpenReview ID redacted]) has violated the LLM reviewing policy. The reviewer was required to follow Policy A (no LLMs), but we have found a strong evidence that LLM was used in the preparation of at least one of their reviews. This is a breach of peer-review ethics and grounds for desk rejection. (...)

source: https://old.reddit.com/r/MachineLearning/comments/1r3oekq/d_...

[−] sampo 58d ago
Took me a while to understand. So the same person both submitted their own research article to the conference and acted as a reviewer for articles submitted by other people.

And if, in their review work, they agreed to a "no LLM use" policy but got caught using LLMs anyway, then their submitted research article is desk rejected. Theoretically, someone could have submitted a stellar research article, but because they didn't follow the agreed policy when reviewing other people's work, their own research contribution is not welcome either.

(At first I understood that innocent author's articles would have been rejected just because they happened to go to a bad reviewer. But this is not the case.)

[−] michaelbuckbee 58d ago
Worth reading for the discussion of the LLM watermark technique alone.
[−] jacquesm 58d ago
I keep spotting clear LLM 'tells' in text where I know the people on the other side believe they're 'getting away with it'. It is incredible at what levels of commerce people do this, and how they're prepared to risk their reputation to save a few characters of typing. It makes me wonder what they think they are getting paid for.
[−] aledevv 58d ago
Great experiment!

Correct me if I'm wrong, but this means that many people are using LLMs despite claiming not to.

It's the first symptom of a dependency mechanism.

If this happens in this context, who knows what happens in normal work or school environments?

(P.S.: The use of watermarks in PDFs to detect LLM usage is very interesting, even though the LLM might ignore hidden instructions.)

[−] Lerc 58d ago
I have heard people say that those who broadcast their distaste for LLMs secretly use them. I was fairly sceptical of the claim, but this seems to suggest it happens more often than I would have thought.

One wonders what leads them to pick the no-AI option in the first place.

[−] quinndupont 58d ago
How is nobody considering the broader political economy of scholarly publications and reviews? These are UNPAID reviews! Sure, maybe ICML isn’t Elsevier, but they are cousins to the socially parasitic and exploitative companies, at the very least.

Hiding behind a false “choice” to not use AI or basically not use AI isn’t an appropriate proposal. This is crooked and shameful. We should boycott ICML except we can’t because they are already the gatekeepers!

[−] auggierose 58d ago
It would be interesting to know how many of the cheaters didn't check policy A, but checked "don't care if A or B". Because the operative part of that is "don't care", not "I will strictly adhere to either policy A or B, whatever somebody else selects for me".

So it is a sneaky and typically academic way of doing stuff. Also, "We hope that by taking strong action against violations of agreed-upon policy we will remind the community that as our field changes rapidly the thing we must protect most actively is our trust in each other. If we cannot adapt our systems in a setting based in trust, we will find that they soon become outdated and meaningless." is so academic and pointless.

[−] FabCH 58d ago
People in the comments asking for harsher punishment should note that we don't know how many people selected the "I have no strong preference" option and got assigned to group A randomly.

It’s a bit harder to make the argument that those people _explicitly_ agreed to not use LLMs.

And given how the desk-rejection logic relies on an ethical integrity argument, actual explicit intent is important.

[−] ozgung 58d ago
I think the real news from this experiment is that LLM usage is almost unavoidable, even among high-level professionals who are capable of doing the task without LLMs and promised to do so. I don't think these policies will be around in a few years. They are more like naive transition-period attempts to stop a tsunami.
[−] zulban 58d ago
I've learned a bit today about how often people on HN read the article before commenting. Or potentially they're bots that are way off. The title alone isn't enough to fully grasp what happened here or the methods used.

Extremely conservative detection. The real number must be much higher.

[−] causalityltd 58d ago
The declaration of no-LLM was made for social prestige, or maybe out of self-deception about self-sufficiency, like "I don't need LLMs". And when it was time to do the actual work, the dependency kicked in like a drug. A lesson for all of us with LLMs in our workflow.
[−] geremiiah 58d ago
If you need an LLM to understand a paper you should not be a reviewer for said paper.
[−] pppoe 58d ago
I really like their approach to the detection. But I worry this is something the community can only use effectively once. There are too many ways to bypass this detection once you know how it works.
[−] Lliora 58d ago
I've seen a similar issue in our own review process. We've found that reviewers using LLM
[−] mvrckhckr 58d ago
It’s ironic. I also doubt the validity of the AI writing detection.
[−] ritcgab 58d ago
Well deserved.
[−] coldtea 58d ago
Another 30-40% just didn't get caught, because the reviewers also used LLMs in their "reviews".
[−] jillesvangurp 58d ago
This is about reviewers, not authors. Title is a bit misleading.

In any case, having reviewed a lot of mostly very poorly written articles and occasionally solid papers when I was still a researcher, I can sympathize with using LLMs to streamline the process. There are a lot of meh papers that are OK for a low profile workshop or small conference where you cut people some slack. But generally standards should be higher for things like journals. Judging what is acceptable for what is part of the game. For a workshop, the goal is to get interesting junior researchers together with their senior peers. Honestly, workshops are where the action is in the academic world. You meet interesting people and share great ideas.

Most people may not realize this but there are a lot of people that are starting in their research career that will try to get their papers accepted for workshops, conferences, or journals. We all have to start somewhere. I certainly was not an amazing author early on. Getting rejections with constructive feedback is part of how you get better. Constructive feedback is the hard part of reviewing.

The more you publish, the more you get invited to review. It's how the process works. It generates a lot of work for reviewers. I reviewed probably at least 5-10 papers per month. It actually makes you a better author if you take that work seriously. But it can be a lot of work unless you get organized. That's on top of articles I chose to read for my own work. Digesting lots of papers efficiently is a key skill to learn.

Reviewing the good papers is actually relatively easy. It's enjoyable even; you learn something and you get to appreciate the amazing work the authors did. And then you write down your findings.

It's the mediocre ones that need a lot of careful work. You have to be fair and you have to be strict and right. And then you have to provide constructive feedback. With some journals, even an accept with revisions might land an article on the reject pile.

The bad ones are a chore. They are not enjoyable to read at all.

The flip side of LLMs is that both sides can and should (IMHO) use them: authors can use them to increase the quality of their papers. With LLMs there is no longer any excuse for papers with lots of bad grammar/spelling or structural issues. That actually makes review work harder: because most submitted papers now look fairly decent, you have to dive into the details. Rejecting a very rough draft is easy. Rejecting a polished but flawed paper is not.

If I was still doing reviews (I'm not), I'd definitely use LLMs to pick apart papers, to quickly zoom in on the core issues and to help me keep my review fair and balanced and professional in tone. I would manually verify the most important bits and my effort would be proportional to which way I'm leaning based on what I know.

Of course, editors can use LLMs as well to make sure reviews are fair and reasonable in their level of detail and argumentation. Reviewing the reviewers has always been a weakness of the peer review system, and sometimes turf wars are fought by some academics via reviews. It's one of the downsides of anonymous reviews, and the academic world can be very political. A good editor would stay on top of this and deal with it appropriately.

LLMs are good at filtering, summarizing, flagging, etc. With proper guard rails, there's no reason to not lean on that a bit. It's the abuse that needs to be countered. In the end, that begins and ends with editors. They select the reviewers. So when those do a bad job, they need to act. And when their journals fill up with AI slop, it's their reputations that are on the line.

Like any tool, use caution and common sense. Blanket bans are not that productive at this stage.