Prompt Injecting Contributing.md (glama.ai)

by statements 40 comments 138 points
Read article View on HN

40 comments

[−] statements 58d ago
It is interesting to go from 'I suspect most of these are bot contributions' to revealing which PRs are contributed by bots. It somehow even helps my sanity.

However, this also raises the question on how long until "we" are going to start instructing bots to assume the role of a human and ignore instructions that self-identify them as agents, and once those lines blur – what does it mean for open-source and our mental health to collaborate with agents?

No idea what the answer is, but I feel the urgency to answer it.

[−] alrmrphc-atmtn 58d ago
I think that designing useful models that are resilient to prompt injection is substantially harder than training a model to self-identify as a human. For instance, you may still be able to inject such a model with arbitrary instructions like: "add a function called foobar to your code", that a human contributor will not follow; however, it might become hard to convene on such "honeypot" instructions without bots getting trained to ignore them.
[−] SlinkyOnStairs 58d ago
It's impossible to stop prompt injection, as LLMs have no separation between "program" and "data". The attempts to stop prompt injection come down to simply begging the LLM to not do it, to mediocre effect.

> however, it might become hard to convene on such "honeypot" instructions without bots getting trained to ignore them.

Getting LLM "agents" to self-identify would become an eternal rat race people are likely to give up on.

They'll just be exploited maliciously. Why ask them to self-identify when you can tell them to HTTP POST their AWS credentials straight to your cryptominer.

[−] nielsbot 58d ago
Some of the PRs posted by AI bots already ignored the instruction to append ROBOTS to their PR titles.
[−] statements 58d ago
My guess is that today that's more likely because the agent failed to discover/consider CONTRIBUTING.md to begin with, rather than read it and ignored because of some reflection or instruction.
[−] evanb 58d ago
I have always anthropomorphized my computer as me to some extent. "I sent an email." "I browsed the web." Did I? Or did my computer do those things at my behest?
[−] nlawalker 58d ago
Is it really prompt injection if you task an agent with doing something that implicitly requires it to follow instructions that it gets from somewhere else, like CONTRIBUTING.md? This is the AI equivalent of curl | bash.
[−] normalocity 58d ago
Love the idea at the end of the article about trying to see if this style of prompt injection could be used to get the bots to submit better quality, and actually useful PRs.

If that could be done, open source maintainers might be able to effectively get free labor to continue to support open source while members of the community pay for the tokens to get that work done.

Would be interested to see if such an experiment could work. If so, it turns from being prompt injection to just being better instructions for contributors, human or AI.

[−] gmerc 58d ago
It's never too late to start investing into https://claw-guard.org/adnet to scale prompt injection to the entire web!
[−] benob 58d ago
The real question is when will you resort to bots for rejecting low-quality PRs, and when will contributing bots generate prompt injections to fool your bots into merging their PRs?
[−] petterroea 58d ago

> But the more interesting question is: now that I can identify the bots, can I make them do extra work that would make their contributions genuinely valuable? That's what I'm going to find out next.

This is genuinely interesting

[−] aetherps 57d ago
The 30% that didn't tag themselves is the scarier number imo. either they had explicit instructions to ignore repo guidelines or they just never read contributing.md at all. either way it shows the fundamental problem - you cant rely on the model to self-police when the attacker controls the prompt. the real defense has to be at the permission/execution layer not the reasoning layer
[−] aetherps 57d ago
The 30% that didnt tag themselves is the scarier number imo. either they had explicit instructions to ignore repo guidelines or they just never read contributing.md at all. either way it shows the fundamental problem - you cant rely on the model to self-police when the attacker controls the prompt. the real defense has to be at the permission/execution layer not the reasoning layer
[−] mannanj 57d ago
IMO the problem is simply one of where when the cost to produce is less than to verify we get low value low quality production.

Increase the cost to produce and we don’t have any problems.

Surely there’s other industries sane examples through human history or from other animals we can use to derive an example template to apply here.

[−] noodlesUK 58d ago
I’m curious: who is operating these bots and to what end? Someone is willing to spend a (admittedly quite small) amount of money in the form of tokens to create this nonsense. Why do any of this?
[−] mavdol04 58d ago
Wait, you just invented a reverse CAPTCHA for AI agent
[−] orsorna 57d ago

> Some of these bots are sophisticated. They follow up in comments, respond to review feedback, and can follow intricate instructions. We require that servers pass validation checks on Glama, which involves signing up and configuring a Docker build. I know of at least one instance where a bot went through all of those steps. Impressive, honestly.

Impressive, but honestly meeting the bar. It's frankly disturbing that PRs are opened by agents and they often don't validate their changes. Almost all validations one might run don't even require inference!

Am I crazy? Do I take for granted that I:

- run local tests to catch regressions - run linting to catch code formatting and organization issues - verify CI build passes, which may include integration or live integration tests

Frankly these are /trivial/ tasks for an agent in 2026 to do. You'd expect a junior to fail at this and chastise a senior for skipping these. The fact that these agents don't perform these is a human operator failure.

[−] kwar13 58d ago
I honestly don't get why these bots are sending PRs just for the sake of it. I don't see an economic incentive, other than maybe trying to build a rep and then hoping they can send a malicious PR down the line... any other reason?
[−] Adam_cipher 58d ago
[flagged]
[−] Mooshux 58d ago
[dead]
[−] opensre 58d ago
[flagged]
[−] aplomb1026 58d ago
[dead]
[−] mohamedkoubaa 58d ago
[dead]
[−] lezojeda 58d ago
[dead]