Updates to GitHub Copilot interaction data usage policy (github.blog)

by prefork 166 comments 390 points

[−] stefankuehnel 51d ago
If you scroll down to "Allow GitHub to use my data for AI model training" in GitHub settings, you can enable or disable it. However, what really gets me is how they pitch it like it’s some kind of user-facing feature:

Enabled = You will have access to the feature

Disabled = You won't have access to the feature

As if handing over your data for free is a perk. Kinda hilarious.

[−] data-ottawa 51d ago
It’s not so bad, there’s no double negative and it’s not a confusing “switch” that is always ambiguous as to whether it’s enabled or not.

In contrast, when you create a GCS bucket it uses a checkmark for enabling “public access prevention”. Who designed that modal? It takes me a solid minute to figure out if I’m publishing private data or not.

[−] a1o 51d ago
I went to check on this. I have everything Copilot-related disabled, yet in the two bars that measure usage, my Copilot Chat usage was somehow at 2%. How is this possible?

Before anyone comes to me to sell me on AI, this is on my personal account, I have and use it in my business account (but it is a completely different user account), I just make it a point to not use it in my personal time so I can keep my skills sharp.

[−] martin-t 51d ago
A few days ago, I unchecked it, only to see it checked again when I reloaded the page.

It could be incompetence but it shouldn't matter. This level of incompetence should be punished equally to malice.

[−] petcat 51d ago
I guess the "perk" is that maybe their models get retrained on your data making them slightly more useful to you (and everyone else) in the future? idk
[−] mirekrusin 51d ago
The feature is that your coding style will be in next models!
[−] 7bit 51d ago
It's worded that way to create FOMO in the hopes people keep it enabled.

Dark pattern and dick move.

[−] Rapzid 51d ago
Is that not some stock feature-flag verbiage?
[−] mentalgear 51d ago

> On April 24 we'll start using GitHub Copilot interaction data for AI model training unless you opt out. Review this update and manage your preferences in your GitHub account settings.

Now "Allow GitHub to use my data for AI model training" is enabled by default.

Turn it off here: https://github.com/settings/copilot/features

Do they have this set on business accounts also by default? If so, this is really shady.

[−] QuadrupleA 51d ago
Fun fact: Copilot gives you no way to ignore sensitive files with API keys, passwords, DB credentials, etc.: https://github.com/orgs/community/discussions/11254#discussi...

So by default you send all this to Microsoft by opening your IDE.

[−] section_me 51d ago
If I'm paying, which I am, I want to have to opt in, not opt out. Mario Rodriguez / @mariorod needs to give his head a wobble.

What on earth are they thinking...

[−] pred_ 51d ago
What is the legal basis of this in the EU? Ignoring the fact they could end up stealing IP, it seems like the collected information could easily contain PII, and consent would have to be

> freely given, specific, informed and unambiguous. In order to obtain freely given consent, it must be given on a voluntary basis.

[−] sph 51d ago
Thanks to Github and the AI apocalypse, all my software is now stored on a private git repository on my server.

Why would I even spend time choosing a copyleft license if any bot will use my code as training data to be used in commercial applications? I'm not planning on creating any more opensource code, and what projects of mine still have users will be left on GH for posterity.

If you're still serious about opensource, time to move to Codeberg.

[−] diath 51d ago

> This approach aligns with established industry practices

"others are doing it too so it's ok"

[−] Deukhoofd 51d ago
So basically they want to retain everyone's full codebases?

> The data used in this program may be shared with GitHub affiliates, which are companies in our corporate family including Microsoft

So every Microsoft owned company will have access to all data Copilot wants to store?

[−] hoten 51d ago
Why is there no "cancel Copilot subscription" option here? Docs say there should be...

Mobile

https://github.com/settings/billing/licensing

EDIT:

https://docs.github.com/en/copilot/how-tos/manage-your-accou...

> If you have been granted a free access to Copilot as a verified student, teacher, or maintainer of a popular open source project, you won’t be able to cancel your plan.

Oh. jeez.

[−] hmate9 51d ago
For what it's worth they're not trying to hide this change at all and are very upfront about it and made it quite simple to opt out.
[−] badthingfactory 51d ago
I appreciated the notification at the top of the screen because it prompted me to disable every single copilot feature I possibly could from my account. I also appreciated Microsoft for making Windows 11 horrible so I could fall back in love with Linux again.
[−] _pdp_ 51d ago
Microsoft doing dumb things once again.

Who in their right mind will opt into sharing their code for training? Absolutely nobody. This is just a dark pattern.

Btw, even if disabled, I have zero confidence they are not already training on our data.

I would also recommend sprinkling copyright notices all over the place and changing the license of every file, in case they have some sanity checks before your data gets consumed - just to be sure.
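For anyone who wants to try that, here is a minimal Python sketch of adding a one-line copyright notice to source files. The notice text, the extension list, and the function names `with_notice` and `add_notices` are all placeholders made up for illustration, and nothing here is known to affect whether any training pipeline actually skips the file.

```python
from pathlib import Path

# Hypothetical notice text; replace with your own name, year, and license.
NOTICE = "Copyright (c) 2025 Example Author. All rights reserved."

# Comment prefixes for a few common extensions (illustrative, incomplete list).
COMMENT_PREFIX = {
    ".py": "#", ".sh": "#",
    ".js": "//", ".ts": "//", ".go": "//", ".rs": "//", ".c": "//", ".h": "//",
}


def with_notice(text: str, prefix: str, notice: str = NOTICE) -> str:
    """Return text with a one-line notice added, unless it is already present."""
    header = f"{prefix} {notice}"
    if header in text:
        return text
    lines = text.splitlines(keepends=True)
    # Keep a shebang on the first line so scripts stay executable.
    if lines and lines[0].startswith("#!"):
        first = lines[0] if lines[0].endswith("\n") else lines[0] + "\n"
        return first + header + "\n" + "".join(lines[1:])
    return header + "\n" + text


def add_notices(root: str) -> list[Path]:
    """Add the notice to recognized source files under root; return changed files."""
    changed = []
    for path in sorted(Path(root).rglob("*")):
        prefix = COMMENT_PREFIX.get(path.suffix)
        # Skip unknown extensions, directories, and anything inside .git.
        if prefix is None or not path.is_file() or ".git" in path.parts:
            continue
        original = path.read_text(encoding="utf-8")
        updated = with_notice(original, prefix)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed.append(path)
    return changed
```

Calling `add_notices(".")` from a repository root rewrites files in place, so it is worth running on a clean working tree where `git diff` can show exactly what changed. Running it a second time changes nothing, because files that already contain the notice are skipped.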

[−] TZubiri 51d ago
If this doesn't sound bad enough, it's possible that Copilot is already enabled. As we know, these kinds of features are pushed to users instead of being asked for.

Maybe it's already active in our accounts and we don't realize it, so our code will be used to train the AI.

Now we can't be sure if this will happen or not, but a company like GitHub should be staying miles away from this kind of policy. I personally wouldn't use GitHub for private corporate repositories. Only as a public web interface for public repos.

[−] TZubiri 51d ago
Two issues with this:

1- Vulnerabilities: secrets can be leaked to other users.

2- Intellectual property: it can also be leaked to other users.

Most smart clients won't opt out, they will just cut usage entirely.

[−] stefanos82 51d ago
Serious question: let's say I host my code on this platform which is proprietary and is for my various clients. Who can guarantee me that AI won't replicate it to competitors who decide to create something similar to my product?
[−] OtherShrezzing 51d ago
It’s not clear to me how GitHub would enforce the “we don’t use enterprise repos” stuff alongside “we will use free tier copilot for training”.

A user can be a contributor to a private repository, but not have that repository owner organisation’s license to use copilot. They can still use their personal free tier copilot on that repository.

How can enterprises be confident that their IP isn’t being absorbed into the GH models in that scenario?

[−] pizzafeelsright 51d ago
I am not certain this is that big of a deal outside of "making AI better".

At this point, is there any magic in software development?

If you have super-secret content, is a third party the best location?

[−] rectang 51d ago
I just checked my Github settings, and found that sharing my data was "enabled".

This setting does not represent my wishes and I definitely would not have set it that way on purpose. It was either defaulted that way, or when the option was presented to me I configured it the opposite of how I intended.

Fortunately, none of the work I do these days with Copilot enabled is sensitive (if it was I would have been much more paranoid).

I'm in the USA and pay for Copilot as an individual.

Shit like this is why I pay for duck.ai where the main selling point is that the product is private by default.

[−] david_allison 51d ago
I have GitHub Copilot Pro. I don't believe I signed up for it. I neither use it nor want it.

1. A lot of settings are 'Enabled' with no option to opt out. What can I do?

2. How do I opt out of data collection? I see the message informing me to opt out, but 'Allow GitHub to use my data for AI model training' is already disabled for my account.

[−] OtherShrezzing 51d ago
So, how does this work with source-available code, that’s still licensed as proprietary - or released under a license which requires attribution?

If someone takes that code and pokes around on it with a free tier copilot account, GitHub will just absorb it into their model - even if it’s explicitly against that code’s license to do so?

[−] liquid_thyme 51d ago
They use data from the poor student tier, but arguably, large corporates and businesses hiring talented devs are going to create higher quality training data. Just looking at it logically, not that I like any of this...
[−] cebert 51d ago
I wish GitHub would focus on making their service reliable instead of Copilot and opting folks into their data being stolen for training.
[−] kevcampb 48d ago
This is terrifying. GitHub was the one provider I did not expect to take such an action. We're now playing whack-a-mole with vendors to try and ensure that our company IP doesn't end up being used to train a model.
[−] etothet 51d ago
The fact that this is on by default, especially for paid accounts and even more especially for organizations, where certain types of privacy are sometimes mandated by the industry your business is in, is ridiculous.

There should also be a much easier one-click to opt out without having to scroll way down on the settings page.

[−] thesmart 51d ago
I'm ready to abandon GitHub. Enshittification of the world's source infrastructure is just a matter of time.
[−] robeym 49d ago
There are several settings in my account relating to Copilot that are locked/enabled with a shield and key icon next to them. Any idea how to disable these settings? They're on the same settings/copilot/features page.
[−] jmhammond 51d ago
Mine was defaulted to disabled. I’m on the Education pro plan (academic), so maybe that’s different than personal?
[−] ncr100 51d ago
On my Android phone I was able to change the setting using Firefox by logging into GitHub and not allowing it to launch the GitHub app.

I was unable to change the setting when I used the GitHub app to open up the web page in a container; button clicks weren't working. Quite frustrating.

[−] greatgib 50d ago
Something important that leaks through the phrasing of their blog post: it is not really "GitHub" that wants to suck up all your data ("prompts, code, context, documents", ...), it is "Microsoft"!
[−] phendrenad2 51d ago
So I do all the work of thinking about how to do something, and as soon as I tell Copilot about it, now it's in the training data and anyone can ask the LLM and it'll tell them the solution I came up with? Great. I'm going to cancel.
[−] sbinnee 51d ago
Bold move. Who uses Copilot these days? Unless they have free credit I mean.
[−] rvz 51d ago

> From April 24 onward, interaction data—specifically inputs, outputs, code snippets, and associated context—from Copilot Free, Pro, and Pro+ users will be used to train and improve our AI models unless they opt out.

Now is the time to run off of GitHub and consider Codeberg or self hosting like I said before. [0]

[0] https://news.ycombinator.com/item?id=22867803

[−] Heliodex 51d ago
Finally. The option for me to enable Copilot data sharing has been locked as disabled for some time, so until now I couldn't even enable it if I wanted to.
[−] indigodaddy 51d ago
Checked and mine was already on disabled. Don't remember if I previously toggled it or not.
[−] dartf 49d ago
I don't see an option to opt out? Is it a US-only thing?
[−] djmashko2 51d ago

> Content from your issues, discussions, or private repositories at rest. We use the phrase “at rest” deliberately because Copilot does process code from private repositories when you are actively using Copilot. This interaction data is required to run the service and could be used for model training unless you opt out.

Sounds like it's even likely to train on content from private repositories. This feels like a bit of an overstep to me.

[−] mt42or 51d ago
Is it legal? Surely not in any EU country.
[−] marak830 51d ago
As it's enabled by default, does that mean everything has already been siphoned off and now I'm just closing the gate behind the animals escaping?

Shit like this shouldn't be allowed.

[−] explodes 51d ago
We all knew Microsoft was going to destroy GitHub eventually when it was first bought.

How much longer do you want to tolerate the enshittification? How much longer CAN you tolerate it?