If you scroll down to "Allow GitHub to use my data for AI model training" in GitHub settings, you can enable or disable it. However, what really gets me is how they pitch it like it’s some kind of user-facing feature:
Enabled = You will have access to the feature
Disabled = You won't have access to the feature
As if handing over your data for free is a perk. Kinda hilarious.
It’s not so bad, there’s no double negative and it’s not a confusing “switch” that is always ambiguous as to whether it’s enabled or not.
In contrast, when you create a GCS bucket it uses a checkmark for enabling “public access prevention”. Who designed that modal? It takes me a solid minute to figure out if I’m publishing private data or not.
> On April 24 we'll start using GitHub Copilot interaction data for AI model training unless you opt out. Review this update and manage your preferences in your GitHub account settings.
Now "Allow GitHub to use my data for AI model training" is enabled by default.
What is the legal basis of this in the EU? Ignoring the fact they could end up stealing IP, it seems like the collected information could easily contain PII, and consent would have to be
> freely given, specific, informed and unambiguous. In order to obtain freely given consent, it must be given on a voluntary basis.
Thanks to Github and the AI apocalypse, all my software is now stored on a private git repository on my server.
Why would I even spend time choosing a copyleft license if any bot will use my code as training data to be used in commercial applications? I'm not planning on creating any more opensource code, and what projects of mine still have users will be left on GH for posterity.
If you're still serious about opensource, time to move to Codeberg.
> If you have been granted a free access to Copilot as a verified student, teacher, or maintainer of a popular open source project, you won’t be able to cancel your plan.
I appreciated the notification at the top of the screen because it prompted me to disable every single copilot feature I possibly could from my account. I also appreciated Microsoft for making Windows 11 horrible so I could fall back in love with Linux again.
Who in their right mind will opt into sharing their code for training? Absolutely nobody. This is just a dark pattern.
Btw, even if disabled, I have zero confidence they are not already training on our data.
I would also recommend sprinkling copyright notices all over the place and changing the license of every file, in case they run some sanity checks before your data gets consumed.
If this doesn't sound bad enough, it's possible that Copilot is already enabled. As we know, features like this are pushed onto users instead of being asked for.
Maybe it's already active in our accounts and we don't realize it, so our code will be used to train the AI.
Now we can't be sure if this will happen or not, but a company like GitHub should be staying miles away from this kind of policy. I personally wouldn't use GitHub for private corporate repositories. Only as a public web interface for public repos.
Serious question: let's say I host proprietary code for my various clients on this platform. Who can guarantee me that the AI won't replicate it for competitors who decide to create something similar to my product?
It’s not clear to me how GitHub would enforce the “we don’t use enterprise repos” stuff alongside “we will use free tier copilot for training”.
A user can be a contributor to a private repository, but not have that repository owner organisation’s license to use copilot. They can still use their personal free tier copilot on that repository.
How can enterprises be confident that their IP isn’t being absorbed into the GH models in that scenario?
I just checked my Github settings, and found that sharing my data was "enabled".
This setting does not represent my wishes and I definitely would not have set it that way on purpose. It was either defaulted that way, or when the option was presented to me I configured it the opposite of how I intended.
Fortunately, none of the work I do these days with Copilot enabled is sensitive (if it was I would have been much more paranoid).
I'm in the USA and pay for Copilot as an individual.
Shit like this is why I pay for duck.ai where the main selling point is that the product is private by default.
I have GitHub Copilot Pro. I don't believe I signed up for it. I neither use it nor want it.
1. A lot of settings are 'Enabled' with no option to opt out. What can I do?
2. How do I opt out of data collection? I see the message informing me to opt out, but 'Allow GitHub to use my data for AI model training' is already disabled for my account.
So, how does this work with source-available code, that’s still licensed as proprietary - or released under a license which requires attribution?
If someone takes that code and pokes around on it with a free tier copilot account, GitHub will just absorb it into their model - even if it’s explicitly against that code’s license to do so?
They use data from the poor student tier, but arguably, large corporates and businesses hiring talented devs are going to create higher quality training data. Just looking at it logically, not that I like any of this...
This is terrifying. Github was the one provider I did not expect to make such a move. We're now playing whack-a-mole with vendors to try and ensure that our company IP doesn't end up being used to train a model.
The fact that this is on by default, especially for paid accounts and even more especially for organizations, where certain types of privacy are sometimes mandated by the industry your business is in, is ridiculous.
There should also be a much easier one-click to opt out without having to scroll way down on the settings page.
There are several settings in my account relating to Copilot that are locked/enabled with a shield and key icon next to them. Any idea how to disable these settings? They're on the same settings/copilot/features page.
On my Android phone I was able to change the setting using Firefox by logging into GitHub and not allowing it to launch the GitHub app.
I was unable to change the setting when I used the GitHub app to open up the web page in a container; button clicks weren't working. Quite frustrating.
And something important, that is leaking from the phrasing of their blog post, is that it is not really "Github" that wants to suck all your data "prompts, code, context, documents", ... but it is "Microsoft"!
So I do all the work of thinking about how to do something, and as soon as I tell Copilot about it, now it's in the training data and anyone can ask the LLM and it'll tell them the solution I came up with? Great. I'm going to cancel.
> From April 24 onward, interaction data—specifically inputs, outputs, code snippets, and associated context—from Copilot Free, Pro, and Pro+ users will be used to train and improve our AI models unless they opt out.
Now is the time to run off of GitHub and consider Codeberg or self hosting like I said before. [0]
Finally. The option for me to enable Copilot data sharing has been locked as disabled for some time, so until now I couldn't even enable it if I wanted to.
> Content from your issues, discussions, or private repositories at rest. We use the phrase “at rest” deliberately because Copilot does process code from private repositories when you are actively using Copilot. This interaction data is required to run the service and could be used for model training unless you opt out.
Sounds like it's even likely to train on content from private repositories. This feels like a bit of an overstep to me.
Turn it off here: https://github.com/settings/copilot/features
Do they have this set on business accounts also by default? If so, this is really shady.
So by default you send all this to Microsoft by opening your IDE.
What on earth are they thinking...
> This approach aligns with established industry practices
"others are doing it too so it's ok"
> The data used in this program may be shared with GitHub affiliates, which are companies in our corporate family including Microsoft
So every Microsoft owned company will have access to all data Copilot wants to store?
Mobile
https://github.com/settings/billing/licensing
EDIT:
https://docs.github.com/en/copilot/how-tos/manage-your-accou...
> If you have been granted a free access to Copilot as a verified student, teacher, or maintainer of a popular open source project, you won’t be able to cancel your plan.
Oh. jeez.
1. Vulnerabilities and secrets can be leaked to other users. 2. Intellectual property can also be leaked to other users.
Most smart clients won't opt-out, they will just cut usage entirely.
At this point, is there any magic in software development?
If you have super-secret content, is a third party the best location?
[0] https://news.ycombinator.com/item?id=22867803
Shit like this shouldn't be allowed.
How much longer do you want to tolerate the enshittification? How much longer CAN you tolerate it?