GitHub allows Copilot data collection for AI training by default with an opt-out setting


GitHub has announced that interactions with GitHub Copilot will be used to train its AI models, with all personal account users enrolled by default. This change applies to Copilot Free, Copilot Pro and Copilot Pro+ accounts.

Users may choose to disable data collection through their account settings. According to GitHub, the data collected includes input and output data, code snippets, comments, documentation, file names, and repository structure.

The company says the purpose is to improve the performance of its models for all Copilot users. GitHub Copilot is available in Visual Studio Code, on the GitHub website, through the Copilot CLI, and in other GitHub services.

Who is affected by changes to GitHub Copilot training data?

Automatic enrollment applies to personal Copilot accounts: Free, Pro and Pro+. Copilot Business and Copilot Enterprise accounts are not subject to the same default data collection, according to the announcement.

Users who have never used any Copilot features are not affected. For users who have used code completion in Visual Studio Code, asked Copilot questions on the GitHub website, or interacted with any related AI features, those interactions and code snippets may be included in training data in the future.
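
The eligibility rule described above can be written out as a short check. This is only an illustrative sketch in Python based on the article's summary of the announcement: the `Account` class, its field names, and the function name are hypothetical and do not correspond to any GitHub API or setting identifier.

```python
from dataclasses import dataclass

# Plans the announcement names as enrolled by default (per this article).
DEFAULT_ENROLLED_PLANS = {"free", "pro", "pro+"}


@dataclass
class Account:
    """Hypothetical model of a GitHub account, for illustration only."""
    plan: str                       # e.g. "Free", "Pro", "Pro+", "Business", "Enterprise"
    has_used_copilot: bool          # any completion, chat, or related AI feature
    training_opt_out: bool = False  # True once the user sets the option to "Off"


def may_contribute_training_data(account: Account) -> bool:
    """Return True if, under the rules summarized above, the account's
    Copilot interactions may be included in training data."""
    if account.plan.lower() not in DEFAULT_ENROLLED_PLANS:
        return False  # Business and Enterprise are outside the default collection
    if not account.has_used_copilot:
        return False  # users who never used Copilot are not affected
    return not account.training_opt_out
```

Under these assumptions, a Pro account that has used Copilot and has not changed the setting would return `True`, while the same account with the opt-out applied, or any Business or Enterprise account, would return `False`.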

How to opt out of GitHub Copilot’s use of training data

GitHub offers an option to disable data collection in your account settings. The option appears under the Privacy heading on the Copilot features page of your GitHub account settings. To turn it off:

  1. Sign in to your GitHub account and go to your account settings.
  2. From there, navigate to the Copilot features page.
  3. Look for the option called “Allow GitHub to use my data for AI model training” under Privacy.
  4. Set the drop-down menu to “Off” to disable data collection.

If you have multiple GitHub accounts, you will need to repeat this process for each one, as the settings are applied individually to each account.

Why GitHub says it’s using Copilot data for training

GitHub said that its initial Copilot models were built using publicly available data and carefully curated code samples. The company reported performance improvements after adding data from Microsoft employees and now plans to expand this approach to a broader user base.

GitHub says this practice aligns with “established industry practices” and that the updates will lead to more accurate code suggestions, better detection of potential bugs, and a deeper understanding of development workflows. These claims come from the company itself.

Scope of Copilot data collection and what GitHub hasn’t clarified

The announcement does not specify a minimum interaction threshold or explain how data is anonymized before being used for training. GitHub has not provided details on what technical controls are in place to prevent sensitive code or proprietary logic from being used in model training, other than the opt-out option.

Users on Copilot Business and Copilot Enterprise plans are excluded from the default data collection, but GitHub has not elaborated on this in the announcement. The company also has not indicated when data collection began or whether interactions before the announcement are included.


