GitHub Copilot Will Use Your Code for AI Training

Beginning April 24, GitHub will collect interaction data from Copilot Free, Pro, and Pro+ tiers to further train its AI systems. The feature is enabled by default, placing the responsibility on individual subscribers to opt out.

GitHub is leaving Copilot Business and Copilot Enterprise customers out of the policy shift. Individual subscribers, meanwhile, have reacted negatively to the automatic opt-in.

Announcement on the GitHub Blog

GitHub describes the training data broadly as inputs, outputs, code snippets, and associated context. But the company’s detailed explanation goes further. According to GitHub, collected data may also include:

  • Code surrounding the cursor
  • Comments and documentation
  • File names
  • Repository structure
  • Navigation patterns
  • Chats with Copilot features
  • Thumbs-up or thumbs-down feedback on suggestions

According to GitHub, private repositories at rest will not be used for model training. Code stored on the platform without interaction remains outside the training dataset.

However, if you are actively using Copilot while working inside a private repository, the prompts, suggestions, generated snippets, and surrounding context from that session may still be collected for training unless you switch the setting off. GitHub notes that this is technically not the same as training on stored private repository content, but many developers are unlikely to find the distinction reassuring.

According to GitHub, using interaction data from Microsoft employees has produced “meaningful improvements,” such as increased acceptance rates across different languages. The company now intends to apply the approach to individual users on the Free, Pro, and Pro+ tiers.

How to opt out

For users who wish to disable data collection for AI training, the process is straightforward:

  1. Navigate to Copilot settings
  2. Find the Privacy section
  3. Set “Allow GitHub to use my data for AI model training” to Disabled

GitHub has also confirmed that anyone who previously opted out of data collection for product improvements will retain that preference. Those users will not be automatically enrolled in training when the policy takes effect on April 24.

Under the new policy, data may be shared with affiliates, including Microsoft. However, GitHub says the data will not be provided to third-party AI model providers for their own separate training.

Negative reactions from community members are indicated by the number of dislike icons

Response to the update, particularly the opt-out default, has been negative. A GitHub community post announcing the move has received 243 thumbs-down votes and a lengthy series of angry comments from users.
