## Overview
Pinterest, the visual discovery platform with over 300 billion ideas indexed, embarked on a journey to enable AI-assisted development for their engineering teams. This case study documents their approach to evaluating, piloting, and rolling out GitHub Copilot across their organization while carefully managing the security, legal, and operational risks inherent in deploying LLM-powered tools in an enterprise development environment.
The initiative arose from organic developer demand—engineers were already using AI-assisted development tools for personal projects and were eager to bring these capabilities into their professional work. However, like many enterprises, Pinterest had initially prohibited LLM usage until they could properly assess the implications. This case study represents a methodical approach to enterprise LLM adoption that balances innovation with risk management.
## Build vs. Buy Decision
One of the first strategic decisions Pinterest made was whether to build their own AI-assisted development solution or purchase a vendor solution. Despite possessing substantial in-house AI expertise (Pinterest builds many of their own developer tools and runs sophisticated ML systems for their core product), they determined that building from scratch was not essential to their core business. This is a noteworthy decision point for enterprises considering LLM adoption—the recognition that leveraging existing vendor solutions can accelerate time-to-value.
Pinterest chose GitHub Copilot specifically based on several criteria: its feature set, the robustness of the underlying LLM, and importantly, its fit with their existing tooling ecosystem. The breadth of IDE support (both VS Code and JetBrains IDEs were mentioned as being used by their developers) was cited as a factor that accelerated adoption.
## Trial Program Design
Pinterest's approach to the trial program demonstrates several LLMOps best practices for enterprise evaluation. Rather than running a small trial of fewer than 30 people over a few weeks (which they note many companies do), Pinterest opted for a larger and longer trial:
- Approximately 200 developers participated
- The trial ran over an extended duration (not just a few weeks)
- About 50% of participants used VS Code, with much of the remainder on JetBrains IDEs
The rationale behind this design was multifaceted. The larger cohort allowed them to include developers across various "personas"—likely meaning different specializations, experience levels, or working contexts. The longer duration helped control for the "novelty effect" and other measurement issues, providing more reliable data about sustained productivity impact rather than just initial enthusiasm.
An important cultural aspect was also mentioned: even if the evaluation led them in a different direction, they wanted to give developers the opportunity to try something cutting edge and include them in the journey. This speaks to change management practices that can ease enterprise LLM adoption.
## Evaluation Methodology
Pinterest leveraged their existing frameworks for measuring engineering productivity, applying them specifically to the Copilot trial. Their evaluation combined both qualitative and quantitative approaches:
**Qualitative Measurement:** They collected weekly sentiment feedback through a short Slack bot-based survey. The choice of Slack over email was deliberate—they had previously observed higher completion rates with Slack-based surveys and wanted to meet developers where they spend time while reducing friction. The NPS (Net Promoter Score) approach gave them a consistent metric to track over time. Early results showed an NPS of 75, which is considered excellent, and scores improved as the trial continued.
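Pinterest does not share its survey tooling, but the NPS arithmetic itself is standard; a minimal sketch, assuming the usual 0-10 "how likely are you to recommend" scale:

```python
def nps(scores):
    """Net Promoter Score from 0-10 responses: the percentage of
    promoters (9-10) minus the percentage of detractors (0-6)."""
    if not scores:
        raise ValueError("no responses")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return round(100 * (promoters - detractors) / len(scores))

# 8 promoters, 1 passive (7-8), 1 detractor out of 10 responses:
print(nps([10, 10, 9, 9, 9, 10, 9, 9, 8, 4]))  # -> 70
```

Tracking this single number weekly is what made the Slack survey cheap to answer and the trend easy to read over the life of the trial.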
User feedback highlighted specific value propositions of AI-assisted development. Comments included observations that Copilot suggestions improved over time based on work context, and that the tool was particularly valuable when working in unfamiliar languages (Scala was mentioned as an example): developers familiar with general programming concepts could let Copilot handle syntax details while still understanding the suggestions.
**Quantitative Measurement:** Their approach compared relative change over time for the trial cohort versus a control group from before the Copilot trial. Running the trial for longer than a few weeks helped isolate external temporal influences like holidays. However, the article does not specify what quantitative metrics were actually measured or share detailed results—a commenter on the original post asked about this, suggesting the specifics were not disclosed.
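The article names neither the metric nor the comparison method, but a cohort-versus-control design of this kind is commonly analyzed as a difference-in-differences; a minimal sketch, with the metric left hypothetical (e.g. merged PRs per developer per week):

```python
def relative_change(before, after):
    """Percent change in a metric's mean from the pre-period to the post-period."""
    pre = sum(before) / len(before)
    post = sum(after) / len(after)
    return 100 * (post - pre) / pre

def diff_in_diff(trial_before, trial_after, control_before, control_after):
    """Change in the trial cohort minus change in the control cohort.
    Subtracting the control's change cancels shared temporal effects
    (holidays, release freezes) that hit both cohorts alike."""
    return (relative_change(trial_before, trial_after)
            - relative_change(control_before, control_after))

# Hypothetical weekly values of some productivity metric per cohort:
net = diff_in_diff([10, 10], [12, 12], [10, 10], [11, 11])
print(net)  # trial improved 20%, control 10%, so a net +10 points
```

This is one plausible reading of "relative change over time versus a control group", not a description of Pinterest's actual analysis.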
## Security and Legal Considerations
Pinterest's approach to security and legal compliance demonstrates the cross-functional coordination required for enterprise LLM deployment:
**Legal Review:** They worked closely with their legal team to ensure usage adhered to all relevant licensing terms and regulations. While specific concerns aren't detailed, common considerations in this space include intellectual property issues around code generated by LLMs trained on open-source repositories.
**Security Assessment:** The security team conducted a thorough assessment of security implications. Two key concerns were addressed:
- Ensuring code produced by Copilot remained within Pinterest's control
- Confirming that Pinterest code was not used for training future LLM models
**Vulnerability Scanning:** A notable security practice was the continuous auditing of code using vulnerability scanning tools. Importantly, they scanned code from both Copilot participants and non-participants, allowing them to compare whether AI-assisted development introduced more vulnerabilities. This comprehensive approach enabled them to monitor for potential degradation of their security posture due to AI-generated code.
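The scanner and its output schema are not described in the article; the sketch below shows one hypothetical way such a comparison could be normalized (vulnerabilities per 1,000 lines of scanned code, per cohort), with all names (`findings`, `cohort_of`, `loc_scanned`) invented for illustration:

```python
from collections import defaultdict

def vuln_rate_by_cohort(findings, cohort_of, loc_scanned):
    """Vulnerabilities per 1,000 lines of scanned code, per cohort.

    findings:     list of (author, severity) tuples from a scanner
    cohort_of:    maps author -> "copilot" or "control"
    loc_scanned:  maps author -> lines of code scanned for that author
    """
    counts = defaultdict(int)
    loc = defaultdict(int)
    for author, _severity in findings:
        counts[cohort_of[author]] += 1
    for author, n in loc_scanned.items():
        loc[cohort_of[author]] += n
    # Normalizing by code volume keeps the comparison fair when
    # cohorts differ in size or output.
    return {c: 1000 * counts[c] / loc[c] for c in loc}
```

Comparing the two rates over time would surface any security-posture drift attributable to AI-assisted code, which is the monitoring goal the case study describes.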
## Expansion to General Availability
Based on positive trial results, Pinterest made the decision to expand Copilot access to all of engineering. The timing was strategic—they did this in advance of their annual "Makeathon" (a hackathon-style event), which had an AI focus that year.
To drive adoption post-GA, Pinterest implemented several operational improvements:
- **Training sessions:** Educating developers on effective Copilot usage
- **Access streamlining:** Integration with their access control and provisioning systems to simplify the process of obtaining Copilot access
- **Domain-specific guidance:** Partnering with platform teams to help developers understand how to best leverage Copilot in different domains (web, API, mobile development were mentioned)
## Results and Metrics
The quantitative outcomes reported include:
- Complete rollout from idea to scaled availability in less than 6 months
- 150% increase in user adoption in the 2 months following GA
- 35% of total developer population using Copilot regularly
Pinterest framed the 35% adoption rate in terms of the Technology Adoption Lifecycle, noting they had moved well into the "early majority" phase. This provides useful context for understanding where they were in the adoption curve at the time of publishing.
## Future Directions
Pinterest outlined plans for continued improvement and evolution of their AI-assisted development program:
- **Fine-tuning with proprietary code:** They planned to improve Copilot suggestion quality by incorporating fine-tuning with Pinterest source code. This represents a more advanced LLMOps practice that could improve code suggestions' relevance to their specific codebase, patterns, and conventions.
- **Safety monitoring:** Continuing to ensure that as teams leverage AI to move faster, they don't introduce more bugs or incidents. This suggests ongoing monitoring and evaluation even post-GA.
- **Continuous evaluation:** They noted plans to constantly evaluate new opportunities to build, buy, and incorporate new AI-assisted development tools as the space rapidly evolves.
## Critical Assessment
While the case study presents a methodical and thoughtful approach to enterprise LLM adoption, there are some limitations to note:
- Specific quantitative productivity metrics were not disclosed, making it difficult to independently assess the claimed productivity improvements
- The security scanning comparison results (whether AI-assisted code introduced more vulnerabilities) were not shared
- The article is published by Pinterest Engineering, so it naturally presents their initiative favorably
- Fine-tuning plans were mentioned as future work, so results of that more advanced LLMOps practice are not documented
That said, the case study offers valuable insights into enterprise considerations for deploying LLM-powered developer tools, including the importance of cross-functional collaboration, extended trial periods, hybrid evaluation approaches, and ongoing security monitoring. The emphasis on meeting developers where they are (Slack surveys, IDE integration breadth) and including them in the journey reflects mature change management practices for technology adoption.