
🦾 Shadow AI - 29 February 2024

Arming Security and IT Leaders for the Future

Forwarded this newsletter? Sign up for Shadow AI here.

Hello,

Happy Leap Day, and a very special birthday shoutout to my nephew, Julian!

In this edition of Shadow AI, I'm taking a slightly different approach. I’ll be delving deeper into my journey at JPMC, specifically focusing on the development of the company's first comprehensive data control framework.

I could write a series of detailed posts on this topic and, over time, I'm considering sprinkling them into the newsletter. Let me know if this resonates and which aspects of the data control framework you're most interested in learning more about.

Let’s dive in!

Data Deep Dive - Implementing a Robust Data Control Framework in Large Language Model Development

When I was on JPMorgan Chase’s cybersecurity team, I had the unexpected opportunity to help build the enterprise’s first data control framework as an additional responsibility. It wasn’t a core cybersecurity role, and at times I questioned whether it would help my career, but there’s no doubt it was critical to my growth. In fact, I took away several learnings from that experience that resonate even louder today as Large Language Models (LLMs) shape the future of artificial intelligence.

What was our objective?

Our goal was to help JPMC’s executive leadership team better understand and address data risk so the business could more effectively leverage data to drive revenue growth. At the time, data risk was managed across a number of different teams - cybersecurity, data governance, data privacy, and record retention - which was not uncommon for an enterprise the size of JPMC.

We set out to break down stovepipes by building a robust data control framework that would allow JPMC to manage data more effectively across its lifecycle - from the point it’s created until it’s deleted.

What are the key elements of a data control framework?
1. System of Record/Authoritative Data Source Designation

Identifying and designating authoritative data sources as the systems of record is the first step in establishing trust in the data used for training LLMs. This involves defining which data sources are considered reliable and authoritative, ensuring that the data feeding into LLMs is accurate, relevant, and up-to-date.
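To make the idea concrete, here is a minimal sketch of a source registry that designates one system of record per data domain and rejects non-authoritative feeds. All names (`DataSource`, `SourceRegistry`, the domain labels) are illustrative assumptions, not anything JPMC actually built.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataSource:
    name: str
    domain: str          # e.g. "customer", "transactions"
    authoritative: bool  # True if designated as the system of record

class SourceRegistry:
    """Tracks the single designated system of record per data domain."""

    def __init__(self):
        self._by_domain: dict[str, DataSource] = {}

    def designate(self, source: DataSource) -> None:
        # Only a source marked authoritative may be the system of record.
        if not source.authoritative:
            raise ValueError(f"{source.name} is not marked authoritative")
        self._by_domain[source.domain] = source

    def is_approved(self, source: DataSource) -> bool:
        # A training pipeline would call this before ingesting a feed.
        return self._by_domain.get(source.domain) == source
```

A pipeline gating its inputs through `is_approved` is one simple way to enforce that only designated sources feed model training.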

2. Data Lineage

Understanding the journey of data from its origin through various transformations and uses is crucial. Data lineage provides transparency into the data's lifecycle, enabling developers to trace errors back to their source, understand the impact of data changes, and ensure the integrity of data throughout its lifecycle.
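A lineage trail can be as simple as an append-only log attached to each dataset. The sketch below is a hypothetical illustration of that pattern; the step names are invented for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class LineageRecord:
    """Append-only log of every transformation applied to a dataset."""
    dataset: str
    lineage: list[dict] = field(default_factory=list)

    def record(self, step: str, detail: str) -> None:
        # Each entry captures what happened and when, in order.
        self.lineage.append({
            "step": step,
            "detail": detail,
            "at": datetime.now(timezone.utc).isoformat(),
        })

    def trace(self) -> list[str]:
        """Ordered list of steps, for tracing an error back to its source."""
        return [entry["step"] for entry in self.lineage]
```

When a bad output surfaces downstream, `trace()` tells you exactly which transformation stages the offending data passed through.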

3. Data Quality

Data quality management involves processes and technologies to maintain high data quality through cleansing, deduplication, and validation. Ensuring that data is accurate, complete, and consistent is crucial for training LLMs that are capable of generating reliable and coherent outputs.
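The three activities named above - cleansing, deduplication, and validation - can be sketched as a small pre-ingestion pipeline for text records. This is a toy illustration; real corpus preparation uses far more sophisticated checks (e.g. near-duplicate detection, language filtering).

```python
def clean(record: str) -> str:
    """Cleansing: collapse runs of whitespace and strip edges."""
    return " ".join(record.split())

def deduplicate(records: list[str]) -> list[str]:
    """Deduplication: drop exact repeats while preserving order."""
    seen: set[str] = set()
    out = []
    for r in records:
        if r not in seen:
            seen.add(r)
            out.append(r)
    return out

def validate(record: str, min_len: int = 10) -> bool:
    """Validation: reject empty or implausibly short records."""
    return len(record) >= min_len

def prepare(records: list[str]) -> list[str]:
    # Clean first so that near-identical records dedupe correctly.
    cleaned = [clean(r) for r in records]
    return [r for r in deduplicate(cleaned) if validate(r)]
```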

4. Data Classification

Classifying data based on its sensitivity, confidentiality, and criticality helps in applying appropriate controls and protection measures. Data classification supports compliance with data protection regulations and ensures that sensitive data, such as personal information, is handled with extra care.
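As a simple illustration, classification can start with pattern-based sensitivity labels that downstream controls key off of. The labels and regexes below are assumptions for the sketch; production classifiers use much richer detectors.

```python
import re

# Ordered most to least sensitive; first match wins.
PATTERNS = {
    "confidential": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # SSN-like pattern
    "internal": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),    # email address
}

def classify(record: str) -> str:
    """Return the highest sensitivity label that matches, else 'public'."""
    for label in ("confidential", "internal"):
        if PATTERNS[label].search(record):
            return label
    return "public"
```

The label a record receives then determines which protection and retention controls apply to it.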

5. Data Protection

Data protection involves implementing security measures to safeguard data against unauthorized access, breaches, and leaks. This includes encryption, access controls, and secure data storage solutions, ensuring that the data used in LLM development is protected from threats.
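One of those access controls can be sketched as a role-to-sensitivity clearance check. The roles and labels here are invented for illustration; real systems layer encryption, key management, and audit logging on top of checks like this.

```python
# Which roles are cleared to read data at each sensitivity label.
ALLOWED_ROLES = {
    "public": {"analyst", "engineer", "admin"},
    "internal": {"engineer", "admin"},
    "confidential": {"admin"},
}

def can_read(role: str, label: str) -> bool:
    """True if the role is cleared for data at this sensitivity label."""
    return role in ALLOWED_ROLES.get(label, set())
```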

6. Data Use

Establishing guidelines for ethical and responsible data use is critical, especially in the context of AI. This involves setting policies for how data can be used in training and deploying LLMs, ensuring that the use of data aligns with ethical standards and societal values.

7. Data Retention and Disposal

Defining policies for how long data should be retained and the procedures for its secure disposal helps in managing data efficiently and complying with legal requirements. This ensures that unnecessary data is not hoarded, reducing risks and liabilities associated with data storage.
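A retention policy can be expressed as a schedule keyed by classification label, with a check that flags records past their window for disposal. The periods below are purely illustrative, not legal guidance.

```python
from datetime import date, timedelta

# Illustrative retention periods per sensitivity label, in days.
RETENTION_DAYS = {
    "confidential": 365,
    "internal": 730,
    "public": 1825,
}

def due_for_disposal(label: str, created: date, today: date) -> bool:
    """True when a record has outlived its retention period."""
    return today > created + timedelta(days=RETENTION_DAYS[label])
```

A scheduled job sweeping records through `due_for_disposal` is the kind of mechanism that keeps unnecessary data from being hoarded.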

8. Data Privacy

Ensuring data privacy involves complying with data protection laws and regulations, such as GDPR and CPRA, and implementing practices to protect individuals' privacy rights. This includes obtaining consent for data use, enabling data anonymization, and allowing individuals to understand and control how their data is used.
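One common building block for the anonymization mentioned above is pseudonymization: replacing a direct identifier with a salted, irreversible token so records can still be linked per individual without storing the raw value. Note that GDPR generally treats pseudonymized data as still personal; this is a control, not an exemption. The function name and salt are illustrative.

```python
import hashlib

def pseudonymize(identifier: str, salt: str) -> str:
    """Replace a direct identifier with a salted one-way token."""
    # Same identifier + salt always yields the same token, so records
    # stay linkable; without the salt, reversal is impractical.
    return hashlib.sha256((salt + identifier).encode()).hexdigest()[:16]
```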

Why is a data control framework important when developing LLMs?

A comprehensive data control framework is critical to enable the development of LLMs for several reasons:

  1. Model Performance and Reliability: By ensuring the use of high-quality, well-managed data, LLMs can achieve better performance and generate more accurate and reliable outputs.

  2. Trust with Users and Stakeholders: Demonstrating a commitment to managing and using data securely and ethically across its lifecycle builds trust among users, regulators, and the public.

  3. Regulatory Compliance: A robust framework helps organizations navigate the complex landscape of data protection and privacy regulations, avoiding legal penalties and reputational damage.

  4. Ethical AI Development: By incorporating ethical considerations into data use and management, organizations can ensure that their LLMs contribute positively to society and do not perpetuate biases or infringe on privacy rights.

Conclusion

As security practitioners, we have traditionally focused on certain elements of the data control framework I outlined above, namely data classification, data protection, and data retention. In some cases, our roles are starting to extend into data privacy. The Hitch Partners 2023 CISO Leadership Survey, for example, reports that 27% of surveyed CISOs at publicly traded companies are also responsible for data privacy.

However, to fully enable a business’ AI strategy, we need to bring a broader, more cross-functional data control toolset to the table and be able to effectively work with partner organizations to holistically manage data risk.

Only then will we find ourselves as true business enablers in the world of AI.

💼 5 Cool AI Security Jobs of the Week 💼

Generative AI Cyber Security Controls Lead @ Deloitte to help federal clients adopt AI securely | Rosslyn, VA | 5+ yrs exp.

AI Security Researcher @ Carnegie Mellon Software Engineering Institute to advance the state of the art in AI security at a national and global scale | Pittsburgh, PA or Arlington, VA | 5+ yrs exp.

Sr. Penetration Testing Engineer, AWS Gen AI Security @ AWS to protect customers by securing AI and AWS services at scale | Multiple Locations | $136k-$247k | 5+ yrs exp.

AI Red Team Program Manager II @ Microsoft to empower Microsoft’s AI red team to find and exploit security vulnerabilities | Remote | $94k-$182k | 1+ yr exp.

Sr. Product Security Engineer @ Character.AI to serve as a founding member of the product security team | Menlo Park | $150k-$350k | 5+ yrs exp.

If you enjoyed this newsletter and know someone else who might like Shadow AI, please share it!

Until next Thursday, humans.

-Andrew Heighington