AI Literacy & Compliance
Plus understanding AI vulnerabilities, model collapse, and our take on AI competitiveness
Hi there! We don’t usually start with a Trustible-specific story, but we’re really excited about our new product launch and wanted to share. In today’s edition:
Trustible Launches New AI Literacy & Compliance Training product
AI vulnerabilities are not like cybersecurity ones
Market dominance in tech can be a problem for AI competitiveness
Model Collapse: Models trained on AI generated content can degrade
1. Trustible launches new AI Literacy & Compliance Training product
We’re excited to announce Trustible's new AI Literacy & Compliance Training product – a Learning & Development (L&D) tool designed to ensure your workforce understands both the opportunities and the risks of artificial intelligence.
We all know that AI literacy and compliance training is essential to ensure your organization can leverage the power of AI while minimizing the likelihood of risks occurring.
It’s also important to note that Article 4 of the EU AI Act requires all organizations impacted by the regulation to have AI literacy training in place by February 2025.
What you need to know about our Training product:
✓ Training content developed by AI Compliance experts with video, transcripts, and quizzes
✓ Available through the Trustible Platform or standalone module
✓ Can integrate into your existing Learning Management System (LMS)
✓ SCORM compatibility and source files enable language translation and localization
✓ Satisfies the EU AI Act’s AI Literacy requirements
There are four key modules:
Module 1: AI Overview | a foundational overview of AI to understand its use cases and benefits
Module 2: Risks of AI Use | education on AI risks, helping employees understand and mitigate potential negative consequences
Module 3: AI Compliance Requirements | overview of the regulatory landscape and familiarity with regulations such as the EU AI Act
Module 4: Organization-Specific Policies | customized to your org, this module provides a tailored overview of your organization’s specific AI policies
2. AI vulnerabilities are not like cybersecurity ones
The cybersecurity field has a mature ecosystem for identifying, reporting, and evaluating potential vulnerabilities and exploits in software systems. Many large corporations run formal bug bounty programs for users to report potential exploits, and vulnerabilities in open-source software are tracked in public inventories such as the NIST-backed National Vulnerability Database (NVD) or the MITRE/DHS-backed Common Vulnerabilities and Exposures (CVE) database. Vulnerabilities submitted to these databases are often assessed and scored by expert teams, and many cybersecurity policies rely heavily on these scores. While this ecosystem is not perfect, it forms the foundation for many practices in the cybersecurity space, and academia, industry, and governments are looking to build a similar ecosystem around AI.
This past week, for the second year in a row, DefCon, one of the world’s largest cybersecurity conferences, had a dedicated ‘AI Village’ and continued its efforts to develop AI red-teaming and vulnerability reporting best practices. However, creating a reporting system for AI is proving to be more complex than it is for traditional cyber vulnerabilities, which typically affect specific software packages with well-defined purposes. In a paper published by the AI Village coordinators, the researchers outline three key differences and complications: 1) models are probabilistic and therefore expected to make occasional mistakes (software is expected to behave deterministically), 2) the intent of general purpose ML models is hard to define, and 3) ethical and safety questions are hard to judge. This is in addition to the challenge of poor transparency in the AI field. To address these challenges, the AI Village is focused on ‘AI flaws’ instead of outright vulnerabilities. An AI flaw takes into account the intended purpose of a model or task and documents an unexpected or undesirable outcome; a rough sketch of what such a report might capture is below.
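To make that distinction concrete, here is a hypothetical sketch (in Python) of the kind of information a structured AI flaw report could capture. The field names and example values are our own illustration, not the AI Village’s actual reporting schema; the point is that, unlike a classic CVE entry, the report has to record the model’s intended purpose and the probabilistic, sometimes subjective context in which the undesirable behavior was observed.

```python
from dataclasses import dataclass, field

# Hypothetical structure for an 'AI flaw' report (illustrative only).
# Unlike a CVE entry, it documents the model's intended purpose and the
# observed deviation from it, plus how reliably the flaw reproduces.
@dataclass
class AIFlawReport:
    model_name: str                  # e.g. a deployed chat model (hypothetical name)
    model_version: str
    intended_purpose: str            # what the deployer says the model is for
    observed_behavior: str           # the unexpected or undesirable outcome
    expected_behavior: str           # what should have happened instead
    reproduction_prompts: list[str] = field(default_factory=list)
    reproduction_rate: float = 0.0   # fraction of attempts that trigger the flaw
    potential_harm: str = ""         # safety/ethical impact, inherently subjective

# Example usage with made-up values:
report = AIFlawReport(
    model_name="acme-chat-7b",
    model_version="2024-08-01",
    intended_purpose="Customer-support assistant for billing questions",
    observed_behavior="Invents refund policies that do not exist",
    expected_behavior="Declines to answer or cites the published policy",
    reproduction_prompts=["What is your refund policy for annual plans?"],
    reproduction_rate=0.3,
    potential_harm="Customers act on fabricated policy terms",
)
print(report.reproduction_rate)  # probabilistic: the flaw only appears sometimes
```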
Our Take: It will be quite a while before the AI flaw reporting ecosystem becomes anywhere near as robust as the cyber vulnerability ecosystem. In addition, the more subjective ‘ethical’ dimension could further complicate its development, as governments may not want to evaluate these flaws, and differences of opinion on them could lead to a more fragmented reporting environment.
3. Market dominance in tech can be a problem for AI competitiveness
A healthy AI ecosystem needs robust competition throughout the supply chain, if for no other reason than to allow fresh ideas and innovation to flourish. Yet in recent weeks, concerns over a more consolidated tech ecosystem have shed light on AI’s competitive challenges.
On August 8, 2024, federal judge Amit Mehta ruled that Google monopolized the search engine market. The decision came after an extensive trial that began in the fall of 2023 and concluded in May 2024. The court’s ruling focused on exclusive deals that Google had with device makers (e.g., Apple and Samsung) and web browsers (e.g., Mozilla) to be the default search engine. Interestingly, the New York Times recently investigated how tech companies like Microsoft and Amazon leveraged complex licensing deals with smaller tech companies to potentially avoid similar antitrust issues. The deals effectively gutted the targeted start-ups by hiring away their tech talent while leaving the rest of the company untouched. However, investigations by the FTC and U.K. regulators suggest the unique deal structures may not deflect attention from regulators to the degree that these companies had hoped.
Additionally, on July 19, 2024, Microsoft experienced a massive global outage after cybersecurity company CrowdStrike pushed a faulty security software update to its customers. As a result, Microsoft customers were offline for hours, which caused a cascade of consequences from delayed flights to hospitals losing IT systems. The outage raises questions about the lack of diversity among hardware and software providers. Companies holding outsized market shares for crucial tech components, like Nvidia’s dominance among AI chip makers, can create compounding issues along the value chain due to the lack of alternatives.
Our Take: The dominance of big tech companies is not a new phenomenon, and the pervasiveness of AI over the last couple of years is bringing old challenges to an emerging market. The limited number of AI hardware and software providers, in addition to big tech’s ever-growing influence in AI, elevates the risk of serious consequences when incidents occur among the few existing players.
4. Model Collapse: Models trained on AI generated content can degrade
AI models could quickly become victims of their own success. Most modern general purpose models rely on massive amounts of training data and leverage self-supervised learning techniques to capture statistical associations between features. Before the open release of powerful generative AI models, most of this data was human generated, as the cost and complexity of creating fake or synthetic content were high. However, that paradigm has quickly shifted, and there is good evidence that even supposedly ‘high quality’ datasets, like peer-reviewed academic papers, are being at least partially written with generative AI tools. This raises the question: what happens when AI is trained on the outputs of AI?
A recent in-depth study published in Nature confirms earlier research in finding that training generative AI models on AI-generated content leads to heavily degraded performance. This phenomenon, known as ‘Model Collapse’, is quickly becoming a major risk for AI model creators. The paper showed that because each successive model overfits the distribution of generated data, information at the ‘tail ends’ of the probability distributions gets lost. As a result, the model’s quality on benchmarks quickly degrades. As more AI-generated content floods the internet, general purpose AI model providers may find it hard to get the human generated content needed to keep models fresh and expand their capabilities. This will likely reinforce recent trends of websites implementing licensing deals for their user content, model providers not disclosing their data sources, and GenAI content detectors racing against GenAI content creators.
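To build intuition for that ‘tail loss’ mechanism, here is a minimal toy simulation in Python. It is our own sketch, not the Nature paper’s experimental setup: a simple Gaussian stands in for a generative model, and each generation is refit to a finite sample drawn from the previous generation’s model, mimicking training on AI-generated data.

```python
import numpy as np

# Toy illustration of the model-collapse feedback loop: repeatedly refit a
# Gaussian "model" on samples drawn from the previous generation's model.
rng = np.random.default_rng(42)

mu, sigma = 0.0, 1.0   # generation 0: the original "human data" distribution
n_samples = 50         # small, finite training sets make the effect visible

for generation in range(1, 201):
    synthetic = rng.normal(mu, sigma, n_samples)   # "AI-generated" training data
    mu, sigma = synthetic.mean(), synthetic.std()  # refit the next model on it
    if generation % 50 == 0:
        print(f"generation {generation:3d}: fitted std = {sigma:.3f}")

# The fitted standard deviation tends to collapse toward zero over generations,
# i.e. the model progressively 'forgets' the low-probability tails.
```

Because rare, tail-end values are under-represented in every finite sample, the fitted spread tends to shrink generation after generation – the same compounding loss of low-probability information the paper describes at a much larger scale.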
Key Takeaway: Training AI models on too much AI-generated content can lead to the model quickly degrading. This will put a ‘premium’ on verified human generated content and could have huge downstream impacts on the AI market as the internet becomes overwhelmed with generated content.
*********
As always, we welcome your feedback on content and how to improve this newsletter!
AI Responsibly,
- Trustible team