The Challenges of Extraterritorial AI
Plus comparing California’s SB 1047 and the EU AI Act, Big 4 AI reports, and fine-tuning LLMs
Hi there! Today, we remember 9/11 in the United States – those fallen, their loved ones, and the countless service members.
In today’s edition:
The Challenges of Extraterritorial AI
Comparing California’s SB 1047 to the EU AI Act
Key Takeaways from PwC and Deloitte AI Reports
Fine-Tuning: Turning LLMs into Specialists
1. The Challenges of Extraterritorial AI
The modern international political world is composed primarily of nation-states: entities that have full jurisdiction over a specific geographic area and all the people who inhabit it. This system has worked well as long as most economic activity was also linked to the physical world, mainly in the form of people living or working in one specific area, and physical goods moving from one place to another. The introduction of the digital world, however, has created several challenges for traditional nation-state boundaries. Consider this statement: “An Indian employee of a US-based multinational corporation can scrape the data of an EU citizen, store a copy of it in a data center in Australia, train a model using that data on hardware located in Brazil, and then sell access to that model to a company located in the UK.” This highly plausible scenario crosses at least six distinct legal jurisdictions, each with its own legal traditions, privacy regulations, and upcoming AI guidelines.
Two headlines from last week show how difficult this problem may be, especially as Big Tech organizations gain capabilities and influence outside the traditional boundaries of nation-states. The first event was a showdown between Elon Musk and Brazil over X’s refusal to comply with Brazilian content regulation around misinformation. After months of escalation, X has now pulled out of Brazil completely, and Brazil has ordered internet service providers to block access to X entirely and set up fines for anyone seeking to evade that ban. Despite no longer having any physical presence in the country, Musk threatened to continue allowing access to X via SpaceX’s Starlink internet service, although he seems to have backed down from doing so for now.
The second event was a recent court ruling in the Netherlands against the controversial facial recognition company, Clearview AI. Clearview has not operated in the EU for some time, but its primary use case is now prohibited under the EU AI Act. Despite this, Dutch privacy authorities argue that because Clearview retains data on EU citizens, it remains subject to the EU General Data Protection Regulation (GDPR), making the company liable and resulting in significant fines. Clearview argues that, since it has no legal presence or operations in the EU, it is outside the jurisdiction of the Netherlands, and it has refused both to pay the fines and to delete the data of Dutch citizens in its possession. This case, and similar ones, will take years to resolve.
Many nation-states recognize the challenges of regulating the digital world. They have sought to establish international trade agreements that mutually recognize digital sovereignty and align their laws around similar values and liability regimes. The EU, US, and UK signed an AI treaty agreeing to certain fundamental principles for the technology, although implementation of the treaty’s requirements is left up to each signatory country.
Our Take: AI is likely to exacerbate the challenges of regulating the digital world. AI is inherently data hungry, creating huge incentives to gather as much data from as many sources as possible, regardless of legal jurisdiction. In addition, AI raises many complex ethical issues. Different countries could have very different views on what generated content is appropriate, but no one has a clear idea of how to enforce these rules internationally, especially if the AI models in question are open source.
2. Comparing California’s SB 1047 to the EU AI Act
As California’s legislative session concluded on August 31, 2024, state lawmakers finally sent the Safe and Secure Innovation for Frontier Artificial Intelligence Models Act (SB 1047) to Governor Gavin Newsom’s desk for his signature or veto. SB 1047 has been a lightning rod of controversy, drawing criticism from federal lawmakers like former Speaker of the House Nancy Pelosi and support from AI companies like Anthropic. The bill seeks to regulate large frontier models and impose liability on developers when “catastrophic harms” occur.
The goal of SB 1047 is to provide certain guardrails for large general-purpose AI (GPAI) models, not unlike what the EU AI Act requires for GPAI models. Below is a comparison of some key features of California’s SB 1047 and the EU AI Act:
Scope: The EU AI Act and SB 1047 both target large GPAI models. However, the EU AI Act distinguishes between GPAI models generally and those that present systemic risk, whereas SB 1047 is concerned with all large GPAI models.
Covered Models: The EU AI Act and SB 1047 both set computing thresholds as part of their definitions of covered GPAI models, though the specific compute thresholds differ slightly. SB 1047 also imposes a $100 million training cost threshold that has no counterpart in the EU AI Act, and it covers fine-tuned models that meet applicable compute and cost thresholds.
Harms and Risks: The EU AI Act covers a broad range of harms, including risks that negatively impact public health, safety, public security, fundamental rights, or society as a whole. Meanwhile, SB 1047 addresses a narrower category of “critical harms” caused by covered models, including the creation of weapons of mass destruction and events that result in mass casualties or $500 million in damage.
Key Obligations: Under the EU AI Act, all GPAI model providers must maintain technical documentation about their models and provide that information to downstream providers that will integrate those models into their AI systems. SB 1047 has a similar requirement, but its documentation obligations are focused on the GPAI model’s safety and security protocols. Additionally, SB 1047 requires third-party compliance audits for GPAI model providers, while the EU AI Act does not.
Kill Switch: A major difference between the two regulations is SB 1047’s inclusion of a “kill switch.” Under SB 1047, covered GPAI model developers must include a mechanism that allows for a full shutdown of the model, along with the conditions that would trigger such a shutdown. The EU AI Act does not have a comparable requirement.
Incident Reporting: Both pieces of legislation have incident reporting requirements. The EU AI Act requires providers of GPAI models that pose systemic risk to report “serious incidents” without “undue delay.” SB 1047 requires developers of all covered GPAI models to report “safety incidents” to the state’s Attorney General within 72 hours of discovering, or reasonably believing, that an incident occurred.
Enforcement: The EU AI Act and SB 1047 are enforced by regulators that can impose civil penalties for non-compliance. Neither allows for a private right of action.
Our Take: While there are some differences between the two regulations, both aim to rein in large GPAI models. Interestingly, the EU AI Act tries to distinguish risk levels among GPAI models, whereas SB 1047 appears to treat all large GPAI models as inherently risky. This tension is likely to cause compliance issues for organizations subject to both sets of rules, should SB 1047 be signed into law.
3. Key Takeaways from PwC and Deloitte AI Reports
Two recent surveys from PwC and Deloitte highlight the current state of responsible AI adoption and the challenges enterprises face in scaling Generative AI (GenAI). Both reports point to AI governance and risk management as a pathway toward AI adoption. Here are the key insights from the two reports:
Widespread GenAI Adoption: According to PwC, 35% of businesses have deployed GenAI in both employee and customer systems, while Deloitte finds that most enterprises have deployed less than a third of their GenAI experiments into production.
Top Drivers for Responsible AI Investment: Competitive differentiation is the primary reason companies invest in Responsible AI, followed by risk management, compliance requirements, building trust, and value creation.
Challenges to Scaling AI: Both reports highlight key barriers such as data governance, risk management, regulatory compliance, and a lack of governance models. Only 23% of companies in Deloitte’s survey feel prepared to manage GenAI risks.
Main Benefits of Responsible AI: The #1 benefit achieved from investing in Responsible AI was enhanced customer experience, followed by enhanced cybersecurity & risk management, facilitated innovation, and cost reduction.
Obstacles to RAI Implementation: Deloitte emphasizes challenges around compliance with regulations like the EU AI Act. PwC notes that only 11% of executives report full RAI implementation, with challenges including difficulty quantifying the value of having “dodged a bullet,” such as avoiding a major scandal from a poor AI interaction.
Risk Mitigation and Data Management: Some of the top actions to manage GenAI risks include establishing governance frameworks (51%), ensuring regulatory compliance (49%), and training employees on risk mitigation (37%). With 78% of leaders calling for more government AI regulation, many organizations are using “walled garden” environments that keep models from training on their data.
Our Take: Both reports stress that responsible AI governance is not a hurdle but a catalyst for innovation. By addressing risks and compliance challenges, organizations actually get ahead of issues before they happen, which allows them to confidently scale AI adoption while maintaining trust and aligning with regulatory frameworks. Responsible AI = Better AI.
4. Fine-Tuning: Turning LLMs into Specialists
General-purpose LLMs are effective generalists but often underperform on specialized tasks. For example, a company may want a chatbot to speak in a particular manner, or a hospital tool may need specific knowledge of SNOMED (a medical taxonomy). Simple customization can be achieved using few-shot learning and retrieval-augmented generation (RAG), but these approaches only provide additional context in each prompt and do not alter the underlying model. In contrast, fine-tuning is the process of further training the model to give it new behaviors and new knowledge by showing it new prompts and desired outputs. This process alters some or all of the model weights and produces a distinct, new model. The training can be implemented programmatically for open-weight models or through APIs for certain closed-weight models.
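To make the open-weight path concrete, below is a minimal sketch of supervised fine-tuning using the Hugging Face Transformers Trainer. The base model (“gpt2”), the dataset file (specialist_examples.jsonl), and the hyperparameters are illustrative placeholders, not recommendations.

```python
# Minimal supervised fine-tuning sketch (Hugging Face Transformers).
# Model name, dataset file, and hyperparameters are illustrative placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "gpt2"  # stand-in for any open-weight base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical JSONL file with one {"text": "<prompt + desired output>"} per line
dataset = load_dataset("json", data_files="specialist_examples.jsonl")["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="finetuned-specialist",
        num_train_epochs=3,
        per_device_train_batch_size=4,
        learning_rate=2e-5,
    ),
    train_dataset=tokenized,
    # Causal-LM objective: inputs double as labels (mlm=False)
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("finetuned-specialist")  # the distinct, new model
```

Closed-weight providers offer a comparable workflow through hosted fine-tuning APIs, where similar prompt/output pairs are uploaded rather than trained on locally.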
Strengths:
Style and Format Customization: Fine-tuning can be used to train the model to respond in a specific style, with deeper customization than a system prompt or few-shot learning can provide. This approach works well when a large number of existing example interactions is available.
Specialized Knowledge: Fine-tuning can be used to train the model to recognize the terminology of a niche domain. A fine-tuned model may still be combined with a RAG system to interpret retrieved information more effectively.
Challenges:
Time and Cost: Fine-tuning hundred-billion-parameter models can be slow and expensive for organizations, especially since the process often involves experimentation and iteration. The PEFT (parameter-efficient fine-tuning) library from Hugging Face can be used to reduce costs (see the sketch after this list), but it does not work in all scenarios.
Catastrophic Forgetting: During fine-tuning, the model learns to produce the correct output for the specific prompts in the fine-tuning dataset (these datasets can range from hundreds to hundreds of thousands of examples). By the end, the model can “unlearn” some of its general knowledge. More dangerously, fine-tuning has been shown to remove safety guardrails built into models.
Unclear Regulatory Standing: Multiple proposed and implemented laws target the developers of general-purpose AI. Because fine-tuning performs additional training on an existing model, it is less clear whether the resulting models count as “GPAI” for the purpose of their creators’ legal obligations. So far, CA SB 1047 provides explicit guidance for fine-tuning, while the EU AI Act addresses modifications more broadly.
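Related to the Time and Cost point above, here is a minimal parameter-efficient sketch using LoRA via the Hugging Face peft library. The base model and hyperparameters are again illustrative placeholders; only the small adapter matrices are trained while the original weights stay frozen.

```python
# Minimal LoRA (parameter-efficient fine-tuning) sketch with Hugging Face PEFT.
# Base model and hyperparameters are illustrative placeholders.
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("gpt2")

# LoRA injects small trainable low-rank matrices into the attention layers;
# the original weights stay frozen, cutting memory and compute requirements.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,               # rank of the adapter matrices
    lora_alpha=16,
    lora_dropout=0.05,
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all weights

# `model` can be passed to the same Trainer loop sketched earlier;
# only the adapter weights are updated and saved.
```

Because only the adapters are trained, this approach typically reduces cost and hardware requirements, though, as noted above, it does not fit every fine-tuning scenario.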
Key Take-away: Training models from scratch is resource intensive and has a significant environmental impact; fine-tuning allows organizations to specialize state-of-the-art (SotA) base models for a fraction of the cost. However, for the time being, the process requires careful experimentation, and its legal implications will vary by jurisdiction.
*********
As always, we welcome your feedback on content and how to improve this newsletter!
AI Responsibly,
- Trustible team