The Real AI Policymakers
Plus, Emotional AI, backlash against AI in Terms of Services, and Perplexity’s web scraping practices
Hi there! Happy Fourth of July to our readers in the US. In today’s edition:
The Real AI Policymakers
What you need to know about drafting your organizational AI policies
The Perils of Web Scraping for Data
AI changes to Terms of Service get new scrutiny
AI is getting emotional
In case you missed it: Trustible was recently selected for Google for Startups funding! Read more about the announcement here.
1. The Real AI Policymakers
Though AI legislation gets significant attention, daily decisions will be handled by other governmental agencies rather than lawmakers. These delegated bodies will draft standards, set criteria for ‘frontier/systemic risk’ models, and potentially make enforcement decisions. The entities selected for these tasks, and the people inside those entities, will determine a lot on what AI regulation looks like. Here is who is likely to be really be in charge of AI regulation:
Dedicated AI Offices / Agencies
Both the EU AI Act and the proposed California Frontier Model Legislation (SB 1047), created dedicated AI focused agencies. These agencies will be given some very direct powers to designate certain models as ‘systemic/critical risk’ and therefore require mandatory evaluations and reporting.
Trade-offs: This approach may help attract appropriate AI expertise and talent and ensure that regulators are able to stay abreast of the technological developments in AI. However, others worry about creating more bureaucracy and impeding innovation too much.
Privacy Boards
Many privacy regulations created dedicated privacy boards to oversee the implementation and enforcement of data privacy laws. Tools like ChatGPT have already run into issues with GDPR, and the California Privacy Board has proposed several regulations for automated decision making systems.
Trade-offs: Privacy boards will undoubtedly play a role in regulating AI systems that extensively use personal information. However, we’re unsure if they will (or should) oversee all other types of AI. Under the AI Act, each EU member state must designate a government body as the AI regulator, and it is yet to be determined whether privacy regulators will be chosen or if new bodies will be created.
Existing Agencies / Antitrust Regulators
Many government agencies may already have purview over use of AI in certain sectors. For example, the US insurance regulators have started passing rules to regulate specific AI uses within insurance. Meanwhile, the US Federal Trade Commissions (FTC) has been vocal and aggressive in looking into potential anti-competitive and anti-consumer aspects of the current AI race.
Trade-offs: A key question for regulators is whether domain expertise or AI technological expertise is most important for regulation. The answer may vary on a case-by-case basis. It’s also unclear how much existing agencies can claim the right to regulate AI absent of clear legislative mandates.
Judges
There is no shortage of active litigation against leading AI companies. Many key AI legal issues such as AI product liability, use of copyrighted data, and defamation with deep fakes may likely be decided by courts around the world. Judges in various jurisdictions may decide very differently on these issues, and even with legislative standards, creating precedents around these systems may take decades.
Trade-offs: Judges can vary greatly in their own technical backgrounds and understanding of systems, and often do not have any direct accountability or responsibility for either protecting consumers, or promoting innovation, despite their ruling and precedents potentially shifting entire markets.
Attorneys General
Laws such as CO SB 205 deliberately gave all enforcement responsibilities to the Colorado Attorney General’s office. That means no private citizen of Colorado may file a suit for a company breaching the recent AI law (much to the chagrin of civil rights advocates). While the AG may create a dedicated department, there’s no clear playbook or standard for how to approach this enforcement.
Trade-offs: AG offices and existing enforcement groups are often under-resourced, and have to strategically choose enforcement action, knowing that they cannot prosecute all potential violators. Whether these offices will be able to attract appropriate technical expertise is also an open question.
Key Takeaway: Legislation is just the beginning of AI regulation. The agencies/bodies put in charge of interpreting and enforcing AI-relevant laws may have a larger impact on AI regulation than the core text of the law itself, and there’s no global consensus on who is the ‘right’ set of people to appoint, and what tradeoffs that comes with it.
One thing to watch in the US is the latest Supreme Court decision limiting Executive Branch power. For this reason, keep expecting state and local governments to play an increasing role regulating AI, as Colorado, New York, California, and Illinois have done.
2. What you need to know about drafting your organizational AI policies
An organization’s ‘AI Policy’ is an overloaded term that can mean many things to different people. In our discussion with organizations, we find it typically means one of 3 different kinds of policies:
A Comprehensive AI Policy: Covers AI roles, responsibilities, development processes, and governance to comply with standards and regulations like ISO 42001 and the EU AI Act.
An AI Use Policy: Defines permissible AI tools and use cases within an organization to mitigate risks.
Public Statement of AI Principles: Outlines ethical AI principles, including data treatment and human oversight, to build public and customer trust.
Follow our blog where we provide guidelines, key questions, and considerations for drafting your Comprehensive AI Policies, your AI Use Policies, and your Public AI Principles (coming soon!).
3. The Perils of Web Scraping for Data
A recent series of stories threw Perplexity AI into hot water on two fronts: first, due to their plagiarism of a Forbes article, and second due to their practice of ignoring the robot.txt files (aka the Robot Exclusion Protocol), which dictate what websites can be scraped by bots. The two factors are related but separate concerns: copyrighted content may exist on any website and is subject to certain legal protections. Respecting robot.txt files is a good-faith protocol, and websites without copyrighted content can still request that their data not be scraped.
Using scraped web data to train GenAI models is a standard practice. Popular open datasets, like RedPajama and DOLMA are trained on web-scraped data that was collected by respecting the robot.txt; however, neither attempts to filter copyrighted content citing a difficulty of doing so at scale and open questions of what qualifies as fair-use in this context. Details of how closed-source models tackled both issues are not publicly available. In addition, some of these providers (e.g. ChatGPT and Gemini) can now use web searches to provide up-to-date results to a user prompt. Both services provide links to the web pages used to produce the output, so they would likely not qualify for a copyright violation, but content creators may issue with reduced traffic to their websites or bypassed paywalls.
Some content developers have pushed back through lawsuits or by pushing cloud providers to take down offending scrapers (e.g. AWS and Perplexity). Others are licensing their content to developers. Additionally, web-scraped content poses risks such as Indirect Prompt Injection attacks, where harmful prompts embedded in third-party sources can lead to unintended and harmful AI responses. While model providers are developing defenses, these attacks can have wide ranging consequences.
Our Takeaway: Using scraped web-data is the status quo for developing Generative AI models, but building legal, ethical and security concerns highlight a growing need for better practices. Content producers previously had an incentive to allow search engines to scrape their content, as search links led to website traffic. It’s unclear if this ‘symbiotic’ relationship will continue into the generative AI age, and what effects that may have on the internet.
4. AI changes to Terms of Service get new scrutiny
Terms of service underpin the relationship between a user and a company for which they seek a product or service. Yet, it is not uncommon for a user to scroll quickly through a company’s terms of service, click ‘accept,’ and promptly forget them. The problem stems from the fact that terms of service are typically dense legal documents that would take time to thoroughly read and digest. However, AI’s proliferation may upend how users view these agreements. As organizations begin developing and deploying new AI systems, they are updating their terms of service to reflect this new reality; often at the expense and ire of their customers.
Tech companies, like Google and Meta, were recently under the microscope for changes they made to their terms of service. While some of the changes were tweaking just a few words, the implications were far-reaching. For instance, Google updated its terms of service to stipulate that publicly available data would be used to train its AI models. A few weeks ago, Adobe came under fire for a change in their terms of service that appeared to allow the company to train its AI on user created works. The backlash prompted Adobe to explicitly clarify that it was not in fact using user content to train its model.
Regulators have taken notice and sought to clarify what organizations should be disclosing to users and prospective customers. The Federal Trade Commission (FTC) issued a warning in February 2024 that suggested changes to an organization's policy to retroactively use data to train their AI systems may be “unfair and deceptive.” The FTC’s Bureau of Consumer Protection Director recently reiterated that the agency is leveraging existing authorities to protect consumers from harms that may stem from these policy changes.
Our Take: It is common for organizations to update their terms of service or other public facing policies to align with technological advances. However, wholesale policy changes run the risk of angering customers and inviting regulator scrutiny. It is important to be clear about how these changes may impact an organization’s users while also making sure they fall within the bounds of the law.
5. AI is getting emotional
Using AI/ML for emotional analysis is among the most contentious use cases for AI. Its proponents believe understanding human emotions can do things like help create suicide detection systems, personalize digital experiences based on frustrated, and help ensure analyze video/picture data better. Emotional recognition opponents focus on the pseudo-science these systems are often based on, big privacy issues, and the huge amount of biases that can affect these systems – ranging from discrimination against certain demographics (such as neurodivergent people), to simply weaponizing this information to deliberately manipulate people.
The EU AI Act, as well as other regulations, recognize these huge risks, and strictly limit when and how these kinds of biometric data analysis tools can be used, often banning them except for certain law enforcement or safety purposes. In a similar vein, some specific uses have previously been regulated such as the Illinois law limiting how and when AI can be used to analyze job application videos. Despite these regulatory hurdles, research and investment in this space has continued.
Recently, a group of Finnish researchers released their research paper that taught an AI model how to recognize, and react to its user’s emotions. Several startups in recent years, such as Italy’s MorphCast, have also emerged trying to tackle this problem and find ethical ways to deploy the technology.
Our Take: Recognizing and understanding human emotions isn’t a straightforward problem (despite what Inside Out 2 might tell us!). The space is rife with ethical issues, and some heavy preemptive regulations. As a result of both of these hurdles, despite plenty of press and media attention, it’s unlikely these kinds of emotional recognition systems will get widespread adoption or use anytime soon, if for no other reason than getting access to capital to invest in it will be extremely difficult.
*********
As always, we welcome your feedback on content and how to improve this newsletter!
AI Responsibly,
- Trustible team