The Future of Jobs in the AI Era and the Risk of AI Over-Reliance
Plus an overview of AI risk management strategies, why AI benchmarking needs an upleveling, and a global policy and regulatory roundup
Hi there, Happy Wednesday, and welcome to the latest edition of the Trustible Newsletter! Last week, the Trustible team was out in force at the Responsible AI Summit North America in Reston, Virginia, connecting with AI governance practitioners from leading enterprises around the world who are scaling their efforts. (And this week, we’re trying not to melt every time we step outside our HQ in Rosslyn, Virginia.)
In today’s edition (5-6 minute read):
The Future of Jobs in the AI Age
How to Respond to AI Risks
The Plague of Test Data Contamination (AI Cheating)
The Growing Problem of Over-Relying on AI
AI Policy & Regulatory Roundup
The Future of Jobs in the AI Age
The jury is still out on how much AI will impact the workforce. Plenty of leaders, including those at AWS and JPMorgan, have started laying the groundwork for shrinking white-collar staff as a result of AI. Some groups argue that this is yet another ‘fear’ tactic aimed at encouraging teams to learn AI skills, driving down salary costs, and increasing productivity. Policymakers are also acutely aware of potential negative employment impacts and have sought to push back, for example by requiring employers to disclose when AI contributes to layoffs. Some leaders argue that AI will not initially replace workers directly; rather, a human who is highly skilled in using AI tools, and therefore more productive, will. Others argue that economists are underestimating both the impact AI may have on jobs and the workforce, and the pace at which it could happen. New college graduates appear to be the first affected: AI cannot yet replace expertise and experience, but it can outperform entry-level knowledge workers on some tasks. This raises the question of how we will keep training the experts of the future without those early experiences, especially if using AI itself may reduce critical thinking.
However, it’s not all doom and gloom. The counter-argument is that humans will adapt their skills and preferences in the AI era, and that new jobs will emerge as a result. A recent New York Times Magazine article dug deep into 22 new roles that could be created because of AI. The article groups the 22 roles into three categories: Trust (those involved in overseeing, governing, and auditing AI systems), Integration (people doing the ‘last mile’ work of connecting data and systems to AI), and Taste (creative roles specifically aimed at doing what AI is not capable of). It’s unclear how many of these roles there will be, and many still require a high skill level and, ironically, a deep understanding of how AI systems work and where they fall short.
Key Takeaway: A big question for corporate leaders and policymakers is whether these impacts will be fairly sudden or gradual, how fast people will be able to retrain (ironically, using AI) for new economic opportunities, and whether ‘AI replacement’ will become the next big political issue in the US, much like the globalization of production has in recent years.
How to respond to AI risks
AI risk management is a key (perhaps the key) part of an AI governance strategy. Identifying and managing risks can be an incredibly difficult task for many enterprises given the fast pace of change in AI systems and the evolving landscape of AI regulations (something we help with at Trustible). Moreover, you may not be able to control every risk yourself, since so many use cases rely on third-party platforms.
There are four types of Enterprise Risk Management responses that every AI governance professional should know about (illustrated in the short sketch after this list):
Avoid – Sometimes the safest move is to say “no.” If an AI system can’t meet your privacy or cybersecurity standards, don’t use it. That said, saying “no” may not always be an option: AI is incorporated into more SaaS products every day, and IT and risk professionals are becoming increasingly aware of the “Shadow AI” problem swiftly emerging across enterprise tech stacks.
Mitigate – Many risks can be reduced. Steps like removing personal data, checking for bias, adding human-in-the-loop review, and monitoring the system can bring risk down to an acceptable level. Not all mitigations are technical, though; human measures, such as AI literacy training, are an effective part of a comprehensive mitigation approach.
Accept – After careful review, some leftover risk may be worth taking. Conduct the right risk assessment, document the decision, get the right approvals, and move forward with implementation.
Transfer – When risk remains high, shift it elsewhere. In the AI value chain, risk is transferred downstream across the groups building a model, those hosting the infrastructure that supports it, the developers customizing it, and the end users consuming its outputs. From a third-party risk management perspective, contracts that require vendors to cover certain losses, or specialized insurance policies, can carry part of the load.
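To make these four strategies concrete, here is a minimal sketch, in Python, of how they might be recorded in a lightweight risk register. The class names, fields, and example entries are illustrative assumptions on our part, not a reference to Trustible’s platform or any particular framework.

```python
# Minimal sketch of a risk register built around the four response strategies.
# All names and example entries are illustrative assumptions.
from dataclasses import dataclass, field
from enum import Enum


class Response(Enum):
    AVOID = "avoid"        # decline the use case entirely
    MITIGATE = "mitigate"  # reduce likelihood/severity with controls
    ACCEPT = "accept"      # document and approve the residual risk
    TRANSFER = "transfer"  # shift exposure via contracts or insurance


@dataclass
class RiskEntry:
    use_case: str
    risk: str
    response: Response
    controls: list[str] = field(default_factory=list)  # mitigations or contract terms
    approver: str | None = None                        # required when risk is accepted


register = [
    RiskEntry("Resume screening", "Bias in candidate ranking",
              Response.MITIGATE, ["bias testing", "human-in-the-loop review"]),
    RiskEntry("Public chatbot", "Leakage of personal data", Response.AVOID),
    RiskEntry("Vendor copilot", "Defective outputs cause losses",
              Response.TRANSFER, ["indemnification clause", "AI insurance rider"]),
    RiskEntry("Meeting summaries", "Minor transcription errors",
              Response.ACCEPT, approver="CISO"),
]

# Simple governance check: every accepted risk needs a named approver.
missing = [r.use_case for r in register
           if r.response is Response.ACCEPT and not r.approver]
print(f"Accepted risks missing sign-off: {missing}")
```

Even a small register like this makes the chosen response, the controls applied, and the accountable approver explicit and auditable.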
Our take: Too many organizations simply avoid AI risks altogether, which slows or even halts AI adoption. These four categories of risk response strategies can help you build confidence in your organization’s ability to manage the risk and move forward with your AI innovation strategy.
The Plague of Test Data Contamination
While students have increasingly been accused of cheating with LLMs, models are doing something similar through test data contamination. A recent study shows that models that excel at SWE-Bench, a popular software engineering benchmark, may be doing so through memorization rather than a genuine understanding of coding tasks. The memorization occurs due to test data contamination, wherein the answers to benchmark questions are present in the training data. This invalidates the results, because the goal of a benchmark is to measure how a model performs on unseen data. The phenomenon is not new: the MMLU benchmark has largely been deprecated due to major contamination issues. Unlike traditional machine learning, where training and test datasets consist of datapoints that can be easily enumerated, LLMs are trained on enormous datasets that are harder to filter. If benchmark data is public, it may inadvertently be included in a web scrape (especially if a copy is published on an unofficial website).
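One common screening technique in contamination studies is to flag benchmark items that share long verbatim n-grams with the training corpus. Below is a minimal sketch of that idea, assuming small in-memory corpora; real training sets require scalable indexing, and n-gram overlap only catches near-verbatim leakage, not paraphrases. The function names and toy data are illustrative assumptions.

```python
# Minimal sketch of n-gram overlap screening for benchmark contamination.
# Assumes small in-memory corpora; production pipelines need scalable indexing.

def ngrams(text: str, n: int = 13) -> set[tuple[str, ...]]:
    """Return the set of word-level n-grams in a lowercased text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contamination_rate(benchmark_items: list[str],
                       training_docs: list[str],
                       n: int = 13) -> float:
    """Fraction of benchmark items sharing at least one n-gram with training data."""
    train_grams: set[tuple[str, ...]] = set()
    for doc in training_docs:
        train_grams |= ngrams(doc, n)
    flagged = sum(1 for item in benchmark_items if ngrams(item, n) & train_grams)
    return flagged / len(benchmark_items) if benchmark_items else 0.0


# Toy usage: one benchmark item appears verbatim inside a scraped web page.
train = ["a web scrape that happens to include this exact benchmark question verbatim"]
bench = ["this exact benchmark question verbatim", "a genuinely novel coding task"]
print(f"Contaminated fraction: {contamination_rate(bench, train, n=5):.0%}")  # 50%
```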
Model providers use benchmarks to demonstrate progress on increasingly difficult tasks; however, contamination issues call into question the construct validity of these assessments. Novel tasks, like those in Apple’s recent reasoning paper, raise questions about the actual abilities of LLMs. Several recent benchmarks have attempted to protect against contamination, including LiveCodeBenchPro, which is regularly updated with novel coding questions, and MLCommons’ AILuminate, a closed benchmark that its creator runs against the models’ APIs. For such benchmarks, it is crucial that the creators are impartial parties (unlike FrontierMath, which was partially funded by OpenAI), and that each model is only evaluated once (to avoid gaming the results). In addition to reducing contamination, such benchmarks can improve reproducibility, as LLM performance is sensitive to prompt formatting and parameters. Closed benchmarks have limitations too, like reduced visibility into potential biases and bugs in their data.
Key Takeaway: Many popular LLM benchmarks are likely overestimating model performance due to test data contamination. Some groups are starting to implement semi-private benchmarks to combat this concern, but the industry still has room for improvement as broader criteria for better benchmarks are developed.
The Growing Problem of Over-Relying on AI
When we think about AI risks, the primary focus tends to be on bias or discrimination in AI systems. Those risks cause the most impact when they are realized in high-risk use cases (e.g., creditworthiness or employment-related decisions). Yet the vast majority of use cases are lower risk. When we assess risks along the dimensions of likelihood and severity, we may overlook one of the more likely risks arising within lower-risk use cases, and one whose severity is still evolving: overreliance on AI.
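As a rough illustration of that likelihood-and-severity framing, here is a minimal sketch of a scoring grid; the 1–5 scales and the example scores are illustrative assumptions, not Trustible’s taxonomy.

```python
# Minimal sketch of likelihood x severity risk scoring (1-5 scales assumed).
# Overreliance tends to rank high because it is so likely, even when the
# severity of any single use case looks moderate.
risks = {
    "Biased creditworthiness decision": (2, 5),  # (likelihood, severity)
    "Hallucinated citation in a legal memo": (3, 4),
    "Overreliance on AI outputs": (5, 3),
}

for name, (likelihood, severity) in sorted(
        risks.items(), key=lambda kv: kv[1][0] * kv[1][1], reverse=True):
    print(f"{name}: risk score {likelihood * severity}")
```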
Trustible’s risk taxonomy explains that overreliance on AI “occurs when users start to accept incorrect AI system outputs due to excessive trust in the system.” AI’s widespread use, especially of generative AI systems, increases the likelihood that this risk will occur. For instance, attorneys are increasingly relying on AI to help write legal filings and falling prey to the AI’s confident but fabricated legal citations. While the system’s hallucinations are the underlying risk, it is the attorney’s unquestioning trust in the content that turns it into harm. The education system is also seeing major upheavals as more students rely on generative AI to complete assignments, which has led to an increase in cheating and plagiarism. Higher education professors have also relied on generative AI for their lessons, which has raised questions about the value of college classes.
The severity of harm from overreliance on AI is the other dimension to consider, though one that is still coming into focus. A recent MIT study indicates that overreliance on AI systems has negative cognitive impacts, such as reduced memory and executive function. This may have an outsized impact on students who use generative AI to complete schoolwork, but it can also have negative consequences for adults who over-rely on the technology for tasks that require a certain level of cognitive ability.
Our Take: Overreliance on AI is a far-reaching risk, but a preventable one. Implementing AI literacy programs can help users learn to leverage the technology without blindly accepting its outputs or sidelining their own expertise and judgment. Moreover, having an explanation accompany system outputs can help users identify source material and determine its validity and applicability.
AI Policy & Regulatory Roundup
Here is our quick synopsis of the major AI policy developments:
U.S. Federal Government. The federal AI moratorium took a big step toward becoming law when the Senate Parliamentarian determined it did not violate Senate procedure and could remain in the Republicans’ reconciliation bill. The moratorium survived after Senate Republicans amended the language to condition states’ access to certain broadband funds on temporarily pausing their AI laws. The language could still be stricken from the bill via floor amendment, and it faces opposition from House Freedom Caucus members.
U.S. States. AI-related policy developments at the state level include:
New York. The state legislature passed the Responsible AI Safety and Education (RAISE) Act, which is a modified version of California’s SB 1047. Notable differences from SB 1047 include the absence of a “kill switch” requirement for frontier models and caps on penalties. Legislators have not yet sent the bill to Governor Kathy Hochul, but can do so at any time in 2025. Once it arrives on the Governor’s desk, she will have 30 days to act on it.
Texas. On June 22, 2025, Governor Greg Abbott signed the Texas Responsible AI Governance Act into law. It will take effect on January 1, 2026.
European Union. Sweden’s Prime Minister, Ulf Kristersson, is the latest voice, and the first EU government leader, to support pausing the EU AI Act. His comments come as the EU considers a digital omnibus package that aims to simplify regulations and could include amendments to the AI Act.
Latin America. A group of 12 Latin American countries, led by Chile, is working to develop an LLM focused on the cultural and linguistic diversity of the region. Latam-GPT is an open-source project intended to help boost AI accessibility across Latin American countries.
United Nations (UN). The United Nations Working Group on Business and Human Rights released a report warning that AI systems must be developed to align with the UN Guiding Principles on Business and Human Rights. The report was presented as part of the 59th session of the Human Rights Council.
—
As always, we welcome your feedback on content! Have suggestions? Drop us a line at newsletter@trustible.ai.
AI Responsibly,
- Trustible Team