--- title-long: "The Science of AI: Understanding and Examining Systems" title-short: Understanding AI Systems document-id: understanding tags: - research - openness - evaluation - audits - journalism # abstract in text format abstract: > Governance of AI systems requires sufficient understanding of the technology, including of its strenghts, risks, and weaknesses, and the trade-offs between different interests inherent in its development choices. This understanding depends on a sufficient open research ecosystem to support broad public awareness and robust auditing practices. # introduction and sections in HTML format introduction: >

As AI becomes ever more ubiquitous, a growing number of builders and affected stakeholders need to understand how it works, what it can and cannot do, what trade-offs are involved in developing the technology, and how it can be leveraged or improved in particular contexts. This requires sufficient visibility into AI systems and a thriving research ecosystem that includes perspectives beyond those of developers at the best-resourced companies.

Making informed decisions about AI systems requires understanding how the technology works, what development choices are available to meet a given goal, and how those choices trade off different priorities. Approaching AI as a science means upholding scientific integrity: ensuring reproducibility and verifiability, and broadening the range of people who can use the technology and contribute to its scientific development.

sections:
- section-title: Open Research, Transparency, and Replicability
  section-text: >

In order to properly use and govern AI systems, we need answers to a range of questions: How does a given system work, and what can it and can it not do? What choices were made in its development, and what trade-offs do they entail? How does it perform in the specific contexts where it is deployed, and who is affected by its use?

All of these questions are the subject of ongoing research. In order to be reliable, this research needs to uphold basic scientific values, including replicability. Given the importance of framing in research, and the potential tensions between the interests of developers and those of other stakeholder groups, external stakeholders should also have sufficient access to AI systems to enable independent research that is both multidisciplinary and representative of all groups affected by the technology.
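
Replicability starts with reporting the full configuration of an experiment. As a minimal sketch of what such a record might contain (all field names and values below are hypothetical illustrations, not a standard format):

```python
# Hypothetical record of a single evaluation run; publishing something like
# this alongside results lets others attempt to reproduce the same setup.
import json
from dataclasses import dataclass, asdict

@dataclass
class EvalRecord:
    model_id: str         # which model was evaluated
    model_revision: str   # exact version (e.g., a commit hash), not just a name
    benchmark: str        # benchmark name and version
    prompt_template: str  # full prompt wrapper applied to each item
    seed: int             # random seed used for sampling/decoding
    score: float          # resulting metric

record = EvalRecord(
    model_id="example-org/example-model",  # hypothetical model
    model_revision="7c4f2a1",              # hypothetical commit hash
    benchmark="example-benchmark-v1.2",    # hypothetical benchmark
    prompt_template="Question: {item}\nAnswer:",
    seed=42,
    score=0.731,
)

print(json.dumps(asdict(record), indent=2))
```

Without this level of detail, even well-resourced external researchers cannot verify a reported result, let alone the affected groups with the least access.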

- section-title: Science and Pitfalls of AI Evaluation
  section-text: >

The science of evaluating AI systems in particular remains very much underdeveloped. Systems served through APIs are routinely evaluated with unspecified software additions, and sometimes without an exact version tag, making results difficult or impossible to reproduce. The phenomenon known as "benchmark contamination", where evaluation data leaks into a model's training corpus, is endemic to modern systems, and impossible to quantify without transparency on training datasets.
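
When the training data is accessible, even a simple overlap check can surface likely contamination. The sketch below is a deliberately simplified illustration with toy data (real contamination studies use longer n-grams, deduplication, and fuzzy matching at scale); it flags benchmark items whose word n-grams also appear in training documents:

```python
import re

def ngrams(text: str, n: int = 8) -> set:
    """Lowercased word n-grams of a text, ignoring punctuation."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def contaminated_items(benchmark: list, corpus: list, n: int = 8) -> list:
    """Indices of benchmark items sharing at least one n-gram with the corpus."""
    corpus_ngrams = set()
    for document in corpus:
        corpus_ngrams |= ngrams(document, n)
    return [i for i, item in enumerate(benchmark) if ngrams(item, n) & corpus_ngrams]

# Toy data standing in for a real benchmark and training corpus.
benchmark = ["The mitochondria is the powerhouse of the cell."]
corpus = ["As every textbook says, the mitochondria is the powerhouse of the cell."]
print(contaminated_items(benchmark, corpus))  # [0]: item 0 overlaps the training data
```

Without access to the training corpus, even this basic check is impossible, which is why transparency on training datasets is a precondition for trustworthy evaluation.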

More generally, not everything that should be understood about an AI system can be measured with a quantitative automatic metric at the system level. For example, it has been argued that safety is not a model property but depends on the context of deployment. Evaluating systems in context, and assessing development practices such as data collection and its impact on data subjects, requires taking this broader view.

- section-title: Audits and Investigative Journalism
  section-text: >

System audits and investigative journalism are necessary functions for governance. Neither can reliably fulfill its purpose without a sufficient basic understanding of the technology: auditors need to know at least what questions to ask, and journalists which aspects of a system to examine in more detail.

resources:
- resource-name: HF Open LLM Leaderboard
  resource-url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard
- resource-name: 'EleutherAI: Going Beyond "Open Science" to "Science in the Open"'
  resource-url: https://arxiv.org/abs/2210.06413
- resource-name: 'AI auditing: The Broken Bus on the Road to AI Accountability'
  resource-url: https://ieeexplore.ieee.org/abstract/document/10516659
- resource-name: Google Doc Topic Card
  resource-url: https://docs.google.com/document/d/1D2KA3CKcuKOc9mOMRKjucYrYBREx30z9xuKUGSgQUEE/
contributions: >

Yacine Jernite wrote this topic card.