I am reading "The Alignment Problem" by Brian Christian, which bears the subtitle "How Can Artificial Intelligence Learn Human Values?". I am paraphrasing slightly in stating what the alignment problem is. The book describes it as the name of a "central and urgent question in computer science: the question of how to ensure that algorithmic models capture our norms and values, understand and do what we want, and in doing so avoid catastrophe" (page 13). This post is intended to be my reading notes. I use this blog to try to understand AI, so this post will continue to change and may not always be intelligible. Forgive me for that; it's how I'm processing this information.
I am finding this book very interesting and readable on a topic of utmost importance. If you are concerned about AI's influence and place in our culture and society, this is an excellent read. I also like the structure and layout of the book.
"The Alignment Problem" is scholarly and researched in it's use of notes, bibliography and an index. But don't let that scare you. This book is understandable to the layperson.
The following sections are my reading notes. They will continue to change as I read and digest the material.
This book is structured in three main parts:
- 1. Prophecy, 2. Agency, and 3. Normativity.
- Prophecy has three subsections: Representation, Fairness, and Transparency. This part discusses present-day systems (as of the book's 2020 publication) that are at odds with humanity's best intentions, and the complexities of trying to make those intentions explicit in systems we feel capable of overseeing (summarized on page 14).
- Agency has three subsections: Reinforcement, Shaping, and Curiosity. This part looks at reinforcement learning, summarized on page 14 as follows: "Part two turns the focus to reinforcement learning, as we come to understand systems that not only predict, but act; there are lessons here for understanding evolution, human motivation, and the delicacy of incentives, with implications for business and parenting alike."
- Normativity has three subsections: Imitation, Inference, and Uncertainty. This final part is summarized on page 14 as follows: "Part three takes us to the forefront of technical AI safety research, as we tour some of the best ideas currently going for how to align complex autonomous systems with norms and values too subtle or elaborate to specify directly."
Chapter notes:
Chapter 1 - Representation. The chapter ends with three takeaways, beginning on pages 47 and 48.
The first takeaway is methodological: "Computer scientists are reaching out to the social sciences as they begin to think more broadly about what goes into the models they build. Likewise, social scientists are reaching out to the machine-learning community and are finding they now have a powerful new microscope at their disposal."
The second takeaway says that "biases and connotations ... are real. They are measurable, in detail and with precision. They emerge spontaneously and reliably, from models built to do nothing but predict missing words, and they are measurable, quantifiable and dynamic. They track ground-truth data about labor participation as well as subjective measures of attitudes and stereotypes. All this and more are present in models that ostensibly just predict missing words from context; the story of our language is the story of our culture."
The third takeaway says that "These models should absolutely be used with caution, particularly when used for anything other than their initial purpose of predicting missing words."
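That second takeaway, that the biases in word-embedding models are measurable "in detail and with precision," is something you can poke at yourself. Here is a minimal sketch of one way to probe those associations, assuming the gensim library and a small pretrained GloVe model; the word lists and the scoring are my own illustrative choices, not the book's.

```python
# Minimal sketch of measuring associations in word embeddings.
# Assumes gensim is installed; the model name and word lists are
# my own illustrative choices, not taken from the book.
import gensim.downloader as api

model = api.load("glove-wiki-gigaword-50")  # small pretrained GloVe vectors

gendered = [("he", "she"), ("man", "woman")]
occupations = ["doctor", "nurse", "engineer", "teacher"]

for occ in occupations:
    # Average difference in cosine similarity toward the male vs. female
    # word in each pair; positive leans "male", negative leans "female".
    score = sum(
        model.similarity(occ, m) - model.similarity(occ, f)
        for m, f in gendered
    ) / len(gendered)
    print(f"{occ:10s} {score:+.3f}")
```

Even a tiny probe like this tends to show the occupation-gender skews the chapter describes, which is exactly why the third takeaway urges caution when these models are used for anything beyond predicting missing words.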
Chapter 2 - Fairness. Chapter 2 spends a great deal of time on fairness in criminal justice and the use of predictive modeling in that area. The following quote, attributed to Moritz Hardt, is one of the three that open the chapter, and I keep thinking about it after reading this section: "As we're on the cusp of using machine learning for rendering basically all kinds of consequential decisions about human beings in domains such as education, employment, advertising, health care and policing, it is important to understand why machine learning is not, by default, fair or just in any meaningful way" (page 51). I take this to mean that the data a machine-learning system learns from is the source of its fairness or unfairness, and that it is not a simple proposition to determine or provide the data necessary for fairness, or even to recognize when something isn't fair.
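Much of the chapter's criminal-justice discussion turns on how you would even check whether a model is fair, for instance whether it wrongly flags people in one group more often than another. Below is a minimal sketch of checking one such criterion, equal false positive rates across groups, on fabricated toy data; the numbers and the deliberately skewed "model" are my own illustration, not the book's or any real system's.

```python
# Minimal sketch of checking one fairness criterion: equal false
# positive rates across groups. All data here is fabricated for
# illustration only.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
group = rng.integers(0, 2, n)   # 0 or 1, a protected attribute
label = rng.integers(0, 2, n)   # true outcome (1 = event occurred)
# A deliberately skewed "model" that flags group 1 more often.
pred = (rng.random(n) < 0.3 + 0.2 * group).astype(int)

for g in (0, 1):
    mask = (group == g) & (label == 0)  # people who did NOT have the event
    fpr = pred[mask].mean()             # fraction of them wrongly flagged
    print(f"group {g}: false positive rate = {fpr:.2f}")
```

A disparity like the one this toy example produces is the kind of thing the chapter examines, and Hardt's point is that nothing in the training process makes such disparities go away by default.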