Init
In Artificial Intelligence, systems are designed according to two kinds of ideal. One kind involves doing something well: solving problems, maximizing expected utility, minimizing required training data, alongside more concrete ideals like winning Go games or driving cars well. The other involves being like us in some sense: looking like us, behaving like us, or being structured like us. Call the first kind adeptness ideals and the second anthropomorphic ideals.
Systems today, both in AI alignment and AI more broadly, are typically designed according to adeptness ideals, with humans used mainly for inspiration (outside of scientific efforts to understand humans through AI).
By contrast, in this blog I will argue that designing explicitly anthropomorphic AI will be crucial for developing aligned AGI, and I will give concrete research directions and comparisons to existing approaches. Befitting the medium, I’ll present the ideas in no particular order, pull them together as we go, and promise that all of your questions will be satisfactorily answered in some later post.
Any anthropomorphic AI approach rests on an understanding of what humans are. Among other things, we are mammals that mimic meanings, and anthropomorphic AI should be too, under an appropriate generalization of these concepts. I’ll elaborate on what this means and why it’s true in a later post.
Humans are a messy, meaty outgrowth of biological evolution and historical accident, but there’s ultimately no way of escaping the fact that we are human. All of our highest dreams and aspirations, no matter how abstract and universal, are only human. This fact is not a bad thing, but any aligned AGI approach must take it into account.