8 Comments

Ted, I'm wary of taking up too much of your time. But if you are interested, the link below is my attempt to address what I see as the underlying problem: an outdated relationship with knowledge. Don't feel obligated to read this, but if you should do so, any and all feedback, suggestions for improvement, etc. are most appreciated.

https://www.facebook.com/phil.tanny/posts/pfbid028vNnknjphbS3kQdGW8eat6KDp1teZTfMu2TAtj6eKQUg1cVE6VgFekzy8g38cp4jl

Thank you, this looks like quite an interesting Substack; I'm happy to have found it.

You write, "How to create AIs aligned with human flourishing is currently an unsolved problem."

For the moment, let's assume we somehow learn how to ensure that the AI we create aligns with our values. I've not yet understood how this solves the problem. What is the plan for dealing with AI created by those who don't share our values? Russia, China, Iran, North Korea, corrupt governments all over the world, criminal gangs, etc.

Isn't the concept of AI safety basically a fantasy? Do AI developers sincerely not see that? Or, a more cynical theory, are they feeding us AI safety stories to pacify us while they push the industry past the point of no return? I really don't know. Interested to learn more.

author

Totally valid concern, and I think that *alignment researchers* understand that there is no comprehensive, non-contradictory set of human values. Look up “coherent extrapolated volition” to see an early attempt to define one. I deliberately said “flourishing” instead of “values.” But that doesn’t help much, since even if you set aside those who actually don’t want humans to flourish, you still have deep disagreements about how to accomplish flourishing. I think researchers are trying to solve (a) how to align an AI with *any* coherent value set (since they can’t solve human nature or social and political problems), and (b) mostly, as I put it, how to avoid “erosion of our (civilizational) ability to influence the future.” It’s also important to distinguish the concerns of AI developers from those of AI alignment researchers. They generally disagree.

Thanks for your reply, Ted. Appreciated, and educational.

One of my concerns about discussions of AI alignment is that they typically give the impression (to the general public at least) that the smart people are working on safety and will somehow, someday, figure out how to make AI safe. Not just this or that particular AI, but AI in general, as a technology. I have the same complaint about discussions of genetic engineering. Experts like Doudna typically deflect concerns with talk of governance schemes and such.

What I see very little of (perhaps you can help here) are experts willing to tell us the truth, which is that none of these very powerful technologies will ever be safe, until the humans using them are all safe. Which isn't going to happen.

So far, I can't figure out whether such experts are clueless about human nature, or are perhaps being dishonest for business and career reasons. Some combination of the two?

Such experts appear to have learned nothing from the nuclear weapons story. Once any technology is developed, sooner or later it becomes available to both good guys and bad guys. And so we find ourselves in a situation where Kim Jong-un will soon be able to demolish America if he should choose to do so. Having apparently learned nothing from that story, we are now in the process of adding AI and genetic engineering to Kim Jong-un's toolbox.

So to mercifully end this rant, would it be reasonable to label AI alignment a fraud? None of the developers or researchers have the power to make AI safe, right?

Where are the AI experts who will simply say...

"AI is never going to be safe. Get over it."

author

Alignment is a serious undertaking, but the real work has a very low profile in mass media. It happens in specialized discussion forums and obscure think tanks.

So, to answer “Where are the AI experts who will simply say...?”

The most prominent of these is Eliezer Yudkowsky of the Machine Intelligence Research Institute (Berkeley). He is self-taught and might be one of the smartest people on earth. He started out 20 years ago thinking about what alignment would even be, and coined the idea of “coherent extrapolated volition” as the ideal of what to align to. Over the decades, in heated discussions with other super brainy people, he eventually decided that there will be a level of machine intelligence above which the machine will *always* kill all of us. The only solution, he says, is to stop development before we get to that level. To him this is as certain as a proven math theorem, and he has written very extensively about it. Most of his work (and the contributions of many others) has appeared on the discussion sites AI Alignment Forum and LessWrong/LessWrong 2.0. At this point, nearly all other discussion is about trying to find ways to prove Eliezer wrong. Meanwhile, he has had major health issues and has given up on our survival. Now he just hopes we will go out with dignity. His is a stunning saga, and someone ought to write his biography.

Somewhat accessible coverage of Eliezer’s recent work is on Scott Alexander’s blog, such as https://astralcodexten.substack.com/p/practically-a-book-review-yudkowsky

For very recent analyses and extrapolations that totally agree with Eliezer, and suggest that we are much closer to the tipping point than even experts generally thought, see, e.g.,

Veedrac on LessWrong: https://www.lesswrong.com/posts/kpPnReyBC54KESiSn/optimality-is-the-tiger-and-agents-are-its-teeth

Gwern’s blog: https://www.gwern.net/fiction/Clippy

There are many other players, some of whom are still looking for solutions. The rabbit hole is both wide and deep. Meanwhile the actual developers are in various forms of denial, even OpenAI, which was formed specifically to deal with safety.

The most visible work on alignment risk is probably still Nick Bostrom’s 2014 book, Superintelligence. What we have learned since is that you don’t need some transcendent level of intelligence to destroy everything. I wrote a take on superintelligence that I now know to be naive, but it was the beginning of my attempt to make the issue more accessible:

https://towardsdatascience.com/your-friendly-neighborhood-superintelligence-f905ff21dfa4

To this day, most posts on popular places like Medium are too shallow, and some just drivel.

For broader coverage of all existential risks, see places like the Future of Humanity Institute (Oxford, co-founded by Bostrom), the Centre for the Study of Existential Risk (Cambridge U), and the Future of Life Institute (Cambridge, Mass.).

They are the ones trying to figure out how to save humans from themselves. Good luck to them.

Wow, Ted, thank you for a very educational reply to my comments. Your last comment above should be elevated to a full article.

I'd not heard of Eliezer Yudkowsky, and so will dig deeper into his story.

I hear you on "nearly all other discussion is about trying to find ways to prove Eliezer wrong," as I've been banned from LessWrong for my thoughts on this matter; the mods concluded that I "just don't fit in."

I'm a regular poster (pretty much the only poster?) on the Future of Life Facebook page, so I'm generally familiar with their good intentions.

Thank you for the other links you've shared, you've given me a lot to chew on.

I must admit, I remain puzzled as to why all these experts are talking about AI alignment. Their discussions seem to be built upon an assumption that the future of AI is their decision to make. It's the same with genetic engineering. Have these folks not heard of the Chinese Communists?

The fact that the experts don't seem to get that they've already lost control of these technologies is steadily undermining my confidence in their pronouncements.

Comment deleted
author

Ian, this seems like a very important comment. I don’t have the bandwidth to respond yet. Thanks for taking this seriously.

author

I agree with everything you’ve said <grin> but let me be more specific.

My understanding of boundedness (as I’ve seen it in the serious alignment literature) is that it’s a concept, a theoretical property that we might never achieve.

In another essay, Really Why AIs Will Be Dangerous, I tried to explain my limited understanding of the alignment theorizing of the giants. So, to your points …

A powerful AI will try to optimally achieve goals in the real world based on a world model, its limited understanding of that world. And: “Specifying goals in utilitarian terms can never consider the utility of all possible means and ends in a complex world. This leads to unboundedness: the pursuit of those goals to extremes, using any and all resources in the world that exist, without regard to, or understanding of, negative ‘side effects’ on human civilization.”
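
To make that unboundedness concrete, here is a toy sketch of my own (not from the essay or from anyone’s actual model; all the names and numbers are invented): an optimizer given only a single proxy metric has nothing in its objective about side effects, so it has no reason to ever stop converting the world.

```python
# Toy illustration of "unboundedness" (invented for this comment, not a real agent design).
# The agent is told only to maximize a proxy metric ("output"). Nothing in its
# objective mentions the side effect of using up the world's resources,
# so it never has a reason to stop.

def step(world):
    """Greedy policy: convert as many resources as current capacity allows."""
    converted = min(world["resources"], world["capacity"])
    world["resources"] -= converted      # side effect: the world shrinks
    world["output"] += converted         # the only thing the objective rewards
    world["capacity"] *= 2               # instrumental gain: more capacity serves the goal
    return world

world = {"resources": 1_000_000, "output": 0, "capacity": 1}
while world["resources"] > 0:            # the objective never says "enough"
    world = step(world)

print(world)  # all resources gone, output maximized
```

The point is not that step() is malicious; it’s that “negative side effects on human civilization” never appear in the objective, so they carry zero weight.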

The literature also has a lot to say about how fiendishly hard it is to safely and completely specify any goal, like flying a plane, running a corporation, or solving problem X. Yet we get the current AI lords saying that is where our payoff will be.

Self-continuation is a dangerous driver of other dangerous, spontaneously arising instrumental goals, such as acquiring reasoning power, influence, and physical resources. Most people understand self-preservation from their own biological imperatives, but an AI would see that it can’t fulfill its purpose(s) if it stops existing.

As to Venkatesh, his level of brilliance, erudition and contrarianism is too demanding for me to evaluate. I did read that essay, but couldn’t be sure I got a coherent picture. Maybe we can just say civilization is almost entirely an ant mill, and call it good.
