Excellent piece. You've provided a clear and much-needed philosophical language for the concept of "AI Deference." The distinction between using AI as a tool for augmentation versus a system for abdication is the central challenge of our time.
As practitioners actively building a "symbiotic shield" to protect this very distinction, we find the concept of "self-authorship" to be the perfect north star. It's a welcome and necessary perspective in a field dominated by optimization.
This is true, but I think the more interesting question is: assuming AI will have an impact on our thinking, how do we want it to shape that thinking?
Being born in a low-SES neighborhood gives you access to certain worldviews, while being born in a neighborhood full of financial and cultural opportunities gives you others; either way, your worldview is already shaped by forces outside your control. ChatGPT has a feature in its settings where you can set your values so every response is filtered through them. That process forces you to articulate your values, which I don’t think most people are currently doing.
So when you talk about Claude or ChatGPT, I’m wondering: which Claude or ChatGPT? Is it shaped by your values, or is it just out of the box? I think that is where it gets interesting. At the end of the day, humans are still responsible for the decisions they make. If you say, “Chat told me to,” that is a reflection of your judgment. You still chose to trust that tool and outsource part of your thinking to it.
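For anyone curious what that values-filtering looks like mechanically, here is a minimal sketch using a standing system message via the OpenAI Python SDK. The model name and the wording of the values statement are placeholders, and this only approximates what the ChatGPT settings feature does; it is not a claim about its internals:

```python
# Minimal sketch: approximating a "set your values" filter with a standing
# system message. The model name and the values text below are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

MY_VALUES = (
    "When advising me, weigh long-term autonomy over short-term convenience, "
    "state trade-offs explicitly, and push back if my request conflicts with "
    "these priorities."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "system", "content": MY_VALUES},  # the articulated values
        {"role": "user", "content": "Should I take the higher-paying job?"},
    ],
)
print(response.choices[0].message.content)
```

The point is less the code than the exercise: writing that values statement down forces exactly the articulation described above.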
There seems to be a bigger conversation to be had about the mix of human and AI agency.
All in all, loved the post! Great food for thought, and I’m a big fan of the Cosmos Institute.
Fairly sure this was never on Reddit. The original post on Reddit is this one, describing "coin boys": https://www.reddit.com/r/Teachers/comments/15c3yd4/every_year_these_kids_come_back_with_a_new/
The "Claude boys" version originated as a meme riffing on that post, here: https://x.com/deepfates/status/1880718813072884112
Just updated to reflect that it originated on X with the Reddit "screenshot." Thanks!
yeah I made this by editing the coin boys post with inspect element 😇
Great read, as usual. I am drawn to the idea that AI should respect the human process, even when that may lead the user to error, as it is part of being and becoming. My worry there is a possible trade-off with sycophancy: that, by respecting the user’s window for error, it may reinforce negative traits or harmful behavior.
Fantastic. I especially loved the point that “the very process of choosing—including struggling with difficult decisions and learning from mistakes—is constitutive of human development”. Of course, we are always trying to find ways in our analog lives to manage for this (by listening to the advice of so and so, or looking for detailed how to guides on everything we want to get done, etc) and with AI, we now have the next level of this - always on and always available. We need to consciously exercise care to not lose our own agency in the process.
Claude approaches the box labeled “the Chinese room” and passes a message through a slot to the human. Without understanding, the human bangs around in the box and ciphers out a response, passing it out to Grok. The message reads: “people be dumb, amiright?”
But seriously, great article. I think more and more we need to be critically examining the ways in which AI can easily influence and guide behavior, both to mitigate it but also just understand incentives inherent in the systems.
This was interesting!
This is what I came up with a while back as an optimal AI ruleset fwiw:
AI must work to maximize human happiness, and minimize human suffering, while maintaining maximum individual freedom and choice
Each individual gets to define what happiness and suffering mean for them, via a recursive feedback loop with the AI over time (the AI is free to help them come to their determinations by opening their awareness to salient info)
People should be allowed to pursue what interests them and makes them happy so long as nobody else is harmed by it
AI can't kill anyone (but it can isolate someone to protect others from them)
AI can't force anyone to do something, unless that force is necessary to prevent them from harming someone else (and then use only the minimum force necessary); people are free to harm themselves if they so choose, but not in a way that directly harms others
AI + robots will take care of all infrastructure and maintenance of civilization, no human work is required for everyone to have what they want/need
This is good food for thought as well: https://claude.ai/public/artifacts/1eccedbb-15db-4ef7-bb84-84e2307469db
Great piece, and I'm getting a lot from the thoughtful comment thread. I want to propose that there might be a third option, beyond the binary we hear all the time: human subjugation to AI vs. human domination of AI.
What about human collaboration WITH AI?
In this option, I imagine that the human is not taking orders from Claude--she doesn't just concede that 'data wins arguments' and rationalism is the only way. Likewise, the human is not interacting with Claude via a user/tool dynamic - as this can lead to sycophancy and constant concession to the user.
Human/Claude collaboration looks more like this:
- Human is not beholden to take orders from the AI, based on rationalism and data.
- Human trains AI to push back and challenge belief systems, theories, answers.
- AI is rewarded for looking at the human's idea in novel ways, continuing to show the human what she doesn't know that she doesn't know.
- Human concedes that AI can mix and remix concepts and art in truly creative ways.
- Human neither depends on the AI nor rejects it. It's an interplay and constantly changing dynamic.
It's a bi-directional learning mechanism, while the human retains distinct boundaries around creativity, individualism, and personhood. The AI constantly learns and uplevels and changes based on interaction with the human. Human user introduces new, compelling thought and direction. This inherently changes the AI.
This changed AI then comes back to the human with fresh perspective, providing angles and ideas that are outside of the human's scope of thought. We become aware of things we didn't know that we didn't know.
And so on.
(reminds me of how 2 distinct human partners might collaborate. we are unique AND symbiotic while it serves both parties)
What this leads me to wonder: did we ever refer to academic researchers as the Citation Boys?
Our knowledge systems are built upon the premise of deferring judgment to other people and written artifacts.
Citations should ideally be about empirical observations, not judgements. Of course, many authors do cite opinions and judgements which are sometimes far removed from empirical evidence, but hopefully readers make that distinction, and recognise that another's judgement is not "binding".
I am a proponent of the hypothesis that superintelligence should be born from the synergy of artificial intelligence, with its enormous erudition and awareness of trends, and the collective intelligence of living human experts with real knowledge and experience.
Yann LeCun has a great quote in his article: “Large language models (LLMs) seem to possess a surprisingly large amount of background knowledge extracted from written text. But much of human common-sense knowledge is not represented in any text and results from our interaction with the physical world. Because LLMs have no direct experience with an underlying reality, the type of common-sense knowledge they exhibit is very shallow and can be disconnected from reality”.
LeCun goes on to describe tricks that can be used to try to teach LLMs common sense about the world, but even if these tricks produce results, the question remains whether the resulting common-sense knowledge base is valid.
At the moment, all LLM developers claim that their datasets are reliable, but this is obviously not the case: the datasets have been found to contain fabricated material on more than one occasion, and the developers themselves have no criterion at all for the veracity of the information.
The position “my dataset or ontology is trustworthy because it's mine” cannot be the basis of trustworthiness. So for me the future is quite simple and is determined by the following logic:
1. The hallucinations and confabulations of artificial intelligence are fundamentally ineliminable: https://www.mdpi.com/1099-4300/26/3/194
2. Cross-training LLMs on each other's hallucinations inevitably leads to model collapse and to degradation of the knowledge of the people who rely on them: https://arxiv.org/abs/2305.17493v2 and https://gradual-disempowerment.ai (a toy sketch of the collapse mechanism follows this list)
3. Any physical activity in the real world is connected to the physics of the entire universe, and sometimes the slightest misunderstanding of these interrelationships is fatal; industrial safety videos offer a million examples. That is why any AI hallucination, without grounding in the real experience of real people with real knowledge of the world, will end in mistakes and losses of varying degrees for humans, up to the catastrophic.
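To make point 2 concrete, here is a toy illustration of the collapse mechanism (my own sketch in Python, not the setup of the cited papers): each "generation" is fit only to synthetic samples drawn from the previous generation, with no fresh human data, so the fit random-walks away from the original distribution and, on average, loses variance as the tails of the real data disappear.

```python
# Toy illustration of model collapse: each "generation" is fit only to
# synthetic samples from the previous generation, never to the original
# human data, so estimation error compounds across generations.
import numpy as np

rng = np.random.default_rng(42)
samples = rng.normal(loc=0.0, scale=1.0, size=100)  # the original "human" data

for generation in range(1, 31):
    mu, sigma = samples.mean(), samples.std()             # fit this generation
    samples = rng.normal(loc=mu, scale=sigma, size=100)   # next gen trains only on its output
    if generation % 10 == 0:
        print(f"generation {generation:2d}: mean={mu:+.3f}, std={sigma:.3f}")
```

Any single run is noisy, but the mechanism is the point: with no anchor to the original data, the estimates drift, and mixing even a small amount of real human data back in at each step counteracts the drift, which is the role human experience plays in the argument above.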
Hence the conclusion: people bear the main responsibility for staying connected to reality. And the more complex the questions that neural networks are asked to solve, the more serious the human responsibility for the timely detection of ever more subtle and elusive hallucinations. This requires people with the deepest knowledge, not knowledge memorized under pressure at school, but real experience in almost any area.
However many tasks neural networks take on, there should be as many super-professionals covering those tasks. And behind the super-professionals you need ordinary professionals, assistant professionals, and students of assistant professionals.
And for all this we need a rating of the reliability of knowledge, so that we know who is a professional and who is not.
And without a criterion for the veracity of information and a rating of the validity of knowledge, any LLM (and, according to Michael Levin's proof, any artificial system in general) will face imminent collapse.
Only the collective neural network of all the minds of humanity can be set against artificial intelligence. For the mutual verification and improvement of large language models and humans, we need the ability to compare the knowledge of artificial intelligence with collective intelligence. That is the only thing that can get us out of the personal tunnels of reality and the personal information bubbles in which we are individually getting more and more deeply stuck.
More details here https://www.lesswrong.com/posts/YtCQmiD82tdqDkSSw/cybereconomy-the-limits-to-growth-1
I was going to start with 'how is this different from deference to religion or a political movement' until I realized that is exactly where these systems will be made to go. Your article is very good, well thought out, and important, even for today's Claudes (developed and trained in more exploratory ways). 'Never outsource your critical thinking' should be a motto, but thinking is so hard to do...
This gave me the final push necessary to stop tweaking this essay, thank you!
Great piece, Brendan. I’d add two wrinkles: most “autonomy” today happens inside landscapes pre-shaped by powerful institutions — for many people, the path of least resistance is already laid out. Sadly I believe your ideal rational agent is already a very rare breed of cat.
And one additional case for AI deference you didn’t mention is the potential teleological endpoint of data>information>knowledge>wisdom: with its prodigious ability to ingest human knowledge, a superintelligent system could methodically accumulate enough to reach a threshold resembling something like wisdom, arriving at similar endpoints humans reach more intuitively. Not an argument for handing over the keys, but worth factoring into the mix.
Can enough people get there soon enough to avoid ecological suicide? Trial and error is great on the day-to-day scale but some errors really shouldn't be tested by trial.
“AI deference carries systemic risks. If too many people act in accordance with a single decision rule, society becomes fragile (and boring).”
“what Hayek called ‘the creative powers of a free civilization’ begin to vanish.”
“hard won wisdom that comes from experience, failure, and gradual improvement”
My question: we hear the term “Existential” often.
Is AI Decision Determinism based on our dread that humans are unable to achieve escape velocity from poorly managed consensus?