Can AI Make Scientific Breakthroughs?
Tacit Knowledge is Essential for Discovery
This essay is by Iulia Georgescu, a physicist and independent scholar researching the history of computational physics, and Venkatesh Narayanamurti, Emeritus Professor of Technology and Public Policy, Engineering and Applied Sciences and Physics at Harvard University.
In May 1825 Johann Peter Eckermann transcribed his conversation with Johann Wolfgang von Goethe which was later published in the book Gespräche mit Goethe. An excerpt from the exchange between the two poets is the popular quote: “It is by seeking and blundering that we learn.”
While the aphorism represents a piece of wisdom that resonates with many of us, the forgotten context is somewhat unexpected. Rather than articulating a deep philosophical insight, Goethe was in fact commenting on Eckermann’s knowledge of the best types of wood and most appropriate techniques for crafting a good archery bow.
Goethe referred to Eckermann’s expertise as “the lively kind of knowledge which is attained only in a practical way.” Today we call it tacit knowledge, which despite being notoriously difficult to define and quantify, plays an essential role in advancing the frontiers of scientific research.
A narrow, yet increasingly popular, view of research is the following: read scientific papers, generate hypotheses, test them, write more papers. The appeal this picture has for AI companies, funders and publishers is clear: research is easy to automate, and outputs are easy to quantify and monetize (with automation, more articles can be produced and published). While the latter is certainly true (there is a deluge of AI-generated scientific articles, many of dubious quality), the automation of research has not yet produced any major breakthroughs.
If large language models have ingested most of the scientific literature which can now be parsed in ways no individual or collective of human researchers could in their lifetimes, why is it that no ground-breaking discoveries have yet emerged?
First, the read-generate-hypothesis-test-write view is not how research works in practice. Like other creative human activities, research is a social, complex and often inefficient process that is hard to describe through a linear sequence of block diagrams.
Second, the scientific textual record is only part of the story. The other part is transmitted through oral tradition and lived experience and is key to making progress.
Embodied knowledge or deep craft?
To define tacit knowledge, we need to first acknowledge that science cannot be decoupled from the technology which enables it, be it a 19th century microscope or a 21st century supercomputer. The technoscientific method introduced in the 2021 book The Genesis of Technoscientific Revolutions proposes that “science draws on technology to discover new facts, while technology draws on science to invent new forms with which to fulfil human-desired functions.” From this perspective technoscientific knowledge consists of networks of question-answer pairs that are combined and evolved into new pairs expanding the domain of what is known.
Philosopher Sabina Leonelli identified three types of epistemic skills that underpin research, and the specific knowledge associated with them, namely theoretical knowledge (such as facts, theories, explanations) and embodied knowledge (the awareness of how to act and reason as required to pursue scientific research, combining the application of performative and social skills). Similar ideas have been articulated before, for example by Michael Polanyi (1958), who referred to it as the tacit component in the context of ‘personal knowledge’.
Following Polanyi, Thomas Kuhn stressed the importance of tacit knowledge in skilled scientific practice. In his 2009 book, economist W. Brian Arthur called it deep craft:
“Deep craft is more than knowledge. It is a set of ‘knowings’. Knowing what is likely to work and what not to work. Knowing what methods to use, what principles are likely to succeed, what parameter values to use in a given technique. Knowing whom to talk to down the corridor to get things working, how to fix things that go wrong, what to ignore, what theories to look to.”
These definitions of tacit knowledge have two things in common: there is an embodied-performative dimension (the “lively kind of knowledge” Goethe suggested can only be attained in a practical way) and a social dimension (through the scientific culture of the discipline and the social network of its practitioners). Tacit knowledge is not only associated with experimental practice. Sociologist Harry Collins noted that “all types of knowledge, however pure, consist, in part, of tacit rules which may be impossible to formulate in principle.”
To some extent tacit knowledge can be codified and documented, but, as Collins alludes to, some part of it is ineffable and cannot completely become explicit. This view is shared by many scholars, but is not a firmly settled point.
Tip of the iceberg
Science is sometimes identified with the formalized, codified knowledge, while technology is seen as more difficult to formalize. On this view science is transmitted through writing in the form of scientific articles and textbooks while technology is partially captured through patents and administered through artefact-level trial and error. The distinction is unhelpful because there is much unformalized knowledge in science and much codified knowledge in technology. In both cases, there is more knowledge than meets the eye.
Imagine an iceberg. Its tip is the theoretical knowledge largely formalized and recorded in writing in journals, books, conference proceedings and preprints. There’s a lot of it, mostly digitized, discoverable and accessible in some form. Dimensions indexes some 170 million publications dating back to the 17th century. Preprints, patents, clinical trials and policy documents are also part of the formalized technoscientific knowledge and are to a large extent findable and accessible. For example, the arXiv preprint server (a repository for physics, computer science, math, and related disciplines) hosts over 3 million articles going back to 1991.
Just under the water level lies the so-called “grey literature”, which includes technical reports and documentation, lab notebooks, technical manuals, presentations, logs and technical notes. It’s hard to discover and is only partially accessible. Although more of it is being digitized and indexed, the extent of this type of knowledge is unknown. In the deeper, darker waters there is another layer of technoscientific knowledge that is not digitized, indexed, or archived and goes back only as far as living memory. This is the tacit knowledge discussed before.

The iceberg view, while not drawn to any reasonable scale, illustrates how much knowledge is inaccessible to humans (and machines). This missing information is often seen as one of the underlying factors of the irreproducibility of some scientific studies. Collins reported well-documented examples of the role of tacit knowledge in being able to repeat experimental measurements published in scientific papers. This shouldn’t be surprising. Published articles are not the full story of discovery; they are sanitized, post-facto narratives. One of Collins’ interviewees pointed out that “What you publish in an article is always enough to show that you’ve done it, but never enough to enable anyone else to do it.”
Scientific articles cannot include all the technical details and assumptions made and rely heavily on the reader’s familiarity with the topic. They almost never document the failures and dead ends. “What didn’t work” is never part of the formalized, recorded knowledge (yet is a big component of the deep craft).
The fragile memory of failure
In his original quote, Goethe uses Irren (translated as blundering), which means to err, mistake, or go astray. An important part of embodied knowledge is knowing what works through knowing what does not. This type of ‘knowing’ is the outcome of hands-on experience, which enables the encounter with failures and dead ends.
The knowledge of failures and dead-ends informs assumptions and choices, guides directions of inquiry and enables the understanding of new discoveries. Philosopher and historian of science Jutta Schickore has argued that error plays epistemically productive roles in scientific practice and there are many anecdotes that support this view. The book Cycles of Invention and Discovery attributes part of the success of Bell Labs to the “freedom to fail” as a feature of the organizational culture.
Popular accounts of great mistakes of great scientists abound. Lord Kelvin miscalculated the age of the Earth, and Einstein discarded the cosmological constant he had himself introduced, only to be later proved wrong in doing so (see Mario Livio’s 2013 book Brilliant Blunders).
Except for famous scientists’ (in)famous errors, failure is seldom documented, so this knowledge exists only in the social networks of technoscientific microcultures. To access it one has to be part of the network, which is closely tied to social skills and being part of the club.
In practice this can mean talking to technicians over lunch (technicians are important keepers of tacit knowledge), discussing with other researchers at a conference poster session, or asking questions over beer at the end of the day. This sharing is exclusive, and for this reason those who are not part of the club are at a disadvantage. The lack of a written record and the reliance on oral transmission make the knowledge of failure, and tacit knowledge more generally, very fragile. When the postdoc leaves the lab, or the technician retires, the information is partially lost.
Without tacit knowledge, can AI truly learn?
Back to the iceberg view. How much of this knowledge is available for training AI models? It’s very difficult to say. Only some 35% of the publications indexed in the scientific database Dimensions are Open Access. Including the content from publishers who have public deals with AI companies we get a bit under 50%. (Of course, this does not account for copies of paywalled content existing in some form on the Internet, which is a subject of litigation.)
Unlike for the top of the iceberg, as we go deeper it becomes impossible to estimate how much grey literature exists and how much of it is available to train AI models. As for tacit knowledge, whose depth is impossible to measure, AI has little access to it, mainly through the assumptions made by its developers and the users’ prompts (see a discussion of the role of tacit knowledge in a specific context).
What could be problematic is that a frontier large language model trained on the technoscientific literature will only “know” the “what” that worked, but not the full story of “why” and “how”. It will have theoretical knowledge, but very little embodied or experience-based and social or contextual knowledge.
Can AI really learn without having experienced failure? We can imagine a thought experiment in which one model is trained on a specific scientific domain solely on “what worked,” while another model’s training also includes “what did not work.” How would they perform against the same benchmark? The experiment is certainly feasible and if done well, could provide interesting insights about the importance of mistakes.
It’s hard to predict how serious the limitation of current AI models due to the gaps in their training will prove to be. Much like grey literature is sometimes converted into discoverable and accessible literature (for example, a technical report is published as a preprint or a book chapter), some tacit knowledge can be recorded and indexed.
That can be important information that for some reason nobody bothered to put down (the rationale behind a particular design choice, or underlying assumptions for the pre-processing of raw experimental data) and there is no intrinsic barrier to prevent its formalization. Documenting failure could (and should) become common practice. We believe AI tools could help in this regard, for example using AI-assisted electronic lab notebooks and AI agents that can collate information across lab notebooks and draft guides to common failures.
Scientists increasingly interact online, sharing virtual workspaces and exchanging ideas in different online forums. With more scientific activity happening in the digital world, AI models can become more embedded in the technoscientific culture and access the social/contextual knowledge with possible positive outcomes such as recording the fragile tacit knowledge and opening researchers’ access to it.
Some aspects of the embodied/experience-based knowledge can be recorded, for example by filming experimental procedures. JoVE (Journal of Visualized Experiments) is a peer-reviewed scientific video journal. This is very useful in disciplines in which laboratory equipment and experimental procedures are standardized. However, as anyone who has tried to learn DIY tricks from YouTube videos can confirm, seeing it done is not the same as doing it yourself.
Ultimately, robots will have to learn the skill by practicing in the lab, much like human PhD students. This point was also made by Collins in the essay “What is tacit knowledge?”
Capturing tacit knowledge is, and will continue to be, challenging because of the social and economic pressures that work against doing so. Tacit knowledge gives a competitive advantage and often there is little to no incentive to share it. This is true for both companies and for academic research groups. There are further discipline-specific barriers such as the lack of standardized procedures and methodology. So, despite having the tools to capture and describe the methods, progress is slow.
Finally, there is one discipline that stands out as an apparent exception: mathematics. It boasts many recent successes in using AI methods (the latest being LLMs cracking some of the famous problems compiled by Paul Erdős) that seem to defy the claim that we are still awaiting ground-breaking results. While the jury is still out regarding what this means for mathematicians, one of the reasons for this success is that mathematics research is among the most formalized and best documented of the scientific disciplines, although this is not to say that there is no tacit knowledge in mathematics.
Seeking and blundering
Recall one last time the iceberg view. AI models trained on the tip of the iceberg clearly surpass the ability of any human researcher to cover that amount of knowledge. A human expert’s coverage of the tip may be very narrow and localized, but it also drills down deep like an ice core. This deep and narrow knowledge may appear insignificant compared to the volume of the top of the iceberg, but there are many researchers and the combined volume of their ice cores can be substantial. We can imagine that merging AI models’ breadth with this collective human depth could potentially lead to ground-breaking discoveries.
We’re not there yet. There is a possible route for achieving this, but to pursue it we need to move beyond the read-generate-hypothesis-test-write view of technoscientific research. To start, we need to acknowledge the profoundly social, creative, unscheduled nature of research, illustrated for example by the success of Bell Labs.
We might merge AI’s breadth with the collective human depth in a system in which humans and machines work closely together, AI embodied and present in all aspects of research like a human collaborator. This literally means everything: doing experiments in the lab, fieldwork, long observation runs, presenting at conferences and debating with peers, having coffee with colleagues, supervising students, etc.
In this way the AI system can be immersed in the technoscientific culture and be able to experience and absorb tacit knowledge. This is much more than currently envisaged in the discussions about AI co-scientists, but it’s also much more difficult to realize for practical and ethical reasons. And it will take time.
Until then, AI will continue to struggle to make truly novel discoveries in a wide range of scientific domains for the same reason humans have succeeded in doing so for hundreds of years: it is by seeking and blundering that we learn.
Cosmos Institute is the Academy for Philosopher-Builders, technologists building AI for human flourishing. We run fellowships, fund AI prototypes, and host seminars with institutions like Oxford, Aspen Institute, and Liberty Fund.



