Scientific discovery as a computational search process

Do you know about the concept of the Adjacent Possible (AP), anon? You should.

I keep thinking that perhaps it is critical to imagine scientific discovery as a computational search process in the space of adjacent possibles.

For instance, think about the origin of general relativity. The idea that gravity is the curvature of spacetime was first put forward by Einstein; it was not a concept that existed before. Mathematicians had explored curved spaces beforehand, of course. So that link was the novel part: gravity is a natural consequence of curved spacetime. The new ideas had to satisfy a lot of constraints, of course. A theory must be testable, after all. And it was tested and found to agree with observations.

Computationally, he got there by changing the forms of the equations. Symbolic manipulation. You can imagine the set of all tricks Einstein knew: let's call this bag of tricks B, which is of course context-dependent (C) and time-dependent (t), so B(C,t). That is, Einstein knew where (C) to apply a trick (read "context" in LLM-speak and "intuition" in human language; also computationally definable btw, see below), and the set of tricks he knew at time t grows with t. The search that led to his first derivation of the tensor equations of general relativity was then a search in symbolic space (the adjacent possible of known results), i.e. known results + sequential application of tricks {x_i} from the bag of tricks B(t). This space grows combinatorially as a function of graph distance, i.e. the number of steps from known results (S(0) = start). However, Einstein is context-aware (he's smart, his mind is sparse): he knows what is useful, and his constraints eliminate the spurious searches, allowing for immediate stoppage in most search directions and early stoppage in others, until eureka(!).

So we can imagine that the (human) search space has a shape: a manifold that is much, much lower-dimensional than the combinatorial space. This is what interests me, anon. Demis Hassabis gave a similar description of the protein folding problem, which AlphaFold solved by discovering the manifold on which genetically evolved protein structures live.
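To make the "bag of tricks plus early stoppage" picture concrete, here is a toy sketch in Python. Everything in it is made up for illustration: the states are plain integers standing in for symbolic expressions, the three tricks in TRICKS are hypothetical, and the constraint is a crude stand-in for intuition. It only shows that the same breadth-first search expands fewer states once a constraint prunes directions early.

```python
from collections import deque

# Hypothetical bag of tricks B: each trick maps a state to a new state.
# Integers stand in for symbolic expressions in this toy.
TRICKS = {
    "add3": lambda s: s + 3,
    "double": lambda s: s * 2,
    "sub1": lambda s: s - 1,
}


def search(start, target, max_steps, constraint=None):
    """Breadth-first search over the adjacent possible of `start`.

    Returns (path, expanded): the list of trick names that reaches
    `target` (or None) and the number of states that were expanded.
    """
    frontier = deque([(start, [])])
    seen = {start}
    expanded = 0
    while frontier:
        state, path = frontier.popleft()
        if state == target:
            return path, expanded
        if len(path) == max_steps:
            continue  # step budget used up in this direction
        expanded += 1
        for name, trick in TRICKS.items():
            nxt = trick(state)
            if nxt in seen:
                continue
            if constraint is not None and not constraint(nxt):
                continue  # early stoppage: this direction violates a constraint
            seen.add(nxt)
            frontier.append((nxt, path + [name]))
    return None, expanded


# Unconstrained search versus a search that rejects states outside (0, 22].
path_all, n_all = search(1, 22, max_steps=6)
path_pruned, n_pruned = search(1, 22, max_steps=6,
                               constraint=lambda s: 0 < s <= 22)
print(path_all, n_all)
print(path_pruned, n_pruned)
```

Both runs find the same four-step path, but the constrained one expands fewer states. In a real symbolic space the gap would presumably be far larger, since the unconstrained frontier grows combinatorially with depth.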

Anyways,

You can even imagine B as an operator or matrix acting on the state S(0), i.e. the known prior results, leading to a probability distribution over S(1) = B(S(0)), and so on. Intuition aims to split B(C,t) into B(t) x I(C), where I is intuition based on context C. This is a sparse matrix that is made sparser by eliminating tested adjacent possibles that didn't work: a dynamic, context-updated search matrix, made sparser after every exploration.
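A minimal sketch of that operator picture, under assumptions of my own: states are indexed 0..3, B is a small table of trick weights between states, I is a 0/1 mask, and "B(t) x I(C)" is read as an elementwise product followed by row normalisation. None of these choices come from anywhere but this toy.

```python
def effective_transition(B, I):
    """Elementwise-mask B with I, then normalise each row to sum to 1."""
    T = []
    for b_row, i_row in zip(B, I):
        row = [b * m for b, m in zip(b_row, i_row)]
        total = sum(row)
        T.append([x / total for x in row] if total > 0 else row)
    return T


def step(S, T):
    """Propagate a distribution over states one step: S_next[j] = sum_i S[i] * T[i][j]."""
    n = len(T[0])
    return [sum(S[i] * T[i][j] for i in range(len(S))) for j in range(n)]


def eliminate(I, dead_state):
    """Mark a tested adjacent possible as not working: zero its column in the mask."""
    for row in I:
        row[dead_state] = 0


# Hypothetical trick weights between four states (state 0 = known results).
B = [
    [0, 1, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]
I = [[1, 1, 1, 1] for _ in range(4)]  # start with no intuition: nothing is masked
S0 = [1.0, 0.0, 0.0, 0.0]

S1 = step(S0, effective_transition(B, I))
eliminate(I, 2)  # state 2 was explored and did not work
S1_after = step(S0, effective_transition(B, I))
print(S1)
print(S1_after)
```

After the elimination, the probability mass that used to go to state 2 is shared among the remaining candidates, which is one way to picture the mask getting sparser after every exploration.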

Einstein's internal process was probably very complex. It was a computational process of searching through the space of possibilities and eliminating candidates, with mental tests along the way: this has to be true, and this has to be consistent with the other principles and facts known at the time. So it's a search through the space while respecting these constraints, right? Now, each discovery is probably a computational process, and it goes beyond just recombination, which is what I used to imagine it to be. It's not that discovery through recombination is impossible; it's just not that simple. It's a complex computational process of search, elimination, and verification under constraints. So let's keep that in mind. The idea here is not to go into this process, but I wanted to make a note of it.

Consider now Noether's theorems and how she was able to link conservation laws in nature to continuous symmetries. She brought together two different concepts that were already known, using a set of equations showing that if you start with the symmetry, you can express it as a differential condition on the Lagrangian, the Hamiltonian, or the action, and that this outputs a conservation law via symbolic manipulation (i.e. the bag of tricks). So she tied two different concepts together by showing their equivalence in equational form. I'm sure she eliminated a lot of possibilities again, so it's searching through a space from one set of concepts to the other and then linking them through equations or logic. So it's again a computational process. Scientific discovery is a computational process, an error-correcting search process through the space of possibilities.

And while it is hard to map this complexity, at least at this time and within the scope of my projects right now, I think it would be super cool if you could still come up with a way of labeling the computational and search complexity of a solution or new discovery as just low or high, rather than with an exact number. Just knowing it's low or high would be massive, and I wonder if there's a way to do that. I think there is.

Lastly, I want to drop names: assembly theory, Markov chains, and atypical recombinations of the known lexicon / tokens. Not gonna explain more, but it's related; if you know, you know. How far a combination that appears in some major breakthrough is from the naive expectation... that's a solid number you can calculate given a prior, but it probably means very little. If you choose a very naive null model for the underlying process of creating new ideas, it would be like comparing Shakespeare to “monkeys typing Shakespeare”, which is a very bad null model. Alternatively, consider a human (not old Shakes-boi) typing it: a closer null model. But we should perhaps find better null models for the creation of ideas, each one better than the last, and all of them should be doable with the data we have: past discoveries, books, papers, etc. Another constraint on such a null model is that it should be scalable. Until recently such a meta-approach to scientific discovery would have FAILED.
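Here is a rough sketch of what "distance from the naive expectation" could look like as a number, assuming two toy null models: a uniform one (the monkeys) and a first-order Markov chain with add-alpha smoothing, fitted on a tiny made-up corpus of token sequences. The corpus, the tokens, and the function names are all hypothetical; the point is only that the same combination gets a different surprisal depending on the null model you pick.

```python
import math
from collections import Counter

# Made-up corpus: each entry is a sequence of concept tokens from one "paper".
corpus = [
    ["gravity", "force", "mass"],
    ["curvature", "geometry", "manifold"],
    ["gravity", "mass", "force"],
    ["geometry", "curvature", "tensor"],
]

vocab = sorted({tok for doc in corpus for tok in doc})
unigram = Counter(tok for doc in corpus for tok in doc)
bigram = Counter((a, b) for doc in corpus for a, b in zip(doc, doc[1:]))
context = Counter()  # how often each token appears as the first half of a bigram
for (a, _b), count in bigram.items():
    context[a] += count


def surprisal_uniform(seq):
    """Bits under the 'monkeys typing' null model: every token equally likely."""
    return len(seq) * math.log2(len(vocab))


def surprisal_markov(seq, alpha=1.0):
    """Bits under a first-order Markov null model with add-alpha smoothing."""
    v = len(vocab)
    total = sum(unigram.values())
    bits = -math.log2((unigram[seq[0]] + alpha) / (total + alpha * v))
    for a, b in zip(seq, seq[1:]):
        p = (bigram[(a, b)] + alpha) / (context[a] + alpha * v)
        bits -= math.log2(p)
    return bits


familiar = ["gravity", "force"]    # a pairing seen in the corpus
atypical = ["gravity", "curvature"]  # a pairing never seen in the corpus

for seq in (familiar, atypical):
    print(seq, surprisal_uniform(seq), surprisal_markov(seq))
```

The uniform null model cannot tell the two pairs apart, while the Markov one scores the unseen pairing as more surprising. A better null model in the sense above would presumably replace the bigram counts with something much richer, but the bookkeeping stays the same.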

So, do you see why I am harping on about this NOW anon?

Because LLMs make B tractable for a given problem, and also make it possible to verify and eliminate the possibilities that don't work, so B becomes sparser. This is scalable; I don't need thousands of scientists to do this. It is a new age. You can construct this AP graph and parse it as a Reinforcement Learning (RL) environment. Imagine the possibilities, anon.
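As a very rough sketch of the "AP graph as an RL environment" idea, assuming a hand-written graph and a fixed set of dead ends standing in for whatever an LLM-based verifier would reject: the class name APGraphEnv, the reward values, and the node names are all hypothetical, and the reset/step interface just mimics the usual RL convention without depending on any library.

```python
class APGraphEnv:
    """Toy RL-style environment over an adjacent-possible graph.

    `graph` maps each node to the nodes one step away. `dead_ends` is a
    stand-in for a verifier: moving to one of them fails and prunes the edge.
    """

    def __init__(self, graph, start, goal, dead_ends=()):
        self.graph = {node: list(nbrs) for node, nbrs in graph.items()}
        self.start = start
        self.goal = goal
        self.dead_ends = set(dead_ends)
        self.state = start

    def reset(self):
        self.state = self.start
        return self.state

    def actions(self):
        """Adjacent possibles still open from the current state."""
        return list(self.graph.get(self.state, []))

    def step(self, action):
        if action not in self.actions():
            raise ValueError(f"{action!r} is not adjacent to {self.state!r}")
        if action in self.dead_ends:
            # Verification failed: prune the edge so the graph gets sparser.
            self.graph[self.state].remove(action)
            return self.state, -1.0, False
        self.state = action
        done = self.state == self.goal
        return self.state, (10.0 if done else -0.1), done


# Hypothetical graph: two ideas adjacent to the known results, one is a dead end.
graph = {"known": ["idea_a", "idea_b"], "idea_b": ["result"]}
env = APGraphEnv(graph, start="known", goal="result", dead_ends={"idea_a"})

env.reset()
print(env.step("idea_a"))   # fails verification, edge is pruned
print(env.actions())        # only the surviving adjacent possible remains
print(env.step("idea_b"))
print(env.step("result"))
```

An agent trained on something like this would be rewarded for reaching the goal in few steps, and the pruning in step is the part that corresponds to B becoming sparser as possibilities are tested and eliminated.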
