## Genetic Evolution within Hilbert Space

In 'The Blind Watchmaker' (1986), Richard Dawkins eloquently discusses the mechanics and complexities of genetic evolution driven by 9 manageable units of mutation, using them to portray a genetic space through which an organism travels by mutation (chapter 3, Accumulating Small Change). Dawkins methodically walks the reader through his biomorphs, imaginary creatures whose forms are set by those 9 genes. If we model the biomorphs as a 9-dimensional system, what emerges is a Hilbert space, and we can use its associated tools to perform analysis.

A Hilbert space is an abstract vector space of n dimensions, where n can be extremely large or even infinite. It is named after the German mathematician David Hilbert, whose work around the turn of the 20th century on infinite-dimensional analogues of Euclidean space led to the concept. A Hilbert space extends vector algebra and calculus to these n-dimensional spaces. It possesses an inner product, which lets us measure the lengths of vectors and the angles between them, and it is complete: every Cauchy sequence converges to a point within the space.
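As a concrete sketch, R^9 with the ordinary dot product is a (finite-dimensional, hence complete) Hilbert space, so biomorph genotypes can be handled as plain vectors. The two 9-gene genotypes below are made-up values, not Dawkins' actual biomorphs; the functions compute the inner product, the length it induces, and the angle between two genotypes.

```python
import numpy as np

# Two hypothetical 9-gene biomorph genotypes (illustrative values only).
parent = np.array([1, 2, 0, -1, 3, 1, 0, 2, 1], dtype=float)
child = np.array([1, 2, 1, -1, 3, 1, 0, 2, 2], dtype=float)

def inner(u: np.ndarray, v: np.ndarray) -> float:
    """Standard inner product on R^9: <u, v> = sum_i u_i * v_i."""
    return float(np.dot(u, v))

def norm(u: np.ndarray) -> float:
    """Length induced by the inner product: ||u|| = sqrt(<u, u>)."""
    return float(np.sqrt(inner(u, u)))

def angle(u: np.ndarray, v: np.ndarray) -> float:
    """Angle in radians, from cos(theta) = <u, v> / (||u|| * ||v||)."""
    cos_theta = inner(u, v) / (norm(u) * norm(v))
    return float(np.arccos(np.clip(cos_theta, -1.0, 1.0)))  # clip guards rounding

# The child differs by two single-unit mutations, so its distance is sqrt(2).
print(norm(child - parent))
print(np.degrees(angle(parent, child)))
```

The distance between two genotypes is simply the norm of their difference, which is what the later observations about "distance between viable vectors" lean on.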

There are many overlapping definitions of a gene. Dawkins explores the major definitions accepted in the field in 'The Selfish Gene' (chapter 3, Immortal Coils). For the purpose of this discussion, consider a gene to be a unit of mutation that can take non-binary values and can potentially last for *enough* generations to serve as a unit of natural selection. I would like to emphasise the word 'enough' in this definition: 'enough' simply means the necessary and sufficient number in the context of the mathematical object we are modelling. If we don't have a unit that persists with its variability for enough generations, we end up with a space dominated by dimensions that are never explored.

With this context, we can imagine a Hilbert space whose dimensions are the expressive genes, or the minimal genome^{r1}, which is the collection of only the essential genes. Conceptually, this reduces the size and complexity of the Hilbert space. But even so, if we consider a complex multicellular organism such as ourselves, you can imagine how stupendously large the resulting Hilbert space is. For example, the CHESS 2.2 database listed 20,352 human protein-coding genes as of July 2018.^{r2}
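To put a number on "stupendously large", here is a rough count under an assumption of ours, not the source's: every gene takes just 3 settings, the smallest non-binary choice. The number of discrete genotypes is then 3^n, so the sketch works with its base-10 logarithm instead of the number itself.

```python
import math

def log10_state_count(n_genes: int, values_per_gene: int) -> float:
    """log10 of the number of discrete genotypes, values_per_gene ** n_genes."""
    return n_genes * math.log10(values_per_gene)

# Assumption: 3 settings per gene. Gene counts are those quoted in the text.
for n in (9, 424, 482, 20352):
    digits = math.floor(log10_state_count(n, 3)) + 1
    print(f"{n:>6} genes -> 3**{n} has {digits} decimal digits")
```

Even 9 genes give 3^9 = 19683 genotypes; at 20,352 genes the count itself runs to thousands of digits, which is why the minimal-genome reduction matters.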

If we rewind the clock, we end up with a very simple organism that expresses a single gene. This is a reduced data set with all but one gene muted; the values this gene can take describe the characteristics of the organism. Of course, we can safely assume that such an organism would be hypothetical. So let's move to the Last Universal Common Ancestor, better known as LUCA^{r3}, which is estimated to have had 424 protein-coding genes. In the Hilbert space, LUCA would be a vector: the vector sum of 424 components, one along each gene's axis, scaled by that gene's value. There would be another vector for the bacterium *Mycoplasma genitalium*, which has 482 protein-coding genes.
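One way to read "vector sum" concretely: with one axis per gene, an organism is the sum of each gene's value times that gene's unit vector, and genes it lacks sit at zero. Because gene counts are large and gene sets only partly overlap, the sketch below stores organisms sparsely as dictionaries and builds the shared space from the union of their genes. The gene names and values are hypothetical placeholders, not LUCA or *M. genitalium* data.

```python
from typing import Dict, List

# An organism as a sparse vector: gene name -> knob setting.
# Genes absent from the dict sit at 0 along their axis (muted).
Organism = Dict[str, float]

def shared_axes(*organisms: Organism) -> List[str]:
    """Union of genes: the axes of the smallest space holding every organism."""
    return sorted(set().union(*organisms))

def to_vector(org: Organism, axes: List[str]) -> List[float]:
    """Dense coordinates, i.e. the sum of value_i * e_i over the given axes."""
    return [org.get(gene, 0.0) for gene in axes]

# Hypothetical toy genomes (placeholders, not real data).
ancestor = {"geneA": 1.0, "geneB": 0.5, "geneC": 2.0}
descendant = {"geneA": 1.5, "geneC": 2.0, "geneD": 0.7}

axes = shared_axes(ancestor, descendant)
print(axes)                         # ['geneA', 'geneB', 'geneC', 'geneD']
print(to_vector(ancestor, axes))    # [1.0, 0.5, 2.0, 0.0]
print(to_vector(descendant, axes))  # [1.5, 0.0, 2.0, 0.7]
```

Moving from the ancestor to the descendant then means turning `geneA`, muting `geneB` and activating `geneD`, the same "knob" picture used for LUCA and *M. genitalium*.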

Here comes a logical mathematical deduction: a path exists between LUCA^{r3} and *Mycoplasma genitalium* that can be traversed by adjusting some of LUCA's 424 'knobs' and activating others that are exclusive to *Mycoplasma genitalium*. Most of these knob settings result in an organism that is not viable, but the evolutionary pressure of survival ensures that only viable, albeit inefficient, paths are actually traversed.
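The deduction can be played out as a toy simulation. This is a hypothetical model, not the author's simulation: genomes are integer knob settings, each step turns one knob one unit toward the target, and a caller-supplied `is_viable` predicate (an assumption standing in for survival) vetoes steps. If every candidate step is non-viable, the lineage stops at a dead end.

```python
import random
from typing import Callable, List, Tuple

Genome = Tuple[int, ...]

def l1_distance(a: Genome, b: Genome) -> int:
    """Number of single-unit knob turns separating two genomes."""
    return sum(abs(x - y) for x, y in zip(a, b))

def viable_walk(start: Genome, target: Genome,
                is_viable: Callable[[Genome], bool],
                rng: random.Random) -> Tuple[List[Genome], bool]:
    """Walk toward target one knob turn at a time, keeping only viable steps.

    Returns (path, reached). reached is False when no viable step remains,
    i.e. the branch is a dead end.
    """
    path: List[Genome] = [start]
    current = start
    while current != target:
        # Candidate mutations: nudge each mismatched gene one unit toward the target.
        candidates = []
        for i, (c, t) in enumerate(zip(current, target)):
            if c != t:
                mutant = list(current)
                mutant[i] += 1 if t > c else -1
                candidates.append(tuple(mutant))
        survivors = [m for m in candidates if is_viable(m)]
        if not survivors:
            return path, False  # dead end: extinct branch
        current = rng.choice(survivors)
        path.append(current)
    return path, True

start, target = (0, 0, 0), (2, -1, 3)
path, reached = viable_walk(start, target, lambda g: True, random.Random(0))
print(reached, len(path) - 1)  # True 6

# Hypothetical constraint: gene 0 may never exceed 0.
path, reached = viable_walk(start, target, lambda g: g[0] <= 0, random.Random(0))
print(reached, path[-1])  # False (0, -1, 3)
```

The walk only ever moves toward the target, which keeps it short and guaranteed to terminate; real lineages wander, so this is the most efficient path rather than a realistic one.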

Similarly, a path exists from LUCA's^{r3} vector to the vector that represents a human being in 20,352 dimensions (genes), such that each and every discrete point on this path is an organism viable enough to produce offspring with a mutation that leads to the next step. Of course, there are infinitely many paths from LUCA^{r3} to a human being, and the mechanism of evolution would have tried a great many of them only to either (a) hit a dead end (extinct species) or (b) diverge so far that we have an entirely new species at hand (the biosphere minus humans).
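Even counting only the shortest walks, where every mutation moves one knob one unit toward the destination, the number of distinct orderings is a multinomial coefficient: with per-gene distances d_1, ..., d_n it is (d_1 + ... + d_n)! / (d_1! ... d_n!). Allowing detours makes the count unbounded, which is where "infinitely many" comes from. A small sketch with made-up distances:

```python
from math import factorial
from typing import Sequence

def shortest_path_count(distances: Sequence[int]) -> int:
    """Distinct shortest unit-step walks, given |target_i - start_i| for each gene."""
    total = factorial(sum(distances))
    for d in distances:
        total //= factorial(d)  # exact: each partial quotient is an integer
    return total

print(shortest_path_count([2, 1, 3]))  # 6! / (2! 1! 3!) = 60
print(shortest_path_count([1] * 10))   # 10 genes, one step each: 10! = 3628800
```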

Of course, evolution by natural selection and genetic mutation is inadequately represented in our Hilbert space if we don't account for evolutionary pressures such as isolation of species, disease, sexual selection, kin selection, etc. This is certainly easier said than done, but let's add another dimension that represents *all* such evolutionary pressures, resulting in an (n+1)-dimensional Hilbert space (n = number of expressive genes). Note that time is not a dimension of this space, so it contains all possible organisms, those that ever were and those that ever will be, all at once.

Observations:

- The resultant Hilbert space is sparse. The genetic space is mostly empty, with strands of paths joining vectors that represent viable organisms.
- Most branches result in a dead end. The tips of these paths represent species that went extinct.
- There would be a few convergences. For example, the eye evolved independently multiple times. If we abstract the eye as a unit represented by expressive genes in this Hilbert space, we can see separate paths leading to this unit.
- The larger the distance between two viable vectors, the longer evolution would have taken to get from one to the other.
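A deliberately crude illustration of the sparsity observation: suppose (an assumption, not biology) that genotypes live in the cube [-1, 1]^n and the viable ones fill the unit ball inside it. The viable fraction is then the ball's volume divided by the cube's, which collapses as the number of genes n grows. The sketch computes it exactly and cross-checks it by Monte Carlo sampling.

```python
import math
import numpy as np

def ball_fraction_exact(n: int) -> float:
    """Fraction of the cube [-1, 1]^n inside the unit ball: V_n / 2^n."""
    ball_volume = math.pi ** (n / 2) / math.gamma(n / 2 + 1)
    return ball_volume / 2 ** n

def ball_fraction_mc(n: int, samples: int, seed: int = 0) -> float:
    """Monte Carlo estimate: share of uniform cube samples with norm <= 1."""
    rng = np.random.default_rng(seed)
    points = rng.uniform(-1.0, 1.0, size=(samples, n))
    return float(np.mean(np.sum(points ** 2, axis=1) <= 1.0))

for n in (2, 3, 9, 20):
    print(n, ball_fraction_exact(n), ball_fraction_mc(n, 200_000))
```

At n = 2 the "viable" share is about 79%, at n = 9 it is well under 1%, and by n = 20 random sampling practically never lands inside, a toy version of a space that is mostly empty.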

It then seems natural to use the mathematical tools of Hilbert space to approximate and predict past and future states of the modelled biosphere. Some basic examples are:

- Ergodic dynamical systems: predicting the long-run average behavior of a system from minimal observed data, using the fact that in an ergodic system, time averages along a single trajectory equal averages over the whole space. Genetic evolution has had a great deal of time and seems a natural fit for this analysis.
- Weakly convergent sequences: predicting the evolutionary branches that will converge and settle under isolation and weak evolutionary pressures.
- Bounded operators: bounded linear operators are exactly the continuous linear maps, and they could be used to predict missing links between viable organisms on a given evolutionary path.
- Inverse function theorem: predicting the unstable vectors from which an evolutionary branch will swing away rapidly, by identifying the conditions under which the function becomes locally invertible in the neighborhood of unfavorable evolutionary pressures.
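The ergodic idea, that time averages equal space averages, can be checked on a toy system. The sketch assumes, purely for illustration, a single trait cycling through 5 settings by a lazy random walk (step -1, 0 or +1 with equal probability). That chain is irreducible and aperiodic, so the share of time one long lineage spends in a state should match the share of many independent lineages found in it, both approaching the stationary value 1/5.

```python
import numpy as np

N_STATES = 5  # hypothetical toy trait with 5 settings arranged on a ring

def time_average(steps: int, seed: int = 0) -> float:
    """Fraction of time ONE long lineage spends in state 0."""
    rng = np.random.default_rng(seed)
    moves = rng.choice([-1, 0, 1], size=steps)  # lazy random walk steps
    positions = np.cumsum(moves) % N_STATES     # wrap around the ring
    return float(np.mean(positions == 0))

def space_average(lineages: int, steps: int, seed: int = 1) -> float:
    """Fraction of MANY independent lineages in state 0 after `steps` steps."""
    rng = np.random.default_rng(seed)
    moves = rng.choice([-1, 0, 1], size=(steps, lineages))
    positions = moves.sum(axis=0) % N_STATES
    return float(np.mean(positions == 0))

print("time average: ", time_average(200_000))
print("space average:", space_average(20_000, 100))
print("stationary:   ", 1 / N_STATES)
```

The "lazy" zero step is a design choice: it makes the chain aperiodic, so the ensemble at a fixed time also settles to the stationary distribution rather than oscillating.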

In practice, accurately modelling a complex dynamic system such as genetic evolution within the bounds of a Hilbert space is not just a matter of available observed data; it also demands enormous storage and computational resources. Hopefully, with time, we will have enough of both. Moreover, it is daunting to account for the number of variables involved, most of which are unknown. But beyond that, the thought of a pure, abstract mathematical idea potentially exposing such deep facets of the biosphere is fascinating and awe-inspiring.

*Post-publication update*: Eventually, I did get around to writing a simulation of this, with more control parameters.

The plot shows a 3-gene (3-dimensional) Hilbert space through which 3 separate 'genetic walks' are traced. It was rendered using matplotlib.

The first parent is common to all 3 simulations, but the target offspring are spaced sufficiently far apart to see how the walks evolve.
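The original simulation code isn't included here, so the following is only a rough reconstruction of the idea under our own assumptions: three walks start from a common parent at the origin, each heads for a different hypothetical target, and every step turns one randomly chosen mismatched knob by one unit. `plot_walks` imports matplotlib lazily so the walk logic runs without it.

```python
import random
from typing import List, Tuple

Point = Tuple[int, int, int]

def genetic_walk(parent: Point, target: Point, rng: random.Random) -> List[Point]:
    """Each step turns one randomly chosen mismatched knob one unit toward the target."""
    path = [parent]
    current = list(parent)
    while tuple(current) != target:
        i = rng.choice([k for k in range(3) if current[k] != target[k]])
        current[i] += 1 if target[i] > current[i] else -1
        path.append(tuple(current))
    return path

def plot_walks(walks: List[List[Point]]):
    """Draw each walk as a 3-D polyline and return the matplotlib Figure."""
    import matplotlib
    matplotlib.use("Agg")  # off-screen rendering
    import matplotlib.pyplot as plt

    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    for n, walk in enumerate(walks, start=1):
        xs, ys, zs = zip(*walk)
        ax.plot(xs, ys, zs, marker="o", markersize=2, label=f"walk {n}")
    ax.set_xlabel("gene 1")
    ax.set_ylabel("gene 2")
    ax.set_zlabel("gene 3")
    ax.legend()
    return fig

rng = random.Random(42)
parent = (0, 0, 0)  # common first parent
targets = [(10, 2, 5), (-4, 8, 6), (3, -7, 9)]  # hypothetical, spaced apart
walks = [genetic_walk(parent, t, rng) for t in targets]
```

To render, call `plot_walks(walks).savefig("genetic_walks.png")`. This sketch has no further control parameters; detours or viability checks would be natural extensions.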

*References & Further Reading:*

- ^{r1} *Essential genes of a minimal bacterium*, Proc Natl Acad Sci USA, 2006.
- ^{r2} CHESS database at Johns Hopkins: http://ccb.jhu.edu/chess/
- ^{r3} *The genomics of LUCA*, Front Biosci, 2008.
- Dawkins, Richard. *The Blind Watchmaker*, 1986.
- Human genome: https://en.wikipedia.org/wiki/Human_genome

*Related but unrelated:*

- *"Do you know Hilbert? No? Then what are you doing in his space?" (because everything is in Hilbert space). A joke known in the hallways of MIT.*
- *The hobby of abusing dimensional analysis.*