In 'Nature of the Physical World' (1926), Arthur Edington writes:
"If an army of monkeys were strumming on typewriters they might write all the books in the British Museum. The chance of their doing so is decidedly more favorable than the chance of the molecules returning to one half of the vessel."
He is using the popularly known 'infinite monkey theorem' to show how improbable a specific arrangement of molecules can be which are in quadrillions in the container of his thought experiment, as compared to all the letters of all the words of all the books in the British museum combined.
The 'infinite monkey theorem' talks about producing the complete works of William Shakespeare from a group of finite number of monkeys but infinite time. Dawkins on trying to explain evolution by natural selection, rescues those Monkeys from the enormous task of producing the complete works of William Shakespeare and reduces it to just one sentence from the Hamlet instead:
“I don’t know who it was first pointed out that, given enough time, a monkey bashing away at random on a typewriter could produce all the works of Shakespeare. The operative phrase is, of course, given enough time. Let us limit the task facing our monkey somewhat. Suppose that he has to produce, not the complete works of Shakespeare but just the short sentence ‘Methinks it is like a weasel’, and we shall make it relatively easy by giving him a typewriter with a restricted keyboard, one with just the 26 (capital) letters, and a space bar. How long will he take to write this one little sentence?”
Since then, many programmers have programmed this.
I took this further by adding more controllability to simulate various evolutionary pressures. The four main control knobs that I implemented are:
Evolution space: In one of the other posts, I have discussed about the Hilbert space, and how an evolutionary pathway within Hilbert space of n-dimensions where n is the number of expressive genes. To simulate the variety of these pathways, I can configure the first parent and the last desired child wherever I want within this evolutionary space and let the algorithm find a path between them. Isn't it amazing that I won't be able to simulate a path twice?
Unit of mutation:
Each character of the string can randomly mutate to any of the following 91 values:
Superset ∈ [' ',','!','"','#','$','%','&',''','(',')','*','+',',','-','.','/','0','1','2','3','4','5','6','7','8','9',':',';','<','=','>','?','@','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z','[','\',']','^','_','`','a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
The probability with which each character mutates is 1/91
. For a random string of size n
, the probability to mutate to the desired target is (1/91)^n
. For the target at hand, that turns out to be 1.4E-55
. That's a stupendously small number. The various configuration parameters helps to navigate this enormous space and reach the target in much lesser iteration count. Dawkins notes this in chapter 3 of the the blind watchmaker: "For the number of biomorphs in the land is half a trillion, and if no one of them is any more probable as a destination than any other, the odds of jumping to any particular one are small enough to ignore".
Deviation Index:
Simulation Plots:
plot.config.generatePlot=true
: To generate plots, set this to true.plot.config.addSimulationParameters=true
: This adds simulation parameters as text to the plot file.This is the most basic simulation with all the control parameters turned off. The target is configured via:
config.target = 'METHINKS IT IS A WEASEL'
The first parent is a random string the same length as the target. Each time the simulation is run, the first parent gets constructed through a random generator to be of same size as the target.
Using this first parent, n offspring strings are created. The number of offspring strings per generation are controlled and configured via this key.
sim.config.numberOfChildrenPerGeneration
Each of this offspring has a random mutation in any of single place. The deviation index of each of the offspring is calculated and the one with lowest deviation index is selected as parent for the next generation.
This is the trace from the run. The 522nd child was the desired target with this set of control parameters.
Target: METHINKS IT IS LIKE A WEASEL. Length: 28
Parent:5Oulqgm0+^E!0sn`kp+19*tnkdTw | Deviation Idx:759 | Median dIdx:29
...
Generation: 100: Choosing child: METHINLS!IT!IS LIKE A WEASEL | dIdx:3 | Vulnerable genes:28/28
Generation: 200: Choosing child: METHINLS IR IS LIJE A UEATFL | dIdx:8 | Vulnerable genes:28/28
Generation: 300: Choosing child: MESHINKS HU IS LIKE @!WFATEL | dIdx:7 | Vulnerable genes:28/28
Generation: 400: Choosing child: MDTHINKS IT IS LIKE A VCASEM | dIdx:5 | Vulnerable genes:28/28
Generation: 500: Choosing child: NETHINKS IT IS LIJE A WE@SEL | dIdx:3 | Vulnerable genes:28/28
Generation: 522: Last child: METHINKS IT IS LIKE A WEASEL | dIdx:0 | Vulnerable genes:0
Reducing numberOfChildrenPerGeneration to 65 from 100 increases the numbers of generations required to reach target from 522 to 537913. This happens because of the probability to get healthy string from a given generation decreases with fewer number of children per generation. This gets compensated by having more number of generations. I don't have a powerful enough machine to render the plot for this simulation. But here is the corresponding configuration & trace.
#Simulation configuration
#sim.config.firstParent=!!!!!!!!!!!!!!!!!!!!!!!!!!!!
sim.config.target=METHINKS IT IS LIKE A WEASEL
sim.config.stableStrandSize=28
sim.config.stableStrandMutationProbability=0.05
sim.config.doRangeControlledMutation=false
sim.config.mutationRange=5
sim.config.numberOfChildrenPerGeneration=65
sim.config.printEveryNthGeneration=100
#Plot configuration
plot.config.generatePlot=true
plot.config.addSimulationParameters=true
...
Generation: 537500: Choosing child: MDTHINKX JU IS!LIKE @ WF@TFL | dIdx:14 | Vulnerable genes:28/28
Generation: 537600: Choosing child: NETFINKS!HT HS LKKG A WEAVEL | dIdx:13 | Vulnerable genes:28/28
Generation: 537700: Choosing child: METIIMKT IT HS"KIJD A WEATFL | dIdx:11 | Vulnerable genes:28/28
Generation: 537800: Choosing child: METHINKU IS!JS LILF A WEBSEL | dIdx:8 | Vulnerable genes:28/28
Generation: 537900: Choosing child: MFTHINKT IT IS LILE A WEAVEL | dIdx:6 | Vulnerable genes:28/28
Generation: 537913: Last child: METHINKS IT IS LIKE A WEASEL | dIdx:0 | Vulnerable genes:0
To simulate a long walk through the evolutionary Hilbert space requires parent and target strings as far as possible in their Hilbert space. The parent string can be configured via sim.config.firstParent
.
In this test case, rather than truly computing a parent which would be farthest from the target, I selected a far-enough candidate string which has '!' in all of its 28 locations.
Mutation stability: Mutation can happen anywhere. The purpose of this configuration is to allow a group of genes to gain stability by decreasing the probability of their mutation. Both, the size of the group and the probability of the mutation of this group members can be configured via following:
Range-constrained mutation: This configuration prevents 'big' jumps of a mutable unit. Consider a string sitting in the Hilbert space. A random mutation in a random location of this string will relocate this string randomly anywhere along the axis of that location.
This test case is purely for the convenience of plotting. It is only possible to plot in 3 dimensions. Why not just have entities with that many mutable units (genes).
Other exploratory branches:
References & Further Reading:
Related but unrelated: