MeThinks it is like a Weasel!
In 'Nature of the Physical World' (1926), Arthur Edington writes:
"If an army of monkeys were strumming on typewriters they might write all the books in the British Museum. The chance of their doing so is decidedly more favorable than the chance of the molecules returning to one half of the vessel."
He is using the popularly known 'infinite monkey theorem' to show how improbable a specific arrangement of molecules can be which are in quadrillions in the container of his thought experiment, as compared to all the letters of all the words of all the books in the British museum combined.
The 'infinite monkey theorem' talks about producing the complete works of William Shakespeare from a group of finite number of monkeys but infinite time. Dawkins on trying to explain evolution by natural selection, rescues those Monkeys from the enormous task of producing the complete works of William Shakespeare and reduces it to just one sentence from the Hamlet instead:
“I don’t know who it was first pointed out that, given enough time, a monkey bashing away at random on a typewriter could produce all the works of Shakespeare. The operative phrase is, of course, given enough time. Let us limit the task facing our monkey somewhat. Suppose that he has to produce, not the complete works of Shakespeare but just the short sentence ‘Methinks it is like a weasel’, and we shall make it relatively easy by giving him a typewriter with a restricted keyboard, one with just the 26 (capital) letters, and a space bar. How long will he take to write this one little sentence?”
Since then, many programmers have programmed this.
I took this further by adding more controllability to simulate various evolutionary pressures. The four main control knobs that I implemented are:
- Evolution space: Control the evolutionary space gap between initial parent and last child (target)
- Mutation stability: Stabilize mutation when the group reaches a stable size by controlling the mutation probability.
- Range-constrained mutation: Limit mutation to occur within configurable constrained range for the mutable unit.
- Generational fitness: Control generational fitness by varying the number of children per generation.
Evolution space: In one of the other posts, I have discussed about the Hilbert space, and how an evolutionary pathway within Hilbert space of n-dimensions where n is the number of expressive genes. To simulate the variety of these pathways, I can configure the first parent and the last desired child wherever I want within this evolutionary space and let the algorithm find a path between them. Isn't it amazing that I won't be able to simulate a path twice?
Unit of mutation:
Each character of the string can randomly mutate to any of the following 91 values:
Superset ∈ [' ',','!','"','#','$','%','&',''','(',')','*','+',',','-','.','/','0','1','2','3','4','5','6','7','8','9',':',';','<','=','>','?','@','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z','[','\',']','^','_','`','a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
The probability with which each character mutates is 1/91
. For a random string of size n
, the probability to mutate to the desired target is (1/91)^n
. For the target at hand, that turns out to be 1.4E-55
. That's a stupendously small number. The various configuration parameters helps to navigate this enormous space and reach the target in much lesser iteration count. Dawkins notes this in chapter 3 of the the blind watchmaker: "For the number of biomorphs in the land is half a trillion, and if no one of them is any more probable as a destination than any other, the odds of jumping to any particular one are small enough to ignore".
Deviation Index:
- In the original Dawkin's implementation, the measure of fitness of the string is defined as the sum of number of characters that are in correct position as the target string. So, the fittest string would have target score of it's length. In case of the string 'METHINKS IT IS A WEASEL', that would be 28. In one of my earlier implementations, I used a percentage based scoring similar to this approach (I preserved that in this github branch), but later changed it to an open ended measure that I call 'deviation index'.
- Deviation index (devIdx) is a measure of fitness of the candidate string at hand. It is w.r.t the target string. It is sum of difference of ASCII values of each character & the character at the same index of the target string.
Simulation Plots:
- This application generates 2 python plot files per simulation. The python script requires matplotlib to generate the plots. The first plot is the deviation index vs the generation plot and the second is a 3 dimensional plot plotting the 'evolutionary walk' of the first 3 genes of each parent string.
- These are the plot related configurations in the app.config:
plot.config.generatePlot=true
: To generate plots, set this to true.plot.config.addSimulationParameters=true
: This adds simulation parameters as text to the plot file.
Simulation test case 1 (base case): length=28; n/gen = 100; random seed for parent
This is the most basic simulation with all the control parameters turned off. The target is configured via:
config.target = 'METHINKS IT IS A WEASEL'
The first parent is a random string the same length as the target. Each time the simulation is run, the first parent gets constructed through a random generator to be of same size as the target.
Using this first parent, n offspring strings are created. The number of offspring strings per generation are controlled and configured via this key.
sim.config.numberOfChildrenPerGeneration
Each of this offspring has a random mutation in any of single place. The deviation index of each of the offspring is calculated and the one with lowest deviation index is selected as parent for the next generation.
This is the trace from the run. The 522nd child was the desired target with this set of control parameters.
Target: METHINKS IT IS LIKE A WEASEL. Length: 28
Parent:5Oulqgm0+^E!0sn`kp+19*tnkdTw | Deviation Idx:759 | Median dIdx:29
...
Generation: 100: Choosing child: METHINLS!IT!IS LIKE A WEASEL | dIdx:3 | Vulnerable genes:28/28
Generation: 200: Choosing child: METHINLS IR IS LIJE A UEATFL | dIdx:8 | Vulnerable genes:28/28
Generation: 300: Choosing child: MESHINKS HU IS LIKE @!WFATEL | dIdx:7 | Vulnerable genes:28/28
Generation: 400: Choosing child: MDTHINKS IT IS LIKE A VCASEM | dIdx:5 | Vulnerable genes:28/28
Generation: 500: Choosing child: NETHINKS IT IS LIJE A WE@SEL | dIdx:3 | Vulnerable genes:28/28
Generation: 522: Last child: METHINKS IT IS LIKE A WEASEL | dIdx:0 | Vulnerable genes:0
Reducing numberOfChildrenPerGeneration to 65 from 100 increases the numbers of generations required to reach target from 522 to 537913. This happens because of the probability to get healthy string from a given generation decreases with fewer number of children per generation. This gets compensated by having more number of generations. I don't have a powerful enough machine to render the plot for this simulation. But here is the corresponding configuration & trace.
#Simulation configuration
#sim.config.firstParent=!!!!!!!!!!!!!!!!!!!!!!!!!!!!
sim.config.target=METHINKS IT IS LIKE A WEASEL
sim.config.stableStrandSize=28
sim.config.stableStrandMutationProbability=0.05
sim.config.doRangeControlledMutation=false
sim.config.mutationRange=5
sim.config.numberOfChildrenPerGeneration=65
sim.config.printEveryNthGeneration=100
#Plot configuration
plot.config.generatePlot=true
plot.config.addSimulationParameters=true
...
Generation: 537500: Choosing child: MDTHINKX JU IS!LIKE @ WF@TFL | dIdx:14 | Vulnerable genes:28/28
Generation: 537600: Choosing child: NETFINKS!HT HS LKKG A WEAVEL | dIdx:13 | Vulnerable genes:28/28
Generation: 537700: Choosing child: METIIMKT IT HS"KIJD A WEATFL | dIdx:11 | Vulnerable genes:28/28
Generation: 537800: Choosing child: METHINKU IS!JS LILF A WEBSEL | dIdx:8 | Vulnerable genes:28/28
Generation: 537900: Choosing child: MFTHINKT IT IS LILE A WEAVEL | dIdx:6 | Vulnerable genes:28/28
Generation: 537913: Last child: METHINKS IT IS LIKE A WEASEL | dIdx:0 | Vulnerable genes:0
Simulation test case 2: Base case with maximum space separation via pre-configured parent & target.
To simulate a long walk through the evolutionary Hilbert space requires parent and target strings as far as possible in their Hilbert space. The parent string can be configured via sim.config.firstParent
.
In this test case, rather than truly computing a parent which would be farthest from the target, I selected a far-enough candidate string which has '!' in all of its 28 locations.
Simulation test case 3: Mutation stability unit
Mutation stability: Mutation can happen anywhere. The purpose of this configuration is to allow a group of genes to gain stability by decreasing the probability of their mutation. Both, the size of the group and the probability of the mutation of this group members can be configured via following:
- config.stableStrandSize
- config.stableStrandMutationProbability
Simulation test case 4: Range constrained mutation:
Range-constrained mutation: This configuration prevents 'big' jumps of a mutable unit. Consider a string sitting in the Hilbert space. A random mutation in a random location of this string will relocate this string randomly anywhere along the axis of that location.
Simulation test case 5: Entity with only 3 mutable units
This test case is purely for the convenience of plotting. It is only possible to plot in 3 dimensions. Why not just have entities with that many mutable units (genes).
Source code and simulation scripts:
Other exploratory branches:
References & Further Reading:
- Dawkins, Richard., The Blind Watchmaker, 1986.
- Weasel applet: https://web.archive.org/web/20080514063924/http://home.pacbell.net/s-max/scott/weasel_alt.html
- Essay 'The Panda’s Thumb', by Gould., Stephen: "Evolution can be more strongly supported by evidence of telling imperfections than by evidence of perfection."
Related but unrelated:
- 'The Origin of Species', Darwin, Charles: "If it could be demonstrated that any complex organ existed which could not possibly have been formed by numerous, successive, slight modifications, my theory would absolutely break down."
- 'The Blind Watchmaker'; Dawkins, Richard: "Someone close to me has had a cataract operation in both eyes. She has no lenses in her eyes at all. Without glasses she couldn’t even begin to play lawn tennis or aim a rifle. But she assures me that you are far better off with a lensless eye than with no eye at all."
- "The vast majority of theoretical trajectories through animal (Hilbert) space give rise to impossible monsters. Real animals are dotted around here and there among the hypothetical monsters, each perched in its own unique place in genetic hyperspace. Each real animal is surrounded by a little cluster of neighbours, most of whom have never existed, but a few of whom are its ancestors, its descendants and its cousins."
- "Kangaroos and horses arrived at different endpoints in ‘animal space’, probably because of some accidental difference in their starting points."
- William Shakespeare's Hamlet: