• JUPYTER
  • FAQ
  • View as Code
  • Python 3 Kernel
  • View on GitHub
  • Execute on Binder
  • Download Notebook
  1. pytudes
  2. ipynb

Economics Simulation¶

This is a simulation of an economic marketplace in which there is a population of actors, each of which has a level of wealth. On each time step two actors (chosen by an interaction function) engage in a transaction that exchanges wealth between them (according to a transaction function). The idea is to understand the evolution of the population's wealth over time. I heard about the problem when I visited the Bard College Computer Science Department in 2008. Update: In 2017, a version posed by Uri Wilensky became popular. We cover his version below.

Why is this interesting?

  • It is an example of using simulation to model the world. The model is simple but captures some aspects of a complex world.
  • Many students will have preconceptions about how economies work that will be challenged by the results shown here.
  • It reveals subtle differences between computational thinking, mathematical thinking, and statistical thinking.

Population Distributions¶

We will model a population as a list of N numbers, each number being one actor's wealth. We'll start with a Gaussian distribution (also known as a normal distribution or bell-shaped curve), with a mean wealth of 100 simoleons and a standard deviation of 1/5 the mean:

In [1]:
import random

N  = 5000 # Default size of the population
MU = 100. # Default mean of the population

population = [random.gauss(mu=MU, sigma=MU/5) for _ in range(N)]

Population Statistics and Visualization¶

How evenly is the wealth in a population distributed? The traditional measure is the Gini coefficient, which Wikipedia says is computed by this formula (which assumes the y values are sorted):

Gini

A Gini index of 0 means total equality (everyone has the same amount), and values closer to 1 mean more inequality (most of the money in the hands of a few individuals). Here's a table of Gini coefficients for several countries:

Sweden 0.250
Canada 0.326
Switzerland 0.337
United States 0.408
Chile 0.521
South Africa 0.631

The Gini coefficient is traditionally computed over income, but we will be dealing with wealth. Here is the computation:

In [2]:
def gini(y):
    "Compute the Gini coefficient (a measure of equality/inequality) in a population, y."
    y = sorted(y)
    n = len(y)
    numer = 2 * sum((i+1) * y[i] for i in range(n))
    denom = n * sum(y)
    return (numer / denom) - (n + 1) / n
In [3]:
gini(population)
Out[3]:
0.11285526402866708

We'll define the function hist to plot a histogram of a population. Our hist wraps plt.hist, but with some specific keyword values:

In [4]:
%matplotlib inline
import matplotlib.pyplot as plt

def hist(population, label='pop', **kwargs):
    "A custom version of `hist` with better defaults."
    label = label + ': G=' + str(round(gini(population), 2))
    h = plt.hist(list(population), bins=30, alpha=0.5, label=label, **kwargs)
    plt.xlabel('wealth'); plt.ylabel('count'); plt.grid(True)
    plt.legend()
In [5]:
hist(population)
No description has been provided for this image

Transactions¶

In a transaction, two actors come together and exchange some of their wealth. For now we will use a wealth-conserving transaction function in which all the wealth from both actors is put into a pot, which is then split randomly and uniformly between the two actors:

In [6]:
def random_split(A, B):
    "Take all the money in the pot and divide it randomly between the two actors."
    pot = A + B
    share = random.uniform(0, pot)
    return share, pot - share
In [7]:
random_split(100, 100)
Out[7]:
(39.32170492026867, 160.67829507973133)

Interactions¶

How do we decide which parties interact with each other? We will define an interaction function that, given the size of the population, randomly selects any two actors in the populations (denoted by their index numbers in the list). We'll call this function anyone, meaning that any actor can interact with any other actor:

In [8]:
def anyone(N): return random.sample(range(N), 2)
In [9]:
anyone(N)
Out[9]:
[1593, 1439]

Simulation¶

The function simulate takes an initial population, calls an interaction function to select two actors, and a transaction function to split their wealth, and repeats this T times. After each transaction, we yield the population, so simulate yields the complete history of the simulation.

In [10]:
def step(population, transaction=random_split, interaction=anyone):
    "Modify the population by doing one transaction."
    i, j = interaction(len(population))
    population[i], population[j] = transaction(population[i], population[j]) 
    return population
    
def simulate(population, T, step=step, transaction=random_split, interaction=anyone):
    "Run simulation on population for T transactions; yield population at each time step."
    population = population.copy()
    yield population
    for t in range(T):
        yield step(population, transaction, interaction)

Here is a simple example of simulating a population of 4 actors for 8 time steps:

In [11]:
for pop in simulate([100] * 4, 8):
    print(pop)
[100, 100, 100, 100]
[100, 100, 166.07034848304477, 33.92965151695522]
[100, 129.37411310873642, 166.07034848304477, 4.555538408218797]
[100, 21.916292438917807, 166.07034848304477, 112.01335907803742]
[100, 21.916292438917807, 89.32137618847126, 188.7623313726109]
[88.88749715846127, 33.028795280456535, 89.32137618847126, 188.7623313726109]
[88.88749715846127, 0.008026672184755057, 122.34214479674304, 188.7623313726109]
[88.88749715846127, 0.008026672184755057, 147.86216819550745, 163.24230797384647]
[88.88749715846127, 54.73884147345122, 147.86216819550745, 108.51149317257999]

Simulation Visualization¶

If we want to do larger simulations we'll need a better way to visualize the results. The function show does that:

In [12]:
import statistics

def show(population, k=40, percentiles=(1, 10, 50, 90, 99), **kwargs):
    "Run a simulation for k*N steps, printing statistics and displaying a plot and histogram."
    N = len(population)
    start = list(population)
    results = [(t, sorted(pop)) # Sort results so that percentiles work
               for (t, pop) in enumerate(simulate(population, T=k * N, **kwargs))
               if t % (N / 10) == 0]
    # Printout:
    print('   t    Gini stdev' + (' {:3d}%' * len(percentiles)).format(*percentiles))
    print('------- ---- -----' + ' ----' * len(percentiles))
    fmt = '{:7,d} {:.2f} {:5.1f}' + ' {:4.0f}' * len(percentiles)
    for (t, pop) in results:
        if t % (k * N // 10) == 0: # Print 11 report lines (initial plus 10 as t varies)
            data = [percent(pct, pop) for pct in percentiles]
            print(fmt.format(t, gini(pop), statistics.stdev(pop), *data))

    plt.title('/'.join(map(str, percentiles)) + ' Percentile Plots')
    times = [t for (t, pop) in results]
    plt.hold(True); plt.xlabel('wealth'); plt.ylabel('time'); plt.grid(True)
    for pct in percentiles:
        line = [percent(pct, pop) for (t, pop) in results]
        plt.plot(line, times)
    plt.show()
    
    plt.title('Histograms')    
    R = (min(pop+start), max(pop+start))
    hist(start, 'start', range=R)
    hist(pop, 'end', range=R)
    plt.show()

    plt.title('Ordered Curves')
    order = list(range(len(pop)))
    start.sort()
    pop.sort()
    plt.plot(sorted(start), order, label='start')
    plt.plot(sorted(pop), order, label='end')
    plt.xlabel('wealth'); plt.ylabel('order'); plt.grid(True)
    plt.legend()

                
def percent(pct, items):
    "The item that is pct percent through the sorted list of items."
    return items[min(len(items)-1, len(items) * pct // 100)]
In [13]:
show(population)
   t    Gini stdev   1%  10%  50%  90%  99%
------- ---- ----- ---- ---- ---- ---- ----
      0 0.11  20.1   54   75  100  126  148
 20,000 0.50  99.4    1   11   71  231  454
 40,000 0.49  98.0    1   11   71  231  452
 60,000 0.50  99.1    1   10   69  233  441
 80,000 0.49  96.6    1   11   71  230  442
100,000 0.50 100.4    1   11   69  230  465
120,000 0.50 101.1    1   10   69  230  468
140,000 0.50 101.8    1   10   69  230  466
160,000 0.50 102.7    1   11   68  227  477
180,000 0.49  98.4    1   11   72  229  450
200,000 0.49  99.0    1   11   70  231  464
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image

There are four parts to this output:

The printout: For the starting population and for every 20,000 transactions along the way, we print the Gini coefficient and standard deviation of the population, and the wealths at five percentile points in the population: the 1%, 10%, 50% (median), 90% and 99% marks.

The plots: This shows the same information as the printout (except for the Gini index), but with more data points along the way. The leftmost (blue) line is the 1% mark, the rightmost (purple) is the 99% mark, and the inner lines are the 10%, 50% and 90% marks, respectively. For the plot, time goes from bottom to top rather than top to bottom. So, the 99% (purple) line starts at around 150, and over time increases to over 400, indicating that the richest 1% are getting richer. The fact that the lines are going more or less straight up after about 50,000 transactions suggests that the system has converged.

The histograms: The starting and ending populations are plotted as histograms.

The ordered curves: Here the initial (blue) and final (green) populations are sorted, and the curves show wealth versus ordinal number. The poorest actor (ordinal number 0) has wealth 0 in both the initial and final populations. The 2000th poorest actor (a bit below the median; at the 40th percentile) has wealth of almost 100 in the initial population, but only about 50 in the final population.

The results show that income inequality is increasing over time. How can you tell? Because the Gini coefficient is increasing over time, the standard deviation is increasing, and the 1% and 10% marks are decreasing (the blue and olive lines are moving left as time increases) while the 90% and 99% marks are increasing (the aqua and purple lines are moving right as time increases).

Would the population continue to change if we let the simulation run longer? It looks like only the 1% line is changing, the other lines remain pretty much in one place from about T=15,000 to T=25,000. This suggests that running the simulation longer would not have too much effect.

Effect of Starting Population¶

What happens to the final result if we vary the starting population? I'll introduce the function samples to sample from a distribution function n times, normalizing the result to have the specified mean:

In [14]:
def samples(distribution, *args, n=N, mu=MU):
    "Sample from the distribution n times, then normalize results to have mean mu."
    numbers = [distribution(*args) for _ in range(N)]
    return normalize(numbers, mu)

def normalize(numbers, mu):
    "Make the numbers non-negative, and scale them so they have mean mu."
    numbers = [max(0, n) for n in numbers]
    factor = len(numbers) * mu / sum(numbers)
    return [x * factor for x in numbers]

Now we can easily make an initial population from a distribution function. I'll start with a uniform distribution:

In [15]:
show(samples(random.uniform, 0, 200))
   t    Gini stdev   1%  10%  50%  90%  99%
------- ---- ----- ---- ---- ---- ---- ----
      0 0.33  57.9    2   19  102  181  200
 20,000 0.49  97.1    1   11   72  224  447
 40,000 0.50 102.7    1   10   69  229  486
 60,000 0.50 101.1    1   11   68  233  460
 80,000 0.50 101.0    1   10   69  229  490
100,000 0.51 101.7    1   10   67  234  465
120,000 0.50  99.7    1   10   70  229  462
140,000 0.50  99.5    1   11   71  229  444
160,000 0.51 102.6    1   10   66  235  469
180,000 0.50 102.3    1   10   68  229  455
200,000 0.51 102.3    1   10   70  232  472
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image

And try a constant distribution, where everyone starts out the same:

In [16]:
def constant(mu=MU): return mu

show(samples(constant))
   t    Gini stdev   1%  10%  50%  90%  99%
------- ---- ----- ---- ---- ---- ---- ----
      0 0.00   0.0  100  100  100  100  100
 20,000 0.48  94.0    1   11   72  226  424
 40,000 0.49  97.3    1   11   71  235  446
 60,000 0.50  98.1    1   11   68  233  443
 80,000 0.50  99.1    1   11   69  227  472
100,000 0.50  99.2    1   11   68  229  452
120,000 0.50 100.9    1   11   67  235  444
140,000 0.50 100.9    1   11   70  228  472
160,000 0.50 101.3    1   10   70  229  474
180,000 0.50 100.4    1   11   70  226  455
200,000 0.51 101.6    1    9   68  233  471
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image

The resulting histogram looks different, but only because the starting distribution is so narrow and tall; the end distribution has a Gini coefficient of about 1/2 and standard deviation of about 100, just like we get from the other starting distributions.

Here is one that statisticians call the beta distribution (with carefully chosen parameters):

In [17]:
def beta(): return random.betavariate(0.9, 12)
    
show(samples(beta))
   t    Gini stdev   1%  10%  50%  90%  99%
------- ---- ----- ---- ---- ---- ---- ----
      0 0.50  97.5    1    9   71  230  442
 20,000 0.50 101.2    1   10   68  237  474
 40,000 0.50  98.8    1   11   70  230  451
 60,000 0.50 100.6    1   11   68  232  460
 80,000 0.50  98.3    1   10   71  229  444
100,000 0.50  99.1    1   11   71  228  457
120,000 0.50  99.3    1   11   70  232  450
140,000 0.51 100.7    1   10   68  232  465
160,000 0.49  98.2    1   11   70  232  445
180,000 0.51 101.7    1   10   69  233  480
200,000 0.51 103.2    1   10   67  233  494
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image

Surprise: We can confirm that the starting population doesn't matter much. I thought it would make a real difference, but we showed that three very different starting populations—Gaussian, uniform, and beta—all ended up with very similar final populations; all with G around 1/2 and standard deviation around 100. The final distribution in all three cases looks similar to the normalized beta(0.9, 12) distribution.

Effect of Transaction Function¶

Does the transaction function have an effect on the outcome? So far we've only used the random_split transaction function; we'll now compare that to the winner_take_all function, in which the wealth from both actors is thrown into a pot, and one of them takes all of it:

In [18]:
def winner_take_all(A, B): return random.choice(([A + B, 0], [0, A + B]))
In [19]:
show(population, transaction=winner_take_all)
   t    Gini stdev   1%  10%  50%  90%  99%
------- ---- ----- ---- ---- ---- ---- ----
      0 0.11  20.1   54   75  100  126  148
 20,000 0.89 287.7    0    0    0  364 1448
 40,000 0.94 398.9    0    0    0  111 2135
 60,000 0.96 475.2    0    0    0    0 2576
 80,000 0.97 560.1    0    0    0    0 3066
100,000 0.97 629.9    0    0    0    0 3346
120,000 0.98 683.4    0    0    0    0 3569
140,000 0.98 729.2    0    0    0    0 3843
160,000 0.98 800.1    0    0    0    0 4144
180,000 0.99 859.6    0    0    0    0 4140
200,000 0.99 902.2    0    0    0    0 4140
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image

Now the results look very different: most of the wealth goes to the 99th percentile (purple line on the far right of the plot), with everybody else getting wiped out (although the 90th percentile holds out until around 50,000 transactions). The Gini coefficient is all the way up to 0.99 and the standard deviation is over 800, and still rising.

That makes sense: any time two actors with non-zero wealth interact, one of them will end up with zero—the number of actors with zero wealth increases monotonically until all the wealth is with one actor, and from then on the wealth just gets swapped around.

At the other end of the spectrum, let's try a transaction function, redistribute, that taxes both parties 31% (the average income tax rate in the US) and splits that tax revenue evenly among the two parties; the non-taxed part is split with random_split:

In [20]:
def redistribute(A, B, rate=0.31):
    "Tax both parties at rate; split the tax revenue evenly, and randomly split the rest."
    tax = rate * (A + B)
    Arand, Brand = random_split(A + B - tax, 0)
    return tax / 2 + Arand, tax / 2 + Brand
In [21]:
show(population, transaction=redistribute)
   t    Gini stdev   1%  10%  50%  90%  99%
------- ---- ----- ---- ---- ---- ---- ----
      0 0.11  20.1   54   75  100  126  148
 20,000 0.32  59.8   18   36   88  183  285
 40,000 0.33  62.1   16   34   86  187  289
 60,000 0.33  61.9   18   35   85  185  297
 80,000 0.33  60.8   17   35   87  181  287
100,000 0.33  61.7   17   34   87  183  297
120,000 0.33  61.2   18   35   88  183  291
140,000 0.33  61.1   16   35   87  185  291
160,000 0.33  61.6   17   35   86  186  294
180,000 0.33  61.6   17   35   86  183  301
200,000 0.33  61.3   17   34   87  187  295
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image

Another surprise: This transaction function does indeed lead to less inequality than split_randomly or winner_take_all, but surprisingly (to me) it still increases inequality compared to the initial (Gaussian) population.

Here's one more interaction function, status_quo, in which both actors keep half of their wealth out of the transaction, and the other half is randomly split using a triangular distribution in such a way that the most likely outcome is that each actor keeps what they started with, but from there probability falls off on either side, making larger and larger deviations from the status quo less and less likely:

In [22]:
def status_quo(A, B):
    "A transaction that is most likely to leave things unchanged, but could move any amount of wealth around."
    a = random.triangular(0, (A + B) / 2, A / 2)
    return (A / 2 + a), (A + B) - (A / 2 + a)
In [23]:
show(population, transaction=status_quo)
   t    Gini stdev   1%  10%  50%  90%  99%
------- ---- ----- ---- ---- ---- ---- ----
      0 0.11  20.1   54   75  100  126  148
 20,000 0.21  38.5   34   56   95  152  210
 40,000 0.22  40.5   31   53   95  155  222
 60,000 0.22  40.3   31   53   95  155  217
 80,000 0.23  41.3   31   52   95  154  222
100,000 0.23  41.8   32   52   94  158  224
120,000 0.23  41.0   32   53   95  157  223
140,000 0.23  40.9   31   53   94  155  216
160,000 0.23  41.2   31   53   94  156  223
180,000 0.23  40.9   31   52   95  156  219
200,000 0.22  40.6   31   53   95  155  219
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image

The status_quo transaction increases inequality from the initial population, but not as much as the other transaction functions.

Effect of Interaction Function¶

We have been using anyone as our interaction function: anyone can enter into a transaction with anyone else. Suppose that transactions are constrained to be local—that you can only do business with your close neighbors. Will that make income more equitable, because there will be no large, global conglomorates?

In [24]:
def neighborhood(n, width=5): 
    "Choose two agents in the same neighborhood"
    i = random.randrange(n - width)
    return random.sample(range(i, i + width + 1), 2)
In [25]:
show(population, interaction=neighborhood)
   t    Gini stdev   1%  10%  50%  90%  99%
------- ---- ----- ---- ---- ---- ---- ----
      0 0.11  20.1   54   75  100  126  148
 20,000 0.48  94.6    1   12   73  223  431
 40,000 0.49  96.2    1   11   71  232  428
 60,000 0.49  95.4    1   11   73  226  426
 80,000 0.50  98.6    1   11   69  234  449
100,000 0.49  98.4    1   11   71  228  465
120,000 0.49  96.7    1   11   72  233  445
140,000 0.50  99.3    1   10   70  229  460
160,000 0.49  97.8    1   11   70  231  442
180,000 0.50  98.3    1   11   70  233  461
200,000 0.49  96.4    1   11   72  226  437
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image

Surprise: The neighborhood interaction is not too different from the anyone interaction.

Let's get even more local, allowing trade only with your immediate neighbor (to either side):

In [26]:
def adjacent(n): return neighborhood(n, 1)
    
show(population, interaction=adjacent)
   t    Gini stdev   1%  10%  50%  90%  99%
------- ---- ----- ---- ---- ---- ---- ----
      0 0.11  20.1   54   75  100  126  148
 20,000 0.45  85.3    1   13   79  218  373
 40,000 0.47  90.1    1   11   76  224  407
 60,000 0.48  92.0    1   12   73  228  404
 80,000 0.48  92.5    1   12   73  231  406
100,000 0.48  94.3    1   12   71  227  421
120,000 0.48  94.1    1   12   73  226  437
140,000 0.49  95.7    1   12   71  236  421
160,000 0.49  97.5    1   11   71  231  454
180,000 0.48  95.0    1   12   72  227  434
200,000 0.50  97.8    1   11   70  235  442
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image

It is still surprising that we still have no effect from restricting trade.

United States Distribution¶

We've drawn from mathematical distributions; let's look at the actual distribution of family income in the United States. Each row in the following table is a tuple giving the lower bound and upper bound (in thousands of dollars of income), followed by the cumulative percentage of families in the row or a previous row. The table I got this from actually had "$250,000 or above" as the final row; I had to cut it off somewhere, and arbitrarily chose $300,000.

In [27]:
USA_table = [  
  (0,    10,  7.63),
  (10,   20, 19.20),
  (20,   30, 30.50),
  (30,   40, 41.08),
  (40,   50, 49.95),
  (50,   60, 57.73),
  (60,   70, 64.56),
  (70,   80, 70.39),
  (80,   90, 75.02),
  (90,  100, 79.02),
  (100, 110, 82.57),
  (110, 120, 85.29),
  (120, 130, 87.60),
  (130, 140, 89.36),
  (140, 150, 90.95),
  (150, 160, 92.52),
  (160, 170, 93.60),
  (170, 180, 94.55),
  (180, 190, 95.23),
  (190, 200, 95.80),
  (200, 250, 97.70),
  (250, 300, 100.0)]

def USA():
    "Sample from the USA distribution."
    p = random.uniform(0, 100)
    for (lo, hi, cum_pct) in USA_table:
        if p <= cum_pct:
            return random.uniform(lo, hi) 

Let's see what it looks like:

In [28]:
hist(samples(USA), label='USA')
No description has been provided for this image

Hey—that looks like the beta distribution. Let's compare:

In [29]:
hist(samples(beta), label='beta')
hist(samples(USA), label='USA')
No description has been provided for this image
In [30]:
show(samples(USA))
   t    Gini stdev   1%  10%  50%  90%  99%
------- ---- ----- ---- ---- ---- ---- ----
      0 0.45  88.3    2   18   75  217  420
 20,000 0.50 100.3    1   10   69  228  451
 40,000 0.50 102.5    1   11   69  229  468
 60,000 0.50 101.6    1   11   69  231  481
 80,000 0.50 100.6    1   10   71  223  480
100,000 0.50 100.5    1   11   69  229  457
120,000 0.50  99.0    1   11   70  228  465
140,000 0.50  99.2    1   10   71  225  462
160,000 0.50 101.5    1   10   67  232  461
180,000 0.50 100.1    1   10   70  237  448
200,000 0.49  98.1    1   12   71  228  455
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image

The USA distribution is indeed similar to the beta(0.9, 12) distribution, and to the stationary ending distribution.

Uri Wilensky Version¶

Another version of this simulation made the rounds in 2017. This version has these rules:

Imagine a room full of 100 people with 100 dollars each. With every tick of the clock, every person with money gives a dollar to one randomly chosen other person. After some time progresses, how will the money be distributed?

To implement this all-at-once transactions rather than one-at-a-time transactions, I'll define a new step function:

In [31]:
def give_dollar(population, transaction, interaction):
    "Everyone with at least a dollar gives a dollar to a random person."
    N = len(population)
    for i in range(N):
        if population[i] >= 1:
            population[i] -= 1
            population[random.randrange(N)] += 1
    return population

show([100] * 100, k=200, step=give_dollar)
   t    Gini stdev   1%  10%  50%  90%  99%
------- ---- ----- ---- ---- ---- ---- ----
      0 0.00   0.0  100  100  100  100  100
  2,000 0.24  43.4   16   48   93  165  208
  4,000 0.33  58.7    3   27   96  185  247
  6,000 0.37  66.1    2   15   99  188  326
  8,000 0.38  68.2    6   13   91  178  368
 10,000 0.40  72.5    2   15   85  200  321
 12,000 0.42  75.7    1   12   92  210  351
 14,000 0.41  74.2    3   15   79  230  295
 16,000 0.43  78.9    2   18   83  220  354
 18,000 0.45  84.9    5   16   78  227  424
 20,000 0.50  93.7    2    7   70  230  395
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image

So we see that inequality rises from a Gini of 0 to about 1/2 over 20,000 time steps. Let's try again starting from the first 100 actors from our initial (Gaussian) population:

In [32]:
show(population[:100], k=200, step=give_dollar)
   t    Gini stdev   1%  10%  50%  90%  99%
------- ---- ----- ---- ---- ---- ---- ----
      0 0.12  19.4   58   71   95  119  151
  2,000 0.29  48.1   10   35   91  169  205
  4,000 0.39  66.5    3   13   88  187  329
  6,000 0.40  67.7    5   11   97  194  283
  8,000 0.39  65.8    3   13   87  179  274
 10,000 0.38  64.5    3   21   84  187  294
 12,000 0.40  71.8    5   27   78  187  376
 14,000 0.44  80.8    2   13   70  201  460
 16,000 0.47  84.3    2   11   80  193  443
 18,000 0.45  81.7    5   13   79  202  421
 20,000 0.45  80.6    0   10   79  204  400
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image

A Mathematician, a Statistician, and a Programmer walk into a problem ...¶

In 2013, mathematician George Andrew's editorial Drowning in the Data Deluge complains of an "overblown enthusiasm for data analysis." The tone was that this new fad for "big data" was taking away from traditional mathematics.

Two Stanford professors, mathematician Persi Diaconis and statistician Susan Holmes, were more accepting of new ideas. The three of us got to discussing Andrew's editorial and the differences between mathematical, statistical, and computational thinking. At the time, I had just heard about the economics problem covered in this notebook, and I suggested the three of us work on it, and compare approaches and results. In the end, all three of us found similar results, in that we all identified the shape of the final distribution. But there were differences in how we got there:

Mathematical thinking (Persi):

  • Tool: Paper and pencil.
  • Notes: The process can be modeled by a Markov chain, using the same techniques as in this paper. In the limit, there is a difference between the continuous case (where money is a real number, and the distribution is stationary), and the discrete case (where money comes in integer amounts and the distribution is not ergodic as there is an absorbing state).

Statistical thinking (Susan):

  • Tool: Simulation in R, with N=10 actors for T=1,000 transactions.
  • Notes: this is extremely similar to what happens for random genetic drift in generations, that also gives clumping (and extinction of some of the alleles).

Computational thinking (Peter):

  • Tool: Simulation in IPython, with N=5000 actors for T=200,000 transactions.
  • Notes: Results are very similar to Susan's, but with more variations explored and more pretty pictures.

This website does not host notebooks, it only renders notebooks available on other websites.

Delivered by Fastly, Rendered by OVHcloud

nbviewer GitHub repository.

nbconvert version: 7.16.6

Rendered (Mon, 01 Dec 2025 10:28:29 UTC)