Peter Norvig
June 2021

Split the States¶

The 538 Riddler for 11 June 2021 poses this problem (paraphrased):

Given a map of the lower 48 states of the USA, remove a subset of the states so that the map is cut into two disjoint contiguous regions that are near-halves by area. Since Michigan’s upper and lower peninsulas are non-contiguous, you can treat them as two separate "states," for a total of 49. "Disjoint" means the two regions are distinct and not adjacent (due to the cut).

Since "near-halves" is vague, answer the two questions:

What states should you remove to maximize the area of the smaller of the two regions?
What states should you remove to minimize the difference of the areas of the two regions?

Vocabulary terms¶

Let's start by clarifying some concepts:

State: denoted by the standard 2-letter abbreviations like 'CA' (plus 'UP' and 'LP' for the Michigan peninsulas).
States: a set of states; implemented as a (hashable) frozenset. I can use states('OR CA').
Region: a set of states that are contiguous—they are all connected by a single tree of neighbor relations.
Neighbor: a relation saying two states share a border. Implemented as adjacency sets in the dict neighbors.
Cut: a set of states that, when removed from the map, cuts the map into disjoint regions.
Split: a tuple (A, B, C), where A and B are the two regions made by the cut C, with A larger than B in area.
Border: the states on the edge of the map (neighbors of Canada, Mexico, Atlantic, or Pacific).
Area: Each state has an area, given (in square miles) by the areas dict. A region has a total area, given by the function area.

For example, one cut is {NM, OK, MO, IL}, which partitions the states into groups A and B like this (ignore AK and HI):

Credit goes to mapchart.net, which is the only map-coloring site I've found that allows you to split Michigan.

Here is the code to implement the vocabulary concepts (with the help of two sites that had data on state's neighbors and area):

In [1]:

from typing import *

State  = str         # Two-letter abbrerviation
States = frozenset   # A set of states
Region = States      # A contiguous set of states (must be hashable)
Split  = Tuple[Region, Region, Region] # (A, B, C) = (large region, small region, cut)

def states(string: str)  -> States: "Set of states";  return States(string.split())
def statedict(**dic)     -> dict:   "{State:States}"; return {s: states(dic[s]) for s in dic}
def area(states: States) -> int:    "Total area";     return sum(areas[s] for s in states)

neighbors = statedict(
    AK='', AL='FL GA MS TN', AR='LA MO MS OK TN TX', AZ='CA CO NM NV UT', CA='AZ NV OR', 
    CO='AZ KS NE NM OK UT WY', CT='MA NY RI', DC='MD VA', DE='MD NJ PA', FL='AL GA', 
    GA='AL FL NC SC TN', HI='', IA='IL MN MO NE SD WI', ID='MT NV OR UT WA WY', IL='IA IN KY MO WI', 
    IN='IL KY LP MI OH', KS='CO MO NE OK', KY='IL IN MO OH TN VA WV', LA='AR MS TX', 
    MA='CT NH NY RI VT', MD='DC DE PA VA WV', ME='NH', MI='IN OH WI', MN='IA ND SD WI', 
    MO='AR IA IL KS KY NE OK TN', MS='AL AR LA TN', MT='ID ND SD WY', NC='GA SC TN VA', 
    ND='MN MT SD', NE='CO IA KS MO SD WY', NH='MA ME VT', NJ='DE NY PA', NM='AZ CO OK TX UT', 
    NV='AZ CA ID OR UT', NY='CT MA NJ PA VT', OH='IN KY LP MI PA WV', OK='AR CO KS MO NM TX', 
    OR='CA ID NV WA', PA='DE MD NJ NY OH WV', RI='CT MA', SC='GA NC', SD='IA MN MT ND NE WY', 
    TN='AL AR GA KY MO MS NC VA', TX='AR LA NM OK', UT='AZ CO ID NM NV WY', VA='DC KY MD NC TN WV', 
    VT='MA NH NY', WA='ID OR', WI='IA IL MI MN UP', WV='KY MD OH PA VA', WY='CO ID MT NE SD UT', 
    UP='WI', LP='IN OH')


areas = dict(
    AK=665384, AL=52420,  AZ=113990, AR=53179, CA=163695, CO=104094, CT=5543,  DE=2489,   DC=68, 
    FL=65758,  GA=59425,  HI=10932,  ID=83569, IL=57914,  IN=36420,  IA=56273, KS=82278,  KY=40408, 
    LA=52378,  ME=35380,  MD=12406,  MA=10554, MI=96714,  MN=86936,  MS=48432, MO=69707,  MT=147040, 
    NE=77348,  NV=110572, NH=9349,   NJ=8723,  NM=121590, NY=54555,  NC=53819, ND=70698,  OH=44826, 
    OK=69899,  OR=98379,  PA=46054,  RI=1545,  SC=32020,  SD=77116,  TN=42144, TX=268596, UT=84897, 
    VT=9616,   VA=42775,  WA=71298,  WV=24230, WI=65496,  WY=97813,  UP=16377, LP=80337)  

# Borders:
north  = states('WA ID MT ND MN WI MI UP IL IN LP OH PA NY VT NH ME') 
south  = states('CA AZ NM TX LA MS AL FL')                   
west   = states('WA OR CA')
east   = states('ME NH MA RI CT NY NJ DE MD VA NC SC GA FL')
border = north | south | west | east

# "Countries":
usa   = States(areas)                      # 50 states plus UP, LP, DC
usa50 = usa   - states('DC UP LP')         # 50 actual US states
usa48 = usa50 - states('AK HI')            # 48 contiguous states
usa49 = usa   - states('DC AK HI MI')      # 49 "states": MI split into UP, LP
western = states('WA OR CA ID NV UT AZ MT WY CO NM') # The 11 states west of the Rockies

Strategy for answering the two questions¶

My overall strategy:

Generate a large number of cuts.
For each cut C, determine the split into regions A and B.
From the valid splits, find the ones that:
1. Maximize the area of B
2. Minimize the difference in area of A and B.

Making cuts¶

Is it feasible to consider all possible cuts? A cut is a subset of the 49 states, so there are 2⁴⁹ or 500 trillion possible cuts, so no, we can't look at them all. I have four ideas to reduce the number of cuts considered:

Limit the total area in a cut. A large area in the cut means there won't be much area left to make B big for question 1. By default, I'll limit the area of the cut to be 1/5 the area of the country.
Limit the number of states in a cut. Similarly, if there are too many states in a cut, there won't be many left for A or B. By default, the limit is 8.
Make cuts contiguous. Noncontiguous cuts can't be optimal for question 1, so for now I won't consider them (although they can provide a good answer for question 2).
Make cuts go border-to-border. A cut can produce exactly two regions only if (a) the cut runs from one place on the border to another place on the border or (b) the cut forms a "donut" that surrounds some interior region. The US map isn't big enough to support a decent-sized donut (only KS and NE are not neighbors of a border state). Also, the US map seems to be too wide to have a good East-West cut, so I'll start with cuts that go from the northern border to the southern border.

The function make_cuts starts by building a set of partial cuts where each cut initially contains a single start state. Then in each iteration of the while loop, it yields any cut that has reached an end state, and creates a new set of cuts formed by adding a neighboring state to a current cut in all possible ways, as long as the area does not exceed maxarea and the size does not exceed maxsize.

In [2]:

def make_cuts(country, maxsize=8, maxarea=area(usa48)//5, start=north, end=south) -> Iterator[Region]:
    """All contiguous regions up to `maxsize` and `maxarea` that contain a `start` and `end` state."""
    cuts = {Region({s}) for s in start & country} 
    while cuts:
        yield from filter(end.intersection, cuts) # A cut is good if it contains an `end` state.
        cuts = {cut | {s1}
                   for cut in cuts if len(cut) < maxsize 
                   for s in cut 
                   for s1 in (neighbors[s] & country) - cut
                   if area(cut) + areas[s1] <= maxarea}

For example, the north-south cuts of size up to 3 states:

In [3]:

set(make_cuts(usa49, 3))

Out[3]:

{frozenset({'AZ', 'ID', 'UT'}),
 frozenset({'AZ', 'ID', 'NV'}),
 frozenset({'CA', 'ID', 'NV'}),
 frozenset({'CA', 'OR', 'WA'}),
 frozenset({'ID', 'NM', 'UT'}),
 frozenset({'CA', 'ID', 'OR'})}

How many cuts are there in usa49 (using the default parameter values)?

In [4]:

len(list(make_cuts(usa49)))

Out[4]:

That's a more manageable number than 500 trillion.

Making splits¶

Given some candidate cuts, the function make_splits creates splits: tuples (A, B, C), where A and B are the two regions defined by the cut C, with A larger than B. A split requires that the cut divides the country into exactly two regions, but not all cuts do that. For example, the cut {WA, OR, CA} leaves only one region (consisting of all the other states). The cut {ID, OR, NV, AZ} leaves three regions: {WA}, {CA}, and {everything else}. To verify whether the cut makes two regions, make_splits first finds a maximal-size contiguous region A from the non-cut states, then finds a region B from the remaining states, then checks that A and B are non-empty and make up all the non-cut states.

We find the extent of a region with the contiguous function, which implements a flood fill algorithm: it maintains a mutable region and a frontier of the states that neighbor the region but are not in the region. We iterate adding the frontier into the region and computing a new frontier until there is no new frontier.

In [5]:

def make_splits(country, cuts: Iterable[Region]) -> Iterator[Split]:
    """For each cut C, find regions A and B and yield the Split (A, B, C) if valid."""
    for C in cuts:
        AB = country - C
        A = contiguous(AB) 
        B = contiguous(AB - A)
        if A and B and (A | B | C == country):
            B, A = sorted([A, B], key=area) # Ensure A larger than B in area
            yield (A, B, C)
            
def contiguous(states: States) -> Region:
    """Starting at one of the states, expand out to all contiguous states; return them."""
    region   = set() 
    frontier = {min(states)} if states else None
    while frontier:
        region |= frontier
        frontier = {s1 for s in frontier for s1 in neighbors[s]
                    if s1 in states and s1 not in region}
    return Region(region)

For example, here are the size-3 cuts of the western states, and the splits (A, B, C) from those cuts:

In [6]:

list(make_cuts(western, 3))

Out[6]:

[frozenset({'AZ', 'ID', 'UT'}),
 frozenset({'AZ', 'ID', 'NV'}),
 frozenset({'CA', 'ID', 'NV'}),
 frozenset({'CA', 'OR', 'WA'}),
 frozenset({'CA', 'ID', 'OR'}),
 frozenset({'ID', 'NM', 'UT'})]

In [7]:

list(make_splits(western, make_cuts(western, 3)))

Out[7]:

[(frozenset({'CO', 'MT', 'NM', 'WY'}),
  frozenset({'CA', 'NV', 'OR', 'WA'}),
  frozenset({'AZ', 'ID', 'UT'})),
 (frozenset({'CO', 'MT', 'NM', 'UT', 'WY'}),
  frozenset({'CA', 'OR', 'WA'}),
  frozenset({'AZ', 'ID', 'NV'})),
 (frozenset({'AZ', 'CO', 'MT', 'NM', 'UT', 'WY'}),
  frozenset({'OR', 'WA'}),
  frozenset({'CA', 'ID', 'NV'})),
 (frozenset({'AZ', 'CO', 'MT', 'NM', 'NV', 'UT', 'WY'}),
  frozenset({'WA'}),
  frozenset({'CA', 'ID', 'OR'}))]

The answers¶

The function answers puts it all together: makes cuts; makes splits from those cuts; finds the splits that answer the two questions; and prints the results. For A, B, and C, it prints the total area, the percentage of the country's area, the number of states in the region, and the set of states in the region. Then it prints the delta between A and B's area.

In [8]:

def answers(country, maxsize=8, maxarea=area(usa49)//5, start=north, end=south) -> Optional[tuple]:
    """Find the splits that answer the 2 questions.
    Print information in pretty format if 'print' is in `do`.
    Return the tuple (cuts, splits, answer1, answer2) if 'return' is in `do`."""
    cuts    = list(make_cuts(country, maxsize, maxarea, start, end))
    splits  = list(make_splits(country, cuts))
    answer1 = max(splits, key=lambda s: area(s[1]))
    answer2 = min(splits, key=lambda s: area(s[0]) - area(s[1]))
    print(f'{len(country)} states ⇒ {len(cuts):,d} cuts',
          f'(cut size ≤ {maxsize}, cut area ≤ {maxarea:,d}) ⇒ {len(splits):,d} splits.')
    show('1. Split that maximizes area(B)',               country, answer1)
    show('2. Split that minimizes ∆ = area(A) - area(B)', country, answer2)
    
def show(title, country, split) -> None:
    """Print a title, and a summary of the split in four rows. The columns shown are:
    'region name|area|percent of country area|number of states in region|states in region'.
    The ∆ row of the table is not a region; it is the difference in area between A and B."""
    A, B, C = split
    def print_row(name, region, sqmi): 
        statelist = f'{len(region):2d}|{" ".join(sorted(region))}' if region else ''
        print(f'{name}|{sqmi:9,d}|{sqmi/area(country):7.3%}|{statelist}')
    print(f'\n{title}:')
    print_row('A', A,  area(A))
    print_row('B', B,  area(B))
    print_row('C', C,  area(C))
    print_row('∆', '', area(A) - area(B))

In [9]:

answers(usa49)

49 states ⇒ 41,700 cuts (cut size ≤ 8, cut area ≤ 624,072) ⇒ 13,146 splits.

1. Split that maximizes area(B):
A|1,345,558|43.122%|29|AL AR CT DE FL GA IN KS KY LA LP MA MD ME MS NC NH NJ NY OH OK PA RI SC TN TX VA VT WV
B|1,344,149|43.077%|15|AZ CA IA ID MN MT ND NV OR SD UP UT WA WI WY
C|  430,653|13.801%| 5|CO IL MO NE NM
∆|    1,409| 0.045%|

2. Split that minimizes ∆ = area(A) - area(B):
A|1,267,033|40.605%|14|AZ CA IA ID MN MT ND NV OR UP UT WA WI WY
B|1,266,994|40.604%|27|AL AR CT DE FL GA KS KY LA LP MA MD ME MS NC NH NJ NY OH OK PA RI SC TX VA VT WV
C|  586,333|18.791%| 8|CO IL IN MO NE NM SD TN
∆|       39| 0.001%|

Here are maps of those two cuts, representing the best answers to questions 1 and 2 with my default parameter values:

Improving on question 1¶

The {CO,IL,MO,NE,NM} cut gave us two regions with 43% of the area each. But we limited cuts to 8 states, so this may not be the best possible result. If we looked at all possible cuts, the run time would be too long. Fortunately, we can tighten the area constraint. We saw above that the cut {CO,IL,MO,NE,NM} produces a region B with area 1,344,149, so that means that any cut that is better for question 1 must create a split where the areas of both A and B are greater than 1,344,149, Therefore, we can reduce the maxarea for the cut from area(usa49)/5 (which is 624,072) down to:

In [10]:

maxarea = area(usa49) - 2 * 1344149
maxarea

Out[10]:

With the tight area constraint, we don't need the restictions on maxsize, start, and end. This will be the first computation that takes more than a few seconds.

In [11]:

%time answers(usa49, maxsize=49, maxarea=maxarea, start=border, end=border)

49 states ⇒ 547,779 cuts (cut size ≤ 49, cut area ≤ 432,062) ⇒ 42,685 splits.

1. Split that maximizes area(B):
A|1,345,558|43.122%|29|AL AR CT DE FL GA IN KS KY LA LP MA MD ME MS NC NH NJ NY OH OK PA RI SC TN TX VA VT WV
B|1,344,149|43.077%|15|AZ CA IA ID MN MT ND NV OR SD UP UT WA WI WY
C|  430,653|13.801%| 5|CO IL MO NE NM
∆|    1,409| 0.045%|

2. Split that minimizes ∆ = area(A) - area(B):
A|1,345,558|43.122%|29|AL AR CT DE FL GA IN KS KY LA LP MA MD ME MS NC NH NJ NY OH OK PA RI SC TN TX VA VT WV
B|1,344,149|43.077%|15|AZ CA IA ID MN MT ND NV OR SD UP UT WA WI WY
C|  430,653|13.801%| 5|CO IL MO NE NM
∆|    1,409| 0.045%|
CPU times: user 36.3 s, sys: 437 ms, total: 36.8 s
Wall time: 37.3 s

This confirms that {CO,IL,MO,NE,NM} is indeed the optimal cut for question 1.

Improving on question 2¶

For question 1, good solutions necessarily have a small cut, but for question 2 we might need a big cut. I'll try going up to 10 states:

In [12]:

%time answers(usa49, maxsize=10, maxarea=1_000_000)

49 states ⇒ 1,180,878 cuts (cut size ≤ 10, cut area ≤ 1,000,000) ⇒ 266,863 splits.

1. Split that maximizes area(B):
A|1,345,558|43.122%|29|AL AR CT DE FL GA IN KS KY LA LP MA MD ME MS NC NH NJ NY OH OK PA RI SC TN TX VA VT WV
B|1,344,149|43.077%|15|AZ CA IA ID MN MT ND NV OR SD UP UT WA WI WY
C|  430,653|13.801%| 5|CO IL MO NE NM
∆|    1,409| 0.045%|

2. Split that minimizes ∆ = area(A) - area(B):
A|1,161,198|37.214%|29|AL CT DE FL GA IA IL IN LP MA MD ME MN NC ND NH NJ NY OH PA RI SC SD TN UP VA VT WI WV
B|1,161,195|37.213%|10|AZ CA CO KS LA NM OR TX UT WA
C|  797,967|25.573%|10|AR ID KY MO MS MT NE NV OK WY
∆|        3| 0.000%|
CPU times: user 1min 3s, sys: 1.05 s, total: 1min 4s
Wall time: 1min 4s

This gives us two regions that differ by only 3 square miles, which seems pretty good!

Could we can find two regions with exactly equal areas? It would be inefficient to enumerate all cuts for much more than 10 states. Instead I'll create two lists of regions As and Bs. I can then determine if there are two regions of equal size in time proportional to the sum of their lengths (not the product) by creating a table of {area: region} entries for all the As and checking if any of the Bs have an area that is in the table. (We also need to check that the two regions are disjoint. They are likely to be so because I start one in the east and one in the west, but we still need to check.) Here are two lists of regions:

In [13]:

make_regions = make_cuts # The function `make_cuts` can be used to make regions

As = list(make_regions(usa49, 5, start=west, end=usa49))
Bs = list(make_regions(usa49, 7, start=east, end=usa49))

How many regions are in each list?

In [14]:

a, b = len(As), len(Bs)
f'There are {a} regions in As and {b} regions in Bs and {a * b:,d} pairs of regions'

Out[14]:

'There are 376 regions in As and 22971 regions in Bs and 8,637,096 pairs of regions'

If the average region area is about a million square miles, there's a pretty good chance that one of the 8 million pairs will consist of an A and a B with exactly equal areas. It's like buying 8 million lottery tickets with random 6-digit numbers. Let's check:

In [15]:

def find_equal(As: List[Region], Bs: List[Region]) -> Iterator[Tuple[Region, Region, int]]:
    """From As and Bs, find disjoint regions A and B that have the exact same area. Yield (A, B, area(B))."""
    area_table = {area(A): A for A in As}
    for B in Bs:
        if area(B) in area_table:
            A = area_table[area(B)]
            if A.isdisjoint(neighborhood(B)):
                yield (A, B, area(B))

def neighborhood(A: Region) -> Region:
    """The region consisting of A and all the neighbors of any state in A."""
    return A | {s1 for s in A for s1 in neighbors[s]}

In [16]:

list(find_equal(As, Bs))

Out[16]:

[(frozenset({'CA', 'NV', 'UT'}),
  frozenset({'GA', 'IA', 'KY', 'MO', 'MS', 'TN', 'VA'}),
  359164),
 (frozenset({'ID', 'NV', 'WA'}),
  frozenset({'KY', 'MS', 'NY', 'PA', 'TN', 'VT', 'WV'}),
  265439),
 (frozenset({'ID', 'OR', 'UT', 'WY'}),
  frozenset({'GA', 'IA', 'IL', 'IN', 'MO', 'TN', 'VA'}),
  364658),
 (frozenset({'AZ', 'CA', 'OR'}),
  frozenset({'IL', 'IN', 'KS', 'MO', 'OH', 'TN', 'VA'}),
  376064),
 (frozenset({'ID', 'UT', 'WA', 'WY'}),
  frozenset({'AL', 'IL', 'IN', 'KY', 'TN', 'VA', 'WI'}),
  337577),
 (frozenset({'AZ', 'CA', 'OR'}),
  frozenset({'IA', 'IL', 'KY', 'NE', 'SD', 'VA', 'WV'}),
  376064)]

There are 6 solutions right away (and probably many more solutions with larger regions). I'll show the map for one of them (the smallest one, with area 265,439 square miles for both regions). Note that the cut (in yellow) is not contiguous, but that's in accord with the rules: all that matters is that A and B are contiguous and disjoint.

Enfranchising DC¶

Would anything change if we made DC a state (besides, obviously, the voting rights of the citizens)?

In [17]:

answers(usa49 | {'DC'})

50 states ⇒ 43,600 cuts (cut size ≤ 8, cut area ≤ 624,072) ⇒ 13,134 splits.

1. Split that maximizes area(B):
A|1,345,626|43.123%|30|AL AR CT DC DE FL GA IN KS KY LA LP MA MD ME MS NC NH NJ NY OH OK PA RI SC TN TX VA VT WV
B|1,344,149|43.076%|15|AZ CA IA ID MN MT ND NV OR SD UP UT WA WI WY
C|  430,653|13.801%| 5|CO IL MO NE NM
∆|    1,477| 0.047%|

2. Split that minimizes ∆ = area(A) - area(B):
A|1,267,062|40.605%|28|AL AR CT DC DE FL GA KS KY LA LP MA MD ME MS NC NH NJ NY OH OK PA RI SC TX VA VT WV
B|1,267,033|40.604%|14|AZ CA IA ID MN MT ND NV OR UP UT WA WI WY
C|  586,333|18.790%| 8|CO IL IN MO NE NM SD TN
∆|       29| 0.001%|

We've seen these cuts before, but for question 2, adding DC's 68 square miles to the eastern region means that it is now only 29 square miles larger than the western region (previously it was 39 square miles smaller).

Reuniting Michigan¶

What if Michigan counts as one state, rather than two separate penninsulas?

In [18]:

answers(usa48)

48 states ⇒ 41,735 cuts (cut size ≤ 8, cut area ≤ 624,072) ⇒ 18,642 splits.

1. Split that maximizes area(B):
A|1,348,646|43.221%|30|AL CT DE FL GA IA IL IN KY MA MD ME MI MN MT NC ND NH NJ NY OH PA RI SC SD TN VA VT WI WV
B|1,341,666|42.997%|12|AZ CA CO KS LA NM NV OK OR TX UT WA
C|  430,048|13.782%| 6|AR ID MO MS NE WY
∆|    6,980| 0.224%|

2. Split that minimizes ∆ = area(A) - area(B):
A|1,267,816|40.630%|13|AZ CA ID KS MN MT ND NE NV OR SD UT WA
B|1,267,672|40.626%|28|AL AR CT DE FL GA IL IN KY LA MA MD ME MI MS NC NH NJ NY OH PA RI SC TN TX VA VT WV
C|  584,872|18.744%| 7|CO IA MO NM OK WI WY
∆|      144| 0.005%|

The results are not as good (I think because splitting MI allows IL and IN to be north border states rather than interior states).

Cuts with smallest number of states?¶

If we are restricted to four-state cuts, the {IL, MO, NM, OK} cut is best:

In [19]:

answers(usa49, 4, start=border, end=border)

49 states ⇒ 1,237 cuts (cut size ≤ 4, cut area ≤ 624,072) ⇒ 384 splits.

1. Split that maximizes area(B):
A|1,607,869|51.528%|18|AZ CA CO IA ID KS MN MT ND NE NV OR SD UP UT WA WI WY
B|1,193,381|38.245%|27|AL AR CT DE FL GA IN KY LA LP MA MD ME MS NC NH NJ NY OH PA RI SC TN TX VA VT WV
C|  319,110|10.227%| 4|IL MO NM OK
∆|  414,488|13.283%|

2. Split that minimizes ∆ = area(A) - area(B):
A|1,607,869|51.528%|18|AZ CA CO IA ID KS MN MT ND NE NV OR SD UP UT WA WI WY
B|1,193,381|38.245%|27|AL AR CT DE FL GA IN KY LA LP MA MD ME MS NC NH NJ NY OH PA RI SC TN TX VA VT WV
C|  319,110|10.227%| 4|IL MO NM OK
∆|  414,488|13.283%|

With a three-state cut there are no good answers; region B is at best only 14% of the area.

In [20]:

answers(usa49, 3, start=border, end=border)

49 states ⇒ 367 cuts (cut size ≤ 3, cut area ≤ 624,072) ⇒ 89 splits.

1. Split that maximizes area(B):
A|2,393,960|76.721%|42|AL AR CO CT DE FL GA IA IL IN KS KY LA LP MA MD ME MN MO MS MT NC ND NE NH NJ NM NY OH OK PA RI SC SD TN TX UP VA VT WI WV WY
B|  443,944|14.227%| 4|CA NV OR WA
C|  282,456| 9.052%| 3|AZ ID UT
∆|1,950,016|62.493%|

2. Split that minimizes ∆ = area(A) - area(B):
A|2,393,960|76.721%|42|AL AR CO CT DE FL GA IA IL IN KS KY LA LP MA MD ME MN MO MS MT NC ND NE NH NJ NM NY OH OK PA RI SC SD TN TX UP VA VT WI WV WY
B|  443,944|14.227%| 4|CA NV OR WA
C|  282,456| 9.052%| 3|AZ ID UT
∆|1,950,016|62.493%|

Unit tests¶

Here are some unit tests; they also serve as examples of input/output of the various functions:

In [21]:

def test():
    assert states('AZ CA OR') == frozenset({'AZ', 'CA', 'OR'})

    assert len(usa49) == 49 and len(usa50) == 50 and len(western) == 11
    
    assert set(areas) == set(neighbors)
    assert areas['MI'] == areas['UP'] + areas['LP'] 
    assert area(states('AZ CA OR')) == area(states('IA IL KY NE SD VA WV')) == 376_064

    assert all((x in neighbors[y]) == (y in neighbors[x]) 
               for x in neighbors for y in neighbors)

    assert neighborhood(states('CO UT')) == states('NV UT KS AZ NM ID NE OK WY CO')
    
    assert set(make_cuts(western, 3)) == {
        states('AZ ID NV'),
        states('CA ID OR'),
        states('CA OR WA'),
        states('CA ID NV'),
        states('ID NM UT'),
        states('AZ ID UT')}

    assert set(make_splits(western, make_cuts(western, 3))) == {
        (states('CO MT NM WY'), states('CA NV OR WA'), states('AZ ID UT')),
        (states('CO MT NM UT WY'), states('CA OR WA'), states('AZ ID NV')),
        (states('AZ CO MT NM UT WY'), states('OR WA'), states('CA ID NV')),
        (states('AZ CO MT NM NV UT WY'), states('WA'), states('CA ID OR'))}

    assert contiguous(western - states('AZ ID NV')) == states('CA OR WA')
    
    for country in (usa48, usa49, border, western):
        assert contiguous(country) == country

    assert contiguous(usa50) != usa50
            
    return 'ok'
               
test()

Out[21]:

'ok'