
I am investigating the performance impact of a very broad inheritance setup.

  1. Start with 260 distinct attribute names, from a0 through z9.
  2. Create 260 classes with 1 uniquely-named attribute each. Create one class that inherits from those 260 classes.
  3. Create 130 classes with 2 uniquely-named attributes each. Create one class that inherits from those 130 classes.
  4. Repeat for 52 classes with 5 attributes each, 26 classes with 10 attributes each, and 1 class with all 260 attributes.
  5. Create one instance of each of the five classes, and then measure the time to read (and add together) all 260 attributes on each.

Average relative performance (normalized to the fastest case) over 2.5M reads, interleaved in different orders:

 From260: 2.48
 From130: 1.55
  From52: 1.22
  From26: 1.15
AllInOne: 1.00

These values almost fit a linear regression, but not quite. The relationships hold across many different runs, of various sizes and test orders.

[plot: measured values against a linear fit]

The values come closer to fitting a second-degree polynomial, or exponential fit...but again, the data does not fit so cleanly as to be obvious.

[plot: measured values against polynomial and exponential fits]

As I massively increase the number of subclasses, will the performance falloff be linear, or non-linear?


Here's some updated code that samples many different subclass combinations up to 2310:

from time import time

TOTAL_ATTRS = 2310

# Create attribute names "a0000" through "a2309"
attr_names = [f"a{i:04d}" for i in range(TOTAL_ATTRS)]

# Map each attribute to a default value (1..2310)
all_defaults = {name: i + 1 for i, name in enumerate(attr_names)}

# The provided factors of 2310
factors = [1, 2, 3, 5, 6, 7, 10, 11, 14, 15, 21, 22, 30, 33, 35, 42,
           55, 66, 70, 77, 105, 110, 154, 165, 210, 231, 330, 385,
           462, 770, 1155, 2310]

# Build a dictionary mapping each factor to a composite class.
# For factor f, create f subclasses each with (2310 // f) attributes,
# then create a composite class inheriting from all f subclasses.
composite_classes = {}
for f in factors:
    group_size = TOTAL_ATTRS // f
    subclasses = []
    for i in range(f):
        group = attr_names[i * group_size:(i + 1) * group_size]
        group_defaults = {name: all_defaults[name] for name in group}
        subclass = type(f"Sub_{f}_{i}", (object,), group_defaults)
        subclasses.append(subclass)
    composite_classes[f] = type(f"From_{f}", tuple(subclasses), {})

iterations = range(1_000)

for n, c in composite_classes.items():
    i = c()
    t = time()
    for _ in iterations:
        for a in attr_names:
            getattr(i, a)
    print(f"Inheriting from {n} subclasses: {time()-t:.3f}s")

and the results, which seem far more linear than polynomial, but which have odd "ledges" in them:

[plot: data points with a linear fit]

[plot: data points with a quadratic fit]

Comments:

  • How massive is massive? Since you have a testing framework, why not create a condition with 2500 attributes and see if that reveals anything? Commented Feb 3 at 17:54
  • CPython has an internal type attribute cache with a max size of 4096 entries, and instance attribute lookups do check this cache when trying to find an attribute through the class. I would expect most of your test lookups to hit the cache, but something is causing a significant slowdown anyway, so there seems to be something weird going on. Commented Feb 3 at 18:03
  • @JonSG Let me clean up my test code and I'll include it. Good point. Commented Feb 3 at 18:05
  • Is this just academic curiosity? What's the use case for a class with hundreds of superclasses? Commented Feb 3 at 18:06
  • @JonSG "How massive is massive?" Probably not more than 1000. I'm uncertain at this stage. Commented Feb 3 at 18:25

2 Answers


The slowdown will be weird and hard to fully predict, depending on subtle details of memory allocation and attribute access order.

Worst-case, not only will you experience a linear slowdown, you'll slow down completely unrelated attribute accesses in unrelated code.


CPython has a 4096-entry type attribute cache, and instance attribute lookups do check this cache when looking for an attribute through the class. The cache entry used for an attribute lookup is determined using a simple hash based on the type's version tag and the address of the string object for the attribute being looked up:

#define MCACHE_HASH(version, name_hash)                                 \
        (((unsigned int)(version) ^ (unsigned int)(name_hash))          \
         & ((1 << MCACHE_SIZE_EXP) - 1))

#define MCACHE_HASH_METHOD(type, name)                                  \
    MCACHE_HASH(FT_ATOMIC_LOAD_UINT32_RELAXED((type)->tp_version_tag),   \
                ((Py_ssize_t)(name)) >> 3)

If an attribute is found through the cache, the lookup is quick, and doesn't depend on the depth of the inheritance hierarchy. But if an attribute is not found through the cache, the lookup has to go through the class's MRO, one class at a time, performing a dict lookup in each class until it finds (or fails to find) the attribute. This takes an amount of time linear in the number of classes it has to look through.
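The cost of that MRO walk can be sketched by building a long single-inheritance chain dynamically (the class names and the `target` attribute here are invented for illustration):

```python
# Build a single-inheritance chain of 200 classes; the attribute "target"
# lives only on the root, so an uncached lookup must walk the whole MRO.
Base = type("C0", (object,), {"target": 42})
cls = Base
for i in range(1, 200):
    cls = type(f"C{i}", (cls,), {})

assert len(cls.__mro__) == 201  # 200 classes + object
assert cls().target == 42       # found only after walking down to the root
```

On a cache miss, CPython performs one dict probe per class in that 201-entry MRO before finding `target`; on a cache hit, it performs none.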

Note that because of the descriptor protocol, Python has to do this even if it finds an entry for the attribute directly in the instance dict.
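This precedence of data descriptors over the instance dict is easy to demonstrate; the class and attribute names below are invented:

```python
class Loud:
    """A data descriptor: defines both __get__ and __set__."""
    def __get__(self, obj, objtype=None):
        return "from descriptor"
    def __set__(self, obj, value):
        pass  # ignore writes

class Plain:
    pass

p = Plain()
p.foo = "from instance dict"   # goes into p.__dict__
Plain.foo = Loud()             # later, add a data descriptor to the class

# The data descriptor wins, even though p.__dict__ still holds "foo":
assert p.foo == "from descriptor"
```

Because the class could grow such a descriptor at any time, Python cannot skip the class-side lookup just because the name is present in the instance dict.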


So the more attributes you use, the greater the chance you run into a hash collision, either with another one of your attributes or with something completely unrelated, like list.append. The longer your MRO, the greater the impact of a collision.

If two attributes happen to produce a hash collision, then accessing one while the other is in the cache will need to perform a slow MRO search for the attribute. Then it'll evict the other attribute from the cache. If you access this attribute again before the other one, it'll be quick, but the next time you access the attribute that just got evicted, you'll go through another slow MRO search and eviction.

Because the cache cares about the address of the attribute string instead of the actual attribute name, using a different string object for a lookup will also cause a cache miss. In normal Python code, attribute names get interned, so this isn't a problem, but when you're generating attribute names dynamically, the interning doesn't happen.
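The difference can be seen with `sys.intern`; the runtime `join` below just forces CPython to build fresh, un-interned string objects:

```python
import sys

# Strings built at run time are distinct objects, even when equal:
s1 = "".join(["at", "tr_name"])
s2 = "".join(["att", "r_name"])
assert s1 == s2 and s1 is not s2   # equal value, different addresses

# sys.intern maps equal strings to one canonical object, so repeated
# lookups with interned names key the cache on a single address:
assert sys.intern(s1) is sys.intern(s2)
```

When generating attribute names dynamically, passing them through `sys.intern` before repeated `getattr` calls keeps every lookup for a given name hitting the same cache slot.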



Anyway - the behavior seems to be "close enough to linear" for all practical purposes.

What may make a difference here is taking into account the behavior for reading/writing attributes.

(1) If the attribute is created at the instance level, it doesn't matter, once the class is created, how deep in the hierarchy its value was set: all attributes live in the instance's flat namespace, which is a dictionary. (Ok, the descriptor protocol mandates an MRO search for the attribute regardless of it being in the instance dict, but the attribute cache described in @user235711's answer compensates for that. TL;DR: repeated instance attribute access is O(1), regardless of inheritance depth.)
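That flat namespace is easy to see directly; the class names below are invented:

```python
class Root:
    def __init__(self):
        self.val = 1   # set at the instance level

# Stack 49 empty subclasses on top of Root:
cls = Root
for i in range(49):
    cls = type(f"Level{i}", (cls,), {})

obj = cls()
# However deep the class that set it, the value sits in one flat
# per-instance dict:
assert obj.__dict__ == {"val": 1}
```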

(2) Reading an attribute from a class, in code (which is what your experiment does), triggers a traversal, searching each superclass namespace for the attribute, and this is the only case where a deep inheritance chain makes a difference. So with 250 subclasses, reading an attribute defined in the deepest superclass requires 250 namespace checks, while reading an attribute defined 125 classes upstream needs only that many namespace seeks. Each attribute therefore takes a different amount of time to read, depending on the depth at which it was defined; this factor alone is likely responsible for the "non-linearity" in your findings. If you instead define all attributes on the first class, then stack 248 empty classes and read those attributes from the 250th-generation child, the shape of your plot will change.
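This depth effect can be made concrete without timing at all, by locating each attribute's defining class in the MRO (the class and attribute names and the `mro_depth` helper are invented for this sketch):

```python
# 250 bases, each contributing one attribute, mirroring the question's setup:
subclasses = [type(f"S{i}", (object,), {f"attr{i}": i}) for i in range(250)]
Combined = type("Combined", tuple(subclasses), {})

def mro_depth(cls, name):
    """Index of the first class in the MRO whose own dict holds `name`,
    i.e. how many class dicts an uncached lookup must probe."""
    for depth, c in enumerate(cls.__mro__):
        if name in vars(c):
            return depth
    raise AttributeError(name)

assert mro_depth(Combined, "attr0") == 1     # first base: 1 probe past Combined
assert mro_depth(Combined, "attr249") == 250 # last base: 250 probes
```

Averaged over all 250 attributes, an uncached read costs about 125 dict probes, which is why total time grows with the number of bases even though individual attributes differ widely.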

However, this is mostly irrelevant: any real system that would be impacted by this could trivially implement a metaclass, or an __init_subclass__ method, which could flatten the namespace for each class with a large inheritance chain. If that needs to stay live, i.e. a runtime change to an attribute on super-grand-parent-1 is reflected when the attribute is read on the child-250 class, a slightly more sophisticated mechanism with an intermediary mapping could be built on the __getattribute__ special method, and attribute access would still be flattened to O(1).
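A minimal sketch of the flattening idea using __init_subclass__; the mixin name and copy policy are my own, and it snapshots values at class-creation time, so later changes on the parents are not reflected:

```python
class FlattenBases:
    """Mixin: copy every inherited (non-dunder) class attribute into the
    subclass's own dict, so lookups need only one dict probe."""
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        for base in cls.__mro__[1:]:          # skip cls itself
            for name, value in vars(base).items():
                if not name.startswith("__") and name not in vars(cls):
                    setattr(cls, name, value)

class A:
    x = 1

class B:
    y = 2

class C(FlattenBases, A, B):
    pass

# Both attributes now live directly in C's own namespace:
assert "x" in vars(C) and "y" in vars(C)
assert C().x == 1 and C().y == 2
```

Checking `name not in vars(cls)` while walking the MRO nearest-first preserves normal Python resolution order for any names defined in more than one base.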

Just tell if you need such a mechanism. (You can write a new question with the "metaclass" tag, for example)

3 Comments

I almost gave you the accept mark for pointing out that attributes live in the instance's dictionary. My concern still applies to methods and properties defined by the many classes it inherits from, but that mitigates it heavily. Though your second point mostly applies, note that my class hierarchy is flat: a single instance might belong to a class that inherits from 200 or 1000 other classes, but none of them inherit from anything else (other than object).
With regard to both points, I believe a descriptor lookup is always required. If a descriptor has a __set__ method, it takes precedence over instance attributes. If you add a data descriptor foo to the class of an instance with an instance attribute foo, then the next lookup of foo invokes the data descriptor. e.g. gist.github.com/Dunes/2a4884091e7d249fa886ec0b13fb4079 The type_cache_entry mentioned by @user235711 can have a NULL value, which describes a cache hit for a class with no class attribute, i.e. use the instance attribute if it exists.
@Phrogz - So, you actually have a real-world scenario like this. Do you want me to produce some custom __getattribute__ code which could probably mitigate the issue? It would build another cache, but a simpler one than Python's, which would just assume base classes are static between calls - something Python's cache can't do, as it must fully support the language's dynamism, i.e. a new "foo" method may have been inserted in the middle of a complex inheritance hierarchy. (No matter that the classes look "flat" with multiple inheritance - the MRO is hundreds of classes deep.)
