I was exploring the Python 3.11 source code and came across the _PyCode_CODE macro. It casts the co_code_adaptive member of PyCodeObject to a uint16_t * pointer, even though co_code_adaptive is declared as a char array of length 1.
/*******************************************/
// code in code.h
/*******************************************/
#define _PyCode_CODE(co) ((const uint16_t *)(co)->co_code_adaptive)
#define _PyCode_DEF(SIZE) {            \
    /* ... */                          \
    char co_code_adaptive[(SIZE)];     \
}
/* Bytecode object */
struct PyCodeObject _PyCode_DEF(1);
/*******************************************/
// code in specialize.c
/*******************************************/
void
_PyCode_Quicken(PyCodeObject *code)
{
    _Py_QuickenedCount++;
    int previous_opcode = -1;
    _Py_CODEUNIT *instructions = _PyCode_CODE(code);
    for (int i = 0; i < Py_SIZE(code); i++) {
        int opcode = _Py_OPCODE(instructions[i]);
        // ...
    }
}
This cast looks like it could introduce memory-access risks, since char and uint16_t have different sizes and alignment requirements. Why did the CPython developers choose to implement it this way, and how is memory safety ensured here?
Specifically, in specialize.c, the _PyCode_Quicken function uses this macro to initialize a uint16_t * pointer named instructions and then reads through it with the [] operator. How does Python ensure that these memory accesses are valid?
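To make the concern concrete, here is a tiny standalone sketch (my own simplified stand-in, not the real PyCodeObject) that just prints the quantities the cast seems to depend on: the offset of the trailing char buffer and the alignment that uint16_t requires on this platform.
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Simplified stand-in for the layout produced by _PyCode_DEF(1):
   some header fields followed by a char buffer declared with length 1. */
typedef struct {
    long header_field;      /* placeholder for the real PyCodeObject fields */
    char code_adaptive[1];  /* the member that gets cast to uint16_t * */
} FakeCode;

int main(void)
{
    printf("offsetof(code_adaptive) = %zu\n", offsetof(FakeCode, code_adaptive));
    printf("alignof(uint16_t)       = %zu\n", (size_t)alignof(uint16_t));
    printf("sizeof(FakeCode)        = %zu\n", sizeof(FakeCode));
    return 0;
}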
Any insights into the underlying design decisions or documentation explaining this approach would be greatly appreciated.
co_code_adaptive is at the end of the struct (based on what you showed), so could it be the case that *code is "over-allocated" such that there is enough memory for a uint16_t?
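If that is what is going on, a minimal sketch of the over-allocation idea might look like the following (my own simplification under that assumption, using a made-up FakeCode type and fake_code_new helper, not CPython's actual allocation code):
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Same simplified layout as above: header fields plus a char buffer
   declared with length 1, mirroring _PyCode_DEF(1). */
typedef struct {
    size_t n_codeunits;     /* stand-in for Py_SIZE(code) */
    char code_adaptive[1];  /* nominal size 1; real size decided at allocation time */
} FakeCode;

/* Allocate one FakeCode with room for n 16-bit code units after the header. */
static FakeCode *fake_code_new(size_t n)
{
    FakeCode *co = malloc(offsetof(FakeCode, code_adaptive) + n * sizeof(uint16_t));
    if (co != NULL) {
        co->n_codeunits = n;
    }
    return co;
}

int main(void)
{
    FakeCode *co = fake_code_new(4);
    if (co == NULL) {
        return 1;
    }

    /* The cast from the question: view the char buffer as uint16_t[].
       malloc returns memory aligned for any scalar type, and on typical
       ABIs the buffer's offset inside the struct is a multiple of 2,
       so these uint16_t accesses stay aligned in practice. */
    uint16_t *instructions = (uint16_t *)co->code_adaptive;
    for (size_t i = 0; i < co->n_codeunits; i++) {
        instructions[i] = (uint16_t)(100 + i);
    }
    printf("instructions[3] = %u\n", (unsigned)instructions[3]);  /* prints 103 */

    free(co);
    return 0;
}
The key point in this sketch is that the declared array length (1) only affects sizeof(FakeCode); the number of usable bytes past code_adaptive is whatever the allocation actually provides.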