Making Py++ easier and reducing undefined behavior

Note: read memory ownership tools before this page. This is also an advanced topic that you may want to come back to after you know a little more about Py++, ideally after you have seen your Py++ code run differently via C++ than via Python (that's my light recommendation). However, if you are very interested in programming language design, this page might be interesting for you early on.

I mentioned on the index page that Py++ aspires to throw transpiler errors for programs that lead to undefined behavior and for programs that run differently via the C++ executable than via the Python interpreter.

I came up with a set of rules. If you follow them, your code will run the same via the C++ executable and the Python interpreter. For that reason, I think you can then reason about your code as if it were Python code, which should make Py++ easier to use.

If we get to a point where the Py++ transpiler throws an error whenever one of these rules is broken, then we might be able to say the code will always run the same via C++ and Python. Right now, breaking these rules does not throw any transpiler errors.

If I am missing any rules, i.e. there are other ways you can get code that runs differently via C++ and Python, please let me know.

Rules

  • Only reassign a variable which is an owner and has no live references
  • For a function/method parameter that is pass-by-value, only pass temporaries or use mov()
  • For a class data member that is pass-by-value, only pass temporaries or use mov() (similar to above)
  • After doing mov(v), do not use v
  • Only use mov(v) if v is an owner and has no references
  • Do not end the lifetime of an owner if the owner has any living references
  • In a return-by-value function/method, do not return a variable whose lifetime does not end at the end of the function/method
  • When calling a return-by-reference function/method, the type annotation of the variable you assign the result to must be wrapped in Ref[]
  • When initializing list, set, dict, or tuple data structures with some initial values, only pass temporaries or use mov() for the elements

Examples of each rule

Only reassign a variable which is an owner and has no live references

from pypp_python import dataclass


@dataclass
class ClassA:
    my_list: list[int]


if __name__ == "__main__":
    my_list: list[int] = [1, 2, 3]
    my_list = [2, 3, 4]  # OK because it is the owner
    object_a: ClassA = ClassA(my_list)
    object_a.my_list = [4, 5, 6]  # ❌ it is not the owner
    my_list = [7, 8, 9]  # ❌ it is the owner, but it has a live reference
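
For comparison, here is a minimal sketch of what the plain Python interpreter does with the same pattern. It uses the standard library's dataclass instead of pypp_python (my substitution, so the snippet runs without Py++ installed). In Python, reassignment only rebinds a name, so the other name keeps its old list. If the generated C++ holds my_list in ClassA as a reference, which is my assumption about why this rule exists, the same assignments would presumably write through to the shared list instead, and the two runs would diverge.

```python
from dataclasses import dataclass  # stand-in for pypp_python's dataclass (assumption)


@dataclass
class ClassA:
    my_list: list[int]


def reassign_demo() -> tuple[list[int], list[int]]:
    my_list = [2, 3, 4]
    object_a = ClassA(my_list)    # object_a.my_list and my_list name the same list
    object_a.my_list = [4, 5, 6]  # rebinds the attribute only; my_list is untouched
    seen_by_owner = my_list       # still the original list
    my_list = [7, 8, 9]           # rebinds the local name only; object_a is untouched
    return seen_by_owner, object_a.my_list
```

Calling reassign_demo() in Python returns the owner's list unchanged next to the object's new list, which is the behavior the C++ side would need to match.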

For a function/method parameter that is pass-by-value, only pass temporaries or use mov()

from pypp_python import Val, mov, auto, dataclass


@dataclass
class ClassA:
    my_list: Val[list[int]]
    my_str: str


def a_factory(my_list: Val[list[int]]) -> ClassA:
    return ClassA(mov(my_list), "hello")


if __name__ == "__main__":
    object_a0: auto = a_factory([1, 2, 3])  # OK

    my_list_0: list[int] = [1, 2, 3]
    object_a1: auto = a_factory(mov(my_list_0))  # OK

    my_list_1: list[int] = [1, 2, 3]
    object_a2: auto = a_factory(my_list_1)  # ❌
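
A sketch of the Python side of the ❌ case, with the standard library's dataclass standing in for pypp_python and with Val[...] and mov() dropped (both are my simplifications so it runs in a plain interpreter). Python hands the function the caller's list object itself, so the new object and the caller's variable end up sharing one list. A C++ pass-by-value parameter would normally receive its own copy, which is presumably the mismatch this rule guards against.

```python
from dataclasses import dataclass  # stand-in for pypp_python's dataclass (assumption)


@dataclass
class ClassA:
    my_list: list[int]
    my_str: str


def a_factory(my_list: list[int]) -> ClassA:
    return ClassA(my_list, "hello")


def shares_callers_list() -> bool:
    my_list_1 = [1, 2, 3]
    object_a2 = a_factory(my_list_1)
    my_list_1.append(4)  # mutate through the caller's name
    # In Python the change is also visible through object_a2,
    # because both names refer to the same list object.
    return object_a2.my_list == [1, 2, 3, 4] and object_a2.my_list is my_list_1
```

Passing a temporary or using mov() avoids the question entirely, since the caller has no name left through which to observe the difference.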

For a class data member that is pass-by-value, only pass temporaries or use mov() (similar to above)

from pypp_python import Val, mov, auto, dataclass


@dataclass
class ClassA:
    my_list: Val[list[int]]


if __name__ == "__main__":
    object_a0: auto = ClassA([1, 2, 3])  # OK

    my_list_0: list[int] = [1, 2, 3]
    object_a1: auto = ClassA(mov(my_list_0))  # OK

    my_list_1: list[int] = [1, 2, 3]
    object_a2: auto = ClassA(my_list_1)  # ❌

After doing mov(v), do not use v

from pypp_python import Val, mov, auto, dataclass


@dataclass
class ClassA:
    my_list: Val[list[int]]


if __name__ == "__main__":
    my_list_0: list[int] = [1, 2, 3]
    object_a1: auto = ClassA(mov(my_list_0))

    min_val: int = min(my_list_0)  # ❌

Only use mov(v) if v is an owner and has no references

from pypp_python import Val, Ref, mov, auto, dataclass


@dataclass
class ClassA:
    my_list: Val[list[int]]


def class_a_factory(my_list: Val[list[int]]) -> ClassA:
    return ClassA(mov(my_list))  # OK


@dataclass
class ClassB:
    my_list: list[int]

    def class_a_factory(self) -> ClassA:
        return ClassA(mov(self.my_list))  # ❌ self.my_list is a reference


if __name__ == "__main__":
    my_list_0: list[int] = [1, 2, 3]
    object_a1: auto = class_a_factory(mov(my_list_0))  # OK

    my_list_1: list[int] = [1, 2, 3]
    my_list_ref: Ref[list[int]] = my_list_1

    object_a0: auto = class_a_factory(mov(my_list_ref))  # ❌ my_list_ref is a reference

    my_list_2: list[int] = [1, 2, 3]
    my_list_2_ref: Ref[list[int]] = my_list_2
    object_a2: auto = class_a_factory(mov(my_list_2))  # ❌ my_list_2 is the owner, but it has a reference

Do not end the lifetime of an owner if the owner has any living references

from pypp_python import auto, dataclass


@dataclass
class ClassA:
    my_list: list[int]


def a_factory() -> ClassA:
    my_list: list[int] = [1, 2, 3]
    object_a: auto = ClassA(my_list)  # ❌ my_list lifetime ends when this function returns
    return object_a


if __name__ == "__main__":
    object_a: auto = a_factory()
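
In the Python interpreter this pattern is harmless, because a list stays alive for as long as anything refers to it. The sketch below shows that, using the standard library's dataclass as a stand-in for pypp_python (my assumption, so it runs without Py++). In C++, by contrast, a reference that outlives the local variable it refers to is dangling, and reading through it is undefined behavior. That is presumably what happens here if ClassA stores my_list as a reference.

```python
from dataclasses import dataclass  # stand-in for pypp_python's dataclass (assumption)


@dataclass
class ClassA:
    my_list: list[int]


def a_factory() -> ClassA:
    my_list = [1, 2, 3]
    # The local name my_list disappears on return, but the list object
    # itself survives because the returned ClassA still refers to it.
    return ClassA(my_list)


def list_survives_factory() -> list[int]:
    object_a = a_factory()
    return object_a.my_list
```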

In a return-by-value function/method, do not return a variable whose lifetime does not end at the end of the function/method

from pypp_python import auto, Val, dataclass


@dataclass
class ClassA:
    my_list: Val[list[int]]

    def get_my_list(self) -> list[int]:
        return self.my_list  # ❌ my_list lifetime does not end


if __name__ == "__main__":
    object_a: auto = ClassA([1, 2, 3])
    my_list: list[int] = object_a.get_my_list()
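
A sketch of why this one diverges, seen from the Python side (standard-library dataclass in place of pypp_python, and no Val[...] wrapper, both my simplifications). Python returns the member list itself, so the caller's variable is an alias of the object's data. A C++ function that returns by value would normally hand back a copy, so a later mutation would presumably affect only one of the two.

```python
from dataclasses import dataclass  # stand-in for pypp_python's dataclass (assumption)


@dataclass
class ClassA:
    my_list: list[int]

    def get_my_list(self) -> list[int]:
        return self.my_list  # returns the same list object, not a copy


def returned_list_is_alias() -> list[int]:
    object_a = ClassA([1, 2, 3])
    my_list = object_a.get_my_list()
    my_list.append(4)         # mutate through the returned name
    return object_a.my_list   # the object's own list has changed too
```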

When calling a return-by-reference function/method, the type annotation of the variable you assign the result to must be wrapped in Ref[]

from pypp_python import auto, Val, Ref, dataclass


@dataclass
class ClassA:
    my_list: Val[list[int]]

    def get_my_list(self) -> Ref[list[int]]:
        return self.my_list


if __name__ == "__main__":
    object_a: auto = ClassA([1, 2, 3])
    my_list_1: Ref[list[int]] = object_a.get_my_list()  # OK
    my_list_2: list[int] = object_a.get_my_list()  # ❌

When initializing list, set, dict, or tuple data structures with some initial values, only pass temporaries or use mov() for the elements

from pypp_python import mov


if __name__ == "__main__":
    a: list[int] = [3, 4]
    b: list[list[int]] = [[1, 2], mov(a)]  # OK

    c: list[int] = [3, 4]
    d: list[list[int]] = [[1, 2], c]  # ❌
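
The Python half of the ❌ case can be sketched in plain Python with no pypp_python imports. A named list placed inside another list is stored as the same object, so changes made through the outer list show up under the original name. If the C++ container copies the element on construction, which is my assumption about the generated code, the original variable would stay unchanged there.

```python
def element_is_alias() -> list[int]:
    c = [3, 4]
    d = [[1, 2], c]   # d[1] is the very same list object as c
    d[1].append(5)    # mutate through the outer list
    return c          # the change is visible through c in Python
```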