Making Py++ easier and reducing undefined behavior
Note: I recommend reading the memory ownership tools page before this one. This is also an advanced topic that you could come back to later, once you know a little more about Py++ and have maybe experienced situations where your Py++ code runs differently via C++ vs. Python (that's my light recommendation). However, if you are very interested in programming language design, this page might be interesting for you early on.
I mentioned in the index page that Py++ would aspire to throw transpiler errors for programs that lead to undefined behavior and programs which run differently via the C++ executable vs. the Python interpreter.
I came up with a set of rules such that, if you follow them, your code will run the same via the C++ executable and the Python interpreter. For that reason, if you follow these rules, I think you can reason about your code as if it were Python code, which should make Py++ easier to use.
If we get to a point where the Py++ transpiler throws an error whenever one of these rules is broken, then we might be able to say that code which transpiles will always run the same via C++ and Python. Right now, breaking any of these rules does not throw a transpiler error.
If I am missing any rules, i.e. there are other ways you can get code that runs differently via C++ and Python, please let me know.
Rules
- Only reassign a variable which is an owner and has no live references
- For a function/method parameter that is pass-by-value, only pass temporaries or use mov()
- For a class data member that is pass-by-value, only pass temporaries or use mov() (similar to above)
- After doing mov(v), do not use v
- Only use mov(v) if v is an owner and has no references
- Do not end the lifetime of an owner if the owner has any living references
- In a return-by-value function/method, do not return a variable whose lifetime does not end at the end of the function/method
- When calling a return-by-reference function/method, the type annotation of the variable you assign the result to must be wrapped in Ref[]
- When initializing list, set, dict, or tuple data structures with some initial values, only pass temporaries or use mov() for the elements
Examples of each rule
Only reassign a variable which is an owner and has no live references
from pypp_python import dataclass

@dataclass
class ClassA:
    my_list: list[int]

if __name__ == "__main__":
    my_list: list[int] = [1, 2, 3]
    my_list = [2, 3, 4]  # OK because it is the owner
    object_a: ClassA = ClassA(my_list)
    object_a.my_list = [4, 5, 6]  # ❌ it is not the owner
    my_list = [7, 8, 9]  # ❌ it is the owner, but it has a live reference
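To see why this rule matters, here is a minimal plain-Python sketch of the two behaviors. It does not use Py++ at all. `Holder`, `reassign_python_style`, and `reassign_cpp_style` are hypothetical names for illustration, and the in-place slice assignment is only my model of what C++ assignment to an owner would do, assuming the class member is a reference to that owner.

```python
from dataclasses import dataclass


@dataclass
class Holder:
    """Hypothetical stand-in for a class whose member refers to a list it does not own."""
    my_list: list[int]


def reassign_python_style() -> list[int]:
    owner = [2, 3, 4]
    holder = Holder(owner)
    owner = [7, 8, 9]  # rebinds the name; holder still sees the old list
    return holder.my_list


def reassign_cpp_style() -> list[int]:
    owner = [2, 3, 4]
    holder = Holder(owner)
    owner[:] = [7, 8, 9]  # overwrites in place, modelling C++ assignment to the owner
    return holder.my_list


python_view = reassign_python_style()
cpp_view = reassign_cpp_style()
print(python_view)  # [2, 3, 4]
print(cpp_view)     # [7, 8, 9]
```

The two functions differ only in how the owner is reassigned, yet the holder ends up seeing different data. With no live reference, nobody is left to observe that difference.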
For a function/method parameter that is pass-by-value, only pass temporaries or use mov()
from pypp_python import Val, mov, auto, dataclass

@dataclass
class ClassA:
    my_list: Val[list[int]]
    my_str: str

def a_factory(my_list: Val[list[int]]) -> ClassA:
    return ClassA(mov(my_list), "hello")

if __name__ == "__main__":
    object_a0: auto = a_factory([1, 2, 3])  # OK
    my_list_0: list[int] = [1, 2, 3]
    object_a1: auto = a_factory(mov(my_list_0))  # OK
    my_list_1: list[int] = [1, 2, 3]
    object_a2: auto = a_factory(my_list_1)  # ❌
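The difference behind this rule can be sketched in plain Python, without Py++. I am assuming here that a pass-by-value parameter behaves like an ordinary C++ by-value parameter, which copies a named argument. The function names are hypothetical, and `copy.copy` is only a model of that copy.

```python
import copy


def append_python_style(values: list[int]) -> list[int]:
    # Python passes the same list object, so the caller sees the append.
    values.append(99)
    return values


def append_cpp_by_value_style(values: list[int]) -> list[int]:
    # Model of C++ pass-by-value: the callee works on its own copy.
    local = copy.copy(values)
    local.append(99)
    return local


caller_list_py = [1, 2, 3]
append_python_style(caller_list_py)

caller_list_cpp = [1, 2, 3]
append_cpp_by_value_style(caller_list_cpp)

print(caller_list_py)   # [1, 2, 3, 99]
print(caller_list_cpp)  # [1, 2, 3]
```

When the argument is a temporary, or has been given up with mov(), the caller has no name left through which to notice which of the two happened.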
For a class data member that is pass-by-value, only pass temporaries or use mov() (similar to above)
from pypp_python import Val, mov, auto, dataclass

@dataclass
class ClassA:
    my_list: Val[list[int]]

if __name__ == "__main__":
    object_a0: auto = ClassA([1, 2, 3])  # OK
    my_list_0: list[int] = [1, 2, 3]
    object_a1: auto = ClassA(mov(my_list_0))  # OK
    my_list_1: list[int] = [1, 2, 3]
    object_a2: auto = ClassA(my_list_1)  # ❌
After doing mov(v), do not use v
from pypp_python import Val, mov, auto, dataclass

@dataclass
class ClassA:
    my_list: Val[list[int]]

if __name__ == "__main__":
    my_list_0: list[int] = [1, 2, 3]
    object_a1: auto = ClassA(mov(my_list_0))
    min_val: int = min(my_list_0)  # ❌
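Here is a rough plain-Python model of what goes wrong when a moved variable is used again. Both functions are hypothetical stand-ins, not the real mov(). I am assuming that a move changes nothing under the Python interpreter, and that on the C++ side the moved-from list is typically left empty (the C++ standard only promises a valid but unspecified state).

```python
def mov_python_style(values: list[int]) -> list[int]:
    # Assumption: under the Python interpreter a move leaves the source untouched.
    return values


def mov_cpp_style(values: list[int]) -> list[int]:
    # Assumed model of a C++ move: contents are transferred, source is left empty.
    taken = values[:]
    values.clear()
    return taken


source_py = [1, 2, 3]
moved_py = mov_python_style(source_py)
min_py = min(source_py)  # still works under Python semantics

source_cpp = [1, 2, 3]
moved_cpp = mov_cpp_style(source_cpp)
try:
    min_cpp = min(source_cpp)
except ValueError:
    # min() of an empty list raises ValueError
    min_cpp = None

print(min_py)   # 1
print(min_cpp)  # None
```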
Only use mov(v) if v is an owner and has no references
from pypp_python import Val, Ref, mov, auto, dataclass

@dataclass
class ClassA:
    my_list: Val[list[int]]

def class_a_factory(my_list: Val[list[int]]) -> ClassA:
    return ClassA(mov(my_list))  # OK

@dataclass
class ClassB:
    my_list: list[int]

    def class_a_factory(self) -> ClassA:
        return ClassA(mov(self.my_list))  # ❌ self.my_list is a reference

if __name__ == "__main__":
    my_list_0: list[int] = [1, 2, 3]
    object_a0: auto = class_a_factory(mov(my_list_0))  # OK
    my_list_1: list[int] = [1, 2, 3]
    my_list_ref: Ref[list[int]] = my_list_1
    object_a1: auto = class_a_factory(mov(my_list_ref))  # ❌ my_list_ref is a reference
    my_list_2: list[int] = [1, 2, 3]
    my_list_2_ref: Ref[list[int]] = my_list_2
    object_a2: auto = class_a_factory(mov(my_list_2))  # ❌ my_list_2 is the owner, but it has a reference
Do not end the lifetime of an owner if the owner has any living references
from pypp_python import auto, dataclass

@dataclass
class ClassA:
    my_list: list[int]

def a_factory() -> ClassA:
    my_list: list[int] = [1, 2, 3]
    object_a: auto = ClassA(my_list)  # ❌ my_list lifetime ends when this function returns
    return object_a

if __name__ == "__main__":
    object_a: auto = a_factory()
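The Python side of this rule can be observed directly. The plain-Python sketch below (no Py++) uses a weak reference to show that the list created inside the factory is still alive after the factory returns, because the returned object keeps it alive. `TrackedList`, `Holder`, and `make_holder` are hypothetical names. On the C++ side, assuming the data member is a reference to the local variable, that storage would be gone once the function returns and the member would dangle.

```python
import weakref
from dataclasses import dataclass


class TrackedList(list):
    """list subclass, needed because plain lists cannot be weakly referenced."""


@dataclass
class Holder:
    my_list: list


def make_holder():
    local = TrackedList([1, 2, 3])
    watcher = weakref.ref(local)  # observes the list without keeping it alive
    return Holder(local), watcher


holder, watcher = make_holder()

# The function has returned, but the holder still references the list.
still_alive = watcher() is not None
print(still_alive)     # True
print(holder.my_list)  # [1, 2, 3]
```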
In a return-by-value function/method, do not return a variable whose lifetime does not end at the end of the function/method
from pypp_python import auto, Val, dataclass

@dataclass
class ClassA:
    my_list: Val[list[int]]

    def get_my_list(self) -> list[int]:
        return self.my_list  # ❌ my_list lifetime does not end

if __name__ == "__main__":
    object_a: auto = ClassA([1, 2, 3])
    my_list: list[int] = object_a.get_my_list()
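A small plain-Python sketch of the mismatch, again without Py++: in Python the getter hands back the very same list, while a return-by-value in C++ would, as far as I understand it, hand back a copy of the member. The class and method names are hypothetical, and `copy.copy` models that assumed copy.

```python
import copy
from dataclasses import dataclass


@dataclass
class Holder:
    my_list: list[int]

    def get_python_style(self) -> list[int]:
        # Python returns the same object that the instance holds.
        return self.my_list

    def get_cpp_by_value_style(self) -> list[int]:
        # Model of C++ return-by-value: the caller receives a copy.
        return copy.copy(self.my_list)


holder_py = Holder([1, 2, 3])
holder_py.get_python_style().append(4)

holder_cpp = Holder([1, 2, 3])
holder_cpp.get_cpp_by_value_style().append(4)

print(holder_py.my_list)   # [1, 2, 3, 4]
print(holder_cpp.my_list)  # [1, 2, 3]
```

Returning a variable whose lifetime ends with the function avoids the problem, since nothing else can look at it afterwards.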
When calling a return-by-reference function/method, the type annotation of the variable you assign the result to must be wrapped in Ref[]
from pypp_python import auto, Val, Ref, dataclass

@dataclass
class ClassA:
    my_list: Val[list[int]]

    def get_my_list(self) -> Ref[list[int]]:
        return self.my_list

if __name__ == "__main__":
    object_a: auto = ClassA([1, 2, 3])
    my_list_1: Ref[list[int]] = object_a.get_my_list()  # OK
    my_list_2: list[int] = object_a.get_my_list()  # ❌
When initializing list, set, dict, or tuple data structures with some initial values, only pass temporaries or use mov() for the elements
from pypp_python import mov

if __name__ == "__main__":
    a: list[int] = [3, 4]
    b: list[list[int]] = [[1, 2], mov(a)]  # OK
    c: list[int] = [3, 4]
    d: list[list[int]] = [[1, 2], c]  # ❌
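For containers the same idea applies to the elements. In this plain-Python sketch (no Py++), the Python list stores the named inner list itself, whereas I am assuming the C++ container would store a copy of a named element. `copy.copy` stands in for that assumed copy.

```python
import copy

inner_py = [3, 4]
outer_py = [[1, 2], inner_py]  # Python stores the same object
inner_py.append(5)

inner_cpp = [3, 4]
outer_cpp = [[1, 2], copy.copy(inner_cpp)]  # model of C++ copying a named element
inner_cpp.append(5)

print(outer_py[1])   # [3, 4, 5]
print(outer_cpp[1])  # [3, 4]
```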