Context
Currently sammine-lang has:
- alloc(value) → ptr<T> — heap-allocates a single value
- [T;N] — fixed-size stack arrays with bounds checking
- No way to heap-allocate dynamically-sized arrays
The goal is to add ptr<[T]> (no size parameter) as a fat pointer type — a {data_ptr, length} pair that carries its length at runtime, enabling bounds-checked indexing on heap arrays.
Syntax (driven by type annotation)
let p : ptr<[i32]> = alloc(32); # 32 zero-initialized i32s
let p : ptr<[i32]> = alloc(n); # n zero-initialized i32s (runtime size)
let p : ptr<[i32]> = alloc([1,2,3]); # 3 i32s from literal
p[1] = 42; # bounds-checked index assignment
printf("%d\n", p[2]); # bounds-checked index read
free(p); # frees the heap array
Type distinction
| Type | Representation | Bounds checking |
| ptr<i32> | thin pointer (opaque ptr) | none |
| ptr<[i32;3]> | thin pointer (opaque ptr) | compile-time size known |
| ptr<[i32]> | fat pointer ({ ptr, i64 }) | runtime length stored |
| [i32;3] | stack [3 x i32] | compile-time size known |
Phase 1: Parser — Allow [T] without size
File: src/Parser.cpp — ParseTypeExpr() (~line 391)
Currently [T;SIZE] requires both element type and integer size. Modify to also accept [T] (no semicolon, no size):
- After parsing element type, check for ; vs ]
- If ; → parse size as before → ArrayTypeExprAST(element, size)
- If ] directly → dynamic array → ArrayTypeExprAST(element, std::nullopt)
File: include/ast/Ast.h — ArrayTypeExprAST (~line 51)
Change size_t size to std::optional<size_t> size. Update to_string() to handle both forms.
Build & test after this phase.
Phase 2: Type system — Dynamic array support
File: include/typecheck/Types.h — ArrayType (~line 59)
- Change size_t size → std::optional<size_t> size
- Add bool is_dynamic() const { return !size.has_value(); }
- Update get_size() to assert non-dynamic or return optional
- Update operator== and operator<
File: include/typecheck/Types.h — Type struct
File: src/typecheck/BiTypeChecker.cpp — resolve_type_expr() or equivalent
- When resolving ArrayTypeExprAST with no size → produce Type::DynArray(element_type)
- When resolving with size → produce Type::Array(element_type, size) (existing)
No new TypeKind needed — both fixed and dynamic arrays use TypeKind::Array. The is_dynamic() method distinguishes them.
Build & test after this phase.
Phase 3: TypeConverter — Fat pointer LLVM type
File: src/codegen/TypeConverter.cpp
Update get_type() for TypeKind::Pointer:
case TypeKind::Pointer: {
auto pointee = std::get<PointerType>(t.type_data).get_pointee();
if (pointee.type_kind == TypeKind::Array) {
auto &arr = std::get<ArrayType>(pointee.type_data);
if (arr.is_dynamic()) {
return llvm::StructType::get(context, {
llvm::PointerType::get(context, 0),
llvm::Type::getInt64Ty(context)
});
}
}
return llvm::PointerType::get(context, 0);
}
Consider adding a helper: TypeConverter::is_fat_pointer(Type t) and TypeConverter::get_fat_ptr_type().
Build & test after this phase (existing tests should still pass — no existing code creates dynamic arrays).
Phase 4: Type checker — Bidirectional alloc checking
File: src/typecheck/BiTypeChecker.cpp
4a: Update visit(VarDefAST*) (~line 48)
Add special handling when RHS is AllocExprAST and declared type is ptr<[T]> (dynamic):
- Synthesize declared type from annotation
- If declared type is ptr<[T]> where arr is dynamic:
- If alloc operand is ArrayLiteralExprAST:
→ synthesize literal, check element types match T
→ set alloc type to ptr<[T]> with dynamic length
- Else (operand is count expression):
→ synthesize operand, verify it's integer (i32 or i64)
→ set alloc type to ptr<[T]> with dynamic length
- Else: existing behavior (alloc(value) → ptr<typeof(value)>)
4b: Update synthesize(AllocExprAST*) (~line 534)
Keep existing behavior for the synthesis direction. The check direction is handled in VarDefAST.
4c: Update synthesize(IndexExprAST*) (~line 578)
Currently only allows TypeKind::Array. Add support for ptr<[T]>:
- If array_expr type is ptr<[T]> (pointer to dynamic array):
→ validate index is integer
→ return element type T
- If array_expr type is [T;N] (existing):
→ existing behavior with static bounds check
4d: Update synthesize(LenExprAST*) (~line 612)
Currently only allows TypeKind::Array. Add support for ptr<[T]>:
- Extract element type from the pointee array type
- Return Type::I64_t() for dynamic arrays (length is i64)
4e: Update synthesize(FreeExprAST*) (~line 542)
Should already work — ptr<[T]> has TypeKind::Pointer. No change needed.
Build & test after this phase.
Phase 5: Codegen — Fat pointer operations
File: src/codegen/CodegenVisitor.cpp
5a: Alloc codegen — postorder_walk(AllocExprAST*) (~line 400)
Add a branch for when ast->type is ptr<[T]> (fat pointer):
Count-based allocation:
1. Get count value from operand (i32/i64)
2. Get element LLVM type from T
3. Compute: size = sizeof(T) * count
4. Call malloc(size)
5. Call memset(ptr, 0, size) — zero-initialize
6. Build fat pointer struct: { malloc_result, count }
7. ast->val = fat_ptr_struct
Array-literal-based allocation:
1. Get elements from the literal
2. count = number of elements
3. Call malloc(sizeof(T) * count)
4. Store each element value into the allocated memory via GEP
5. Build fat pointer struct: { malloc_result, count }
6. ast->val = fat_ptr_struct
Need helper to declare memset in CodegenUtils (alongside malloc/free).
5b: Index codegen — visit(IndexExprAST*) + postorder_walk(IndexExprAST*) (~line 476)
Currently hardcoded for stack arrays (looks up alloca by variable name).
Add branch for ptr<[T]>:
1. Visit array_expr normally (get the fat pointer value)
2. Extract data pointer: CreateExtractValue(fat_ptr, 0)
3. Extract length: CreateExtractValue(fat_ptr, 1)
4. Visit index_expr to get index value
5. Emit bounds check: index >= 0 && index < length
6. GEP into data pointer: CreateGEP(element_type, data_ptr, {index})
7. Load element: CreateLoad(element_type, gep)
Note: single-index GEP for pointer arithmetic (not {0, idx} form used for arrays).
5c: Index assignment — postorder_walk(BinaryExprAST*) (~line 144)
Currently handles arr[idx] = value. Add branch for ptr<[T]>:
1. Detect LHS is IndexExprAST on a ptr<[T]>
2. Get fat pointer, extract data_ptr and length
3. Bounds check
4. GEP into data_ptr
5. Store RHS value
5d: Free codegen — postorder_walk(FreeExprAST*) (~line 420)
Currently calls free(ptr_val). For fat pointer:
1. Extract data pointer: CreateExtractValue(fat_ptr, 0)
2. Call free(data_ptr)
5e: Len codegen — postorder_walk(LenExprAST*) (~line 505)
Currently returns compile-time constant for fixed arrays. For fat pointer:
1. Extract length: CreateExtractValue(fat_ptr, 1)
2. ast->val = length
5f: Variable store/load
VarDefAST codegen needs to handle fat pointer structs — the alloca must be of struct type { ptr, i64 }, and store/load the whole struct.
Build & test after this phase.
Phase 6: E2E tests
New test files in e2e-tests/compilables/ptr/:
- heap_arr_basic.mn — alloc with count, index, assign, free
- heap_arr_literal.mn — alloc with array literal
- heap_arr_runtime_size.mn — alloc with variable count
- heap_arr_len.mn — len() on heap array
- heap_arr_f64.mn — works with other element types
- heap_arr_oob.mn — bounds check triggers on out-of-bounds
Error test in e2e-tests/errors/:
- heap_arr_count_type_error.mn — non-integer count produces type error
Files to modify (summary)
Future: Custom Allocators
The current design hardcodes malloc/free. The fat pointer representation {ptr, len} is allocator-agnostic — it doesn't care where the memory came from.
Note: The typeclass system now exists (see type_checking.md and imports.md). A custom allocator could be modeled as a typeclass:
typeclass Allocator<A> {
tc_alloc(size: i64) -> ptr<i32>;
tc_free(p: ptr<i32>) -> ();
}
No changes to the fat pointer type or indexing/bounds-checking logic would be needed — only the allocation/deallocation backend changes.
Verification
After each phase, run:
cmake --build build -j --target unit-tests e2e-tests
All existing 60 tests must continue passing. New tests added in Phase 6.