From other languages I picked up the habit of scoping variables as tightly as possible. In MMBasic that’s LOCAL inside subs and functions. Clean code, no clashes with globals, all correct, or so I thought. Then I profiled my game and saw what it was costing me inside the hot loop.

What I measured

Profiling my game loop, I noticed that a sub called once per frame for every asteroid was taking more time than the work inside it could justify. The body of the sub was simple; the cost didn’t add up.

The culprit, in the end, was the LOCAL declarations themselves. Every call reserves and initialises those variables anew. In a sub that runs 40 times per frame, which at 60 frames per second comes to 2400 calls a second, that allocation alone is a real cost.
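To check the effect on your own board, here is a minimal micro-benchmark sketch. It calls two otherwise identical subs 2400 times each, one with LOCAL working arrays and one with program-level arrays, and times both with MMBasic's TIMER (reset with TIMER = 0, read back in milliseconds). The sub names, the dummy work and the checksums are hypothetical; only the call count comes from the figures in this post. Timings depend on the board, clock speed and firmware version, so no expected numbers are given.

```basic
' Micro-benchmark sketch: LOCAL working arrays vs. program-level ones.
' Hypothetical names: WorkLocal, WorkGlobal, g_x, g_y and the dummy work.
OPTION EXPLICIT

CONST N_CALLS = 2400              ' one second of calls: 40 per frame at 60 fps

DIM FLOAT g_x(15), g_y(15)        ' global working arrays, allocated once
DIM FLOAT sum_local, sum_global   ' checksums: both variants must agree
DIM FLOAT t_local, t_global       ' elapsed milliseconds per variant
DIM INTEGER n

TIMER = 0
FOR n = 1 TO N_CALLS
  WorkLocal n MOD 40
NEXT n
t_local = TIMER

TIMER = 0
FOR n = 1 TO N_CALLS
  WorkGlobal n MOD 40
NEXT n
t_global = TIMER

PRINT "LOCAL arrays: "; t_local; " ms"
PRINT "global arrays:"; t_global; " ms"
END

' Arrays are LOCAL: reserved and zeroed on every call.
SUB WorkLocal(idx AS INTEGER)
  LOCAL FLOAT x(15), y(15)
  LOCAL INTEGER p                 ' same scalar in both subs, so only the arrays differ
  FOR p = 0 TO 15
    x(p) = p + idx
    y(p) = p - idx
  NEXT p
  sum_local = sum_local + x(15) + y(15)
END SUB

' Arrays live at program level: nothing to allocate per call.
SUB WorkGlobal(idx AS INTEGER)
  LOCAL INTEGER p
  FOR p = 0 TO 15
    g_x(p) = p + idx
    g_y(p) = p - idx
  NEXT p
  sum_global = sum_global + g_x(15) + g_y(15)
END SUB
```

Keeping the LOCAL INTEGER counter in both subs is deliberate: it isolates the array reservation, which is the cost under suspicion here.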

Global working arrays instead of LOCAL

The rewrite was straightforward: pull the working arrays for inner-loop routines out into globals, then reuse them in the sub.

Before:

SUB DrawAsteroid(idx)
  LOCAL FLOAT src_x(15), src_y(15)
  LOCAL FLOAT tmp_x(15), tmp_y(15)
  ' ... rotate, translate, draw ...
END SUB

After:

DIM FLOAT da_src_x(15), da_src_y(15)
DIM FLOAT da_tmp_x(15), da_tmp_y(15)

SUB DrawAsteroid(idx)
  ' uses da_src_x, da_src_y, da_tmp_x, da_tmp_y
END SUB

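I prefix the names with da_ (for DrawAsteroid) so it’s obvious which routine owns each working array. That doesn’t replace scoping enforced by the interpreter, but for readability and to avoid clashes with other subs, it’s enough. One difference to keep in mind: LOCAL variables start at zero on every call, while the globals keep whatever the previous call left in them, so the sub has to write every element before it reads it.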
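To make the placeholder body more concrete, here is a minimal sketch of a DrawAsteroid that only touches program-level buffers. Apart from the four da_ arrays, everything is hypothetical: NUM_AST, NUM_PTS, the ast_x/ast_y/ast_ang position and angle arrays, the da_-prefixed scalars, and the assumption that the outline is already loaded into da_src_x/da_src_y. It also assumes firmware with the LINE graphics command and a configured display (for example a Colour Maximite 2 or PicoMite).

```basic
' Globals, DIMmed once at program level (da_ = owned by DrawAsteroid).
' Hypothetical: NUM_AST, NUM_PTS, ast_x(), ast_y(), ast_ang(); the outline
' is assumed to already be in da_src_x/da_src_y.
CONST NUM_AST = 40
CONST NUM_PTS = 16                    ' indices 0..15 under the default OPTION BASE 0
DIM FLOAT da_src_x(15), da_src_y(15)  ' model-space outline
DIM FLOAT da_tmp_x(15), da_tmp_y(15)  ' screen-space scratch, reused every call
DIM FLOAT da_c, da_s                  ' cosine/sine of the current angle
DIM INTEGER da_p, da_q                ' point index and "next point" index
DIM FLOAT ast_x(NUM_AST - 1), ast_y(NUM_AST - 1), ast_ang(NUM_AST - 1)

SUB DrawAsteroid(idx)
  da_c = COS(ast_ang(idx))            ' angle in radians (MMBasic default)
  da_s = SIN(ast_ang(idx))
  ' Rotate, then translate. Every da_tmp element is written before it is
  ' read, so values left over from the previous call never matter.
  FOR da_p = 0 TO NUM_PTS - 1
    da_tmp_x(da_p) = da_src_x(da_p) * da_c - da_src_y(da_p) * da_s + ast_x(idx)
    da_tmp_y(da_p) = da_src_x(da_p) * da_s + da_src_y(da_p) * da_c + ast_y(idx)
  NEXT da_p
  ' Draw the closed outline: point p joined to point p+1, wrapping to 0.
  FOR da_p = 0 TO NUM_PTS - 1
    da_q = (da_p + 1) MOD NUM_PTS
    LINE da_tmp_x(da_p), da_tmp_y(da_p), da_tmp_x(da_q), da_tmp_y(da_q)
  NEXT da_p
END SUB
```

Here the scalars get the same da_ prefix as the arrays, since DrawAsteroid calls nothing else while its loops run; a set of shared counters would work just as well.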

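Things like LOCAL FLOAT i, j, k for loop counters got the same treatment in hot subs, replaced by a small set of global loop variables that several routines share. The catch with sharing them is nesting: a routine must not call another routine that uses the same counter while its own loop is still running, or the inner loop overwrites the outer loop's counter.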
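A minimal sketch of the shared-counter arrangement, with hypothetical names throughout (g_i, g_j, hits, UpdateRows, CountRow): the outer routine owns one counter for the duration of its loop, and the leaf routine it calls from inside that loop uses the other.

```basic
OPTION EXPLICIT
DIM INTEGER g_i, g_j          ' shared loop counters, DIMmed once
DIM INTEGER hits(3)           ' hypothetical per-row results

UpdateRows
PRINT hits(0); hits(1); hits(2); hits(3)
END

' Outer routine: owns g_i while its loop runs.
SUB UpdateRows
  FOR g_i = 0 TO 3
    CountRow g_i
  NEXT g_i
END SUB

' Called from inside UpdateRows' loop, so it must use g_j, not g_i:
' reusing g_i here would overwrite the outer loop's counter.
SUB CountRow(r AS INTEGER)
  hits(r) = 0
  FOR g_j = 1 TO 5
    hits(r) = hits(r) + 1
  NEXT g_j
END SUB
```

Run as written, every element of hits ends up at 5. Changing CountRow to loop over g_i is exactly the kind of edit that silently breaks UpdateRows.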

Across my game-loop subs this shaved real frame time, particularly in DrawAsteroid and the particle update routine. I’m not down on LOCAL in general — in normal code it’s the right tool and it keeps things readable. But in the three or four subs that fire hundreds of times a second, the per-call reservation costs more than it’s worth, and that’s where pulling the buffers out into globals pays off. I didn’t touch the rest of my code, because outside the hot paths you can’t measure the difference and the namespace gets messy fast.