A Smaller, "Cheaper" Nand2Tetris CPU
I’ve done three different “enhancements” to the nand2tetris design now, and each one has added complexity in order to make programs run faster, or to allow larger programs to run.
That’s fun, but in a way making the chip better by using more gates is the easy problem. A trickier challenge is to make a smaller chip that’s still as useful as the original. It seems almost impossible: the nand2tetris design is so cleverly simple that it’s hard to see how we could take anything away and still do what it does.
Eventually I hit upon a simple idea. About 45% of the ~1,200 gates of the chip go into its very wide ALU. 16 bits is a lot for a chip of this size; most designs on this scale in the 1970s and ’80s were 8-bit or mixed 8/16-bit architectures. What if I build a chip with only a single 8-bit ALU, and use two cycles to push the two bytes of each input through the ALU?
You can see how that worked out in alt/eight.py.
There’s some extra logic to keep track of cycles in pairs: In the “top-half” cycle, the ALU computes the low byte of its result, which is stored in a simple latch (8 DFFs), along with a carry bit and the ALU’s zr condition bit. In the following “bottom-half” cycle, the ALU computes the high byte (incorporating the carry from the previous result). A number of the CPU’s tasks happen only in the bottom-half cycle: the PC is incremented (or loaded, if jumping), the registers are updated from the ALU/memory, memory is written, etc.
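To make the two-cycle cadence concrete, here is a minimal Python sketch of a 16-bit addition pushed through an 8-bit adder in two passes. It is not the code from alt/eight.py: the names alu8_add and add16_two_cycles are made up for illustration, only addition is modeled (the real ALU handles the full set of operations), and the way the two zr bits are combined is my assumption about how the latched bit gets used.

```python
from typing import Tuple


def alu8_add(a: int, b: int, carry_in: int) -> Tuple[int, int]:
    """Hypothetical 8-bit adder: returns (8-bit sum, carry out)."""
    total = (a & 0xFF) + (b & 0xFF) + (carry_in & 1)
    return total & 0xFF, total >> 8


def add16_two_cycles(x: int, y: int) -> Tuple[int, bool]:
    """Add two 16-bit values using two passes through the 8-bit adder."""
    # "Top-half" cycle: compute the low byte, then latch it together
    # with the carry bit and the low byte's zr condition.
    low, carry = alu8_add(x & 0xFF, y & 0xFF, 0)
    zr_low = low == 0

    # "Bottom-half" cycle: compute the high byte, feeding in the latched carry.
    high, _ = alu8_add((x >> 8) & 0xFF, (y >> 8) & 0xFF, carry)

    # Assumption: the 16-bit zr is true only if both bytes are zero.
    result = (high << 8) | low
    return result, zr_low and high == 0


print(hex(add16_two_cycles(0x00FF, 0x0001)[0]))  # 0x100: the carry crosses the byte boundary
```

The only state that has to survive between the two cycles is the low byte, the carry, and the zr bit, which is why a small latch is enough.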
The narrower ALU is in fact just about half the size of the original (286 vs. 560 gates). Unfortunately the rest of the chip gets more complicated to deal with the two-cycle cadence. There are now two possible sources for each ALU input, as well as 19 new DFFs to hold results between cycles, and assorted logic to keep track of what happens when. Non-ALU logic is up to 744 gates from 682. In the end, the chip is smaller, but only by about 18%, nowhere near half the size of the original.
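As a rough sanity check, the gate counts quoted above can be tallied in a few lines of Python. The dictionary names are just for illustration, and the totals here are simply the sums of the four figures in the text, so they may not match exactly how the full chip was counted.

```python
# Gate counts as quoted in the text (ALU vs. everything else).
original = {"alu": 560, "other": 682}
eight_bit = {"alu": 286, "other": 744}

orig_total = sum(original.values())   # 1242
new_total = sum(eight_bit.values())   # 1030

alu_share = original["alu"] / orig_total        # ALU's share of the original chip
alu_ratio = eight_bit["alu"] / original["alu"]  # size of new ALU relative to old
savings = 1 - new_total / orig_total            # overall reduction

print(f"ALU share of original: {alu_share:.0%}")
print(f"New ALU vs. old ALU:   {alu_ratio:.0%}")
print(f"Overall savings:       {savings:.1%}")
```

Summing these particular figures gives a savings of a bit over 17%, in the same ballpark as the roughly 18% quoted; the ALU saves 274 gates, but the extra 62 gates of non-ALU logic eat up part of that.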
So, partial success, I suppose. I wonder what a late-’70s microprocessor engineer would have made of an 18% savings in chip area that came with a 50% reduction in speed. Would this chip have sold as a low-cost alternative?