Wednesday, November 30, 2016

Measure Twice, Cut Once

In a previous post, I observed that, as I increased the number of cores in my GPGPU, performance began to plateau and hardware threads spent more time stalled because their store queue was full. I speculated that the latter might cause the former, although that wasn't definitive. The current implementation has only a single store queue entry for each thread. One optimization I've been considering is adding more store queue entries, but this has subtle and complex design tradeoffs.
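
As a rough illustration of why queue depth matters, here is a toy cycle-by-cycle model in Python. It is only a sketch under simplifying assumptions, not the actual hardware: a single thread, one instruction per cycle, and a store queue that retires its oldest entry every `drain_latency` cycles. The function name `count_stall_cycles` and all of its parameters are hypothetical.

```python
def count_stall_cycles(instructions, queue_depth, drain_latency):
    """Count cycles a thread spends stalled on a full store queue.

    instructions  -- list of booleans, True where the instruction is a store
    queue_depth   -- number of store queue entries for the thread (assumed)
    drain_latency -- cycles the memory system needs to retire one store (assumed)
    """
    occupancy = 0        # stores currently waiting in the queue
    head_remaining = 0   # cycles until the oldest queued store retires
    stalls = 0
    pc = 0
    while pc < len(instructions):
        # Memory side: make progress on the oldest queued store.
        if occupancy > 0:
            head_remaining -= 1
            if head_remaining == 0:
                occupancy -= 1
                if occupancy > 0:
                    head_remaining = drain_latency
        # Thread side: a store needs a free entry, anything else just issues.
        if instructions[pc]:
            if occupancy < queue_depth:
                if occupancy == 0:
                    head_remaining = drain_latency
                occupancy += 1
                pc += 1
            else:
                stalls += 1  # queue full: retry the store next cycle
        else:
            pc += 1
    return stalls


# Four bursts of two back-to-back stores, each followed by six other instructions.
program = ([True, True] + [False] * 6) * 4
print(count_stall_cycles(program, queue_depth=1, drain_latency=4))  # 12
print(count_stall_cycles(program, queue_depth=2, drain_latency=4))  # 0
```

In this model a second entry absorbs short bursts completely, but it does little for a thread that stores on every cycle, because the drain rate is then the limit. It also ignores everything that makes the real decision hard, such as area, ordering, and interaction with other threads.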

Sunday, September 4, 2016

Further Reading

A passage from Mortimer J. Adler and Charles Van Doren's "How to Read a Book" discusses reading for understanding:
"If the book is completely intelligible to you from start to finish, then the author and you are as two minds in the same mold. The symbols on the page merely express common understanding you had before you met.
Let us take our second alternative. You do not understand the book perfectly. Let us even assume--what unhappily is not always true--that you understand enough to know that you do not understand it all. You know the book has more to say than you understand and hence it contains something that can increase your understanding... 
Without external help of any sort, you go to work on the book. With nothing but the power of your own mind, you operate on the symbols before you in such a way that you gradually lift yourself from a state of understanding less to one of understanding more. Such elevation, accomplished by the mind working on a book, is highly skilled reading, the kind of reading that a book which challenges your understanding deserves"

Sunday, July 24, 2016

GPLGPU Walkthrough

A few years ago, an interesting Kickstarter project popped up:

The goal was to publish source code for a GPU that is register compatible with the late-'90s-era Number Nine "Ticket To Ride IV" GPU. Although the project didn't meet its funding goal, the person behind it later published the code on GitHub.

Although this is an older design, it has a lot that is worth studying. It's instructive to compare it to the VideoCore GPU that I walked through in a previous post. While there are some fundamental differences, there are a surprising number of similarities, which shows how modern GPUs evolved from earlier ones.

Tuesday, March 1, 2016

VideoCore QPU Pipeline

As a follow-up to the last post, I've taken a closer look at the Quad Processor Unit that executes shaders on the VideoCore GPU. Although it superficially looks like a CPU, there are some fundamental differences, and the reasons for them are interesting.

Friday, February 26, 2016

Life of a Triangle

A few years ago, Broadcom released full specifications for their VideoCore IV GPU, which is in the system-on-chip on the popular Raspberry Pi dev board. Before this, most details of commercial GPUs were secret. Although GPU manufacturers released white papers and some academic publications, they were often greatly simplified and lacked important details.

Tuesday, November 3, 2015

Lost in Translation

I sometimes sympathize with Mike Mulligan, staring up from the bottom of his freshly dug basement at four neat walls and four neat corners and realizing he's forgotten to leave a way out. The advantage of construction in the virtual world is that it's easy to back up and start over. Sometimes my approach to a complex new feature is to implement until I run into a dead end, revert my changes, and think about it some more. It's not uncommon for me to make several runs at a feature, each time building a better mental model of the problem.