The very basic instrument used in my implementation is called control-point feature (renamed to brightness binary feature to reflect that the implementation in ccv works only on brightness value). For a given WxH image region, one feature consists of two sets of control points, a, a, ... a[n] and b, b, ... , b[m]. To classify the given image region, a feature examines the pixel values at control points in group a and group b in relevant images (at original size, half-size and quarter-size). The feature only answers "yes" if all pixel values in group a is greater / less than any pixel values in group b. The details can be found in the original paper YEF: Real-time Object Detection and a follow-up High-Performance Rotation Invariant Multiview Face Detection. Long story short, the training program bbfcreate will create several strong linear classifiers from control-point features using AdaBoost.
Once the image pyramid is generated, the detection process is just following the paper. The algorithm sweep over the whole image at different resolutions to check if a face exists there with control-point feature (line 290). I have no other tricks to improve speed-wise beyond this point. At the end of this process, it merges detected areas and returns that with confidence score.
OK, let's reconfirm how fast it is:
This 2808x1805 image takes 6 seconds on Firefox with Web Worker off, and 10 seconds with Web Worker on. It takes 4 seconds on Google Chrome (Web Worker doesn't work as smooth in Google Chrome).
Please let me know what else in this implementation you want to be explained in the comments.