[PDF]
Highlight: the document vector is sparse; therefore, it can be recovered by compressed sensing.
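As a toy illustration of that claim (not taken from the paper), here is a minimal sketch of recovering a sparse vector with Orthogonal Matching Pursuit, assuming the document vector is k-sparse and is observed through a random Gaussian measurement matrix; all sizes below are placeholders:

```python
# Sketch: sparse recovery by Orthogonal Matching Pursuit (OMP).
# Assumption: x is k-sparse in the standard basis and we observe y = A x.
import numpy as np

def omp(A, y, k):
    """Greedily pick k columns of A and least-squares fit them to y."""
    residual = y.copy()
    support = []
    for _ in range(k):
        # Column most correlated with the current residual.
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j not in support:
            support.append(j)
        # Refit on the chosen columns, then update the residual.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x_hat = np.zeros(A.shape[1])
    x_hat[support] = coef
    return x_hat

rng = np.random.default_rng(0)
n, m, k = 1000, 200, 10                   # dictionary size, measurements, sparsity
x = np.zeros(n)
x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)   # sparse "document"
A = rng.standard_normal((m, n)) / np.sqrt(m)                  # measurement matrix
y = A @ x
print(np.linalg.norm(omp(A, y, k) - x))   # close to 0 when recovery succeeds
```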
Today’s robust object detection algorithms need large training datasets. For example, THU’s high-accuracy face detection system uses 30k positive faces, and the negative examples collected from background images are countless. This poses a serious problem for researchers, because a good result relies on both the novelty of the algorithm and the size of the training dataset. A complete dataset helps the algorithm form good assumptions about the subject, but collecting such a dataset is labor-intensive: it requires human input for every sample, which makes the collection process not scalable.
On the other hand, photo-realistic 3D graphics is quite mature; today’s PC software can produce very realistic images. The only drawback is that rendering is computationally expensive. One stage of effective training typically requires about ten thousand positive images, which is not a small number, but a 6-minute 3D movie already has roughly 10,000 frames. Moreover, only low-resolution images are really needed in the later training stage; 32x32 is enough for many tasks. This specific requirement makes the problem feasible.
And all hail to Turing. Who can tell me how the “Remapping PCI Memory to Memory” option affects Linux 64bit and Win7 64bit?
Constructing a visual word set is essentially a method of finding clusters/exemplars in a given feature space. The result of comparing a local feature vector with the visual word set is the index of a visual word. This compacts the inverted index to each image, but the underlying mapping is the same: it computes h(v) to convert one vector into an integer.
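Here is a minimal sketch of that mapping, assuming SIFT-like 128-dimensional descriptors, a 256-word vocabulary built with k-means, and random data standing in for real local features:

```python
# Sketch: visual-word lookup h(v) = index of the nearest cluster center.
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(0)
train_descriptors = rng.standard_normal((5000, 128))       # placeholder local features
centers, _ = kmeans2(train_descriptors, k=256, minit='++', seed=0)  # visual word set

def h(v):
    """Map descriptor v to the integer index of its nearest visual word (L2)."""
    return int(np.argmin(np.linalg.norm(centers - v, axis=1)))

word = h(rng.standard_normal(128))   # a single integer; the distance itself is discarded
```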
Although the mapping function of looking up a visual word is very similar to the hash function of a locality-sensitive hashing scheme, the idea behind it is quite different. LSH cares more about preserving the distance measure and covering the whole feature space, while visual word generation cares more about where the points concentrate. Thus, for L2 distance, visual words tend to use balls of different diameters to cover the existing points, whereas LSH uses balls of the same diameter to cover the whole range.
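For contrast, a minimal sketch of one p-stable LSH function for L2 distance; the bucket width w and the descriptor dimension are assumptions, and a real scheme concatenates several independent functions drawn the same way:

```python
# Sketch: one p-stable LSH function for L2: equal-width bins along a random
# direction, covering the whole space rather than adapting to the data.
import numpy as np

rng = np.random.default_rng(0)
dim, w = 128, 4.0
a = rng.standard_normal(dim)     # Gaussian projection direction => sensitive to L2
b = rng.uniform(0, w)            # random offset

def lsh_hash(v):
    """Quantize the projection of v onto a; nearby points often share a bucket."""
    return int(np.floor((a @ v + b) / w))

bucket = lsh_hash(rng.standard_normal(dim))
```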
It is easy to see that LSH encodes more information, because it preserves distance information through a set of hash functions; it characterizes each point more precisely by providing several measurements. Looking up a visual word, on the other hand, loses all the distance information and yields only a categorical result. Viewed as a single hash function, the visual word method is good because it obtains an approximately optimal partition of the existing points, but its overall performance is restricted by the limited output.
I am looking into an algorithm that combines a dynamically generated visual word set, the sparse nature of the self-similarity descriptor, and locality-sensitive hashing into an online local feature comparison.
Writing this kind of program is always a dilemma, and it drags down progress too. If I write the detector first, there is no training result, so I cannot tell whether it is correct; if I write the trainer first, there is no detector to collect negative samples, so it cannot be trained. This is when the advantage of using MatLab shows itself.
Confucius’s theory of the stages of a man’s life obviously does not apply to a nation. But what have we done in these sixty years? Dishearteningly, the first thirty years abolished all classes, and the latter thirty years cultivated a class of the suddenly rich. The first thirty years were collectivization, the latter thirty privatization. The land revolution was right, and the reform and opening up was also right. Then what did the thirty million people who starved to death do wrong?
My long-suffering motherland: sixty years on, and some people are still blocking TOR.