Posts from March, 2009
March 31st, 2009

One piece of common sense widely shared in the machine learning community is that Euclidean distance is a poor measure. One way to attack this problem is to use another distance measurement; the other is to learn a better distance representation. Mahalanobis distance is a good practice: it linearly transforms our data into a more suitable space. Since it applies only a single linear transformation, the distance after the transformation is still an ordinary Euclidean distance.
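A minimal sketch of this equivalence (function names here are illustrative, not from any particular library): if the learned metric is M = LᵀL, then Mahalanobis distance under M equals plain Euclidean distance after transforming the points by L.

```python
import numpy as np

def mahalanobis(x, y, M):
    # Mahalanobis distance sqrt((x - y)^T M (x - y)) under metric M.
    d = x - y
    return np.sqrt(d @ M @ d)

def transformed_euclidean(x, y, L):
    # Ordinary Euclidean distance after the linear transform L.
    return np.linalg.norm(L @ x - L @ y)

rng = np.random.default_rng(0)
L = rng.standard_normal((3, 3))
M = L.T @ L  # positive semi-definite, so it is a valid metric
x, y = rng.standard_normal(3), rng.standard_normal(3)

# The two formulations agree: learning M is the same as learning
# a linear map into a space where plain Euclidean distance works.
assert np.isclose(mahalanobis(x, y, M), transformed_euclidean(x, y, L))
```

This is why metric learning under a Mahalanobis distance stays within the Euclidean framework: all the learning goes into choosing L.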

Finding a better linear space that preserves nearest neighbors can dramatically improve results (>2x). However, that does not dilute our concern about the imperfection of Euclidean space. Simply turning to some "nonlinear" method does no good either: recasting a simple question into a space with more degrees of freedom and tuning out a better result there is a way of avoiding the harder, more realistic problem. Sticking to the linear way is nothing to be shy about.

At the moment, we still depend largely on lower-dimensional Euclidean distance, while hoping to find another, unified way to do distance measurement.

March 24th, 2009

In much previous analysis, the role of emotion was ignored in favor of purely rational analysis. Emotion was treated as an unpredictable quantity and conservatively estimated by every available means. Rational analysis alone can yield some gains, but they are far smaller than what becomes possible once emotion is brought in as a controllable variable.

The conclusions drawn this way are so conservative that they are useless in real life; once emotion is brought into the analysis, many phenomena can be stated a step more boldly.

The use of emotional analysis is not only passive. Active injection of emotion can also work on people. Human memory is remarkably robust, and recall is associative, which is why memorizing an essay takes great deliberate effort while remembering a scene requires almost no conscious participation. Often people can hardly even judge the authenticity of their own memories, because the association is simply there, with no further evidence to trace.

This paves the way for injecting emotion. Take a strong emotion as an example: if racial features are deliberately emphasized in extremely bloody and violent footage, most viewers will link the two together. Of course, this is merely a propaganda technique that was already used decades ago. But unlike decades ago, modern research lets us conclude that even a very small dose of such propaganda can produce effects similar to a large-scale campaign. The old broad-coverage conditioning can therefore be replaced by point-to-point delivery; that is, a specific emotional injection can be applied to a specific person without being noticed.

Further, if the brain can be led to start an association from a fact it already accepts and continually reinforces, while the timing of that association is deliberately blurred, then after being triggered the person will reinforce the new belief even more actively. Through such a method, the targeted person may even defend this belief as something of his own making. This power is not to be underestimated.

The primal emotions of human beings work against the reality of a zero-sum game; this can be regarded as a genetic trait for group protection. At the same time, these primal emotions, together with the associative nature of memory, also make people vulnerable to carefully designed attacks on consciousness.

March 8th, 2009
The holy grail of computer vision is to understand the scene and describe it in proper language. In the ideal scenario, it should be able to answer questions like “how many people visited our college this afternoon”, or at least return results for a query like “SELECT COUNT(*) FROM Camera1 Camera2 Camera3->VideoStream, DateTime->VideoStream WHERE FaceDetector LIKE (SELECT FaceDetector WHERE Tag=Face) AND Time > ‘12:00:00’ AND Time < ‘23:59:59’”.

From this scenario, we extract a goal that can be achieved with existing technology. Rather than distorting structural data, as introduced in the article, here we try to structure visual data. The result of this effort is a new structured query language for visual data. It should be a subset of existing SQL, and moreover, ideally, it should be able to collaborate with other SQL engines. In practice, we sacrificed compatibility with the existing SQL interface in order to get better performance. As a result, we ended up with a query syntax that is incompatible with SQL. In fact, at the current stage, I would rather describe it as a process for finding similar visual data.

Visual data is processed with several different feature extractors when it is first put into the database. The different feature representations become columns for each piece of visual data. A unique fingerprint is also generated, to avoid duplicate visual data. To interact with text, tags are introduced again: every piece of visual data can be tagged. With tags, one can write a nested query to do classification, like the query at the beginning; it is actually a kNN classification.

The feature extractor is the atomic component of this construction. Three feature extractors are provided at the start: a face extractor, a SURF extractor, and an MSER extractor. The structure is flexible, and any extractor can be added later. Ideally, each extractor should provide a function to measure the similarity between two pieces of visual data; in the current design, however, that would incur a huge performance penalty. To avoid this penalty, several helper extractors are introduced. A general extractor can produce very different kinds of output: variable-length data, binary data, serialized data, a list, or a tree. A helper extractor, by contrast, outputs a fixed-length float sequence, with the implication that its outputs can be compared by L1/L2 distance. In the system, we use helper extractors to mimic the similarities produced by the general extractors. Three helper extractors are used: a global histogram, a local histogram, and a Gabor filter.
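The tag-based classification above can be sketched in a few lines. This is a hypothetical illustration, not the real system: the helper-extractor outputs are fixed-length float vectors, so the nested query reduces to an L2 distance scan plus a majority vote among the k nearest tagged rows.

```python
import numpy as np

def knn_tag_vote(query, features, tags, k=3):
    """Classify `query` by majority tag among its k nearest rows (L2 distance)."""
    dists = np.linalg.norm(features - query, axis=1)  # L2 to every stored vector
    nearest = np.argsort(dists)[:k]                   # indices of k closest rows
    votes = [tags[i] for i in nearest]
    return max(set(votes), key=votes.count)           # majority vote

# Toy "database": rows are helper-extractor outputs, one tag per row.
features = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 4.9]])
tags = ["face", "face", "landscape", "landscape"]

print(knn_tag_vote(np.array([0.05, 0.05]), features, tags))  # → face
```

In the real system the vectors would come from the global histogram, local histogram, or Gabor filter columns, and the candidate set would be restricted by the tag subquery rather than scanned in full.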

Overall, it still looks far from a query language. One may think of it as a table engine, like what MyISAM/InnoDB does. Given time, maybe it could become more powerful; who knows?