September 17th, 2009

People really do suffer from procrastination; in particular, once you have finished thinking through something interesting, you can't even be bothered to actually do it. A pile of drafts has accumulated in my blog, so I've decided to post some of them now and fill them in when I have time.

On Life 2009-09-09

Chicken is the blandest of meats, so it always has to soak in marinade for a few hours to take on any flavor. Wednesday afternoon classes are always a bit melancholy: another week will pass before I see many of the professors again. Grimshaw is always sensitive, constantly worrying that he teaches poorly, even though everyone listens with delight.

**The Role of Reason in Evolution** 2009-05-31

Rationalists often claim that since evolution has already produced intelligence, we can now dispense with evolution by natural selection and instead design through rational, logical inference to find the fittest solution. The arrogance of this idea lies in ignoring that rational inference does not exist independently of evolution; it is itself a product of evolution, so it cannot be permanently reliable, and naturally it cannot be the optimal way to "find the fittest". Moreover, defining the constraints of optimality is itself difficult, which makes deriving the fittest population through rational inference even harder.

However, this argument does not rule out the possibility that reason could determine what is fittest.

The Efficiency of Public Services 2009-06-04

I remember that 20th-century science fiction typically depicted scenes that included public video phone booths (the Space Odyssey series), well-developed public bicycle transit, and a universal currency unit. More recent science fiction even features an omnipresent computer oracle (Ender's Game).

Yet none of these public facilities became reality. Studying why the late 19th and early 20th centuries were dominated by public infrastructure would help us understand the origin and direction of the early 21st century's trend toward personalization. The public infrastructure built from the late 19th to the early 20th century includes roads, railways, the power grid, the fixed telephone network, gas stations, gas pipelines, and so on.

September 12th, 2009

As someone lacking in execution and planning, I always worry needlessly; even though I always get things done right at the last minute, I have never shaken the bad habit of starting to worry ten days or two weeks in advance.

September 11th, 2009

One problem with today's visual-word-based image retrieval systems is generating the visual word set. Visual word sets tend to be big (~50k), so different clustering methods such as approximate k-centers and affinity propagation are applied. However, the process is periodic. Imagine you are an engineer at Flickr rolling out the new image retrieval system, and you generated the visual word set from yesterday's Flickr photo set; but today, 200k more photos have been uploaded, so you have to regenerate the visual word set at least monthly to avoid misclassifying new visual words in new photos.

In today's real-time world, that method looks old-fashioned. The need is to add new visual words as soon as a new photo is uploaded. The problem is: how do we know a visual word is "new"? Though it is quite obvious for us or a computer to judge whether a text word is new (or not so obvious for a computer, e.g. misspellings), for visual words it is hard. Maybe we can put a threshold on the k-NN visual word search and declare that a query whose similarity to its nearest neighbor falls below a certain bar is a new visual word. But it is very unlikely this naive method can work in the real world. The similarity measurement is the tricky part: maybe the query is not a new visual word at all, just an old visual word under very different illumination. In that case, another naive method suggests itself: measure the difference between the 1st and 2nd nearest neighbors. If the query's similarities to the 1st and 2nd NN are roughly the same, we may argue it is a new visual word, because in a high-dimensional Euclidean space two random points tend to be roughly equidistant. This second naive method is still not persuasive, because one can argue the approximation is arbitrary.
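A minimal sketch of that second heuristic, assuming the distances to the first and second nearest neighbors have already been computed (the function name and threshold are mine, not from any particular system):

```c
#include <stdbool.h>

/* Distance-ratio test for "new" visual words: if the query is nearly
 * equidistant from its 1st and 2nd nearest neighbors, treat it as new.
 * ratio_bar is a tunable threshold, e.g. 0.9. */
static bool is_new_visual_word(double d1, double d2, double ratio_bar)
{
    /* convention: d1 <= d2, d1 is the distance to the 1st NN */
    if (d2 <= 0.0)
        return false; /* degenerate: exact duplicate of an existing word */
    return (d1 / d2) > ratio_bar;
}
```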

Though I cannot provide more evidence to support this, it may work to apply sparse analysis to filter out new visual words. Let's imagine that we try to get a representation of the query from the existing word list A: solve y = Ax while minimizing ||x||_{L1}. Then the SCI (sparsity concentration index) can give a good indication of whether a query is a new visual word or not.
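For reference, the L1 formulation and the SCI as defined in the sparse representation classification literature (Wright et al.); applying it to flag new visual words is only my conjecture here. With k existing words and δ_i(x) keeping only the coefficients associated with word i:

```latex
\hat{x} = \arg\min_x \|x\|_1 \quad \text{subject to} \quad y = Ax
\qquad
\mathrm{SCI}(\hat{x}) = \frac{k \cdot \max_i \left( \|\delta_i(\hat{x})\|_1 / \|\hat{x}\|_1 \right) - 1}{k - 1} \in [0, 1]
```

An SCI near 1 means the representation concentrates on a single existing word (the query is old); near 0, the coefficients spread across many words, flagging a candidate new visual word.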
September 3rd, 2009

Most people already clearly understand that shares play an important role in business partnerships. The commonly understood roles are: incentives, alignment of interests, decision-making, and profit distribution. In a conversation with a friend a few days ago, he said he didn't want the shares of his own company spelled out too explicitly; whoever needs money can just take it from the company, and that would be better for everyone. In an era before going public, shares seem to carry little weight in a privately held company.

But the shareholding structure of the modern corporation predates the public exchange, and for good reason. First, contrary to common sense, the share system separates personal goals from company goals. The company, as a rational actor, primarily pursues high returns, whereas most individuals pursue a comfortable life. On the spending side, the two are sharply opposed: the company tries every means to cut expenses for a higher profit margin, while the individual would rather increase spending for a better life experience. In other words, a rational company measures returns in money, while an individual measures returns in life experience.

Imagine an incentive scheme built without shares: company executives may freely spend the company's money to satisfy personal needs. The chaotic result is that the individual can no longer tell whose goals he is serving, producing large amounts of unwarranted spending. Look at China's invoice (fapiao) reimbursement system: leaving aside its original tax-avoidance purpose, this deformed incentive system has created a great deal of visible, avoidable spending.

The share system, by contrast, provides an incentive mechanism that separates personal goals from company goals. Typically, shares return cash in a few ways: resale on the public market, annual dividends, and company buybacks. Resale on the public market is tied to company performance but does not interfere with management's operation. Dividends align personal interests with company interests. Only buybacks create a scenario where the individual games against the company. Other incentive systems, whether discretionary spending or invoice reimbursement, are in essence channels for the individual to game against the company, and such gaming, especially between executives and the company, works against the company.

Reasonable share-based incentives avoid this gaming, thereby harmonizing the relationship between the company and the individual and separating their goals. This specialization also lays the foundation for the later separation of management and shareholders, securing the company's long-term development.

August 27th, 2009

In several of my former articles, I mentioned a system from different angles, and now I think it is time to bring the stealth project, NDQI (Non-structural Data Query Interface), into the spotlight.

Decades of study in content-based image retrieval (CBIR) have focused on accuracy. Some earlier research in CBIR has shown that as the size of the database expands, accuracy improves dramatically. In recent years, my research on CBIR has had two directions: one is to scale out, the other is to make it more user-friendly. Two years ago, the experimental software ClusCom tried to solve the first problem. Now, I believe I have reached a point where I can solve the second problem, partially. Instead of pursuing "user friendly", I transformed the problem into "developer friendly".

NDQI promises to provide the same accessibility to multimedia content (currently, only still images are supported) as today's SQL systems provide to structured data. It takes many good ideas from OSS and should be open-sourced in the future.

**Basic Utilities**

The idea of NDQI is to design a special-purpose database that can access multimedia data efficiently with a simple, SQL-like language. The first concern is how the storage layer works. For now, NDQI only works with still images, and a 16-byte string is used to identify one image. A radix-tree-like in-memory structure is the basic utility of NDQI and takes over the key-value storage scheme inside NDQI. The radix-tree-like structure is designed for memory-constrained situations and has performance comparable to other in-memory data structures such as Google sparsehash, the APR hashtable, etc. However, the in-memory storage layer is not designed for storing images; specifically, it stores the metadata and other indexes. Where to store the images is really not my concern, because there are already in-production solutions out there, such as Facebook's Haystack.
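The post doesn't include the structure itself; here is a minimal sketch of what a radix-tree-style map over 16-byte keys could look like, consuming 4 bits per level (all names and the nibble fan-out are my assumptions, not NDQI's actual layout):

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical radix-tree node over fixed 16-byte image ids. Each level
 * consumes one nibble of the key, so the tree is at most 32 levels deep
 * with 16 children per node; unused subtrees cost nothing. */
#define KEY_BYTES 16
#define FANOUT    16  /* 4 bits per level */

typedef struct rdx_node {
    struct rdx_node *child[FANOUT];
    void *value;  /* non-NULL only at the end of a full 32-nibble path */
} rdx_node_t;

/* Extract the i-th nibble (0..31) of the 16-byte key. */
static unsigned key_nibble(const uint8_t key[KEY_BYTES], unsigned i)
{
    uint8_t b = key[i >> 1];
    return (i & 1) ? (b & 0x0f) : (b >> 4);
}

/* Walk the key nibble by nibble; NULL means the key is absent. */
static void *rdx_get(rdx_node_t *root, const uint8_t key[KEY_BYTES])
{
    for (unsigned i = 0; root && i < KEY_BYTES * 2; i++)
        root = root->child[key_nibble(key, i)];
    return root ? root->value : NULL;
}
```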

The foundation of NDQI is a bunch of routines in C-style code. It is not really a pure C implementation, because it depends on the OpenCV library; however, a C-style interface is provided for manual manipulation of the database. Besides, it is natively thread-safe and takes advantage of read-write locks internally. Thus, an upper layer (parser, indexer, etc.) based on a scripting language is possible.
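As an illustration of the internal read-write locking described here, a minimal sketch using POSIX rwlocks (the wrapper names are hypothetical, not NDQI's API):

```c
#include <pthread.h>

/* Guard a shared index with a read-write lock so many readers can query
 * concurrently while writers get exclusive access. The lock must be set
 * up once with pthread_rwlock_init() before use. */
typedef struct {
    pthread_rwlock_t lock;
    /* ... index data ... */
} guarded_index_t;

static void *index_lookup(guarded_index_t *idx, const char *key)
{
    void *result = NULL;
    pthread_rwlock_rdlock(&idx->lock);   /* shared: many readers at once */
    /* result = ...search idx for key... */
    pthread_rwlock_unlock(&idx->lock);
    return result;
}

static void index_insert(guarded_index_t *idx, const char *key, void *val)
{
    pthread_rwlock_wrlock(&idx->lock);   /* exclusive: writers serialize */
    /* ...insert (key, val) into idx... */
    pthread_rwlock_unlock(&idx->lock);
}
```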

**Database Types**

Two types of database designed specially for multimedia content search are provided. The first type is called the bags-of-words database (bwdb). In this scenario, an input can be extracted into a variable number of fixed-length words. For example, a picture with N people can be explained as N words, each word standing for a person in the picture. Actually, for a wide range of image recognition problems, the "bags of words" idea is a good generalization.

The second type is called the fixed-length vector database. For still images, the fixed-length vector database can store simple global features such as histograms, Gabor filter responses, etc. This kind of database provides far superior speed compared to bwdb, because it can be sped up with tree-based methods or locality-sensitive hashing. In reality, it outperforms bwdb by about 100x, both with indexes.
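A minimal sketch of a fixed-length-vector record and the brute-force scan baseline; tree-based methods and LSH exist to beat exactly this loop (the dimension and names are mine, not NDQI's):

```c
#include <float.h>
#include <stddef.h>

/* A fixed-length feature record and a brute-force nearest-neighbor scan. */
#define DIM 64  /* e.g. a 64-bin color histogram */

typedef struct {
    char id[16];     /* 16-byte image identifier */
    float vec[DIM];  /* global feature vector */
} flv_record_t;

/* Return the index of the record closest (squared Euclidean) to query q. */
static size_t flv_nearest(const flv_record_t *db, size_t n, const float *q)
{
    size_t best = 0;
    float best_d = FLT_MAX;
    for (size_t i = 0; i < n; i++) {
        float d = 0.0f;
        for (size_t j = 0; j < DIM; j++) {
            float t = db[i].vec[j] - q[j];
            d += t * t;
        }
        if (d < best_d) { best_d = d; best = i; }
    }
    return best;
}
```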

Other metadata such as date, location, tags, and camera type are stored within Tokyo Cabinet, which can do most things just like a SQL DB.

**Language Specs**

NDQI uses an SQL-like language for users to access the functionality provided by NDQI. It is SQL-like but, unfortunately, not compatible with SQL. You can see more specifications at http://limpq.com/docs/ndqi. Here I'll brief some of them. There is no real INSERT/REPLACE/UPDATE functionality, as all images are uploaded and indexed automatically. The INSERT and DELETE keywords can add/delete images and indexes from the database immediately. The SELECT keyword supports nested SELECT natively, which means a query like "SELECT # WHERE lfd LIKE (SELECT # WHERE tag='tree')" can be performed efficiently by an internal mechanism. # is a symbol in NDQI which represents the 16-byte string identifier of an image. #F9IEkdfneI328jfek-3Et denotes the specific image with identifier "F9IEkdfneI328jfek-3Et" (base64-encoded). Without a table pointed to by the FROM keyword, NDQI assumes the SELECT clause wants a global search. Actually, there is really only one big table, and a smaller table can only be created at runtime with a SELECT clause.

Users can modify attributes with an INSERT tag="whatsoever" INTO #F9IEkdfneI328jfek-3Et clause and delete them with DELETE tag="whatsoever" FROM #F9IEkdfneI328jfek-3Et. You can also update attributes with a WHERE clause: INSERT exif.gps.latitude=123.4 WHERE lpd LIKE #F9IEkdfneI328jfek-3Et. These examples are collected in one block below.
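For quick reference, the same example queries from the spec above, collected in one place (# stands for the 16-byte image identifier):

```
SELECT # WHERE lfd LIKE (SELECT # WHERE tag='tree')

INSERT tag="whatsoever" INTO #F9IEkdfneI328jfek-3Et
DELETE tag="whatsoever" FROM #F9IEkdfneI328jfek-3Et

INSERT exif.gps.latitude=123.4 WHERE lpd LIKE #F9IEkdfneI328jfek-3Et
```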

**High-Level Architecture**

Though at first I wanted to implement the high-level functionality (parser, client-server app, etc.) in a scripting language, in reality I ended up programming it all in C. The parser was implemented with lex and yacc; the server side was done with libevent. The workflow looks like this: the service is exposed over the HTTP protocol. Once the client side requests with a "q=" parameter, the server tries to parse the query into an NQPARSERESULT struct. Notably, it parses a SELECT clause into a so-called PREQRY struct. A PREQRY may be nested, requiring external knowledge to get through (it contains references and subqueries).

**Embedding into the Current Architecture**

NDQI is nothing more than a collection of database routines for multimedia content. Alter, the successor of ClusCom, is responsible for scaling NDQI to multiple machines and providing network protocol access. Unlike a modern SQL DB, the parser of the QL is deployed at the front. Because there is no JOIN or similar functionality, it is really easy to scale. The front end parses a query into a PREQRY, then "plans" the PREQRY into a series of NQQRY structs with the help of nqclient.h. An NQQRY is a stand-alone struct, which means it relies on zero external knowledge. The front end can just execute NQQRYs on several computers and synthesize the returned results. Reimplementing the functions in nqclient.h is the most efficient way to embed into an existing scale architecture.
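The post doesn't show the fan-out itself; here is a hypothetical sketch of the scatter-gather step described above (the structs and helpers are stand-ins for what nqclient.h would provide, not its real API):

```c
#include <stddef.h>

/* A planned, self-contained query is sent to every shard and the
 * partial results are merged. */
typedef struct nqqry nqqry_t;          /* stand-alone query, per the post */
typedef struct result_set result_set_t;

/* Assumed primitives; NDQI's real ones would live in nqclient.h. */
extern result_set_t *shard_execute(int shard, const nqqry_t *qry);
extern void result_merge(result_set_t *into, result_set_t *part);

static void scatter_gather(const nqqry_t *qry, int nshards, result_set_t *out)
{
    for (int i = 0; i < nshards; i++) {
        /* NQQRY needs no external knowledge, so each shard can run it
         * independently; a real front end would issue these in parallel. */
        result_set_t *part = shard_execute(i, qry);
        result_merge(out, part);
    }
}
```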

A new set of JavaScript APIs is also provided, with which users can just "query" the server with an NDQI-valid query string. The functionality is called limpq.ndqi.Q.

August 24th, 2009

Freeing memory through a double pointer is good practice; otherwise, free and then null the pointer. I was stuck on this for 3 hours. Trust your oldest code.
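A minimal sketch of the pattern the note refers to:

```c
#include <stdlib.h>

/* Free through a double pointer so the caller's pointer is nulled too,
 * turning a would-be double free / use-after-free into a harmless no-op. */
static void xfree(void **p)
{
    if (p && *p) {
        free(*p);
        *p = NULL;
    }
}

int main(void)
{
    char *buf = malloc(64);
    xfree((void **)&buf);   /* buf is NULL afterwards */
    xfree((void **)&buf);   /* safe: second call does nothing */
    return 0;
}
```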

August 3rd, 2009

The dataset is so hard that even an SVM can perform no better than chance (54%); an ANN got 52%, on the original dataset. I thought that was impossible.

July 24th, 2009

Recently I came across a paper describing a tracking algorithm called ensemble tracking. It is an interesting read and really easy to implement; in fact, I spent about 30 hours finishing the algorithm.

One funny part of this algorithm is that, whether intentionally or not, the author took advantage of the self-similarity of images to make his algorithm useful.

The features used for ensemble tracking are per-pixel, so a linear classifier can be trained with any boosting algorithm. By applying the classifier to all pixels in the image, a probability image can be generated, and on the probability image all the traditional tracking algorithms can be applied.
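A minimal sketch of that step, assuming per-pixel feature vectors have already been extracted (the sigmoid squashing is my choice; the paper maps boosted classifier margins to confidence in its own way):

```c
#include <math.h>
#include <stddef.h>

/* Turn a boosted linear classifier (a weighted sum over per-pixel
 * features) into a probability image. feats holds w*h rows of d floats;
 * prob receives one value in [0,1] per pixel. */
static void probability_image(const float *feats, size_t w, size_t h, size_t d,
                              const float *weights, float *prob)
{
    for (size_t i = 0; i < w * h; i++) {
        float score = 0.0f;
        for (size_t j = 0; j < d; j++)
            score += weights[j] * feats[i * d + j];  /* linear combination */
        prob[i] = 1.0f / (1.0f + expf(-score));      /* squash to [0,1] */
    }
}
```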

My first thought about the algorithm was doubt about the segmentation ability of such a simple linear classifier (a linear combination of ~5 weak classifiers). The result is not comparable to a well-trained detector, but it shows the algorithm's attempt to classify very similar pixels (color-wise), and that is a success.

[Figures: Ensemble Tracking 1, Ensemble Tracking 2]

(notice how the classifier improved over time)

Without the property that most images are sparse and share many common parts (self-similarity), it could not gain any knowledge through per-pixel features. In fact, if an image were full of noise, the per-pixel features within a rectangle would be mostly self-contradictory.

We can reap more benefits from the sparsity and self-similarity of images.

(Update on NDQI: it is not dead. The command tool & parser seem to be a time sink, but luckily it is now in the test phase.)

No comment yet
June 19th, 2009

Good evening, London. Allow me first to apologize. I do, like many of you, appreciate the comforts of the everyday routine, the security of the familiar, the tranquility of repetition. I enjoy them as much as any bloke. But in the spirit of commemoration. Whereby important events of the past usually associated with someone’s death or the end of some awful, bloody struggle are celebrated with a nice holiday. I thought we could mark this November the 5th a day that is, sadly, no longer remembered by taking some time out of our daily lives to sit down and have a little chat. There are, of course, those who do not want us to speak. Even now, orders are being shouted into telephones and men with guns will soon be on their way. Why? Because while the truncheon may be used in lieu of conversation, words will always retain their power. Words offer the means to meaning and, for those who will listen, the enunciation of truth. And the truth is: there is something terribly wrong with this country, isn’t there? Cruelty and injustice, intolerance and oppression. And where once you had the freedom to object to think and speak as you saw fit, you now have censors and surveillance coercing your conformity and soliciting submission.

How did this happen? Who’s to blame? Certainly there are those who are more responsible than others. And they will be held accountable. But again, truth be told, if you’re looking for the guilty you need only look into a mirror. I know why you did it. I know you were afraid. Who wouldn’t be? War, terror, disease. There were a myriad of problems which conspired to corrupt your reason and rob you of your common sense.

Fear got the best of you. And in your panic, you turned to the now High Chancellor Adam Sutler. He promised you order, he promised you peace and all he demanded in return was your silent, obedient consent. Last night, I sought to end that silence. Last night, I destroyed the Old Bailey to remind this country of what it has forgotten. More than 400 years ago, a great citizen wished to embed the 5th of November forever in our memory. His hope was to remind the world that fairness, justice and freedom are more than words. They are perspectives. So if you’ve seen nothing, if the crimes of this government remain unknown to you, then I would suggest that you allow the 5th of November to pass unmarked. But if you see what I see, if you feel as I feel, and if you would seek as I seek, then I ask you to stand beside me, one year from tonight, outside the gates of Parliament. And together, we shall give them a 5th of November that shall never, ever be forgot.

June 10th, 2009

Considering that there are five or six unpublished articles sitting in my drafts, I finally resolved to write a complete, illustrated article in Chinese.

Recently, because a piece of software whose installation China mandated ships with pornographic-image detection, discussion of this topic has heated up again domestically. In fact, research on pornographic-image detection has a history of many years, and all the major search engines include adult-image filtering of varying quality (for Google, see: Large Scale Image-Based Adult-Content Filtering). When I first entered the CV field, I also studied pornographic-image filtering for a while, but gave it up because on its own it didn't seem to have much academic or commercial prospect. Had I known there would be a 40-million contract like this one, I would have kept at it. But I digress.

When people began studying this problem in the 1990s, the natural idea was to obtain the semantics of an image by detecting the relative positions and poses of the people in it. But considering that H.A. Rowley, who wrote that Google paper, only produced a practical face detector in 1998, and pedestrian detection only had presentable results after 2004, building a body-motion detector like Project Natal's was simply too hard at the time.

[Figure: Project Natal]

So, in the 1990s, the more effective approach was to train a classifier on color histograms, yielding results that in real life were little better than random guessing.

After enough tedious work with color histograms and classifiers, more and more people realized that if you are going to use color, pornographic-image detection doesn't need anything as elaborate as a classifier; a simple piecewise function will do. That became the skin-color statistics method everyone later used. As it happened, many people became interested in skin detection around that time, so many more skin-statistics-based pornographic-image detection methods appeared.
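To illustrate the "simple piecewise function", here is one classic RGB skin rule from the skin-detection literature (Peer/Kovač-style thresholds; not necessarily what any particular product used):

```c
#include <stdbool.h>

/* A pixel counts as "skin" if it passes a handful of inequalities --
 * no trained classifier needed. */
static bool is_skin(unsigned char r, unsigned char g, unsigned char b)
{
    unsigned char mx = r > g ? (r > b ? r : b) : (g > b ? g : b);
    unsigned char mn = r < g ? (r < b ? r : b) : (g < b ? g : b);
    return r > 95 && g > 40 && b > 20 &&
           (mx - mn) > 15 &&                 /* enough color spread     */
           (r > g ? r - g : g - r) > 15 &&   /* |R - G| large enough    */
           r > g && r > b;                   /* red dominates           */
}
```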

[Figure: Skin Detection]

Skin detection is just a simplification of the histogram approach, and the inherent problems remain unsolved: many landscape photos also have large skin-toned regions, highlights on skin cause misses, large faces cause false alarms, and so on. Running a face detector to filter out large face regions thus became standard. Introducing more image features, such as texture, can also filter out some falsely flagged landscape images. According to the response documents the company submitted to the authorities, its pornographic-image filtering is nothing more than these methods that became standard after 2000, perhaps plus some morphological post-processing, put together.

As everyone knows, my interest has now turned to local feature descriptors, and of course there are now some methods that use local feature descriptors to tackle this problem. In fact, the local-descriptor approach is probably the closest to solving this problem via semantics. Of course, the detection time still makes it uneconomical.

Also, many people say the domestic software in question lacks the OpenCV copyright notice. OpenCV's license is already very permissive; all it requires is retaining the copyright notice, so the omission is really puzzling.

Some images and content in this article come from the following sources:
http://gandolf.homelinux.org/~smhanov/blog/?id=63
http://groups.google.com/group/pongba/browse_thread/thread/78095c0bd8a90fe6?hl=zh-CN
http://i.gizmodo.com/5282974/yes-but-which-48-points-does-project-natal-track