Posts from December, 2009
December 31st, 2009

I just watched a talk show from ten years ago, and realized that the muddled logic goes all the way back to people born in the 1960s.

December 30th, 2009

Before June 4th, 1989, almost none of the students and intellectuals believed that the Party-controlled army would open fire on them. Even some people inside the CCP could not have imagined, two months earlier, that such a thing would happen.

In recent mass incidents, deploying the armed police and SWAT teams has become routine, and opening fire on crowds is no longer rare. Under such circumstances it is very hard to bring people onto the streets for anything that is not a life-and-death choice, and imposing that choice on only a small minority has become the Party's chief means of intimidating the public, the so-called "inciting the masses to struggle against the masses". Given this, taking to the streets early and pushing for regime change is the inevitable choice of citizens who have recognized this trait of the ruling party.

The only problem is that most of the intellectuals and citizens who hold this view do not command an army, and in modern warfare a single properly trained special-forces soldier can hold off civilians carrying cold weapons by the hundreds. That ratio gives Beijing, with millions of soldiers in its standing army, an absolute advantage in any confrontation with the public. Yet while those in power treat soldiers only as cannon fodder, we know that every soldier has his own family, lover, and friends. They are living human beings, not tools to be spent freely in someone's calculation of gains and losses.

Before the reform and opening up, large numbers of Chinese died in the prime of life every year, so the proportion of children born in that era who lost one or both parents was remarkably high. Likewise, from 1949 until the 1990s, China ran the tightest population-management system there was: every step from birth to death could be traced through certificate after certificate. Everyone was a person "inside the system", and lacking a single certificate was enough to expel you "outside the system", which was little different from being left to die.

So, although it took me a long time, I feel I can now understand the motive of those who in 1989 opened fire on civilians who had not even prepared cold weapons. They had enlisted in the first place because their families were in hardship or because they wanted a better way out; disobey an order, get expelled "outside the system", and everything would be gone. Everyone is selfish, and the uncertainty of life "outside the system" was terrifying. On the other hand, because the chain of command still transmitted effectively, passing orders down it greatly relieved the soldiers' sense of guilt. They carried this mental suggestion: it is not my responsibility, but the responsibility of "the one who told me to shoot".

Now, however, things are somewhat different. Most ordinary soldiers receive a one-off payment on discharge and choose their own careers, and even retiring officers merely move to a local post with equivalent-rank benefits. They have lost the shackles of the "system" and gained the right to embrace a new world. And having lost those shackles, they let us set aside the past and embrace them.

If you have friends in the military, whether enlisted soldiers or officers, teach them to get online and to look at the world outside. If you have the chance, take them abroad to walk around and see things. Their training may be hard, so call them more often and tell them the curious stories from the web. If their families are badly off, spend a bit of time and spare money looking after their parents, just as you would look after your own.

Some day we may have to take to the streets, and standing before us with firearms will be friends of friends we have never met. They will know clearly the choice in front of them: wait for that 0.001% chance of sitting on top of the people and lording it over them, or enjoy a carefree, well-off, free life together with their wives, children, parents, and friends. We know and believe that the friends standing opposite are not tools of the system; they are people of flesh and blood who can give a rational answer of their own. (Of course, we must first break the chain of command, so that the decision becomes truly our friends' own.)

December 29th, 2009

I feel that within the next two years it will be time to declare a position: against the Party, or against the people. Keep muddying the waters, and in a few years, once the situation is clear, there will be no profit left to scoop up.

December 23rd, 2009

Those who wave the banner of environmentalism are thoroughgoing anthropocentrists. Do you really believe this planet is so fragile that emitting a bit more carbon dioxide will make it collapse? The planet's capacity to adjust is formidable, and even supposing, for the sake of argument, that there were an effect, the planet could simply wipe out humanity and the carbon dioxide would be taken care of; there is hardly any need to wager the planet's fate. It would be more practical to care about the penguins coated in oil. Poor things.

December 19th, 2009

It is always a good idea to revisit the basic assumptions of your work after some years. I was motivated by the effort to revisit the basic ideas of AI research (http://web.mit.edu/newsoffice/2009/ai-overview-1207.html) and started thinking about revisiting some basic concepts in my own (much smaller) research area. One of my interests in computer vision is the local feature descriptor. Though you can examine local feature descriptors densely, it is usually more economical to use an interest point detector as a preprocessing step.

Whether or not to use an interest point detector boils down to two fundamentally different paths for object detection, which some researchers describe as feature-centric versus window-centric detection. It is a curious case that we human beings seem more likely to use feature-centric detection for observation. The feature-centric solution usually gives a reasonably good result (where the object is obvious) in much less time. I'd like to avoid the word "superior" here, since the window-centric method is usually better for dedicated object detection (the case where you have tens of thousands of positive examples).

The interest point detector is important when we want a cheap speed-up at some loss of accuracy. This is all the more true in the Internet age: about 0.2 million photos are uploaded to Flickr every day, which makes dense examination nearly impossible. For many widely adopted local feature descriptors, the authors proposed their own interest point detectors (local maxima in the DoG pyramid for SIFT, local maxima in an approximate Hessian pyramid for SURF, etc.). There have been combinations and cross-examinations to test which interest point detector works better with which local feature descriptor, but few works have analysed why a certain detector works better with a certain descriptor beyond providing empirical results.
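As an aside, the DoG detector mentioned above is easy to sketch: take local maxima of a Difference-of-Gaussians response. The Python snippet below (NumPy/SciPy) is an illustrative single-scale toy, not the full multi-octave pyramid that SIFT actually builds:

```python
import numpy as np
from scipy import ndimage

def dog_keypoints(image, sigma1=1.0, sigma2=2.0, threshold=1e-4):
    """Interest points as local maxima of a single Difference-of-Gaussians
    response (a toy version of the response SIFT's detector is built on)."""
    g1 = ndimage.gaussian_filter(image.astype(float), sigma1)
    g2 = ndimage.gaussian_filter(image.astype(float), sigma2)
    dog = g1 - g2
    # keep pixels that equal the maximum of their 3x3 neighbourhood
    # and whose response exceeds the threshold
    peaks = (dog == ndimage.maximum_filter(dog, size=3)) & (dog > threshold)
    ys, xs = np.nonzero(peaks)
    return list(zip(ys.tolist(), xs.tolist()))

# a synthetic blob centred at (20, 30) should be detected at its centre
img = np.zeros((64, 64))
img[20, 30] = 1.0
img = ndimage.gaussian_filter(img, 2.0)
points = dog_keypoints(img)
```

The toy also shows why such detectors behave like blob/corner detectors: a large flat region produces no response at all, which is exactly the failure case discussed above.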

Because most interest point detectors are really corner detectors, they tend not to work well on small objects, objects with large plain indistinguishable surfaces, or objects with complex 3D structure. The ambitious current state-of-the-art descriptors, however, hope to perform well in every case. I therefore believe that, with carefully designed experiments, the repeatability and representativeness of interest point detectors can be improved by sampling the local feature descriptor.

December 17th, 2009

Today I finally saw an academic big shot's hot air in action. The hype was refreshingly over the top: a URL-shortening service (fine, fine, there was more, say, URL shortening for RSS output, plus Mashup support) inflated into next-generation Internet technology, padded out with a full hour of waffle. By comparison I can only count as a deconstructive talent: pass any impossibly awesome concept through my mouth and it comes out plain and ordinary.

December 8th, 2009

There is no 100% fairness, but still aim for 100% perfection; sooner or later it pays off. Even if it never does, as long as you have done something you believe truly benefits humanity, you can quietly keep on being awesome. Humanity refuses to acknowledge that God created man, yet you don't see the old man melting us down for recasting; he just quietly keeps on being awesome.

December 4th, 2009

"A bamboo bowl of rice, a gourd of water, a dwelling in a shabby lane: others could not have endured the misery, yet Hui never let it spoil his joy. How worthy Hui was!" (Confucius, Analects, on his disciple Yan Hui)

December 2nd, 2009

Introduction

In the past decades, the maturity of hardware and of large-scale system design has incubated a number of state-of-the-art image retrieval systems. Early research followed a query-refine-query model, in the belief that this kind of "boosting" process would eventually converge on the desired picture. Later research focused more on retrieval accuracy. Instead of naive global features such as color histograms or global moments, a class of local feature descriptors proved more accurate and more robust to deformation and illumination change. Some research shows that scale does matter: results at the 10-million scale are much better than at the 100,000 scale.

In recent years, the scale problem has become the central interest of CBIR systems. M-trees and locality-sensitive hashing are among the few techniques that have seen successful application in the consumer market. However, all this effort goes into improving recall, while the similarity measure itself is not flawless. Local feature descriptors are designed to capture identical objects in a scene and have little ability to model similarity for a class of objects. Some recent descriptors, such as the self-similarity descriptor, show potential to reveal the visual structure of an object and can even retrieve the correct object from a hand sketch. But these additional abilities only make it vaguer what a querist intends by a "similar" image when only a single image, or even a chain of images (as we did in the old days), is provided.

Full-text search systems have this fuzziness too: people usually have to iterate several times to get better results. For structured text there is no such problem. Once text is formalized, its meaning is clear and can be computed, and any formal language, such as the structured query language, can operate on formalized text.

Because no existing image database architecture is sophisticated enough for interesting applications, many applications that ultimately rely on an image database end up creating one from scratch.

Developing a new language for querying and manipulating image databases will strengthen the image database for more challenging problems. Furthermore, as a language, it can liberate developers from the details of implementation so they can focus on interesting applications.

Target Database

Database definition

I narrow the database that the language operates on to the image database. In particular, it has no relations between objects whatsoever. An image database contains a number of image objects, and these objects have a full list of properties, such as the EXIF header and global/local feature descriptors. Properties are either mutable or immutable: most text/number fields (such as the EXIF header) are mutable, while all feature descriptors are immutable. You can, of course, fork a new image with different feature descriptors, which I will describe later.

Database discussion

Because there are no relations between image objects, an SQL-like query would degrade to a combination of binary operators. Queries on text/number fields alone are not good enough for an image database; after all, content-based methods are hard to embed into an SQL-like query language.

Language Objective

The query language should have minimal syntax. It should make it easy to query, to manipulate, and to output a human-readable result set. It has few keywords (possibly none) and extensible properties. Property fields should be scoped and protected to minimize developers' mistakes. It should be as powerful as any query language on text/number fields. There should be no magic: every piece of syntax is explainable by the underlying mechanism, with no special methods that require specific knowledge to interpret. It is not a general-purpose language, so Turing-completeness is not important. If possible, the language should have restricted data dependencies that benefit parallelism. With all this, it can really reduce the workload of developing real-world applications on top of an image database.

Language Examples

Before digging further into the details of the language, I'd like to present several examples to show why it has to take this particular form.

Estimating geographic information from single image

This is a research project at CMU. Here I rephrase their main contribution in the new language.

q(gist(#OwkIEjNk8aMkewJ) > 0.2 and gps != nil) (
    oa = nil
    a = ^.reshape(by=[gist(#OwkIEjNk8aMkewJ), gps.latitude, gps.longitude])
    while (a != oa) (
        oa = a
        a = a.foreach() <$> (
            r = ^.q(gps.latitude < $[1] + 5 and
                gps.latitude > $[1] - 5 and
                gps.longitude < $[2] + 5 and
                gps.longitude > $[2] - 5)
            ? (r.size < 4) (return yield nil)
            ot = r.gist(#OwkIEjNk8aMkewJ)' * r.gist(#OwkIEjNk8aMkewJ)
            yield [ot,
                r.gist(#OwkIEjNk8aMkewJ)' * r.gps.latitude / ot,
                r.gist(#OwkIEjNk8aMkewJ)' * r.gps.longitude / ot]
        )
    )
    return [a.sort(by=@0)[0][1],
        a.sort(by=@0)[0][2]]
)

The first impression is that it resembles a functional programming language, and some of the syntax is very similar to Matlab. The language natively supports matrix/vector operations and hopes to speed them up. Though the syntax looks familiar, while and foreach are no longer keywords; they are ordinary methods.

The script above first takes the images that are similar to #OwkIEjNk8aMkewJ (an image identifier) on the gist feature with threshold 0.2 and that have GPS information. The while loop computes a windowed mean-shift until it converges, and the script then returns the location with maximal likelihood.
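For readers who do not want to parse the draft syntax, the windowed mean-shift the script expresses can be sketched in ordinary Python. The `(similarity, lat, lon)` tuples below stand in for the query's result set, and the shift step uses a plain similarity-weighted mean; both are my simplifications for illustration, not the CMU system itself:

```python
def mean_shift_geolocate(points, window=5.0, min_neighbors=4, max_iter=100):
    """points: list of (similarity, lat, lon). Returns the (lat, lon) of the
    densest mode found by windowed mean-shift over a +/- `window` degree box."""
    modes = list(points)
    for _ in range(max_iter):
        new_modes = []
        for _, lat, lon in modes:
            # neighbours inside the window, as in the script's inner query
            nbrs = [p for p in points
                    if abs(p[1] - lat) < window and abs(p[2] - lon) < window]
            if len(nbrs) < min_neighbors:
                continue  # too sparse: drop this mode (the `yield nil` branch)
            total = sum(w for w, _, _ in nbrs)
            new_modes.append((total,
                              sum(w * la for w, la, _ in nbrs) / total,
                              sum(w * lo for w, _, lo in nbrs) / total))
        if not new_modes or new_modes == modes:  # converged (or all dropped)
            break
        modes = new_modes
    # keep the mode with the largest accumulated similarity
    return max(modes, key=lambda m: m[0])[1:]

# hypothetical result set: a dense cluster near (40, -74) plus two outliers
pts = [(1.0, 40.1, -74.1), (0.9, 39.9, -73.9), (0.8, 40.0, -74.0),
       (0.7, 40.2, -73.8), (0.6, 39.8, -74.2),
       (1.0, 10.0, 10.0), (1.0, 10.5, 10.5)]
lat, lon = mean_shift_geolocate(pts)  # roughly (40.0, -74.0)
```

The sparse outliers never reach `min_neighbors` and drop out, so the returned location is the centre of the dense cluster, which is exactly the behaviour the `r.size < 4` guard in the script is after.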

It is an interesting case for observing how the philosophy behind the language interacts with a real-world problem.

Sketch2Photo: Internet Image Montage

This is a research project from Tsinghua University, presented at SIGGRAPH Asia 2009. The real system uses a rather complex algorithm for image blending; here I will simply use the Poisson editing technique instead.

q() (
    sunset_beach = ^.q(tag="sunset" and tag="beach").shuffle()[0].image
    wedding_kiss = ^.q(ssd(#Le6aq9mkj38fahjK)).sort(by=ssd(#Le6aq9mkj38fahjK))[0].image(region=ssd(#Le6aq9mkj38fahjK))
    sail_boat = ^.q(ssd(#ewf_kefIwlE328f2)).sort(by=ssd(#ewf_kefIwlE328f2))[0].image(region=ssd(#ewf_kefIwlE328f2))
    seagull = ^.q(ssd(#94xJ9WEkehR82-3j)).sort(by=ssd(#94xJ9WEkehR82-3j))[0].image(region=ssd(#94xJ9WEkehR82-3j))
    sunset_beach.poisson(at=(120, 100), with=wedding_kiss)
    sunset_beach.poisson(at=(50, 50), size=50%, with=sail_boat)
    sunset_beach.poisson(at=(240, 10), size=20%, with=seagull)
    return sunset_beach
)

The code is very straightforward. Here I ignore the fact that the real system has a user-interactive part and just stitch everything together.
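Since I substituted Poisson editing for the paper's blending algorithm, here is a minimal sketch of what a `poisson` method could do under the hood: gradient-domain compositing solved by Jacobi iteration. It is an illustrative toy, assuming a grayscale image and a rectangular region with a fixed iteration count; a real implementation would use a sparse linear solver and arbitrary masks.

```python
import numpy as np

def poisson_blend(dst, src, top, left, iters=2000):
    """Paste `src` into `dst` at (top, left) by solving the Poisson equation
    with the Laplacian of `src` as the guidance field (Jacobi iteration).
    The border of the pasted region is clamped to `dst`, hiding the seam."""
    out = dst.astype(float).copy()
    src = src.astype(float)
    h, w = src.shape
    # guidance field: discrete Laplacian of the source patch
    lap = np.zeros((h, w))
    lap[1:-1, 1:-1] = (4 * src[1:-1, 1:-1] - src[:-2, 1:-1] - src[2:, 1:-1]
                       - src[1:-1, :-2] - src[1:-1, 2:])
    f = out[top:top + h, left:left + w].copy()  # border stays at dst values
    f[1:-1, 1:-1] = src[1:-1, 1:-1]             # interior starts from src
    for _ in range(iters):
        # Jacobi update: average of 4 neighbours plus the guidance term
        f[1:-1, 1:-1] = (f[:-2, 1:-1] + f[2:, 1:-1] + f[1:-1, :-2]
                         + f[1:-1, 2:] + lap[1:-1, 1:-1]) / 4.0
    out[top:top + h, left:left + w] = f
    return out

# flat 5.0 patch pasted into a flat 1.0 background: the source gradients are
# zero, so the blended interior relaxes all the way to the background value
dst = np.ones((20, 20))
src = np.full((8, 8), 5.0)
blended = poisson_blend(dst, src, 5, 5)
```

This also shows why Poisson editing is attractive for a montage system: it transfers the source's gradients rather than its absolute colors, so the pasted region adapts to the background's lighting.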

Language Syntax (Draft)

Type

There are seven types in this language: Nil, Boolean, Number, String, Image, Object, and Array. All of them are intuitive; the reason for making Image a basic type is to satisfy the human-readable philosophy. Though every image can be represented by a 3-D array, it is not feasible to show a huge 3-D array to the end user. We humans need a more readable form for image output, and only language-level support gives that flexibility.

Keyword

There are two keywords in this language: yield and return, and I intend to eliminate both in the future. There are, however, several useful special characters. ^ is the same as this in traditional languages. $ refers to the last condition statement. array.@n, where n is an integer, is the same as array[n]; @n serves as an identifier for indices.

Function

Functions are the core of the language. However, it is arguable whether user-defined functions are useful in such a lightweight language. The functions in this section are all built-in functions.

A built-in function has two parts: condition and action. Forming a call is flexible; you can omit the action part, or both parts. A function call takes this form:

function (condition) <self> (action)

In the common case, the action part is executed when the function ends, but that depends solely on the function's own decision about what to do with the action. One thing is certain, however: a return keyword in the action immediately returns a value to the caller, while the yield keyword returns a value to the function itself. The self part is optional; it specifies how the script in the action part refers to the function's own result value, and by default it is ^.

Control Structure

Basic control structures such as while, foreach, and if exist in this language, but they are all functions now. The two examples above show how they work. Since every function takes a condition as a parameter, it is very natural to make if and while functions. foreach is not a global function; it is scoped to the result set, and yield is usually coupled with foreach to generate a new result array. You can use return to jump out of a loop or an if statement; to jump out of a nested function call, you nest the return statements.
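To make these rules concrete, here is a small hypothetical script in the draft syntax, composed only of constructs that already appear in the two earlier examples: ? as the if function, foreach with yield building a new result array, and return leaving the call. The width property is an assumed metadata field, not part of the draft:

```
q(tag="cat") (
    thumbs = ^.foreach() <$> (
        ? ($.width < 100) (return yield nil)
        yield $.image
    )
    return thumbs
)
```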

Script Structure

A simple use of this language exercises only its query feature, a call to the q function.

q(tag="rose")

More sophisticated operations require extending q (as I did in the first two examples). It looks much like C's main function, but to an outside observer the only input and output happen through the return value and the calling condition, instead of standard I/O streams.