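Those who wave the banner of environmentalism are thoroughgoing anthropocentrists. Do you really believe this planet is so fragile that a bit more carbon dioxide will make it collapse? The planet is very good at adjusting itself. Even granting, for the sake of argument, that there is an effect, the planet could settle the carbon dioxide problem by wiping out humanity, so is it really necessary to use the planet's fate as a bargaining chip? Caring about those oil-coated penguins would be far more practical. Poor things.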
It is always a good idea to revisit the basic assumptions of your work after some years. I was motivated by the effort to revisit the basic ideas of AI research (http://web.mit.edu/newsoffice/2009/ai-overview-1207.html) and started thinking about revisiting some basic concepts in my own, much smaller, research area. One of my areas of interest in computer vision is local feature descriptors. Though you can compute local feature descriptors densely, it is always more economical to use an interest point detector as a preprocessing step.
Whether to use an interest point detector boils down to two fundamentally different paths for object detection, which some researchers describe as feature-centric and window-centric detection. It is a curious fact that we human beings seem more likely to use feature-centric detection when observing. The feature-centric solution usually gives a reasonably good result (where the object is obvious) in much less time. I'd like to avoid the word "superior" here, since the window-centric method is usually better for dedicated object detection (the case where you have tens of thousands of positive examples).
An interest point detector is important if we want a cheap speed-up at some loss of accuracy. The problem is especially pressing in the Internet age: about 0.2 million photos are uploaded to Flickr daily, which makes dense examination nearly impossible. For many widely adopted local feature descriptors, the authors proposed their own interest point detectors (local maxima in DoG for SIFT, local maxima in an approximate Hessian pyramid for SURF, etc.). There are combinations and cross-examinations that test which interest point detector works better with which local feature descriptor. Few works analyse why a certain interest point detector works better with a certain local feature descriptor beyond providing empirical results.
Because most interest point detectors are actually corner detectors, they tend not to work well with small objects, objects with large, plain, indistinguishable surfaces, or objects with complex 3D structure. However, the ambitious current state-of-the-art descriptors hope to perform well in every case. Thus, I believe that with carefully designed experiments, the repeatability and representativeness of interest point detectors can be improved based on sampled local feature descriptors.
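Today I finally witnessed an academic big shot's hype first-hand. The bragging was truly something else: he managed to pitch a URL shortening service (okay, okay, there is more to it, for example it is a URL shortening service with RSS output, and it supports mashups too) as next-generation Internet technology, and rambled on for a full hour. By comparison, I can only count as a deconstructive type: an impossibly awesome concept turns plain and ordinary once it passes through my mouth.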
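Even without 100% fairness, you should still strive for 100% perfection; there will always be a time when it pays off. Even if there isn't, as long as you have done something you believe is truly beneficial to humanity, you can quietly go on being awesome. Humanity does not acknowledge that God created man, yet we have not seen Him send mankind back to the furnace to be recast; He just quietly goes on being awesome.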
Introduction
Over the past decades, the maturity of hardware and of large-scale system design has incubated a number of state-of-the-art image retrieval systems. Early research included the query-refine-query model, which people believed would eventually approximate the desired picture through this kind of "boosting" process. Later research focused more on the accuracy of image retrieval. Instead of naive global features such as color histograms or global moments, a class of local feature descriptors proved to be more accurate and more robust to deformation and illumination change. Some research shows that scale does matter: results on a database of 10 million images are much better than those on 100,000 images.
In recent years, the scale problem has become the central interest in CBIR systems. M-trees and locality-sensitive hashing, to name a few, have been applied successfully in the consumer market. However, all these efforts go into improving the recall rate, while the similarity measure itself is not flawless. The class of local feature descriptors is intended to capture identical objects in a scene and has little ability to derive a similarity model for an object. Some recent local feature descriptors, such as the self-similarity descriptor, show potential to reveal the visual structure of an object and the ability to capture the correct object from a hand sketch. But these additional abilities only make it vaguer what the querist means by a "similar" image if only one image, or even a chain of images (as in the old days), is provided.
Full-text search systems have this fuzziness too. Usually, people have to iterate several times to get better results. But for structured text, there is no such problem. Once text is formalized, its meaning is clear and can be computed. Any formalized language, such as Structured Query Language, can analyze formalized text.
Because image database architectures are not sophisticated enough for interesting applications, many existing applications that ultimately rely on an image database end up creating one from scratch.
Developing a new language to query and manipulate image databases will strengthen them to fit more challenging problems. Furthermore, as a language, it can liberate developers from implementation details and let them focus on interesting applications.
Targeting Database
Database definition
I narrow the database that the language operates on to an image database. In particular, it has no relations whatsoever between objects. An image database contains a number of image objects. These objects have a full list of properties, such as an EXIF header and global/local feature descriptors. Properties are either mutable or immutable. Most text/number fields (such as the EXIF header) are mutable. All feature descriptors are immutable. You can certainly fork a new image that has different feature descriptors, which I will describe later.
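The data model described above can be sketched in Python. This is only an illustration under my own assumptions: the class name `ImageObject`, its attributes, and the `fork` signature are hypothetical and are not part of the language draft.

```python
from types import MappingProxyType


class ImageObject:
    """Hypothetical model of one image object in the database."""

    def __init__(self, identifier, exif=None, descriptors=None):
        self.identifier = identifier
        # Text/number fields such as EXIF entries are mutable.
        self.exif = dict(exif or {})
        # Feature descriptors are frozen: tuples behind a read-only mapping.
        self._descriptors = MappingProxyType(
            {name: tuple(vec) for name, vec in (descriptors or {}).items()})

    @property
    def descriptors(self):
        return self._descriptors

    def fork(self, new_identifier, **descriptors):
        """Return a new image object with replaced or added descriptors."""
        merged = dict(self._descriptors)
        merged.update(descriptors)
        return ImageObject(new_identifier, self.exif, merged)


img = ImageObject("a", {"Model": "X"}, {"gist": [0.1, 0.2]})
img.exif["Model"] = "Y"                    # allowed: EXIF is mutable
child = img.fork("b", gist=[0.5, 0.5])     # the original stays untouched
```

Assigning to `img.descriptors["gist"]` raises a `TypeError`, so the only way to obtain different descriptors is to fork.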
Database discussion
Because there are no relations between image objects, a SQL-like query degrades to a combination of binary operators. Queries on text/number fields alone are not good enough for an image database. After all, content-based methods are hard to embed in a SQL-like query language.
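To illustrate what "a combination of binary operators" means here, the following Python sketch treats a relation-free query as a filter built from comparisons joined by and/or. The helper names (`field_test`, `both`, `either`, `query`) and the sample records are made up for this illustration.

```python
import operator


def field_test(name, op, value):
    """Build a predicate comparing one text/number field with a constant."""
    return lambda record: name in record and op(record[name], value)


def both(p, q):
    return lambda record: p(record) and q(record)


def either(p, q):
    return lambda record: p(record) or q(record)


def query(records, predicate):
    """With no relations (no joins), a query is just a filter."""
    return [r for r in records if predicate(r)]


images = [
    {"id": "a", "iso": 100, "camera": "X1"},
    {"id": "b", "iso": 800, "camera": "X1"},
    {"id": "c", "iso": 800, "camera": "Z9"},
]
hits = query(images, both(field_test("iso", operator.gt, 400),
                          field_test("camera", operator.eq, "X1")))
```

Nothing in this filter can express "looks like this other image", which is the gap the content-based functions are meant to fill.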
Language Objective
The query language should have minimal syntax. It should be easy to query with, easy to manipulate data with, and should output human-readable result sets. It has few keywords (possibly none) and extensible properties. Property fields should be scoped and protected to minimize developers' mistakes. It should be as powerful as any query language on text/number fields. There should be no magic: every piece of syntax is explainable by the underlying mechanism, and there are no special methods that require specific knowledge to interpret. It is not a general-purpose language, so Turing completeness is not important. If possible, the language should have restricted data dependencies, which benefits parallelism. Above all, it should really reduce the workload of developing real-world applications based on image databases.
Language Examples
Before digging into the details of the language, I'd like to present several examples to show why it has to take this particular form.
Estimating geographic information from single image
This is a research project at CMU. Here I will rephrase its main contribution in the new language.
q(gist(#OwkIEjNk8aMkewJ) > 0.2 and gps != nil) (
    oa = nil
    a = ^.reshape(by=[gist(#OwkIEjNk8aMkewJ), gps.latitude, gps.longitude])
    while (a != oa) (
        oa = a
        a = a.foreach() <$> (
            r = ^.q(gps.latitude < $[1] + 5 and
                    gps.latitude > $[1] - 5 and
                    gps.longitude < $[2] + 5 and
                    gps.longitude > $[2] - 5)
            ? (r.size < 4) (return yield nil)
            ot = r.gist(#OwkIEjNk8aMkewJ)' * r.gist(#OwkIEjNk8aMkewJ)
            yield [ot,
                   r.gist(#OwkIEjNk8aMkewJ)' * r.gps.latitude / ot,
                   r.gist(#OwkIEjNk8aMkewJ)' * r.gps.longitude / ot]
        )
    )
    return [a.sort(by=@0)[0][1],
            a.sort(by=@0)[0][2]]
)
At first glance, it looks similar to a functional programming language. Some of the syntax is very similar to Matlab. The language natively supports matrix/vector operations and hopes to speed them up. Though the syntax looks familiar, while and foreach are no longer keywords. They are ordinary methods.
The script above first takes images that are similar to #OwkIEjNk8aMkewJ (an image identifier) on the gist feature with a threshold of 0.2 and that have GPS information. The while loop computes a windowed mean-shift until it converges. The script then returns the location with maximal likelihood.
It is an interesting case for observing how the philosophy behind the language interacts with a real-world problem.
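For readers unfamiliar with windowed mean-shift, here is a minimal plain-Python sketch of the idea. It is not a translation of the script: the function names are hypothetical, weights are assumed positive, and each window mean is normalized by the sum of weights (a conventional weighted mean), which differs from the script's division by ot.

```python
def mean_shift_step(center, points, half_window=5.0):
    """One windowed mean-shift step.

    center: (lat, lon); points: list of (weight, lat, lon), weights > 0.
    Returns (total_weight, lat, lon), or None when fewer than 4 points
    fall inside the window (mirroring the r.size < 4 check).
    """
    lat0, lon0 = center
    inside = [(w, lat, lon) for w, lat, lon in points
              if abs(lat - lat0) < half_window and abs(lon - lon0) < half_window]
    if len(inside) < 4:
        return None
    total = sum(w for w, _, _ in inside)
    return (total,
            sum(w * lat for w, lat, _ in inside) / total,
            sum(w * lon for w, _, lon in inside) / total)


def estimate_location(points, half_window=5.0, tol=1e-9, max_iter=100):
    """Shift every window until no centre moves, then return the (lat, lon)
    of the mode with the largest total weight, or None if no mode survives."""
    modes = list(points)
    for _ in range(max_iter):
        shifted = []
        for _, lat, lon in modes:
            step = mean_shift_step((lat, lon), points, half_window)
            if step is not None:
                shifted.append(step)
        converged = len(shifted) == len(modes) and all(
            abs(a[1] - b[1]) <= tol and abs(a[2] - b[2]) <= tol
            for a, b in zip(shifted, modes))
        modes = shifted
        if converged or not modes:
            break
    if not modes:
        return None
    best = max(modes, key=lambda m: m[0])
    return best[1], best[2]


# Made-up sample data: a heavy cluster near (10, 10), a light one near (50, 50).
pts = [(1.0, 10.0, 10.0), (1.0, 10.5, 10.0), (1.0, 9.5, 10.0),
       (1.0, 10.0, 10.5), (1.0, 10.0, 9.5),
       (0.3, 50.0, 50.0), (0.3, 50.4, 50.0), (0.3, 49.6, 50.0),
       (0.3, 50.0, 50.4)]
location = estimate_location(pts)
```

With this sample data the heavier cluster wins, so the estimate lands at the centre of the points around (10, 10).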
Sketch2Photo: Internet Image Montage
It is a research project from Tsinghua University that was presented at SIGGRAPH Asia 2009. It uses a rather complex algorithm for image blending; here I will only use the Poisson image editing technique instead.
q() (
    sunset_beach = ^.q(tag="sunset" and tag="beach").shuffle()[0].image
    wedding_kiss = ^.q(ssd(#Le6aq9mkj38fahjK)).sort(by=ssd(#Le6aq9mkj38fahjK))[0].image(region=ssd(#Le6aq9mkj38fahjK))
    sail_boat = ^.q(ssd(#ewf_kefIwlE328f2)).sort(by=ssd(#ewf_kefIwlE328f2))[0].image(region=ssd(#ewf_kefIwlE328f2))
    seagull = ^.q(ssd(#94xJ9WEkehR82-3j)).sort(by=ssd(#94xJ9WEkehR82-3j))[0].image(region=ssd(#94xJ9WEkehR82-3j))
    sunset_beach.poisson(at=(120, 100), with=wedding_kiss)
    sunset_beach.poisson(at=(50, 50), size=50%, with=sail_boat)
    sunset_beach.poisson(at=(240, 10), size=20%, with=seagull)
    return sunset_beach
)
The code is very straightforward. Here I ignore the fact that the real system has a user-interactive part and simply stitch everything together.
Language Syntax (Draft)
Type
There are seven types in this language: nil, Boolean, Number, String, Image, Object, Array. Six of them are very intuitive; Image is made a basic type to satisfy the human-readable philosophy. Though every image can be represented by a 3-D array, it is not feasible to output a huge 3-D array to the end user. We humans need a more readable form of image output. Only support at the language level gives such flexibility.
Keyword
There are two keywords in this language: yield and return, but I intend to eliminate both in the future. There are, however, several special characters that are useful. ^ is the same as this in traditional languages. < refers to the last condition statement. array.@n, where n is an integer, is the same as array[n]; @n serves as an identifier for indices.
Function
Functions are at the core of the language. However, it is arguable whether user-defined functions are useful in such a lightweight language. This section is entirely about built-in functions.
A built-in function has two parts: condition and action. The form of a function call is flexible: you can omit the action part, or both parts. A function call takes this form:
function (condition) <self> (action)
In the common case, the action part is executed when the function ends, but this depends solely on what the function decides to do with the action. One thing is certain, however: the return keyword in an action immediately returns its value to the caller. The yield keyword, on the other hand, returns a value to the function itself. The self part is optional; it specifies how the script in the action part refers to the result value of the function itself. By default, it is ^.
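The difference between yield and return can be modelled in Python. This is only one possible reading of the draft semantics: the `Return` exception, the `foreach` function and the `yield_` callback below are hypothetical stand-ins, not the language's actual implementation.

```python
class Return(Exception):
    """Raised by an action to hand a value straight back to the caller."""

    def __init__(self, value):
        super().__init__(value)
        self.value = value


def foreach(result_set, action):
    """Hypothetical model of `set.foreach() <$> (action)`.

    `action` receives each element (playing the role of the <self>
    binding) and a `yield_` callback; yielded values go back to foreach
    itself, which collects them into a new result array.
    """
    collected = []
    try:
        for element in result_set:
            action(element, collected.append)
    except Return as signal:
        # return skips the rest of the function and goes to the caller.
        return signal.value
    return collected


def square_small(x, yield_):
    if x > 3:
        raise Return("stopped at %d" % x)
    yield_(x * x)


squares = foreach([1, 2, 3], square_small)      # [1, 4, 9]
stopped = foreach([1, 2, 5, 3], square_small)   # 'stopped at 5'
```

In this reading, yielded values are owned by the function being called, while return bypasses the function and delivers its value directly to the caller.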
Control Structure
The language has basic control structures such as while, foreach and if. However, they are all functions now. The two examples above show how they work. Since all functions take a condition as a parameter, it is very natural to make if and while functions. foreach is not a global function; it is scoped to result sets only. Usually yield is coupled with foreach to generate a new result array. You can use the return keyword to jump out of a loop or an if statement. To jump out of a nested function call, you need to nest the return statement.
Script Structure
The simplest use of this language takes advantage of its query feature only, by calling the q function:
q(tag="rose")
More sophisticated operations require extending q (as I do in the first two examples). It looks much like C's main function. But for an outside observer, input and output can only be achieved through the calling condition and the return value, instead of standard I/O streams.
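I was originally writing an article on how to structure a startup team. Suddenly I didn't feel like writing it anymore; besides, I don't have much standing to speak on the subject.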
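I like Grimshaw's class more and more. Lectures by a well-off professor like him just don't feel oppressive, because he is in a good mood to begin with, and he uses some really fun phrases: "dump into memory", "make program scream" and the like. These verbs are just adorable.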
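I have rushed out the proposal for my Final Year Project. I still need to talk to a few professors to see whether there are any major flaws, though I think it is pretty good myself. My own project is about to get busy too. Uh, Flash, which I really don't know well. And I have to get a paper out quickly for next year's conference.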
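I saw that illusion released a game called Real Kanojo, reportedly played with motion control. It turns out it merely calls the face detector from OpenCV 1.0. You are a big company, after all; could you be a bit more professional?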
That's enough bullshitting. Actually, I want to make a wish: may I score 1600 on the GRE next year, just to provoke a certain someone.
- Clearly stated philosophy is important, but hard to keep;
- Some use cases are optimal, some are not;
- Flexibility means hard to implement.
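Though it has been raining all day outside, I am in a great mood: the detector I have been running over and over for two weeks gives good results, and listening to the Prof ramble was fun. Time to grill a sausage.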