May 22nd, 2019

War starts when people fail to communicate with each other. The current U.S.-China dispute is so complex and far-reaching that any rational discussion online can devolve into a flame war. There are so many topics that the multi-variable optimization becomes difficult. Overlay all this with the gloomy long-term implications of technology, and it is far easier to just pick a side and root for the red or blue team.

The Gloomy Long-Term Implications of Technology

It is far easier for Bay Area people to think of themselves as a force for good. But the technology we developed over the past few years has greatly expanded central governments’ abilities. It is too easy to track down a person and collect all their communication records for profiling and categorization. Alternative technologies that combat these implications, such as end-to-end encryption, can easily be outlawed at a government’s will. It is pleasantly surprising that the United States has resisted for so long. As Republicans give up their ideology completely for a totalitarian fantasy, the expansion of executive branch power will finally result in, not necessarily a president for life (although likely), but at the very least a one-party state. Whether it is Republican or Democrat is beside the point. Populists, on either the far right or the far left, come dangerously close in ideological terms. After all, the United States right now has a Republican president running an unprecedented fiscal deficit and issuing orders to anyone in the name of national security.

The Chinese have been playing the one-party state game for too long. The art of ruling lies in appeasing the many, allowing a few to vent, and exterminating anomalies. Digital technologies allow them to scale this up. With such surveillance power, the crime rate will fall, and so will freedom.

The Gear Up to a New War

When a new war begins to break out, both sides first stop talking with each other. The media on both sides seem to have agendas. In China, the media appeals to nationalistic honor and tries to remind the average Chinese of the past under Western imperialism, with the Opium War and the Korean War. In the United States, the media paints China as an axis of evil and tries to gain the moral high ground for the U.S. position. The sheer number of fanatics on both sides makes civil discussion impossible. It seems that the media are well-positioned to set up the war between the two powers.

What the United States Wants

The current trade war is difficult partly because the United States’ demands are fairly opaque. It is a bag of things, ranging from the purely economic to the purely political. This is understandable because the Trump administration is not known for making crisp, clear demands. There are feelings, numbers, and ideologies, all bagged together in the trade deal.

The Feelings: the United States feels that it has been in a one-sided relationship. Over the past two decades, the relationship benefitted the Chinese more. This can be seen from the stagnation of U.S. growth and the stellar growth of China. More specifically, the feeling stems from the broad ban on U.S. internet companies in China and the joint-venture requirements for any U.S. venture into the Chinese domestic market. A great many made-in-China products means fewer made-in-America ones. That, again, is tied back to the stagnation of ordinary Americans over the past decade.

The Numbers: the United States sees the cold, hard trade imbalance as proof that the relationship is truly one-sided. If the Americans make less than the Chinese from this relationship, isn’t that enough to prove the United States lost?

The Ideologies: to many Americans, Communist China is evil by its very prefix. The behavior in Tibet, Xinjiang and the South China Sea is proof that the communists will go far to suppress opposition. Many years of propaganda in the United States attributed the end of the Cold War to the superiority of Capitalism over Communism (rather than, for example, of open government over authoritarian government).

This makes the U.S. demands unlikely to be simply economic. If the U.S. wanted balanced trade, the problem would already have been solved last year. The Chinese want to buy from the U.S. anything the U.S. is willing to sell. Agricultural products rose from 0% to almost 20% of total U.S. exports to China in a little over a decade. There is a long list of things that the Chinese want to import but that are banned for national security reasons.

Beyond the economic demands, the U.S. wants to fix the open-market problem. The Chinese were quick to extend an olive branch on that front with the 100% Tesla-owned factory in China, even with some Chinese investment.

The sticking points lie in the alleged IP theft, cyber warfare and humanitarian concerns. The Chinese were quick to promise. But the United States wants more than a promise.

What’s China’s Red Line

One misunderstanding in the U.S. media and discussions is how seriously the Chinese take sovereignty. There is plenty of debate within China about how the slow progress in opening its market hurts mutual trust within the WTO. Ren Zhengfei mentioned this as well during his interview on May 21st. The humanitarian record of the current regime is another topic that resonates with many within China. However, imposing a U.S.-based oversight body on the Chinese governing system is difficult for the Chinese to swallow. Sovereignty has been a big part of Chinese education for the past half century. The extraterritorial rights granted to Westerners since the Opium War are something the Chinese will not forget.

The Endgame

With the United States being the world’s only superpower, it has the full range of options to play out the endgame. Given the unpredictability of the Trump administration, the trade war could end tomorrow with only lip service to appease the electoral base. It always comes back to how the United States views China in the long term. If the United States sees its role as containing China, and sees China as the axis of evil that endangers the U.S.-dominated world order, the United States should escalate fearlessly to a war with China while it can, and do what it is most familiar with (toppling the regime). The consequence of that is a far weaker, poorer China, with 1.5 billion people who cannot feed themselves. I wish to appeal to my many American friends: this is an undesirable humanitarian dilemma.

Alternatively, the United States could fool itself into the sanctions game. Even without coordinated efforts with Europe and Japan, sanctions from the United States would greatly damage the Chinese with limited negative impact on U.S. corporations. However, it is unlikely the United States will see a friendlier China that way. With the us vs. them mentality, it is hard to imagine a pro-American regime being born from it. An inward-looking China will ultimately pose a greater threat than an outward-looking one.

The United States has to recognize that, short of a hot war, it needs to work with China. The shared-sovereignty request is not acceptable to either the regime or the people. On the other hand, if the United States wants a friendlier China, the demand should be a rule-based mechanism that enforces IP protection and the participation of foreign capital. The right to participate in Made in China 2025 would also be a far more interesting play for the United States than forcing China to abandon it.

August 15th, 2018

When programming with CUDA, there are several ways to exploit concurrency for CUDA kernel launches. As explained in some of these slides, you can either:

  1. Create a thread for each execution flow, execute serially on one stream per thread, and coordinate with either cudaEventSynchronize or cudaStreamSynchronize;
  2. Carefully set up CUDA events and streams such that the correct execution flow will follow.

Route 2 seems more appealing to untrained eyes (you don’t have to deal with threads!) but in practice it is often error-prone. One of the major issues is that the cudaEventRecord / cudaStreamWaitEvent pair doesn’t capture all synchronization needs. Compare this to the primitives Grand Central Dispatch provides: dispatch_group_enter / dispatch_group_leave / dispatch_group_notify. The under-specified part is where the equivalent of dispatch_group_enter (call it cudaEventEnter) would happen. This leads to a surprising fact: when you cudaStreamWaitEvent on an event not yet recorded on another stream (with cudaEventRecord), the current stream will treat the event as if it has already happened and won’t wait at all.
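
The enter / leave semantics that the record / wait pair lacks can be sketched in plain C with pthreads. This is only an illustration of the idea, not a CUDA API: group_t, group_enter, group_leave, group_wait and group_demo are hypothetical names, loosely modeled on dispatch_group. The point is that the waiter is told about the pending work before that work is even scheduled, so it cannot slip past it.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical enter / leave group, loosely modeled on dispatch_group.
 * This is not a CUDA API; it only illustrates the semantics with pthreads. */
typedef struct {
	pthread_mutex_t mutex;
	pthread_cond_t cv;
	int pending; /* Number of enter calls not yet matched by a leave. */
} group_t;

typedef struct {
	group_t* group;
	int result;
} work_t;

static void group_enter(group_t* const g)
{
	pthread_mutex_lock(&g->mutex);
	++g->pending;
	pthread_mutex_unlock(&g->mutex);
}

static void group_leave(group_t* const g)
{
	pthread_mutex_lock(&g->mutex);
	if (--g->pending == 0)
		pthread_cond_broadcast(&g->cv);
	pthread_mutex_unlock(&g->mutex);
}

static void group_wait(group_t* const g)
{
	pthread_mutex_lock(&g->mutex);
	while (g->pending > 0) /* The loop guards against spurious wakeups. */
		pthread_cond_wait(&g->cv, &g->mutex);
	pthread_mutex_unlock(&g->mutex);
}

static void* worker(void* userdata)
{
	work_t* const work = (work_t*)userdata;
	work->result = 42; /* Stand-in for the real computation. */
	group_leave(work->group);
	return 0;
}

/* Returns the value the waiter observes once group_wait returns. */
static int group_demo(void)
{
	group_t group;
	pthread_mutex_init(&group.mutex, 0);
	pthread_cond_init(&group.cv, 0);
	group.pending = 0;
	work_t work = { &group, 0 };
	group_enter(&group); /* Announced before the work is even scheduled. */
	pthread_t thread;
	pthread_create(&thread, 0, worker, &work);
	group_wait(&group); /* Blocks until the worker calls group_leave. */
	const int observed = work.result;
	pthread_join(thread, 0);
	assert(group.pending == 0);
	pthread_cond_destroy(&group.cv);
	pthread_mutex_destroy(&group.mutex);
	return observed;
}
```

Calling group_demo() from a main function returns 42, because group_wait cannot return before the worker has left the group. If the group_enter call were dropped, group_wait could return immediately, which mirrors what cudaStreamWaitEvent does with an event that has not been recorded yet.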

This is OK if your execution flow is static, that is, which kernels execute on which stream is fully specified upfront. Does it require some careful arrangement? Yes, but it is doable. However, it all breaks down if some coordination needs to happen after some kernel computations are done. For example, deciding whether to decrease the learning rate based on the newly computed losses. Generally speaking, for any computation graph that supports control structures, such coordination is necessary.

The obvious way to solve this is to go route 1. However, that poses other problems, especially given that pthread’s handling of spawn / join leaves much to be desired.

For the few brave souls wanting to go route 2 to solve this, how?

After CUDA 5.x, a new method, cudaStreamAddCallback, is provided. This method itself carries some major flaws (before Kepler, cudaStreamAddCallback could cause unintended kernel launch serializations; the callback itself happens on the driver thread; and you cannot call any CUDA API inside that callback). But if we gloss over these fundamental flaws for a moment, here is how I could make use of it with an imaginary cudaEventEnter / cudaEventLeave pair.

At the point where I need to branch to determine whether to decrease the learning rate, before cudaStreamAddCallback, I call cudaEventEnter to say that an event needs to happen before a certain stream can continue. Inside the callback, I get the loss from the GPU, make the decision, and call cudaEventLeave on the right event to continue the stream I want to branch into.

In the real world, the above just cannot happen. We lack the cudaEventEnter / cudaEventLeave primitives, and you cannot make any CUDA API call inside such a callback. Moreover, the code will be complicated by these callbacks anyway (these are old-fashioned callbacks, not even lambda functions or dispatch blocks!).

What if I could write code as if it were all synchronous, while under the hood it all happens on one thread, so I don’t have to worry about thread spawn / join when just scheduling work from the CPU?

In the past few days, I’ve been experimenting with how to make coroutines work alongside cudaStreamAddCallback, and it all seems to be working! Making this actually useful in NNC will probably take more time, but I just cannot wait to share this first :P

First, we need a functional coroutine implementation. There are a lot of stackful C coroutine implementations online, and my implementation borrows heavily from these sources. This particular coroutine implementation just uses makecontext / swapcontext / getcontext.
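
Before the full implementation, it may help to see the three calls in isolation. The sketch below is not part of the implementation that follows; main_context, co_context, co_body, trace and context_demo are illustrative names. It switches into a second context, lets it yield once, and resumes it until it returns.

```c
#include <assert.h>
#include <ucontext.h>

static ucontext_t main_context, co_context;
static char co_stack[65536];
static int trace[4]; /* Records the order in which the steps run. */
static int trace_count = 0;

static void co_body(void)
{
	trace[trace_count++] = 1; /* First entry into the coroutine. */
	swapcontext(&co_context, &main_context); /* Yield back to the caller. */
	trace[trace_count++] = 3; /* Resumed after the yield. */
	/* Returning ends this context; uc_link decides where execution continues. */
}

static int context_demo(void)
{
	trace_count = 0;
	getcontext(&co_context); /* Initialize before overriding the stack. */
	co_context.uc_stack.ss_sp = co_stack;
	co_context.uc_stack.ss_size = sizeof(co_stack);
	co_context.uc_link = &main_context; /* Resume here when co_body returns. */
	makecontext(&co_context, co_body, 0);
	swapcontext(&main_context, &co_context); /* Run until the first yield. */
	trace[trace_count++] = 2;
	swapcontext(&main_context, &co_context); /* Resume until co_body returns. */
	trace[trace_count++] = 4;
	assert(trace_count == 4);
	return trace_count;
}
```

After context_demo() returns, trace holds 1, 2, 3, 4: control bounces between the two contexts on a single thread. The implementation below sets uc_link to 0 and instead switches back to the scheduler explicitly when a task finishes.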

Set up the basic data structures:

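#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <ucontext.h>

// makecontext passes int arguments, so a 64-bit pointer travels as two 32-bit halves.
union ptr_splitter {
	void *ptr;
	uint32_t part[2];
};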

static const int default_stack_size = 65536;

typedef struct schd_s schd_t;
typedef struct task_s task_t;
typedef void (*task_fn_t)(task_t *task);

struct task_s {
	struct task_s* prev;
	struct task_s* next;
	schd_t* schd;
	int done;
	struct task_s* waitfor;
	// For swapcontext / makecontext / getcontext.
	ucontext_t context;
	char *stack;
	task_fn_t fn;
};

struct schd_s {
	task_t* head;
	task_t* tail;
	struct {
		int suspend;
	} count;
	pthread_cond_t cv;
	pthread_mutex_t mutex;
	ucontext_t caller, callee;
};

Set up a main run loop that can schedule coroutines:

static void deltask(schd_t* const schd, task_t* const t)
{
	if (t->prev)
		t->prev->next = t->next;
	else
		schd->head = t->next;
	if (t->next)
		t->next->prev = t->prev;
	else
		schd->tail = t->prev;
}

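// Defined further below with the other task utilities; declared here because schdmain uses it.
static void taskfree(task_t* const task);

static void* schdmain(void* userdata)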
{
	schd_t* const schd = (schd_t*)userdata;
	for (;;) {
		pthread_mutex_lock(&schd->mutex);
		// No one is waiting, and no more tasks. exit.
		if (schd->head == 0 && schd->count.suspend == 0)
		{
			pthread_mutex_unlock(&schd->mutex);
			break;
		}
		if (schd->head == 0)
		{
			pthread_cond_wait(&schd->cv, &schd->mutex);
			pthread_mutex_unlock(&schd->mutex);
			continue;
		}
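		task_t* const t = schd->head;
		deltask(schd, t);
		pthread_mutex_unlock(&schd->mutex);
		// Run the task until it yields, waits, or finishes.
		swapcontext(&schd->caller, &t->context);
		// The task saved its resume point in schd->callee; keep it for the next run.
		t->context = schd->callee;
		if (t->done)
			taskfree(t);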
	}
	return 0;
}

Now, create a new task:

static void _task_entry_point(uint32_t part0, uint32_t part1)
{
	union ptr_splitter p;
	p.part[0] = part0;
	p.part[1] = part1;
	task_t *task = (task_t*)p.ptr;
	task->fn(task);
	task->done = 1;
	swapcontext(&task->schd->callee, &task->schd->caller);
}

static task_t* taskcreate(schd_t* const schd, task_fn_t fn)
{
	task_t *task = (task_t*)calloc(1, sizeof(task_t));

	task->schd = schd;
	task->stack = (char*)calloc(1, default_stack_size);
	task->fn = fn;

	getcontext(&task->context);
	task->context.uc_stack.ss_sp = task->stack;
	task->context.uc_stack.ss_size = default_stack_size;
	task->context.uc_link = 0;

	union ptr_splitter p;
	p.ptr = task;
	makecontext(&task->context, (void (*)(void))_task_entry_point, 2, p.part[0], p.part[1]);
	return task;
}

static void addtask(schd_t* const schd, task_t* const t)
{
	if (schd->tail)
	{
		schd->tail->next = t;
		t->prev = schd->tail;
	} else {
		schd->head = t;
		t->prev = 0;
	}
	schd->tail = t;
	t->next = 0;
}

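static void taskfree(task_t* const task)
{
	// Requeue every task that was parked in taskwait on this task.
	// Take the lock because taskcudaresume may touch the queue from the driver thread.
	pthread_mutex_lock(&task->schd->mutex);
	task_t* waitfor = task->waitfor;
	while (waitfor)
	{
		task_t* const next = waitfor->next;
		addtask(task->schd, waitfor);
		waitfor = next;
	}
	pthread_mutex_unlock(&task->schd->mutex);
	free(task->stack);
	free(task);
}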
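
The ptr_splitter union used by taskcreate and _task_entry_point above exists because makecontext is documented to pass int arguments to the entry function, so a 64-bit pointer may not survive as a single argument. A minimal sketch of the round trip, under the assumption of a platform where a pointer fits in two 32-bit halves (entry, pointer_demo and the context variables are illustrative names, not part of the implementation):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <ucontext.h>

union ptr_splitter {
	void *ptr;
	uint32_t part[2];
};

static ucontext_t caller_context, callee_context;
static char callee_stack[65536];

/* Reassembles the pointer from its two halves and increments the pointee. */
static void entry(uint32_t part0, uint32_t part1)
{
	union ptr_splitter p;
	p.part[0] = part0;
	p.part[1] = part1;
	int* const value = (int*)p.ptr;
	*value += 1;
}

static int pointer_demo(void)
{
	int value = 41;
	union ptr_splitter p;
	memset(&p, 0, sizeof(p)); /* Keeps part[1] defined if pointers are 32-bit. */
	p.ptr = &value;
	getcontext(&callee_context);
	callee_context.uc_stack.ss_sp = callee_stack;
	callee_context.uc_stack.ss_size = sizeof(callee_stack);
	callee_context.uc_link = &caller_context; /* Come back here when entry returns. */
	makecontext(&callee_context, (void (*)(void))entry, 2, p.part[0], p.part[1]);
	swapcontext(&caller_context, &callee_context);
	assert(value == 42);
	return value;
}
```

pointer_demo() returns 42: the entry function received the address of value intact even though it only ever saw two 32-bit integers.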

The usual utilities for coroutines (the ability to yield, launch a new coroutine, and wait for an existing coroutine to finish):

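static void taskyield(task_t* const task)
{
	// Requeue at the tail under the lock; taskcudaresume may add tasks from the driver thread.
	pthread_mutex_lock(&task->schd->mutex);
	addtask(task->schd, task);
	pthread_mutex_unlock(&task->schd->mutex);
	swapcontext(&task->schd->callee, &task->schd->caller);
}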

static void taskresume(task_t* const task)
{
	ucontext_t old_context = task->schd->caller;
	swapcontext(&task->schd->caller, &task->context);
	task->context = task->schd->callee;
	task->schd->caller = old_context;
	if (task->done) // If the task is done here, we should just remove it.
		taskfree(task);
}

static void taskwait(task_t* const task, task_t* const waiton)
{
	task->prev = 0;
	task->next = waiton->waitfor;
	waiton->waitfor = task;
	swapcontext(&task->schd->callee, &task->schd->caller);
}

With the above utilities, you can already experiment with coroutines:

static void g(task_t* const task)
{
	printf("start task %p\n", task);
	taskyield(task);
	printf("back to task %p to finish\n", task);
}

static void f(task_t* const task)
{
	printf("create a new task to resume %p\n", task);
	task_t* gtask = taskcreate(task->schd, g);
	taskresume(gtask); // Run the gtask directly.
	printf("done task %p\n", task);
}

int main(void)
{
	schd_t schd = {};
	pthread_cond_init(&schd.cv, 0);
	pthread_mutex_init(&schd.mutex, 0);
	task_t* task = taskcreate(&schd, f);
	addtask(&schd, task);
	schdmain(&schd);
	pthread_cond_destroy(&schd.cv);
	pthread_mutex_destroy(&schd.mutex);
	return 0;
}

Unsurprisingly, you should see the printouts in this order:

create a new task to resume 0x288d010
start task 0x289d410
done task 0x288d010
back to task 0x289d410 to finish

Coroutine f executes first and launches coroutine g. When g gives up control (taskyield), coroutine f continues to execute until it finishes. After that, the scheduler resumes coroutine g, and it finishes as well.

You can also try taskwait(task, gtask) in coroutine f, to see that f will finish only after coroutine g is scheduled again and runs to completion.

So far, we have a functional coroutine implementation in C. Some of this code doesn’t seem to make sense; for example, why do we need a mutex and a condition variable? Because a secret function that enables us to wait on a stream is not included above:

// Needs #include <cuda_runtime.h>. Runs on a CUDA driver thread, not the scheduler thread.
static void taskcudaresume(cudaStream_t stream, cudaError_t status, void* userdata)
{
	task_t* const task = (task_t*)userdata;
	pthread_mutex_lock(&task->schd->mutex);
	addtask(task->schd, task);
	--task->schd->count.suspend;
	pthread_cond_signal(&task->schd->cv);
	pthread_mutex_unlock(&task->schd->mutex);
}

static void taskcudawait(task_t* const task, cudaStream_t stream)
{
	pthread_mutex_lock(&task->schd->mutex);
	++task->schd->count.suspend;
	cudaStreamAddCallback(stream, taskcudaresume, task, 0);
	pthread_mutex_unlock(&task->schd->mutex);
	// Compare to taskyield, this function doesn't do addtask(task->schd, task);
	swapcontext(&task->schd->callee, &task->schd->caller);
}

taskcudawait will put the current coroutine on hold until the said stream finishes. Afterwards, you can branch, knowing comfortably that the kernels in the stream above are all done. The condition variable and the mutex are necessary because the callback happens on the driver thread.
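
The hand-off between the driver thread and the run loop can be tried without a GPU. The sketch below is a simplification under stated assumptions: a plain pthread (fake_driver_thread) stands in for the CUDA driver thread, and counters stand in for the task list, so loop_t and loop_demo are illustrative names rather than part of the implementation above. It shows why the loop sleeps on the condition variable while something is suspended instead of exiting.

```c
#include <assert.h>
#include <pthread.h>

/* Simplified stand-in for schd_t: counters instead of a task list. */
typedef struct {
	pthread_mutex_t mutex;
	pthread_cond_t cv;
	int ready;    /* Items ready to run (the task list in the real scheduler). */
	int suspend;  /* Items parked until a callback fires. */
	int executed; /* Items the loop has run so far. */
} loop_t;

/* Plays the role of taskcudaresume: called from a foreign thread. */
static void* fake_driver_thread(void* userdata)
{
	loop_t* const loop = (loop_t*)userdata;
	pthread_mutex_lock(&loop->mutex);
	++loop->ready;
	--loop->suspend;
	pthread_cond_signal(&loop->cv);
	pthread_mutex_unlock(&loop->mutex);
	return 0;
}

static int loop_demo(void)
{
	loop_t loop;
	pthread_mutex_init(&loop.mutex, 0);
	pthread_cond_init(&loop.cv, 0);
	loop.ready = 0;
	loop.suspend = 1; /* One item is parked, as if taskcudawait had just run. */
	loop.executed = 0;
	pthread_t driver;
	pthread_create(&driver, 0, fake_driver_thread, &loop);
	for (;;) {
		pthread_mutex_lock(&loop.mutex);
		if (loop.ready == 0 && loop.suspend == 0)
		{
			/* Nothing to run and nothing parked: safe to exit. */
			pthread_mutex_unlock(&loop.mutex);
			break;
		}
		if (loop.ready == 0)
		{
			/* Nothing runnable yet, but something is parked: sleep until signaled. */
			pthread_cond_wait(&loop.cv, &loop.mutex);
			pthread_mutex_unlock(&loop.mutex);
			continue;
		}
		--loop.ready;
		pthread_mutex_unlock(&loop.mutex);
		++loop.executed; /* Stand-in for swapcontext into the task. */
	}
	pthread_join(driver, 0);
	pthread_cond_destroy(&loop.cv);
	pthread_mutex_destroy(&loop.mutex);
	return loop.executed;
}
```

loop_demo() returns 1 regardless of whether the fake driver thread fires before or after the loop first checks the queue: the suspend counter keeps the loop alive, and the condition variable wakes it once the item becomes runnable.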

You can see the full code that demonstrated the usage here: https://gist.github.com/liuliu/7366373d0824a915a26ff295c468b6e4

It seems the above utilities would cover all my use cases (taskwait and taskresume are important to me because I don’t want too much hard-to-control asynchrony when launching sub-coroutines). I will report back if some of this doesn’t hold and I fail to implement a fully asynchronous computation graph with control structure support using these cute little coroutines.

May 3rd, 2018

NNC is a tiny deep learning framework I have been working on for the past three years. Before you close the page on yet another deep learning framework, let me quickly summarize why: starting from scratch enables me to toy with some new ideas on the implementation, and some of these ideas, once implemented, have some interesting properties.

After three years, and given the fresh new takes on both the APIs and the implementation, I am increasingly convinced this will also be a good foundation for implementing high-level deep learning APIs in any host language (Ruby, Python, Java, Kotlin, Swift etc.).

What are these fresh new takes? Well, before we jump into that, let’s start with some not-so-new ideas inside NNC: like every other deep learning framework, NNC operates on dataflow graphs. Data dependencies on the graph are explicitly specified. NNC also keeps the separation of symbolic dataflow graphs vs. concrete dataflow graphs. Again, like every other deep learning framework, NNC supports dynamic execution, which is called dynamic graph in NNC.

With all that out of the way, the interesting bits:

  • NNC supports control flows, with a very specific while loop construct and multi-way branch construct;

  • NNC implements a sophisticated tensor allocation algorithm that treats tensors as a region of memory, which enables tensor partial reuse;

  • The above allocation algorithm handles control flows, eliminates data transfers for while loop, and minimizes data transfers for branching;

  • Dynamic execution in NNC is implemented on top of its static graph counterpart, thus, all optimization passes available for static graph can be applied when doing automatic differentiation in the dynamic execution mode;

  • Tensors used during dynamic execution can be reclaimed; there is no explicit tape session or requires_grad flag.

You can read more about it on http://libnnc.org/. Over the next few months, I will write more about this. There is still a tremendous amount of work ahead for me to get to the point of release. But getting ahead of myself and putting some pressure on is not a bad thing either :P The code lives in the unstable branch of libccv: ccv_nnc.h.

March 21st, 2018

十年前,我开始每两三年写一篇 BLOG ,内容是关于四年之后的一些预测。预测的原理也很简单,就是根据和大家聊的内容,还有每天看的新闻,做一些合理的臆测。后来也开始正儿八经的先描述一下大致的人文市场政治环境,再做猜测。这么一个设置,让很多地方也就看起来扯得更有道理了。然而追根究底之所以如此,不过是其实接下去几年大家要做什么也都了解得八九不离十。说四年后苹果会生产第十四代 iPhone ,这又有什么好猜的呢。

那么,站在2018年的的刚开始看2022年,又是什么样的呢?

首先,是中国的消除贫困。在2021年,也就是建党一百周年的时候,中国领导人会宣布全面建成小康社会。而在中国的语境下,全面建成小康社会是明确可量的,那就是消除贫困。在2022年,中国人均 GDP 将超过一万元美金。任何其他的情况都没有什么意义,因为如果这2021年中国不能全面建成小康社会,那么整个国际环境也将很不稳定,接下去的预测自然就会漏洞百出了。

美国的衰落是21世纪的主要命题。但就这四年而言,我们只能看到美国衰落的点点迹象,而这个国家仍然是世界经济的主要影响力,是世界科技发展的引擎。

计算性能在混合计算领域仍然有大的提升,单机的计算能力会接近 0.5 Petaflops (全 32bit 浮点),GPU的内存将会达到 48GiB 每片。而移动系统的CPU / GPU发展终于遇到了性能瓶颈。在未来4年,他们的价格仍然会继续下降,但是性能的提升只在两倍左右。移动芯片的大部分工作将在于特殊优化和功能集成。

有了这些铺垫,那么接下去会发生什么就顺理成章了。

在2022年,大部分销售的电动车和豪华车都有了L3的自动驾驶功能。中产阶级更倾向于开电动车,而年收入在三万元以下的家庭则大多会开汽车。也因如此,美国对于电动车的补贴在2022年将会聊胜于无。虽然大部分在售的电动车都有了L3的自动驾驶功能,在2022年,并没有什么加装的自动驾驶组件在卖。

L3:在大部分的高速/低速行驶中不需要人工注意,特殊情况下会警示驾驶员,要求驾驶员干预。

比较意外的是,现在在电动车行业起步较早的传统汽车厂商并没有抢得什么先机。具体而言,宝马的i系销量会越来越不尽人意,而普遍认为已经失掉推出纯电车先机的福特和丰田都推出了成功(总销量在十万辆以上)的纯电动车型。尽管如此,在全球范围内,仍然有两到三家新兴的电动车企业成长了起来。特斯拉,虽然在自动驾驶方面步履蹒跚,也在2020年做到了L3的自动驾驶水平。而Model 3,要么是一个巨大的成功(年销量二十万辆以上),要么是一个中等的成功(年销量八万辆以上)。真正的爆款,或许是一种 Minivan 和紧凑型 SUV 的混合体。

高铁没什么可说的。从孟买到海德拉巴的高铁从2013年就开始跟大家讲,在2022年的尾巴才会勉强完工。而美国西海岸的高铁仍然遥遥无期。

由于油价的稳定和新一代客机的大规模使用,超过16个小时的直线长途航班又多了起来。但基于超音速客机的航班仍然没有运行起来。

HIV 的疫苗终于在未来的四年上市了。这或许是普罗大众所知道的医学方面最重磅的消息了。很多行业都是这样,一些大的变化,都是从很小的地方开始的。比如便宜的无刷电机和芯片的组合,还有因为智能手机而便宜起来的传感器。这些小东西组合起来,就变成了信息化农业落地的重要元素。尤其是在亚洲,未来四年你会逐渐发现一些产量高,产品一致性好的现代化精细生产的农场。利用信息化技术,这些农场的产量达到了普通工业化农场产量的两倍,接近于需要大量劳力的农场产量。

虽然2017年号称是 AR 元年,然而,即使四年之后,也不会有一款大众的AR硬件产品(单品总销量在五百万件以上)。

另一件终于发生的事情,就是亚马逊会赶在2022年结束前,在一些地区开始商用的无人机送货了。这些送货的无人机,和开始在州际公路上测试的无人货车(一位司机,多辆货车)标志着一些低附加值的工作也开始逐渐被机器所取代了。

而我们,在经历了一轮经济的回调之后,或许也不会把比特币作为正常的投资项目了吧。

以上。

Ten years ago, I began to post predictions about what the world would look like 4 years in the future. The principle behind these predictions is simple: a combination of things we chatted about, things I read, and a stash of reasonable imagination. Later, to make this a bit more fun and educational, I would also map out the potential market and political environment before the predictions. With this setup, everything looks more systematic and professional. But to be honest, everything that is going to happen in the next few years has already been set in motion today. It isn’t that entertaining to predict that Apple will design the 14th generation of iPhone in 2022.

That being said, what will 2022 look like, now that 2018 is starting to unfold?

First, the elimination of poverty in China. In 2021, the 100th anniversary of the Communist Party of China, the leadership in China will announce that they have finished building a moderately prosperous society in all respects. For China, the moderately prosperous society in all respects is a measurable goal, and the end result is the elimination of poverty. In 2022, China’s GDP per capita will reach 10,000 USD. If China cannot reach that goal, everything else is not very meaningful to predict, due to the resulting global instability.

The main theme of the 21st century is the decline of American power. But in these 4 years, we will only see occasional hints of it; this nation is and will continue to be the main player in the global economy, and the major powerhouse for technology development.

There will be at least a 5x improvement in raw computation power from the heterogeneous computing paradigm. A single chip can reach 0.5 Petaflops (full 32-bit floating point) by the end of 2022. On-device memory per GPU card can reach up to 48GiB. The view for mobile is not as rosy, however. In the next 4 years, the price of mobile systems-on-chip (SoC) will continue to go down, but the speed on traditional workloads will not improve much: at most, 2x. More work will go into function-specific optimizations and feature integration.

Now that the grand scene has been set, what will happen next?

Cars. In 2022, most production electric cars and luxury vehicles will have level-3 autonomous driving capability. Middle-class families will now drive more electric cars, while families with annual income less than $30,000 will continue to drive cars with internal-combustion engines. Although most electric cars on sale have level-3 autonomous driving capability, there is no viable after-market component for level-3 autonomous driving.

Level-3: No human attention needed in most highway and local environments. The system will alert the driver under certain conditions.

To many people’s surprise, traditional car manufacturers who started early in the electric vehicle market don’t have much first-mover advantage. Specifically, BMW’s i-series sales numbers will plunge. Ford and Toyota, which were once considered latecomers to the market, now both have successful battery electric vehicles (total unit sales exceeding 100,000). Even so, globally, there will be two or three new but established all-electric car manufacturers. Tesla, which had some false starts in autonomous driving technology, finally gets to level-3 in 2020. Its Model 3 is either a huge success (200,000 unit sales per year) or a moderate one (80,000 unit sales per year). The most popular battery electric car? We probably haven’t seen it yet, and it is likely to be a cross-over between a minivan and a compact SUV.

High-speed railway. We’ve been talking about the high-speed railway from Mumbai to Hyderabad since 2013. At the end of 2022, it will finish. The high-speed railway between San Francisco and Los Angeles? It probably hasn’t even broken ground.

With stable oil prices and the mass use of new-generation airplanes, there will be more ultra-long-distance non-stop flights (more than 16 hours). There will be no regular supersonic commercial flights by 2022 though.

The HIV vaccine will hit the market in the next 4 years. This will probably be the single best-known medical breakthrough in those 4 years. A lot of important breakthroughs have minuscule starts: cheap brushless motors, SoCs, and cheap sensors, thanks to the ubiquity of smartphones. These gadgets become the important ingredients of why information agriculture now works. Especially in Asia, you will find some modern lean-production farms with high yields and high-quality produce. Equipped with information technology, these farms have yields more than 2x those of their industrial-farm counterparts, closer to the yields of small labor-intensive farms.

2017 was called the first year of AR. However, even after 4 years, there will be no successful mass-market AR hardware (more than 5 million total unit sales).

And it finally happens: Amazon, just before the end of 2022, starts deliveries with drones in certain areas of North America. These delivery drones, along with autonomous trucks on the interstate highways (one driver, many trucks), symbolize the beginning of the elimination of low-paying jobs.

Lucky for all of us, after an economic downturn, Bitcoin will stop being an investment vehicle.

June 1st, 2016

Decades have passed, and we have yet to see high-quality consumer software. It is now taken for granted that software is supposed to be crashy, laggy and barely functional. Why, and how did we get here? When the question is asked, many people feel nostalgia for a time when software was simpler and people crafted their cathedrals. They often overlook the fact that the software we build today is many orders of magnitude more complex than the software we had in the 1980s. Even today’s software with the simplest tableau operations, with its graphical user interface combining complex animations and multi-touch interactions, would require many months of developer time if built from scratch.

As far as I can remember, the concept of quality was popularized in the 1970s from Japan. In the 1970s, through the quest for quality, the Japanese auto industry reached a level of low cost that its American competitors could only dream of.

March 23rd, 2016

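A four-year-old child opens his eyes. Stars, the Milky Way, the moon, the sun. I stand by the river, looking at you and laughing out loud. The lights of the city, a bowl of thin rice porridge on the table.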

September 27th, 2015

离上次预测未来2016已经过去了四年。本打算每两年一次预测,每次预测四年之后的情况。因盖茨曾说过,我们总是高估两年之后的科技发展,而低估十年的科技发展。而十年之前,将奔腾四芯片的运算能力放到手掌大小的运算设备上几乎是不可能的事情。哪怕是八年之前,当我们要订制有当时高端台式机处理能力的一体化电视也是困难重重。

回顾过去四年的预测

虽然四年前的预测现在看来相当精准。但是经济政治稳定的大前提在各国政府的量化宽松政策主导下得以保持,否则这样的预测难言准确。回顾四年前的预测,互联网接入、3D电视悲观市场前景、电视剧点播、计算机性能、辅助驾驶、自拍技术均和现实没有太大出入。然而,对于无线电源的普及、Pad和笔记本融合的趋势、商用超音速客机投产、失业率以及人工智能的发展预测都出现了较大失误,没有预测到无人机的发展。总结起来,一些是过于乐观,而另一些是知识的储备不够。

未来四年经济社会总体概况的预测

对于接下去四年的预测很难以经济社会总体稳定作为前提了。就全球而言,经济增长将会放缓。美国反而会因为向全球输出货币,成为受影响最小的国家之一。四年中,在欧洲,西班牙、希腊等地中海国家的情况很难得到改善。虽然政治情况的发展总比预测的缓慢,一个或多个国家退出欧元区的前景将会变得非常确定。在这样困难重重的国际环境中,日本的情况却在数次有计划的税率提升后有些许的改善。然而,最难以预测的情况是中国。在未来四年,中国可能会出现如下情况中的任意一种:

1). 在货币和经济政策调控下经济成功硬着陆,中国的GDP增长会降到每年4.5%到5.5%。总体上,中国的财政将会更加平衡,作为全球制造工厂,中国所整合的系统生产效率将是其他即使有低成本劳动力的经济体难以企及的。在这种情况下,中国理所应当地成为了人均年GDP产值在九千到一万美金左右的新兴发达国家。

2). 在未来的两年中,中国的GDP增长将降到致命的4.5%或者更低。经济和货币政策在大量的货币对外流失和自2008年以来的资本失控中效果乏乏。社会动乱比想象中的更加容易。地方政府将会疲于应付各种社会暴动。而中央政府可能会和反对党领导人展开对话。而下一步的发展会变得无法预测。

于我而言,选择1作为中国未来四年的背景将是能够做出有意义预测的唯一方法。如果选择2更接近未来四年的现实的话,以下的所有预测将会和现实完全脱钩。

对于印度,我没有任何的系统知识。这也使得我无法预测印度对于全球科技和政治经济的影响。对于俄罗斯和中东产油国,油价将会徘徊在每桶40到100美金。因为油价会在未来四年大幅波动,而俄罗斯的经济也将会受到拖累。

预测的基础

如果想对于任何预测的成功有任何把握的话,我们只有参考过去。在过去的100年中,最重要的事件是人类对于指数级增长的理解。在各种书籍文献中,我们对于指数级增长有着近乎狂热的崇拜。然而,如果只是将指数增长的图表画出来,而不了解其后的科技规律的话,我们也许会遭遇到物理学的根本极限而不自知(但是,过于聪明的理解物理学的极限也会让我们过早放弃对于指数级增长的发掘)。

过去100年人类的指数级增长成为可能只是因为这两个关键字:标准化和大规模生产的效率。现代集这两者为一体的就是iPhone了。如果没有iPhone的生产规模,现代的高分辨率电容触控显示屏将会花费数千美金每平方英寸。但是现在,人人都可以仅花十几美金获得这样的一块屏幕。

标准化和大规模生产,在未来的四年中,将会以各种形式,继续展现其惊人的有效性。

预测

智能硬件的概念提出已经有十多年了。究竟智能硬件应该是什么样的呢?

  • 智能硬件上,它原有的功能应该变得极其傻瓜化。轻松、一次完成、不用动脑的操作;

  • 除去基本的功能以外,智能硬件能够用有限的方法解决一些之前使用中的“痛点”(一个好的例子是可以自动下载云端内容的路由,而能自动预订食物的冰箱就是一个坏案例);

  • 在未来四年,家中的智能硬件不大可能是一种全新的东西。

接下来的四年将是非PC的时代。大多数家庭将不会拥有桌面电脑,虽然在大部分家庭中,所有器件的总计算能力将超过10Tflops。交互式界面也将发生变化。人们将主要通过触控和语音与设备交互。图形化界面将会被对话式界面改造或替代。

虽然有地方冲突和不稳定因素,总体上来说,交通的成本效率更高了。地面交通而言,自动驾驶或者驾驶辅助技术成为新汽车的标配,但离强制标准仍然差很远。曾经Abu Dhabi的PRT虽然不会在中东成功,但类似的交通工具将在一些城市商业化。长距离的交通工具革新在美国仍然没有摆脱实验阶段。不仅如此,一些长距离的不间断航班由于成本上升而被取消。商业交通将会更慢,也更贵(虽然成本效率更高)。

在经济萎缩的时候,娱乐业反而会迎来一次增长。人们仍然会花大部分的时间在电视上,但是选择不订阅任何有线电视的人群将会大大增加。在美国,15到35岁的有线电视观众将以年均10%到20%的速度加速减少。现在收视率最高的电视(首映时五百万左右的观众)在四年之后仍然会保持大致这样的收视率。但是大部分收视率在两百到三百万左右的剧集将会降到一百万以下。在未来四年,某一家在线播放服务商会和一家主要的体育赛事签订在美国独家直播的协议。总体上,从电视和移动终端上,人们每天将会花超过三个小时观看在线视频。

共享经济的未来或许不会是像大家所想的那样。共享经济的核心是将重资产从公司的资产负债表中剔除,从而提高公司的盈利能力。在经济增长期,轻资产使得公司能够快速发展,将盈利能力不佳的项目轻松砍掉。但是在经济萎缩的时候,这些公司将会尝试在资产价格划算的时候收购一些资产。但是最流行的方式或许不是直接的购买。这些公司将会提供给共享经济工作者们各种融资项目让他们能够有资本去购买资产,将资产的贬值风险留给这些共享经济工作者。

移动消息服务将会集中在少数的几家公司手中。这些公司每家将会拥有至少三亿的日活跃用户,每天的消息发送将会超过二十亿条。现在的移动消息服务商中不能达到这些数字的公司将会死掉。全球主要的移动消息服务商只会剩下三到四家或更少。所有的这些移动消息服务商都会提供语音和视频通话服务。这将进一步使得传统电信服务商的语音通信服务变得无关紧要。在美国,至少有一家大型的在线互联网服务商将会进入互联网通信服务市场。互联网接入速度将会进一步提高。全球而言,家庭有线互联网服务接入平均速度将在100Mbps左右。全球移动互联网接入平均速度将在10Mbps左右。在中南非洲,互联网接入速度将会到500Kbps。换句话说,只要有钱支付,在全球除了南极洲以外的任何陆地上,你都能够通过手机进行稳定的视频通话。

医疗仪器也会变得更加便宜。低价的计算能力和机器学习在信号处理中的广泛应用使得必须的医疗设备变得更加便宜耐用。它们能够在地球上最偏远的地方正常工作。这一变化将会对全人类的平均寿命产生深远的影响。

在其他方面,虚拟现实设备将会出现在更多的家庭中。它们仍然在寻找一个杀手级的应用。即使这样,在2019年底,全球将会每年售出三千万台左右的虚拟现实设备。工业机器人将会在更多的工厂取代人类,而这对于中国而言是一个利好。航天技术的私有化仍在继续,私人企业将会成功完成至少一次载人航天任务。

就这样,2020年还没有到人类历史上最坏的时代呢!

It has been 4 years since the last prediction, for the year 2016. My original plan was to draft a prediction every 2 years, scoped to the next 4 years. Gates once said that we always overestimate the change that will occur in the next two years and underestimate the change that will occur in the next ten. A decade ago, having computing devices as small as a palm with Pentium 4 computational power was unimaginable. Even 8 years ago, it was a difficult feat for us to build an all-in-one TV with high-end PC capability.

Review the Prediction of the Past 4 Years

The predictions of the past 4 years have been accurate. The biggest premise, economic stability, has held thanks to all the unusual fiscal policies; otherwise such predictions would hardly be believable at all. Reviewing the predictions I made 4 years ago, those on Internet connection speed, the unfortunate market share of 3D TV, television on demand, computational power, driving assistance (self-driving), and photography technology have matched reality pretty well. However, those on wireless power sources, the merging of Pads and ultrabooks, commercial supersonic flight, the unemployment rate, and artificial intelligence were off quite a bit. There were no predictions on unmanned aerial vehicles. Overall, some of these predictions were too optimistic, and some were simply ignorant.

The Economic / Social Outlook for the Next 4 Years

However, it is harder to predict the next 4 years on the same promise of social and economic stability. Globally, a slowdown in economic growth is a given. The United States, by contrast, will be the least affected due to the dominance of the dollar in the global economy. In Europe, the economic situation in Spain, Greece and other Mediterranean countries is unlikely to get any better. As slowly as politics moves, the possibility of one or several countries exiting the eurozone becomes ever more real. However, in this gloomy environment, Japan’s outlook improves marginally after several scheduled tax hikes. The tricky bit is China. China will likely take one of two paths:

1). Its GDP will land at around 4.5% to 5.5% yoy growth in the next 4 years. This comes after a controlled, turbulent landing, with a finessed mix of fiscal and monetary stimulus. Overall, its fiscal sheet is more balanced, and as the world's manufacturer, China integrates more efficiency into its system, so that it is harder to compete with on the efficiency front even with much lower labor costs. This is China as a newly minted developed nation, sitting comfortably among the rest of the developed nations with GDP per capita between $9,000 and $10,000.

2). Its GDP growth will land at 4.5% or even below in the next 2 years, and this will be considered fatal. Fiscal and monetary tools prove ineffective due to large capital outflows, as well as loosened control over capital in general after 2008. Social uprising turns out to be much easier than expected. Regional governments will struggle to contain the unrest, and the central government will likely hold several rounds of negotiations with opposition leaders. It becomes impossible to predict what would happen afterwards.

For the sake of making any progress on this prediction, I will pick China option 1 as the background for the next 4 years. If option 2 turns out to be closer to reality, it nullifies all the predictions I am going to make below.

As for India, for lack of systematic knowledge in that area, it is hard for me to predict India's impact on the global technology and economic outlook. For Russia and the Middle-East oil-producing countries, the assumption is that oil will float between $40 and $100 per barrel, and Russia's economy will struggle nevertheless due to the greater volatility in the oil price.

The Basis of Any Predictions

The success of any prediction, if at all, comes from looking at past patterns. For the past 100 years or so, that has meant capturing and interpreting exponential growth. The fascination of exponential growth has been emphasized in countless books and talks. However, by applying exponential growth without an underlying understanding of the technological principles, we risk hitting some fundamental laws of physics and making no progress at all (and on the other hand, a premature prejudice of “understanding” the fundamental limits of physics can be fatal too).

Exponential growth is made possible only by two key terms: standardization and economy of scale. The modern marvel of this kind is the iPhone. Without the scale of the iPhone, a modern high-resolution screen with capacitive touch would cost thousands of dollars per square inch to manufacture. But now, everyone gets a modern high-resolution touchscreen for a few bucks.

These two key terms will manifest themselves in many forms, and will continue to work wonders in the next 4 years.

The Prediction

Smart hardware has been around for more than 10 years. But what makes sense as “smart hardware”?

  • It makes the basic functionalities we assume of that hardware a no-brainer: smooth, one-touch, perfect and care-free integration;

  • It extends beyond the basic functionalities, but operates under well-defined principles (good example: a router that caches cloud content and makes access instantaneous; bad example: a refrigerator that orders food for you);

  • It is unlikely to be something completely new.

Then, there is the un-PC era. In the next 4 years, homes will rarely own any desktop computers, even though the aggregate processing power in a single-family house can easily exceed 10Tflops. The interface changes too. People now interact with these devices by either touch or talk. Graphical interfaces now get a meaningful conversational retouch.

Despite the potential conflicts and regional instability, transportation will be more cost-effective. On land, self-driving or smarter driving assistance will be a standard add-on in newly shipped vehicles. However, it is far from becoming a mandatory standard. The Abu Dhabi PRT was a failure in the Middle East, but similar transportation services will run commercially in some cities. The next generation of long-distance land transportation is still in an experimental phase in the United States. Not only that, some of the longest commercial flights are cancelled due to cost. Commercial transportation is going to be more expensive, and slower.

The entertainment industry gets a big boost in times of recession. People still spend a disproportionate amount of time in front of the big television. The “cutting-the-cord” movement will happen much faster than expected. Cable viewership among 15-to-35-year-olds in the United States will drop at a rate of 10% to 20% year over year, and the drop will accelerate. Today’s top TV show numbers (5m viewers at the premiere) will hold steady. But shows with 2m to 3m premiere viewership will see a drop to 1m or less. In the United States, online streaming players will ink deals with major sports for exclusive online streaming rights. People will spend more than 3 hours a day on streaming services, either on television or on their mobile devices.
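To put the 10% to 20% yearly decline in perspective, here is a minimal sketch of the compounding arithmetic over the 4-year horizon. It assumes a constant rate each year (the prediction says the rate accelerates, so the real decline would be steeper), and the function name `remaining_share` is made up for illustration.

```python
def remaining_share(yearly_drop: float, years: int) -> float:
    """Fraction of the original audience left after compounding a fixed yearly drop."""
    return (1.0 - yearly_drop) ** years


# Compare the two ends of the predicted range over 4 years.
for drop in (0.10, 0.20):
    share = remaining_share(drop, 4)
    print(f"{drop:.0%} yearly drop over 4 years leaves {share:.1%} of viewers")
```

Under these assumptions, roughly 66% of the audience remains after 4 years at a 10% yearly drop, and about 41% at a 20% yearly drop.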

The sharing economy is not going the way you would expect. At its core, the sharing economy moves assets off the balance sheets of companies such as Airbnb or Uber and bumps up their profitability. In boom times, asset-light companies can move fast and get rid of less profitable businesses painlessly. In down times, these companies will try to own more assets while asset prices are cheap. However, the most popular way for them to do so will not be outright purchase. Instead, they will launch financing programs to help their sharing-economy workers own these assets, and leave the risk of asset depreciation to them.

Mobile messaging services will consolidate. Respectable players in messaging will reach 300m daily active users and have at least 2b messages sent per day. Any player that cannot reach that mark will be dead. There will be only 3 to 4 major players in that space, if not fewer. All the messaging services will be able to make audio and video calls, which will continue to marginalize the phone call business of traditional phone service providers. In the United States at least, more than one online-based business will enter the ISP business. The speed of the Internet will continue to improve. Home Internet speed globally will average 100Mbps. Global mobile Internet speed will average 10Mbps. Specifically, mobile Internet service in Middle / South Africa will reach an average of 500Kbps. In other words, as long as you can pay, you can have a semi-stable Internet connection on your cellphone and will be able to make video calls anywhere in the world except Antarctica.

Cost-effectiveness is reaching medical equipment. With the lower cost of processing power and the general application of machine learning techniques in signal processing, popular and essential medical equipment will become cheap and versatile enough to be delivered even to the most remote areas on Earth. The profound impact will be a global lift in life expectancy.

Virtual reality gear will gain traction in many more homes. It is still struggling to find its killer application. But shipped units per year will be around 30m globally at the end of 2019. Industrial robots will replace more human labor, which is a good thing for China. Privatization of space technology continues. One or more private companies will accomplish at least one low-orbit manned mission.

Thus, the year 2020 is not yet the worst time for humanity.

August 16th, 2015

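Summer in August always feels especially long. In the last few days of summer vacation, there is nothing to do at home; you are as rested as you could possibly be and are just waiting for school to start, to shoot the breeze with classmates about everything that happened over the break. At the tail end of vacation, even the 1986 Journey to the West is nearly over on TV, and they have started airing the sequel.

Well, actually I never had a summer vacation like that. Except perhaps for last summer, it was always busy, busy, busy. Busy with work, busy with internships, busy with things one should not be busy with while in school. The sun beat down and down, baking the days extra long. Long days, followed by long nights. Repeating Augusts, chasing the same deadline. The results were mixed. There were things that left others unsatisfied, and there were times that made me happy.

Bottom of the ninth, two outs.

The earliest endless August was in Shanghai, remarkably repetitive and uniform. Every morning I had a bowl of wontons at the school gate, then carried my desktop tower to the lab and started downloading things, preparing materials, writing code. In the evening, I carried the tower back to the dorm and carried on with whatever was unfinished.

The August right after that was spent in military training. And just like that, I was in college. The same roll call every day, taking sick leave, sitting under a tree and shooting the breeze with classmate Liu, who was also on sick leave. Every time I talked with these friends about music, history, or In Search of Lost Time, I felt very pedestrian. Hey, I should have grown up reading Zweig and Borges, not Built to Last!

Bases loaded.

August of '09 was in a small room rented in the Bay Area. In the morning I went to another small room in Sunnyvale, had lunch on Castro, and got back to that small room in Mountain View close to nine. And then, my first trip to Charlottesville.

The summers after that were always spent interning at Facebook. It was called an internship, but really it was going to the office every day and writing things, and then on some evening in August there was the party at Zuck's house.

In the summer of '12, while lying to my parents that I had already graduated, I went to Berkeley to make up the 5 credits I was short. I rushed to campus at six in the morning, and after listening to the professor brag about his glorious days at KPMG, rushed to the office at nine thirty. And so, dawdling along, I really did graduate from college.

Two strikes, three balls.

And so the second half has arrived, and looking down from here I cannot see Haruki Murakami. There are still so many things at hand that I want to finish but have not even started. An endless August, yet with no aliens, time travelers, or espers. If it were a little longer, just a little longer, so that I could do everything I want to do and disappoint no one, could this repetition end on the 15,532nd time?

No one in the stands rises to applaud. I turn my cap backwards and silently chant: Old Master Q, home run, base hit, base hit, home run, home run.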

May 23rd, 2015

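How little love I seem to show for LA! In the past eight weeks, I spent only three weekends in LA, and they went by in a blur of routines. When people ask what the city is like, I still stammer and cannot answer. So, taking advantage of the long weekend, I, well, ran off to the Big Island.

Then again, how much I love wandering around seaside towns like this. Before I knew it, I had strolled around the town of Kona for a whole afternoon. Ah, it only just occurred to me that the local specialty here must be coffee! Working in Venice, I have also rather liked wandering along Oceanfront Walk. But lately I just hurry past, without even the time to grab a coffee. Perhaps this is what a life getting on the right track looks like? Then perhaps the life I had back at Facebook, grabbing coffee whenever, finding people to chat with over milk tea, was just fooling around? Or maybe pretending to be busy like now is the fooling around, and reading the Party newspaper and drinking tea in the office is the proper path. Who knows.

For a while, I was very torn about where I wanted to live. Well, New York, that one, you know. The San Francisco Bay Area now feels a little unfamiliar when I go back. In LA I am always working frantically; summer has finally arrived and the Hollywood Bowl has shows on, but I have not been even once. In Shanghai, it took me two or three years to understand Shanghainese and feel that the city was mine. Beijing I never figured out, yet whenever I was there it felt very familiar. I used to think young people should live in each place for two or three years to see what it is like; after drifting around like that for over a decade, I can see I am no longer young.

So I thought it over: since no place is home, I might as well make the most of my time and visit more places, at least to see what other people's homes look like. Who knows, one day I might like one and move to Tangier.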

May 3rd, 2015

Birthdays are often joyless for me. I’ve yet to find in them an excuse to celebrate. But the clock is ticking, and every year on this day it kicks me hard in the back. It has always been a thing for me to accomplish something, to set a goal and work towards it. But looking back, I’ve done nothing tangible, not to mention any worthy causes. When I die, I die.

Silicon Valley keeps saying: make the world a better place. When I was a bit younger, that got my blood pumping too. But now, I only want to touch lives. So be it one, or two, or many. But in all my years growing up, apart from my parents, no one has cared whether I am alive or dead. And in the past three or four years, I have not accomplished anything. Life is not about a house, a car or a pack of children. What I want is to make beautiful objects and put them into people’s hands.

I am still so greedy! Wanting everything, how could I possibly get everything?