Posts from December, 2020
December 29th, 2020

每两年,我都会做一次四年为期的预测。对2020年的预测,怎么也不会预料到这样的突发事件。这样的突变,反而会使得两年前和四年前过于悲观的预测显得乐观不少。

回顾2016年的预测,辅助驾驶技术已经广泛应用于汽车之中。Cruise和Waymo也分别在旧金山和凤凰城开始了无驾驶员的自动驾驶汽车的运营。存活的移动消息服务,基本也都达到了三亿日活左右的标准。

许多长线航班的取消,虽然确实实现了,却是因为不同的原因。而石油价格虽然大幅变化,且维持在40美元左右,变化的方向却完全错了。

VR设备并没有在19年底售出三千万,仅仅售出了三百万台左右。

写了这么几篇的四年预测之后,我的目的也从单纯的追求准确变成了在追求准确的同时,尽量写一些以前没写过的新趋势。这也就是为什么在絮絮叨叨了将近十年的自动驾驶之后,这一篇并没有提了。因为他终于开始走上了产业化的正轨,接下去就是按部就班了。


2024年,却应该是乐观的。在经历了一场大瘟疫之后,短暂的萧条却由于多国政府的措施得当,变成了经济增长期。尤其是传统的发达国家,因为美国的国家正常化(美元占全球交易总额在30%以下)举措,迎来了一波复苏。

科技的发展也并没有因为瘟疫而停下脚步。更多的数据中心、更快的网络,是人们对疫情后世界的基本要求。更多的更持续的工作虚拟化需求加速了往云端迁徙的脚步。到2024年,全世界而言,线上支付会占到消费者总支付的一半以上。

无线互联网的延迟在90%的情况下会在50毫秒以下,25%以上的人口会有机会使用1G以上带宽的互联网。在算力方面,最好的单机工作站可以达到1 Petaflops的全32位浮点矩阵运算速度,近32位浮点矩阵运算速度将会达到10 Petaflops。2011年的超级计算机京也不过是10 Petaflops左右。

但是计算不止发生在云端。在2024年,我们会看到Hyperscalers运营耗电量接近1 Gigawatt的单个数据中心,但是也会看到耗电量不到15 Kilowatt的最后一公里边缘计算中心。成功的边缘计算服务提供商会管理将近25,000个边缘计算中心。这些边缘计算中心是提供高速互动视频和云游戏的关键。

即使有了云游戏,玩3A游戏的人却并没有增加多少,只是现有玩家的游戏时长增加了而已。另一方面,游戏引擎的使用却越来越平易近人。到2024年,20%的独立视频制作者将会使用游戏引擎实现实时的视频特效。一半以上的在线教学课件也会使用游戏引擎进行展示。

mRNA疫苗的成功应用和对疫苗监管的放松使得在2024年,一小部分人最先享受到了个性化疫苗的服务。这些个性化疫苗会从最先预防普通感冒开始,我们甚至可能会听到阿尔茨海默症疫苗的好消息。

虽然在2024年,我们不大可能完成一整套AGI系统的设计,但是一种通用的增强学习算法架构成功的应用到了从短期趋势预测到长期战略计划的方方面面。除了少数在数据融合输入方面尚有问题的领域(比如多边贸易谈判,民事刑事侦查诉讼等),大部分领域的执行将会是这套增强学习算法,而我们只是负责简单的数据整理和维护。从这个角度来看,我们离能自我进化的AGI系统又近了一步。

因为疫情而导致的电影网络同步上映并不会因为疫情的消失而消失。事实上,到了2024年,全年50%的电影会在院线上映的30天内在网络上映。每十几分钟分段的直播和录播混合的互动视频占据了我们剩余的娱乐注意力。我们希望视频的创作者更快的做出我们想要看的东西。90%头部创作者的视频上传周期从每周两更变成了一天一更。

虽然我们迎来了经济的复苏,全球化和全球化相关的全球变暖议题并没有实质的进步。除了山火和热带风暴,我们将在2024年以前见证一次大规模的近大陆架海洋动物死亡事件。因为海水酸化,太平洋沿大陆架将会迎来大量的海带养殖投资。因为海带养殖的盈利性,我们甚至会往深海发展,养殖更多的海带。海带会在美国成为一种时尚的健康食品。

到了2024年,因为植树造林和海带养殖,地球的肺得以有所喘息。但是对于控制温室气体的排放而言,这远远不够。除了少数的例外,大部分国家,包括中国、印度和美国,都将放弃或者达不到2024年的减排目标。我们仍然没有一种可以大规模快速捕获温室气体的技术投入使用。唯一的好消息可能是俄罗斯人多了几个天然不冻港。

但是,站在2024年的门槛上,虽然我们看到了全球化的受阻和意识形态斗争的暗流涌动,却充满希望:只要我们进步得够快,或许不会那么糟呢。

Every two years, I make a 4-year scenario prediction / plan. The earlier predictions for 2020, however, did not come close to anticipating how dramatic the year turned out to be. Such a drastic change makes the seemingly gloomy predictions from two and four years ago look optimistic by comparison.

Right on the mark in the 4-year prediction made in 2016 is the prevalence of driver-assistance technologies in all types of cars. Cruise and Waymo finally started operating autonomous vehicles without human drivers in San Francisco and Phoenix, Arizona, respectively. The surviving mobile messaging services all have around 300 million DAUs by now.

A lot of long-haul flights have been cancelled, but for reasons other than what was predicted. Crude oil prices swung widely and ended the year at around $40, but the direction of the swing was the opposite of what I predicted.

VR devices did not sell 30 million units in 2019. Only about 3 million were sold that year.

After writing these posts for the past few years, my goal has changed from the singular pursuit of accuracy to uncovering new trends while staying accurate. That’s why, after almost ten years of religiously repetitive discussion of autonomous driving, there is none this time. It is now on track for real-world productionization, and the rest will happen over time.


2024 should be an optimistic year. After the global pandemic, the predicted depression will not happen, thanks to the combined monetary and fiscal policies of governments around the world. Developed countries, especially the United States after its dollar normalization effort (the dollar will account for 30% or less of global transactions), will be on a strong recovery path.

Technology development won’t slow down a bit, even during the pandemic. More data centers and a faster internet are what people expect post-pandemic. More, and more sustained, digitalization of work will accelerate the migration to the cloud. By 2024, online transactions will account for more than 50% of total consumer transactions globally.

The latency of wireless networks will improve: in 90% of cases, it will be below 50ms. 25% of the world population will have access to a 1Gbps internet connection. The best workstation will do 1 petaflops of full 32-bit floating-point matrix multiplication, and around 10 petaflops of near-32-bit floating-point matrix multiplication. To put this in perspective, Fujitsu's K computer in 2011 could only do around 10 petaflops.

Computation is not monopolized by the cloud. In 2024, we will see at least one hyperscaler run a data center that draws close to 1 gigawatt. At the same time, there will be last-mile data centers on the edge that consume less than 15 kilowatts. Successful edge computing providers will manage close to 25,000 edge data centers. These data centers will be the key to high-speed interactive video and cloud gaming.

Even with the moderate success of cloud gaming, there won’t be many more gamers for AAA titles. Existing gamers will simply play more of them. On the other hand, game engines will penetrate more markets. By 2024, 20% of independent video creators will use game engines for real-time visual effects. More than half of online education courseware will use game engines for presentation.

The success of mRNA vaccines and the subsequent streamlined regulation will make it possible for a privileged few to enjoy personalized vaccines in 2024. These vaccines will start with the common cold, and we may even hear some good news on Alzheimer's vaccine development.

We won’t finish a full-blown AGI system by 2024. However, a generalized reinforcement learning system will be successfully applied to everything from short-term trend analysis to long-term strategic planning. Apart from a few areas with data input problems (such as multilateral trade negotiations and civil / criminal litigation), most areas will rely on this generalized reinforcement learning system for execution, and we will simply be responsible for data input and maintenance. We will be an inch closer to a self-evolving artificial general intelligence system.

Simultaneous online / in-theater movie releases won’t go away post-pandemic. By 2024, 50% of movies will have a digital release within 30 days of their in-theater release. The rest of our entertainment attention will go to interactive videos that mix live and recorded segments of a few minutes apiece. We will demand faster feedback loops from content creators. 90% of top content creators will publish daily, compared to twice a week today.

During our economic recovery, globalization and related topics such as climate change won’t make much progress. In between mega wildfires and mega storms, we will witness a large-scale die-off of marine animals near the continental shelf. Due to ocean acidification, we will see growing investment in kelp farming along the Pacific coast. It will be so profitable that we will even build kelp farms in the deep sea. Kelp will become a fashionable health food.

With the reversal of deforestation and the growth of kelp farming, by 2024 the lungs of the Earth will slowly start to heal. This is far from enough. With some exceptions, most countries, including China, India and the United States, will either give up on or fail to reach their 2024 carbon reduction goals. There will be no viable large-scale carbon capture technology in use. The only good news from climate change is probably a few more year-round ice-free ports for the Russians.

Despite the stagnation of globalization and the slow-motion ideological conflicts, we will still be hopeful in 2024: if we can progress fast enough, as we did in the past 4 years, maybe, just maybe, the worst won’t come.

December 6th, 2020

From the onset of implementing libnnc, it was meant to be a common ground for higher-level language bindings beyond Python. The underlying architecture has been stable for a year or so, and I have been using it for some personal projects for a while. But the raw C interface is not the easiest to use, so it is time to implement some high-level language bindings for the library.

The default high-level language for deep learning would likely be Python. However, I am not happy with how it performs on a typical many-core system, even with things that are supposed to help. Swift, on the other hand, has no issues saturating my many-core system, and it has a reasonable Python binding to tap into the rich Python toolkits. Not to mention that calling C functions from Swift is about as easy as it can possibly get.

This conviction resulted in s4nnc, a Swift language binding for libnnc. Because s4nnc is a pure interface to the underlying deep learning library, I paid close attention to its API design. Below are some design notes on why it is the way it is, and how Swift as a language fares on such a task. If you want to read the introduction to s4nnc, feel free to visit the GitHub homepage.

What is a Deep Learning Library API

A good deep learning API, to me, can be largely modeled after Keras and PyTorch. It concerns itself, above all, with 2 questions:

  1. How to specify a deep learning model?
  2. How to construct a training loop?

Everything else is nice-to-have and largely orthogonal to these two questions (but these bells and whistles are a lot of hard work!).

A training loop consists of a repeated sequence of operations: evaluate a model, compute gradients against the loss, and apply gradients to update model parameters.

The details can be flexible: you could evaluate one part of the model in one round, and another part in another round; you could have different losses for different model outputs each round; you could modify the gradients, scale them, or truncate them however you like; and you could apply gradients to update different model parameters with different optimizers. But at the core, I haven’t seen these 3 steps change much across many pieces of model training code.
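The three steps can be sketched in plain Swift without any library. Everything below (`LinearModel`, `lossGradient`, `train`) is a hypothetical illustration with a hand-derived gradient, not the s4nnc API:

```swift
// A hypothetical one-parameter model: y = w * x. Not an s4nnc type.
struct LinearModel {
  var w: Float = 0

  func evaluate(_ x: Float) -> Float {
    return w * x
  }
}

// Gradient of the squared-error loss with respect to w, derived by hand:
// loss = (prediction - target)^2 with prediction = w * x,
// so d(loss)/dw = 2 * (prediction - target) * x.
func lossGradient(prediction: Float, target: Float, x: Float) -> Float {
  return 2 * (prediction - target) * x
}

// Fit the model to samples drawn from y = 3x with plain gradient descent.
func train(epochs: Int, learningRate: Float) -> LinearModel {
  var model = LinearModel()
  let data: [(Float, Float)] = [(1, 3), (2, 6), (3, 9)]
  for _ in 0..<epochs {
    for (x, target) in data {
      // 1. Evaluate the model.
      let prediction = model.evaluate(x)
      // 2. Compute the gradient against the loss.
      let grad = lossGradient(prediction: prediction, target: target, x: x)
      // 3. Apply the gradient to update the model parameter.
      model.w -= learningRate * grad
    }
  }
  return model
}

let trained = train(epochs: 100, learningRate: 0.01)
print(trained.w)
```

The variations listed above (different losses per round, gradient scaling, several optimizers) all change what happens inside one of the three numbered steps, not the shape of the loop.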

What constitutes a model can be more interesting, but it seems we have converged on a concept where a model consists of some inputs, some outputs, and some parameters. In particular, parameters are stateful and internal to the model itself. Going beyond that, a model could have different inputs / outputs and different input / output shapes during the training loop. However, the shapes and number of parameters during training are likely to be constant.

Basic Data Types

A deep learning library operates on multi-dimensional arrays (or tensors). In Swift, a concrete tensor can be represented as a value type, like Swift's own Array. That means the Tensor type needs things such as copy-on-write to implement said value-type semantics. Extra attention needs to be paid throughout the implementation of the API to make sure no unnecessary copy is made. This value-type choice is a bigger deal than it sounds (it sounds like a no-brainer given S4TF made exactly the same choice) because in Python, everything is a reference type.
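A minimal sketch of copy-on-write for a tensor-like value type, using the standard library's `isKnownUniquelyReferenced`. The names `TensorStorage` and `ValueTensor` are made up for this illustration and are not how s4nnc implements its Tensor; a real tensor would wrap a native buffer rather than a Swift array:

```swift
// Reference-typed backing storage. In a real binding this class would own
// native memory and release it in deinit; a plain array stands in for it here.
final class TensorStorage {
  var elements: [Float]
  init(_ elements: [Float]) { self.elements = elements }
}

// A hypothetical value-type tensor (illustration only).
struct ValueTensor {
  private var storage: TensorStorage

  init(_ elements: [Float]) {
    storage = TensorStorage(elements)
  }

  subscript(index: Int) -> Float {
    get { return storage.elements[index] }
    set {
      // Copy only when another ValueTensor still shares this storage.
      if !isKnownUniquelyReferenced(&storage) {
        storage = TensorStorage(storage.elements)
      }
      storage.elements[index] = newValue
    }
  }

  // Exposed for demonstration: do two tensors share one buffer?
  func sharesStorage(with other: ValueTensor) -> Bool {
    return storage === other.storage
  }
}

let a = ValueTensor([1, 2, 3])
var b = a                                        // no copy yet
let sharedBeforeWrite = a.sharesStorage(with: b)
b[0] = 42                                        // first write to b makes the copy
let sharedAfterWrite = a.sharesStorage(with: b)
```

Assignment stays cheap because only the reference is copied; the buffer is duplicated on the first mutation of a shared value, which is where an implementation has to be careful not to copy more often than necessary.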

This becomes more interesting when deciding whether tensor variables could be value types or not. A tensor variable is an abstract tensor type for which you can compute gradients (it has a grad property). It is bound to a computation graph, and can be the parameter to update during the training loop. While it is possible to make many functions associated with tensor variables take inout parameters and to mark some of the tensor variables’ functions as mutating, the hairier part is about updates during the training loop.

In PyTorch, an optimizer takes a list of parameters and then applies new gradients to update these parameters when the step() method is called. This is possible because parameters are reference types in Python. Any update to the parameters is reflected in the model that holds these parameters. This is not possible if tensor variables are value types. In Swift, you cannot hold a reference to a value type.

Thus, practically, tensor variables have to be implemented as reference types in Swift.
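To make the difference concrete, here is a small sketch. `Variable`, `TinyModel`, `SGD` and `ValueVariable` are hypothetical names for this illustration only and do not mirror the s4nnc types:

```swift
// Hypothetical reference-type variable with a value and a gradient.
final class Variable {
  var value: Float
  var grad: Float = 0
  init(_ value: Float) { self.value = value }
}

// The model and the optimizer both hold references to the same Variable.
struct TinyModel {
  let weight: Variable
}

struct SGD {
  let parameters: [Variable]
  let learningRate: Float

  func step() {
    for parameter in parameters {
      parameter.value -= learningRate * parameter.grad
    }
  }
}

let model = TinyModel(weight: Variable(1.0))
let optimizer = SGD(parameters: [model.weight], learningRate: 0.5)
model.weight.grad = 2.0
optimizer.step()
// The update made through the optimizer is visible through the model.
print(model.weight.value)

// With a value type, the list handed to an optimizer holds copies,
// so mutating an element never reaches the original.
struct ValueVariable {
  var value: Float
}
let original = ValueVariable(value: 1.0)
var copies = [original]
copies[0].value = 0.0
print(original.value)
```

With value types, the optimizer would have to hand the updated parameters back and the model would have to store them again, which is a very different API from the PyTorch-style step().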

Despite my best intentions, it turns out most of the objects, including Graph, Model and StreamContext, are still implemented as reference types. It is possible for some of them (for example, the Model) to be value types. The lack of deinit in struct requires us to wrap a reference type inside a value type to create such an API. At the end of the day, I don’t see much value, either in API aesthetics or in performance, in making these value types.

Automatic Differentiation

While Swift has a proposal for automatic differentiation, automatic differentiation in s4nnc right now is implemented at the library level and is only applicable to models and tensor variables.

On the API side, it is popular to have a backward() method on the final loss variable. This method computes gradients against all variables associated with the final computation of the loss.

This also means we need to keep track of all variables in the computation graph until some point is reached where we can free them. In PyTorch, such a point is when the step() method is called.

libnnc early on made the decision to avoid holding vast amounts of memory by default. That resulted in the interface backward(to tensors: Sequence<Tensor>), where you have to specify which tensors you compute the gradients against. Because we are doing backward-mode AD, we still compute gradients on all variables up until these tensors. But we don’t compute gradients against variables past that point. In effect, we can rely on reference-counting to free memory associated with tensors beyond that point.

In return for this slightly more complex interface, you don’t have to worry about scoping to no_grad to avoid unbounded memory allocations.

Optimizers in the Training Loop

An optimizer in a deep learning library represents a particular gradient descent method associated with some parameters to update each round.

One particular challenge is how to use multiple optimizers with different parameters in one round. For simpler cases, you could call step() several times in one round, but it may be more efficient to call step() once.

Swift makes this particular choice easier by supporting extensions on built-in types.

public extension Collection where Element: Optimizer {
  func step() {
    ...
  }
}

This is a good starting point to support more concise updates such as: [adam1, adam2].step().

The same pattern can be applied if you want to support gradients with multiple losses: [loss1, loss2, loss3].backward(to: x).

These extension methods are conditional, and type-safe in Swift.
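A self-contained sketch of the pattern follows. The `Optimizer` protocol and `CountingOptimizer` class are stand-ins invented for this example, and the body simply forwards to each element; how the real extension batches the updates is not shown in the snippet above:

```swift
// Hypothetical protocol and optimizer for illustration; not the s4nnc definitions.
protocol Optimizer {
  func step()
}

final class CountingOptimizer: Optimizer {
  private(set) var steps = 0
  func step() { steps += 1 }
}

extension Collection where Element: Optimizer {
  // Naive version: forward the call to every optimizer in the collection.
  // A real binding could instead gather all parameters and update them in one pass.
  func step() {
    for optimizer in self {
      optimizer.step()
    }
  }
}

let adam1 = CountingOptimizer()
let adam2 = CountingOptimizer()
[adam1, adam2].step()

// ["a", "b"].step() does not compile: step() only exists on collections
// whose elements conform to Optimizer.
```

Note that the `Element: Optimizer` constraint covers arrays of one concrete optimizer type. An array mixing different optimizer types is a collection of existentials, which would need its own same-type extension.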

Operators

While Swift allows operator overloading and the ability to introduce custom operators, somehow the ambiguity of * (element-wise versus matrix multiplication) is not addressed. We cannot have consistency with Python because @ cannot be overloaded in the Swift language. I have to resort to .* and .+ for element-wise multiplications and additions.
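For reference, declaring such dot-prefixed operators takes only a few lines. The `Vector` type below is a hypothetical stand-in for a tensor, and the precedence groups are my assumption (matching the scalar `*` and `+`), not necessarily what s4nnc uses:

```swift
// Declare the custom element-wise operators, borrowing the precedence
// of their scalar counterparts from the standard library.
infix operator .*: MultiplicationPrecedence
infix operator .+: AdditionPrecedence

// Hypothetical stand-in for a tensor; not the s4nnc type.
struct Vector: Equatable {
  var elements: [Float]
}

func .* (lhs: Vector, rhs: Vector) -> Vector {
  // Shapes are only checked at runtime.
  precondition(lhs.elements.count == rhs.elements.count, "shape mismatch")
  return Vector(elements: zip(lhs.elements, rhs.elements).map { $0 * $1 })
}

func .+ (lhs: Vector, rhs: Vector) -> Vector {
  precondition(lhs.elements.count == rhs.elements.count, "shape mismatch")
  return Vector(elements: zip(lhs.elements, rhs.elements).map { $0 + $1 })
}

let a = Vector(elements: [1, 2, 3])
let b = Vector(elements: [4, 5, 6])
let c = Vector(elements: [1, 1, 1])
// Parsed as (a .* b) .+ c because of the declared precedence groups.
let result = a .* b .+ c
print(result.elements)
```

This leaves a plain * free for matrix multiplication, or unused, at the cost of diverging from Python's * and @.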

Type-safe Functions

While Swift can enforce some type consistency, without more language-level changes we cannot deduce shapes, and would still encounter runtime errors if shapes don’t match. Even with language-level support, we may still need an escape hatch because some tensors could be loaded from IO. Not to mention it would be nice to support dynamic input shapes to a model while it can still statically compute its parameter shapes.

Transparent Multi-GPU Data Parallel Training

One annoying thing in the raw C interface is the transition from one GPU to multiple GPUs, even with simple data-parallel models. Being unable to abstract tensors to a higher level in C makes the code unnecessarily complex.

Fortunately, with basic generics, this is not a problem in Swift. However, using such generics turns out to be more complicated than I expected on the library author side. If I want to avoid runtime type-checks (the is / as keywords), there is quite a bit of protocol / extension type dance I would need to do. Swift’s borderline hostility against protocol-associated types doesn’t help. Luckily, there are new proposals to lift some of the restrictions (and no, some is not all it needs). You can see this monstrosity here. In the end, I had to introduce some runtime type-checks to keep my sanity.

Closing Words

I am pretty happy with the end result of s4nnc. It is small, versatile and does exactly what I set out to do: an ergonomic win over the raw C interface. Generics, type-safety and reference-counting in the Swift language really made a difference in the API ergonomics. The language itself is not the fastest, but it strikes a good balance between ergonomics, expressivity and performance. In the next blog post, I am going to detail the Swift data science workflow I have, and why I moved to it from the initial Python one. Stay tuned!