Supercomputer Programming Without Learning C++: Stanford Introduces Regent, a Programming Language for Supercomputers

2019-07-22 14:44
机器之心 (Jiqizhixin) official account
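Supercomputers have important applications in many frontier fields, but the software needed to program them is so complex that it is a real headache for scientists who are not trained programmers. Recently, computer scientists at Stanford University developed a new programming language that aims to lower the barrier to programming supercomputers.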

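Only supercomputers are truly capable of tackling the grand challenges facing science, but the difficulty of programming these machines has held progress back.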

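The personal computer revolution changed everything, giving most of us computers that were more accessible, cheaper, smaller, faster, and easier to use. Scientists benefited greatly as well. They developed computational techniques to study the inner workings of cells, the orbits of distant stars, and other phenomena that had once been far beyond their ability to observe.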

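For researchers at the frontier, however, an irony emerged: new, sophisticated instruments began producing ever more data, so supercomputers were needed to analyze the experimental results. The software required to program this hardware is so complex that the scientists trying to analyze such enormous datasets often struggle to master it.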

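Enter Regent, a new programming language developed by a group led by Stanford computer scientist Alex Aiken that makes supercomputers easier to use. “We want to create a programming environment that researchers who are not computer scientists can use,” said Aiken, the Alcatel-Lucent Professor in Communications and Networking and a professor of particle physics and astrophysics and of photon science.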

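Regent helps address one of the biggest challenges in supercomputing: today’s supercomputers are far more complex than ever before, and existing programming languages have struggled to keep up. People may picture a supercomputer as one giant machine, but it is in fact an array of thousands of microprocessors. Scientists typically program these arrays of processors in C++, a language that dates back 40 years, to a time when the dominant microprocessor was the central processing unit, or CPU. CPUs work through large problems quickly in what programmers call a serial fashion, performing one calculation after another.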

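More recently, however, a second kind of microprocessor has become an important part of supercomputing: the graphics processing unit, or GPU. GPUs were initially used mainly to control the millions of pixels on a computer screen and improve the visuals of video games, and they can perform many similar calculations simultaneously, or in parallel, as programmers say. Parallel processing has proved very useful in applications such as machine learning. C++ has been upgraded to keep up with these hardware changes. Unfortunately, the accumulated patches have made the language increasingly difficult to use. Regent simplifies some of the problems supercomputer programmers face, such as assigning serial processing tasks to CPUs and parallel processing tasks to GPUs.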

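Once a program has been structured in Regent at the conceptual level, the programmer’s intent is translated (or, in technical terms, compiled) into a second software layer called Legion, which Aiken also developed. Legion generates machine code, the precise instructions that tell the supercomputer hardware how to execute the program. The tight integration between Regent and Legion makes it easier for programmers to make other important decisions, especially where to store the data that the supercomputer must analyze.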

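The integration of the two layers saves programmers money and time, said Elliott Slaughter, a scientist at SLAC National Accelerator Laboratory who has worked on Regent and Legion almost since their inception. Computers consume energy, and that comes at a cost. But the energy cost of moving data can be 100 times that of performing a computation on the same data. In addition, large experiments often rely on instruments that collect enormous amounts of data. Slaughter said some instruments can collect the equivalent of 20 video DVDs of data per second during experiments that last 15 minutes. Even traveling at the speed of light over optical fiber, getting that much data from the instrument to the supercomputer can introduce lag that could disrupt the analysis. “Where you put the data is one of the most important decisions a programmer makes,” Slaughter said. Regent and Legion give programmers unprecedented control over where data is stored while it waits to be processed, which also saves money and time.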

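“You can program the computation first and decide where to place the data afterward. It is very easy, and you do not need to rewrite your code,” he said.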

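Will Regent be widely adopted? The researchers say the new language has a lot of inertia to overcome. “Regent is a very different way of programming,” Aiken said. “It will take researchers a while to get used to the way of thinking it requires.”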

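But two factors work in its favor. First, supercomputing hardware keeps improving. The U.S. Department of Energy is pushing forward its Exascale Computing Project, which aims to increase supercomputing power 50-fold by around 2021. The department is supporting software projects, including Regent, to help programming keep pace.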

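In addition, many scientists who want to use supercomputers are unfamiliar with the current tools and unaware of how steep the learning curve is for programming large experiments. Even experienced supercomputer programmers may find the current systems cumbersome and wonder whether there is a better way. “We often talk with scientists who realize that Regent makes their lives easier,” Aiken said.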

Regent: A Language for Implicit Dataflow Parallelism

Tuesday, December 22, 2015

Regent is a research programming language that extracts implicit dataflow parallelism from code written with imperative and (apparently) sequential semantics.

How does this work? Let’s jump right into an example:

task main()
  var a, b, c = ...
  t1(a, b)
  t2(b, c)
end

Here, t1 and t2 are tasks—think of them as functions in the usual imperative sense of the word. Tasks can even mutate their arguments, just like normal imperative code. Conceptually, Regent code runs sequentially, so you can run through the logic in your head by just reading the code top-to-bottom. Under the covers, the Regent implementation will take advantage of any parallelism it can find in the code. But any parallel execution is guaranteed to produce results identical to the original sequential semantics.

Internally, Regent discovers parallelism by computing a dataflow graph¹ for the program. If the arguments a, b and c are all distinct objects, for example, and if we suppose that t1 and t2 don’t modify b, then we’d get this dataflow graph:

Since t1 and t2 are independent in this graph (there is no path in the graph from one to the other), the two are able to run in parallel. If instead t1 were to modify its argument b, then we’d get this graph:

And the two tasks would be forced to run sequentially.
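
The dependence check described above can be sketched in a few lines of Python. This is only an illustrative model, not Regent's actual implementation: it assumes each task is summarized by the sets of objects it reads and writes, and the `Task` class and the `depends` and `dataflow_edges` helpers are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical summary of a task: which objects it reads and writes."""
    name: str
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()


def depends(earlier, later):
    """True if `later` must wait for `earlier` (some object is written by one
    task and read or written by the other)."""
    return bool(
        earlier.writes & (later.reads | later.writes)
        or earlier.reads & later.writes
    )


def dataflow_edges(tasks):
    """Edges (earlier, later) between tasks listed in program order."""
    return [
        (a.name, b.name)
        for i, a in enumerate(tasks)
        for b in tasks[i + 1:]
        if depends(a, b)
    ]


# Case 1: both tasks only read b. (That t1 writes a and t2 writes c is an
# assumption made for this example.)
t1 = Task("t1", reads=frozenset({"a", "b"}), writes=frozenset({"a"}))
t2 = Task("t2", reads=frozenset({"b", "c"}), writes=frozenset({"c"}))
edges_read_only = dataflow_edges([t1, t2])

# Case 2: t1 also modifies b.
t1_mod = Task("t1", reads=frozenset({"a", "b"}), writes=frozenset({"a", "b"}))
edges_modified = dataflow_edges([t1_mod, t2])
```

With read-only access to `b` the edge list is empty, so the two tasks may run in parallel. Once `t1` writes `b`, an edge from `t1` to `t2` appears and the tasks must run in program order.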

The idea of parallelizing imperative programs via a transformation to dataflow is actually an old one. For example, Jade, from the early 1990s, used this technique to great effect on then-current hardware, and the technique itself was known before that. The biggest differences between Regent and previous systems are in the type system and the (multiple) ways in which Regent is able to discover parallelism.

The Three Dimensions of Parallelism

Regent has multiple ways of discovering parallelism in a program. I like to think of each of these as a dimension along which a program can be parallel. Tasks only need to be independent along a single dimension in order to run in parallel; the extra dimensions provide flexibility, so that a variety of parallel patterns can be described in the language.

1. Privileges

As we saw in the example above, if two tasks read but do not modify an argument, they can run in parallel. This leads us to an important aspect of the Regent programming model: privileges. Privileges describe the modes of usage for a given argument (read, write, etc.). In Regent, privileges are part of the type signature of a task:

task t1(a : region(T), b : region(T))
where reads(a, b), writes(a) do
  ...
end

(Regions are just abstract containers for data, and are described in detail below.)

Privileges are checked by the type system, so if a task tries to access data it didn’t declare privileges for, the compiler will catch the mistake:

task t3(c : region(T)) where writes(c) do ... end

task t1(a : region(T), b : region(T))
where reads(a, b), writes(a) do
  t3(b) -- ERROR: Missing privilege writes(b)
  for x in b do
    @x = ... -- ERROR: Missing privilege writes(b)
  end
end
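
The two errors above both come down to a set-containment check: whatever a task does must be covered by the privileges it declared. Below is a rough Python model of that check. Encoding privileges as `(mode, region)` pairs and the helper name `missing_privileges` are assumptions made for illustration; the real check is performed by Regent's type system.

```python
def missing_privileges(held, required):
    """Return, in sorted order, the privileges in `required` that `held` lacks."""
    return sorted(required - held)


# Privileges modeled as (mode, region) pairs; this encoding is an assumption.
# Mirrors: task t1(a, b) where reads(a, b), writes(a)
t1_held = {("reads", "a"), ("reads", "b"), ("writes", "a")}

# Calling t3(b) needs writes(b), because t3 declares writes on its parameter.
call_t3_b = missing_privileges(t1_held, {("writes", "b")})

# Merely reading from b is covered by reads(b).
read_b = missing_privileges(t1_held, {("reads", "b")})
```

An empty result means the access is allowed. A non-empty result lists the privileges a compiler diagnostic would need to report.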

Obviously, tasks which only read their inputs can trivially run in parallel. However, it is also useful to be able to describe reductions—commutative operators (such as +, *, etc.) which can be applied in a lazy manner to a value. Reductions also allow tasks to run in parallel, as long as all tasks use the same reduction operator.

task t4(d : region(T))
where reduces+(d) do
  for x in d do
    @x += ... -- Values are saved and applied later.
  end
end

task main()
  var d = ...
  t4(d) -- Both can run in parallel.
  t4(d)
  -- Temporary values are flushed before the next read.
end
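
To see why tasks using the same reduction operator can run in any order, here is a small Python sketch under simplifying assumptions: each task's contributions are buffered in a private list, and the buffers are folded into the region only when it is next read. The function name `run_reduction_tasks` and the sample data are hypothetical.

```python
from itertools import permutations
import operator


def run_reduction_tasks(initial, buffers, op=operator.add):
    """Fold each task's buffered contributions into the values, in the given order."""
    values = list(initial)
    for buffer in buffers:
        values = [op(v, delta) for v, delta in zip(values, buffer)]
    return values


d = [10, 20, 30]
# One buffer per task; each entry is that task's contribution to one element.
task_buffers = [[1, 1, 1], [2, 0, 5], [0, 3, 0]]

# Apply the buffers in every possible order and collect the distinct outcomes.
results = {
    tuple(run_reduction_tasks(d, order))
    for order in permutations(task_buffers)
}
```

Because integer addition is commutative and associative, every ordering of the three buffers produces the same final values, so the set `results` contains a single tuple.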

2. Fields

Often, tasks access the same objects, but use different fields in said objects. Field-sensitive privileges are an easy way to achieve task parallelism in codes with this sort of access pattern. Fields are easy to specify (just include the fields accessed in the privilege declaration), and frequently lead to unexpected gains in parallelism. This sort of parallelism can be especially tedious to express in existing programming models.

task t5(a : region(T))
where reads(a.{x, y, z}), writes(a.z) do
  ...
end

task t6(a : region(T))
where reads(a.{x, y, w}), writes(a.w) do
  ...
end

task main()
  var a = ...
  t5(a) -- Both can run in parallel.
  t6(a)
end
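
The gain from tracking fields can be illustrated by comparing a field-level conflict check with a region-level one. This Python sketch is a simplified model, not how Regent represents privileges internally; `interfere` and `coarsen` are hypothetical helpers, and the field sets are copied from the `t5` and `t6` declarations above.

```python
def interfere(p, q):
    """True if one task writes something the other reads or writes."""
    return bool(
        p["writes"] & (q["reads"] | q["writes"])
        or q["writes"] & (p["reads"] | p["writes"])
    )


def coarsen(privs, region="a"):
    """Forget field names: any field access counts as access to the whole region."""
    return {mode: ({region} if fields else set()) for mode, fields in privs.items()}


t5 = {"reads": {"x", "y", "z"}, "writes": {"z"}}
t6 = {"reads": {"x", "y", "w"}, "writes": {"w"}}

field_level_conflict = interfere(t5, t6)
region_level_conflict = interfere(coarsen(t5), coarsen(t6))
```

At field granularity the two tasks do not conflict, since `t5` writes only `z` and `t6` writes only `w`. If the analysis looked only at the region `a`, the same pair would appear to conflict and would be serialized.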

An added bonus of declaring field-sensitive privileges is that the compiler is able to catch operations to the wrong fields:

task t5(a : region(T))
where reads(a.{x, y, z}), writes(a.z) do
  for t in a do
    t.x = ... -- ERROR: Missing privilege writes(a.x)
  end
end

3. Regions

Privileges are easy enough to track at a type system level when everything is a local variable. But many interesting problems require the use of pointer data structures. Pointers naturally introduce the possibility of aliasing (two pointers with the same pointee)—and aliasing is arguably the bane of all static analysis.

In order to manage aliasing in a sane way, Regent uses a type system based on logical regions. Regions are similar to arrays, but allow for better static type checking. For example, pointers are explicitly typed to a specific region, and can only be dereferenced if the task has privileges on the containing region.

task t6()
  var r = region(ispace(ptr, 5), T) -- Create enough space for 5 Ts.
  var x = new(ptr(T, r)) -- Allocate an element in r.
  @x = ... -- Access is OK. Pointer x is guaranteed to be in r.
end

Regions also allow us to achieve data parallelism via partitioning. A partition subdivides a region into some number of subregions. The subregions resulting from a partition are not separate regions; they’re just views onto the parent region, so changes made to a subregion are automatically visible in the parent.

task t7()
  var r = region(ispace(ptr, 5), T)

  -- Allocate some elements in r.
  new(ptr(T, r))
  new(ptr(T, r))
  new(ptr(T, r))

  -- Assign colors to the elements of r.
  var c = 0
  for x in r do
    x.color = c
    c += 1
  end
  var colors = ispace(ptr, c, 0) -- Remember the set of colors.

  -- Partition r based on the field color.
  var p = partition(r.color, colors)

  -- Launch a task over every subregion. Because the partition is
  -- disjoint, all tasks will run in parallel.
  for i in colors do
    t5(p[i])
  end
end
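
The view semantics of partitions can be mimicked in plain Python. The sketch below is only an analogy under stated assumptions: a region is modeled as a list of dicts, the `Subregion` class and the `partition_by_field` and `is_disjoint` helpers are hypothetical, and the loop at the end stands in for launching one task per subregion.

```python
from collections import defaultdict


class Subregion:
    """A view onto selected elements of a parent region (no data is copied)."""

    def __init__(self, parent, indices):
        self.parent = parent
        self.indices = indices

    def __iter__(self):
        return (self.parent[i] for i in self.indices)


def partition_by_field(region, field):
    """Group element indices by the value of `field`, one subregion per value."""
    groups = defaultdict(list)
    for i, elem in enumerate(region):
        groups[elem[field]].append(i)
    return {color: Subregion(region, idx) for color, idx in groups.items()}


def is_disjoint(partition):
    """True if no element index appears in more than one subregion."""
    seen = set()
    for sub in partition.values():
        if seen & set(sub.indices):
            return False
        seen |= set(sub.indices)
    return True


# Three elements with colors 0, 1, 2, as in t7 above; field z is illustrative.
region = [{"color": c, "z": 0} for c in range(3)]
p = partition_by_field(region, "color")

for color, sub in p.items():  # stand-in for one task launch per subregion
    for elem in sub:
        elem["z"] = color * 10
```

Writes made through each subregion show up in `region` itself, and `is_disjoint(p)` confirms that no element belongs to two subregions, which is the property that lets the per-subregion tasks run in parallel.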

Partitions in Regent can be disjoint, as above, or aliased. Using disjoint partitions guarantees that tasks on the subregions can run in parallel, but even aliased partitions are useful. For example, aliased partitions can be used with either read-only or reduce privileges to move information around the system (for example in halo exchanges).
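
As a rough illustration of the halo pattern, the Python sketch below pairs a disjoint partition (the blocks each task writes) with an aliased one (the slightly larger blocks each task reads). It assumes a 1D array whose length is divisible by the number of parts, a halo width of one element, and a simple three-point sum with clamped boundaries; all function names are hypothetical.

```python
def owned_blocks(n, parts):
    """Disjoint partition: equal-sized index blocks (assumes parts divides n)."""
    size = n // parts
    return [range(i * size, (i + 1) * size) for i in range(parts)]


def halo_blocks(n, parts, width=1):
    """Aliased partition: each owned block extended by `width` neighbors."""
    return [
        range(max(b.start - width, 0), min(b.stop + width, n))
        for b in owned_blocks(n, parts)
    ]


def stencil_step(values, parts):
    """Three-point sum; each block writes its owned cells and reads its halo."""
    n = len(values)
    result = list(values)
    for own, halo in zip(owned_blocks(n, parts), halo_blocks(n, parts)):
        for i in own:  # writes go only to the owned (disjoint) block
            lo = max(i - 1, halo.start)  # reads stay inside the halo block
            hi = min(i + 1, halo.stop - 1)
            result[i] = values[lo] + values[i] + values[hi]
    return result


values = list(range(1, 9))
halos = halo_blocks(len(values), 4)
overlap = set(halos[0]) & set(halos[1])  # neighboring halo blocks share cells
stepped = stencil_step(values, 4)
```

Neighboring halo blocks overlap, so the read partition is aliased, yet every block writes only to its own disjoint cells. That combination of read-only access to aliased data and exclusive writes to disjoint data is what keeps the blocks independent.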

Regions and partitions are both first-class objects. They can be passed to tasks as arguments and stored in the heap. Note, however, that while the values are first-class, privileges are not. Ways of dealing with this limitation are beyond the scope of this post (see this paper for details).

Try Regent Online

Interested in learning more? Try Regent in your browser and run the examples yourself. Or, if you want to run Regent locally, see the installation page for details.

For More Information

You might also enjoy reading the source, language reference, or paper. There is also an older paper which describes a fragment of the type system.

¹ I should note that this graph is technically not a proper dataflow graph in the traditional sense. Since traditional dataflow languages use functional semantics, nodes are pure functions of their arguments, and it makes sense to only model data as values flowing along the edges in the graph. Because Regent is imperative, memory locations may hold multiple values over time, and thus it makes sense to explicitly model how these values change as the computation proceeds. This representation also turns out to be useful in performing compile-time optimizations.