Cuneiform is a functional programming language for large-scale data analysis workflows. It is open because it easily integrates foreign tools and libraries, e.g., Python libraries or command line tools. It is general because it has the expressive power of a functional programming language while automatically parallelizing and distributing program execution. Cuneiform uses distributed Erlang to run scalably in cluster and cloud environments.
Automatic Parallelism and Distribution
Cuneiform is built on distributed Erlang, which allows setting up large clusters of Cuneiform workers on top of a POSIX-conforming distributed file system with minimal effort. Since Cuneiform is a functional language, it can evaluate sub-expressions independently in a distributed setting, assuming foreign functions are deterministic. This way, potential parallelism can be inferred without additional annotation from the user.
Worker allocation over time for a variant calling workflow made up of 12,300 foreign function applications. The workflow analyzes 8 GiB of compressed DNA samples, which it matches against a 3 GiB human reference genome. It runs for just under an hour on an 8-node cluster amounting to 576 cores backed by a 4-node Gluster distributed file system. Independent foreign function applications run in parallel.
Data dependency graph for the same variant calling workflow. A yellow line stands for an input or output file, a blue line stands for a foreign function application. A black arrow stands for a data dependency between a pair of files or foreign function applications.
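The idea that parallelism follows from data dependencies alone can be sketched in a few lines of Python. This is only an analogy, not Cuneiform code: the functions `align`, `call_variants`, and `merge` are hypothetical stand-ins for deterministic foreign functions, and a local thread pool stands in for a cluster of workers.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def align(sample: str, reference: str) -> str:
    """Hypothetical deterministic foreign function: align one sample."""
    return f"aligned({sample},{reference})"


def call_variants(alignment: str) -> str:
    """Hypothetical deterministic foreign function: call variants."""
    return f"variants({alignment})"


def merge(results: List[str]) -> str:
    """Hypothetical foreign function that depends on all per-sample results."""
    return "+".join(results)


def analyze(sample: str, reference: str) -> str:
    # Within one sample, call_variants depends on align, so the two run in sequence.
    return call_variants(align(sample, reference))


def run_workflow(samples: List[str], reference: str, max_workers: int = 4) -> str:
    # Different samples share no data dependency, so they may run concurrently.
    # Because the functions are deterministic, the result does not depend on
    # scheduling order; pool.map also returns results in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_sample = list(pool.map(lambda s: analyze(s, reference), samples))
    # merge needs every per-sample result, so it runs only after all are done.
    return merge(per_sample)
```

Calling `run_workflow(["s1", "s2"], "ref")` gives the same result as evaluating the samples one after another, which is the property that makes annotation-free parallelization safe.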
Foreign Function Integration
The flexible foreign function interface lets you integrate tools and libraries from many different sources. This allows you to drive a heterogeneous software collection through a uniform interface. Currently, the following foreign languages are supported:
Cuneiform is a functional programming language, i.e., functions are values. Furthermore, Cuneiform encourages a declarative programming style by allowing only immutable variables and data. With general recursion, unbounded iteration is available. Cuneiform provides lists and records as compound data types. Lists are accessed only via mapping and folding (excluding the unsafe head and tail accessors). Records can be accessed either through projection or through pattern matching.
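The style described above can be approximated in Python as a rough illustration. This is not Cuneiform syntax: the record type `Sample` and the helpers `names` and `total_size` are hypothetical names chosen for the sketch. It shows an immutable record accessed by projection, and a list processed only through a map and a fold.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Tuple


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after construction
class Sample:
    name: str
    size_mib: int


def names(samples: Tuple[Sample, ...]) -> Tuple[str, ...]:
    # Map: project the `name` field out of every record.
    return tuple(map(lambda s: s.name, samples))


def total_size(samples: Tuple[Sample, ...]) -> int:
    # Fold: combine all records into one value, starting from 0.
    return reduce(lambda acc, s: acc + s.size_mib, samples, 0)
```

Neither helper indexes into the collection or takes its head or tail, so both remain well defined on an empty input.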
Static Type Checking
A static type system excludes the possibility of runtime errors at the Cuneiform-native level. In the absence of recursion, termination is also guaranteed. While software can still fail (or diverge) at the foreign language level, Cuneiform helps to subdivide a large program into a set of independent foreign code islands, which are easier to maintain.
Cuneiform is designed from both a programming-language and a distributed-systems perspective. Its language semantics are defined in terms of reduction semantics, while communication, distributed coordination, and fault tolerance are specified using Petri nets.
- Dec 14, 2018 Parallelism in Racket at Racketfest
- Dec 6, 2018 Redex Language Models for Exploring a Distributed Programming Language at After Work Racket 2
- Nov 19, 2018 Cuneiform 3.0.4 released
- Jul 24, 2018 Modeling Erlang Processes as Petri Nets accepted at Erlang Workshop 2018
Blog Posts
- Cuneiform: A Functional Language for Large Scale Scientific Data Analysis
- Computation semantics of the Functional Scientific Workflow Language Cuneiform