Distributed Functional Programming and Foreign Language Integration
Cuneiform combines the strong points of functional programming languages, distributed databases, and workflow management systems. It enables you to run your data-intensive analysis workloads with minimal effort and no lock-in. Here is why:
In contrast to general-purpose programming languages, Cuneiform is designed to integrate features instead of providing them firsthand. As a consequence, it gets by with only a small syntax for defining and calling functions.
Computation, I/O, and communication are performed only by integrating foreign code. This makes Cuneiform predictable at the organizational level while giving you the opportunity to stick with your battle-tested methods at the computational level.
Developing operators for an application domain is often hard enough. Having to micro-manage potential parallelism on top of that is distracting and unnecessary. Let Cuneiform do the parallelization for you. Independent operations are executed in parallel without additional syntactic clutter. Just define your tasks and call them. The rest is figured out automatically. Here is an example:
Since all three files can be unzipped independently, Cuneiform will try to run these three operations in parallel.
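The idea of running independent operations side by side can also be sketched outside Cuneiform. The following is a rough Python analogy, not Cuneiform syntax: the function names `gunzip`, `gunzip_all`, and `demo`, the file names, and the choice of gzip archives are all assumptions made for illustration. Unlike in Cuneiform, the parallelism has to be requested explicitly here, through a thread pool.

```python
import gzip
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def gunzip(archive: Path) -> Path:
    """Decompress one .gz archive next to itself and return the output path."""
    target = archive.with_suffix("")  # "a.txt.gz" -> "a.txt"
    target.write_bytes(gzip.decompress(archive.read_bytes()))
    return target


def gunzip_all(archives: list[Path]) -> list[Path]:
    """Decompress every archive; the calls share no state, so they may overlap."""
    with ThreadPoolExecutor() as pool:
        # map() keeps the results in the same order as the inputs.
        return list(pool.map(gunzip, archives))


def demo() -> list[str]:
    """Create three hypothetical archives, unpack them, and return their contents."""
    with tempfile.TemporaryDirectory() as tmp:
        archives = []
        for name in ("a", "b", "c"):
            path = Path(tmp) / f"{name}.txt.gz"
            path.write_bytes(gzip.compress(f"contents of {name}".encode()))
            archives.append(path)
        return [p.read_text() for p in gunzip_all(archives)]


if __name__ == "__main__":
    print(demo())
```

The three `gunzip` calls have no data dependencies on one another, which is the property that allows them to run concurrently.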
Cuneiform is more than yet another distributed make clone or database query language. Cuneiform gives you many important features from functional programming, e.g., general recursion. While your application scenario might not need this feature right away, it may prove worthwhile once your project has matured. For example, say your workflow includes a costly operator that iteratively improves on an initial state and terminates when a convergence criterion is met. Keeping the operation within one task is fine until data sets grow to a point where the operation becomes a major bottleneck for the whole workflow. With Cuneiform you can then unlock the parallelization potential of the operator by moving the iteration to the workflow level. Once the iteration is expressed at that level, Cuneiform can parallelize the individual steps that make up the operator. The following example shows how the k-means clustering algorithm can be expressed in Cuneiform:
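To illustrate what moving the iteration to the workflow level means, here is a minimal Python sketch (again an analogy, not Cuneiform code). It assumes two-dimensional points that are already split into chunks; the names `partial_sums`, `step`, and `kmeans`, and the parameters `tol` and `max_iter`, are hypothetical. Each iteration fans out one independent task per chunk, merges the partial results, and then recurses until the centroids stop moving.

```python
from concurrent.futures import ThreadPoolExecutor

Point = tuple[float, float]


def nearest(p: Point, centroids: list[Point]) -> int:
    """Index of the centroid closest to p (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: (p[0] - centroids[i][0]) ** 2 + (p[1] - centroids[i][1]) ** 2,
    )


def partial_sums(chunk: list[Point], centroids: list[Point]) -> list[tuple[float, float, int]]:
    """One independent task: per-centroid coordinate sums and counts for a chunk."""
    sums = [(0.0, 0.0, 0) for _ in centroids]
    for p in chunk:
        i = nearest(p, centroids)
        sx, sy, n = sums[i]
        sums[i] = (sx + p[0], sy + p[1], n + 1)
    return sums


def step(chunks: list[list[Point]], centroids: list[Point]) -> list[Point]:
    """One iteration: fan out over the chunks, then merge into new centroids."""
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(lambda c: partial_sums(c, centroids), chunks))
    new = []
    for i, old in enumerate(centroids):
        sx = sum(part[i][0] for part in parts)
        sy = sum(part[i][1] for part in parts)
        n = sum(part[i][2] for part in parts)
        new.append((sx / n, sy / n) if n else old)  # keep an empty cluster's centroid
    return new


def kmeans(chunks, centroids, tol=1e-9, max_iter=100):
    """Recurse until the centroids stop moving or max_iter is used up."""
    new = step(chunks, centroids)
    moved = max(abs(a[0] - b[0]) + abs(a[1] - b[1]) for a, b in zip(new, centroids))
    if moved <= tol or max_iter <= 1:
        return new
    return kmeans(chunks, new, tol, max_iter - 1)
```

Because the convergence loop lives outside the per-chunk task, the expensive assignment step is no longer locked inside a single sequential operator.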
Foreign Language Integration
Adopting Cuneiform lets you reuse the software you already have with minimal effort. Say you have a custom visualization library written in R that converts a table to an image, or an ETL operator written in Java that has a command line interface. There is no need to alter these operators in any way to use them with Cuneiform. All that is needed is a light-weight wrapper in the form of a task definition.
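The wrapper idea can be sketched in Python as a function whose body does nothing but call an unmodified external tool. This is an analogy under stated assumptions, not a Cuneiform task definition: `UPPERCASE_TOOL` is a hypothetical stand-in for an existing command line operator, and `run_operator` and `to_upper` are illustrative names.

```python
import subprocess
import sys

# Hypothetical stand-in for an existing command line operator: it upper-cases
# whatever it reads on standard input. A real setup would name a Java or R tool.
UPPERCASE_TOOL = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]


def run_operator(command: list[str], text: str) -> str:
    """Call an unmodified external tool and return what it writes to stdout."""
    result = subprocess.run(
        command, input=text, capture_output=True, text=True, check=True
    )
    return result.stdout


def to_upper(text: str) -> str:
    """The thin wrapper: a typed function whose body is just the foreign call."""
    return run_operator(UPPERCASE_TOOL, text)


if __name__ == "__main__":
    print(to_upper("etl"))
```

The external tool stays untouched; only the few lines that name its inputs and outputs are new.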
This is also why Cuneiform comes with no batteries included. There is no standard library, and there are no operators for arithmetic or Boolean logic. There do not need to be. Any functionality you would expect from a general-purpose programming language is still provided by your favorite general-purpose programming language. This holds for basic operations like arithmetic as well as for sophisticated libraries and tools. The following foreign languages are currently supported:
Adaptable Software Stack
If you are running Linux or Mac OS, chances are that you do not have to alter your software stack at all. All Cuneiform needs is Erlang, which can be installed via the default package manager on most Linux distributions. You can test small workflows on your laptop and run medium-sized workflows on a workstation with this basic setup. When your use case exceeds a single workstation, you can scale out with a setup using distributed Erlang and HDFS. If your use case exceeds a hundred workstations, you can scale out further with a setup using Erlang, Hadoop, and HDFS.
Open Source Dev Community
Cuneiform is an open source project. It is maintained by a small number of dedicated developers whose goal is to maximize both the opportunities for craftsmanship and the quality of the results. In this sense, we prioritize improving our understanding of scalable workflow languages and Cuneiform's design over other goals, such as a growing feature list. This approach ensures we can keep up our long-term commitment to supporting Cuneiform.
News
- Apr 15, 2017 gen_pnet 0.1.1 released
- Dec 22, 2016 Hi-WAY: Execution of Scientific Workflows on Hadoop YARN accepted at EDBT 2017
- Nov 24, 2016 Scalable Multi-Language Data Analysis on Beam presented at Erlang Factory Lite 2016 in Berlin
- Oct 28, 2016 Cuneiform poster presented at BSR Winterschool 2016 in Ede
- Oct 17, 2016 Release: Cuneiform 2.2.1
- Sep 8, 2016 Scalable Multi-Language Data Analysis on Beam presented at Erlang User Conference 2016 in Stockholm
- Jul 19, 2016 Functional Programming and Foreign Language Interfaces presented at Curry On 2016 in Rome
- Jun 9, 2016 Talk Scalable Multi-Language Data Analysis on Beam: The Cuneiform Experience accepted at Erlang User Conference 2016 in Stockholm