The Big Data Cluster "LBD"

The Big Data Cluster of the TU Wien is used for teaching and research and runs an (extended) Hadoop software stack. It is designed to exploit easily available parallelism by automatically parallelizing programs written in Python, Java, Scala, and R.

Typical programs make use of either MapReduce or Spark for parallelization.

Available software

Access

Hardware

LBD consists of

The login nodes are reachable from TUnet, the internal network of the TU Wien, via https://lbd.tuwien.ac.at or ssh://login.tuwien.ac.at.
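A typical SSH login from within TUnet might look as follows (a sketch; the username placeholder stands for your own TU Wien account name):

# Log in to an LBD login node from inside TUnet.
# <username> is a placeholder for your TU Wien account name.
ssh <username>@login.tuwien.ac.at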

The hardware – which is then virtualized using OpenStack – consists of

HDFS configuration

Jupyter Notebook

Most users access the LBD cluster via Jupyter notebooks.

Example code

A short example using Spark and Python, which estimates Pi with a Monte Carlo simulation:

import pyspark
import random

sc = pyspark.SparkContext(appName="Pi")
num_samples = 10000

def inside(p):
    # Draw a point uniformly in the unit square and test whether
    # it falls inside the quarter circle of radius 1.
    x, y = random.random(), random.random()
    return x*x + y*y < 1

# The filter is evaluated in parallel across the cluster.
count = sc.parallelize(range(0, num_samples)).filter(inside).count()

# The quarter circle covers Pi/4 of the unit square,
# so the fraction of hits times 4 approximates Pi.
pi = 4 * count / num_samples
print(pi)
sc.stop()
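For comparison, the same estimate can be computed sequentially in plain Python, without Spark. This sketch is useful for checking results locally before submitting a job to the cluster (the fixed seed is only there to make the run reproducible):

```python
import random

random.seed(42)  # fixed seed so the run is reproducible
num_samples = 10000

def inside(_):
    # Draw a point uniformly in the unit square and test whether
    # it falls inside the quarter circle of radius 1.
    x, y = random.random(), random.random()
    return x * x + y * y < 1

# Sequential equivalent of parallelize(...).filter(inside).count()
count = sum(1 for i in range(num_samples) if inside(i))

pi_estimate = 4 * count / num_samples
print(pi_estimate)
```

With 10,000 samples the estimate typically lands within a few hundredths of Pi; Spark's version distributes exactly the same per-sample work across the cluster's executors.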