A short presentation I gave on why Apache Spark is such an impressive analytics platform, particularly for R and Python users. I also discuss how academia can benefit from Amazon AWS implementation.
It facilitates two types of operations: transformation and action.
A transformation is an operation such as filter(), map(), or union() on an RDD that yields another RDD.
An action is an operation such as count(), first(), take(n), or collect() that triggers a computation, returns a value back to the Master, or writes to a stable storage system.
More later on transformations and actions
Note: no type definition
http://www.scala-lang.org/
A lot of transformations are lazy: NOT run until an action is requested