3. Resources
• Search for “Apache SystemML”
• Documentation - https://apache.github.io/incubator-systemml/
• DML Language Reference - https://apache.github.io/incubator-systemml/dml-language-reference.html
• MLContext - https://apache.github.io/incubator-systemml/spark-mlcontext-programming-guide.html#spark-shell-scala-example
• GitHub - https://github.com/apache/incubator-systemml
Note
• Some of the documentation is outdated
• If you find a typo or want to update a page, consider opening a pull request
• All docs are in Markdown format
• https://github.com/apache/incubator-systemml/tree/master/docs
4. About DML Briefly
• DML = Declarative Machine Learning
• R-like syntax, with some subtle differences from R
• Dynamically typed
• Data Structures
• Scalars – Boolean, Integers, Strings, Double Precision
• Cacheable – Matrices, DataFrames
• Data Structure Terminology in DML
• Value Type – Boolean, Integers, Strings, Double Precision
• Data Type – Scalar, Matrices, DataFrames*
• Types are written as DataType[ValueType]; not all combinations are supported
• For instance – matrix[double]
• Scoping
• A single global scope, except inside functions, which have their own scope
* Coming soon
5. About DML Briefly
• Control Flow
• Sequential imperative control flow (like most other languages)
• Looping –
• while (<condition>) { … }
• for (var in <for_predicate>) { … }
• parfor (var in <for_predicate>) { … } // Iterations in parallel
• Guards –
• if (<condition>) { ... } [ else if (<condition>) { ... } ... else { … } ]
• Functions
• Built-in – full list in the DML Language Reference
• User-defined – may return multiple values
• functionName = function (<formal_parameters>…) return (<formal_parameters>) { ... }
• The function body can access only the variables passed in as formal parameters
• External – declared like a user-defined function, but implemented by an external Java class
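A parfor loop is only correct when its iterations are independent of each other. As a rough analogy in plain Scala, and not a description of how SystemML actually schedules parfor, the sketch below launches one task per row with `scala.concurrent.Future`; each task reads only its own row and produces only its own result slot, which is what makes running the iterations in parallel safe. The object name `ParforSketch` is hypothetical.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object ParforSketch {
  // Analogy for the DML loop: parfor (i in 1:nrow(X)) { out[i,1] = sum(X[i,]) }
  // Iteration i depends only on row i and writes only result i, so the
  // iterations are independent and can run concurrently.
  def rowSums(x: Vector[Vector[Double]]): Vector[Double] = {
    val tasks: Seq[Future[Double]] = x.map(row => Future(row.sum))
    // Future.sequence keeps the results in the original (row) order
    Await.result(Future.sequence(tasks), 10.seconds).toVector
  }
}
```

If iteration i instead read a value written by iteration i - 1, running the tasks concurrently would give unpredictable results; such loops belong in a plain for loop.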
6. About DML Briefly
• Imports
• Can import user defined/external functions from other source files
• Disambiguation using namespaces
• Command Line Arguments
• By position - $1, $2 …
• By name - $X, $Y ...
• Limitations
• A user-defined function can only be called as the sole expression on the right-hand side of an assignment
• So you cannot write
• X <- Y + bar()
• for (i in foo(1,2,3)) { … }
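On the command line, SystemML takes positional values after `-args` and `name=value` pairs after `-nvargs`. The sketch below is a hypothetical parser (not SystemML's own code) that only illustrates the mapping: the first value after `-args` is what a script sees as `$1`, and `X=…` after `-nvargs` is what it sees as `$X`. The object name `ScriptArgs` is made up for illustration.

```scala
object ScriptArgs {
  // Hypothetical sketch: maps the values that follow -args (positional) or
  // -nvargs (named) to the keys a DML script refers to as $1, $2, ... or $X, $Y, ...
  def parse(cli: List[String]): Map[String, String] = cli match {
    case "-args" :: values =>
      // positional: first value -> "1", second -> "2", ...
      values.zipWithIndex.map { case (v, i) => (i + 1).toString -> v }.toMap
    case "-nvargs" :: pairs =>
      // named: split each entry on the first '=' only
      pairs.map { p =>
        p.split("=", 2) match {
          case Array(name, value) => name -> value
          case _ => throw new IllegalArgumentException(s"expected name=value, got: $p")
        }
      }.toMap
    case _ => Map.empty
  }
}
```

Named arguments make scripts such as the MLContext example later in these slides easier to read, since `$X` and `$rank` document themselves in a way `$1` and `$2` do not.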
7. Sample Code
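max_iteration = 10                                # number of NMF iterations
A = 1.0                                           # A is a double
X <- matrix("4 3 2 5 7 8", rows=3, cols=2)        # X is a 3x2 matrix; '<-' also works as assignment
Y = matrix(1, rows=3, cols=2)                     # Y is a 3x2 matrix of all 1s
b <- t(X) %*% Y                                   # %*% is matrix multiply, t(X) is transpose
S = "hello world"                                 # S is a string
V = X                                             # non-negative matrix to factorize, V ~ W %*% H
W = rand(rows=nrow(V), cols=2, min=0.1, max=1.0)  # random non-negative initial factors (rank 2)
H = rand(rows=2, cols=ncol(V), min=0.1, max=1.0)
i = 0                                             # i is an integer
while(i < max_iteration) {
  H = (H * (t(W) %*% (V/(W%*%H))))/t(colSums(W))  # * is element-wise multiplication
  W = (W * ((V/(W%*%H)) %*% t(H)))/t(rowSums(H))
  i = i + 1;
}
print (toString(H))                               # toString converts a matrix to a string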
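The while loop above applies the multiplicative update rules of Lee and Seung for non-negative matrix factorization, V ≈ W %*% H, under the generalized Kullback-Leibler divergence. The plain-Scala sketch below (hypothetical object `NmfSketch`, dense arrays, no SystemML involved) mirrors the two update lines one-to-one so that the broadcasting of `t(colSums(W))` and `t(rowSums(H))` becomes explicit. It also computes the divergence, which these updates should not increase.

```scala
object NmfSketch {
  type Mat = Array[Array[Double]]

  // Dense matrix product (DML: a %*% b)
  def mul(a: Mat, b: Mat): Mat =
    Array.tabulate(a.length, b(0).length) { (i, j) =>
      b.indices.map(k => a(i)(k) * b(k)(j)).sum
    }

  // Transpose (DML: t(a))
  def t(a: Mat): Mat = Array.tabulate(a(0).length, a.length)((i, j) => a(j)(i))

  // Element-wise V / (W %*% H)
  def ratio(v: Mat, w: Mat, h: Mat): Mat = {
    val wh = mul(w, h)
    Array.tabulate(v.length, v(0).length)((i, j) => v(i)(j) / wh(i)(j))
  }

  // One pass of the loop body: update H first, then W using the new H
  def step(v: Mat, w: Mat, h: Mat): (Mat, Mat) = {
    // H = (H * (t(W) %*% (V/(W%*%H)))) / t(colSums(W)); row k of H is divided by colSums(W)[k]
    val numH = mul(t(w), ratio(v, w, h))
    val colSumsW = Array.tabulate(w(0).length)(k => w.map(_(k)).sum)
    val hNew = Array.tabulate(h.length, h(0).length)((k, j) => h(k)(j) * numH(k)(j) / colSumsW(k))
    // W = (W * ((V/(W%*%H)) %*% t(H))) / t(rowSums(H)); column k of W is divided by rowSums(H)[k]
    val numW = mul(ratio(v, w, hNew), t(hNew))
    val rowSumsH = hNew.map(_.sum)
    val wNew = Array.tabulate(w.length, w(0).length)((i, k) => w(i)(k) * numW(i)(k) / rowSumsH(k))
    (wNew, hNew)
  }

  // Generalized KL divergence D(V || WH); assumes all entries are strictly positive
  def kl(v: Mat, w: Mat, h: Mat): Double = {
    val wh = mul(w, h)
    (for (i <- v.indices; j <- v(0).indices)
      yield v(i)(j) * math.log(v(i)(j) / wh(i)(j)) - v(i)(j) + wh(i)(j)).sum
  }
}
```

Calling `step` repeatedly on the 3x2 matrix from the sample code should give a non-increasing `kl` value while all entries stay strictly positive.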
8. Sample Code
source("nn/layers/affine.dml") as affine   # import a file into the "affine" namespace
[W, b] = affine::init(D, M)                # call init from that namespace; it returns two values
parfor (i in 1:nrow(X)) {                  # i iterates over 1 through num rows in X, in parallel
  for (j in 1:ncol(X)) {                   # j iterates over 1 through num cols in X
    # Computation ...
  }
}
write (M, fileM, format="text")            # M = matrix, fileM = output path (local file system or HDFS)
X = read (fileX)                           # fileX = input path (local file system or HDFS)
if (ncol (A) > 1) {
  # Column-range slicing: column j of A becomes A[,j] - A[,j+1]
  A[,1:(ncol (A) - 1)] = A[,1:(ncol (A) - 1)] - A[,2:ncol (A)];
}
9. Sample Code
interpSpline = function(
    double x, matrix[double] X, matrix[double] Y, matrix[double] K) return (double q) {
  i = as.integer(nrow(X) - sum(ppred(X, x, ">=")) + 1)
  # misc computation …
  q = as.scalar(qm)
}
eigen = externalFunction(Matrix[Double] A)
    return(Matrix[Double] eval, Matrix[Double] evec)
    implemented in (classname="org.apache.sysml.udf.lib.EigenWrapper", exectype="mem")
10. Sample Code (From LinearRegDS.dml*)
A = t(X) %*% X
b = t(X) %*% y
if (intercept_status == 2) {
  A = t(diag (scale_X) %*% A + shift_X %*% A [m_ext, ])
  A = diag (scale_X) %*% A + shift_X %*% A [m_ext, ]
  b = diag (scale_X) %*% b + shift_X %*% b [m_ext, ]
}
A = A + diag (lambda)
print ("Calling the Direct Solver...")
beta_unscaled = solve (A, b)
*https://github.com/apache/incubator-systemml/blob/master/scripts/algorithms/LinearRegDS.dml#L133
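The script builds the regularized normal equations, A = t(X) %*% X + diag(lambda) and b = t(X) %*% y, and hands them to `solve`. Below is a small plain-Scala sketch of the same idea under stated assumptions: a single scalar `lambda` added to every diagonal entry (LinearRegDS.dml adds a vector through `diag(lambda)`), no intercept rescaling branch, and Gaussian elimination with partial pivoting as an illustrative solver, not necessarily what SystemML's `solve` does internally. The object name `RidgeSketch` is hypothetical.

```scala
object RidgeSketch {
  type Mat = Array[Array[Double]]

  // Solves A x = b by Gaussian elimination with partial pivoting (A square, non-singular)
  def solve(a0: Mat, b0: Array[Double]): Array[Double] = {
    val n = b0.length
    val a = a0.map(_.clone) // work on copies so the inputs are not modified
    val b = b0.clone
    for (col <- 0 until n) {
      // swap in the row with the largest pivot for numerical stability
      val pivot = (col until n).maxBy(r => math.abs(a(r)(col)))
      val tmpRow = a(col); a(col) = a(pivot); a(pivot) = tmpRow
      val tmpB = b(col); b(col) = b(pivot); b(pivot) = tmpB
      // eliminate entries below the pivot
      for (r <- col + 1 until n) {
        val f = a(r)(col) / a(col)(col)
        for (c <- col until n) a(r)(c) -= f * a(col)(c)
        b(r) -= f * b(col)
      }
    }
    // back substitution on the upper-triangular system
    val x = new Array[Double](n)
    for (r <- n - 1 to 0 by -1) {
      val s = (r + 1 until n).map(c => a(r)(c) * x(c)).sum
      x(r) = (b(r) - s) / a(r)(r)
    }
    x
  }

  // beta = solve(t(X) %*% X + lambda * I, t(X) %*% y)
  def ridge(x: Mat, y: Array[Double], lambda: Double): Array[Double] = {
    val p = x(0).length
    val a = Array.tabulate(p, p)((i, j) =>
      x.indices.map(r => x(r)(i) * x(r)(j)).sum + (if (i == j) lambda else 0.0))
    val b = Array.tabulate(p)(i => x.indices.map(r => x(r)(i) * y(r)).sum)
    solve(a, b)
  }
}
```

For X with rows (1, 0), (0, 1), (1, 1) and y = (1, 2, 3), `ridge(x, y, 0.0)` recovers beta = (1, 2) exactly, since y = X %*% beta; increasing `lambda` shrinks the coefficients toward zero.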
11. MLContext API
• You can invoke SystemML from
• The command line, or
• A Spark program
• The MLContext API lets you invoke it from a Spark program
• Command-line invocation is described later
• Available as a Scala API and a Python API
• These slides cover only the Scala API
12. MLContext API – Example Usage
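import org.apache.sysml.api.MLContext   // MLContext API of this SystemML version
import sqlContext.implicits._           // enables .toDF on RDDs of tuples (Spark 1.x shell)
val ml = new MLContext(sc)
val X_train = sc.textFile("amazon0601.txt")
  .filter(!_.startsWith("#"))           // skip comment lines
  .map(_.split("\t") match { case Array(prod1, prod2) => (prod1.toInt, prod2.toInt, 1.0) })  // tab-separated id pairs
  .toDF("prod_i", "prod_j", "x_ij")
  .filter("prod_i < 5000 AND prod_j < 5000")  // lower 5000 to work with a smaller subset
  .cache()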
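The same parsing steps can be tried without Spark on a few in-memory lines. This is a plain-Scala analogue of the pipeline above, not part of the MLContext API; the object name `EdgeListSketch` and the `bound` parameter (standing in for the hard-coded 5000) are hypothetical. It assumes the input format the pipeline implies: `#` comment lines followed by tab-separated pairs of integer product ids.

```scala
object EdgeListSketch {
  // Plain-Scala analogue of the Spark pipeline: drop '#' comment lines,
  // split each line on a tab, keep pairs with both ids below `bound`,
  // and emit (prod_i, prod_j, x_ij = 1.0) triples.
  def parse(lines: Seq[String], bound: Int): Seq[(Int, Int, Double)] =
    lines
      .filterNot(_.startsWith("#"))
      .map(line => line.split("\t") match {
        case Array(p1, p2) => (p1.toInt, p2.toInt, 1.0)
        case other => throw new IllegalArgumentException(s"malformed line: ${other.mkString("\t")}")
      })
      .filter { case (i, j, _) => i < bound && j < bound }
}
```

Each surviving triple becomes one non-zero cell (row prod_i, column prod_j, value 1.0) of the matrix the DML script reads as `X`.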
13. MLContext API – Example Usage
val pnmf =
"""
# data & args
X = read($X)
rank = as.integer($rank)
# Computation ....
write(negloglik, $negloglikout)
write(W, $Wout)
write(H, $Hout)
"""
ml.registerInput("X", X_train)
ml.registerOutput("W")
ml.registerOutput("H")
ml.registerOutput("negloglik")
val outputs = ml.executeScript(pnmf,
Map("maxiter" -> "100", "rank" -> "10"))
val negloglik = getScalarDouble(outputs,
"negloglik")
15. Invocation – How to run a DML file
• SystemML can run on
• Your laptop (standalone)
• Spark
• Hybrid Spark – chooses, per operation, between local execution on the driver and distributed execution on the cluster
• Hadoop
• Hybrid Hadoop
• This presentation covers standalone, spark, and hybrid_spark
• The documentation has detailed instructions for the others
19. Editor Support
• Very rudimentary editor support
• A bit of shameless self-promotion:
• Atom – a hackable text editor
• Install package - https://atom.io/packages/language-dml
• From GUI - http://flight-manual.atom.io/using-atom/sections/atom-packages/
• Or from command line – apm install language-dml
• Rudimentary snippet-based completion of built-in functions
• Vim
• Install package - https://github.com/nakul02/vim-dml
• Works with Vundle (a Vim package manager)
• There is an experimental Zeppelin Notebook integration with DML –
• https://issues.apache.org/jira/browse/SYSTEMML-542
• Available as a docker image to play with - https://hub.docker.com/r/nakul02/incubator-zeppelin/
• Please send feedback, feature requests, and bug reports
• I’ll work on them when I can
20. Other Information
• All scripts are in - https://github.com/apache/incubator-systemml/tree/master/scripts
• Algorithm Scripts - https://github.com/apache/incubator-systemml/tree/master/scripts/algorithms
• Test Scripts - https://github.com/apache/incubator-systemml/tree/master/src/test/scripts
• Look inside the test folder for the programs that run the tests, and play around with some of them - https://github.com/apache/incubator-systemml/tree/master/src/test/java/org/apache/sysml/test