2. About Me
• Background
- Machine Learning (ML) Research.
- Ph.D. in Numerical Optimization @Wisconsin.
• Now
- Build ML tools for data scientists & developers @Dato.
- Help deploy ML algorithms.
@krishna_srd, @DatoInc
12. SFrame Python API
Make a little SFrame with 1 column and 5 values:
>> import graphlab as gl
>> sf = gl.SFrame({'x': [1, 2, 3, 4, 5]})
Normalize the column x:
>> sf['x'] = sf['x'] / sf['x'].sum()
Use a Python lambda to create a new column:
>> sf['x-squared'] = sf['x'].apply(lambda x: x*x if x > 0 else 0)
Create a new column using a vectorized operator:
>> sf['x-cubed'] = sf['x-squared'] * sf['x']
Create a new SFrame taking only 2 of the columns:
>> sf2 = sf[['x', 'x-squared']]
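For readers without GraphLab Create installed, the arithmetic behind these column operations can be approximated with plain Python lists. This is only a sketch of what each step computes, not how SFrame is implemented; the dict-of-lists layout is an assumption made for illustration.

```python
# Plain-Python stand-in for the slide's SFrame: column name -> list of values.
# (Assumption for illustration; a real SFrame is disk-backed and columnar.)
sf = {'x': [1, 2, 3, 4, 5]}

# Normalize column x so that its values sum to 1.
total = sum(sf['x'])
sf['x'] = [v / total for v in sf['x']]

# Per-element function, mirroring the lambda passed to apply() on the slide.
sf['x-squared'] = [v * v if v > 0 else 0 for v in sf['x']]

# Element-wise product of two columns, mirroring the vectorized operator.
sf['x-cubed'] = [a * b for a, b in zip(sf['x-squared'], sf['x'])]

# Keep only two of the columns.
sf2 = {name: sf[name] for name in ['x', 'x-squared']}
```

The difference that matters in practice is that the list comprehensions here run element by element in the interpreter, whereas the slide's vectorized form hands the whole column to the library.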
15. What is the SFrame?
sf = gl.SFrame('netflix_tr.frame')
- netflix_tr.frame: user, movie, rating
- sf: user, item, rating
sf2 = gl.SFrame('netflix_norm.frame')
- netflix_norm.frame: user, movie, rating
- sf2: user, item, rating
sf['nrating'] = sf2['rating']
- sf: user, item, rating, nrating
diff = sf['rating'] - sf2['rating']
- diff: anonymous
16. What is the SFrame?
Filtering
sf[sf['rating'] >= 3]
Joins
sf.join(user_table, on='user_id')
Random/Array indexing
row10 = sf[10]
table_with_every_other_row = sf[::2]
Rather Fast Parallelized UDFs (Interproc SHM)
sf['rating'].apply(lambda x: x*x)
Not a SQL Frontend
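To make the semantics of the operations on this slide concrete, here is a minimal plain-Python sketch of filtering, joining, indexing, and a per-element function. The sample rows, the user_table contents, and the choice of an inner join are assumptions for illustration; none of this reflects SFrame internals.

```python
# Hypothetical sample data: each row is a dict, standing in for an SFrame row.
rows = [
    {'user_id': 1, 'rating': 5},
    {'user_id': 2, 'rating': 2},
    {'user_id': 1, 'rating': 3},
    {'user_id': 3, 'rating': 4},
]
user_table = [
    {'user_id': 1, 'name': 'ann'},
    {'user_id': 2, 'name': 'bob'},
]

# Filtering: keep rows whose rating is at least 3.
filtered = [r for r in rows if r['rating'] >= 3]

# Join on user_id (inner join assumed): index one side, then match the other.
by_user = {u['user_id']: u for u in user_table}
joined = [dict(r, **by_user[r['user_id']])
          for r in rows if r['user_id'] in by_user]

# Random access and strided slicing.
row2 = rows[2]
every_other_row = rows[::2]

# Per-element function over one column.
squared = [r['rating'] ** 2 for r in rows]
```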
17. SArray Column Types
Boring Scalar Types
- int64, double, string
Interesting Scalar Types
- Datetime, image
Mathematician Type
- array('d')
Industrial Data Scientist Type
- list, dict
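As a rough illustration of the column types listed above, the sketch below pairs each type name with a Python value of the corresponding kind. The helper slide_type_name and the sample values are hypothetical, and the image type is left out because the standard library has no equivalent.

```python
import array
import datetime

# One sample value per column type named on the slide (image omitted).
examples = {
    'int64': 42,
    'double': 3.14,
    'string': 'hello',
    'datetime': datetime.datetime(2015, 1, 1, 12, 0),
    'array': array.array('d', [1.0, 2.0, 3.0]),
    'list': [1, 'two', 3.0],
    'dict': {'clicks': 3, 'tags': ['a', 'b']},
}

def slide_type_name(value):
    """Return the slide's type name for a Python value (hypothetical helper)."""
    checks = [
        (int, 'int64'), (float, 'double'), (str, 'string'),
        (datetime.datetime, 'datetime'), (array.array, 'array'),
        (list, 'list'), (dict, 'dict'),
    ]
    for py_type, name in checks:
        if isinstance(value, py_type):
            return name
    raise TypeError('no matching column type for %r' % (value,))
```

The list and dict types can hold mixed or nested values, whereas array('d') holds only doubles.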
21. Cross Platform?
Python Bindings
- Our oldest binding
- Via Cython + interprocess communication to a C++ binary
R Bindings
- Via Rcpp
- In Beta. Soon to be released.
C++ Bindings
- Used for internal development of
Julia Bindings
- “Hackathon” mock project mature
22. SGraph: Common Crawl
3.5 billion nodes and 128 billion edges
1x r3.8xlarge, using 1x SSD.
PageRank: 9 min per iteration.
Connected Components: ~1 hr.
There isn't any general purpose library out there capable of this.
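To put the figures in perspective, 128 billion edges in 9 minutes works out to roughly 237 million edges per second for one PageRank iteration. The sketch below computes that rate and shows a textbook PageRank power iteration on a tiny graph. It is not SGraph's implementation; the damping factor of 0.85, the iteration count, and the toy edge list are assumptions for illustration.

```python
# Throughput implied by the slide: edges processed per second of one iteration.
edges_total = 128e9
seconds_per_iteration = 9 * 60
edges_per_second = edges_total / seconds_per_iteration

def pagerank(edge_list, num_nodes, damping=0.85, iterations=20):
    """Textbook PageRank by power iteration over an edge list.

    Assumes every node has at least one outgoing edge (no dangling nodes).
    """
    out_degree = [0] * num_nodes
    for src, _ in edge_list:
        out_degree[src] += 1
    rank = [1.0 / num_nodes] * num_nodes
    for _ in range(iterations):
        # Every node starts each round with the teleport share.
        new_rank = [(1.0 - damping) / num_nodes] * num_nodes
        # One pass over all edges: each source splits its rank among its targets.
        for src, dst in edge_list:
            new_rank[dst] += damping * rank[src] / out_degree[src]
        rank = new_rank
    return rank

# Toy graph: 0 -> 1, 1 -> 2, 2 -> 0, 2 -> 1.
toy_edges = [(0, 1), (1, 2), (2, 0), (2, 1)]
ranks = pagerank(toy_edges, num_nodes=3)
```

Each iteration is a single pass over the edge list, which is why per-iteration time on a graph of this size is dominated by how fast edges can be streamed from storage.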