3. My Roots
Images from BYU Mers Lab
Saturday, March 17, 12
4. Science led to Python
2
⇢0 (2⇡f ) Ui (a, f ) = [Cijkl (a, f ) Uk,l (a, f )],j
Raja Muthupillai
Richard Ehman
1997
Armando Manduca
Saturday, March 17, 12
9. Brief History
Person Package Year
Matrix Object
Jim Fulton 1994
in Python
Jim Hugunin Numeric 1995
Perry Greenfield, Rick
White, Todd Miller Numarray 2001
Travis Oliphant NumPy 2005
Saturday, March 17, 12
10. Found Python and Numeric in 1997
I was a fairly proficient MATLAB user, but it was not memory efficient enough.
Loved the expressive syntax of Python
Loved the fact that slicing didn’t make copies
Loved the existing multiple data-types
Loved how much more flexible it was to
extend than MATLAB was
Loved that I could read the source code and
extend it
Saturday, March 17, 12
11. First problem: Efficient Data Input
“It’s All About the Data”
Reference Counting Essay
TableIO http://www.python.org/doc/essays/refcnt/
April 1998 May 1998
Michael A. Miller Guido van Rossum
NumPyIO
June 1998
Saturday, March 17, 12
12. First problem: Efficient Data Input
“It’s All About the Data”
Reference Counting Essay
TableIO http://www.python.org/doc/essays/refcnt/
April 1998 May 1998
Michael A. Miller Guido van Rossum
NumPyIO
June 1998
Saturday, March 17, 12
13. Early pieces of SciPy
fftw wrappers cephesmodule
June 1998 November 1998
stats.py
December 1998
Gary
Strangman
Saturday, March 17, 12
14. 1999 : Early SciPy emerges
Discussions on the matrix-sig from 1997 to 1999 wanting a complete data analysis
environment: Paul Barrett, Joe Harrington, Perry Greenfield, Paul Dubois, Konrad Hinsen,
and others. Activity in 1998, led to increased interest in 1999.
In response on 15 Jan, 1999, I posted to matrix-sig a list of routines I felt needed to be
present and began wrapping / writing in earnest. On 6 April 1999, I announced I would
be creating this uber-package which eventually became SciPy
Gaussian quadrature 5 Jan 1999
cephes 1.0 30 Jan 1999
sigtools 0.40 23 Feb 1999
Numeric docs March 1999
cephes 1.1 9 Mar 1999 Plotting??
multipack 0.3 13 Apr 1999
Helper routines 14 Apr 1999 Gist
multipack 0.6 (leastsq, ode, fsolve, 29 Apr 1999 XPLOT
quad)
DISLIN
sparse plan described 30 May 1999
Gnuplot
multipack 0.7 14 Jun 1999
SparsePy 0.1
cephes 1.2 (vectorize)
5 Nov 1999
29 Dec 1999
Helping with f2py
Saturday, March 17, 12
15. Early 2000 : Numeric needs
• Memory mapped arrays
• Rank-0 arrays or scalars
• Handling indirect indexing: a[[10,5,7]]
• Handling masked indexing: a[[True, False, False]
• More attributes to N-d arrays
• “Record arrays”
Saturday, March 17, 12
16. Early contributors in 1999
Hosting of first Multipack CVS repository (June 1999)
Amazing makefiles
Interface to FITPACK
Wrote f2py as he watched my brute-force approach (July 1999)
Pearu Peterson
(IPP) Early IPython interactive environment (27 Apr 1999)
Matlab file reader (24 Apr 1999)
Janko Hauser
Created windows binaries of multipack, cephesmodule,
fftw, and signaltools (June 1999 while still in high school!)
Robert Kern
Saturday, March 17, 12
17. Early contributors in 1999
Hosting of first Multipack CVS repository (June 1999)
Amazing makefiles
Interface to FITPACK
Wrote f2py as he watched my brute-force approach (July 1999)
Pearu Peterson
(IPP) Early IPython interactive environment (27 Apr 1999)
Matlab file reader (24 Apr 1999)
Janko Hauser
Created windows binaries of multipack, cephesmodule,
fftw, and signaltools (June 1999 while still in high school!)
Robert Kern
Saturday, March 17, 12
22. SciPy 2001 Travis Oliphant
optimize
sparse
interpolate
integrate
special
signal
stats Founded in 2001 with Travis Vaught
fftpack
misc
Eric Jones
weave
cluster
Pearu Peterson
GA*
linalg
interpolate
f2py
Saturday, March 17, 12
25. STSCI leads out with Numarray
Perry Greenfield
J. Todd Miller
Rick White
Paul Barrett
Saturday, March 17, 12
26. Numarray released in 2003
• too slow for small arrays
• incomplete implementation for ufuncs
• minimal Numeric code re-use
• lots of very nice things, though (e.g. memory
maps, fast code for large arrays, better
sorting algorithms)
Saturday, March 17, 12
27. Split in the community
Numeric
SciPy
Numarray
ndimage
others
Saturday, March 17, 12
29. started January 2005
NumPy
Version 1.0 October 2006
Key contributions from:
Numarray
Numeric
Chuck Harris
Robert Kern
David Cooke
Pierre GM
Saturday, March 17, 12
30. NumPy in Python
Q: When is NumPy going to be part of the
core Python?
A: Data structure is in Python as PEP 3118
Saturday, March 17, 12
33. Community effort many, many others --- forgive me!
• Chuck Harris
• Pauli Virtanen
• David Cournapeau
• Stefan van der Walt
• Jarrod Millman
• Josef Perktold
• Anne Archibald
• Dag Sverre Seljebotn
• Robert Kern
• Matthew Brett
• Warren Weckesser
• Ralf Gommers
• Joe Harrington --- Documentation effort
• Andrew Straw --- www.scipy.org
Saturday, March 17, 12
35. NumPy Essentials
• Data: the array object
– slicing
– shapes and strides
– data-type generality
• Fast Math:
– vectorization
– broadcasting
– aggregations
Saturday, March 17, 12
36. Zen of NumPy
• strided is better than scattered
• contiguous is better than strided
• descriptive is better than imperative
• array-oriented is better than object-oriented
• broadcasting is a great idea
• vectorized is better than an explicit loop
• unless it’s too complicated --- then use Cython / weave
• think in higher dimensions
Saturday, March 17, 12
38. Experiences Conulsting
• Basically, I was a consultant and manager of
consultants for 4 years.
• It kept the lights on at home, but gave me minimal
time for NumPy.
• Silver lining was that I learned exactly what “big-data”
problems big companies have and saw first-hand that
numerous “little” improvements need to be made to
NumPy which will allow it to have a larger impact on
helping to manage the world’s data.
Saturday, March 17, 12
39. Putting Science back in CS
• Much of the software stack is for systems
programming --- C++, Java, .NET, ObjC, web
- Complex numbers?
- Vectorization primitives?
• Array-based programming has been
supplanted by Object-oriented programming
• Software stack for scientists is not as helpful
as it should be
• Fortran is still where many scientists end up
Saturday, March 17, 12
40. APL : the first array-oriented language
• Appeared in 1964
• Originated by Ken Iverson
• Direct descendants (J, K, Matlab) are still
used heavily and people pay a lot of money
for them APL
• NumPy is a descendent J
K Matlab
Numeric
NumPy
Saturday, March 17, 12
41. Conway’s game of Life
• Dead cell with exactly 3 live neighbors
will come to life
• A live cell with 2 or 3 neighbors will
survive
• With too few or too many neighbors, the
cell dies
Saturday, March 17, 12
43. Conway’s Game of Life
APL
NumPy
Initialization
Update Step
Saturday, March 17, 12
44. Improvements needed
• NDArray improvements
• Indexes (esp. for Structured arrays)
• SQL front-end
• Multi-level, hierarchical labels
• selection via mappings (labeled arrays)
• Memory spaces (array made up of regions)
• Distributed arrays (global array)
• Compressed arrays
• Standard distributed persistance
• fancy indexing as view and optimizations
• streaming arrays
Saturday, March 17, 12
45. Improvements needed
• Dtype improvements
• Enumerated types (including dynamic enumeration)
• Derived fields
• Specification as a class (or JSON)
• Pointer dtype (i.e. C++ object, or varchar)
• Finishing datetime
• Missing data with both bit-patterns and mask
• Parameterized field names
Saturday, March 17, 12
46. Example of Object Dtype
@np.dtype
class Stock(np.DType):
symbol = np.Str(4)
open = np.Int(2)
close = np.Int(2)
high = np.Int(2)
low = np.Int(2)
@np.Int(2)
def mid(self):
return (self.high + self.low) / 2.0
Saturday, March 17, 12
47. Improvements needed
• Ufunc improvements
• Generalized ufuncs support more than just
contiguous arrays
• Specification of ufuncs in Python
• Move most dtype “array functions” to ufuncs
• Unify error-handling for all computations
• Allow lazy-evaluation and remote computation ---
streaming and generator data
• Structured and string dtype ufuncs
• Multi-core and GPU optimized ufuncs
• Group-by reduction
Saturday, March 17, 12
48. Improvements needed
• Miscellaneous improvements
• ABI-management
• Eventual Move to C++ library (NDLib)
• NDLib could serve as base for Javascript and other
high-level languages
• Integration with LLVM
• Possible dtype / shape / stride unification into a
“dimension protocol”
• Remote computation
• Fast I/O for CSV and Excel
Saturday, March 17, 12
49. NumPy Users
• Want to be able to write Python to get fast
code that works on arrays and scalars
• Need access to a boat-load of C-extensions
(NumPy is just the beginning)
PyPy doesn’t cut it for us!
Saturday, March 17, 12
50. Ufuncs
Saturday, March 17, 12
Generalized
UFuncs
Python
Function
Window
Kernel
Funcs
Function-
based
Indexing
Memory
Dynamic compilation
Filters
Dynamic
Compilation
NumPy Runtime
I/O Filters
Reduction
Filters
Computed
Columns
function pointer
51. SciPy needs a Python compiler
optimize integrate
special ode
writing more of SciPy at high-level
Saturday, March 17, 12
52. Numba -- a Python compiler
• Replays byte-code on a stack with simple type-
inference
• Translates to LLVM (using LLVM-py)
• Uses LLVM for code-gen
• Resulting C-level function-pointer can be
inserted into NumPy run-time
• Understands NumPy arrays
• Is NumPy / SciPy aware
Saturday, March 17, 12
53. NumPy + Mamba = Numba
Python Function Machine Code
LLVM-PY
LLVM 3.1
ISPC OpenCL OpenMP CUDA CLANG
Intel AMD Nvidia Apple
Saturday, March 17, 12
60. NumFOCUS
Num(Py) Foundation for Open Code for Usable Science
Saturday, March 17, 12
61. NumFOCUS
• Mission
• To initiate and support educational programs
furthering the use of open source software in
science.
• To promote the use of high-level languages and
open source in science, engineering, and math
research
• To encourage reproducible scientific research
Saturday, March 17, 12
62. NumFOCUS
• Activites
• Sponsor sprints and conferences
• Provide scholarships and grants
• Pay for documentation development and basic
course development
• Work with domain-specific organizations
• Raise funds from industries using Python and
NumPy
Saturday, March 17, 12
63. NumFOCUS
• Directors
• Jarrod Millman
• Fernando Perez
• Travis Oliphant
• Perry Greenfield
• John Hunter
• Members
• Basically people who donate for now. In time, a
body that elects directors.
Saturday, March 17, 12