15 NumPy
- Be able to use NumPy
- Know what NumPy is good for
15.1 What Is NumPy?
NumPy (Numerical Python) is a foundational Python library for numerical and scientific computing. It provides efficient data structures and functions for working with large, multi-dimensional numerical datasets. NumPy is widely used in data science, machine learning, engineering, physics, and finance, and it underpins many other libraries in the Python scientific ecosystem, such as pandas, SciPy, and scikit-learn.
You will likely need NumPy in many future Python courses, including those on machine learning and data science, so we introduce it here.
15.2 Base Data Structure: The ndarray
The core data structure in NumPy is the ndarray (N-dimensional array).
Key characteristics:
- Stores elements of a single data type (e.g., integers, floats)
- Supports one-dimensional (vectors), two-dimensional (matrices), and higher-dimensional arrays
- Optimised for performance and memory efficiency
Example:
import numpy as np
a = np.array([1, 2, 3]) # 1D array
b = np.array([[1, 2], [3, 4]]) # 2D array
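The characteristics listed above can be inspected directly on an array. The following is a minimal sketch; the variable names are illustrative, and the integer dtype is not printed because it can differ between platforms.

```python
import numpy as np

a = np.array([1, 2, 3])                  # 1D array of integers
b = np.array([[1.0, 2.0], [3.0, 4.0]])   # 2D array of floats

print(a.ndim, a.shape)   # 1 (3,)
print(b.ndim, b.shape)   # 2 (2, 2)
print(b.dtype)           # float64

# Mixing integers and floats gives one common type for the whole array
c = np.array([1, 2.5])
print(c.dtype)           # float64
```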
15.3 Common Operations
Element-wise Operations
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
x + y # array([5, 7, 9])
x * y # array([4, 10, 18])
Broadcasting
x * 2 # array([2, 4, 6])
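Broadcasting also applies between arrays of different shapes, not only between an array and a single number. As a small sketch (the names `M` and `row` are made up for illustration), a 1D array can be added to every row of a 2D array:

```python
import numpy as np

M = np.array([[1, 2, 3],
              [4, 5, 6]])      # shape (2, 3)
row = np.array([10, 20, 30])   # shape (3,)

# row is stretched across both rows of M
result = M + row
print(result)
# [[11 22 33]
#  [14 25 36]]
```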
Indexing and Slicing
x[0] # 1
x[1:] # array([2, 3])
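The same ideas extend to negative indices, two-dimensional arrays, and boolean masks. A short sketch, reusing the small arrays from this chapter:

```python
import numpy as np

x = np.array([1, 2, 3])
A = np.array([[1, 2], [3, 4]])

print(x[-1])      # 3, the last element
print(A[0, 1])    # 2, row 0, column 1
print(A[:, 0])    # [1 3], the first column
print(x[x > 1])   # [2 3], elements selected by a boolean mask
```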
Aggregations
x.mean() # 2.0
x.sum() # 6
x.max() # 3
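For arrays with more than one dimension, aggregations accept an `axis` argument that selects the direction to aggregate along. A minimal sketch on a 2x2 array:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])

print(A.sum())          # 10, over all elements
print(A.sum(axis=0))    # [4 6], column sums
print(A.sum(axis=1))    # [3 7], row sums
print(A.mean(axis=0))   # [2. 3.], column means
```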
Linear Algebra
A = np.array([[1, 2], [3, 4]])
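np.linalg.det(A) # approximately -2.0 (subject to floating-point rounding)
np.linalg.inv(A) # approximately array([[-2. , 1. ], [ 1.5, -0.5]])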
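One way to check these results is to multiply the matrix by its inverse with the `@` operator and compare against the identity matrix. The sketch below also uses `np.linalg.solve` to solve a linear system; the right-hand side `b` is an invented example value.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
b = np.array([5, 6])   # illustrative right-hand side

A_inv = np.linalg.inv(A)
# A @ A_inv should be the identity matrix, up to floating-point rounding
print(np.allclose(A @ A_inv, np.eye(2)))   # True

# Solve A x = b without forming the inverse explicitly
x = np.linalg.solve(A, b)
print(np.allclose(A @ x, b))               # True
```

Results are compared with `np.allclose` rather than `==` because floating-point arithmetic rarely produces exact values.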
15.4 Reading and Saving Data
Reading Data
Common methods include reading from text or CSV files:
data = np.loadtxt("data.csv", delimiter=",")
or from NumPy’s native binary format:
data = np.load("data.npy")
Saving Data
np.savetxt("output.csv", data, delimiter=",")
np.save("output.npy", data)
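The snippets above assume that the files already exist. A self-contained round trip might look like the following sketch, which writes to a temporary directory (an assumption made here so that no real files are touched) and reads the data back:

```python
import os
import tempfile

import numpy as np

data = np.array([[1.5, 2.5], [3.5, 4.5]])

# Work in a temporary directory so no real files are overwritten
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "output.csv")
    npy_path = os.path.join(tmp, "output.npy")

    np.savetxt(csv_path, data, delimiter=",")
    np.save(npy_path, data)

    from_csv = np.loadtxt(csv_path, delimiter=",")
    from_npy = np.load(npy_path)

print(np.array_equal(data, from_npy))   # True
print(np.allclose(data, from_csv))      # True
```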
NumPy is often combined with pandas, another package, for more complex data-ingestion tasks.
15.5 Advantages and Disadvantages
| Advantages | Disadvantages |
|---|---|
| Very fast for numerical operations due to optimized C-based implementation | Limited support for non-numeric or mixed data |
| Memory-efficient compared to Python lists | Less intuitive for labeled or relational data |
| Extensive mathematical and linear algebra functionality | Steeper learning curve than basic Python lists for beginners |
| Integrates well with the scientific Python ecosystem | |
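The limited support for mixed data can be seen in a small experiment. In this sketch, building an array from a number, a string, and a float converts every element to a string, because an array holds a single data type:

```python
import numpy as np

mixed = np.array([1, "a", 2.5])

print(mixed.dtype.kind)   # U, meaning a Unicode string type
print(mixed)              # ['1' 'a' '2.5']
```

The numbers are no longer numbers after this conversion, so arithmetic on `mixed` would not behave as it does on a numeric array.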
15.6 Common Alternatives and Their Use Cases
- pandas: Used for labeled, tabular data (DataFrames). Builds on NumPy and adds indexing, grouping, and data-cleaning tools.
- SciPy: Extends NumPy with advanced scientific algorithms (optimization, signal processing, statistics).
- TensorFlow / PyTorch: Used for large-scale numerical computation and machine learning, especially with GPUs and automatic differentiation.
- Python lists: Suitable for small datasets or heterogeneous data, but inefficient for numerical computation.
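The difference between Python lists and NumPy arrays shows up in how the `+` and `*` operators behave. A brief sketch with illustrative variable names:

```python
import numpy as np

py_list = [1, 2, 3]
np_array = np.array([1, 2, 3])

print(py_list + py_list)     # [1, 2, 3, 1, 2, 3], concatenation
print(np_array + np_array)   # [2 4 6], element-wise addition
print(py_list * 2)           # [1, 2, 3, 1, 2, 3], repetition
print(np_array * 2)          # [2 4 6], element-wise multiplication
```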
NumPy Documentation
The NumPy documentation is very good, and the library is very widely used. For this reason I have not given any more examples here. When you start coding more, you must use the resources and documentation to solve your own challenges. Use both the documentation and what you have learnt so far in the following exercises.
https://numpy.org/doc/
15.7 Summary
NumPy is very useful for numerical calculations, which are common in many fields.