Season 34 · 7 episodes · 25 min · 2026

NumPy

v2.4 — 2026 Edition. An audio course introducing NumPy: its high-performance multidimensional arrays and its critical role in the Python ecosystem.

Data Science Python Core

Episodes

1

The Core Identity: ndarray

3m 36s

This episode covers the ndarray object, homogeneous data types, and fixed memory allocation. You will learn why standard Python lists are inefficient for large-scale math and how NumPy solves this by dropping down to compiled C code.

Hi, this is Alex from DEV STORIES DOT EU. NumPy, episode 1 of 7. The standard Python list is highly flexible, but the moment you try to do math on a million items, you hit a performance wall. You end up paying a massive pointer tax just to multiply numbers together. The solution to this bottleneck is the core engine of Python scientific computing: the ndarray. To understand why the ndarray exists, you have to look at what standard lists do under the hood. A Python list does not store raw numbers. It stores pointers. Each pointer directs the system to a scattered location in memory where a full Python object lives. If you write a standard loop to multiply two sequences of a million numbers, the Python interpreter works extremely hard. For every single item, it fetches the pointer, locates the object, checks its data type to confirm it is actually a number, calculates the math, and stores the new object. Doing this a million times introduces severe overhead. This pointer-chasing and type-checking cycle is why standard loops are simply too slow for large mathematical operations. The ndarray, which stands for N-dimensional array, drops this flexibility in exchange for raw speed. The N-dimensional part means this object can represent a flat sequence of numbers, a two-dimensional grid, or a complex multi-dimensional mathematical matrix. Regardless of how many dimensions you define, under the hood, it operates on two strict rules. First, it requires homogeneous data types. Every single element in an ndarray must be the exact same type, such as a sixty-four-bit float. Second, it uses a fixed memory size. When you create an ndarray, NumPy reserves a single, continuous block of memory. There are no pointers. The raw numbers sit tightly packed next to each other in the system memory. Here is the key insight. Because NumPy knows the exact data type and the exact memory layout, it can completely bypass the slow Python interpreter. 
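The two rules described above are easy to verify interactively. A minimal sketch (the array and its values are illustrative, not from the episode):

```python
import numpy as np

# Every element shares one dtype, and the values sit in a single
# contiguous buffer -- no per-element Python objects, no pointers.
a = np.array([1, 2, 3, 4], dtype=np.int64)

print(a.dtype)     # int64
print(a.itemsize)  # 8 -- bytes per element
print(a.nbytes)    # 32 -- one tightly packed block of 4 x 8 bytes
```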
When you multiply two ndarrays containing a million numbers, you do not write a loop. You simply write array A multiplied by array B. This process is called vectorization. NumPy takes your command and hands the actual calculation over to pre-compiled C code. The C code blasts through that continuous block of memory at hardware speeds. It skips the type-checking and pointer lookups for individual items because the memory is perfectly uniform. The trade-off for this massive speed boost is structural rigidity. Because the memory is one continuous block, you cannot just append a new number to an ndarray the way you do with a Python list. If you need a larger array, NumPy generally has to allocate a brand-new block of memory and copy the old data over. You build the container to the exact size you need, and then you run your operations on the whole block at once. The standard Python list is a collection of isolated objects spread across memory. The NumPy ndarray is a dense, uniform block of raw data designed to be processed instantly by optimized C code. If you enjoy these episodes and want to support the show, you can search for DevStoriesEU on Patreon. That is all for this one. Thanks for listening, and keep building!
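The vectorized multiply the episode describes can be sketched like this (the array size is arbitrary):

```python
import numpy as np

a = np.arange(1_000_000, dtype=np.float64)
b = np.arange(1_000_000, dtype=np.float64)

# Vectorized: one expression, and the loop runs in pre-compiled C
# over a contiguous block of memory.
c = a * b

# The pure-Python equivalent -- same result, dramatically slower:
c_slow = np.array([x * y for x, y in zip(a, b)])

print(np.array_equal(c, c_slow))  # True
```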
2

Summoning Arrays: Creation & Shape

3m 51s

This episode explores how to properly create multidimensional arrays using intrinsic functions. You will learn how to use tools like zeros, arange, and linspace to generate datasets instantly.

Hi, this is Alex from DEV STORIES DOT EU. NumPy, episode 2 of 7. You rarely type out data by hand in data science. Instead, you need to summon vast empty grids and numerical ranges into existence with a single command. Let us talk about the intrinsic functions that let you create arrays from scratch and control their structure. First, a quick correction on manual creation. When you convert a standard Python list into an array using the basic array function, a common mistake is passing multiple separate arguments to create multiple dimensions. The function expects a single sequence. To make a two-dimensional array, you pass one list that contains other lists, not two separate lists. Every array you create carries structural metadata. Two properties matter most here. The first is ndim, which tells you the number of axes, or dimensions, the array has. A flat sequence has an ndim of one, while a grid of rows and columns has an ndim of two. The second property is shape. Shape is a tuple of integers indicating the exact size of the array along each dimension. If you have a matrix with two rows and three columns, its shape is two by three. The length of the shape tuple will always equal the ndim value. Creating arrays from existing lists is fine for small tests, but real work requires generating arrays programmatically. If you need a placeholder to fill with data later, you use the zeros or ones functions. You simply pass a shape tuple to these functions, and they return an array of that exact structure, populated entirely with zeros or ones. By default, these functions create floating-point numbers, but you can override this by specifying a different data type. When you need a sequence of numbers, NumPy provides two main tools. The first is arange, which works much like standard Python range. You give it a starting value, a stopping value, and a step size. It generates an array of numbers spaced by that step.
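The creation tools covered so far, in one short sketch (shapes and values are examples):

```python
import numpy as np

# One nested sequence -- not two separate list arguments:
m = np.array([[1, 2, 3], [4, 5, 6]])
print(m.ndim)   # 2
print(m.shape)  # (2, 3) -- len(shape) always equals ndim

# Placeholders take a shape tuple; the default dtype is float64.
z = np.zeros((2, 3))
o = np.ones((2, 3), dtype=np.int32)

# arange: start, stop (exclusive), step -- like range, but an array.
r = np.arange(0, 10, 2)
print(r)  # [0 2 4 6 8]
```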
While arange is great for integers, using it with floating-point steps can cause unpredictable results because of how computers handle decimal precision. The number of elements you get back might vary slightly depending on microscopic rounding errors. That brings us to linspace, which solves the floating-point precision problem. Instead of defining the step size, you define the exact number of elements you want. You give linspace a start value, a stop value, and the total number of points. NumPy calculates the exact spacing for you. Consider a scenario where you are evaluating a mathematical function over a specific interval. You want to calculate the function across a smooth grid of coordinates between zero and one. Using linspace, you can generate exactly one hundred evenly spaced coordinates across that interval. You get a perfectly distributed one-dimensional array, guaranteeing that both the start and end boundaries are included. This is where it gets interesting. The distinction between these two sequence generators dictates your workflow. Use arange when the exact step size matters, like counting integers by twos, but always use linspace when dealing with floats and intervals so you can guarantee exactly how many data points you get and precisely hit your boundaries. Appreciate you listening — catch you next time.
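The linspace scenario from the episode, sketched directly:

```python
import numpy as np

# Exactly 100 evenly spaced coordinates over [0, 1], both ends included.
x = np.linspace(0.0, 1.0, 100)

print(x.size)       # 100 -- you fix the count; NumPy computes the step
print(x[0], x[-1])  # 0.0 1.0 -- both boundaries are guaranteed
```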
3

Under the Hood: Memory, Strides, and Views

3m 40s

This episode dives into NumPy's internal architecture, focusing on the data buffer and strides. You will learn why operations like slicing and transposing are virtually instantaneous because they return memory views, not copies.

Hi, this is Alex from DEV STORIES DOT EU. NumPy, episode 3 of 7. You have a matrix of a billion pixels, and you need to flip its rows and columns. If you do this in standard Python, your machine will grind to a halt while it copies gigabytes of data. In NumPy, this operation happens instantly. Reordering those pixels does not actually move a single byte of memory. This happens because of how NumPy handles Memory, Strides, and Views under the hood. To understand why NumPy is so fast, you have to look at the internal structure of an array. A NumPy array is not a single, monolithic object. It is strictly separated into two pieces. The first piece is the data buffer. This is a contiguous, flat block of raw memory. It is just a one-dimensional line of bytes sitting in RAM. The raw buffer knows absolutely nothing about rows, columns, or dimensions. The second piece is the metadata. This is a small header, implemented internally as a C structure, that tells NumPy how to interpret that raw line of bytes. The metadata holds a pointer to the start of the data buffer, the data type, the shape of the array, and the strides. Strides are the mechanism that transforms a flat line of memory into a multi-dimensional grid. A stride is simply the number of bytes the computer must step forward in memory to find the next element along a specific axis. Suppose you have a two-dimensional array of 64-bit integers. Each integer takes up eight bytes. To move one column to the right, the stride might be eight bytes. But to drop down to the next row, the stride might be eighty bytes, because it has to skip past an entire row of data in the raw buffer to find the start of the next one. Many developers assume that when you take a slice of an array, the system allocates new memory and copies the selected data over. That is incorrect. When you request a slice, NumPy leaves the raw data buffer completely untouched. Instead, it creates a new metadata header. 
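The stride arithmetic in that example checks out directly in code — a grid with ten 8-byte columns per row gives exactly the eight-byte and eighty-byte steps mentioned:

```python
import numpy as np

# 64-bit integers: 8 bytes each. Ten columns per row.
a = np.zeros((3, 10), dtype=np.int64)

# Strides: bytes to step forward to reach the next element on each axis.
print(a.strides)  # (80, 8) -- next row skips a full 80-byte row; next column, 8
```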
This new header points to the exact same block of memory, but it changes the starting pointer and modifies the strides to skip the elements you excluded. This is called a view. Here is the key insight. Because the raw data and the metadata are kept separate, operations that change the shape or order of the array are almost completely free. Think back to transposing that massive billion-pixel matrix. NumPy does not pick up the data and rearrange it physically. It simply swaps the stride values in the new metadata header. The number of bytes you used to skip to find the next row becomes the number you skip to find the next column. A completely different array structure is returned to your code, but it is just a view looking at the exact same physical memory. This separation is the foundation of NumPy's memory efficiency. You can carve a massive dataset into dozens of overlapping slices, pass them into different functions, and consume zero extra memory for the data itself. You are only generating tiny metadata headers. However, this means you share state. Modifying a value in a slice will alter the original array, because there is only one true data buffer beneath them all. The separation of raw bytes from the rules that govern them means your heaviest data transformations are often just lightweight metadata swaps. That is all for this one. Thanks for listening, and keep building!
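A short sketch of views in action — slices and transposes share the one true buffer:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

s = a[:, 1:3]  # a slice: new metadata header, same buffer
t = a.T        # a transpose: the stride values are simply swapped

print(np.shares_memory(a, s))  # True
print(np.shares_memory(a, t))  # True
print(a.strides, t.strides)    # strides swapped, zero bytes moved

# Shared state: writing through the view alters the original.
s[0, 0] = 99
print(a[0, 1])  # 99
```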
4

Universal Functions: Math Without Loops

3m 44s

This episode covers Universal Functions (ufuncs) and how they vectorize operations. You will learn to eliminate Python for-loops entirely by applying element-by-element math and axis-based reductions.

Hi, this is Alex from DEV STORIES DOT EU. NumPy, episode 4 of 7. If you ever find yourself writing a for-loop to multiply numbers in an array, stop. Doing that means you are paying Python's interpreter overhead on every single item, and your code is running a hundred times slower than it needs to. The solution is using universal functions to perform math without loops. A common mistake is trying to pass an entire array to a function from Python's standard math module. If you pass a million sensor readings to math dot sine, Python throws an error. The standard math module only understands single scalar values. To process an array, you would normally have to write a loop. But Python is a dynamic language. Inside a loop, the interpreter evaluates the data type on every single iteration before it calculates the result. When you process massive datasets, those tiny type-checking pauses add up to a significant performance bottleneck. A universal function, or ufunc, solves this by operating on arrays element by element automatically. When you call a ufunc, NumPy pushes the loop execution down to compiled C code. Because NumPy arrays have a single, uniform data type, the C code does not need to pause and check types. It iterates over contiguous blocks of memory and calculates the result as fast as your processor allows. Let us look at a concrete scenario. You have an array containing thousands of environmental sensor readings, and you need to apply a mathematical transformation to all of them at once. Instead of writing a loop, you just pass the entire array into a universal function like numpy dot exp or numpy dot sin. The ufunc takes your input array, runs the fast C-level loop across every single element, and returns a completely new array populated with the transformed readings. This is the part that matters. Ufuncs do more than just element-by-element transformations. They contain built-in methods to collapse data, completely bypassing Python for aggregations. 
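The sensor scenario as code — one ufunc call replaces the loop (the readings here are synthetic):

```python
import numpy as np

readings = np.linspace(0.0, np.pi, 1_000)

# math.sin(readings) would raise a TypeError on an array; the ufunc
# maps the whole array in one C-level pass and returns a new array.
transformed = np.sin(readings)

print(transformed.shape)  # (1000,)
print(transformed.max())  # ~1.0, at the reading nearest pi/2
```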
The most common is the reduce method. Suppose you applied your mathematical transformation, and now you need the total sum of the entire array. You call the reduce method directly on the addition ufunc. You write numpy dot add dot reduce, and hand it your array. The reduce method applies the underlying addition operation to the first two elements. It takes that sum, adds it to the third element, and continues this pattern until the entire array is collapsed into a single scalar value. If your data has multiple dimensions, you can control how this collapse happens. If your sensor readings form a two-dimensional grid, where rows are different sensors and columns are individual timestamps, reducing the entire grid to one number destroys that structure. By providing an axis argument, you control the direction of the operation. If you tell the reduce method to operate along axis zero, it collapses the rows, leaving you with an array containing the sum of all sensors at each individual timestamp. Any time you let a universal function handle the iteration natively, you are trading slow Python loops for hardware-optimized C execution. Thanks for tuning in. Until next time!
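The reduce calls described above, in a small sketch (the two-sensor, three-timestamp grid is illustrative):

```python
import numpy as np

# Rows = sensors, columns = timestamps.
grid = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

# Collapse the whole grid to a single scalar:
print(np.add.reduce(grid, axis=None))  # 21.0

# Collapse along axis 0: one sum per timestamp, structure preserved.
print(np.add.reduce(grid, axis=0))     # [5. 7. 9.]
```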
5

Broadcasting: The Magic of Mismatched Shapes

3m 43s

This episode explains the exact rules of Broadcasting. You will learn how NumPy conceptually stretches arrays of mismatched shapes so they can be processed together without wasting memory.

Hi, this is Alex from DEV STORIES DOT EU. NumPy, episode 5 of 7. What happens when you try to multiply a three-dimensional matrix of a million pixels by a single small array of three numbers? In many strict languages, you get a shape mismatch error. In NumPy, it just works. This behavior is called broadcasting. Broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations. It takes the smaller array and conceptually stretches it across the larger one so their shapes align perfectly. Listeners often mistakenly believe this stretching physically copies data to build a new, matching array in memory. It does not. NumPy handles this alignment implicitly at the C level. It iterates over the same elements multiple times using zero memory overhead, which makes broadcasting incredibly fast and highly efficient. To use it, you must understand how NumPy decides if two arrays are compatible. It does not look at the total number of elements. It looks at the shape tuples. NumPy lines up the shapes of the two arrays and compares them starting from the trailing dimensions—the ones at the far right—and works its way left. Two dimensions are compatible if they meet one of two strict conditions. They must be exactly equal, or one of them must be the number one. If neither condition is met at any point during the comparison, NumPy throws a ValueError and the operation fails. Let us apply this to a concrete scenario. You are working with an RGB image. You load it into a NumPy array with a shape of 256 by 256 by 3. The 3 represents the red, green, and blue color channels at the end of the shape. Now, you need to color correct this image by scaling each color channel differently. You define a one-dimensional array containing three color correction weights. The shape of this array is just 3. When you multiply the massive image array by the tiny weights array, NumPy applies the right-to-left rule. 
It places the image shape, 256 by 256 by 3, above the weights shape, which is just 3. Starting on the far right, the trailing dimensions are compared. Both are 3. Because they are equal, they are compatible. Then, NumPy moves to the left. The image array has a dimension of 256, but the weights array has run out of dimensions entirely. This is where the second part of the rule kicks in. When an array has fewer dimensions than the other, NumPy implicitly prepends ones to its shape until they match in length. The weights array shape is treated as 1 by 1 by 3. Now the comparison continues. The image has a dimension of 256, and the weights array now has a dimension of 1. Because one of them is a 1, they are compatible. The weights array is conceptually stretched down the 256 rows. This happens again for the next dimension. The shapes align, and NumPy applies your three color weights across all sixty-five thousand pixels seamlessly. This is the part that matters. The rule only works right to left. If you have a two-dimensional array shaped 5 by 4, and you try to add an array of shape 5, you might think it would stretch across the columns. It will not. Starting from the right, NumPy compares 4 and 5. They are not equal, and neither is one. The operation instantly fails. To make it work, you would have to reshape the second array to 5 by 1 first. Broadcasting lets you write clean, loop-free code that executes at compiled speeds. The golden rule is always trailing dimensions first: they must match exactly, or one of them must be a one. Thanks for tuning in. Until next time!
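Both the color-correction example and the failing five-by-four case can be sketched as:

```python
import numpy as np

image = np.ones((256, 256, 3))       # height x width x RGB channels
weights = np.array([1.1, 0.9, 1.0])  # shape (3,): one weight per channel

# (256, 256, 3) * (3,): weights are treated as (1, 1, 3) and stretched.
corrected = image * weights
print(corrected.shape)  # (256, 256, 3)

# Right-to-left only: (5, 4) + (5,) compares 4 with 5 and fails.
a = np.zeros((5, 4))
b = np.arange(5)
try:
    a + b
except ValueError:
    print("shape mismatch")

# The fix: reshape b into a (5, 1) column so 4 pairs with 1.
print((a + b.reshape(5, 1)).shape)  # (5, 4)
```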
6

Precision Filtering: Boolean Masking

3m 19s

This episode focuses on advanced boolean masking to filter complex datasets. You will learn how to extract highly specific data points from massive arrays using simple conditional logic.

Hi, this is Alex from DEV STORIES DOT EU. NumPy, episode 6 of 7. Grabbing every negative number from a billion-item array is not a search problem. It should not require a loop, and it certainly should not slow down your application. This is where we use Precision Filtering: Boolean Masking. Basic slicing grabs neat, predictable chunks of data. But real-world data is messy. You rarely just want the first ten items. You want specific elements based on a logical condition, and those elements might be scattered randomly throughout your dataset. In NumPy, grabbing data this way is called advanced indexing, and boolean masking is one of its most powerful forms. Take a massive array of temperature readings. You need to isolate only the values that drop below freezing. Instead of writing a loop to check every single reading, you use a mask. A mask is an array of boolean values, meaning True or False. Crucially, this boolean array has the exact same shape as your original data array. You create it by applying a less-than-zero condition directly to the array. NumPy instantly evaluates every single element. If a temperature is below zero, the mask records a True at that exact position. If the temperature is zero or higher, it records a False. Once you have this mask, you apply it back to the original array by passing it exactly where you would normally put an index. NumPy reads the mask, picks out every element where the mask is True, and drops the rest. The operation yields a brand new, one-dimensional array containing nothing but your freezing temperatures. It happens in a single step, pushed down to highly optimized C code under the hood. Pay attention to this next distinction, because it trips up a lot of developers. You need to know exactly what NumPy hands back to you when you filter data. Basic slicing returns a view. If you slice the first ten elements of an array and modify them, the original array changes. Boolean masking behaves entirely differently. 
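The freezing-temperatures example in code (the readings are made up):

```python
import numpy as np

temps = np.array([3.5, -2.0, 0.0, -7.5, 12.1, -0.5])

# The mask has the same shape as the data: True where the test holds.
mask = temps < 0
print(mask)      # [False  True False  True False  True]

# Passing the mask as the index gathers only the True positions.
freezing = temps[mask]
print(freezing)  # [-2.  -7.5 -0.5]
```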
Because boolean masking is a form of advanced indexing, it always returns a copy of the data, never a view. The reason is memory architecture. When you slice, NumPy just changes the pointers to an existing, continuous block of memory. But when you apply a boolean mask, the elements you select are completely arbitrary. They do not sit neatly next to each other in memory anymore. NumPy has to gather them up and allocate new space to store the result. This means if you modify your newly filtered array of freezing temperatures, your original dataset remains completely untouched. You can also chain these conditions. If you need temperatures that are below freezing but strictly above minus ten degrees, you combine the two logical conditions. NumPy evaluates the combined logic element by element and builds a single precise mask. When you apply a boolean mask, you are trading the memory efficiency of a view for absolute filtering precision, giving you a pristine, independent copy of exactly the data you asked for. That is all for this one. Thanks for listening, and keep building!
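A sketch of the copy-versus-view distinction and the chained condition. Note the `&` operator and the parentheses, which NumPy requires in place of the keyword `and`:

```python
import numpy as np

temps = np.array([3.5, -2.0, 0.0, -7.5, 12.1, -0.5])

# Advanced indexing returns an independent copy:
freezing = temps[temps < 0]
freezing[:] = 0.0
print(temps)  # unchanged -- the original buffer was never touched

# Chained conditions build one precise mask, element by element:
mild_freeze = temps[(temps < 0) & (temps > -10)]
print(mild_freeze)  # [-2.  -7.5 -0.5]
```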
7

The Universal Translator: Interoperability

3m 41s

This episode reveals why NumPy remains the backbone of Python data science. You will learn how DLPack and the array interface allow zero-copy memory sharing between tools like Pandas and PyTorch.

Hi, this is Alex from DEV STORIES DOT EU. NumPy, episode 7 of 7. With modern GPU frameworks handling the heavy lifting in machine learning, you might think NumPy is becoming obsolete. You might even assume that converting a PyTorch CPU tensor to a NumPy array requires a slow memory copy, dragging down your pipeline. Neither is true. The reality is that NumPy acts as The Universal Translator: Interoperability, holding the entire Python data ecosystem together. Think about a standard machine learning pipeline. You use Pandas to load and clean a massive tabular dataset. You extract those values into NumPy to apply a specialized mathematical filter. Finally, you pass that filtered data into PyTorch to train a neural network. If each of these libraries isolated their data, moving from step to step would mean duplicating the entire dataset in memory over and over. You would exhaust your RAM and waste processing time just moving bytes around. Instead, through the magic of interoperability, the data never actually moves. Pandas, NumPy, and PyTorch simply share the exact same underlying memory pointer. When PyTorch reads the data, it is looking at the exact same physical memory addresses that Pandas originally allocated. This zero copy sharing is made possible by standard memory protocols. The foundational one is the array interface. If a Python object exposes this interface, it essentially hands over a small dictionary of metadata. This metadata tells NumPy exactly where the raw data starts in memory, what shape it takes, and what data type it holds. When you call a NumPy function on a compatible object, NumPy reads those instructions and wraps its own array structure around the existing memory block. It does not create a new array; it just creates a new view of the old data. Here is the key insight. The original array interface was designed primarily for standard system memory. 
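What that metadata dictionary actually looks like (the `typestr` shown assumes a little-endian machine):

```python
import numpy as np

a = np.arange(6, dtype=np.float64)

# The array interface: a small dict describing the raw memory block.
info = a.__array_interface__
print(info["shape"])    # (6,)
print(info["typestr"])  # '<f8' -- little-endian 64-bit float
print(info["data"])     # (buffer address, read-only flag)

# Wrapping a compatible object creates a view, not a new buffer:
b = np.asarray(a)
print(np.shares_memory(a, b))  # True
```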
As data science moved to hardware accelerators, the ecosystem needed a way to share data living on GPUs or custom chips without routing it back through the CPU. This led to the adoption of DLPack. DLPack is a modern, open standard for sharing multidimensional arrays across different frameworks. It defines a stable structure that any library can produce and consume. If you have a tensor in a framework like PyTorch or JAX, you can export it using the DLPack protocol. NumPy can then seamlessly ingest it using its dedicated from dlpack function. While NumPy itself mostly operates on the CPU, its support for DLPack means it can act as the central routing hub. You can hand a DLPack object from a deep learning framework to NumPy, or from NumPy back to a framework, all without expensive data duplication. NumPy long ago stopped being just a math library. It is the invisible memory standard that prevents the Python data ecosystem from fragmenting into isolated, incompatible islands. I encourage you to explore the official documentation, try these zero copy conversions hands on in your own terminal, or visit devstories dot eu to suggest topics for our future series. That is all for this one. Thanks for listening, and keep building!