Rasterio: Numpy for Geospatial Data

v1.5 — 2026 Edition. A 5-episode short audio course that explores Rasterio 1.5. Learn how to bridge the gap between complex geospatial data and Python, process terabytes of raster data with memory-safe windows, and seamlessly write back new datasets.

Geospatial Analysis Raster Data

🌐 English 🇪🇸 Español 🇫🇷 Français 🇵🇹 Português 🇮🇹 Italiano 🇵🇱 Polski 🇩🇪 Deutsch 🇷🇴 Română

Now Playing

Click play to start

0:00

Numpy for Maps

Discover how Rasterio bridges the gap between complex geospatial data and Python. We cover opening a dataset, inspecting basic metadata, and reading raster bands directly into Numpy arrays.

4m 02s

The Affine Transform

Learn how a grid of pixels maps to the real world. We break down Coordinate Reference Systems (CRS) and the Affine transformation matrix used to translate array indices into geographic coordinates.

3m 27s

Windowed Processing

Process terabytes of raster data without crashing your computer. We explore windowed reading and writing to handle massive datasets in small, memory-safe chunks.

3m 48s

Masking with Vectors

Bridge the gap between vector and raster data. Learn how to use shapefiles and polygons to dynamically crop and mask larger raster datasets to your exact area of interest.

4m 03s

Resampling and Writing

Change your data's resolution and save the results. We cover upsampling, downsampling, updating the affine transform, and writing the final dataset back to disk.

4m 20s

Episodes

Numpy for Maps

4m 02s

Discover how Rasterio bridges the gap between complex geospatial data and Python. We cover opening a dataset, inspecting basic metadata, and reading raster bands directly into Numpy arrays.

Download

Hi, this is Alex from DEV STORIES DOT EU. Rasterio: Numpy for Geospatial Data, episode 1 of 5. You want to extract pixel values from a satellite image, but standard image libraries choke on geospatial formats, and the official bindings feel like writing C code in Python. That tension is exactly why Rasterio exists. Rasterio is a library that reads and writes grid-based geospatial data, like GeoTIFF files. It is highly likely you have heard of GDAL, the industry-standard geospatial library. A common misconception is that Rasterio replaces GDAL. It does not. Rasterio actually uses GDAL under the hood. The difference is the interface. If you have ever used the native GDAL Python bindings, you know they expose a clunky C API full of manual resource management and syntax that feels entirely foreign to Python. Rasterio leaves that behind. It provides a modern, Python-first interface. It treats raster datasets like normal files and handles their pixels like standard Numpy arrays. Suppose you have a GeoTIFF of Landsat imagery on your disk. You open this file using the rasterio dot open function. Because Rasterio embraces standard Python patterns, you do this using a with statement. This creates a context manager. Just like opening a basic text file, the with block ensures that the dataset is cleanly closed and memory is freed the moment your code finishes running. You never have to manually destroy objects or worry about memory leaks from unclosed files. Inside that with block, the open function gives you a dataset reader object. Before reading any actual pixels, you can inspect the file. You can ask the dataset for its count, which tells you the total number of bands, or layers, in the image. For a Landsat image, this count might be something like seven or eleven, representing different light wavelengths. You can read the width and height properties to get the spatial dimensions in pixels. You can also check the dtypes property. This gives you the data type of the pixels for each band, such as unsigned 16-bit integers or 32-bit floats. All of this metadata is available instantly, without loading the heavy image data into your system memory. Here is the key insight. When you are ready to look at the actual pixels, you call the read method on the dataset object. You can pass a specific index to load just one layer. Pay attention to this bit: Rasterio bands are one-indexed, meaning you ask for band one, not band zero. If you call read and pass the number one, the method returns those pixels as a standard two-dimensional Numpy array. If you call read without any arguments, it reads everything, returning a three-dimensional Numpy array containing all the bands. There is no special, proprietary pixel object. You just get an array of numbers. Once you have that Numpy array, you are back in familiar territory. You can slice it, run statistical operations, or feed it directly into machine learning models using the exact same code you use for any other matrix. Rasterio acts as a bridge. It handles the complicated file formats and metadata on disk, and hands you a clean mathematical object in memory. If you enjoy the show and want to support what we do, you can search for DevStoriesEU on Patreon — it is a huge help. The true power of Rasterio is not that it reinvents geospatial processing, but that it makes it boring; by turning complex satellite imagery into standard Numpy arrays, it allows you to stop fighting GIS formats and start doing actual data science. Thanks for listening, happy coding everyone!

The Affine Transform

3m 27s

Learn how a grid of pixels maps to the real world. We break down Coordinate Reference Systems (CRS) and the Affine transformation matrix used to translate array indices into geographic coordinates.

Download

Hi, this is Alex from DEV STORIES DOT EU. Rasterio: Numpy for Geospatial Data, episode 2 of 5. A basic Numpy array is just a flat grid of numbers. If you visualize it, you simply get a colored rectangle on your monitor. What actually turns those abstract numbers into a precise map of the Colorado Plateau? The answer is the Affine Transform. People often assume geospatial data is just an image where the corners are loosely pinned to map coordinates. That is not the case. The transform is a strict mathematical link that calculates the precise real-world position of every single pixel in your array. Before you can map pixels to the Earth, you have to know how you are measuring the Earth. This is the Coordinate Reference System. You read this from the dataset by accessing dataset dot crs. This property tells Rasterio whether your spatial coordinates are measured in degrees of longitude and latitude, or in meters relative to a specific equator and prime meridian. Once the reference system is set, you need the mapping logic. You find this by checking dataset dot transform. This yields an affine transformation matrix. That sounds like heavy linear algebra, but it is effectively a set of six numbers that describe the physical footprint of your grid. The matrix defines the exact spatial x and y coordinates of the upper left corner of the top left pixel. It defines the physical width of one pixel, for example, exactly thirty meters. It defines the physical height of one pixel, which is usually negative because image rows count down while map coordinates count up. The matrix also holds rotation values in case the satellite was angled when the image was captured. Here is the key insight. You never have to multiply these matrix values by hand. Rasterio gives you direct methods to jump back and forth between pixel space and coordinate space. Suppose you are analyzing an image and you find an anomaly at a specific location, say row five hundred and column one thousand. To find out where that is on the actual planet, you pass those two integers into the dataset dot xy method. Rasterio pushes the row and column through the affine matrix and returns the exact spatial x and y coordinates for the center point of that specific pixel. You can also run this machine in reverse. Let us say you have the exact spatial coordinates for a physical location. Maybe it is a canyon located exactly one hundred kilometers east and fifty kilometers south of your image origin. You need to extract the underlying pixel data for that canyon. You take your spatial x and y values and pass them into the dataset dot index method. Rasterio inverts the affine matrix, processes your coordinates, and hands back the exact row and column integers. You then use those integers to slice into your Numpy array and read the exact measurement recorded by the satellite. The affine transform isolates your raw data array from your spatial geometry, guaranteeing that no matter how you crop or read your data, you never lose your precise position on the globe. That is it for today. Thanks for listening — go build something cool.

Windowed Processing

3m 48s

Process terabytes of raster data without crashing your computer. We explore windowed reading and writing to handle massive datasets in small, memory-safe chunks.

Download

Hi, this is Alex from DEV STORIES DOT EU. Rasterio: Numpy for Geospatial Data, episode 3 of 5. If you try to read a ten-gigabyte GeoTIFF directly into your laptop memory, your script will almost certainly crash. You might only need a tiny patch of pixels to test an algorithm, but the system tries to swallow the whole continent at once. The solution to this memory limit is Windowed Processing. Windowed processing lets you read or write specific rectangular subsets of a raster file while the rest of the file stays safely on your hard drive. This is entirely index-based. We are not dealing with geographic coordinates today. We are dealing strictly with pixel rows and pixel columns. To read a subset, you need a way to tell Rasterio which pixels to grab. You do this by importing the Window object from the rasterio windows module. The standard way to define a Window is by providing four numbers: column offset, row offset, width, and height. Pay attention to this bit. The order of arguments matters. It requires the column offset first, then the row offset. The column offset is your starting horizontal position, measuring from the left edge. The row offset is your starting vertical position, measuring down from the top edge. After the offsets, you specify how many pixels wide and how many pixels high your selection should be. Consider a massive gigabyte-sized test file. You want to extract a small tile that is 256 pixels wide and 512 pixels high. You want this tile to start one thousand pixels in from the left and two thousand pixels down from the top. You instantiate your Window by passing one thousand for the column offset, two thousand for the row offset, 256 for the width, and 512 for the height. Next, you open your dataset using the standard open function. When you call the read method, you do not leave the arguments empty. Empty arguments tell Rasterio to load the entire file into memory. Instead, you assign your new Window object to the window keyword argument inside the read method. Rasterio translates that window into byte offsets on the disk, fetches only that specific 256 by 512 chunk, and returns it as a numpy array. Your memory usage barely moves. If you work with numpy all day, passing offsets and dimensions might feel slightly unnatural. You are probably used to defining slices with start and stop indices for your rows and columns. Rasterio supports this workflow with an alternative method called Window dot from slices. This method takes two inputs. The first input is a tuple defining the row start index and row stop index. The second input is a tuple defining the column start index and column stop index. Notice that the order here is rows first, then columns. This mirrors exactly how you slice a two-dimensional numpy array. Under the hood, this method calculates the offsets and dimensions for you and returns a standard Window object that you can pass to the read method just like before. By enforcing windowed reads across your data pipelines, you permanently decouple the physical size of your input files from the hardware limits of your machine. That is all for this one. Thanks for listening, and keep building!

Masking with Vectors

4m 03s

Bridge the gap between vector and raster data. Learn how to use shapefiles and polygons to dynamically crop and mask larger raster datasets to your exact area of interest.

Download

Hi, this is Alex from DEV STORIES DOT EU. Rasterio: Numpy for Geospatial Data, episode 4 of 5. You just downloaded a massive satellite image covering an entire state, but you only care about the pixels inside one specific farm boundary. Loading the whole grid wastes memory, and reading by pixel offsets is useless when your boundary is an irregular polygon shape. The solution is Masking with Vectors. In a previous episode, we looked at windowed reading. That approach required exact pixel indices, like asking for row fifty and column one hundred. Masking is entirely different. Masking relies on real-world geographic coordinates. Instead of counting pixels, you provide a vector geometry, like a polygon defined by real longitudes and latitudes. Rasterio figures out which pixels fall inside that shape and handles the complex translation between the continuous vector coordinate space and the discrete raster grid. The core tool for this workflow is the mask function, located in the rasterio dot mask module. You pass it two main arguments. First, an open raster dataset. Second, an iterable of geometric shapes. These shapes must be formatted as dictionaries matching the GeoJSON specification. Typically, you read a shapefile containing your farm polygon, extract the geometry dictionary for that specific feature, place it in a standard Python list, and hand it to the mask function. Here is the key insight. The mask function has two distinct behaviors controlled by a single boolean parameter called crop. You need to understand the difference between simply masking a dataset and actually cropping it. If you run the function with crop set to false, Rasterio looks at your farm polygon and sets every pixel outside that boundary to a nodata value. It reads the default nodata value directly from the original raster metadata. The output array you receive has the exact same dimensions as your massive original state-wide image. The pixels inside the farm retain their actual values, but everything outside is blanked out. The overall spatial extent of the file does not change. If you set crop to true, the behavior shifts. Rasterio still sets the pixels outside your polygon to nodata. However, it then discards the vast areas of empty space. It calculates the rectangular bounding box of your irregular farm polygon and shrinks the returned array down to fit exactly within that specific box. Your giant state-wide array is physically reduced to a tiny, manageable grid containing just the farm and a small amount of nodata padding around its irregular edges. Because cropping physically changes the dimensions of the grid, it also changes the geographic starting point. The top-left corner is no longer the top-left corner of the state. It is now the top-left corner of the farm bounding box. Therefore, the mask function returns a tuple containing two items. The first item is your newly cropped numpy array. The second item is a brand new affine transform object. This updated transform holds the new origin coordinates. If you want to save this newly cropped array to a new file, you must pass this updated transform into your write profile. If you reuse the original state-wide transform by mistake, your tiny farm image will be stretched back across the entire state. The real value of the mask function is that it handles all this spatial recalculation for you automatically. Hope that was useful. Thanks for listening, and enjoy the rest of your day.

Resampling and Writing

4m 20s

Change your data's resolution and save the results. We cover upsampling, downsampling, updating the affine transform, and writing the final dataset back to disk.

Download

Hi, this is Alex from DEV STORIES DOT EU. Rasterio: Numpy for Geospatial Data, episode 5 of 5. Sometimes you have beautiful, high-resolution satellite imagery, but your climate model only accepts coarse, low-resolution inputs. You cannot just slice the array, because the resulting pixels still need to represent the exact same physical area, just in a lower-density grid. To do that correctly, you need Resampling and Writing. This is not about reprojection. The coordinate reference system stays exactly the same. We are only changing the resolution. Let us say you need to downsample a raster by half. In Rasterio, you perform the resampling at the exact moment you read the data into memory. When you call the read method on your dataset, you pass a parameter called out shape. Instead of reading the full array, Rasterio reads it and immediately scales it to fit the shape you requested. For a single-band raster, your out shape would be a tuple specifying one band, then half the original height, and half the original width. You also need to tell Rasterio how to calculate the values of these new, larger pixels. You do this by passing a resampling parameter. Rasterio provides a Resampling enum for this. If you are dealing with continuous data like temperature, you might pass Resampling dot bilinear, which calculates a weighted average of the surrounding pixels. If you are dealing with categorical data like land cover, you would use Resampling dot nearest to preserve the exact original class values. Here is the key insight. You now have a new, smaller Numpy array, but it still represents the exact same geographic area on the Earth. Because the array has fewer pixels, each individual pixel covers more physical space. This means your original Affine transform is now completely wrong. If you write the new array to disk using the old transform, your image will shrink on the map to cover only a quarter of its original area. You must scale the transform. You do this by taking the original dataset transform and multiplying it by an Affine scaling matrix. You find the scaling ratios by dividing the original width by your new width, and the original height by your new height. Since we halved the dimensions, both of these ratios are exactly two. When you multiply the original transform by these scaling factors, you get a new transform where the pixel dimensions are doubled, but the top-left origin point remains perfectly anchored. Now you have your resampled array and your scaled transform. The final step is writing a valid GeoTIFF to disk. To do this, you need metadata. The safest approach is to copy the profile property from your source dataset. This gives you a dictionary containing the file driver, the coordinate reference system, the data type, and the nodata value. You then update this profile dictionary with your new height, your new width, and your newly calculated transform. With the updated profile ready, you call rasterio dot open. You provide the new filename, set the mode to write, and unpack your profile dictionary as keyword arguments. This creates an empty dataset on disk formatted to your exact specifications. Finally, you call the write method on this new dataset object and pass in your downsampled Numpy array. The array is written to the file, and your new lower-resolution GeoTIFF is ready for the climate model. Remember, modifying raster data is always a two-part contract: if you change the shape of the Numpy array, you must explicitly change the Affine transform to match, or your spatial footprint is broken. That brings us to the end of our series. I encourage you to read the official Rasterio documentation to explore other resampling algorithms and try building these pipelines hands-on. If you want to suggest topics for a future series, visit devstories dot eu. I would like to take a moment to thank you for listening — it helps us a lot. Have a great one!