Season 40 11 Episodes 40 min 2026

GeoPandas

v1.1 — 2026 Edition. An audio course covering GeoPandas 1.1, the powerful Python library for geospatial data. Learn how to handle geometric operations, manipulate spatial data, work with projections, and generate maps.

Geospatial Analysis Data Science

Meet GeoPandas: The Active Geometry Column

An introduction to the core data structures of GeoPandas: the GeoSeries and the GeoDataFrame. Learn how GeoPandas extends the familiar pandas library to handle geospatial objects and understand the critical concept of the active geometry column.

3m 34s

Reading and Writing: Fast I/O with Pyogrio

A deep dive into loading and saving spatial data. Discover how GeoPandas leverages the Pyogrio engine and Apache Arrow to drastically accelerate file I/O, plus how to use spatial and bounding-box filters during loading.

3m 47s

The Shape of the Earth: Projections and CRS

Understand Coordinate Reference Systems (CRS) and why they are vital for spatial accuracy. Learn the difference between geographic and projected coordinates, and how to safely transform your geometries using GeoPandas.

3m 32s

Shaping Space: Buffers, Centroids, and Convex Hulls

Discover how to generate entirely new geometries from existing ones. This episode covers essential constructive methods like calculating centroids, generating buffer zones, and drawing convex hulls.

3m 41s

Spatial Predicates: Intersects, Within, and Contains

Learn how to ask questions about the relationships between different shapes. We explore binary spatial predicates—like intersects, within, and contains—to test how geometries interact in space.

3m 46s

Turbocharge Queries: The R-Tree Spatial Index

Discover the secret engine behind GeoPandas' performance. This episode uncovers how the STR R-tree spatial index uses bounding boxes to drastically reduce the computational cost of spatial queries.

3m 48s

Merging Worlds: Spatial and Nearest Joins

Take data integration to the next level. Learn how to merge two separate GeoDataFrames based entirely on their spatial relationships using spatial joins (`sjoin`) and proximity joins (`sjoin_nearest`).

4m 01s

Set Operations: Creating Geometries with Overlays

Discover how to cut, merge, and split overlapping shapes. This episode covers the powerful `overlay` method, explaining how to compute intersections, unions, and differences to create entirely new geometries.

3m 34s

Spatial GroupBy: Aggregation with Dissolve

Learn how to group spatial data. We explore the `dissolve` method, which acts as a spatial GroupBy, merging smaller geometries into larger ones while seamlessly aggregating their tabular attributes.

3m 14s

Static Mapping: Building Choropleths and Plotting Layers

Turn your spatial data into compelling visuals. This episode covers GeoPandas' integration with matplotlib, teaching you how to build customized choropleth maps, overlay multiple datasets, and handle missing data.

3m 47s

Interactive Exploration and Beyond

Bring your maps to life. We look at the `explore()` method for creating interactive, web-based maps. Finally, we wrap up the GeoPandas journey and prepare you to start building real-world spatial applications.

3m 30s

Episodes

Meet GeoPandas: The Active Geometry Column

3m 34s

Hi, this is Alex from DEV STORIES DOT EU. GeoPandas, episode 1 of 11. Traditional geographic file formats, with the shapefile as the classic example, allow exactly one geometry per record, meaning if you need the border of a county and its center point, you must maintain two separate files. GeoPandas breaks this constraint, letting you store borders, center points, and buffered zones side-by-side in a single table. Meet GeoPandas: The Active Geometry Column. GeoPandas is an extension of the data analysis library pandas. It takes the familiar DataFrame and Series structures and adds spatial capabilities. Under the hood, GeoPandas acts as a bridge between tabular data and spatial geometry. It manages the table structure itself, while it delegates the actual mathematics of points, lines, and polygons to a spatial engine called Shapely. When you ask GeoPandas to calculate an area or find a boundary, it passes the relevant shapes to Shapely, retrieves the mathematical result, and aligns it right back into your data row. The two core data structures making this possible are the GeoSeries and the GeoDataFrame. A GeoSeries is a single column where every row holds a Shapely geometry object. A GeoDataFrame is a subclass of the pandas DataFrame that typically contains at least one GeoSeries column. Because a single GeoDataFrame can hold multiple spatial columns at once, the system needs to know which one to target when you run a spatial command. This is managed through the active geometry column. Here is the key insight. The active geometry column is a functional state, not a specific text string. Users frequently confuse a column literally named 'geometry' with the underlying concept of the active geometry column. By default, when loading data, GeoPandas names the geometry column 'geometry' and assigns it the active status. But your spatial columns can be named absolutely anything. What matters is which one holds the active status, because spatial methods are routed specifically to that active column.
Consider a dataset of local counties. You have a column called county_borders storing the complex polygon outlines of each region. In the same table, you have another column called county_centroids storing a single dot in the middle of each region. If the borders column holds the active status, asking the dataframe to calculate the area will return the full square mileage of the county. Any distance calculation you run will measure from the outer edge of that county border. If you want to measure the distance between the center points of the counties instead, you change the spatial context. You call a method named set_geometry and provide the name of the centroids column. Immediately, county_centroids becomes the active geometry. The borders column remains perfectly intact, holding your polygons, but the system now treats it as just another column of data. If you run a distance calculation now, GeoPandas automatically targets the center points. You switch between spatial contexts instantly without merging tables or managing duplicate datasets. The most useful takeaway is that a GeoDataFrame is a spatial container capable of holding as many geometry layers as you need, but the active geometry dictates which spatial tool is currently engaged. If you enjoy the podcast and want to help support the show, you can search for DevStoriesEU on Patreon. That is all for this one. Thanks for listening, and keep building!
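As a companion to this episode, here is a minimal sketch of switching the active geometry column. The two square "counties" and the column values are made up for illustration; the column names follow the example in the transcript, and EPSG:3857 is only a stand-in for some projected CRS.

```python
import geopandas as gpd
from shapely.geometry import box

# Two hypothetical square "counties" in a projected CRS (illustrative data).
borders = gpd.GeoSeries([box(0, 0, 10, 10), box(20, 0, 30, 10)], crs="EPSG:3857")

counties = gpd.GeoDataFrame(
    {"name": ["A", "B"], "county_borders": borders},
    geometry="county_borders",  # mark this column as the active geometry
)
# Store a second geometry column in the same table.
counties["county_centroids"] = counties.centroid

# Spatial methods are routed to the active column, here the borders.
border_areas = counties.area
edge_gap = counties.distance(counties.geometry.iloc[0])  # measured edge to edge

# Switch the active geometry; the borders column stays in the table.
by_centroid = counties.set_geometry("county_centroids")
active_name = by_centroid.geometry.name
center_gap = by_centroid.distance(by_centroid.geometry.iloc[0])  # centre to centre
```

With the borders active, the second county is 10 units from the first; with the centroids active, the same call reports 20 units, because the calculation now targets the center points.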

Reading and Writing: Fast I/O with Pyogrio

3m 47s

Hi, this is Alex from DEV STORIES DOT EU. GeoPandas, episode 2 of 11. You are loading a five-gigabyte shapefile into memory, only to immediately drop ninety percent of the rows using standard pandas filters. Your RAM spikes, your script crawls, and you are wasting time. You can actually slice that spatial data before it ever leaves the hard drive. Today we are looking at Reading and Writing: Fast I/O with Pyogrio. The primary functions you will use to get data in and out of GeoPandas are read file and to file. Historically, GeoPandas used a library called Fiona under the hood. It worked, but it was slow. Modern GeoPandas defaults to Pyogrio. Pyogrio is a direct, highly optimized interface to GDAL, which is the core C library that powers almost all open-source geospatial software. Pyogrio is fast by default, but you can force it to be much faster. When you call read file or to file, you can pass an argument called use arrow, set to true. This tells Pyogrio to handle the data using Apache Arrow memory structures, which requires the pyarrow package to be installed. Instead of reading the file and translating every single coordinate and attribute into a Python object one by one, the Arrow integration processes the data in large, memory-efficient batches. This avoids most of the per-object Python overhead. If you are dealing with millions of records, turning on the Arrow flag can cut a read or write operation down to a fraction of its original time. But the most effective optimization is simply reading less data. A very common mistake is reading an entire massive dataset into a GeoDataFrame, and then using standard pandas indexing to filter it down afterward. This causes massive, completely unnecessary memory spikes. GeoPandas allows you to filter the data during the read file operation itself, which keeps your memory footprint flat. Let us say you have a file containing every building footprint in New York State, but you only care about the buildings around Coney Island.
If you pass a bounding box tuple to the b box argument in read file, the underlying C engine checks the spatial index of the file on the disk. It completely ignores any building outside that defined box. You get a tiny DataFrame containing only what you need, and your RAM barely registers the operation. If a simple rectangular bounding box is not precise enough for your needs, you can use the mask argument. You pass a specific geometry, like a complex polygon representing exact neighborhood boundaries, directly into read file. The engine evaluates this shape and will only load rows that intersect your polygon. It is slightly more computationally expensive than a basic bounding box, but it is highly accurate. Spatial filtering covers the geography, but you can also filter the standard attributes. The read file function accepts a where parameter. This takes a standard SQL WHERE clause as a string. If you only want buildings tagged as residential, you pass a string stating the type column equals residential. GDAL parses this SQL statement and filters the rows at the C level before passing anything to Python. You can even combine a bounding box and a where clause in the exact same read call to slice both geography and attributes simultaneously. Here is the key insight. Pushing your filters down to the read operation means the heavy lifting happens in highly optimized C code, not in Python memory. The absolute fastest way to process spatial data is to never load the parts you do not need. Thanks for spending a few minutes with me. Until next time, take it easy.

The Shape of the Earth: Projections and CRS

3m 32s

Hi, this is Alex from DEV STORIES DOT EU. GeoPandas, episode 3 of 11. You calculate the distance between two cities and the result is 2.4 instead of 200 miles. Your code ran perfectly, but your answer is completely useless. The culprit is how your spatial data maps to the physical world, which brings us to The Shape of the Earth: Projections and CRS. Every GeoSeries and GeoDataFrame has an attribute called dot C R S. This stores a pyproj C R S object. It is the metadata that tells GeoPandas exactly what the numbers in your geometry column actually represent. Without a coordinate reference system, a coordinate like negative 73 comma 40 is just a point on an infinite, abstract grid. With a C R S, it becomes a specific, known location on Earth. There are two main categories of coordinate systems you need to understand. Geographic coordinate systems represent the earth as a three-dimensional globe. Their coordinates are angles measured from the center of the earth, expressed in degrees of longitude and latitude. A very common example is EPSG 4326, which is the system used by global GPS. Projected coordinate systems, on the other hand, represent the earth flattened out onto a two-dimensional surface. Their coordinates use linear measurements, like meters or US Survey feet. Here is the key insight. The spatial operations under the hood of GeoPandas assume your data exists on a flat Cartesian plane. If your data is in a geographic coordinate system like EPSG 4326 and you ask GeoPandas to calculate the area of the New York boroughs, it will run the math treating degrees as if they were simple grid squares. You will get a result like 0.083. That means 0.083 square degrees, which is a meaningless metric. Degrees change their physical width depending on how far you are from the equator, so you cannot use them to measure absolute distance or area. To do real-world math, you must project your geographic data into a projected coordinate system. 
You do this using a method called to C R S. If you take that New York data and pass EPSG 2263 into the to C R S method, GeoPandas will mathematically transform every single coordinate in your geometry column. EPSG 2263 is a projected system specific to New York that measures distances in feet. Now, when you run that exact same area calculation, you get a result in millions of square feet, which is an actual, usable measurement. There is a common trap here. Developers often try to fix missing projections using the set C R S method instead of the to C R S method. Set C R S is not a conversion tool. It is only used when your spatial data was loaded missing a coordinate reference system entirely. It simply assigns the metadata, telling GeoPandas what the numbers in your geometry column already are, without altering the numbers themselves. You use to C R S when the data already has a valid C R S, and you want to mathematically convert those coordinates into a completely new system. If your area or distance calculations ever look absurdly small, it almost always means you are asking GeoPandas to do flat math on spherical degrees, and you need to project your data. That is it for today. Thanks for listening — go build something cool.
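As a companion to this episode, here is a minimal sketch of the difference between labelling and converting coordinates. The small box near New York City is made up for illustration; the EPSG codes are the ones named in the transcript.

```python
import geopandas as gpd
from shapely.geometry import box

# A hypothetical 0.2 x 0.2 degree box near New York City, loaded without a CRS.
raw = gpd.GeoSeries([box(-74.05, 40.60, -73.85, 40.80)])
area_sq_degrees = raw.area.iloc[0]  # about 0.04 "square degrees", not a real unit

# set_crs only attaches metadata; the coordinate values do not change.
lonlat = raw.set_crs("EPSG:4326")
same_numbers = lonlat.iloc[0].equals(raw.iloc[0])

# to_crs transforms every coordinate into the projected system (US survey feet).
projected = lonlat.to_crs("EPSG:2263")
area_sq_feet = projected.area.iloc[0]  # on the order of billions of square feet
```

The same shape reports a tiny number before projection and a usable measurement after it, which is the symptom and the fix described above.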

Shaping Space: Buffers, Centroids, and Convex Hulls

3m 41s

Hi, this is Alex from DEV STORIES DOT EU. GeoPandas, episode 4 of 11. Sometimes the shape you need to analyze is not the shape you were given. You might import a list of store locations as single points, but what you actually need to understand is the delivery area surrounding them. This requires mathematically molding points and lines into dynamic boundaries using Shaping Space: Buffers, Centroids, and Convex Hulls. These constructive geometric manipulations act like the molding clay of spatial data. You start with raw geometries and use these built-in methods to compute entirely new shapes tailored to your analysis. In GeoPandas, these methods apply element-wise across the entire active GeoSeries. If you have ten thousand shapes, one command generates ten thousand new ones. All of this relies on the Shapely library under the hood to perform the geometric math. The most common manipulation is the buffer method. A buffer creates a polygon representing all points within a given distance from your original shape. Take a scenario where you have point geometries representing coffee shops. You want to define a two-kilometer delivery catchment area for each location. You call the buffer method on your points and pass in your distance. GeoPandas instantly draws a circle around every single shop, transforming your point dataset into a polygon dataset. Pay attention to this bit. Performing a buffer operation requires a projected Coordinate Reference System. If your geometries are in a geographic format like latitude and longitude, a distance value of ten means ten degrees, not ten meters. Because degrees of longitude shrink in physical width as you move away from the equator, buffering unprojected data results in heavily distorted, stretched ovals instead of uniform circles. Always project your data into a metric system before computing distances. Sometimes you need to collapse a shape instead of expanding it. 
If you have complex polygons representing neighborhood zones and you need to reduce them to single points for distance calculations or map labels, you use the centroid attribute. The centroid calculates the mathematical center of mass for each geometry, returning a central point for every polygon or line in your dataset. To isolate the edges of a shape, you use the boundary attribute. If you have a polygon representing a county, the boundary strips away the entire interior. It returns a lower-dimensional geometry, transforming the polygon into a set of lines that just outline the county. If you call boundary on a line, it returns the single points at each end of that line. This is where it gets interesting. You also have the convex hull attribute. Picture a scattered collection of points representing individual animal sightings in a forest, stored together as one multipoint geometry. If you were to stretch a rubber band completely around the outermost points and let it snap tight, the shape that rubber band forms is the convex hull. It returns the smallest possible convex polygon that encloses each geometry. Like the other methods, it works row by row, so if your sightings are stored as separate point rows, you need to combine them into a single geometry first. It is a fast way to compute the overall physical footprint of a dispersed set of coordinates. Constructive geometry means you are never stuck with the spatial boundaries you imported; you can always compute the exact zones your analysis actually demands. Thanks for spending a few minutes with me. Until next time, take it easy.
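As a companion to this episode, here is a minimal sketch of the four constructive operations. All coordinates are made up, and EPSG:3857 is used only as a convenient CRS with metre units; for real distances a local projected CRS would usually be a better choice.

```python
import geopandas as gpd
from shapely.geometry import MultiPoint, Point, box

crs = "EPSG:3857"  # assumption: stand-in for a suitable projected CRS

# Hypothetical coffee shops buffered into 2,000-unit catchment circles.
shops = gpd.GeoSeries([Point(0, 0), Point(10_000, 0)], crs=crs)
catchments = shops.buffer(2_000)  # points become polygons

# A hypothetical 4 x 2 zone reduced to its center and to its outline.
zones = gpd.GeoSeries([box(0, 0, 4, 2)], crs=crs)
centers = zones.centroid   # one point per polygon
outlines = zones.boundary  # one outline per polygon

# Sightings stored as ONE multipoint row, so the hull wraps all of them.
sightings = gpd.GeoSeries(
    [MultiPoint([(0, 0), (4, 0), (4, 3), (0, 3), (2, 1)])], crs=crs
)
footprint = sightings.convex_hull
```

Each call runs element-wise, so `catchments` has one circle per shop, while `footprint` has a single polygon because the sightings were grouped into one row.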

Spatial Predicates: Intersects, Within, and Contains

3m 46s

Hi, this is Alex from DEV STORIES DOT EU. GeoPandas, episode 5 of 11. Asking whether a specific GPS coordinate sits inside a complex, jagged city boundary sounds like a heavy computational math problem involving ray-casting. But in this library, it is handled by a single line of boolean logic. We are talking about Spatial Predicates: Intersects, Within, and Contains. If you use standard pandas, you are already comfortable with conditional filtering. You take a dataframe column, check if its values are greater than ten, and you get back a series of True and False values. You then pass that boolean series right back into your dataframe brackets to filter your rows. Spatial predicates do the exact same thing. Instead of filtering by a number, you filter by spatial truth. You are comparing shapes. A spatial predicate is a method that tests the topological relationship between an entire GeoSeries and one single reference geometry. When you call a predicate method on your active geometry column, GeoPandas evaluates every single row against that reference shape. It calculates the geometry math under the hood and hands you back a standard pandas boolean Series. The most common relationship test is intersects. When you call the intersects method, it returns True if the boundary or the interior of a geometry in your dataset touches or overlaps the reference geometry in any way. If two polygons share a single point on their outer edge, they intersect. If a line segment crosses a polygon, they intersect. It is a broad catch-all for any shared physical space. Next, we have the methods contains and within. People frequently mix these up because they are inverse operations. The logic is strictly directional. If Polygon A is a large city boundary and Point B is a coffee shop, the city contains the coffee shop. The coffee shop is within the city. If Polygon A contains Point B, then Point B is within Polygon A. 
You use the contains method when your GeoSeries holds large bounding geometries, and you are passing in a smaller reference shape. It returns True only if the reference shape is completely enclosed by the geometry in the row. Conversely, you call the within method when your GeoSeries holds the smaller items, like thousands of individual coordinate points, and you want to test if they fall entirely inside a single larger reference polygon. Pay attention to this bit. Because these methods return standard boolean masks, you can chain them directly into your data pipelines. Let us say you have a dataframe of city boroughs. You have applied a buffer to expand their boundaries. Now you want to know which of these expanded, buffered boroughs overlap with the original, unbuffered polygon of Brooklyn. First, you isolate the single original Brooklyn polygon to act as your reference geometry. Then, you take your dataframe of buffered boroughs and call the intersects method on its geometry column, passing in the Brooklyn shape. GeoPandas evaluates every row. It returns True for the buffered boroughs that touch the Brooklyn polygon, and False for the ones that do not. You place that resulting True and False series directly inside your dataframe selection brackets. The dataframe instantly drops the False rows. You are left with a geographically filtered dataset, achieved entirely through standard tabular data operations. By treating physical space relationships as simple true or false questions, you bridge the gap between complex cartography and basic dataframe logic. That is all for this one. Thanks for listening, and keep building!
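As a companion to this episode, here is a minimal sketch of predicates used as boolean masks. The city square, the shop names, and the coordinates are made up for illustration.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# A hypothetical square city and three shops: inside, on the edge, outside.
city = box(0, 0, 10, 10)
shops = gpd.GeoDataFrame(
    {"name": ["inside", "edge", "outside"]},
    geometry=[Point(5, 5), Point(10, 5), Point(20, 20)],
)

# Each predicate returns a plain pandas boolean Series, one value per row.
touches_city = shops.intersects(city)  # any shared point counts, edge included
inside_city = shops.within(city)       # the edge point does not count

# Use the mask exactly like any other pandas filter.
shops_inside = shops[inside_city]

# contains is the inverse direction: the large shapes are in the GeoSeries.
districts = gpd.GeoSeries([box(0, 0, 10, 10), box(20, 20, 30, 30)])
holds_shop = districts.contains(Point(5, 5))
```

The shop sitting exactly on the city edge shows the difference between the two tests: it intersects the city, but it is not within it.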

Turbocharge Queries: The R-Tree Spatial Index

3m 48s

Hi, this is Alex from DEV STORIES DOT EU. GeoPandas, episode 6 of 11. Your geospatial queries are taking hours to run, freezing your machine and draining your patience. You are probably making your CPU check every single point against every single boundary, which scales terribly. The solution to this bottleneck is Turbocharge Queries: The R-Tree Spatial Index. You have one million GPS points and fifty neighborhood polygons. If you test every point against every polygon to see which neighborhood it belongs to, you execute fifty million complex geometric calculations. Finding if a coordinate rests inside an irregular polygon is a heavy math operation. Doing this as a full scan means the work grows with the number of points multiplied by the number of polygons, which destroys performance. A spatial index solves this by acting like a table of contents for your map. GeoPandas uses a specific structure called an R-tree, built using the Sort-Tile-Recursive algorithm. The R stands for rectangle. The tree groups objects logically based on their location. The key insight here is the pre-filter. Instead of checking the exact, jagged borders of a neighborhood polygon, the index draws a simple rectangular box around it. This is called an envelope or a bounding box. Checking if a coordinate falls inside a basic rectangle requires almost zero computational effort. The index immediately discards any point that is not even inside a neighborhood's rectangular bounding box. You instantly narrow down the candidates from one million to maybe a few thousand. This brings up a critical two-step reality of spatial querying. A hit on the spatial index only means the bounding boxes intersect. A point might sit inside the rectangle but just outside the actual curved boundary of the neighborhood. The bounding box hit is just step one. Step two is the exact geometry check on that smaller subset of candidates. You access this engine through the sindex dot query method.
If you pass a single geometry, it returns an array of integer indices corresponding to the geometries in your GeoSeries whose bounding boxes intersect your input. To force the index to handle that crucial second step, pass the argument predicate equal to intersects. GeoPandas will then use the cheap bounding box check to find candidates, and automatically run the expensive exact geometry check on just the survivors. You can also pass an entire array of geometries into the query at once. This array query returns a two-dimensional array of indices, pairing up the matches between your input array and the indexed GeoSeries. The first row gives the index of your input geometry, and the second row gives the index of the matching geometry in the tree. Sometimes you do not need an intersection, you just need proximity. The sindex dot nearest method takes an input geometry and returns the integer position of the closest geometry in the tree, using the same two-row layout of input positions and tree positions. This is highly efficient for snapping a stray coordinate to a road network or finding the closest weather station without calculating the distance to every single station on the continent. Complex spatial operations are mathematically expensive. Never force your CPU to calculate exact geometric intersections when a simple rectangle check can eliminate ninety-nine percent of the candidates upfront. If you find these episodes helpful and want to support the show, you can search for DevStoriesEU on Patreon. That is all for this one. Thanks for listening, and keep building!
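As a companion to this episode, here is a minimal sketch of the two-step query. The triangular "neighborhood" is made up so that one test point lands inside its bounding box but outside the actual shape; the station coordinates are also hypothetical.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# A hypothetical triangular neighborhood; its bounding box is the 10 x 10 square.
neighborhoods = gpd.GeoSeries([Polygon([(0, 0), (10, 0), (0, 10)])])

inside = Point(2, 2)  # inside the triangle
corner = Point(9, 9)  # inside the bounding box, outside the triangle

# Step one only: bounding-box candidates.
bbox_hits = neighborhoods.sindex.query(corner)

# Step one plus step two: candidates are re-checked with the exact predicate.
exact_hits = neighborhoods.sindex.query(corner, predicate="intersects")

# Bulk query: row 0 holds input positions, row 1 holds tree positions.
pairs = neighborhoods.sindex.query(
    gpd.GeoSeries([inside, corner]), predicate="intersects"
)

# Proximity lookup against a separate set of hypothetical stations.
stations = gpd.GeoSeries([Point(0, 0), Point(100, 100)])
nearest_pairs = stations.sindex.nearest(Point(90, 90))
```

The `corner` point is a bounding-box hit but not an exact hit, which is the false positive that the second step exists to remove.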

Merging Worlds: Spatial and Nearest Joins

4m 01s

Hi, this is Alex from DEV STORIES DOT EU. GeoPandas, episode 7 of 11. How do you merge a list of restaurants with a list of census tracts when they do not share any ID columns? You let their coordinate locations do the matching. Today we are talking about Merging Worlds: Spatial and Nearest Joins. A spatial join is the geographic equivalent of a SQL join. Instead of linking tables on a shared string or integer ID, you join them based on their physical relationship in space. You use the sjoin function for this. Consider a scenario where you have a GeoDataFrame of grocery store points and a separate GeoDataFrame of Chicago community polygons. You want to attach the community name to each grocery store row. To do this, you call sjoin on the grocery stores dataframe and pass the communities dataframe as the right-hand argument. The function relies on two main arguments. The first is how. This works exactly like standard database joins. An inner join keeps only the stores that fall inside a community. A left join keeps all your grocery stores, appending null values if a store happens to sit outside the community boundaries. A right join keeps all the communities, duplicating them if they contain multiple stores, and keeping them with empty store data if they have none. The second argument is predicate. This defines the spatial condition that must be met for a match. The default is intersects, meaning the geometries touch or overlap in any way. You can also use within, ensuring a point is strictly inside a polygon, or contains, if you are checking whether a polygon fully encloses a point. Here is the key insight. When performing a spatial join, you are combining two tables with geometry columns, but the resulting GeoDataFrame can only have one active geometry. For inner and left joins, sjoin retains the geometry of the left dataframe and drops the geometry of the right dataframe, while keeping all of its standard attribute columns. A right join retains the geometry of the right dataframe instead.
If you join grocery store points on the left to community polygons on the right, your output will be a table of points that now includes community names. If you actually need the resulting table to contain the polygon shapes, you must reverse your join order. Make the communities the left table and the stores the right table. Sometimes your datasets do not overlap at all. If you want to find the closest subway station to each grocery store, intersections will not help. For this, you use the sjoin nearest function. It operates similarly to a standard spatial join but matches geometries based on proximity rather than intersection. You can pass a distance column argument as a string to sjoin nearest. This tells the function to append a new column to your results containing the exact computed distance between the matched items. You can also provide a max distance threshold. This restricts the join so it only matches if the nearest neighbor is within a specified radius, preventing you from linking a store to a station on the other side of the city simply because it happens to be the closest one available. Under the hood, both of these functions rely heavily on a spatial index. They do not calculate the distance or intersection between every single point and every single polygon, which would take massive amounts of time. They use the index to evaluate bounding boxes first, rapidly discarding geometries that are nowhere near each other before performing the heavy mathematical calculations. Treating physical location as your ultimate foreign key allows you to bridge datasets that otherwise have absolutely nothing in common. That is all for this one. Thanks for listening, and keep building!
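As a companion to this episode, here is a minimal sketch of both join types. The stores, communities, and stations are made-up data, and EPSG:3857 is only a stand-in for a projected CRS so that distances are not computed in degrees.

```python
import geopandas as gpd
from shapely.geometry import Point, box

crs = "EPSG:3857"  # assumption: stand-in for a suitable projected CRS

stores = gpd.GeoDataFrame(
    {"store": ["s1", "s2", "s3"]},
    geometry=[Point(1, 1), Point(15, 5), Point(50, 50)],
    crs=crs,
)
communities = gpd.GeoDataFrame(
    {"community": ["North", "South"]},
    geometry=[box(0, 0, 10, 10), box(10, 0, 20, 10)],
    crs=crs,
)

# Inner join: only stores that fall within a community survive.
inner = gpd.sjoin(stores, communities, how="inner", predicate="within")

# Left join: every store is kept; s3 gets a null community.
left = gpd.sjoin(stores, communities, how="left", predicate="within")

stations = gpd.GeoDataFrame(
    {"station": ["a", "b"]},
    geometry=[Point(0, 0), Point(16, 5)],
    crs=crs,
)

# Nearest join with a distance column and a cutoff radius of 5 units.
nearest = gpd.sjoin_nearest(stores, stations, max_distance=5, distance_col="dist")
```

Both `inner` and `left` still hold point geometries, since the left table's geometry is the one retained; store `s3` drops out of `nearest` because its closest station is farther than the cutoff.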

Set Operations: Creating Geometries with Overlays

3m 34s

Hi, this is Alex from DEV STORIES DOT EU. GeoPandas, episode 8 of 11. A spatial join tells you if a flood zone hits a property. But it leaves the property boundary completely intact in your dataset. If you need a brand new polygon showing exactly which part of the property is underwater, a join is not enough. You need Set Operations, specifically creating geometries with overlays. It is common to confuse spatial joins, using the sjoin method, with overlays. Here is the distinction. A spatial join tests for a relationship, appending attributes from one layer to another while keeping the original geometry exactly the same. An overlay physically slices the geometries. Think of it like a cookie cutter. You have two overlapping sheets of rolled-out dough, representing two different GeoDataFrames. An overlay presses down through both layers, cutting them against each other. The result is a completely new set of puzzle pieces. Wherever those two layers overlapped, the new piece inherits the data attributes of both original layers. You perform this using the overlay method on a GeoDataFrame, passing in a second GeoDataFrame and specifying the operation type using the how parameter. There are five logic types you can pass to the how parameter. The first is intersection. This returns only the exact geographic areas where the two layers overlap. Any part of the geometries that does not overlap is discarded. The second is union. A union returns everything. It merges both layers into a single GeoDataFrame, but it slices the geometries anywhere they cross. You get pieces representing just layer one, pieces representing just layer two, and pieces representing the overlap. The third is symmetric difference. This is the exact opposite of an intersection. It returns the areas that belong to layer one or layer two, but it specifically cuts out and discards the areas where they overlap. The fourth is difference. 
This keeps the geometries of your first layer, but subtracts the areas covered by the second layer. It is like taking a bite out of your first shape using the second geometry as teeth. The fifth is identity. This one is highly specific. It keeps the outer boundaries of your first layer entirely intact, but it splits the inside wherever the second layer intersects it. The overlapping slices carry the attributes of both layers, while the rest of the first layer keeps only its own attributes. To see why this matters, take a city planner evaluating grocery access. They have a layer of neighborhood boundaries and a layer of grocery store points. First, they buffer the grocery points into one-kilometer polygons. If they ran a standard spatial join between the neighborhoods and the buffers, they would merely flag which neighborhoods intersect a buffer. But by calling the overlay method with the how parameter set to intersection, they physically cut the neighborhood shapes using the buffer shapes. The output is a new GeoDataFrame containing only the exact footprint of land within one kilometer of a store, clipped perfectly to the neighborhood borders. Here is the key insight. Overlays calculate new intersection nodes for every crossing boundary, which makes them computationally expensive. Do not use an overlay just to check for overlapping attributes; use it only when you genuinely need to generate entirely new geometric boundaries from the collision of two layers. That is all for this one. Thanks for listening, and keep building!
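As a companion to this episode, here is a minimal sketch of three overlay modes. The data is made up, and a plain square stands in for a buffered store so the resulting areas are easy to check by hand.

```python
import geopandas as gpd
from shapely.geometry import box

crs = "EPSG:3857"  # assumption: stand-in for a suitable projected CRS

# One hypothetical neighborhood and one square standing in for a store buffer.
neighborhoods = gpd.GeoDataFrame({"hood": ["A"]}, geometry=[box(0, 0, 10, 10)], crs=crs)
buffers = gpd.GeoDataFrame({"store": ["s1"]}, geometry=[box(5, 5, 15, 15)], crs=crs)

# Intersection: only the overlapping land, carrying attributes from both layers.
covered = gpd.overlay(neighborhoods, buffers, how="intersection")

# Difference: the neighborhood with the buffered area bitten out.
uncovered = gpd.overlay(neighborhoods, buffers, how="difference")

# Union: everything, sliced wherever the two layers cross.
pieces = gpd.overlay(neighborhoods, buffers, how="union")
```

The neighborhood covers 100 square units, of which 25 fall in `covered` and 75 in `uncovered`; `pieces` holds three rows, one for the overlap and one for the remainder of each layer.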

Spatial GroupBy: Aggregation with Dissolve

3m 14s

Learn how to group spatial data. We explore the `dissolve` method, which acts as a spatial GroupBy, merging smaller geometries into larger ones while seamlessly aggregating their tabular attributes.

Hi, this is Alex from DEV STORIES DOT EU. GeoPandas, episode 9 of 11. You have a spatial dataset of over three thousand counties, but your analysis requires a clean map of the fifty states. You do not need to pause your work and hunt for a new dataset online; you just need a way to melt the internal boundaries away. We solve this using Spatial GroupBy, specifically a method called Dissolve. Data is often delivered at a very granular level. You might have city blocks when you need neighborhoods, or census tracts when you need entire municipalities. Dissolve is your tool for moving up the geographic hierarchy. It takes many smaller shapes, merges them into larger ones based on a shared attribute, and aggregates their underlying data. If you are familiar with standard data analysis, you can think of dissolve as a spatial group-by operation. Let us look at a concrete scenario. You have a spatial dataframe of Nepal broken down into tiny districts. Your dataset has the polygon geometry for each district, a population count, and a text column indicating the larger administrative zone that the district belongs to. You want a map showing just the zones. You call the dissolve method on your spatial dataframe and provide the zone column as your grouping target. GeoPandas then performs two distinct operations simultaneously. First, it handles the spatial geometries. It groups all the district rows by their zone name, takes their geometries, and unions them together into a single feature. The internal district borders are erased, leaving you with one continuous outer boundary for the new zone. Second, it has to decide what to do with the tabular data attached to those shapes, like your population column. Here is the key insight. Many users run the dissolve method, look at their new zone map, and realize the population counts are drastically wrong. This is not a geometry error. 
By default, the dissolve method handles tabular data by simply taking the value from the very first row it encounters in each group. It ignores the rest. To aggregate numeric quantities correctly, you must explicitly use the aggregate function parameter. When you call dissolve grouping by zone, you also pass the aggregate function parameter set to sum. Now, as GeoPandas merges the district shapes physically, it also adds up the district population numbers mathematically. The resulting zone row will contain the correct, aggregated total population. This parameter accepts other standard statistical functions as well. If your granular data contained average household income, you might pass mean instead of sum. If it contained elevation measurements, you might pass max to find the highest point in the newly merged region. Dissolve combines spatial unioning with tabular aggregation into one synchronized step, giving you complete control over the geographic scale of your data. Thanks for listening, happy coding everyone!
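The difference between the default behaviour and an explicit sum is easy to see on a toy table. The sketch below uses four invented square "districts" in two invented zones; the parameter the episode calls the aggregate function is spelled `aggfunc` in the GeoPandas API.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical data: four unit squares in a row, two per zone.
# No CRS is set; coordinates are arbitrary planar units.
districts = gpd.GeoDataFrame(
    {
        "district": ["A", "B", "C", "D"],
        "zone": ["East", "East", "West", "West"],
        "population": [100, 250, 300, 75],
    },
    geometry=[box(2, 0, 3, 1), box(3, 0, 4, 1), box(0, 0, 1, 1), box(1, 0, 2, 1)],
)

# Default aggfunc="first": each zone keeps the values of its first district.
zones_first = districts.dissolve(by="zone")
print(zones_first["population"].to_dict())  # {'East': 100, 'West': 300}

# Keep only the columns that make sense to add up, then sum them per zone.
zones_sum = districts[["zone", "population", "geometry"]].dissolve(
    by="zone", aggfunc="sum"
)
print(zones_sum["population"].to_dict())    # {'East': 350, 'West': 375}

# The internal borders are gone: each zone is now a single polygon.
print(zones_sum.geom_type.tolist())         # ['Polygon', 'Polygon']
```

Dropping the text column before summing is a deliberate choice in this sketch, since adding up district names is not meaningful.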

Static Mapping: Building Choropleths and Plotting Layers

3m 47s

Hi, this is Alex from DEV STORIES DOT EU. GeoPandas, episode 10 of 11. You do not need to export your spatial data to heavy desktop GIS software just to see what it looks like. You can generate publication-ready visuals right inside your Python environment. The solution is Static Mapping: Building Choropleths and Plotting Layers. The core of visualization in GeoPandas is the plot method. Every GeoDataFrame has it. Calling plot with no arguments immediately draws your geometry. It is effortless, but under the hood, it wraps matplotlib. This means you get a map in one line of code, but you still have the full power of matplotlib available to tweak colors, axes, and styles. A basic plot just draws shapes. To make a choropleth map—where shapes are colored based on data values—you use the column argument. You pass the name of the column containing your data. GeoPandas will map the values in that column to a color scale. To help viewers understand that scale, you can add a legend by passing legend equals True. By default, plotting a continuous numerical column creates a smooth color gradient. Often, it is better to group your data into distinct bins. GeoPandas integrates with a library called mapclassify to do this. By adding the scheme argument to your plot method, you can sort your data into classes. For example, setting scheme to quantiles splits your geometries into equal-sized groups based on their values, making spatial patterns much easier to read. Real-world datasets often have holes. If you are building a choropleth and some rows are missing the data value you are plotting, GeoPandas will drop them from the map entirely. This leaves awkward blank spaces. To fix this, you use the missing keywords argument. You pass a dictionary of styling options, like setting the color to light gray, so those shapes still appear on the map without confusing the data. Now, the second piece of this is overlaying multiple datasets. Most useful maps combine multiple layers. 
Suppose you have a base layer of city neighborhoods and you want to overlay a scatter plot of grocery store locations. You do this by sharing a matplotlib axis object. First, you create an axis. Then, you plot your neighborhood polygons, passing that axis into the plot method. Next, you plot your grocery store points, passing that exact same axis. Both datasets draw onto the same canvas. To control which layer sits on top, you use the z order argument. A lower z order goes to the bottom. So, you give your neighborhoods a z order of one, and your grocery stores a z order of two. The points will render cleanly over the polygons. Pay attention to this bit. Sometimes you want your base polygons to be completely transparent so you only see their borders. If you set the face color argument to None, as a Python object without quotes, matplotlib ignores it and applies a default fill color. You have to set face color to the string word none to make it transparent. Alternatively, you can bypass this entirely by calling the boundary property on your GeoDataFrame and plotting that instead. It is a much safer approach for drawing just the outlines. The true strength of mapping in GeoPandas is its seamless escalation from quick data checks to publication-ready graphics. You get an immediate visual with a single method call, but never lose the underlying control of matplotlib when you need to layer complex spatial stories. That is all for this one. Thanks for listening, and keep building!
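The plotting steps from this episode might look like the sketch below. All data is invented (four square neighborhoods, one with a missing value, and two store points), and the spoken argument names map to `column`, `legend`, `missing_kwds`, `zorder`, and `facecolor` in the API. The `scheme` argument is left out so that the example does not depend on mapclassify.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical neighborhoods; one row has no income value.
neighborhoods = gpd.GeoDataFrame(
    {
        "name": ["NW", "NE", "SW", "SE"],
        "median_income": [61.0, np.nan, 42.0, 55.0],
    },
    geometry=[box(0, 1, 1, 2), box(1, 1, 2, 2), box(0, 0, 1, 1), box(1, 0, 2, 1)],
)
# Hypothetical grocery store locations.
stores = gpd.GeoDataFrame(
    {"store": ["A", "B"]},
    geometry=[Point(0.5, 0.5), Point(1.4, 1.6)],
)

fig, (ax_map, ax_outline) = plt.subplots(1, 2, figsize=(10, 5))

# Choropleth: colour by a column, add a legend, draw missing rows in grey.
neighborhoods.plot(
    ax=ax_map,
    column="median_income",
    legend=True,
    edgecolor="black",
    missing_kwds={"color": "lightgrey"},
    zorder=1,
)
# Second layer on the same axes; the higher zorder puts the points on top.
stores.plot(ax=ax_map, color="red", markersize=40, zorder=2)

# Outlines only: facecolor must be the string "none", not the object None.
neighborhoods.plot(ax=ax_outline, facecolor="none", edgecolor="black")
# Alternative with the same visual result:
# neighborhoods.boundary.plot(ax=ax_outline, color="black")

fig.savefig("neighborhood_map.png", dpi=150)
```

Because both calls receive `ax=ax_map`, the polygons and the points end up on one canvas, and the rest of the figure can still be adjusted with ordinary matplotlib calls.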

Interactive Exploration and Beyond

3m 30s

Hi, this is Alex from DEV STORIES DOT EU. GeoPandas, episode 11 of 11. Static maps are great for a final presentation, but when you are deep in the analysis phase, a flat picture is incredibly limiting. You need to zoom into a specific neighborhood, pan across a river, and hover over a shape to inspect the raw data hiding underneath. Interactive Exploration and Beyond is exactly what this episode covers. In previous episodes, we used the plot method to draw static images of our geometries. GeoPandas also provides an alternative called explore. When you call explore on a GeoDataFrame, it generates a fully interactive web map right inside your environment. Under the hood, it uses a Python library called Folium, which itself is built on the popular Leaflet JavaScript mapping library. The beauty of the explore method is that its application programming interface closely mirrors the plot method. You do not have to learn a completely new set of arguments to switch from a static image to an interactive map. Take the New York boroughs dataset. You call explore on the dataset and pass the column argument set to area. The output is immediate and tactile. A map appears showing the city, and you can use your mouse to pan around and scroll to zoom into specific streets. Here is the key insight. When you hover your cursor over Brooklyn or Queens, a tooltip automatically pops up. This tooltip displays the underlying tabular data for that specific geometry, including the exact area value you told it to color the map by. You get visual context and raw numbers at the exact same time. There is a technical detail you must keep in mind: what the explore method actually returns. The plot method returns a matplotlib axes object that renders as a static image. The explore method returns a Folium map object packed with HTML and JavaScript. This is brilliant when you are working inside a Jupyter Notebook, because the browser renders the interactive map flawlessly. 
But if your final goal is to generate a static PDF report or a simple printout, explore is the wrong tool. The interactive web elements will simply not translate to paper. Use explore to investigate your data, and switch back to plot when you need to publish a static document. This brings us to the real power of the GeoPandas framework. Throughout this series, we have seen how it acts as a cohesive glue for the Python spatial ecosystem. It takes pandas and gives it spatial awareness. It delegates the heavy geometric math to Shapely. It relies on Pyogrio to read and write files at high speeds. Finally, it hooks into visualization libraries to give you instant feedback. You can load a massive dataset, filter it, calculate distances, perform spatial joins, and map the results on an interactive canvas, all in just a few lines of Python. The official GeoPandas documentation is excellent, and reading through their getting started guides is the best way to solidify what you have learned. Open a notebook, load some data you care about, and start experimenting. If you have an idea for a topic we should cover, drop by devstories.eu and let us know. The true advantage of this framework is that it strips away the complexity of spatial math, letting you treat geography not as a hurdle, but as just another data type to filter, join, and analyze. That is all for this one. Thanks for listening, and keep building!