Anita Graser’s blog

https://anitagraser.com

  • 29 March 2025: The quest for a fair TimeGPT benchmark
    At the end of yesterday’s TimeGPT for mobility post, we concluded that TimeGPT’s training set probably included a copy of the popular BikeNYC timeseries dataset and that, therefore, we were not looking at a fair comparison. Naturally, it’s hard to find mobility timeseries datasets online that can be publicized but haven’t been widely disseminated and therefore may have slipped past the scrapers of foundation model builders. So I scoured the Austrian open government data portal and came up with a bike-share dataset from Vienna. Dataset: SharedMobility.ai dataset published by Philipp Naderer-Puiu, covering 2019-05-05 to 2019-12-31. Here are eight of the 120 stations in the dataset. I’ve resampled the number of available bicycles to the maximum hourly value and made a cutoff in mid-August (before a larger data collection gap and the less busy autumn and winter seasons). Models: To benchmark TimeGPT, I computed different baseline predictions. I used statsforecast’s HistoricAverage, SeasonalNaive, and AutoARIMA models and computed predictions for horizons of 1 hour, 12 hours, and 24 hours. Here are examples of the 12-hour predictions: We can see how Historic Average is pretty much a straight …
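To make the two simplest baselines concrete, here is a minimal standard-library sketch of what a historic-average and a seasonal-naive forecast do. It is not statsforecast's implementation; the function names, the toy values, and the season length of 4 are assumptions chosen only to keep the example small.

```python
from statistics import mean


def historic_average(history, horizon):
    """Forecast every future step as the mean of all past observations."""
    avg = mean(history)
    return [avg] * horizon


def seasonal_naive(history, horizon, season_length=24):
    """Repeat the last observed season (e.g. the last 24 hourly values)."""
    if len(history) < season_length:
        raise ValueError("history must cover at least one full season")
    last_season = history[-season_length:]
    return [last_season[i % season_length] for i in range(horizon)]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


# Toy series of available bikes (made-up values, not the Vienna data):
# two identical "days" with a season length of 4.
history = [10, 4, 2, 8, 10, 4, 2, 8]
actual_next = [10, 4, 2, 8]

avg_forecast = historic_average(history, horizon=4)
naive_forecast = seasonal_naive(history, horizon=4, season_length=4)

print("historic average MAE:", mean_absolute_error(actual_next, avg_forecast))
print("seasonal naive MAE:", mean_absolute_error(actual_next, naive_forecast))
```

On a strongly periodic series like this toy one, the historic average collapses to a flat line while the seasonal-naive forecast follows the daily pattern, which is why both are useful reference points next to AutoARIMA and TimeGPT.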
  • 28 March 2025: TimeGPT for mobility: Can foundation models outperform classic machine learning models for mobility predictions?
    tldr; Maybe. Preliminary results certainly are impressive. Introduction: Crowd and flow predictions have been very popular topics in mobility data science. Traditional forecasting methods rely on classic machine learning models like ARIMA, later followed by deep learning approaches such as ST-ResNet. More recently, foundation models for timeseries forecasting, such as TimeGPT, Chronos, and Lag-Llama, have been introduced. A key advantage of these models is their ability to generate zero-shot predictions, meaning that they can be applied directly to new tasks without requiring retraining for each scenario. In this post, I want to compare TimeGPT’s performance against traditional approaches for predicting city-wide crowd flows. Experiment setup: The experiment builds on the paper “Deep Spatio-Temporal Residual Networks for Citywide Crowd Flows Prediction” by Zhang et al. (2017). The original repo referenced on the homepage does not exist anymore. Therefore, I forked https://github.com/topazape/ST-ResNet as a starting point. The goals of this experiment are to: Get an impression of how TimeGPT predicts mobility timeseries. Compare TimeGPT to classic machine learning (ML) and deep learning …
  • 10 March 2025: Analyzing GTFS Realtime Data for Public Transport Insights
    In today’s post, we (that is, Gaspard Merten from Université Libre de Bruxelles and yours truly) are going to dive deep into how to analyze public transport data, using both schedule and real-time information. This collaboration has been made possible by the EMERALDS project. Previously, I shared news about GTFS algorithms for Trajectools that add GTFS preprocessing tools (incl. route, segment, and stop layer extraction) to the QGIS Processing toolbox. Today, we’ll discuss how to handle realtime GTFS data and how we approach analytics that combine both data sources. About Realtime GTFS: Many of us have come to rely on real-time public transport updates in apps like Google Maps. These apps are powered by standardized data formats that ensure different systems can communicate. In 2005, Google first introduced GTFS, a format designed to organize transit schedules, stop locations, and other static transit information. Then, in 2011, they introduced GTFS Realtime (GTFS-RT), which added the capability to include live updates on vehicle positions, delays, speeds, and much more. However, as the name suggests, GTFS Realtime is all about live data. This means that while GTFS …
  • 1 March 2025: Trajectools is moving to Codeberg
    The Trajectools repository is migrating from GitHub to Codeberg. The new home for Trajectools is: https://codeberg.org/movingpandas/trajectools The GitHub repo remains as a writable mirror, for now, but the issue tracking is only active on Codeberg. Why the move? I am working on moving my projects to European infrastructure that better aligns with my values. Codeberg is a nonprofit and libre-friendly platform based in Germany. This will ensure that the projects are hosted on infrastructure that prioritizes user privacy and open-source ideals. What does this mean for users? No impact on functionality – Trajectools remains the same great tool for trajectory analysis, available through the recently updated QGIS Plugin Repo. Development continues – I’ll continue actively maintaining and improving the project. (If you want to file feature requests, please note that the issue tracker on the GitHub mirror has been deactivated and issues should be filed on Codeberg instead.) What does this mean for contributors? If you’re contributing to Trajectools, simply update your remotes to the new repository. The GitHub repo continues to accept PRs and the changes are synced between GitHub and Codeb …
  • 31 January 2025: Geocomputation with Python: now in print!
    Today, I’m super excited to share with you the announcement that our open source textbook “Geocomputation with Python” has finally arrived in print and is now available for purchase from Routledge.com, Amazon.com, Amazon.co.uk, and other booksellers. “Geocomputation with Python” (or geocompy for short) covers the entire range of standard GIS operations for both vector and raster data models. Each section and chapter builds on the previous. If you’re just starting out with Python to work with geographic data, we hope that the book will be an excellent place to start. Of course, you can still find the online version of the book at py.geocompx.org. The book is open-source and you can find the code on GitHub. This ensures that the content is reproducible, transparent, and accessible. It also lets you interact with the project by opening issues and submitting pull requests. …
  • 11 January 2025: Trajectools 2.4 release
    In this new release, you will find new algorithms, default output styles, and other usability improvements, in particular for working with public transport schedules in GTFS format, including:
      • Added GTFS algorithms for extracting stops, fixes #43
      • Added default output styles for GTFS stops and segments c600060
      • Added Trajectory splitting at field value changes 286fdbd
      • Added option to add selected fields to output trajectories layer, fixes #53
      • Improved UI of the split by observation gap algorithm, fixes #36
    Note: To use this new version of Trajectools, please upgrade your installation of MovingPandas to >= 0.21.2, e.g. using import pip; pip.main(['install', '--upgrade', 'movingpandas']) or conda install movingpandas==0.21.2 …
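Before upgrading, it can help to check which MovingPandas version is currently installed. The following is a small standard-library sketch, not part of Trajectools; the helper names are hypothetical, and the comparison only looks at the first three numeric components of the version string.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def version_tuple(text):
    """Turn a release string like '0.21.2' into (0, 21, 2) using its first three numbers."""
    return tuple(int(number) for number in re.findall(r"\d+", text)[:3])


def meets_minimum(installed, minimum="0.21.2"):
    """Compare release numbers numerically, so '0.21.10' counts as newer than '0.21.2'."""
    return version_tuple(installed) >= version_tuple(minimum)


def installed_movingpandas_version():
    """Return the installed MovingPandas version string, or None if it is missing."""
    try:
        return version("movingpandas")
    except PackageNotFoundError:
        return None


found = installed_movingpandas_version()
if found is None:
    print("MovingPandas is not installed")
elif meets_minimum(found):
    print(f"MovingPandas {found} is recent enough")
else:
    print(f"MovingPandas {found} is too old, please upgrade")
```

Comparing tuples of integers rather than raw strings avoids the classic pitfall where "0.21.10" sorts before "0.21.2" alphabetically.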
  • 17 December 2024: Urban mobility insights with MovingPandas & CARTO in Snowflake
    Today, I want to point out a blog post over at https://carto.com/blog/urban-mobility-insights-with-movingpandas-carto-in-snowflake written together with my fellow co-authors and EMERALDS project team member Argyrios Kyrgiazos. For the technically inclined, the highlights are the presented Snowflake UDFs that process and transform the trajectory data. For example, here’s a TemporalSplitter UDF:

        CREATE OR REPLACE FUNCTION CARTO_DATABASE.CARTO.TemporalSplitter(geom ARRAY, t ARRAY, mode STRING)
        RETURNS ARRAY
        LANGUAGE PYTHON
        RUNTIME_VERSION = 3.11
        PACKAGES = ('numpy','pandas', 'geopandas','movingpandas', 'shapely')
        HANDLER = 'udf'
        AS $$
        import numpy as np
        import pandas as pd
        import geopandas as gpd
        import movingpandas as mpd
        import shapely
        from shapely.geometry import shape, mapping, Point, Polygon
        from shapely.validation import make_valid
        from datetime import datetime, timedelta

        def udf(geom, t, mode):
            valid_df = pd.DataFrame(geom, columns=['geometry'])
            valid_df['t'] = pd.to_datetime(t)
            valid_df['geometry'] = valid_df['geometry'].apply(lambda x:shapely.wkt.loads(x))
            gdf = gpd.GeoDataFrame(valid_df, geometry='geometry', crs='epsg:4326')
            gdf = gdf.set_index('t')
            traj = mpd.Trajectory(gdf …
  • 23 November 2024: GeoParquet in QGIS – smaller & faster files for the win!
    tldr; Tired of working with large CSV files? Give GeoParquet a try! “Parquet is a powerful column-oriented data format, built from the ground up to as a modern alternative to CSV files.” https://geoparquet.org/ (Geo)Parquet is both smaller and faster than CSV. Additionally, (Geo)Parquet columns are typed. Text, numeric values, dates, geometries retain their data types. GeoParquet also stores CRS information, and support in GIS solutions is growing. I’ll be giving a quick overview using AIS data in GeoPandas 1.0.1 (with pyarrow) and QGIS 3.38 (with GDAL 3.9.2). File size: The example AIS dataset for this demo contains ~10 million rows with 22 columns. I’ve converted the original zipped CSV into GeoPackage and GeoParquet using GeoPandas to illustrate the huge difference in file size: ~470 MB for GeoParquet and zipped CSV, 1.6 GB for CSV, and a whopping 2.6 GB for GeoPackage. Reading performance: Pandas and GeoPandas both support selective reading of files, i.e. we can specify the specific columns to be loaded. This does speed up reading, even from CSV files:

    Format       Whole file    Selected columns
    CSV          27.9 s        13.1 s
    GeoPackage   2 min 12 s    20.2 s
    GeoParquet   7.2 s         4.1 s

    Indeed, reading the whole GeoPackage is get …
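Selective reading, as mentioned in the GeoParquet post above, can be sketched with pandas on a tiny in-memory CSV. The column names and values here are made up for illustration and are not the AIS dataset from the post; the commented GeoParquet lines assume a hypothetical file called ais.parquet.

```python
import io

import pandas as pd

# Tiny made-up AIS-like table standing in for the real ~10 million row file.
csv_text = """mmsi,t,lon,lat,sog,ship_type
219000001,2024-01-01 00:00:00,10.1,57.2,12.3,Cargo
219000002,2024-01-01 00:00:05,10.4,57.0,0.1,Fishing
"""

wanted = ["mmsi", "t", "lon", "lat"]

# Read everything, then read only the columns we actually need.
full = pd.read_csv(io.StringIO(csv_text))
subset = pd.read_csv(io.StringIO(csv_text), usecols=wanted, parse_dates=["t"])

print(list(full.columns))
print(list(subset.columns))

# The same idea for (Geo)Parquet, assuming a hypothetical file "ais.parquet":
# import geopandas as gpd
# gdf = gpd.read_parquet("ais.parquet", columns=["mmsi", "t", "geometry"])
```

With CSV, the parser still has to scan every row of text even when columns are skipped, whereas a column-oriented format can skip unneeded columns on disk, which is consistent with the larger relative gains reported for the binary formats.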
  • 4 November 2024: GeoAI: key developments & insights
    It’s been a while since my post on geo and the AI hype in 2019. Back then, I didn’t use the term “GeoAI”, even though it has certainly been around for a while (including, e.g., with dedicated SIGSPATIAL workshops since 2017). GeoAI isn’t one single thing. It’s an umbrella term, including: “AI for Geo” (using AI methods in Geo, e.g. deep learning for object recognition in remote sensing images) and “Geo for AI” (integrating geographic concepts into AI models, e.g. by building spatially explicit models). [Zhang 2020] [Li et al. 2024] Today’s post is a collection of key GeoAI developments I’m aware of. If I missed anything you are excited about, please let me know here in the comments or over on Mastodon. Background: A week ago, I had the pleasure to attend a “Specialist Meeting” on GeoAI here in Vienna, meeting over 40 researchers from around the world, from master’s students to professor emeritus. Huge props to Jano (Prof. Krzysztof Janowicz) and his team at Uni Wien for bringing this awesome group of people together. The elephant in the room: LLMs. Unsurprisingly, LLMs and the claims they make about geography are a major issue due to the mistakes they make and the biases behind them. A …
  • 6 October 2024: LLM-based spatial analysis assistants for QGIS
    After the initial ChatGPT hype in 2023 (when we saw the first LLM-backed QGIS plugins, e.g. QChatGPT and QGPT Agent), there has been a notable slump in new development. As far as I can tell, none of the early plugins are actively maintained anymore. They were nice tech demos but with limited utility. However, in the last month, I saw two new approaches for combining LLMs with QGIS that I want to share in this post: IntelliGeo plugin: generating PyQGIS scripts or graphical models. At the QGIS User Conference in Bratislava, I had the pleasure to attend the “Large Language Models and GIS” workshop presented by Gustavo Garcia and Zehao Lu from the University of Twente. There, they presented the IntelliGeo Plugin which enables the automatic generation of PyQGIS scripts and graphical models. The workshop was packed. After we installed all dependencies and the plugin, it was exciting to test the graphical model generation capabilities. During the workshop, we used OpenAI’s API but the readme also mentions support for Cohere. I was surprised to learn that even simple graphical models are actually pretty large files. This makes it very challenging to generate and/or modify models because …