The time-based schedule is available here: Schedule
Talks
A python-based pipeline for large-scale land cover information extraction from cloud-based historical topographic map collections
Johannes Uhl University of Colorado Boulder Talk 30 Minutes Monday, June 20: 09:45 - 10:15
We leverage open-source python tools to extract historical land cover information (1890-1950) from the United States Geological Survey (USGS) Historical Topographic Map Collection (HTMC). Based on python packages for image processing, machine learning, and geospatial analysis, we extracted historical road networks, urban areas, and forest extents to enhance our knowledge of historical landscape evolution in the United States.
How have our landscapes evolved over the last 120 years? While data on the contemporary and more recent states of the Earth’s surface is available at high spatial, temporal, and semantic resolution, spatially explicit data on land use / land cover prior to the 1980s is rare. To overcome this data gap, we develop methods to extract retrospective geographic information from historical map archives, such as the United States Geological Survey (USGS) Historical Topographic Map Collection (HTMC), holding almost 200,000 individual maps published between 1884 and 2006.
The HTMC is a cloud-based collection of scanned map images including rich metadata, representing the only country-level geographic data resource created by surveying and manual orthophoto interpretation in the era prior to digital cartography and remote sensing. We developed a python-based pipeline that allows for retrieving scanned image files of relevant map sheets, automatically removes irrelevant data (i.e., map collars), aggregates the color information found in the scanned map sheets to a desired target resolution, and generates composites of large amounts of individual map sheets at the country scale. This aggregation step may consist of simple raster resampling but may also involve the encoding of map tiles using texture and feature descriptors. Such a framework enhances the accessibility of cloud-based, georeferenced image sources for computer vision applications and the analysis in local GIS environments. It also enables the seamless integration of information harvested from historical maps with other geospatial datasets, such as earth observation data or road network data.
We call this pipeline the “USGS HTMC map processor” and make the python code publicly available at https://github.com/johannesuhl/mapprocessor. We employed the map processor for different applications to extract historical urban areas, road networks, and forest extents for large parts of the United States between 1890 and 1950, making use of open-source image processing, geospatial analysis, and machine learning python packages. In these applications we used contemporary open geospatial data (e.g., OpenStreetMap, Global Human Settlement Layer, Landfire vegetation data) as auxiliary data. Based on these methodological contributions, we are able to reveal quantitative insight into the long-term land cover changes in the United States.
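As a rough illustration of the aggregation step described above (not the actual mapprocessor code), the following sketch resamples a scanned, georeferenced map sheet to a coarser target resolution with rasterio; the file name and the 60 m target resolution are assumptions.

```python
# Illustrative sketch (not the actual mapprocessor code): resample a georeferenced
# scanned map sheet to a coarser target resolution with rasterio.
# The input file name and the 60 m target resolution are assumptions.
import rasterio
from rasterio.enums import Resampling

target_res = 60.0  # target ground sampling distance in metres (assumed)

with rasterio.open("usgs_htmc_sheet.tif") as src:
    scale = src.res[0] / target_res            # ratio of source to target resolution
    out_height = int(src.height * scale)
    out_width = int(src.width * scale)

    # Average the scanned colour values into the coarser grid
    data = src.read(
        out_shape=(src.count, out_height, out_width),
        resampling=Resampling.average,
    )

    # Adjust the affine transform so the output stays georeferenced
    transform = src.transform * src.transform.scale(
        src.width / out_width, src.height / out_height
    )
    profile = src.profile.copy()
    profile.update(height=out_height, width=out_width, transform=transform)

with rasterio.open("usgs_htmc_sheet_aggregated.tif", "w", **profile) as dst:
    dst.write(data)
```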
This effort contributes to the availability of fine-grained, spatial-historical open data by providing a reusable pipeline that facilitates information extraction from historical maps and helps digitally preserve the geographic record of our past landscapes.
About Johannes Uhl
Geographic information scientist + Post-doctoral research associate @ University of Colorado, Boulder, USA. Leveraging large spatio-temporal datasets for long-term modeling of land use / land cover changes + evolution of urban systems.
A recap view on the crowdsourced map for checking supermarket wait times worldwide in 2020
Miki Lombardi Online Talk 30 Minutes
In March 2020 the world went into lockdown, and people were lining up at supermarkets and pharmacies to buy basic necessities. Among the many initiatives that emerged, I created a worldwide map that lets anyone check the estimated waiting times at supermarkets, pharmacies and other places of interest. Here is a recap talk on how I accomplished the project, in the hope of inspiring others.
In March 2020 the world went into lockdown, and people were lining up at supermarkets and pharmacies to buy basic necessities.
Among the many initiatives that emerged, I created a worldwide map that lets anyone check the estimated waiting times at supermarkets, pharmacies and other places of interest.
In addition, I gave people the opportunity to check waiting times and correct them through a crowdsourcing mechanism.
To keep development fast and to respond quickly to requests, the project relied on Redis and its geospatial indexes.
The open-source project received more than 2 million visits in its roughly 3 months of life, until June 2020 when the pandemic slowed down.
In this talk we will see the architecture and the problems I encountered and solved with Redis.
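A minimal sketch of the Redis geospatial pattern the talk describes, using redis-py (4.x); the key names, coordinates and wait-time encoding are invented for illustration.

```python
# Minimal sketch of the Redis geospatial pattern described above, using redis-py >= 4.
# Key names, coordinates and the wait-time encoding are assumptions for illustration.
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# GEOADD: index each place of interest by longitude/latitude
r.geoadd("pois", (11.2558, 43.7696, "supermarket:123"))
r.geoadd("pois", (11.2620, 43.7711, "pharmacy:456"))

# Store the crowdsourced wait time (minutes) alongside the geo index
r.hset("waittimes", mapping={"supermarket:123": 15, "pharmacy:456": 5})

# GEORADIUS: find everything within 2 km of the user's position, with distances
nearby = r.georadius("pois", 11.2560, 43.7700, 2, unit="km", withdist=True)
for member, dist in nearby:
    print(member, f"{dist} km away,", r.hget("waittimes", member), "min wait")
```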
About Miki Lombardi
I'm a passion- and creativity-driven developer with a ton of other passions: basketball, cycling and photography are a few of them.
Since 2014 I have lent my expertise in the software development and solution architecture fields in Florence, Italy.
I've been working as a software engineer & solution architect for 8+ years in both product and consultancy companies, taking the best from both worlds.
I'm currently working as a Full Stack Engineer at Growens, where I'm developing and optimizing the email marketing platform in an Agile team (SCRUM).
Accurate visual localization exploiting street-level imagery
Jonas Meyer FHNW Talk 30 Minutes Tuesday, June 21: 11:15 - 11:45
Visual localization is a key technology for applications such as augmented, mixed and virtual reality, as well as robotics and autonomous driving. It addresses the problem of estimating the 6-degree-of-freedom (DoF) camera pose from which a given image or sequence of images was captured relative to a reference scene representation, often in the form of images with known poses. Although much research has been done in this area in recent years, large variations in appearance caused by season, weather, illumination, and man-made changes, as well as large-scale environments, are still challenging for visual localization solutions. To overcome the limitations caused by appearance changes, traditional hand-crafted local image feature descriptors such as SIFT (Lowe, 2004) or SURF (Bay et al., 2008) are replaced by learned feature descriptors such as SuperPoint (DeTone et al., 2018), R2D2 (Revaud et al., 2019), ASLFeat (Luo et al., 2020), DISK (Tyszkiewicz et al., 2020) or ALIKE (Zhao et al., 2022). Hierarchical approaches combining image retrieval and structure-based localization (Sarlin et al., 2019) are developed to deal with large environments, both to keep the required computational resources low and to ensure the uniqueness of the local features.
In this talk we present our visual localization solution based on a huge database of images and associated poses obtained from mobile mapping campaigns. Our visual localization solution follows a hierarchical approach and consists of three steps. Steps one and two are performed in our own Python library, while step three is performed in COLMAP, an open-source structure-from-motion (SfM) software that we bound with PyBind11. To localize a query image, we first select potential reference images in the database with a spatial query of the prior pose of the query image. Then, we perform image retrieval to find the 15 most similar reference images. Second, we extract local features from query and reference images and match them pairwise. Third, we build a COLMAP database with the matches and image metadata and import it to COLMAP. Then we use the bound COLMAP functions to perform a geometric verification of the raw matches, reconstruct the 3D scene from the reference images, and register the query image to the 3D scene by 2D-3D matches. Finally, we obtain the pose and the associated standard deviation of the query image. We tested our approach using accurately georeferenced street-level imagery provided by our project partner iNovitas AG. Experiments in road and railway environments demonstrated a high accuracy potential of sub-decimeter in position and sub-degree in orientation.
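As a generic illustration of the pairwise feature extraction and matching step (step two), the sketch below uses OpenCV's classical ORB detector in place of the learned descriptors and COLMAP tooling actually used in the talk; file names are assumptions.

```python
# Generic illustration of the pairwise feature extraction/matching step (step two).
# The talk uses learned descriptors (e.g. SuperPoint) and COLMAP; this sketch swaps
# in OpenCV's classical ORB detector purely to show the structure. File names assumed.
import cv2

query = cv2.imread("query.jpg", cv2.IMREAD_GRAYSCALE)
reference = cv2.imread("reference.jpg", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=2000)
kp_q, des_q = orb.detectAndCompute(query, None)
kp_r, des_r = orb.detectAndCompute(reference, None)

# Brute-force matching with a ratio test to keep only distinctive matches
matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
raw_matches = matcher.knnMatch(des_q, des_r, k=2)
good = [m for m, n in raw_matches if m.distance < 0.8 * n.distance]

print(f"{len(good)} putative 2D-2D matches to feed into geometric verification")
```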
Using Python capabilities to manage RockEval® Data, interpret them and finally create a kerogen kinetic.
Applying Python in Petroleum System Modeling.
Giboreau Online Talk 30 Minutes
A source rock is a sedimentary layer that contains, or once contained, reactive organic matter able to produce hydrocarbons, and it is one of the key elements of a petroleum system: if it does not exist, there is no chance of finding hydrocarbons in a basin. Geochemistry helps us understand whether the source exists (kerogen), whether it has been active and when, whether large volumes of hydrocarbons have been generated, and of which type.
Extensive work has been done since the '80s to measure, understand and predict source rock behavior (generation potential and reaction kinetics). Several "kerogen types" were described, and nowadays our understanding is good enough to propose geochemical modeling solutions to predict oil and gas occurrences in basins. Among the key data used to populate these models are parameters obtained by pyrolysis of source rock samples (e.g. RockEval® TOC, HI, Tmax, etc.). The first challenge is to correctly interpret RockEval data to identify different kerogen types vs. different maturity levels. The second challenge is to obtain consistent kinetic schemes, direct measurements being more expensive (especially compositional ones) and sometimes unusable.
We will explain our workflow from data collection to kerogen kinetic modeling, with a special focus on standard interpretation graphs and kinetic modeling. All of the shown images and scripts can easily be reproduced by a Python beginner with open-source libraries. The key point is to properly write the pyrolysis routine using several parallel 1st-order chemical reactions (Arrhenius). With the proposed scripts it is possible to QC the data, create kinetics from pyrolysis data (pyrograms and/or RockEval), mix the kinetics of several kerogens, and propose a compositional dressing to predict fluid composition during primary cracking.
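As a rough sketch of the "several parallel 1st-order (Arrhenius) reactions" mentioned above, the following toy routine generates a synthetic pyrogram under a constant heating rate; the activation energies, frequency factor and initial potentials are illustrative values, not a calibrated kerogen kinetic.

```python
# Sketch of a pyrolysis routine: several parallel first-order (Arrhenius) reactions
# under a constant heating rate. Activation energies, frequency factor and initial
# potentials below are illustrative values, not a calibrated kerogen kinetic.
import numpy as np

R = 8.314e-3             # gas constant, kJ/(mol K)
A = 1.0e14               # frequency factor, 1/s (assumed, shared by all reactions)
Ea = np.arange(200.0, 240.0, 4.0)        # activation energies, kJ/mol (assumed)
x0 = np.full(Ea.size, 1.0 / Ea.size)     # initial potential of each reaction

heating_rate = 25.0 / 60.0               # 25 K/min expressed in K/s
T = np.arange(500.0, 900.0, 1.0)         # temperature ramp, K
dt = 1.0 / heating_rate                  # seconds spent per 1 K step

x = x0.copy()
generated = []
for temp in T:
    k = A * np.exp(-Ea / (R * temp))     # rate constant of each reaction
    dx = k * x * dt                      # first-order decay over the step
    dx = np.minimum(dx, x)               # cannot generate more than what is left
    x = x - dx
    generated.append(dx.sum())           # instantaneous generation -> pyrogram

pyrogram = np.array(generated)
print("Tmax (K) of the synthetic pyrogram:", T[pyrogram.argmax()])
```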
About Giboreau
Petroleum Geologist
Cloud for Mars: python tools to planetary data access through EOSC
Carlos Brandt Jacobs University of Bremen Talk 30 Minutes Monday, June 20: 11:45 - 12:15
In this talk, I will walk you through Python-backed geo-planetary data projects developed over the last few years in the realm of the European Open Science Cloud, where the ultimate goal is to bring analysis-ready data to the general public.
Once restricted to science fiction movies, space exploration has become reality, as recent developments in hardware and software technologies have allowed missions to scale up in quality, as well as in data management and analysis. At the core of this improvement we find open-source software and its community, acknowledged for instance through GitHub's Mars 2020 Helicopter Contributor list of software (https://bit.ly/3M971uK), where Python and many other libraries of our everyday work were fundamental to putting a drone in flight on Mars.
Clearly a recipe for societal success, the open-source philosophy has spread to other areas of research and development, with data being the next VIP to enter the club. Currently, in the space sciences, no new projects go live with embargoed data, and great efforts are underway to make the older data archived by NASA and ESA not only public but easily accessible to the general public.
In particular, this talk will present three European projects focused on bringing planetary data, and the tools needed to access them, to non-experts (data scientists, engineers, enthusiasts): NEANIAS (https://www.neanias.eu/), GMAP (https://www.europlanet-gmap.eu/), and VESPA (http://www.europlanet-vespa.eu/). Besides their data mission, these projects share one thing we are much interested in at this meeting: Python and open-source software. I will present the concept of FAIR (Findable, Accessible, Interoperable, Reusable) data, and the (Python) software and surrounding tools the audience can — and should! — use to access out-of-this-world data (Mars!), and also, hopefully, engage participating developers in boosting the exploration of nearby planets and moons.
DL4DS - A python library for empirical downscaling and super-resolution of Earth Science data
Carlos Alberto Gómez Gonzalez Barcelona Supercomputing Center Talk 30 Minutes Monday, June 20: 11:15 - 11:45
In this talk, we present DL4DS, a python package that implements a wide variety of state-of-the-art and novel algorithms for downscaling gridded Earth Science data with deep neural networks. DL4DS has been designed with the goal of providing a general framework for convolutional neural networks with configurable architectures and training procedures to enable benchmark, comparative and ablation studies.
A common task in Earth Sciences (ES) is to infer weather and climate information at local and regional scales from global climate models. Dynamical downscaling requires running expensive numerical models at high resolution and comes with high computational costs. Empirical or statistical downscaling techniques present an alternative approach for learning links between the large- and local-scale climate. A large number of deep neural network-based approaches for statistical downscaling have been proposed in recent years (Vandal et al., 2017; Leinonen et al., 2020; Höhlein et al., 2020; Liu et al., 2020; Harilal et al., 2021), mainly in the form of convolutional neural networks (CNNs). CNNs have been used almost universally for the past few years in computer vision applications, such as object detection, semantic segmentation and super-resolution (SR), and have shown outstanding capabilities in the task of SR and downscaling Earth observation and ES data.
While scientific software tools such as Numpy, Xarray or Jupyter have an essential role in modern ES research workflows, state-of-the-art domain-specific Deep Learning-based algorithms are usually developed as proof-of-concept scripts. For Deep Learning to fulfil its potential to transform ES, the development of Artificial Intelligence- and Deep Learning-powered scientific software must be carried out in a collaborative and robust way following open-source and modern software development principles. In our search for efficient architectures for downscaling ES data, we have developed DL4DS, a library that draws from recent computer vision developments for tasks such as image-to-image translation and SR. DL4DS is implemented in Tensorflow/Keras and contains a large collection of building blocks that abstract and modularize a few core design principles for composing and training CNNs-based empirical downscaling models. DL4DS can be found in this repository: https://github.com/carlgogo/dl4ds
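DL4DS's own building blocks are documented in the repository above; purely as a generic illustration of the kind of CNN block such a framework composes, here is a tiny Keras model that upsamples a coarse single-variable grid by a factor of 4 (shapes and layer sizes are arbitrary).

```python
# Generic illustration (not the DL4DS API): a tiny Keras model of the kind of
# CNN block an empirical-downscaling framework composes, upscaling a coarse
# single-variable grid by a factor of 4. Shapes and layer sizes are arbitrary.
import tensorflow as tf
from tensorflow.keras import layers

scale = 4
coarse = layers.Input(shape=(24, 24, 1))          # e.g. a coarse temperature field

x = layers.Conv2D(64, 3, padding="same", activation="relu")(coarse)
x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
x = layers.UpSampling2D(size=scale, interpolation="bilinear")(x)
x = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
fine = layers.Conv2D(1, 3, padding="same")(x)     # high-resolution estimate

model = tf.keras.Model(coarse, fine)
model.compile(optimizer="adam", loss="mae")
model.summary()
```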
About Carlos Alberto Gómez Gonzalez
I am a Marie-Curie postdoctoral fellow at the Earth Sciences department of the Barcelona Supercomputing Center (BSC-ES) where I lead a research line on Artificial Intelligence applied to climate and atmospheric composition problems. I am interested in the development of machine and deep learning algorithms for topics, such as statistical downscaling, bias correction techniques, data-driven parameterisations, and the study of extreme climate events. Finally, I care about open science, sustainable code and reproducibility.
Display Your Map on a Website Using Geopandas, Folium, Django, and Heroku
Gregory Wallace Talk 30 Minutes
You often work in a notebook using GeoPandas and other libraries. But it is always nice to be able to display your map to customers on a website. We will learn how to do so without additional costs.
We will start with a data preparation example using GeoPandas to prepare a Folium map. Then we will embed the map in a Django application. Finally, we will push the code to Heroku to have a website ready with a map page.
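A minimal sketch of the GeoPandas-to-Folium step and of a bare-bones Django view serving the result; the input file, its columns and the view wiring are assumptions, not the talk's exact code.

```python
# Minimal sketch of the GeoPandas -> Folium step; the file name is an assumption.
import geopandas as gpd
import folium

gdf = gpd.read_file("districts.geojson").to_crs(epsg=4326)

# Centre the map on the data and add the GeoDataFrame as a GeoJSON layer
minx, miny, maxx, maxy = gdf.total_bounds
m = folium.Map(location=[(miny + maxy) / 2, (minx + maxx) / 2], zoom_start=10)
folium.GeoJson(gdf).add_to(m)

# Save the rendered HTML so it can be served or embedded by the web app
m.save("map.html")


# Hypothetical Django view returning the saved map page
from django.http import HttpResponse

def map_view(request):
    with open("map.html") as f:
        return HttpResponse(f.read())
```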
About Gregory Wallace
Data Scientist and responsible for GIS applications at TotalEnergies
EOReader - Remote-sensing opensource python library for optical and SAR sensors
Rémi Braun Talk 30 Minutes
EOReader is an open-source remote-sensing Python library that reads optical and SAR sensors, loading and stacking bands, clouds, DEMs and indices in a sensor-agnostic way.
The main goal of EOReader is to simplify access to a wide offer of remote-sensing data, providing easy-to-understand, sensor-agnostic functions to read, load and stack multiple bands and indices (and even DEM or cloud bands).
For example, one important feature of EOReader is the mapping of optical bands so that they can be accessed sensor-agnostically (i.e. with the RED band, you can access band 4 of Sentinel-2, band 8 of Sentinel-3 OLCI, or band 1 of Pléiades; see here for more information).
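A short sketch following EOReader's documented usage pattern; the product path is a placeholder and exact band names or arguments may vary with the library version.

```python
# Sketch based on EOReader's documented usage pattern; the product path is an
# assumption and exact band names/arguments may vary with the library version.
from eoreader.reader import Reader
from eoreader.bands import RED, NIR, NDVI

# Open any supported optical or SAR product (path is a placeholder)
prod = Reader().open("S2B_MSIL2A_20220620T103629_example.SAFE")

# Load bands and indices sensor-agnostically, as xarray objects
bands = prod.load([RED, NIR, NDVI])

# Or stack them into a single multi-band array written to disk
stack = prod.stack([RED, NIR, NDVI], stack_path="stack.tif")
```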
We also wanted to eliminate tricky steps such as orthorectification or geocoding for SAR and Sentinel-3 data. By automating them, we allow more users to develop applications with remote-sensing data.
Last but not least, we want to provide an open-source library using cutting-edge technology, which is why we use xarray and support Dask.
The Github repo can be found here.
The API documentation can be found here.
EOmaps - Interactive maps in python
Raphael Quast Talk 30 Minutes
EOmaps is a python library to simplify the creation of static and interactive maps. It provides an easy-to-use framework to visualize, analyse and compare (potentially large) geographical datasets.
EOmaps brings together the capabilities of existing libraries (matplotlib, cartopy, geopandas, datashader, xarray ...) to provide an easy-to-use framework for visualization and comparison of earth-observation datasets.
Turn your static maps into powerful interactive data-analysis widgets!
The primary goal of EOmaps is to provide a simplified (and well documented) python interface to analyse both structured and unstructured datasets provided in different coordinate-systems and spatial resolutions.
The key features are:
Fast and meaningful visualization of potentially large datasets
moderately large datasets (<1M) are visualized as projected shapes with actual geographical dimensions
very large datasets (>1M) are visualized with "datashader" to speed up plotting.
Interact with the map via pre-defined (or custom) callback-functions
Identify clicked points, add annotations, markers etc.
Execute custom functions to interact with the underlying dataset
Directly compare your data to other datasets or WebMaps
Switch/swipe between different data-layers
or publicly available WebMap services
Add features and utilities to customize the maps
colorbars, scalebars, North-arrows, annotations, markers, etc.
NaturalEarth features and WebMap layers
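A short sketch based on EOmaps' documented quick-start pattern; the DataFrame columns are assumptions and method names may differ slightly between versions.

```python
# Sketch based on EOmaps' documented quick-start pattern; the DataFrame columns are
# assumptions and exact method names may differ between versions.
import pandas as pd
from eomaps import Maps

df = pd.DataFrame(
    dict(lon=[8.5, 16.4, 2.3], lat=[47.4, 48.2, 48.9], value=[1.0, 2.5, 0.7])
)

m = Maps()                                  # default map (PlateCarree projection)
m.add_feature.preset.coastline()            # NaturalEarth feature
m.set_data(data=df, x="lon", y="lat", crs=4326, parameter="value")
m.plot_map()
m.add_colorbar()

# Attach an interactive callback: annotate the clicked datapoint
m.cb.pick.attach.annotate()
```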
Easily build interactive and static maps with hvPlot
Maxime Liquet Anaconda Online Talk 30 Minutes Monday, June 20: 14:30 - 15:00
hvPlot adapts and extends the .plot() API made popular by Pandas and Xarray to easily create interactive and static maps.
If you have done data analysis with Pandas before, then you have likely encountered the pandas .plot() API that renders static images using Matplotlib. The pandas .plot API has emerged as a de-facto standard for high-level plotting APIs in Python, and is now supported by multiple data libraries (including GeoPandas, Xarray, and Dask) and multiple underlying plotting engines (via pandas-bokeh, cufflinks, hvPlot, etc.) that provide additional power and flexibility. Learning this API allows you to access capabilities provided by a wide variety of underlying tools, with relatively little additional effort.
In this presentation we’ll introduce you to the high-level .plot() API offered by hvPlot, a Python library that is part of the HoloViz ecosystem and built on top of powerful data visualization libraries like HoloViews, GeoViews, Datashader, and Panel. You will see how easy it is to create interactive Bokeh, Matplotlib, or Plotly maps (a minimal sketch follows the list below):
from your Pandas, GeoPandas or Xarray objects,
to help you understand your columnar or multidimensional data fully,
to render very large datasets faithfully to show both patterns and outliers,
to explore temporal datasets with an automatically generated dashboard-like view.
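A minimal sketch of the .plot()-style API described above; the CSV file and its columns are assumptions.

```python
# Minimal sketch of the .plot()-style API described above; the CSV file and its
# columns (longitude, latitude, value) are assumptions.
import pandas as pd
import hvplot.pandas  # noqa: registers the .hvplot accessor on pandas objects

df = pd.read_csv("stations.csv")

# Interactive Bokeh map: points in geographic coordinates over a web-tile basemap
plot = df.hvplot.points(
    x="longitude", y="latitude", c="value",
    geo=True, tiles="OSM", cmap="viridis", frame_width=500,
)
hvplot.save(plot, "stations_map.html")
```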
About Maxime Liquet
Software Engineer at Anaconda, maintaining and improving the open-source data viz libraries of the HoloViz ecosystem. Previously a civil engineer specialized in flood risk assessment, making flood maps with hydraulic simulation software.
Explorative Analysis and Visualization of High-dimensional Remote Sensing Data Using UMAP
Sylvia Schmitz Fraunhofer IOSB, Ettlingen and Karlsruhe Institute of Technology (KIT) Talk 30 Minutes Monday, June 20: 11:45 - 12:15
How can the information content of large and complex remote sensing data sets be easily grasped and evaluated? And in which way is it possible to identify the potential of such data sets with respect to concrete objectives? Methods from the field of manifold learning, for which implementations are available as ready-to-use Python packages, are a good remedy. This talk focuses on the application of the dimension reduction algorithm Uniform Manifold Approximation and Projection (UMAP) for the visualization of high-dimensional remote sensing data.
In the evaluation and interpretation of remote sensing data, one often has to deal with high-dimensional data sets. The high number of dimensions of the observation space results, for example, from high spectral resolution in the case of hyperspectral data, the use of different polarizations and frequency bands in SAR systems, from temporal variation or from the combination of different sensor systems. The use of high-dimensional data offers opportunities and challenges at the same time. On the one hand, the use of many different measurements allows a better characterization of an observed scene. On the other hand, it is extremely difficult for human observers to get an overall view of all relevant information and to interpret their interaction. Furthermore, the high dimension of the observation space also complicates automatic pattern recognition, which is the basis of e.g. classification, regression or clustering. To counteract this problem, a number of dimension reduction methods have been developed, which aim at preserving relevant information in only a few components to the maximum possible extent while removing redundant information. One representative, which belongs to the field of manifold learning, is the UMAP algorithm presented by McInnes et al. in 2018.
In this talk, I will present applications of the UMAP algorithm for the visualization of high-dimensional remote sensing data. The focus is on the exploratory analysis of multi-frequency, full-polarimetric, interferometric SAR data. I will walk through the extraction of physical features that span the high-dimensional observation space, the projection of the data into a 3-dimensional Euclidean space, and the resulting opportunities for visual, comprehensive data analysis. In addition, I will demonstrate how UMAP can be integrated as a pre-processing step for land cover classification, serving to simplify the training process and improve the explanatory power of classification results.
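A minimal sketch of the dimension-reduction step using the umap-learn package; the input matrix stands in for stacked per-pixel features and the hyperparameters are illustrative only.

```python
# Minimal sketch of the UMAP step described above, using the umap-learn package.
# The matrix X stands in for stacked per-pixel feature vectors; its shape and the
# chosen hyperparameters are illustrative only.
import numpy as np
import umap

X = np.random.rand(10_000, 30)   # 10,000 pixels x 30 high-dimensional features

# Project into a 3-dimensional embedding for visual exploration (e.g. as RGB)
reducer = umap.UMAP(n_components=3, n_neighbors=30, min_dist=0.1, random_state=42)
embedding = reducer.fit_transform(X)

# Rescale each embedding axis to [0, 1] so it can be displayed as an RGB composite
rgb = (embedding - embedding.min(axis=0)) / (embedding.max(axis=0) - embedding.min(axis=0))
print(rgb.shape)                 # (10000, 3)
```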
About Sylvia Schmitz
I am a scientific researcher and PhD candidate at Fraunhofer IOSB and the Institute of Photogrammetry and Remote Sensing at KIT. My research focuses on the development of learning-based algorithms for the analysis and interpretation of polarimetric SAR image data in remote sensing applications. In this context, I am particularly interested in topics such as the combination of model-based and data-driven approaches as well as multimodal data fusion and knowledge representation.
The openness of the Python community and their willingness to share their knowledge and provide freely available tools excites me and assists me in making very fast progress in my scientific work.
Finding inequality in public transport mobility patterns for the Metropolitan Region of Buenos Aires
Sebastián Anapolsky, Felipe Gonzalez Online Talk 30 Minutes Tuesday, June 21: 15:30 - 16:00
This paper prepared for the Inter-American Development Bank analyzes the travel patterns of different socioeconomic groups with data from the public transport electronic payment system in the Metropolitan Buenos Aires Region.
The literature on mobility has delved into the close link between mobility patterns and social inequality. Using data from the public transport electronic payment system, census data, and the location of slums, this paper carries out an empirical analysis of travel patterns of different socioeconomic groups that use public transport in the Metropolitan Region of Buenos Aires. To perform this analysis, we process data from the public transport electronic payment system in order to infer destinations and create daily trip chains for each user. Since all transactions are georeferenced at the origin of each trip, we assign users a socioeconomic level constructed with census data considering the location of the first trip of the day (which presumably corresponds to a stop close to their homes). Then, we calculate trip distances, develop origin-destination matrices, and create travel pattern maps for the different socioeconomic groups. We found out that trips from low socioeconomic groups are more dispersed in the territory, while trips from high socioeconomic groups are more concentrated in the central administrative and business area of the city. Lower-income groups tend to have destinations that are less connected, which results in longer trips and more transfers, without using a more efficient multimodal strategy. The modal split is also characterized by greater use of the bus (instead of using metro or train) and, even when transfers occur, there is a greater probability of combining two buses instead of metro and rail. In the case of users who live near vulnerable neighborhoods, we observe that the trips are shorter, more direct, and with fewer transfers than those of users of low socioeconomic status.
About Sebastián Anapolsky
Sebastián Anapolsky is an urban development, mobility, and transport specialist in city and metropolitan projects with more than 15 years of professional experience. He is using data to understand the complexity of cities with expertise in data mining, and geospatial analysis techniques.
About Felipe Gonzalez
Felipe Gonzalez, MS in Applied Urban Science and Informatics (NYU), passionate about sustainable mobility and evidence-based public policy. I've been working with both industry and the public sector, applying data analysis to foster better mass public transportation.
Formulating geospatial data questions to answer big problems
bonny mcclain data & donuts Online Talk 45 Minutes Tuesday, June 21: 16:30 - 17:15
Data storytelling has never been more popular. Immanuel Kant stated the following in 1802: "The history of occurrences at different times, which is true history, is nothing other than a consecutive geography, and thus it is a great limitation on history if one does not know where something happened, or what it was like."
To truly bring history to light we need to bring the right data into the conversation, use the right tools, and be able to hold a tension between what we would like the solutions to be and what limits the actual realization of change. The story I would like to tell, by engaging spatial and non-spatial data, centers around the role disinformation and politics have played in the profound deforestation of the Amazon since 2018. What can we measure? What should we be measuring?
We will use satellite imagery, social media data, media headlines, and non-spatial climate science knowledge to explore how to create robust data questions worthy of sincere and focused analysis and discussion.
Open-source tools and datasets will allow us to test our hypotheses against our existing biases. In the absence of objective experience (we are human, after all), what tools can help measure expectation, time-series comparisons, and inference?
ITS_LIVE: Simplifying access to global glaciological big data
Luis Lopez The National Snow and Ice Data Center (NSIDC) Talk 30 Minutes Monday, June 20: 14:00 - 14:30
ITS_LIVE is a NASA MEaSUREs project that produces low latency, global glacier flow and elevation change datasets. The size and complexity of this data makes its distribution and use a challenge. To address these problems, ITS_LIVE was built for modern cloud-optimized data formats and includes easy-to-use Jupyter notebooks for data access and visualization.
This presentation will show how ITS_LIVE uses the Pangeo stack to generate Zarr data cubes that make quick access possible without the need for a back-end service. We will also delve into our data access strategy and how we leverage the Jupyter ecosystem to create ipyleaflet-based notebooks to visualize big data in a matter of seconds.
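A sketch of the access pattern described above, opening a cloud-hosted Zarr cube lazily with xarray; the URL and variable names are placeholders, not the actual ITS_LIVE store.

```python
# Sketch of the access pattern described above: open a cloud-hosted Zarr cube lazily
# with xarray (s3fs required). The URL below is a placeholder, not the actual
# ITS_LIVE store, and the variable/coordinate names are assumptions.
import xarray as xr

ds = xr.open_dataset(
    "s3://example-bucket/its_live_datacube.zarr",    # placeholder URL
    engine="zarr",
    backend_kwargs={"storage_options": {"anon": True}},
    chunks={},                                       # keep the data lazy (dask-backed)
)

# Pull a single velocity time series at one point without downloading the whole cube
point = ds["v"].sel(x=250_000, y=-2_250_000, method="nearest")
print(point.load())
```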
Improving GNSS position quality with machine learning approaches
Stark, Hans-Jörg Prof. Talk 30 Minutes Monday, June 20: 16:30 - 17:00
Starting from raw GNSS positions with a lot of noise and scattered patterns, machine learning algorithms such as random forests help to improve the classification of GNSS positions into "good" and "bad" ones.
Construction site vehicles regularly send their position to a server, and a central unit is interested in observing the locations or tracking the objects to see where they have been in the past. As a matter of fact, these positions are very often scattered, especially when the vehicles are hardly moving or at a standstill. Nevertheless, they receive signals and send their positions to the central unit. This leads to a fair amount of data that can be significantly reduced. This presentation shows how, with machine learning approaches, GNSS positions can be classified as good or bad, and how the bad ones, which are usually scattered and form clusters, can be grouped and reduced to a centroid. As a result, the amount of data is massively reduced and becomes a lot more manageable for use in other applications that are sensitive to data volume or have limited bandwidth to access and display the data in a map context.
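A hedged sketch of the classification step with scikit-learn's random forest; the feature columns and the labelled training file are assumptions.

```python
# Hedged sketch of the classification step: a random forest labelling GNSS fixes as
# "good" or "bad". The feature columns and the labelled training file are assumptions.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

df = pd.read_csv("gnss_fixes_labelled.csv")   # assumed: one row per GNSS fix
features = ["speed", "hdop", "num_satellites", "distance_to_previous_fix"]

X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["is_good_fix"], test_size=0.25, random_state=0
)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```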
Is the Point inside the Polygon ?
sangarshanan Online Talk 30 Minutes
This talk breaks down the simple Point in Polygon problem by briefly discussing the different algorithms that tackle it and what happens behind the scenes in the tools & libraries that you use to run it with a simple click of a button / one line of code.
Is the Point inside the Polygon? This is a very basic and fundamental question in computational geometry, with use-cases rooted not only in GIS but in graphics, computer vision, path planning & computer-aided design. Tons of algorithms have come up over the years that have attempted to tackle this problem efficiently, beginning with simple ray casting & winding algorithms through to specialized spatial indexes and parallelization to optimize it to the fullest. So let's take a trip through history and discuss how different algorithms tackled this problem. I hope to guide you down the rabbit hole I went through, and maybe when we come out we will have a deeper appreciation for the abstractions and algorithms that help solve this fundamental problem with a single line of code.
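Below is a minimal, purely illustrative implementation of the crossing-number (ray casting) test discussed in the talk, together with the Shapely one-liner.

```python
# A minimal crossing-number (ray casting) test, plus the one-line Shapely equivalent.
from shapely.geometry import Point, Polygon

def point_in_polygon(x, y, polygon):
    """Cast a ray to the right of (x, y) and count edge crossings (odd = inside)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does the edge straddle the horizontal line through y,
        # and is the intersection to the right of x?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside
    return inside

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(point_in_polygon(2, 2, square))            # True
print(point_in_polygon(5, 2, square))            # False

# The "single line of code" version, courtesy of Shapely (GEOS under the hood)
print(Polygon(square).contains(Point(2, 2)))     # True
```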
(2 min) Introduction
To myself and the problem statement
(5 mins) Crossing Number method
Explanation, Pros, Cons, Visualisation & Demo
(5 mins) Winding Number method
Explanation, Pros, Cons, Visualisation & Demo
(5 mins) Tree approaches
Quadtree For the Win
Queries on Postgis
(5 mins) Extending the Problem
Different Types of Polygons (Self-intersecting, Monotone, Convex)
Different Dimensions of Polygons ( 3 Dimensional )
(3 mins) Buffer Overflow & Questions
About sangarshanan
My name is Sangarshanan and I am a Software Engineer from planet Earth. I love making stuff that helps and amuses me in equal measure and standing upside down while holding a banana. When I'm bored you can find me making absurdist memes, yet another spotify playlist or staring straight into the void
Large-scale geospatial and temporal dataset
Donjeta Runjeva Talk 30 Minutes
Spatial and temporal data is in high demand by Data Scientists and crops domain experts, wishing to quickly develop models to help farmers optimize their crops production in a climate friendly way. A way to efficiently create, save and load the data is necessary. Our solution is to store the data in one large multi-dimensional geospatial and temporal dataset.
Currently, the spatial and temporal data is handled separately in the different departments. We wish to create a storage for the data where everyone has easy access to a unique and high-quality dataset.
Other georeferenced data:
- Harvest yield
- Elevation maps
- Weather data
- Annotations like crops falling over (lodging).
In agriculture we have a special case where we’re interested in small areas in relation to the area covered by satellite images. The small areas being the crop fields. The farmers are interested in both field level and pixel level mathematical models to utilize precision farming and explain relations in their fields.
The dataset was created by arranging the geospatial data by UTM zone into a Zarr dataset with a chunk size of 10x10 km and 7 days, at a resolution of 10x10 m and 1 day. Saving and loading was done using Dask arrays and Xarray. The solution enables us to easily add new dimensions to the dataset and navigate to the relevant section automatically, providing easy handling and scaling of the large multi-dimensional dataset.
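A sketch of the storage pattern described above, writing a chunked space-time cube to Zarr with xarray and Dask; the variable name and array contents are assumptions, while the chunking follows the 10x10 km / 7 day scheme on a 10x10 m / 1 day grid.

```python
# Sketch of the storage pattern described above: a chunked space-time cube written to
# Zarr with xarray/Dask. The variable name and array contents are assumptions; the
# chunking follows the 10x10 km / 7 day scheme on a 10x10 m / 1 day grid.
import numpy as np
import pandas as pd
import xarray as xr
import dask.array as da

ny, nx, nt = 1000, 1000, 28           # one 10x10 km tile of a UTM zone, 4 weeks
data = da.random.random((nt, ny, nx), chunks=(7, 1000, 1000))  # 7-day x 10x10 km chunks

ds = xr.Dataset(
    {"ndvi": (("time", "y", "x"), data)},
    coords={
        "time": pd.date_range("2022-05-01", periods=nt, freq="D"),
        "y": np.arange(ny) * 10.0,    # 10 m resolution
        "x": np.arange(nx) * 10.0,
    },
)

ds.to_zarr("utm_zone_32N.zarr", mode="w")          # lazy, chunk-wise write
fields = xr.open_zarr("utm_zone_32N.zarr")         # load back only the chunks you touch
print(fields["ndvi"].sel(time="2022-05-10").mean().compute())
```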
Learning from the “cool kids”: how academic research can benefit from becoming more like open-source
Martin Fleischmann University of Liverpool Talk 30 Minutes Tuesday, June 21: 11:45 - 12:15
While academic research heavily depends on open-source software, the relationship is often one-way. We believe that designing research in close relation to open-source development is beneficial for all parties and present one way of doing that, by turning a research project into a component of the open-source ecosystem.
Academic research often depends on open-source software. Still, researchers do not contribute back that often due to the lack of institutional incentives, time demands, or an imposter syndrome (“my code is too messy”). However, open-source software development doesn’t have to be detached from academic work.
The first step is a decision to make the code open. Then the question is, how?
From an academic standpoint, packing up the functionality into a new package instead of contributing to existing libraries could lead to additional publications that matter in career progress. However, from an open-source standpoint, such an approach widens the ecosystem’s fragmentation and threatens its sustainability. In this talk, we outline why we chose the path benefiting open source over the academic benefits, how we did it and our vision of academic work closely linked to open-source development.
We illustrate this approach in our work on the Urban Grammar AI project, combining aspects of radical openness of the process, making research code available as it is written, and enhancing existing libraries when we need new functionality. It led to significant contributions to the GeoPandas and PySAL ecosystem, a release of one independent package with functionality that didn’t fit elsewhere, and further developments of a canonical Docker container for geographic data science.
About Martin Fleischmann
Martin is a Research Associate in the Geographic Data Science Lab at the University of Liverpool. He is researcher in urban morphology and geographic data science focusing on quantitative analysis and classification of urban form, remote sensing, and AI.
He is the author of momepy, the open source urban morphology measuring toolkit for Python, and a member of the development teams of GeoPandas, the open source Python package for geographic data, and PySAL, the Python library for spatial analysis.
Likeness: a Python toolkit for connecting the social fabric of place to human dynamics
Joe Tuccillo, James Gaboardi Oak Ridge National Laboratory Online Talk 30 Minutes Tuesday, June 21: 16:00 - 16:30
Promoting community resilience requires population data that captures human dynamics with high spatial, temporal, and demographic fidelity. Likeness is a Python toolkit that supports these aims by creating agents informed by hundreds of individual-level attributes from census microdata and producing realistic simulations of their activity spaces.
Likeness is a Python implementation of the UrbanPop framework developed by Oak Ridge National Laboratory [1] that pairs attribute-rich synthetic populations with realistic simulations of human activity spaces at high spatial and temporal resolutions. A core principle of Likeness is the creation of "vivid" synthetic populations, in which individuals are described by hundreds of census microdata attributes covering demographics, socioeconomic status, housing, and health. Vivid synthetic populations are available for any location in the United States thanks to annual data releases from the American Community Survey (ACS) and its Public Use Microdata Sample (PUMS). Likeness consists of three core packages: pymedm (spatial allocation), livelike (population synthesis), and actlike (activity modeling).
The pymedm package is the building block for Likeness, supporting spatial allocation of longform survey responses from the PUMS to granular census geographies (e.g., block groups). pymedm is a Python port of Penalized Maximum-Entropy Dasymetric Modeling (P-MEDM), a method designed to accommodate a high volume of individual-level attributes from the PUMS [2, 3].
The livelike package is a population synthesizer that integrates pymedm/P-MEDM with the Census Microdata API. Interoperability between livelike and the Census Microdata API 1) provides a flexible means of generating model constraints for pymedm/P-MEDM and 2) supports querying and small-area estimation relative to specific population segments.
The actlike package optimally allocates agents from synthetic populations generated by livelike to points of interest (e.g., schools) along transportation networks [4, 5] with an integer program that "sends" individual agents from nighttime to daytime locations. This functionality enables researchers to examine human mobility and activity spaces more deeply.
We demonstrate Likeness by modeling human activity spaces in the Knoxville, TN Metropolitan Statistical Area with 2019 ACS 5-Year Estimates. For daytime locations we use K-12 schools with faculty and enrollment sizes obtained from the Homeland Infrastructure Foundation Level Database.
[1] Aziz, H. M., Nagle, N. N., Morton, A. M., Hilliard, M. R., White, D. A., & Stewart, R. N. (2018). Exploring the impact of walk–bike infrastructure, safety perception, and built-environment on active transportation mode choice: a random parameter model using New York City commuter data. Transportation, 45(5), 1207-1229.
[2] Leyk, S., Nagle, N. N., & Buttenfield, B. P. (2013). Maximum entropy dasymetric modeling for demographic small area estimation. Geographical Analysis, 45(3), 285-306.
[3] Nagle, N. N., Buttenfield, B. P., Leyk, S., & Spielman, S. (2014). Dasymetric modeling and uncertainty. Annals of the Association of American Geographers, 104(1), 80-95.
[4] Boeing, G. (2017). OSMnx: New methods for acquiring, constructing, analyzing, and visualizing complex street networks. Computers, Environment and Urban Systems, 65, 126-139.
[5] Foti, F., Waddell, P., & Luxen, D. (2012, February). A generalized computational framework for accessibility: from the pedestrian to the metropolitan scale. In Proceedings of the 4th TRB Conference on Innovations in Travel Modeling. Transportation Research Board.
Copyright: This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).
Acknowledgement: This material is based upon the work supported by the U.S. Department of Energy under contract no. DE-AC05-00OR22725.
Mapping VIIRS Active Fires in South America
Abraham Coiman Online Talk 30 Minutes Monday, June 20: 16:00 - 16:30
In this talk, we will show you the use of geospatial Python libraries within a Jupyter Notebook to map VIIRS active fires in South America. We will show you a straightforward workflow to visualize interactively VIIRS active fires using Geopandas and Folium libraries. This workflow could be easily customized to map active fires in any country around the world.
In this talk, we will show you the use of geospatial Python libraries within a Jupyter Notebook to map VIIRS (Visible Infrared Imaging Radiometer Suite) active fires in South America. We will show you a straightforward workflow to obtain and map the input geospatial data and to interactively visualize VIIRS active fires using the GeoPandas and Folium libraries. The zoom_start parameter of the Folium library is usually set manually; in this talk we will demonstrate how to calculate the zoom level automatically from the bounding box of a given country. This workflow can easily be customized to map active fires in any country around the world. We expect the audience to have a basic understanding of GIS, remote sensing, and Python programming.
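A hedged sketch of deriving a Folium zoom level from a country's bounding box; the log2-based heuristic is one common approach, not necessarily the exact formula used in the talk, and the input file is an assumption.

```python
# Hedged sketch of deriving a Folium zoom level from a country's bounding box.
# The log2-based heuristic is one common approach, not necessarily the talk's formula.
import math
import geopandas as gpd
import folium

countries = gpd.read_file("south_america_countries.geojson")   # assumed input file
country = countries[countries["name"] == "Peru"]

minx, miny, maxx, maxy = country.total_bounds

# Web-Mercator tiles span 360 deg of longitude at zoom 0 and halve with every zoom
# level, so pick the largest zoom whose tile span still covers the bounding box.
extent = max(maxx - minx, maxy - miny)
zoom = int(math.log2(360.0 / extent))

m = folium.Map(location=[(miny + maxy) / 2, (minx + maxx) / 2], zoom_start=zoom)
```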
Mapping a COVID-19 Testing Needs Index
Krista Mar Jefferson Health Talk 30 Minutes Tuesday, June 21: 11:15 - 11:45
COVID-19 Testing was inequitably distributed at the start of the pandemic, especially to at-risk populations. An index was created and mapped to help the operational and population health team members decide on where to put additional COVID-19 testing centers.
Access to COVID-19 testing was both limited and uneven for underserved populations, particularly in the early course of the COVID-19 pandemic in the United States. Limited testing and reagent materials contributed to this problem as well as the quick response needed to handle the first wave of the pandemic. A COVID-19 Testing Needs Index was developed for use in Philadelphia to assist Jefferson Health in determining which neighborhoods were most in need of access to COVID-19 testing. Internal data, city of Philadelphia COVID-19 data, and US census data were used in the index. Epidemiological measures were used as well as measures to capture disproportionate disease burden on historically underserved communities. Once the neighborhoods most in need of additional testing were highlighted, community partners were identified to collaborate on implementing testing locations.
Python, GeoPandas, and Folium were used to create quick and dynamic visualizations to guide discussions and decisions about where to put additional COVID-19 testing centers.
MovingPandas: general purpose visual movement data analytics
Anita Graser AIT Austrian Institute of Technology Online Talk 30 Minutes Tuesday, June 21: 10:45 - 11:15
This talk presents MovingPandas, a project that aims to provide general purpose tools for analyzing and visualizing movement data.
There are at least 58 different libraries for tracking data analysis in the R ecosystem alone. Is the Python community going down the same path? Many recently introduced Python movement data analysis libraries focus on specific application domains, be it mobility research, movement ecology, or sports analysis. Many of these libraries build on Pandas, fewer on GeoPandas. There is a lot of overlap but also contrasting ideas and approaches. This talk focuses on MovingPandas, a project that aims to provide general purpose tools for analyzing and visualizing movement data. We'll look at how to get started as well as at recent developments.
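A getting-started sketch following the MovingPandas documentation pattern; the tracking points are made up.

```python
# Getting-started sketch following the MovingPandas documentation pattern;
# the tracking points below are made up.
import pandas as pd
import geopandas as gpd
import movingpandas as mpd
from shapely.geometry import Point

df = pd.DataFrame(
    {
        "t": pd.to_datetime(["2022-06-21 10:00", "2022-06-21 10:05", "2022-06-21 10:10"]),
        "trajectory_id": [1, 1, 1],
        "geometry": [Point(16.37, 48.21), Point(16.38, 48.22), Point(16.40, 48.23)],
    }
)
gdf = gpd.GeoDataFrame(df, crs="EPSG:4326").set_index("t")

# Build trajectories and compute basic movement characteristics
tc = mpd.TrajectoryCollection(gdf, "trajectory_id")
traj = tc.trajectories[0]
traj.add_speed(overwrite=True)
print(traj.df[["speed"]].head())
traj.plot(column="speed", legend=True)
```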
About Anita Graser
Anita Graser is a spatial data scientist, open source GIS advocate, and author. She works as a scientist at the AIT Austrian Institute of Technology in Vienna and teaches a class on QGIS and Python at UNIGIS Salzburg. She serves on the QGIS project steering committee and is the developer of MovingPandas, a Python library for analyzing movement data. She has published several books about QGIS, including “Learning QGIS” and “QGIS Map Design”. In 2020, she was awarded the Sol Katz Award 2020 for Geospatial Free and Open Source Software.
Parking Recommendation Service Using RS & GIS
Abouzar Ramezani, Moslem Darvishi Sayyed Jamaleddin Asadabadi University Talk 30 Minutes Monday, June 20: 15:30 - 16:00
In this talk we will implement a location-based service for indicating the nearest parking slot to drivers by analyzing the data obtained from urban cameras. To analyze the camera images, a new convolutional neural network is developed.
The electronic city (or smart city), aiming to provide quality services at reasonable prices to all, tries to increase productivity and improve people's quality of life. Transportation, as one of the infrastructure systems, plays an influential role in providing a platform for creating a smart city, and the use of information and communication systems in transportation can facilitate and pave the way for this. The smart city is a framework built primarily on information and communication technology (ICT) to develop, expand, and promote sustainable urbanization practices. IoT-based cloud computing programs receive, analyze, and manage information instantly to help municipalities, companies, and citizens make better decisions to improve their quality of life. People connect with the ecosystems of a smart city in various ways, such as smartphones, portable intelligent devices, cars, and smart homes. Today, 54% of the world's population lives in cities, a share expected to reach 66% by 2050. In total, with population growth, urbanization will add another 2.5 billion people to cities over the next three decades. Environmental, social and economic sustainability is one of the most critical points in coordinating with this rapid population expansion and the financing of cities. According to the latest research from the Juniper institute, it is predicted that the implementation of intelligent traffic management and smart parking plans will save about 4.2 billion person-hours annually by 2021, and that by 2021 about 2 million smart parking spaces will be created worldwide thanks to improved private and commercial traffic flows. Finding valuable information in the large amounts of data produced by technologies that connect vehicles to the Internet has become a key concern in this modern age.
About Abouzar Ramezani
I am currently an Assistant Professor at the Geomatics Engineering faculty of Sayyed Jamaleddin Asadabadi University. My research interests include Machine Learning, Ubiquitous and Mobile GIS, Location-Based Services, Spatiotemporal Analysis, and Vulnerability Assessment.
About Moslem Darvishi
PhD Candidate in Remote Sensing at Tehran University
Population Demographic Tracking and Estimation Tool: A Simulation-Dashboard for Urban Redevelopments.
Shai Sussman Technion - Israel Institute of Technology Talk 30 Minutes Tuesday, June 21: 15:30 - 16:00
A simulation and online dashboard tool to analyze population changes over time and to predict the population under speculated development scenarios within the built environment.
Population Demographic Tracking and Estimation Tool: A Simulation-Dashboard for Urban Redevelopments.
Shai Sussman Talk 30 Minutes
A simulation and online dashboard tool to analyze population changes over time and to predict the population under speculated development scenarios within the built environment.
Our research proposes a methodology and an online dashboard to analyze population changes as a function of time, in the context of massive redevelopment within the existing urban fabric.
The simulation tool receives a script of speculated urban development over time, current household information, and interaction rules. With these components, the simulation of future urban development suggests the population composition from a micro point of view. The simulation output is a population track-change CSV file that describes the properties of the existing population and those of newcomers at any given time. The CSV file is then fed into an online dashboard that presents the population composition as a function of the urban development. The co-occurrence of environmental and population change adds a level of complexity to the simulation, raising questions of population resilience to environmental change. To cope with these questions, we have created a synthetic population that both statistically resembles the actual population and generates a new population when needed. We defined characteristics of the population such as age, income, and ownership, and finally defined interactions between the population characteristics and the changing environment.
We have developed the tool based on the planning data obtained from the municipality of Bat-Yam, Israel. The tool can be applied online and connected to current planning tools using simple GIS files, spreadsheets, or other database sources.
About Shai Sussman
Eng. Shai Sussman (M.Sc., Technion) is the head of the Smart Social Strategy lab in Haifa, Israel. He is a Civil Engineer in Geoinformation and recently graduated from a master's program in Urban Planning at the Israel Institute of Technology (Technion) under the supervision of Dr. Meirav Aharon Gutman. As part of his thesis, he was involved in a joint venture between Cornell Tech and the Technion, where he developed a time-based spatial microsimulation platform that simulates demographic behavior in an urban renewal context. Today, as part of his current role, Shai is the main researcher and developer at the Smart Social Strategy Lab, which involves the development of the digital twin and supplementary applications, as well as the interactive immersive theater where decision makers can come, touch, and see the data from the digital twin.
Predicting urban heat islands in Calgary
Sumedh Ghatage, Anand S Gramener Online Talk 45 Minutes Tuesday, June 21: 16:30 - 17:15
Leveraging geospatial Python libraries to understand and predict Land Surface Temperature in urban areas considering historical openly available satellite images and urban morphological data.
Predicting urban heat islands in Calgary
Sumedh Ghatage, Anand S Online Talk 45 Minutes
Leveraging geospatial Python libraries to understand and predict Land Surface Temperature in urban areas considering historical openly available satellite images and urban morphological data.
Dealing with extreme heatwaves can be challenging, and it has become necessary to understand land surface temperature (LST) change and its driving factors in order to reduce the impact and achieve more sustainable planning methods for city growth.
This module will help you understand how to calculate LST from openly available satellite imagery and merge it with urban morphological factors (like building height, building count, FSI, building block coverage, etc.) to predict the temperature trend and mitigate the impact.
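As a hedged illustration of the first step of such an LST workflow (not the exact code shown in the talk), the snippet below converts Landsat 8 thermal-band digital numbers to brightness temperature. The rescaling and thermal constants are typical band-10 values and would normally be read from the scene's MTL metadata; an emissivity correction would follow to obtain actual LST.

    import numpy as np

    # Typical Landsat 8 TIRS band-10 constants (read them from the MTL file in practice)
    ML, AL = 3.342e-4, 0.1            # radiance multiplicative / additive rescaling
    K1, K2 = 774.8853, 1321.0789      # thermal conversion constants

    dn = np.array([[21500, 22100], [22800, 23400]], dtype=float)  # toy digital numbers
    radiance = ML * dn + AL                                       # TOA spectral radiance
    bt_celsius = K2 / np.log(K1 / radiance + 1.0) - 273.15        # brightness temperature

    print(bt_celsius.round(2))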
We will demonstrate an end-to-end methodology using geospatial Python libraries to understand the use of spatial regression methods, taking into account variation over time. This talk will also shed light on:
Getting large imagery datasets into a DL-friendly format
Spatial aggregation of different variables
Understanding correlation between variables for feature engineering
Application & comparison of different regression methods on the same data
Future scope
We'll also showcase the geo-visualization portal we created and the technologies used, how you can use Python to convert large GeoJSON output to light vector tiles, and create a seamless experience for the user through an intuitive front-end.
About Sumedh Ghatage
Sumedh Ghatage is an Associate Lead Data Scientist (Geospatial) at Gramener. He has worked on various smart city initiatives including sectors such as environmental resource management, location intelligence, and disaster management projects.
He drives a community called “Geospatial Awareness Hub” which helps enable Education, Employment, and Business to foster the growth and awareness of the Geospatial Industry.
About Anand S
Anand is a co-founder of Gramener, a data science company. He leads a team that automates insights from data and narrates these as visual data stories. He is recognized as one of India's top 10 data scientists, and is a regular TEDx speaker.
Anand is a gold medalist at IIM Bangalore and an alumnus of IIT Madras, London Business School, IBM, Infosys, Lehman Brothers, and BCG.
More importantly, he has hand-transcribed every Calvin & Hobbes strip ever and dreams of watching every film on the IMDb Top 250.
He blogs at https://s-anand.net. His talks are at https://bit.ly/anandtalks
Python static type checking with mypy
Michal Gutowski Threatray Talk 30 Minutes Monday, June 20: 10:45 - 11:15
Add another layer of safety to your codebase with static typing.
Python static type checking with mypy
Michal Gutowski Talk 30 Minutes
Add another layer of safety to your codebase with static typing.
Python is a dynamic language, which gives its users a lot of power. But, as we know, with great power comes great responsibility. Fortunately for us, we can incorporate a tool into our workflows: Mypy. Mypy allows developers to add a layer of safety to their programs in the form of static type annotations.
During my talk, I will show you why PEP 484 type annotations can be helpful, how to check them, and how to gradually introduce them into your codebase. Additionally, I want to show some tricks to make the dynamic parts safer. Finally, I will also show situations where Mypy falls short and how to avoid them.
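A minimal example of the kind of annotation Mypy checks; saving it as demo.py (a placeholder name) and running "mypy demo.py" flags the last assignment, catching the unhandled None before runtime.

    # demo.py - a small example of PEP 484 annotations that mypy can check
    from typing import Optional

    def parse_port(value: str) -> Optional[int]:
        """Return the port number, or None if the string is not numeric."""
        if value.isdigit():
            return int(value)
        return None

    # mypy reports: Incompatible types in assignment
    # (expression has type "Optional[int]", variable has type "int")
    port: int = parse_port("8080")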
About Michal Gutowski
My professional coding career started in 2014 in Ruby. Since that time, I have managed to work in Scala, Java, and Python. I am interested in security and open-source intelligence. In addition, I love to have correct and well-tested software running on production.
QGreenland: automated QGIS data package creation for Greenland
Trey Stafford National Snow and Ice Data Center (NSIDC) Online Talk 30 Minutes Monday, June 20: 14:00 - 14:30
QGreenland is a free and open-source Greenland-focused QGIS environment for data analysis and visualization. Built using Python and open source geospatial tools like GDAL, QGreenland's software offers automated, reproducible builds to ensure consistent outputs with metadata and provenance for all included datasets.
QGreenland: automated QGIS data package creation for Greenland
Trey Stafford Online Talk 30 Minutes
QGreenland is a free and open-source Greenland-focused QGIS environment for data analysis and visualization. Built using Python and open source geospatial tools like GDAL, QGreenland's software offers automated, reproducible builds to ensure consistent outputs with metadata and provenance for all included datasets.
QGreenland is a free and open-source Greenland-focused QGIS environment for data analysis and visualization. Originally inspired by Quantarctica, a similar QGIS data environment focused on Antarctica, QGreenland formalizes the process of data package creation with a framework built using Python and open source geospatial tools like GDAL. QGreenland's software (available on GitHub) automates the process of fetching data from a variety of public sources, transforming those data into optimized formats and projections, and producing a downloadable package with documentation and metadata. Users then access the data via a single project file. Because QGreenland's tooling is open source and provenance is maintained for all data operations, QGreenland provides reproducible and transparent outputs suitable for education, field use, and scientific research. Moreover, because the QGreenland data package includes a pre-configured QGIS project file with data organized by discipline (e.g., "Glaciology", "Geophysics", etc.), QGreenland is accessible, and is further supported with extensive how-to, tutorial, and curriculum resources.
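The snippet below is not QGreenland's own pipeline code, just a sketch of the kind of GDAL transformation step such a pipeline automates; the file names and target resolution are placeholders, and EPSG:3413 is the polar stereographic projection commonly used for Greenland.

    from osgeo import gdal

    gdal.UseExceptions()

    # Reproject a source raster to a Greenland-friendly CRS and write a
    # compressed, tiled GeoTIFF. Paths and resolution are placeholders.
    gdal.Warp(
        "output_epsg3413.tif",
        "input_dataset.tif",
        dstSRS="EPSG:3413",              # NSIDC Sea Ice Polar Stereographic North
        xRes=1000, yRes=1000,            # target resolution in metres
        resampleAlg="bilinear",
        creationOptions=["COMPRESS=DEFLATE", "TILED=YES"],
    )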
This talk will discuss the idea behind QGreenland, the development process, challenges encountered along the way, lessons-learned, and what the future holds for the project.
Quantifying agricultural soil carbon stocks at continent scale using a modern Python big data and ML framework
David Schurman, Julia Maddalena Cloud Agronomics Online Talk 30 Minutes Monday, June 20: 16:00 - 16:30
We have developed a Python-based modeling and spatial prediction framework to accurately estimate soil carbon content at large geographic scales. These methods can provide cost-effective carbon accounting for regenerative farming operations.
Quantifying agricultural soil carbon stocks at continent scale using a modern Python big data and ML framework
David Schurman, Julia Maddalena Online Talk 30 Minutes
We have developed a Python-based modeling and spatial prediction framework to accurately estimate soil carbon content at large geographic scales. These methods can provide cost-effective carbon accounting for regenerative farming operations.
Adopting regenerative management of farmland has the potential to sequester large amounts of atmospheric carbon in the soil while generating income for growers through the sale of carbon credits. However, a major obstacle is the absence of an accurate, scalable method for verifying the amount of carbon sequestered. Traditional verification methods rely solely on the collection of in-situ soil samples or farm-specific data which is expensive and labor-intensive.
To address this, we have developed a spatial modeling framework which leverages large quantities of geophysical and remotely sensed data to accurately estimate soil carbon stocks across the United States. We leverage a geospatial asset catalog and distributed Python pipeline to curate 20+ data sources, including optical remote sensing, climatological, and geological features which are relevant to soil carbon. Data are joined with thousands of in-situ soil samples collected from 12 US states, and models are trained using Python’s xgboost library. Predictions are converted to the quantity of interest, soil carbon stock, on a site-by-site basis using a Python-based pipeline orchestrated via Apache Airflow. Once validated, the model can be applied with no further in-situ sampling, enabling it to be scaled to large geographic areas. As a result, we have used this framework to create the highest-resolution soil organic carbon map yet produced spanning the entire continental USA. This talk will discuss the geospatial data architecture, predictive model architecture, and scaled deployment of the model, as well as background on the emerging field of carbon sequestration estimation.
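A hedged sketch of the model-training step only, using xgboost's scikit-learn interface on made-up covariates rather than the authors' actual feature set.

    import numpy as np
    import xgboost as xgb
    from sklearn.model_selection import train_test_split

    # Hypothetical feature matrix: rows are soil sample sites, columns are
    # remote-sensing / climate / geology covariates; y is measured soil carbon.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 20))
    y = rng.normal(loc=2.0, scale=0.5, size=500)

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = xgb.XGBRegressor(n_estimators=300, max_depth=6, learning_rate=0.05)
    model.fit(X_train, y_train)
    print("R^2 on held-out sites:", model.score(X_test, y_test))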
About David Schurman
David is a computational scientist, information designer, and entrepreneur specializing in Earth and planetary sciences applications. He is the co-founder and Chief Innovation officer of Cloud Agronomics, a remote sensing and climate-tech company developing a global measurement platform for soil-based carbon credits. Prior to Cloud Agronomics, David was a software lead at NASA’s Jet Propulsion Laboratory, where he led incubation for a data science and UI toolset currently being used to search for life on Mars by the Perseverance Rover science team.
Road risk analysis with Google Cloud serverless tools
Based on real work done for an Oil & Gas company: a modern data pipeline for analyzing vehicle tracking and monitoring data, built using Google Cloud serverless tools for ETL, cleaning, storage, and data visualization.
Road risk analysis with Google Cloud serverless tools
Nicola Guglielmi Online Talk 30 Minutes
Based on real work done for an Oil & Gas company: a modern data pipeline for analyzing vehicle tracking and monitoring data, built using Google Cloud serverless tools for ETL, cleaning, storage, and data visualization.
Based on real work for one of the biggest Oil & Gas companies: a road risk mitigation analysis in compliance with the OGP land transport standard.
We will see how to get data from onboard devices with some Python code snippets, upload the data to cloud storage (GCS), clean and prepare the data with Dataprep and Dataflow, store the data in BigQuery for on-the-fly analytics, visualize the data with the online GIS visualizer, and create dynamic dashboards in Google Data Studio.
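As a small taste of the upload step, here is a sketch using the google-cloud-storage client; the bucket and object names are placeholders.

    from google.cloud import storage

    # Push a raw tracking-data export into a GCS bucket so the downstream
    # Dataprep / Dataflow / BigQuery steps can pick it up.
    client = storage.Client()
    bucket = client.bucket("vehicle-tracking-raw")        # placeholder bucket name
    blob = bucket.blob("2022/06/20/onboard_export.csv")   # placeholder object path
    blob.upload_from_filename("onboard_export.csv")
    print(f"Uploaded to gs://{bucket.name}/{blob.name}")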
Python web frameworks, like FastAPI, Flask, Quart, Tornado, and Twisted, are important for writing high-performance web applications and for their contributions to the web ecosystem. However, even they exhibit some bottlenecks, either due to their synchronous nature or due to the overhead of the Python runtime. Most of them cannot speed themselves up because of their dependence on *SGIs. This is where Robyn comes in. Robyn tries to achieve near-native Rust throughput along with the benefit of writing code in Python. In this talk, we will learn more about Robyn, from what Robyn is to how it is being developed.
Robyn: An async web framework written in Rust
Sanskar Talk 30 Minutes
Python web frameworks, like FastAPI, Flask, Quart, Tornado, and Twisted, are important for writing high-performance web applications and for their contributions to the web ecosystem. However, even they exhibit some bottlenecks, either due to their synchronous nature or due to the overhead of the Python runtime. Most of them cannot speed themselves up because of their dependence on *SGIs. This is where Robyn comes in. Robyn tries to achieve near-native Rust throughput along with the benefit of writing code in Python. In this talk, we will learn more about Robyn, from what Robyn is to how it is being developed.
With the effort put in at every Python version to increase the runtime performance, we know that throughput efficiency is one of the top priority items in the Python ecosystem.
Inspired by the extensibility and ease of use of the Python Web ecosystem and the increased focus on performance, Robyn was born.
Robyn is one of the fastest, if not the fastest, Python web frameworks in the current Python web ecosystem. With a runtime written in Rust, Robyn tries to achieve near-native Rust performance while still offering the ease of writing Python code.
This talk will cover the reasons why Robyn was created, the technical decisions behind it, the performance gained by using the Rust runtime, how to use Robyn to develop web apps, and most importantly, how the community is helping Robyn grow!
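A minimal Robyn app along the lines of the project's examples; the route and port are chosen arbitrarily.

    from robyn import Robyn

    app = Robyn(__file__)

    @app.get("/")
    async def index(request):
        # A simple handler returning a plain-text response
        return "Hello from Robyn!"

    app.start(port=8080)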
Finally, I will be sharing the future plans of Robyn and would love to get feedback from the developers to see what they would like to see in it.
About Sanskar
Sanskar is a Software Engineer at Bloomberg, London during the day and a FOSS maintainer during the night. He is the author and maintainer of Robyn, which is one of the faster web frameworks in the Python ecosystem.
Sanskar loves attending, speaking and organising conferences and has been an active part of various Open Source and Python conferences.
Similarity Metrics from Vegetation Index Time Series
In this talk we present vegetation index time series similarity metrics for crop type classification. The use of such metrics instead of the raw satellite observations not only reduces inter-class confusion, but also helps to reduce the dimensionality and thus, ensure model transferability.
Similarity Metrics from Vegetation Index Time Series
Dimo Dimov Talk 30 Minutes
In this talk we present vegetation index time series similarity metrics for crop type classification. The use of such metrics instead of the raw satellite observations not only reduces inter-class confusion, but also helps to reduce the dimensionality and thus, ensure model transferability.
One of the greatest challenges in many machine learning applications is to build robust classification models that reduce calibration efforts, ensure model transferability to unseen data distributions, and handle high semantic resolution datasets with a relatively large number of target classes. In this talk we present an operational domain adaptation framework for crop type verification from Remote Sensing data for Integrated Administration and Control Systems and Farm Management Systems. It is part of the ag|knowledge crop monitoring platform and demonstrates a more robust spatio-temporal model transferability than conventional supervised crop type classification approaches, which use multitemporal and multispectral features to classify the specific characteristics of crop phenology patterns. The proposed method is based on a machine learning model that has been trained on particular similarity metrics between all target crop types. The metrics are derived by quantifying the similarity of vegetation index time series between the observed parcel and the aggregated time series of reference objects for each parcel and crop type. This method achieves higher classification results than a model trained on the pure Remote Sensing time series data, as the algorithm does not learn the temporal vegetation index patterns but, instead, the adversarial characteristics and the differences between each crop type. Moreover, it also reduces dimensionality, as the time series are summarized through the respective similarity metrics. We compare both approaches through different classification algorithms. Overall, the achieved classification accuracy for more than 67 labeled crop and agricultural management types is above 80%.
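The framework's exact metrics are not reproduced here; as a hedged illustration, the sketch below compares a parcel's NDVI time series against aggregated crop-type reference curves with two simple measures, using made-up values.

    import numpy as np

    # Toy NDVI time series (e.g., one value per satellite acquisition)
    parcel = np.array([0.20, 0.35, 0.55, 0.70, 0.60, 0.40, 0.25])
    references = {                               # aggregated reference curves per crop type
        "winter wheat": np.array([0.25, 0.40, 0.60, 0.72, 0.55, 0.35, 0.22]),
        "maize":        np.array([0.15, 0.18, 0.30, 0.55, 0.70, 0.65, 0.45]),
    }

    # Two simple similarity/distance measures per crop type; in a classifier such
    # metrics (one pair per candidate crop) replace the raw time series as features.
    for crop, ref in references.items():
        corr = np.corrcoef(parcel, ref)[0, 1]
        dist = np.linalg.norm(parcel - ref)
        print(f"{crop:12s}  correlation={corr:.3f}  euclidean={dist:.3f}")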
State of GeoPandas ecosystem
Joris Van den Bossche, Martin Fleischmann Talk 30 Minutes Tuesday, June 21: 09:15 - 09:45
GeoPandas is one of the core packages in the Python ecosystem to work with geospatial vector data. This talk will give an overview of recent developments in GeoPandas and the broader ecosystem.
State of GeoPandas ecosystem
Joris Van den Bossche, Martin Fleischmann Talk 30 Minutes
GeoPandas is one of the core packages in the Python ecosystem to work with geospatial vector data. This talk will give an overview of recent developments in GeoPandas and the broader ecosystem.
GeoPandas is one of the core packages in the Python ecosystem to work with geospatial vector data. By combining the power of several open source geo tools (GEOS/Shapely, GDAL/fiona, PROJ/pyproj) and extending the pandas data analysis library to work with geographic objects, it is designed to make working with geospatial data in Python easier.
This talk will give an overview of recent developments in the GeoPandas community, both in the project itself and in the broader ecosystem of packages on which GeoPandas depends or that extend GeoPandas. We will highlight some changes and new features in recent GeoPandas versions, such as the new interactive explore() visualisation method, improvements in joining based on proximity, better IO options for PostGIS and Apache Parquet and Feather files, and others. But some of the important improvements coming to GeoPandas are happening in other packages. The Shapely 2.0 release is nearing completion and will provide fast vectorized versions of all its geospatial functionalities. This will help to substantially improve the performance of GeoPandas. In the area of reading and writing traditional GIS files using GDAL, the pyogrio package is being developed to provide a speed-up on that front. Another new project is dask-geopandas, which is merging the geospatial capabilities of GeoPandas with the scalability of Dask. This way, we can achieve parallel and distributed geospatial operations.
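A few of the features mentioned above, sketched with placeholder file names and assuming a recent GeoPandas release.

    import geopandas as gpd

    cities = gpd.read_file("cities.gpkg")        # placeholder input files
    roads = gpd.read_file("roads.gpkg")

    cities.explore()                             # interactive Leaflet-based map
    nearest = gpd.sjoin_nearest(cities, roads, distance_col="dist_to_road")  # proximity join
    nearest.to_parquet("cities_with_roads.parquet")   # columnar IO via Apache Parquet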
About Joris Van den Bossche
I am a core contributor to Pandas and Apache Arrow and maintainer of GeoPandas. I did a PhD at Ghent University and VITO in air quality research, worked at the Paris-Saclay Center for Data Science, and currently, I am a freelance software developer and teacher and working for Voltron Data.
About Martin Fleischmann
Martin is a Research Associate in the Geographic Data Science Lab at the University of Liverpool. He is a researcher in urban morphology and geographic data science focusing on quantitative analysis and classification of urban form, remote sensing, and AI.
He is the author of momepy, the open source urban morphology measuring toolkit for Python, and a member of the development teams of GeoPandas, the open source Python package for geographic data, and PySAL, the Python library for spatial analysis.
Teaching GeoPython in a Geo-information Master Programme
Barend Köbben University Twente - ITC Online Talk 30 Minutes Monday, June 20: 15:30 - 16:00
At ITC-University Twente we have been educating geo-professionals for more than 70 years. Nowadays, we try to create problem solvers, not button-pushers, so we teach them GeoComputing using Python. In this talk we explain how.
Teaching GeoPython in a Geo-information Master Programme
Barend Köbben Online Talk 30 Minutes
At ITC-University Twente we have been educating geo-professionals for more than 70 years. Nowadays, we try to create problem solvers, not button-pushers, so we teach them GeoComputing using Python. In this talk we explain how.
At ITC, the Faculty of Geo–Information Science and Earth Observation of the University of Twente, we have been educating geo-professionals for more than 70 years. We believe our students need a thorough knowledge of the processes required to solve geospatial problems. Teaching them off-the-shelf GIS tools will create button-pushers, whereas this age needs problem solvers. The only way to accomplish that is learning how to design, develop and implement your own solutions. Therefore, in our Geoinformatics curriculum, we have created a module called “Scientific GeoComputing”, where the main language used is Python.
In this talk we will introduce the educational setup of our course: the 4 main building blocks (General Programming, Algorithmics, Visualisation & Web Development, and GeoComputing), and how we take the students from absolute beginners with no coding experience to confident users of Python and associated coding tools for solving geospatial problems. We realise that the field of geo-computing is wide and offers a gazillion ways of using computers smartly. We do know we cannot teach our students everything about computing and coding in a 7 EC course. So we focus on enabling them to develop their capacity to explore, develop, and find things out themselves independently.
About Barend Köbben
I am a Senior Lecturer in GIS and cartographic visualisation in the Department of Geo-Information Processing (GIP) of the ITC.
My teaching subjects include Cartographic Theory, Web Cartography and WebGIS, Geo-webservices, web application building, and 3D visualisation.
I participate in the research activities of the departmental Research Theme STAMP (Spatio-Temporal Analytics, Maps and Processing).
The Silence of Global Oceans: Acoustic Impact of the COVID-19 Lockdowns
Artash Nath Online Talk 30 Minutes
The onset of the COVID-19 pandemic in early 2020 brought an unexpected "anthropause". Border closures, travel restrictions, and economic slowdown meant a hiatus in commercial shipping, offshore energy exploration, and ocean tourism. It provided a rare research opportunity to investigate the time-series relationship between anthropogenic activities and ambient noise levels in oceans using Python and open data.
The Silence of Global Oceans: Acoustic Impact of the COVID-19 Lockdowns
Artash Nath Online Talk 30 Minutes
The onset of the COVID-19 pandemic in early 2020 brought an unexpected "anthropause". Border closures, travel restrictions, and economic slowdown meant a hiatus in commercial shipping, offshore energy exploration, and ocean tourism. It provided a rare research opportunity to investigate the time-series relationship between anthropogenic activities and ambient noise levels in oceans using Python and open data.
The ocean soundscape is noisy, merging geophony (natural noises of the earth), biophony (sounds of marine life), and anthrophony (sounds of human activities). A steep rise in trade, globalization, and the opening of new year-round shipping routes in the Arctic, made possible by the thinning of ice due to climate change, means anthrophony is rising. Low-frequency sound from maritime shipping is a major source of ambient underwater noise in global oceans and a threat to marine life. To test the assumption that COVID-19 restrictions would have decreased underwater ambient noise levels in low-frequency bands (up to 1 kHz) in early 2020, hydrophone data were collected from 8 sites in the Atlantic, the Arctic, the Pacific, the Mediterranean, and the North Sea. The classification of primary ambient noise sources in oceans was done using the Wenz curves. Power spectral densities were calculated before and after the lockdown, and ambient noise levels were compared at one-third octave bands centered around 63 Hz, 125 Hz, and 1 kHz. The noise contribution from ocean winds was eliminated by limiting the comparison to days with the same levels on the Beaufort scale, so that the resulting differences were likely due to variability in the anthropogenic noise.
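Not the study's code: a hedged sketch of the spectral step described above, estimating a power spectral density with SciPy's Welch method and integrating it over an approximate one-third octave band, on a synthetic stand-in signal.

    import numpy as np
    from scipy.signal import welch

    fs = 4000                                                # sample rate in Hz (illustrative)
    t = np.arange(0, 60, 1 / fs)
    signal = np.random.default_rng(0).normal(size=t.size)   # stand-in for a hydrophone record

    # Power spectral density via Welch's method (1 Hz frequency resolution)
    freqs, psd = welch(signal, fs=fs, nperseg=fs)

    # Band level in a one-third octave band centred on 63 Hz (approximate band edges)
    centre = 63.0
    low, high = centre / 2 ** (1 / 6), centre * 2 ** (1 / 6)
    band = (freqs >= low) & (freqs < high)
    band_level_db = 10 * np.log10(np.trapz(psd[band], freqs[band]))
    print(f"63 Hz third-octave band level: {band_level_db:.1f} dB (arbitrary reference)")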
About Artash Nath
I am a RISE 100 Global Fellow (Schmidt Futures and the Rhodes Trust). I work on artificial intelligence, space robotics, quantum computing, and big data. I like solving inter-generational challenges related to space, oceans, seismology, and pandemics. My recent projects include (i) using seismic data to measure the effectiveness of COVID-19 lockdowns in real time, (ii) developing machine learning algorithms to determine exoplanetary atmospheres, and (iii) modeling how quadruple pendulums are used by LIGO to aid the discovery of gravitational waves.
I use 'hackathons' as opportunities to learn new skills and get the experience of collaborating with motivated people around the world. I am also a maker and have built 25+ robots, rockets, rovers, battle bots, and machine learning (ML) projects. My work has been displayed at the Ontario Science Center, the Toronto International Film Festival, and MakerFests, and has received citations from NASA, the Canadian Space Agency, the European Space Agency's ARIEL Space Mission, the Royal Astronomical Society of Canada, and the Natural Sciences and Engineering Research Council of Canada.
Track and Curtail Carbon Footprint of your Python Code with CodeCarbon
Anmol Krishan Sachdeva Google Online Talk 45 Minutes Monday, June 20: 16:30 - 17:15
With the recent advancements in the field of AI and High Performance Computing, more organizations have started heavily investing in ML/AI research using advanced processors and humongous amounts of data. Enormous amounts of energy are consumed during the training process, which leads to the emission of harmful greenhouse gases like carbon dioxide.
Python being one of the most widely used programming languages for ML/AI development, this talk focuses on educating the Python Community on how to track and reduce CO2 emissions of Python Code using CodeCarbon.
Track and Curtail Carbon Footprint of your Python Code with CodeCarbon
Anmol Krishan Sachdeva Online Talk 45 Minutes
With the recent advancements in the field of AI and High Performance Computing, more organizations have started heavily investing in ML/AI research using advanced processors and humongous amounts of data. Enormous amounts of energy are consumed during the training process, which leads to the emission of harmful greenhouse gases like carbon dioxide.
Python being one of the most widely used programming languages for ML/AI development, this talk focuses on educating the Python Community on how to track and reduce CO2 emissions of Python Code using CodeCarbon.
Short Description:
Artificial Intelligence has a lot of benefits to offer society, but at what cost? With the recent advancements in the field of AI and High Performance Computing, more organizations have started heavily investing in ML/AI research using advanced processors and humongous amounts of data. Enormous amounts of energy are consumed during the training process, which leads to the emission of harmful greenhouse gases like carbon dioxide.
Python being one of the most widely used programming languages for ML/AI development, this talk focuses on educating the Python Community on how to track and reduce CO2 emissions of Python Code using CodeCarbon.
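A minimal sketch of the measurement step with CodeCarbon's EmissionsTracker; the project name and the stand-in workload are arbitrary.

    from codecarbon import EmissionsTracker

    # Wrap any compute-heavy block with an EmissionsTracker to estimate its CO2-equivalent.
    tracker = EmissionsTracker(project_name="demo-training-run")
    tracker.start()
    try:
        total = sum(i * i for i in range(10_000_000))   # stand-in for model training
    finally:
        emissions_kg = tracker.stop()                   # estimated kg CO2-equivalent
    print(f"Estimated emissions: {emissions_kg:.6f} kg CO2eq")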
Pre-requisites:
Familiarity with programming in Python.
Talk Level:
Suitable for all audiences and levels.
Whether you build on private infrastructure or on public cloud infrastructure, this talk has something for you.
Agenda of the talk:
Understanding Carbon Footprint
Carbon Emissions and Code
Understanding CodeCarbon Package
Measuring CO2 Equivalents of Python Code
CO2 Emission Visualization and Reporting
Q/A
About Anmol Krishan Sachdeva
Anmol Krishan Sachdeva, aka "greatdevaks", is an International Tech Speaker, a Distinguished Guest Lecturer, a Tech Panelist, and has represented India at several reputed International Hackathons. He is a Deep Learning Researcher and has about 8 publications in different domains.
He is an active conference organizer and has previously helped organize some of the most prestigious conferences, such as EuroPython, the GeoPython & Python Machine Learning Conference, and PyCon India, all of which were a huge success. He holds an MSc in Advanced Computing (ML, AI, Robotics, Cloud Computing, Human-Computer Interaction, and Computational Neuroscience) from the University of Bristol, United Kingdom, and currently works at Google as a Hybrid Cloud Architect.
In the past, Anmol has spoken at renowned conferences and tech forums like KubeCon, PyCon, EuroPython, and GeoPython, and has been invited as a Chief Guest / Guest of Honour at various events.
He likes innovating, keeping in touch with new technological trends, and mentoring people. Additionally, his interest lies in Cosmology and Neuroscience.
Who Said Wrangling Geospatial Data at Scale was Easy?
In this talk, I’ll briefly introduce the various modes in which geospatial data comes. I’ll also focus on the most efficient ways to condense large amounts of geospatial data into analyzable chunks, to speed up data processing and analysis.
Who Said Wrangling Geospatial Data at Scale was Easy?
Brendan Collins Online Talk 30 Minutes
In this talk, I’ll briefly introduce the various modes in which geospatial data comes. I’ll also focus on the most efficient ways to condense large amounts of geospatial data into analyzable chunks, to speed up data processing and analysis.
If you have ever worked with Census Data, you may be recalling nightmares of hours spent staring at the data and finding it impossible to download, store or convert to a sensible format to begin your analysis. And Census Data is not even unstructured data!
Geospatial Data comes in various formats - GeoJSON, Parquet, Shapefile, GeoTIFF, GeoPackage, etc. But what are the most efficient ways to convert the data into formats that are easy to understand, work with, transfer, and ultimately analyze? Then throw in petabytes worth of data and you hit the challenge of wrangling geospatial data at scale.
This talk will walk through some of the best ways to handle geospatial data at scale, with a focus on:
The xarray-spatial library for raster-based spatial analysis (see the sketch after this list).
The RTXpy library for GPU-powered spatial analysis.
Microsoft Planetary Computer examples of geospatial data processing.
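A short sketch of the first item above, running xarray-spatial terrain tools on a toy elevation grid; the data and 30 m cell size are made up.

    import numpy as np
    import xarray as xr
    from xrspatial import hillshade, slope

    # Toy elevation grid standing in for a large DEM tile (30 m cell size)
    elevation = xr.DataArray(
        np.random.default_rng(0).uniform(0, 500, size=(200, 200)),
        dims=("y", "x"),
        coords={"y": np.arange(200) * 30.0, "x": np.arange(200) * 30.0},
        name="elevation",
    )

    shaded = hillshade(elevation)   # terrain shading
    steep = slope(elevation)        # slope in degrees
    print(steep.max().item())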
About Brendan Collins
Brendan Collins is an expert in data science and geospatial technology and is Founder and Principal of makepath, a spatial data science firm in Austin, Texas. With over 15 years of experience in GIS related to forestry and conservation, he has worked with Anaconda, Blue Raster, The Nature Conservancy, World Resources Institute, New Forests, World Wildlife Fund, Bat Conservation International, NASA, and Samsung.
As an active contributor on several open source projects, he is a core developer on Datashader, Bokeh, and most recently created the xarray-spatial library for large scale spatial analysis.
Brendan earned degrees in Environmental Policy and Latin American Studies from Tulane University and was recognized for acclaimed custom software development while completing his Master's Degree in Geographic Information Science at Pennsylvania State University.
pointcloudset - Efficient analysis of large datasets of point clouds recorded over time
Thomas Gölles Virtual Vehicle Research GmbH Talk 30 Minutes Tuesday, June 21: 09:45 - 10:15
A Python package to analyze and visualize 3D point cloud time series.
pointcloudset - Efficient analysis of large datasets of point clouds recorded over time
Thomas Gölles Talk 30 Minutes
A Python package to analyze and visualize 3D point cloud time series.
Pointcloudset is a unique package for working with point cloud datasets. It is designed for post-processing, analytics, and visualization of point clouds from automotive lidar, terrestrial laser scanners, RGB-D (red, green, blue, depth) cameras, photogrammetry, and more. The package is based on pandas, Open3D, pyntcloud, and Dask, which allows parallel processing of large datasets. The high-level API makes it easy to get started.
Common use cases for the package are:
post-processing and analytics of lidar datasets recorded by the Robot Operating System (ROS).
collect multiple laser scans into one dataset
develop a processing algorithm and apply it to large datasets and analyze the results
During the talk I will present datasets and workflows from automotive lidar (Ouster OS1-64) and terrestrial laser scanners (Riegl VZ-6000).
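A sketch of assumed pointcloudset usage based on its documentation; the bag file, topic name, and options are hypothetical and may differ from the datasets shown in the talk.

    from pathlib import Path
    from pointcloudset import Dataset

    # Assumed API: read a ROS bagfile recorded from an automotive lidar into a
    # Dataset (a time series of point clouds). File and topic names are made up.
    dataset = Dataset.from_file(
        Path("drive.bag"),
        topic="/os1_cloud_node/points",
        keep_zeros=False,
    )

    print(len(dataset))        # number of frames in the time series
    frame = dataset[0]         # a single PointCloud backed by a pandas DataFrame
    print(frame.data.head())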
About Thomas Gölles
I am a glaciologist and currently work mostly with data from automotive lidar sensors and on a project for avalanche detection based on SAR data.
Currently, I am employed as a Senior Researcher at Virtual Vehicle Research and as a Scientist at the University of Graz.
pygeofilter: geospatial filtering made easy
Fabian Schindler EOX IT Services GmbH Talk 30 Minutes Tuesday, June 21: 11:45 - 12:15
pygeofilter helps you integrate geospatial filters into any Python application. Batteries included.
pygeofilter: geospatial filtering made easy
Fabian Schindler Talk 30 Minutes
pygeofilter helps you integrate geospatial filters into any Python application. Batteries included.
Abstract
pygeofilter is a library to support the integration of geospatial filters. It is split into frontend language parsers (CQL 1 + 2 text/JSON, JFE, FES), a common Abstract Syntax Tree (AST) representation, and several backends (database systems) where the parsed filters can be integrated into queries.
Parsers
Currently, pygeofilter supports CQL 1, CQL 2 in both text and JSON encoding, the OGC Filter Encoding Specification (FES), and JSON Filter Expressions (JFE) as input languages. Additionally, pygeofilter provides utilities to help create parsers for new filter languages.
The filters are parsed to an AST representation, which is a common denominator across all filter capabilities including logical and arithmetic operators, geospatial comparisons, temporal filters and property lookups. An AST can also be easily created via the API, if necessary.
Backends
pygeofilter provides several backends and helpers to roll your own. Built-in backends are for Django, SQLAlchemy, raw SQL, (Geo)Pandas dataframes, and native Python lists of dicts or objects.
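A brief sketch of the parser-to-backend flow; the filter expression is arbitrary, and the commented Django lines assume a model and field mapping that are not defined here.

    from pygeofilter.parsers.ecql import parse

    # Parse a CQL text filter into pygeofilter's common AST representation
    ast = parse("pop_density > 500 AND INTERSECTS(geometry, POINT(16.37 48.21))")
    print(ast)

    # A backend then translates the AST into a concrete query, e.g. for Django:
    # from pygeofilter.backends.django import to_filter
    # queryset = MyModel.objects.filter(to_filter(ast, field_mapping))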
Hands on: build and deploy a geospatial web-application using Greppo, an open-source Python framework, without any frontend, backend, or web-dev experience.
Build and deploy a geospatial web-application
Adithya Krishnan Online Workshop 90 Minutes
Hands on: build and deploy a geospatial web-application using Greppo, an open-source Python framework, without any frontend, backend, or web-dev experience.
The workshop will guide the attendees to build and deploy an end-to-end responsive web-application completely in Python. The attendees will make use of open-source libraries such as Greppo, Geopandas, Rasterio. They will build a web-application for 2 different use-cases involving vector and raster data respectively. They will also learn to deploy the web-application using Docker.
About Adithya Krishnan
I am Adithya: a scientist, full-stack developer, and founder of Greppo. I get my hands dirty with Python, JS, HTML, and CSS. I like dabbling with Vue, TailwindCSS, Flask, Starlette, and FastAPI. I am interested in working on web applications, cloud-utility projects, data science, and AI/ML projects. I have a PhD in (waste)water management, and my expertise spans GIS, geospatial data, remote sensing, and (chemical) process systems engineering.
How to use GeoDjango to create a location-based service
Geographic web applications on the Django framework
Maxim Danilov Workshop 120 Minutes
How to use GeoDjango to create a location-based service
GeoDjango is an included contrib module for Django that turns it into a world-class geographic web framework. GeoDjango strives to make it as simple as possible to create geographic web applications.
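A minimal sketch of what the workshop's subject looks like in code: a GeoDjango model with a point field and, in comments, a distance query of the kind a location-based service needs (model and field names are placeholders).

    # models.py - a minimal GeoDjango model
    from django.contrib.gis.db import models

    class Cafe(models.Model):
        name = models.CharField(max_length=100)
        location = models.PointField(geography=True)   # stored as geography in PostGIS

    # Querying cafes within 2 km of a user position:
    # from django.contrib.gis.geos import Point
    # from django.contrib.gis.measure import D
    # nearby = Cafe.objects.filter(location__distance_lte=(Point(16.37, 48.21), D(km=2)))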
About Maxim Danilov
I have been in development for more than 20 years. I started with RISC assemblers and then switched to Python/Django/Vue.js by way of C, VB, PHP, jQuery...
Owner of "wP soft" GmbH, developer of winePad.at and wPshop.at
Pangeo Forge: Crowdsourcing Open Data in the Cloud
Charles Stern, Ryan Abernathey Columbia University Online Workshop 90 Minutes Wednesday, June 22: 14:00 - 15:30
Pangeo Forge is a new open-source platform that aims to make it easy to extract data from traditional data repositories and deposit it in cloud storage in analysis-ready, cloud-optimized (ARCO) formats. This workshop will teach users how to use Pangeo Forge and contribute to the growing, community driven data library.
Pangeo Forge: Crowdsourcing Open Data in the Cloud
Charles Stern, Ryan Abernathey Online Workshop 90 Minutes
Pangeo Forge is a new open-source platform that aims to make it easy to extract data from traditional data repositories and deposit it in cloud storage in analysis-ready, cloud-optimized (ARCO) formats. This workshop will teach users how to use Pangeo Forge and contribute to the growing, community driven data library.
Geospatial datacubes (large, complex, interrelated multidimensional arrays with rich metadata) arise in analysis-ready geospatial imagery, level 3/4 satellite products, and especially in ocean / weather / climate simulations and [re]analyses, where they can reach petabytes in size. The scientific Python community has developed a powerful stack for flexible, high-performance analytics of datacubes in the cloud. Xarray provides a core data model and API for analysis of such multidimensional array data. Combined with Zarr or TileDB for efficient storage in object stores (e.g., S3) and Dask for scaling out compute, these tools allow organizations to deploy analytics and machine learning solutions for both exploratory research and production in any cloud platform. Within the geosciences, the Pangeo open science community has advanced this architecture as the "Pangeo platform" (http://pangeo.io/).
However, there is a major barrier preventing the community from easily transitioning to this cloud-native way of working: the difficulty of bringing existing data into the cloud in analysis-ready, cloud-optimized (ARCO) format. Typical workflows for moving data to the cloud currently consist of either bulk transfers of files into object storage (with a major performance penalty on subsequent analytics) or bespoke, case-by-case conversions to cloud optimized formats such as TileDB or Zarr. The high cost of this toil is preventing the scientific community from realizing the full benefits of cloud computing. More generally, the outputs of the toil of preparing scientific data for efficient analysis are rarely shared in an open, collaborative way.
To address these challenges, we are building Pangeo Forge (https://pangeo-forge.org/), the first open-source cloud-native ETL (extract / transform / load) platform focused on multidimensional scientific data. Pangeo Forge consists of two main elements. An open-source Python package, pangeo_forge_recipes (https://github.com/pangeo-forge/pangeo-forge-recipes), makes it simple for users to define "recipes" for extracting many individual files, combining them along arbitrary dimensions, and depositing ARCO datasets into object storage. These recipes can be "compiled" to run on many different distributed execution engines, including Dask, Prefect, and Apache Beam. The second element of Pangeo Forge is an orchestration backend which integrates tightly with GitHub as a continuous-integration-style service.
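A hedged sketch of what a recipe definition can look like with pangeo_forge_recipes; the dataset, URL pattern, and chunking are invented for illustration.

    import pandas as pd
    from pangeo_forge_recipes.patterns import ConcatDim, FilePattern
    from pangeo_forge_recipes.recipes import XarrayZarrRecipe

    # Hypothetical source: one netCDF file per day, concatenated along "time"
    dates = pd.date_range("2020-01-01", "2020-01-10", freq="D")

    def make_url(time):
        return f"https://data.example.org/sst/sst_{time:%Y%m%d}.nc"   # placeholder URL

    pattern = FilePattern(make_url, ConcatDim("time", dates, nitems_per_file=1))
    recipe = XarrayZarrRecipe(pattern, target_chunks={"time": 5})     # ARCO Zarr output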
We are using Pangeo Forge to populate a multi-petabyte-scale shared library of open-access, analysis-ready, cloud-optimized ocean, weather, and climate data spread across a global federation of public cloud storage: not a "data lake" but a "data ocean". Inspired directly by the success of Conda Forge, we aim to leverage the enthusiasm of the open science community to turn data preparation and cleaning from a private chore into a shared, collaborative activity. By only creating ARCO datasets via version-controlled recipe feedstocks (GitHub repos), we also maintain perfect provenance tracking for all data in the library.
You will leave this workshop with a clear understanding of how to access this data library, craft your own Pangeo Forge recipe, and become a contributor to our growing collection of community-sourced recipes.
About Charles Stern
Charles is a Data Infrastructure Engineer at Columbia University's Lamont-Doherty Earth Observatory (LDEO) focusing on Pangeo Forge. He is endlessly curious about elegant, open-source tools that help us understand our changing planet. Charles loves exploring in the mountains and tinkering with anything electronic or mechanical.
About Ryan Abernathey
Ryan P. Abernathey, an Associate Professor of Earth And Environmental Science at Columbia University and Lamont Doherty Earth Observatory, is a physical oceanographer who studies large-scale ocean circulation and its relationship with Earth's climate. He received his Ph.D. from MIT in 2012 and did a postdoc at Scripps Institution of Oceanography. He is a member of the NASA Surface Water and Ocean Topography (SWOT) science team and co-founder of the Pangeo open science community. Prof. Abernathey is an active participant in and advocate for open source software, open data, and reproducible science. He is a core developer for the Python packages Xarray and Zarr and contributor to many others.
Scaling up vector analysis with Dask-GeoPandas
Joris Van den Bossche, Martin Fleischmann Workshop 120 Minutes Wednesday, June 22: 11:00 - 13:00
This workshop introduces the Dask-GeoPandas library and walks you through its key components, allowing you to take a GeoPandas workflow and run it in parallel, out-of-core and even distributed on a remote cluster.
Scaling up vector analysis with Dask-GeoPandas
Joris Van den Bossche, Martin Fleischmann Workshop 120 Minutes
This workshop introduces the Dask-GeoPandas library and walks you through its key components, allowing you to take a GeoPandas workflow and run it in parallel, out-of-core and even distributed on a remote cluster.
The geospatial Python ecosystem provides a nice set of tools for working with vector data, including Shapely for geometry operations and GeoPandas for working with tabular data (and many other packages for IO, visualization, domain-specific processing, …). One of the limitations of those core tools is sub-optimal performance and limited scaling possibilities.
The PyData ecosystem is increasingly embracing Dask as a tool of choice when the scale of the task goes beyond the capabilities of Pandas or Numpy. Over the last years, effort has been put into improving the performance through vectorized interfaces to GEOS, the underlying C library of Shapely. In turn, that enables releasing the GIL and makes the Dask-GeoPandas combination more interesting.
Since GeoPandas is an extension of the pandas DataFrame, the same way Dask scales pandas can be applied to GeoPandas as well. The initial effort to build a bridge between Dask and GeoPandas is currently taking the shape of the dask-geopandas library.
This workshop provides an introduction to dask-geopandas and walks you through the key aspects of the library. We will touch on the specifics of vector geospatial data when it comes to parallelisation and cover use cases where dask-geopandas provides major benefits, as well as those where it currently struggles. You will learn how to turn your GeoPandas workflow into a Dask-GeoPandas one and what the rules of thumb are when doing so.
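A minimal sketch of turning a GeoPandas workflow into a Dask-GeoPandas one; the input file is a placeholder and the partition count is arbitrary.

    import geopandas as gpd
    import dask_geopandas

    gdf = gpd.read_file("buildings.gpkg")                    # placeholder dataset

    # Partition the GeoDataFrame so operations run in parallel (and out of core)
    ddf = dask_geopandas.from_geopandas(gdf, npartitions=8)

    areas = ddf.geometry.area                                # lazy, per-partition
    result = areas.sum().compute()                           # triggers parallel execution
    print(result)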
Prerequisites:
basic familiarity with GeoPandas
Level:
intermediate
About Joris Van den Bossche
I am a core contributor to Pandas and Apache Arrow and maintainer of GeoPandas. I did a PhD at Ghent University and VITO in air quality research, worked at the Paris-Saclay Center for Data Science, and currently, I am a freelance software developer and teacher and working for Voltron Data.
About Martin Fleischmann
Martin is a Research Associate in the Geographic Data Science Lab at the University of Liverpool. He is a researcher in urban morphology and geographic data science focusing on quantitative analysis and classification of urban form, remote sensing, and AI.
He is the author of momepy, the open source urban morphology measuring toolkit for Python, and a member of the development teams of GeoPandas, the open source Python package for geographic data, and PySAL, the Python library for spatial analysis.
The Deconvolution of the Aggregated Data into the Fine-Scale Blocks with Pyinterpolate
Do you need high-resolution data for your machine learning, but you only have areal aggregates? Would you like to present continuous maps instead of choropleth maps? We can transform county-level data into smaller blocks with Pyinterpolate. During the workshop, we will learn how to perform Poisson Kriging on an areal dataset.
The Deconvolution of the Aggregated Data into the Fine-Scale Blocks with Pyinterpolate
Szymon Online Workshop 90 Minutes
Do you need high-resolution data for your machine learning, but you only have areal aggregates? Would you like to present continuous maps instead of choropleth maps? We can transform county-level data into smaller blocks with Pyinterpolate. During the workshop, we will learn how to perform Poisson Kriging on an areal dataset.
Choropleth maps representing areal aggregates are standard in the social sciences. We aggregate data over areas for administrative purposes and to protect citizens' privacy. Unfortunately, those aggregated datasets can be misleading:
Administrative units, especially in Europe, vary significantly in shape and size,
Large units tend to be visually more important than smaller areas,
It is hard to integrate areal data into machine learning pipelines with data at a smaller and regular scale.
There is a solution for processes that are spatially correlated and represent rates. One example is a disease incidence rate map. An incidence rate is the number of disease cases per area divided by the total population of that area and multiplied by a constant of 100,000. Through the denominator (total population), we can divide our space into smaller blocks, in this case population blocks. Then we regularize the semivariogram of the areal data with the population density semivariogram to obtain a final model that considers fine-scale population blocks and can predict disease rates at a smaller scale. After this transformation, we can:
show a continuous map of disease rates,
avoid problems with the visual discrepancy between different areas' sizes,
use data with better spatial resolution as an input for machine learning pipelines; for example, we can merge data with the remotely sensed information.
We will learn how to transform areal aggregates into smaller blocks during the workshop. We will use the Pyinterpolate package. We will discuss the most dangerous modeling pitfalls and what can be done with the output data. If you are an expert in economics, the social sciences, public health, or similar fields, this workshop is for you.
Pyinterpolate is a Python package for spatial interpolation. It is available here: https://pypi.org/project/pyinterpolate/
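As a tiny numeric illustration of the incidence rate definition above (all numbers are made up):

    import numpy as np

    # Made-up numbers: disease cases and population for three fine-scale blocks
    cases = np.array([4, 11, 7])
    population = np.array([12_000, 45_000, 20_500])

    incidence_rate = cases / population * 100_000   # cases per 100,000 inhabitants
    print(incidence_rate.round(1))                  # e.g. [33.3 24.4 34.1]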