Introduction

This article continues our Source Cooperative and Cloud Native Geospatial series. A previous article covered how to upload a GeoParquet file to the Source Cooperative platform: https://tabaqat-wagtail.tabaqat.net/en/resources/blogs/unlocking-spatial-insights-a-guide-to-uploading-geoparquet-files-onto-source-cooperative-platform-for-enhanced-data-collaboration.

Another article showed how to import a downloaded GeoParquet file into ArcGIS Pro by first converting it to an Esri file geodatabase, as GeoParquet is not a natively supported format in ArcGIS Pro: https://tabaqat-wagtail.tabaqat.net/en/resources/blogs/importing-geoparquet-files-into-arcgis-pro-with-geopandas-and-pyarrow.

Now we will take on the challenge of importing a GeoParquet file into ArcGIS Pro by streaming it directly from its source URL, without downloading it to disk first.

Technical Guide

The main Python libraries that we will use in our workflow are:

  • requests, which sends HTTP requests to a specified URL.
  • fastparquet.ParquetFile, which opens a Parquet file so that it can be converted to a pandas DataFrame.
  • pandas, which provides DataFrames for handling and manipulating tabular data.
  • io.BytesIO, which provides an in-memory binary stream. It holds the content of the streamed Parquet file in memory so that it can be processed without saving it to disk.
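To make the role of io.BytesIO concrete, here is a minimal sketch that uses only the standard library. The byte string is a made-up stand-in for downloaded content (it borrows the "PAR1" marker that Parquet files start with, but it is not a real Parquet file).

```python
from io import BytesIO

# Stand-in for bytes fetched over HTTP (illustrative only, not a real Parquet file)
payload = b"PAR1 example bytes"

# Wrap the bytes in an in-memory, file-like object
buffer = BytesIO(payload)

# It supports the usual file operations without touching the disk
magic = buffer.read(4)      # read the first four bytes
buffer.seek(0)              # rewind so the next reader starts at the beginning
everything = buffer.read()  # read the whole content
```

Because the buffer behaves like an open file, libraries that expect a file object can read from it directly.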

We will start by opening a new ArcGIS Pro project.

Screenshot 2024-12-21 133751

Then, we will sign in to our account on Source Cooperative and navigate to the repository that contains the GeoParquet file that we want to stream using the source URL.

Screenshot 2024-12-21 133901

Then, insert a new Python notebook in ArcGIS Pro to run the workflow: stream the GeoParquet file, convert it into a file geodatabase, and preview the result in ArcGIS Pro.

Screenshot 2024-12-21 133943

We will need to install the required Python libraries from the ArcGIS Python Command Prompt, which can be opened from the Windows Start menu under the ArcGIS program group.

Clone the default arcgispro-py3 environment in the Python Command Prompt using this command:

` conda create --clone arcgispro-py3 --name my_arcgis_env `

Replace "my_arcgis_env" with the desired name of the cloned environment.

Then, activate it through this command : ` conda activate my_arcgis_env `

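Then install the required third-party libraries (requests, pandas, shapely, fastparquet) through this command: ` conda install <package_name> `

The io (which provides BytesIO), json, and os modules are part of the Python standard library and need no installation, and arcpy comes with the cloned ArcGIS Pro environment. Some of the third-party libraries may also already be installed by default.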
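Before installing anything, it can help to check which libraries the active environment already provides. The following is a small sketch using the standard library's importlib; the list of names simply mirrors the libraries used in this guide.

```python
import importlib.util

# Modules the notebook imports later in this guide
REQUIRED = ["requests", "pandas", "shapely", "fastparquet", "arcpy"]

def missing_modules(names):
    """Return the names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_modules(REQUIRED)
print("Missing:", missing or "none")
```

Only the names reported as missing need a conda install.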

1. In the inserted Python notebook, import the libraries:

import requests
import pandas as pd
from shapely.geometry import Point
import shapely.wkb as wkb
from fastparquet import ParquetFile
from io import BytesIO
import arcpy
import json
import os

Screenshot 2024-12-21 134204

2. Copy the URL of the GeoParquet file from Source Cooperative, then send a request to that URL to fetch the GeoParquet data:

url = "https://data.source.coop/sarahgamal/overture-places-riyadh/riyadh_places.parquet"
# Request the file and raise an exception if the server returns an HTTP error
response = requests.get(url, stream=True)
response.raise_for_status()
# response.content reads the whole body into memory; BytesIO wraps it as a file-like object
parquet_data = BytesIO(response.content)

Screenshot 2024-12-21 134306
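For larger files, the response body can be copied into the buffer piece by piece instead of through response.content. Here is a minimal sketch, assuming the response comes from requests.get(url, stream=True); the helper name buffer_response and the 1 MB chunk size are our own choices for illustration.

```python
from io import BytesIO

def buffer_response(response, chunk_size=1024 * 1024):
    """Copy a streamed HTTP response into an in-memory buffer chunk by chunk.

    `response` is expected to be a requests.Response obtained with stream=True.
    """
    buffer = BytesIO()
    for chunk in response.iter_content(chunk_size=chunk_size):
        if chunk:  # skip empty keep-alive chunks
            buffer.write(chunk)
    buffer.seek(0)  # rewind so a Parquet reader starts at the first byte
    return buffer

# Usage in the notebook (requires network access):
# response = requests.get(url, stream=True)
# response.raise_for_status()
# parquet_data = buffer_response(response)
```

The whole file still ends up in memory, which the next step needs anyway: a Parquet reader has to seek to the footer at the end of the file to find its metadata.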

3. Read the GeoParquet file into a pandas DataFrame using the fastparquet library. The BytesIO object holding the Parquet bytes is passed to ParquetFile, which is then converted to a DataFrame:

pf = ParquetFile(parquet_data)
df = pf.to_pandas()

Screenshot 2024-12-21 134358

The output indicates that the GeoParquet file was successfully read into a DataFrame.
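A plain Parquet reader returns the geometry column as raw bytes because it does not interpret the GeoParquet metadata. That metadata is a JSON document stored under the "geo" key of the Parquet file metadata, and it names the primary geometry column and its encoding. The sketch below parses such a document; the sample is hand-written for illustration and was not read from the Riyadh file, and how you obtain the "geo" string depends on your Parquet reader (not shown here).

```python
import json

def describe_geo_metadata(geo_json):
    """Summarise the GeoParquet "geo" metadata string."""
    meta = json.loads(geo_json)
    primary = meta["primary_column"]
    column = meta["columns"][primary]
    return {
        "primary_column": primary,
        "encoding": column["encoding"],
        "geometry_types": column.get("geometry_types", []),
    }

# Hand-written sample (an assumption for illustration, not taken from the real file)
sample = json.dumps({
    "version": "1.0.0",
    "primary_column": "geometry",
    "columns": {"geometry": {"encoding": "WKB", "geometry_types": ["Point"]}},
})
summary = describe_geo_metadata(sample)
```

An encoding of "WKB" is what makes the decoding step that follows necessary.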

4. Decode the WKB geometry into Shapely geometry objects. The geometry column of the input data contains WKB (well-known binary) values, which can be decoded through this command:

# Decode each WKB value; anything that is not bytes becomes None
df['geometry'] = df['geometry'].apply(lambda x: wkb.loads(x) if isinstance(x, bytes) else None)

Screenshot 2024-12-21 134441
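To see what a WKB value actually contains, here is a teaching sketch that builds and decodes a 2D point with the standard struct module. It handles only points and is no replacement for shapely, which decodes every geometry type. The coordinates are illustrative values near Riyadh, not taken from the dataset.

```python
import struct

def encode_wkb_point(x, y):
    """Build little-endian WKB for a 2D point: byte order, type code, x, y."""
    return struct.pack("<BIdd", 1, 1, x, y)

def decode_wkb_point(data):
    """Decode a 2D WKB point; a teaching aid, not a replacement for shapely."""
    byte_order = data[0]                      # 1 = little endian, 0 = big endian
    endian = "<" if byte_order == 1 else ">"
    geom_type, x, y = struct.unpack(endian + "Idd", data[1:21])
    if geom_type != 1:
        raise ValueError("only 2D points (type code 1) are handled in this sketch")
    return x, y

wkb_bytes = encode_wkb_point(46.6753, 24.7136)  # illustrative longitude, latitude
decoded = decode_wkb_point(wkb_bytes)
```

A 2D point therefore takes 21 bytes: one for the byte order, four for the geometry type, and eight for each coordinate.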

5. Round point coordinates to 6 decimal places to avoid precision issues through this command:

def round_geometry(geom, precision=6):
    # Only Point geometries are rounded; any other value is returned unchanged
    if isinstance(geom, Point):
        return Point(round(geom.x, precision), round(geom.y, precision))
    return geom

df['geometry'] = df['geometry'].apply(round_geometry)

Screenshot 2024-12-21 134514
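To get a feel for what 6 decimal places means on the ground, the sketch below estimates the size of the last rounded digit. It assumes coordinates in degrees and uses the common approximation of about 111,320 meters per degree at the equator; the latitude of 24.7 for Riyadh is also approximate.

```python
import math

METERS_PER_DEGREE = 111_320  # approximate length of one degree at the equator

def rounding_step_m(precision, latitude_deg=0.0):
    """Approximate east-west ground size of the last rounded decimal place."""
    step_deg = 10 ** -precision
    return step_deg * METERS_PER_DEGREE * math.cos(math.radians(latitude_deg))

step_equator = rounding_step_m(6)                   # roughly 0.11 m
step_riyadh = rounding_step_m(6, latitude_deg=24.7)  # slightly smaller

# Two coordinates that differ only beyond the sixth decimal collapse to one value
a = round(46.67530001, 6)
b = round(46.67530004, 6)
```

Rounding at this precision moves a point by centimeters at most, while making near-identical coordinates compare as equal.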

6. Remove rows whose geometries are identical after rounding, then drop rows that have no geometry, through this command:

df = df.drop_duplicates(subset='geometry')
df = df[df['geometry'].notna()]

Screenshot 2024-12-21 134629
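The effect of this step can be illustrated in plain Python. The sketch below mimics the keep-first behavior that drop_duplicates uses by default, working on (name, (x, y)) tuples instead of a DataFrame; the sample rows are invented for the example.

```python
def drop_duplicate_points(rows, precision=6):
    """Keep the first row for each rounded (x, y); skip rows with no geometry."""
    seen = set()
    kept = []
    for name, xy in rows:
        if xy is None:
            continue
        key = (round(xy[0], precision), round(xy[1], precision))
        if key not in seen:
            seen.add(key)
            kept.append((name, key))
    return kept

# Invented sample rows: two near-identical points, one missing geometry, one distinct point
rows = [
    ("cafe", (46.67530001, 24.71360002)),
    ("cafe duplicate", (46.67530004, 24.71360001)),
    ("no geometry", None),
    ("park", (46.70000000, 24.80000000)),
]
result = drop_duplicate_points(rows)
```

Only "cafe" and "park" survive: the second cafe row matches the first after rounding, and the row without a geometry is skipped.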

7. Save the filtered DataFrame to a feature class in a file geodatabase using ArcPy. The provided script:

  • Defines an output geodatabase and feature class path.
  • Checks if the geodatabase exists, creates it if not, and creates the feature class with specified fields.
  • Ensures that string fields do not exceed specified lengths.

Screenshot 2024-12-21 135320
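As a rough idea of what such a script can look like, here is a sketch of the three bullets above. It is not the script from the screenshot: the geodatabase name, feature class name, and the "name" and "category" text fields are hypothetical, and WGS 84 (EPSG:4326) is assumed as the coordinate system. The arcpy calls run only inside an ArcGIS Pro Python environment.

```python
import os

def truncate_text(value, max_length):
    """Return value as a string no longer than max_length (None stays None)."""
    if value is None:
        return None
    return str(value)[:max_length]

def save_points(rows, folder, gdb_name="places.gdb", fc_name="places",
                text_fields=(("name", 255), ("category", 100))):
    """Write (x, y, *attributes) rows to a point feature class.

    The geodatabase name, feature class name, and fields are hypothetical.
    """
    import arcpy  # imported here because arcpy exists only in ArcGIS Pro environments

    # Create the file geodatabase if it does not exist yet
    gdb_path = os.path.join(folder, gdb_name)
    if not arcpy.Exists(gdb_path):
        arcpy.management.CreateFileGDB(folder, gdb_name)

    # Create the point feature class and its text fields if needed
    fc_path = os.path.join(gdb_path, fc_name)
    if not arcpy.Exists(fc_path):
        # WGS 84 (EPSG:4326) is an assumption; check the CRS of your own data
        arcpy.management.CreateFeatureclass(
            gdb_path, fc_name, "POINT",
            spatial_reference=arcpy.SpatialReference(4326),
        )
        for field_name, length in text_fields:
            arcpy.management.AddField(fc_path, field_name, "TEXT", field_length=length)

    # Insert one feature per row, trimming strings to the field lengths
    field_names = [name for name, _ in text_fields]
    with arcpy.da.InsertCursor(fc_path, ["SHAPE@XY"] + field_names) as cursor:
        for x, y, *attributes in rows:
            values = [truncate_text(v, length)
                      for v, (_, length) in zip(attributes, text_fields)]
            cursor.insertRow([(x, y)] + values)
    return fc_path
```

In the notebook, the rows could be built from the DataFrame by taking each geometry's x and y together with the attribute columns you want to keep.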

The output indicates that the filtered data was saved to a feature class in the generated file geodatabase using ArcPy and that the process completed successfully.

The generated feature class of the Places dataset for the Riyadh region can be previewed directly on the map, as shown here.

Screenshot 2024-12-21 135520

The full script is available here; you can run it in one go in an ArcGIS Pro notebook by providing the input URL and the field names of the GeoParquet file.

Conclusion

Finally, you have met the challenge of streaming a GeoParquet file from a Source Cooperative URL, or from any other data source, and loading it into ArcGIS Pro.

Congratulations