8 unstable releases (3 breaking)
0.5.0 | Sep 11, 2024 |
---|---|
0.4.0 | Sep 7, 2024 |
0.3.4 | Jul 24, 2024 |
0.2.0 | Jul 8, 2024 |
PopGIS
A blazing fast way to insert large GeoJSON, ShapeFile & OsmPBF into a PostGIS database.
Why?
Importing large datasets into a PostGIS database can take a long time, and the aim of PopGIS is to make such operations faster. PopGIS is about 2x faster than ogr2ogr, particularly with very large input files against remote databases. Although the performance improvement for smaller datasets may be minimal, the efficiency gains for larger datasets are considerable. For more details, see the Benchmarks section.
Installation
You can install PopGIS directly with Cargo:
cargo install popgis
Usage
Below are the available commands and flags for PopGIS:
input
specifies the path to the GeoJSON, ShapeFile or OsmPBF file you'd like to insert into a PostGIS database.
uri
specifies the URI of the PostGIS database where you'd like to insert the input data.
schema
specifies the schema where the table will be created. Optional. Default is public.
table
specifies the name of the resulting table.
srid
specifies the SRID of the input data. Optional. Default is 4326.
mode
specifies the mode of the operation. Optional. Default is overwrite. See the Modes section.
reproject
reprojects the input data to the specified SRID. Optional.
Examples
## GeoJSON -> PostGIS ##
popgis --input spain.geojson \
--uri postgresql://my_username:my_password@localhost:5432/my_database \
--schema osm \
--table waters \
--srid 3857
## ShapeFile -> PostGIS ##
popgis -i water_polygons.shp \
-u postgresql://my_username:my_password@localhost:5432/my_database \
-s osm \
-t waters \
-m overwrite
## Reproject a GeoJSON from 4326 to 3857 -> PostGIS ##
popgis --input spain.geojson \
--uri postgresql://my_username:my_password@localhost:5432/my_database \
--schema osm \
--table waters \
--srid 4326 \
--reproject 3857
## OsmPBF -> PostGIS ##
popgis --input andalucia-latest.osm.pbf \
--uri postgresql://my_username:my_password@localhost:5432/my_database \
--schema osm \
--table andalucia
Modes
The overwrite mode deletes the existing table if one with the same schema and table name already exists, then writes the data into a new table. The fail mode makes the job fail if the table already exists in the database, which prevents data loss.
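As a rough sketch of how the fail mode could guard a scripted import, the wrapper below passes `--mode fail` and reports the outcome. It assumes that popgis exits with a non-zero status when the job fails, which this document does not confirm. The function name `import_without_overwrite` and the `PG_URI` and `POPGIS_BIN` variables are hypothetical and exist only for this illustration.

```shell
# Sketch: import a file without ever replacing an existing table.
# Assumption: popgis returns a non-zero exit status when the job fails.
# PG_URI (connection URI) and POPGIS_BIN (binary override) are hypothetical
# helper variables, not part of PopGIS itself.

import_without_overwrite() {
  local input="$1" table="$2"
  if "${POPGIS_BIN:-popgis}" --input "$input" \
      --uri "$PG_URI" \
      --schema osm \
      --table "$table" \
      --mode fail; then
    echo "imported $input into osm.$table"
  else
    echo "import of $input failed (does osm.$table already exist?)" >&2
    return 1
  fi
}
```

With `PG_URI` set to a connection string like the ones in the examples, calling `import_without_overwrite spain.geojson waters` would leave an existing `osm.waters` table untouched and return a non-zero status instead.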
Benchmarks
Although not extensive, the benchmarks show that PopGIS is about twice as fast as ogr2ogr. The difference is most noticeable with large files.
ShapeFile
file size | popgis took | ogr2ogr took | environment |
---|---|---|---|
1.2GB | 36sec | 1min 15sec | local PostGIS |
1.2GB | 36min | 1h 14min | virtual machine (n2-standard-4) PostGIS |
The file used for this test can be found here.
GeoJSON
file size | popgis took | ogr2ogr took | environment |
---|---|---|---|
103.9MB | 2sec | 5sec | local PostGIS |
103.9MB | 2min 14sec | 5min | virtual machine (n2-standard-4) PostGIS |
The file used for this test can be found here.
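To make the "about twice as fast" figure concrete, the speedup for each row of the two tables above can be computed as the ogr2ogr time divided by the popgis time. The helper below is only an illustration; the `speedup` function is a hypothetical name, and the timings are the ones from the tables, converted to seconds.

```shell
# Speedup = ogr2ogr time / popgis time, both given in seconds.
speedup() {
  awk -v ogr="$1" -v pop="$2" 'BEGIN { printf "%.2fx\n", ogr / pop }'
}

speedup 75 36      # ShapeFile, local PostGIS (1min 15sec vs 36sec)        -> 2.08x
speedup 4440 2160  # ShapeFile, virtual machine (1h 14min vs 36min)        -> 2.06x
speedup 5 2        # GeoJSON, local PostGIS (5sec vs 2sec)                 -> 2.50x
speedup 300 134    # GeoJSON, virtual machine (5min vs 2min 14sec)         -> 2.24x
```

All four measurements land between roughly 2.0x and 2.5x.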
OsmPBF
Coming soon.
Future implementations
- Allow filtering based on a key value pair.
- Add GeoParquet support.
- From PostGIS to GeoJSON/ShapeFile.
- Reintroduce the append mode (temporarily removed in v0.4.0 due to inconsistent results).
- Examples to pipe the standard output of what-osm-pbf with PopGIS as input.
Limitations
- PopGIS does not currently support nested GeoJSON properties.
- When using osm.pbf, use the smallest Geofabrik areas to get the best performance; try using it in conjunction with the what-osm-pbf CLI.
License
See LICENSE