Skip to content

Transform OSM (history) PBF files into GeoParquet. Enrich with OSM changeset metadata and country information.

License

Notifications You must be signed in to change notification settings

GIScience/ohsome-planet

Repository files navigation

ohsome-planet

The ohsome-planet tool transforms OSM (history) PBF files into GeoParquet format. It creates the actual OSM elements geometries for nodes, ways and relations. The tool can join information from OSM changesets such as hashtags, OSM editor or usernames. You can join country codes to every OSM element by passing a boundary dataset as additional input.

You can use the ohsome-planet data to perform a wide range of geospatial analyses, e.g. using DuckDB, GeoPandas or QGIS. Display the data directly on a map and start playing around!

Requirements

  • java 21

Build

First, clone the repository and its submodules. Then, build it with Maven.

git clone --recurse-submodules https://github.com/GIScience/ohsome-planet.git
cd ohsome-planet
./mvnw clean package -DskipTests

Run

You can download the full latest or history planet or download PBF files for smaller regions from Geofabrik.

To process a given PBF file, provide it in the --pbf parameter in the following example.

java -jar ohsome-planet-cli/target/ohsome-planet.jar contributions \
    --pbf data/karlsruhe.osh.pbf \
    --country-file data/world.csv \
    --changeset-db "jdbc:postgresql://HOST[:PORT]/changesets?user=USER&password=PASSWORD" \
    --output out-karlsruhe \
    --overwrite 

The parameters --country-file, --changeset-db, --output and --overwrite are optional. To see all available parameters, call the tool with --help parameter.

Country Data

By passing the parameter --country-file you can perform a spatial join to enrich OSM contributions with country codes. The country file should be provided in .csv format. Geometries should we represented as WKT (well-known text) string. The current version only supports POLYGON or MULTIPOLYGON geometries.

Basically, the file should look like this:

id;wkt
DEU;POLYGON ((7.954102 49.781264, 11.118164 49.781264, 11.118164 51.563412, 7.954102 51.563412, 7.954102 49.781264))
FRA;POLYGON ((1.186523 45.058001, 4.833984 45.058001, 4.833984 48.545705, 1.186523 48.545705, 1.186523 45.058001))
ITA;POLYGON ((10.766602 41.211722, 14.985352 41.211722, 14.985352 44.024422, 10.766602 44.024422, 10.766602 41.211722))

Passing this option will populate the countries attribute in the parquet files.

Changesets

By passing the parameter --changeset-db you can join OSM changeset information. It is expected that you pass the database connection as JDBC URL, e.g. jdbc:postgresql://HOST[:PORT]/changesets?user=USER&password=PASSWORD. Currently, ohsome-planet can connect to a database following the schema of ChangesetMD.

The changeset join will populate the changeset struct attribute in the parquet files with the following information:

  • closed_at
  • tags
  • hashtags
  • editor
  • numChanges

Tag Filtering

At the moment, there is only limited support for tag filtering. By passing the --include-tags parameter you can specify a comma separated list of OSM tag keys, e.g. highway,building,landuse. These tag keys will be used to filter OSM relations only. Currently, filtering for OSM nodes or ways is not implemented.

In case you have more complex tag filtering needs, please refer to the osmium documentation in order to prepare the input OSM pbf file.

We are planning to add more tag filtering options in the future.

Output Structure

When using a history PBF file, the output files are split into history and latest contributions. All contributions which are a) not deleted and b) visible in OSM at the timestamp of the extract are considered as latest. The remaining contributions, e.g. deleted or old versions, are considered as history.

out-karlruhe
├── contributions
│   ├── history
│   │   ├── node-0-163811-history.parquet
│   │   ├── ...
│   │   ├── way-0-2496473-history.parquet
│   │   ├── ...
│   │   ├── relation-0-12345-history.parquet
│   │   └── ...
│   └── latest
│       ├── node-0-163811-latest.parquet
│       ├── ...
│       ├── way-0-2496473-latest.parquet
│       ├── ...
│       ├── relation-0-12345-latest.parquet
│       └── ...
├── minorNodes (rocksdb)
└── minorWays (rocksdb)

Inspect Results

You can inspect your results easily using DuckDB.

-- list all columns
DESCRIBE FROM read_parquet('out-karlruhe/*.parquet');

-- result
┌───────────────────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬─────────┬─────────┬─────────┬─────────┐
│    column_name    │                                                                                  column_type                                                                                   │  null   │   key   │ default │  extra  │
│      varcharvarcharvarcharvarcharvarcharvarchar │
├───────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┤
│ status            │ VARCHAR                                                                                                                                                                        │ YES     │ NULLNULLNULL    │
│ valid_from        │ TIMESTAMP WITH TIME ZONE                                                                                                                                                       │ YES     │ NULLNULLNULL    │
│ valid_to          │ TIMESTAMP WITH TIME ZONE                                                                                                                                                       │ YES     │ NULLNULLNULL    │
│ osm_type          │ VARCHAR                                                                                                                                                                        │ YES     │ NULLNULLNULL    │
│ osm_id            │ BIGINT                                                                                                                                                                         │ YES     │ NULLNULLNULL    │
│ osm_version       │ INTEGER                                                                                                                                                                        │ YES     │ NULLNULLNULL    │
│ osm_minor_version │ INTEGER                                                                                                                                                                        │ YES     │ NULLNULLNULL    │
│ osm_edits         │ INTEGER                                                                                                                                                                        │ YES     │ NULLNULLNULL    │
│ osm_last_edit     │ TIMESTAMP WITH TIME ZONE                                                                                                                                                       │ YES     │ NULLNULLNULL    │
│ user              │ STRUCT(id INTEGER, "name" VARCHAR)                                                                                                                                             │ YES     │ NULLNULLNULL    │
│ tags              │ MAP(VARCHAR, VARCHAR)                                                                                                                                                          │ YES     │ NULLNULLNULL    │
│ tags_before       │ MAP(VARCHAR, VARCHAR)                                                                                                                                                          │ YES     │ NULLNULLNULL    │
│ changeset         │ STRUCT(id BIGINT, created_at TIMESTAMP WITH TIME ZONE, closed_at TIMESTAMP WITH TIME ZONE, tags MAP(VARCHAR, VARCHAR), hashtags VARCHAR[], editor VARCHAR, numChanges INTEGER) │ YES     │ NULLNULLNULL    │
│ bbox              │ STRUCT(xmin DOUBLE, ymin DOUBLE, xmax DOUBLE, ymax DOUBLE)                                                                                                                     │ YES     │ NULLNULLNULL    │
│ centroid          │ STRUCT(x DOUBLE, y DOUBLE)                                                                                                                                                     │ YES     │ NULLNULLNULL    │
│ xzcode            │ STRUCT("level" INTEGER, code BIGINT)                                                                                                                                           │ YES     │ NULLNULLNULL    │
│ geometry_type     │ VARCHAR                                                                                                                                                                        │ YES     │ NULLNULLNULL    │
│ geometry          │ BLOB                                                                                                                                                                           │ YES     │ NULLNULLNULL    │
│ area              │ DOUBLE                                                                                                                                                                         │ YES     │ NULLNULLNULL    │
│ area_delta        │ DOUBLE                                                                                                                                                                         │ YES     │ NULLNULLNULL    │
│ length            │ DOUBLE                                                                                                                                                                         │ YES     │ NULLNULLNULL    │
│ length_delta      │ DOUBLE                                                                                                                                                                         │ YES     │ NULLNULLNULL    │
│ contrib_type      │ VARCHAR                                                                                                                                                                        │ YES     │ NULLNULLNULL    │
│ refs              │ BIGINT[]                                                                                                                                                                       │ YES     │ NULLNULLNULL    │
│ members           │ STRUCT("type" VARCHAR, id BIGINT, "role" VARCHAR, geometry_type VARCHAR, geometry BLOB)[]                                                                                      │ YES     │ NULLNULLNULL    │
│ countries         │ VARCHAR[]                                                                                                                                                                      │ YES     │ NULLNULLNULL    │
│ build_time        │ BIGINT                                                                                                                                                                         │ YES     │ NULLNULLNULL    │
├───────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴─────────┴─────────┴─────────┴─────────┤
│ 27 rows                                                                                                                                                                                                                          6 columns │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Getting Started as Developer

This is a list of resources that you might want to take a look at to get a better understanding of the core concepts used for this projects. In general, you should gain some understanding of the raw OSM (history) data format and know how to build geometries from nodes, ways and relations. Furthermore, knowledge about (Geo)Parquet files is useful as well.

What is the OSM PBF File Format?

What is parquet?

What is RocksDB?

  • RocksDB is a storage engine with key/value interface, where keys and values are arbitrary byte streams. It is a C++ library. It was developed at Facebook based on LevelDB and provides backwards-compatible support for LevelDB APIs.
  • https://github.com/facebook/rocksdb/wiki

How to build OSM geometries (for multipolygons)?

Further Notes

  • For relations that consist of more than 500 members we skip MultiPolygon geometry building and fall back to GeometryCollection. Check MEMBERS_THRESHOLD in ohsome-contributions/src/main/java/org/heigit/ohsome/contributions/contrib/ContributionGeometry.java.

About

Transform OSM (history) PBF files into GeoParquet. Enrich with OSM changeset metadata and country information.

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages