GeoKettle
Current version: 2.0 // Licence: LGPL
(see what’s new in RC1 and the final version)
Download GeoKettle on SourceForge.
Get support and interact with GeoKettle users in the Spatialytics forum.
Report a bug or request a new feature for GeoKettle in the Trac bug/issue tracking system.
Find information and documentation on GeoKettle in the Spatialytics.ORG wiki.
Read news and information about GeoKettle in the Spatialytics.ORG blog.
Stay tuned for all the GeoKettle news via Twitter: @GeoKettle and/or @SpatialyticsORG.
What is GeoKettle?
GeoKettle is a powerful, metadata-driven spatial ETL tool dedicated to integrating different spatial data sources in order to build and update geospatial data warehouses. GeoKettle enables the Extraction of data from data sources; the Transformation of that data to correct errors, cleanse it, change its structure, and make it compliant with defined standards; and the Loading of the transformed data into a target database management system (DBMS) in OLTP or OLAP/SOLAP mode, a GIS file, or a geospatial web service.
GeoKettle is a spatially enabled version of the generic ETL tool Kettle (Pentaho Data Integration). It also draws geospatial capabilities from mature, robust, and well-known open source libraries such as JTS, GeoTools, deegree, OGR and, via a plugin, Sextante.
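To make the extract/transform/load flow described above concrete, here is a minimal Python sketch. It is not GeoKettle code: GeoKettle transformations are designed in its own environment on top of Kettle. The function names, the record layout (`name`, `lon`, `lat`), and the in-memory list used as a target are all assumptions made for illustration.

```python
def extract(rows):
    """Extract: yield raw records from a source (here, an in-memory list)."""
    for row in rows:
        yield dict(row)


def transform(records):
    """Transform: cleanse names and drop records with unusable coordinates."""
    for rec in records:
        name = str(rec.get("name") or "").strip().title()
        try:
            lon, lat = float(rec["lon"]), float(rec["lat"])
        except (KeyError, TypeError, ValueError):
            continue  # coordinates missing or not numeric
        if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
            continue  # coordinates outside the valid longitude/latitude range
        yield {"name": name, "lon": lon, "lat": lat}


def load(records, target):
    """Load: append cleaned records to the target (a list standing in for a DBMS)."""
    for rec in records:
        target.append(rec)
    return target


def run_pipeline(source_rows):
    """Chain the three stages; records stream through one at a time."""
    return load(transform(extract(source_rows)), [])
```

Because each stage is a generator, records flow through the pipeline row by row rather than being materialized between stages, which loosely mirrors an engine-based (rather than code-generating) ETL design.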

GeoKettle has been released under the LGPL.
Geospatial-specific features:
Extract data from:
- Spatial database types: PostGIS, Oracle Spatial, MySQL, Microsoft SQL Server 2008*, Ingres* and IBM DB2*
- SOLAP (Spatial OLAP) system: GeoMondrian
- Geo files (data formats): Shapefile, GML, KML, OGR
- OGC Web services: Sensor Observation Service (SOS), Catalogue Web Service (CSW)
*Non-native formats; these can be used with some modifications.
Transformation of data:
Calculating:
- Buffers
- Centroid
- Random point on surface
- Area
- Length
- Distance
- Intersection
- Union
- Envelope
- Boundary
- Convex hull
- Difference
- Symmetric difference
- Inverse geometry
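To illustrate what two of the calculations listed above (area and centroid) mean for a simple planar polygon, here is a standalone Python sketch using the shoelace formula. It is only an illustration under stated assumptions: it is not how GeoKettle computes these values, the function name `polygon_area_and_centroid` is hypothetical, and it handles only a single ring without holes in planar coordinates.

```python
def polygon_area_and_centroid(ring):
    """Return (area, (cx, cy)) for a simple polygon given as (x, y) vertices.

    The ring may be open or closed; holes and self-intersections are not handled.
    """
    pts = list(ring)
    if pts[0] != pts[-1]:
        pts.append(pts[0])  # close the ring

    twice_area = 0.0
    cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        cross = x0 * y1 - x1 * y0  # signed contribution of this edge
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross

    if twice_area == 0:
        raise ValueError("degenerate polygon: zero area")

    # Centroid uses the signed area so vertex order does not matter.
    centroid = (cx / (3.0 * twice_area), cy / (3.0 * twice_area))
    return abs(twice_area) / 2.0, centroid
```

The signed area is kept until the end so that clockwise and counter-clockwise rings give the same centroid; only the reported area is made positive.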
Geoprocessing:
- General:
  - Clip
  - Clip with rectangle
  - Split multiparts
- Points:
  - Delaunay algorithm
- Lines:
  - Polylines -> polygons
  - Simplify lines
  - Smooth lines
  - Polylines to single segments
  - Split polylines at nodes
  - Split polylines with points
- Polygons:
  - Simplify polygons
  - Remove holes
  - Polygons -> polylines
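As an illustration of what a "simplify lines" step can do, here is a short Python sketch of the Douglas-Peucker algorithm, a commonly used line simplification technique. This page does not say which algorithm GeoKettle applies, so the choice of Douglas-Peucker, the function name `simplify_line`, and the `tolerance` parameter are assumptions for illustration only.

```python
import math


def _point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b (all (x, y) tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)  # segment collapses to a point
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify_line(points, tolerance):
    """Drop vertices that lie within `tolerance` of the simplified line."""
    if len(points) < 3:
        return list(points)

    # Find the interior vertex farthest from the chord between the endpoints.
    max_dist, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], points[0], points[-1])
        if d > max_dist:
            max_dist, index = d, i

    if max_dist <= tolerance:
        return [points[0], points[-1]]  # every interior vertex is close enough

    # Keep the farthest vertex and simplify both halves recursively.
    left = simplify_line(points[: index + 1], tolerance)
    right = simplify_line(points[index:], tolerance)
    return left[:-1] + right  # avoid duplicating the shared vertex
```

A larger tolerance removes more vertices; the first and last points of the line are always kept.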
Load data into a target format:
- Spatial database loads
- Spatial data warehouse population
- Data formats: Shapefile, GML, KML, OGR
- OGC Web services: Catalogue Web Service (CSW)
Environment:
Cartographic viewer to preview your transformations, including map customization tools and basic cartographic functions.

Generic ETL Features:
Extract data from:
- 35+ database types: MySQL, PostgreSQL, Oracle, …
- Data warehouse types: Mondrian
- XML files
- XLS files
- Web services
- Xbase files (dBase, FoxPro, etc.)
- File system information
- Generated data
- MS Access files
- LDAP
- Other flat files: text files, Excel files, CSV files
Transformation of data:
- Engine based data transfer (no code generator)
- Looking up data in databases, files or memory
- Calculating
- Scripting: JavaScript, SQL, RegExp
- Splitting
- Mapping
- Selecting
- Partitioning
- Filtering
- Merging
- Joining
- Duplicating
- Clustering (MPP)
- Pivoting
Load data into a target format:
- Database loads
- Data warehouse population
- Partitioned loading
- Bulk loading
- Parallel loading
- Clustering
Environment:
- Full GUI to edit every transformation option
- Command line tools: execute jobs and transformations
- Web server: remote execution and clustering, well suited to cloud computing environments for processing very large datasets
- Programming API for Java
- Plugin ecosystem
