Google Summer of Code 2010 Ideas
Google’s Summer of Code program allows students to receive a stipend for developing open source software under the supervision of one or two mentors.
The project ideas proposed here, if selected, will be mentored by Dr. Thierry Badard (who also leads the GeoSOA research group at Laval University in Quebec City). For some of the project ideas, collaboration and possible co-supervision with a member of OSGeo is also expected.
Each project idea requires solid Java, JavaScript and AJAX knowledge to be successful. In addition, skills in GIS and OLAP (Online Analytical Processing) will be an asset.
If one or more project ideas are funded by the Google SoC program, the selected student will have the opportunity to be an active member of the Spatialytics research and development team: to learn its methods, to contribute to the development of innovative software, and to improve his/her skills in geospatial databases, geo-analytical dashboards, spatial ETL, and spatial OLAP (SOLAP) servers and clients.
Project context
Business Intelligence (BI) tools, such as dashboards, reporting and Online Analytical Processing (OLAP), allow decision-makers to analyze data and information in order to make better decisions. Summarized data from operational systems are presented to users in interactive tables, charts, graphs and reports.
An open-source BI software stack is now offered by Pentaho. It includes:
- An Extract, Transform and Load (ETL) tool (Kettle) used to integrate data from heterogeneous sources to a data warehouse;
- An OLAP server (Mondrian), which provides multidimensional query facilities on top of the data warehouse;
- Reporting and dashboard tools, used to present data to analysts in a user-friendly manner.
Geospatial Business Intelligence (Geo-BI), which combines GIS and BI technologies, has recently attracted marked interest because of the huge potential of combining spatial analysis and map visualization with proven BI tools and techniques such as data warehousing (DW), Online Analytical Processing (OLAP), … A complete open source Geo-BI suite is now available at Spatialytics.org. It comprises:
- GeoKettle, a spatial ETL tool based on Pentaho Data Integration (Kettle) and targeted at analytic data warehousing;
- GeoMondrian, a Spatial OLAP (SOLAP) server which extends the open source Mondrian OLAP server with GIS data types and functions;
- SOLAPLayers, a client visualization component for SOLAP data, using GeoExt/OpenLayers as the web mapping front-end. It enables the creation of drillable, interactive thematic maps and can be embedded in Geo-BI web applications such as geo-analytical dashboards.
Project ideas
All project ideas provided hereafter deal with GeoKettle, GeoMondrian and SOLAPLayers, the three open source Geospatial BI (GeoBI) projects hosted by Spatialytics.org:
- Extension to thematic mapping capabilities of SOLAPLayers: At present, SOLAPLayers offers only very basic thematic mapping capabilities. It supports choropleths with fixed or dynamic equal intervals and maps with proportional symbols. In order to support the decision process, advanced thematic mapping capabilities are required. For instance, maps combining different visual variables (color, pattern, size, intensity, …), histograms, pie charts, etc. should be supported by SOLAPLayers. The student in charge of this project will thus have to: 1) determine which thematic representations are useful for decision support applications, 2) develop support for these new cartographic styles in SOLAPLayers. Care should be taken to develop these new mapping capabilities as a separate library which could be used not only by SOLAPLayers but also by OpenLayers and GeoExt (on which SOLAPLayers relies).
- Extending the Mondrian Schema Workbench to support spatial data types: GeoMondrian brings to Mondrian what PostGIS brings to PostgreSQL, i.e. consistent and powerful support for spatial data. In order to allow GeoMondrian to navigate the multidimensional data cubes, a user needs to write an XML configuration file which defines the mapping between the SOLAP data cube concepts (dimension, hierarchy, etc.) and the underlying multidimensional data structures of the geospatial data warehouse. At present, GeoMondrian only supports the PostgreSQL/PostGIS DBMS. To make the design of XML configuration files easier for non-spatial data cubes, Pentaho provides a tool named Schema Workbench. It is a GUI which helps in the creation of the file. It allows connection to the data warehouse and provides many wizards to set up the file. The idea here is to extend this tool so that it provides consistent support for spatial data types. This support should be consistent with the support implemented in GeoMondrian. It should enable the exploration of data warehouses based on PostgreSQL/PostGIS and the design of geospatial data cubes ready to be used in GeoMondrian.
- Support for additional GIS file formats, geospatial services and spatial DBMS in GeoKettle: At present, GeoKettle natively supports the reading/writing of ESRI Shapefiles, GML and KML and the following spatial DBMS: PostgreSQL/PostGIS, Oracle and MySQL. Support is needed for more GIS file formats, geospatial services and spatial DBMS, such as MapInfo (.tab or MIF/MID), GeoJSON, OSM (OpenStreetMap), WFS, MS SQL Server 2008 or Ingres. The student in charge of the project will thus have to: 1) identify existing open source libraries which enable native support of additional DBMS, services or GIS file formats in GeoKettle, taking care that these libraries do not introduce cross-compilation constraints (for instance by embedding non-Java source code), 2) consistently integrate the selected libraries, or produce new Java source code, in order to support more GIS file formats, services or spatial DBMS in GeoKettle.
- Adding spatial data conflation and cleansing capabilities to GeoKettle: Details about this idea will be provided shortly …
- Designing SOLAPLayers components for mobile geo-analytical dashboards: Details about this idea will be provided shortly …
- Building a free geospatial data cube for GeoMondrian: This idea does not focus on the development of an innovative piece of software. It deals with the design, creation and population of a free multidimensional data cube for GeoMondrian. At present, only a small multidimensional data set is freely available to demonstrate the capabilities of GeoMondrian. The student in charge of the project will thus have to: 1) design a database (in a PostgreSQL/PostGIS environment) which respects multidimensional modelling principles (i.e. follows a star or a snowflake schema), 2) populate this database with existing, freely available data (geospatial or not), or with additional data created by the student where necessary. These new data will be released under a free data licence. The data should present different levels of detail and time in order to allow for interesting navigation in the data cube, 3) set up an XML-based mapping file (defining the data cube for the SOLAP server) so that GeoMondrian is able to read and navigate the multidimensional data structure, 4) design and implement a basic visual demo with the help of SOLAPLayers.
- More ideas to come shortly …
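
The thematic mapping idea above mentions choropleths with equal intervals. As a minimal sketch of what that classification involves, and assuming nothing about SOLAPLayers internals, the following Java snippet splits the value range of a measure into equal-width classes and assigns each value to a class. The class name EqualIntervalClassifier and its methods are hypothetical and serve only as an illustration; an actual contribution to SOLAPLayers would be written in JavaScript on top of OpenLayers/GeoExt.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of SOLAPLayers): equal-interval classification. */
class EqualIntervalClassifier {

    /** Returns classCount + 1 break values splitting [min, max] into equal-width classes. */
    static double[] breaks(double[] values, int classCount) {
        if (values.length == 0 || classCount < 1) {
            throw new IllegalArgumentException("need at least one value and one class");
        }
        double min = Arrays.stream(values).min().getAsDouble();
        double max = Arrays.stream(values).max().getAsDouble();
        double width = (max - min) / classCount;
        double[] result = new double[classCount + 1];
        for (int i = 0; i <= classCount; i++) {
            result[i] = min + i * width;
        }
        result[classCount] = max; // pin the last break to max to avoid rounding drift
        return result;
    }

    /** Returns the zero-based class index of a value, clamped to the valid range. */
    static int classify(double value, double[] breaks) {
        int classCount = breaks.length - 1;
        double min = breaks[0];
        double max = breaks[classCount];
        if (max == min) {
            return 0; // all values identical: a single class
        }
        int index = (int) ((value - min) / (max - min) * classCount);
        return Math.max(0, Math.min(classCount - 1, index));
    }

    public static void main(String[] args) {
        // Made-up measure values, e.g. one per region of a map
        double[] measure = {0, 10, 25, 40, 100};
        double[] b = breaks(measure, 4);
        System.out.println("breaks: " + Arrays.toString(b));
        for (double v : measure) {
            System.out.println(v + " -> class " + classify(v, b));
        }
    }
}
```

With the sample values, the four classes are bounded by 0, 25, 50, 75 and 100; a value that falls exactly on a break goes into the upper class, and the maximum is clamped into the last class. Each class index would then be mapped to a fill color of the choropleth. "Dynamic" intervals in this sketch simply means recomputing the breaks from whatever values the current query returns.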