This project implements a spatial data management system based on an R-tree / R-tree* structure for efficient storage and querying of multidimensional geographic data.
It was developed as part of the course Database Technology (Τεχνολογία Βάσεων Δεδομένων) during the academic year 2023–2024.
The system supports:
- Range queries
- k-Nearest Neighbor (kNN) queries
- Skyline queries
- Insertions and deletions
- Bulk loading (bottom-up construction)
- Organize multidimensional spatial points stored on secondary memory
- Design and implement an efficient R-tree index
- Support common spatial queries:
- Range queries
- k-nearest neighbors
- Skyline queries
- Compare indexed vs brute-force query performance
- Handle dynamic updates (insert/delete)
- Source: OpenStreetMap (
map.osm) - Data stored in blocks inside
datafile.txt - Each record contains:
- Location ID
- Latitude
- Longitude
Blocks contain 1365 records each to simulate secondary storage organization.
Reads spatial data from datafile.txt in blocks and supports random access using RandomAccessFile.
Parses map.osm and:
- Creates the datafile in blocks
- Inserts points into the R-tree
- Supports adding new locations dynamically
Represents a geographic location with:
- id, latitude, longitude
- Manhattan distance calculations
- Comparable by distance
Low-level spatial point with:
- Coordinates
- Block ID & slot ID (file position)
- Distance computation
Supports:
insert_in_Rtree()– dynamic insertion with node splittingdelete_from_tree()– deletion with rebalancingrange_query_with_index()knn_with_index()range_query_without_index()(brute force)knn_without_index()(brute force)
- Leaf nodes store
Pointobjects - Internal nodes store child
TreeNodes - Each node maintains an MBR (Minimum Bounding Rectangle)
- Maximum children per node: 5
Handles:
- Rectangle expansion
- Area computation
- Overlap calculation
- Distance from point to rectangle
Used in:
- R-tree traversal
- kNN queries
Used to maintain top-k nearest neighbors
Specialized heap for Skyline algorithm ordering nodes by Manhattan distance from origin.
The Skyline algorithm:
- Traverses the R-tree using a MinHeap
- Prunes dominated nodes
- Returns all non-dominated points in 2D space
Dominance definition:
A point A dominates B if A ≤ B in all dimensions and A < B in at least one.
The bottom-up method:
- Parses OSM points
- Sorts by latitude
- Groups into blocks of 5 points
- Builds leaf nodes
- Recursively groups nodes into parent nodes
- Continues until root is formed
This greatly improves construction performance compared to repeated insertions.
Average execution time (5 runs):
| Operation | Time |
|---|---|
| Insert | ~118 ms |
| Delete | ~5324 ms |
| Skyline query | ~6.8 ms |
| Bulk loading | ~933 ms |
- Java
- OpenStreetMap XML parsing
- File-based block storage
- Custom R-tree / R*-tree implementation