Given the URL of a SPARQL endpoint and a query that selects some resources, LOD-RH downloads the RDF subject pages of the selected resources.
Before using LOD-RH, make sure that lgu-commons is installed on your computer.
You can install LOD-RH using Maven.
- Check out the repository
$ git clone https://github.com/luigi-asprino/lod-resource-harvester.git
- Compile the project with Maven
$ cd lod-resource-harvester/
$ mvn clean install
- Import LOD-RH into your project by adding the following dependency to your project's pom.xml.
<dependency>
    <groupId>it.cnr.istc.stlab</groupId>
    <artifactId>pss.harvester</artifactId>
    <version>0.0.1</version>
</dependency>
- Define the harvesting task (e.g. download the subject pages of the first 10 persons in DBpedia) and save it in a JSON text file.
{
    "tasks": [
        {
            "endpoint": "http://dbpedia.org/sparql/",
            "sparqlResourceSelector": "prefix foaf: <http://xmlns.com/foaf/0.1/> select distinct ?resource {?resource a foaf:Person} LIMIT 10",
            "localDestination": "DBpedia_Persons"
        }
    ]
}
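Before harvesting, it can help to check which resources the selector query returns. The sketch below is not part of LOD-RH: it only builds a SPARQL 1.1 Protocol GET URL from the endpoint and sparqlResourceSelector values of the task above, so that the URL can be opened in a browser or fetched with an HTTP client. The class name SelectorPreview and the method buildQueryUrl are hypothetical names chosen for illustration, and the sketch assumes Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of LOD-RH.
public class SelectorPreview {

    // Builds a GET URL that passes the query in the "query" parameter,
    // as described by the SPARQL 1.1 Protocol.
    public static String buildQueryUrl(String endpoint, String query) {
        String encoded = URLEncoder.encode(query, StandardCharsets.UTF_8);
        // Append with "&" if the endpoint URL already carries parameters.
        String separator = endpoint.contains("?") ? "&" : "?";
        return endpoint + separator + "query=" + encoded;
    }

    public static void main(String[] args) {
        // Values copied from the example task file.
        String endpoint = "http://dbpedia.org/sparql/";
        String selector = "prefix foaf: <http://xmlns.com/foaf/0.1/> "
                + "select distinct ?resource {?resource a foaf:Person} LIMIT 10";
        System.out.println(buildQueryUrl(endpoint, selector));
    }
}
```

Opening the printed URL should return the bindings for ?resource, which are the resources whose subject pages the task would download.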
- Provide the path to the JSON file in the configuration file.
taskFile=src/main/resources/tasks_dbpedia.json
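The configuration line above looks like a Java properties entry. Assuming the configuration file is indeed in properties format (the source does not state this), the following sketch shows one way to check that the taskFile entry is present and points to an existing file before starting a harvest. ConfigCheck and parseTaskFile are hypothetical names and are not part of LOD-RH.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper, not part of LOD-RH.
public class ConfigCheck {

    // Parses configuration text in Java properties format (an assumption)
    // and returns the value of the "taskFile" entry.
    public static String parseTaskFile(String configText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(configText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String taskFile = props.getProperty("taskFile");
        if (taskFile == null || taskFile.trim().isEmpty()) {
            throw new IllegalArgumentException("Missing 'taskFile' entry in configuration");
        }
        return taskFile.trim();
    }

    public static void main(String[] args) {
        // Same entry as in the example configuration file.
        String taskFile = parseTaskFile("taskFile=src/main/resources/tasks_dbpedia.json\n");
        Path path = Paths.get(taskFile);
        System.out.println(path + (Files.isRegularFile(path) ? " exists" : " not found"));
    }
}
```

A relative path such as the one in the example is resolved here against the current working directory, so the check should be run from the project root.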
- Run the harvester
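import java.io.IOException;
import com.jcraft.jsch.JSchException;
import com.jcraft.jsch.SftpException;
// Also import Harvester, HarvesterConfiguration and TaskBuilder from the LOD-RH artifact.

public static void main(String[] args) {
    try {
        // Point LOD-RH at the configuration file that declares taskFile.
        HarvesterConfiguration.setConfigFile("/path/to/config/file");
        // Load the tasks from the JSON file and run them.
        Harvester.harvest(TaskBuilder.getTasks());
    } catch (IOException | JSchException | SftpException | InterruptedException e) {
        e.printStackTrace();
    }
}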