ecodigit/lod-resource-harvester

LOD Resource Harvester (LOD-RH)

Given the URL of a SPARQL endpoint and a query that selects a set of resources, LOD-RH downloads the RDF subject pages of the selected resources.
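The workflow described above comes down to two kinds of HTTP request, sketched here with the JDK's `java.net.http` API (Java 11+). This is not LOD-RH's code. The class `HarvestSketch` and its methods are hypothetical. The sketch assumes that the endpoint accepts SPARQL 1.1 Protocol GET requests with a `query` parameter, and that resource IRIs serve RDF through content negotiation. It only builds the requests; sending them would go through `HttpClient.send`.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of the requests behind the harvesting idea; not LOD-RH's implementation. */
public class HarvestSketch {

    /** Builds a SPARQL 1.1 Protocol GET request that sends the selector query to the endpoint. */
    static HttpRequest selectRequest(String endpoint, String query) {
        // Assumption: the endpoint URL has no query string of its own
        String url = endpoint + "?query=" + URLEncoder.encode(query, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "application/sparql-results+json")
                .GET()
                .build();
    }

    /** Builds a request that dereferences a resource IRI, asking for Turtle via content negotiation. */
    static HttpRequest subjectPageRequest(String resourceIri) {
        return HttpRequest.newBuilder(URI.create(resourceIri))
                .header("Accept", "text/turtle")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest select = selectRequest("http://dbpedia.org/sparql/",
                "prefix foaf: <http://xmlns.com/foaf/0.1/> select distinct ?resource {?resource a foaf:Person} LIMIT 10");
        HttpRequest page = subjectPageRequest("http://dbpedia.org/resource/Berlin");
        // Print the requests instead of sending them, so the sketch runs offline
        System.out.println(select.uri());
        System.out.println(page.uri() + " Accept: " + page.headers().firstValue("Accept").orElse(""));
    }
}
```

Each `?resource` binding returned by the first request would then be fetched with the second, and the response saved under the task's `localDestination`. This README does not document how LOD-RH names or serializes the saved pages.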

Dependency

Before using LOD-RH, make sure that lgu-commons is installed on your computer.

Installation

You can install LOD-RH using Maven.

  1. Clone the repository:
$ git clone https://github.com/luigi-asprino/lod-resource-harvester.git
  2. Compile the project with Maven:
$ cd lod-resource-harvester/
$ mvn clean install
  3. Import LOD-RH into your project by adding the following dependency to your project's pom.xml:
<dependency>
	<groupId>it.cnr.istc.stlab</groupId>
	<artifactId>pss.harvester</artifactId>
	<version>0.0.1</version>
</dependency>

Usage

  1. Define the harvesting task (e.g., download the subject pages of the first 10 persons in DBpedia) and save it in a JSON file:
{
	"tasks": [
		{
			"endpoint": "http://dbpedia.org/sparql/",
			"sparqlResourceSelector": "prefix foaf: <http://xmlns.com/foaf/0.1/> select distinct ?resource {?resource a foaf:Person} LIMIT 10",
			"localDestination": "DBpedia_Persons"
		}
	]
}

  2. Set the taskFile property in the configuration file to the path of the JSON file:
taskFile=src/main/resources/tasks_dbpedia.json
  3. Run the harvester:
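import java.io.IOException;

import com.jcraft.jsch.JSchException;
import com.jcraft.jsch.SftpException;
// Also import HarvesterConfiguration, Harvester and TaskBuilder from the LOD-RH artifact.

public class RunHarvester {
	public static void main(String[] args) {
		try {
			// Point LOD-RH at the configuration file that defines taskFile
			HarvesterConfiguration.setConfigFile("/path/to/config/file");
			// Load the tasks listed in the JSON task file and harvest them
			Harvester.harvest(TaskBuilder.getTasks());
		} catch (IOException | JSchException | SftpException | InterruptedException e) {
			e.printStackTrace();
		}
	}
}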
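When there are many tasks, steps 1 and 2 can also be scripted. The helper below is a hypothetical sketch, not part of LOD-RH: the names `HarvestSetup` and `HarvestTask`, and the file names `tasks.json` and `harvester.properties`, are illustrative. It assumes the configuration file uses Java properties syntax, as the `taskFile=...` line suggests. It writes a task file with the same three fields as the example, writes a configuration file pointing at it, and reads `taskFile` back as a sanity check. Records require Java 16+.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Properties;
import java.util.stream.Collectors;

/** Hypothetical helper (not part of LOD-RH) that scripts usage steps 1 and 2. */
public class HarvestSetup {

    /** One task; the field names mirror the JSON keys of the example task file. */
    public record HarvestTask(String endpoint, String sparqlResourceSelector, String localDestination) {
        String toJson() {
            return "{\"endpoint\": " + jsonString(endpoint)
                    + ", \"sparqlResourceSelector\": " + jsonString(sparqlResourceSelector)
                    + ", \"localDestination\": " + jsonString(localDestination) + "}";
        }
    }

    /** Quotes a string as a JSON literal, escaping quotes, backslashes and control characters. */
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> sb.append(c < 0x20 ? String.format("\\u%04x", (int) c) : String.valueOf(c));
            }
        }
        return sb.append('"').toString();
    }

    /** Renders the top-level object holding the "tasks" array. */
    static String tasksJson(List<HarvestTask> tasks) {
        return tasks.stream()
                .map(HarvestTask::toJson)
                .collect(Collectors.joining(", ", "{\"tasks\": [", "]}"));
    }

    /** Writes the task file and a config file pointing at it; returns the config file path. */
    static Path writeSetup(Path dir, List<HarvestTask> tasks) throws IOException {
        Files.createDirectories(dir);
        Path taskFile = dir.resolve("tasks.json");
        Files.writeString(taskFile, tasksJson(tasks));
        Path configFile = dir.resolve("harvester.properties");
        // Forward slashes avoid backslashes being read as escapes in properties syntax
        Files.writeString(configFile, "taskFile=" + taskFile.toString().replace('\\', '/') + "\n");
        return configFile;
    }

    /** Reads taskFile back from a config file, failing if the key is missing or blank. */
    static String readTaskFile(Path configFile) throws IOException {
        Properties config = new Properties();
        try (Reader in = Files.newBufferedReader(configFile)) {
            config.load(in);
        }
        String value = config.getProperty("taskFile");
        if (value == null || value.isBlank()) {
            throw new IllegalArgumentException("configuration does not define taskFile");
        }
        return value.trim();
    }

    public static void main(String[] args) throws IOException {
        List<HarvestTask> tasks = List.of(new HarvestTask(
                "http://dbpedia.org/sparql/",
                "prefix foaf: <http://xmlns.com/foaf/0.1/> select distinct ?resource {?resource a foaf:Person} LIMIT 10",
                "DBpedia_Persons"));
        // Writes into ./harvest-setup relative to the working directory
        Path config = writeSetup(Path.of("harvest-setup"), tasks);
        System.out.println("config: " + config + ", taskFile: " + readTaskFile(config));
    }
}
```

The returned configuration path is the value you would pass to `HarvesterConfiguration.setConfigFile` in step 3. The JSON is assembled by hand to keep the sketch dependency-free; in a real project, a JSON library would be a sturdier choice.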

About

A tool for harvesting LOD resources
