2 changes: 1 addition & 1 deletion .github/workflows/ci.yaml
@@ -9,7 +9,7 @@ jobs:
    runs-on: ubuntu-latest
    strategy:
      matrix:
-        python-version: [3.6, 3.7, 3.8, 3.9]
+        python-version: [3.7, 3.9, 3.11]

    steps:
      - uses: actions/checkout@v2
2 changes: 1 addition & 1 deletion pyclowder/api/v1/files.py
@@ -348,7 +348,7 @@ def _upload_to_dataset_local(connector, client, datasetid, filepath):
"""

logger = logging.getLogger(__name__)
url = '%s/api/uploadToDataset/%s?key=%s' % (client.host, datasetid, cliet.key)
url = '%s/api/uploadToDataset/%s?key=%s' % (client.host, datasetid, client.key)

if os.path.exists(filepath):
# Replace local path with remote path before uploading
20 changes: 18 additions & 2 deletions pyclowder/api/v2/files.py
@@ -96,6 +96,22 @@ def download_info(connector, client, fileid):

    return result


def download_summary(connector, client, fileid):
    """Download file summary from Clowder.

    Keyword arguments:
    connector -- connector information, used to get missing parameters and send status updates
    client -- ClowderClient containing authentication credentials
    fileid -- the file to fetch the summary of
    """

    url = '%s/api/v2/files/%s/summary' % (client.host, fileid)
    headers = {"Authorization": "Bearer " + client.key}
    # fetch data
    result = connector.get(url, stream=True, verify=connector.ssl_verify if connector else True, headers=headers)

    return result


def download_metadata(connector, client, fileid, extractor=None):
"""Download file JSON-LD metadata from Clowder.
@@ -194,13 +210,13 @@ def upload_preview(connector, client, fileid, previewfile, previewmetadata=None,

    # associate uploaded preview with original file
    if fileid and not (previewmetadata and 'section_id' in previewmetadata and previewmetadata['section_id']):
-        url = '%s/api/files/%s/previews/%s?key=%s' % (host, fileid, previewid, key)
+        url = '%s/api/files/%s/previews/%s?key=%s' % (client.host, fileid, previewid, client.key)
        result = connector.post(url, headers=headers, data=json.dumps({}),
                                verify=connector.ssl_verify if connector else True)

    # associate metadata with preview
    if previewmetadata is not None:
-        url = '%s/api/previews/%s/metadata?key=%s' % (host, previewid, key)
+        url = '%s/api/previews/%s/metadata?key=%s' % (client.host, previewid, client.key)
        result = connector.post(url, headers=headers, data=json.dumps(previewmetadata),
                                verify=connector.ssl_verify if connector else True)

18 changes: 17 additions & 1 deletion pyclowder/files.py
@@ -71,6 +71,21 @@ def download_info(connector, host, key, fileid):
    result = v1files.download_info(connector, client, fileid)
    return result.json()


def download_summary(connector, host, key, fileid):
    """Download file summary from Clowder.

    Keyword arguments:
    connector -- connector information, used to get missing parameters and send status updates
    host -- the clowder host, including http and port, should end with a /
    key -- the secret key to login to clowder
    fileid -- the file to fetch the summary of
    """
    client = ClowderClient(host=host, key=key)
    if clowder_version == 2:
        result = v2files.download_summary(connector, client, fileid)
    else:
        # v1 has no summary endpoint; fall back to the file info endpoint
        result = v1files.download_info(connector, client, fileid)
    return result.json()

def download_metadata(connector, host, key, fileid, extractor=None):
"""Download file JSON-LD metadata from Clowder.
@@ -287,7 +302,8 @@ def upload_to_dataset(connector, host, key, datasetid, filepath, check_duplicate
"""
client = ClowderClient(host=host, key=key)
if clowder_version == 2:
v2files.upload_to_dataset(connector, client, datasetid, filepath, check_duplicate)
uploadedfileid = v2files.upload_to_dataset(connector, client, datasetid, filepath, check_duplicate)
return uploadedfileid
else:
logger = logging.getLogger(__name__)

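For context, a minimal sketch of how an extractor might call the new `download_summary` wrapper from inside `process_message` (hypothetical usage, not part of this PR; it assumes `CLOWDER_VERSION=2` is set so the v2 code path is taken):

```python
import logging

import pyclowder.files

# sketch of an extractor method body; connector, host, secret_key and
# resource are the arguments pyclowder passes to process_message()
def process_message(self, connector, host, secret_key, resource, parameters):
    logger = logging.getLogger(__name__)
    file_id = resource['id']
    # GET /api/v2/files/<id>/summary; the wrapper parses the JSON response
    summary = pyclowder.files.download_summary(connector, host, secret_key, file_id)
    logger.info("File summary: %s", summary)
```

On Clowder v1 the wrapper falls back to `download_info`, so callers receive a comparable JSON payload either way.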
8 changes: 8 additions & 0 deletions sample-extractors/test-dataset-extractor/Dockerfile
@@ -0,0 +1,8 @@
FROM python:3.8

WORKDIR /extractor
COPY requirements.txt ./
RUN pip install -r requirements.txt

COPY test-dataset-extractor.py extractor_info.json ./
CMD python test-dataset-extractor.py
81 changes: 81 additions & 0 deletions sample-extractors/test-dataset-extractor/README.md
@@ -0,0 +1,81 @@
A simple test extractor that exercises the dataset functions in pyclowder.

# Docker

This extractor is ready to be run as a Docker container; the only dependency is a running Clowder instance. Simply build and run.

1. Start Clowder V2. For help starting Clowder V2, see our [getting started guide](https://github.com/clowder-framework/clowder2/blob/main/README.md).

2. Build the extractor Docker container:

```
# from this directory, run:

docker build -t test-dataset-extractor .
```

3. Finally run the extractor:

```
docker run -t -i --rm --net clowder_clowder -e "RABBITMQ_URI=amqp://guest:guest@rabbitmq:5672/%2f" --name "test-dataset-extractor" test-dataset-extractor
```

Then open the Clowder web app and run the test-dataset-extractor on a dataset! Done.

### Python and Docker details

You may use any version of Python 3. Simply edit the first line of the `Dockerfile`; by default it uses `FROM python:3.8`.

Docker flags:

- `--net` attaches the extractor to the Clowder Docker network (run `docker network ls` to find yours).
- `-e RABBITMQ_URI=` sets the environment variable that controls which RabbitMQ server and exchange the extractor binds to. Setting `RABBITMQ_EXCHANGE` may also help.
- You can also use `--link` to link the extractor to a RabbitMQ container.
- `--name` assigns the container a name visible in Docker Desktop.

## Troubleshooting

**If you run into _any_ trouble**, please reach out on our Clowder Slack in the [#pyclowder channel](https://clowder-software.slack.com/archives/CNC2UVBCP).

Alternate methods of running extractors are below.

# Commandline Execution

To execute the extractor from the command line you will need to have the required packages installed. It is highly recommended to use a Python virtual environment: create one, activate it, and install the required packages.

```
Step 1 - Start the Clowder docker-compose stack

Step 2 - Start the heartbeat listener
virtualenv clowder2-python   # or try pipenv
source clowder2-python/bin/activate

Step 3 - Run heartbeat_listener_sync.py to register the new extractor (this step will likely not be needed in the future)
cd ~/Git/clowder2/backend
pip install email_validator
# copy heartbeat_listener_sync.py to /backend from /backend/app/rabbitmq
python heartbeat_listener_sync.py

Step 4 - Install the pyclowder branch & run the extractor
source ~/clowder2-python/bin/activate
pip uninstall pyclowder

# the pyclowder Git repo should have Todd's branch checked out (50-clowder20-submit-file-to-extractor)
pip install -e ~/Git/pyclowder

cd ~/Git/pyclowder/sample-extractors/test-dataset-extractor
export CLOWDER_VERSION=2
export CLOWDER_URL=http://localhost:8000/

python test-dataset-extractor.py

Step 5 - POST a particular file ID (text file) to the new extractor
POST http://localhost:3002/api/v2/files/639b31754241665a4fc3e513/extract?extractorName=ncsa.test-dataset-extractor

Or go to the Clowder UI and submit a file for extraction.
```
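For reference, step 5 expressed as a short Python sketch (hypothetical, not from this PR: it assumes the backend from step 1 listens on port 3002, `requests` is installed, and the API accepts a bearer token; the file ID is the example one above):

```python
import requests

# hypothetical file ID copied from the steps above
file_id = "639b31754241665a4fc3e513"
url = "http://localhost:3002/api/v2/files/%s/extract" % file_id

# assumption: bearer-token auth; substitute your own API key
response = requests.post(
    url,
    params={"extractorName": "ncsa.test-dataset-extractor"},
    headers={"Authorization": "Bearer YOUR_API_KEY"},
)
print(response.status_code, response.text)
```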

# Run the extractor from PyCharm
You can run heartbeat_listener_sync.py and test-dataset-extractor.py from PyCharm.
Create a pipenv (PyCharm generally prompts you to create one when you first open the project). To run test-dataset-extractor.py,
add `CLOWDER_VERSION=2` to the environment variables in the run configuration.
29 changes: 29 additions & 0 deletions sample-extractors/test-dataset-extractor/extractor_info.json
@@ -0,0 +1,29 @@
{
  "@context": "http://clowder.ncsa.illinois.edu/contexts/extractors.jsonld",
  "name": "ncsa.test-dataset-extractor",
  "version": "2.0",
  "description": "Test Dataset extractor. Test to verify all functionalities of dataset in pyclowder.",
  "author": "Dipannita Dey <[email protected]>",
  "contributors": [],
  "contexts": [
    {
      "lines": "http://clowder.ncsa.illinois.edu/metadata/sample_metadata#lines",
      "words": "http://clowder.ncsa.illinois.edu/metadata/sample_metadata#words",
      "characters": "http://clowder.ncsa.illinois.edu/metadata/sample_metadata#characters"
    }
  ],
  "repository": [
    {
      "repType": "git",
      "repUrl": "https://opensource.ncsa.illinois.edu/stash/scm/cats/pyclowder.git"
    }
  ],
  "process": {
    "dataset": [
      "*"
    ]
  },
  "external_services": [],
  "dependencies": [],
  "bibtex": []
}
1 change: 1 addition & 0 deletions sample-extractors/test-dataset-extractor/requirements.txt
@@ -0,0 +1 @@
pyclowder==2.6.0
72 changes: 72 additions & 0 deletions sample-extractors/test-dataset-extractor/test-dataset-extractor.py
@@ -0,0 +1,72 @@
#!/usr/bin/env python

"""Example extractor based on the clowder code."""

import logging
import os

from pyclowder.extractors import Extractor
import pyclowder.datasets
import pyclowder.files


class TestDatasetExtractor(Extractor):
    """Test the functionalities of an extractor."""
    def __init__(self):
        Extractor.__init__(self)

        # add any additional arguments to parser
        # self.parser.add_argument('--max', '-m', type=int, nargs='?', default=-1,
        #                          help='maximum number (default=-1)')

        # parse command line and load default logging configuration
        self.setup()

        # setup logging for the extractor
        logging.getLogger('pyclowder').setLevel(logging.DEBUG)
        logging.getLogger('__main__').setLevel(logging.DEBUG)

    def process_message(self, connector, host, secret_key, resource, parameters):
        # Process the file and upload the results

        logger = logging.getLogger(__name__)
        dataset_id = resource['id']

        # Local path to the file to upload to the dataset
        file_path = os.path.join(os.getcwd(), 'test_dataset_extractor_file.txt')

        # Upload a new file to the dataset
        file_id = pyclowder.files.upload_to_dataset(connector, host, secret_key, dataset_id, file_path, True)
        if file_id is None:
            logger.error("Error uploading file")
        else:
            logger.info("File uploaded successfully")

        # Get the file list under the dataset
        file_list = pyclowder.datasets.get_file_list(connector, host, secret_key, dataset_id)
        logger.info("File list: %s", file_list)
        if file_id in [file['id'] for file in file_list]:
            logger.info("File uploading and retrieving file list succeeded")
        else:
            logger.error("File uploading/retrieving file list didn't succeed")

        # Download info of the dataset
        dataset_info = pyclowder.datasets.get_info(connector, host, secret_key, dataset_id)
        logger.info("Dataset info: %s", dataset_info)
        if dataset_id == dataset_info['id']:
            logger.info("Success in downloading dataset info")
        else:
            logger.error("Error in downloading dataset info")

        # Download metadata of the dataset
        dataset_metadata = pyclowder.datasets.download_metadata(connector, host, secret_key, dataset_id)
        if dataset_metadata is None:
            logger.info("No metadata found for dataset %s", dataset_id)
        else:
            logger.info("Metadata: %s", dataset_metadata)


if __name__ == "__main__":
    extractor = TestDatasetExtractor()
    extractor.start()
1 change: 1 addition & 0 deletions sample-extractors/test-dataset-extractor/test_dataset_extractor_file.txt
@@ -0,0 +1 @@
This is a test file for the test dataset extractor.
8 changes: 8 additions & 0 deletions sample-extractors/test-file-extractor/Dockerfile
@@ -0,0 +1,8 @@
FROM python:3.8

WORKDIR /extractor
COPY requirements.txt ./
RUN pip install -r requirements.txt

COPY test-file-extractor.py extractor_info.json ./
CMD python test-file-extractor.py
81 changes: 81 additions & 0 deletions sample-extractors/test-file-extractor/README.md
@@ -0,0 +1,81 @@
A simple test extractor that exercises the file functions in pyclowder.

# Docker

This extractor is ready to be run as a Docker container; the only dependency is a running Clowder instance. Simply build and run.

1. Start Clowder V2. For help starting Clowder V2, see our [getting started guide](https://github.com/clowder-framework/clowder2/blob/main/README.md).

2. Build the extractor Docker container:

```
# from this directory, run:

docker build -t test-file-extractor .
```

3. Finally run the extractor:

```
docker run -t -i --rm --net clowder_clowder -e "RABBITMQ_URI=amqp://guest:guest@rabbitmq:5672/%2f" --name "test-file-extractor" test-file-extractor
```

Then open the Clowder web app and run the test-file-extractor on a .txt file (or similar)! Done.

### Python and Docker details

You may use any version of Python 3. Simply edit the first line of the `Dockerfile`; by default it uses `FROM python:3.8`.

Docker flags:

- `--net` attaches the extractor to the Clowder Docker network (run `docker network ls` to find yours).
- `-e RABBITMQ_URI=` sets the environment variable that controls which RabbitMQ server and exchange the extractor binds to. Setting `RABBITMQ_EXCHANGE` may also help.
- You can also use `--link` to link the extractor to a RabbitMQ container.
- `--name` assigns the container a name visible in Docker Desktop.

## Troubleshooting

**If you run into _any_ trouble**, please reach out on our Clowder Slack in the [#pyclowder channel](https://clowder-software.slack.com/archives/CNC2UVBCP).

Alternate methods of running extractors are below.

# Commandline Execution

To execute the extractor from the command line you will need to have the required packages installed. It is highly recommended to use a Python virtual environment: create one, activate it, and install the required packages.

```
Step 1 - Start the Clowder docker-compose stack

Step 2 - Start the heartbeat listener
virtualenv clowder2-python   # or try pipenv
source clowder2-python/bin/activate

Step 3 - Run heartbeat_listener_sync.py to register the new extractor (this step will likely not be needed in the future)
cd ~/Git/clowder2/backend
pip install email_validator
# copy heartbeat_listener_sync.py to /backend from /backend/app/rabbitmq
python heartbeat_listener_sync.py

Step 4 - Install the pyclowder branch & run the extractor
source ~/clowder2-python/bin/activate
pip uninstall pyclowder

# the pyclowder Git repo should have Todd's branch checked out (50-clowder20-submit-file-to-extractor)
pip install -e ~/Git/pyclowder

cd ~/Git/pyclowder/sample-extractors/test-file-extractor
export CLOWDER_VERSION=2
export CLOWDER_URL=http://localhost:8000/

python test-file-extractor.py

Step 5 - POST a particular file ID (text file) to the new extractor
POST http://localhost:3002/api/v2/files/639b31754241665a4fc3e513/extract?extractorName=ncsa.test-file-extractor

Or go to the Clowder UI and submit a file for extraction.
```
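As with the dataset extractor, step 5 can be expressed as a short Python sketch (hypothetical, not from this PR: it assumes the backend from step 1 listens on port 3002, `requests` is installed, and the API accepts a bearer token):

```python
import requests

# hypothetical file ID copied from the steps above
file_id = "639b31754241665a4fc3e513"
url = "http://localhost:3002/api/v2/files/%s/extract" % file_id

# assumption: bearer-token auth; substitute your own API key
response = requests.post(
    url,
    params={"extractorName": "ncsa.test-file-extractor"},
    headers={"Authorization": "Bearer YOUR_API_KEY"},
)
print(response.status_code, response.text)
```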

# Run the extractor from PyCharm
You can run heartbeat_listener_sync.py and test-file-extractor.py from PyCharm.
Create a pipenv (PyCharm generally prompts you to create one when you first open the project). To run test-file-extractor.py,
add `CLOWDER_VERSION=2` to the environment variables in the run configuration.