
Conversation


@tcnichol tcnichol commented Aug 7, 2022

I'm going to go ahead and create this as a draft to get feedback.

Extractors are registered using extractor_info.json. Files and datasets can be submitted to a running extractor. The message sent is kept as close as possible to what was sent before, so that minimal changes are needed for pyclowder2.

Right now extractors don't run, since they rely on API endpoints that aren't in Clowder 2.0 yet. Users also don't have extractor keys. I will create other issues for these and possibly link them here as well.
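For illustration, here is a rough sketch of the kind of message published when submitting a file to an extractor queue. Field names mirror the snippets discussed in the review below; the queue name and the id field are illustrative, not necessarily what this PR uses.

import json
import pika

# Illustrative payload; field names mirror the ones used elsewhere in this PR.
body = {
    "host": "http://127.0.0.1:8000",
    "token": "<user token>",
    "id": "<file id>",        # hypothetical field for the resource being processed
    "resource_type": "file",
    "flags": "",
}

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.basic_publish(
    exchange="",
    routing_key="ncsa.wordcount",  # the extractor's queue name, e.g. the wordcount extractor
    body=json.dumps(body),
)
connection.close()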

@tcnichol tcnichol requested review from lmarini and max-zilla August 7, 2022 18:00
This was linked to issues Aug 7, 2022
@tcnichol tcnichol linked an issue Aug 9, 2022 that may be closed by this pull request
external_services: List[str]
libraries: List[str] = []
bibtex: List[str]
maturity: str = "Development"
Member

what does maturity mean?

Contributor Author

@tcnichol tcnichol Aug 19, 2022

This was in the Extractor Model before I started working.

I'm not sure what it's used for and there isn't a corresponding field in extractor_info.json, but since it was already there I left it in.

Member

It's to identify extractors that are still under development versus those that are ready for production.

We had a discussion about this and decided to keep the fields in and review all of them together.

@@ -0,0 +1,92 @@
import asyncio
import json
from aio_pika import ExchangeType, connect
Member

if you haven't done so, you might want to add aio_pika to the Pipfile

Contributor Author

Added.


async def main() -> None:
    # Perform connection
    connection = await connect("amqp://guest:guest@localhost/")
Member

can we put this amqp://guest:guest@localhost/ into config.py?

Contributor Author

done
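For illustration, a minimal sketch of that change, with the URL moved into a config setting. The setting name RABBITMQ_URL and the import path are assumptions, not necessarily what the PR uses.

# config.py (hypothetical setting name)
RABBITMQ_URL = "amqp://guest:guest@localhost/"

# heartbeat listener
from aio_pika import connect
from app.config import RABBITMQ_URL  # import path is an assumption

async def main() -> None:
    # Perform connection using the configured URL instead of a hard-coded string
    connection = await connect(RABBITMQ_URL)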


async def main() -> None:
    # Perform connection
    connection = await connect("amqp://guest:guest@localhost/")
Member

is there any need to add credentials? I saw that our old code does that:

credentials = pika.PlainCredentials("guest", "guest")
parameters = pika.ConnectionParameters("localhost", credentials=credentials)
connection = pika.BlockingConnection(parameters)

Contributor Author

I do not think it is necessary with this library. The URL already contains the username and password, and this seems to work fine.

Contributor Author

I use the parameters now in both the (renamed) sync and async heartbeat listeners
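For illustration, this is roughly what the shared parameters could look like in the sync listener, assuming the values come from config (the setting names and import path are hypothetical):

import pika
from app.config import RABBITMQ_HOST, RABBITMQ_USER, RABBITMQ_PASS  # hypothetical setting names

# Build credentials and connection parameters from config instead of hard-coding them
credentials = pika.PlainCredentials(RABBITMQ_USER, RABBITMQ_PASS)
parameters = pika.ConnectionParameters(RABBITMQ_HOST, credentials=credentials)
connection = pika.BlockingConnection(parameters)
channel = connection.channel()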

@@ -0,0 +1,59 @@
import pika
Member

sorry, I might've missed 2 weeks of discussion. Is this meant for Clowder 2?
Any reason we are using aio_pika vs pika in different places?

Contributor Author

This is something I can change. Part of the issue I had was that I was trying to run this as a background process, and I think running it with plain pika caused issues.

If it's better, I can try to use pika again, especially since this will be running as a separate script outside of main.
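For comparison, a minimal sketch of what the heartbeat listener could look like with plain (blocking) pika, run as a standalone script. The exchange and queue names are illustrative, not necessarily the ones this PR uses.

import pika

def callback(ch, method, properties, body):
    # A real listener would parse the heartbeat and register or update the extractor.
    print(f"received heartbeat: {body!r}")

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.exchange_declare(exchange="extractors", exchange_type="fanout")  # exchange name is an assumption
result = channel.queue_declare(queue="", exclusive=True)
channel.queue_bind(exchange="extractors", queue=result.method.queue)
channel.basic_consume(queue=result.method.queue, on_message_callback=callback, auto_ack=True)
channel.start_consuming()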

body['host'] = 'http://127.0.0.1:8000'
body['secretKey'] = 'secretKey'
body['resource_type'] = 'dataset'
body['flags'] = ""
Member

I'm not sure if we should spell out the secretKey and host here...

Contributor Author

I took that out and used token, to match what happens with files.
So far I haven't actually tested dataset extraction, so this may change again later.

some extractors use a.b.c etc to denote version
version parse replaced the float comparison
better logging - log if new extractor registered or updated
@tcnichol
Contributor Author

Made a few changes.

Extractor versions are now strings instead of floats, since many extractors use versions like 1.0.0. Versions are still compared when registering.
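For illustration, a minimal sketch of the string comparison using packaging's version parser (assuming that is the parser being used; the actual code in the PR may differ):

from packaging.version import parse

existing_version = "1.0.0"
incoming_version = "1.2.0"

# parse() returns Version objects that compare correctly,
# unlike the old float comparison, which cannot handle "1.2.0".
if parse(incoming_version) > parse(existing_version):
    print("registering newer version of extractor")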

There are now both a sync and an async listener. Both work, but I would consider the sync one the default and adequate for what we are doing. I wrote the async one because I was trying to get the heartbeat listener to run as a background thread in main, but if it is run as a separate service it should not matter.

I have not added docker deployment for this. That will be a later issue and pull request.

@tcnichol tcnichol marked this pull request as ready for review August 23, 2022 22:42
@tcnichol
Contributor Author

Also, for this to work, use the 50-clowder20-submit-file-to-extractor branch of pyclowder, and test using the wordcount extractor.


Contributor

@max-zilla max-zilla left a comment

I think this is mostly good for initial implementation, just a few changes.

author: str
contributors: List[str] = []
contexts: List[dict] = []
repository: Union[Repository, None] = None
Contributor

elsewhere we use Optional[Repository]
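For reference, the two forms are equivalent; a tiny sketch (Repository here is a stand-in for the model defined in this file):

from typing import Optional
from pydantic import BaseModel

class Repository(BaseModel):
    # stand-in for the Repository model defined in this file
    repository_url: str = ""

class ExtractorExample(BaseModel):
    # Optional[Repository] is shorthand for Union[Repository, None], so the two are equivalent
    repository: Optional[Repository] = None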

external_services: List[str]
libraries: List[str] = []
bibtex: List[str]
maturity: str = "Development"
Contributor

We should discuss in the meeting; we can probably remove some of these old fields.

{"_id": existing_extractor["_id"]}
)
extractor_out = ExtractorOut.from_mongo(found)
print(
Contributor

let's either remove these print() statements or use logger
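For illustration, the logger variant could look something like this (the message text is illustrative):

import logging

logger = logging.getLogger(__name__)

# illustrative replacement for a print() call when an existing extractor is updated
logger.info("Updated existing extractor: %s", "wordcount")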

user=Depends(keycloak_auth.get_current_user),
db: MongoClient = Depends(dependencies.get_db),
):
result = extractor_in.dict()
Contributor

unused

@max-zilla max-zilla self-requested a review September 26, 2022 20:08
@max-zilla max-zilla merged commit 1cb545e into main Sep 26, 2022
@max-zilla max-zilla deleted the register-extractor-submit-file branch September 26, 2022 20:35
Development

Successfully merging this pull request may close these issues.

submit file and dataset to extractor
manually register extractor
implement extractor heartbeat
