-
Notifications
You must be signed in to change notification settings - Fork 6
Register extractor submit file #66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| external_services: List[str] | ||
| libraries: List[str] = [] | ||
| bibtex: List[str] | ||
| maturity: str = "Development" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does maturity mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was in the Extractor Model before I started working.
I'm not sure what its used for and there isn't a field in the extractor_info.json, but since it was in I left it in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's to identify extractors that are still under development versus those that are ready for production.
We had a discussion about this and decided to keep fields in and review all of them together.
| @@ -0,0 +1,92 @@ | |||
| import asyncio | |||
| import json | |||
| from aio_pika import ExchangeType, connect | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you haven't done so you might want to add aio_pika to the pipfile
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
|
|
||
| async def main() -> None: | ||
| # Perform connection | ||
| connection = await connect("amqp://guest:guest@localhost/") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we put this amqp://guest:guest@localhost/ to config.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
|
||
| async def main() -> None: | ||
| # Perform connection | ||
| connection = await connect("amqp://guest:guest@localhost/") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there any need to add credentials? I saw our old code does that
credentials = pika.PlainCredentials("guest", "guest")
parameters = pika.ConnectionParameters("localhost", credentials=credentials)
connection = pika.BlockingConnection(parameters)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think it is necessary with this library. The URL has the name and password, and this seems to work fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I use the parameters now in both the (renamed) sync and async heartbeat listeners
| @@ -0,0 +1,59 @@ | |||
| import pika | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry I might've missed 2 weeks of discussion. Is this meant for clowder 2?
Any reason we are using aio_pika vs pika in different places?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is something I can change. Part of the issue I had was I was trying to run this as a background process, and I think running in using just pika caused issues.
If it's better, I can try to use pika again, especially since this will be running as a separate script outside of main.
backend/app/routers/datasets.py
Outdated
| body['host'] = 'http://127.0.0.1:8000' | ||
| body['secretKey'] = 'secretKey' | ||
| body['resource_type'] = 'dataset' | ||
| body['flags'] = "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about if we should spell out the secretKey and host here...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took that out and used token, to match what happens with files.
So far I haven't actually tested dataset extraction, so this may change again later.
some extractors use a.b.c etc to denote version
version parse replaced the float comparison better logging - log if new extractor registered or updated
|
Made a few changes. Extractor versions are now strings instead of floats, since many extractor versions use 1.0.0. Still compared. There is also a sync and async listener. Both work, but I would consider the sync one the default and adequate for what we are doing. The async one I wrote because I was trying to get the heartbeat extractor to run as a background thread in main, but if it's run as a separate service it should not matter. I have not added docker deployment for this. That will be a later issue and pull request. |
|
Also, for this to work use branch |
|
related pull request for pyclowder clowder-framework/pyclowder#51 branch: https://github.com/clowder-framework/pyclowder/tree/50-clowder20-submit-file-to-extractor |
max-zilla
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is mostly good for initial implementation, just a few changes.
backend/app/models/extractors.py
Outdated
| author: str | ||
| contributors: List[str] = [] | ||
| contexts: List[dict] = [] | ||
| repository: Union[Repository, None] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
elsewhere we use Optional[Repository]
| external_services: List[str] | ||
| libraries: List[str] = [] | ||
| bibtex: List[str] | ||
| maturity: str = "Development" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should discuss in meeting, we can probably remove some of these old fields.
| {"_id": existing_extractor["_id"]} | ||
| ) | ||
| extractor_out = ExtractorOut.from_mongo(found) | ||
| print( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's either remove these print() statements or use logger
| user=Depends(keycloak_auth.get_current_user), | ||
| db: MongoClient = Depends(dependencies.get_db), | ||
| ): | ||
| result = extractor_in.dict() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unused
I'm going to go ahead and create this as a draft to get feedback.
Extractors are registered using the extractor-info.json. Files and datasets can be submitted to a running extractor. The message sent is as close to what was sent before so that minimal changes are needed for pyclowder2.
Right now extractors don't run since they rely on api endpoints that aren't in clowder2.0. Users also don't have extractor keys. I will create other issues and possibly link them here as well.