Initial implementation of metadata indexing #173
ddey2
commented
Nov 1, 2022
- metadata indexing
- also updated syntax of elasticsearch dependency injection
…or elasticsearch dependency injection
}

metadata_mappings = {}
# "properties": {
I removed the mappings because I noticed that when mappings are defined for the static fields, the fields somehow get duplicated under a "doc" field. The index looked like this:
{
"metadata": {
"aliases": {},
"mappings": {
"properties": {
"contents": {
"type": "object"
},
"context": {
"type": "text"
},
"context_url": {
"type": "text"
},
"created": {
"type": "date"
},
"creator": {
"type": "keyword"
},
"doc": {
"properties": {
"contents": {
"properties": {
"alternateName": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"latitude": {
"type": "float"
},
"longitude": {
"type": "float"
}
}
},
"context_url": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"created": {
"type": "date"
},
"creator": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"reource_type": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"resource_id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
"resource_id": {
"type": "text"
},
"resource_type": {
"type": "text"
}
}
},
"settings": {
"index": {
"routing": {
"allocation": {
"include": {
"_tier_preference": "data_content"
}
}
},
"number_of_shards": "5",
"provided_name": "metadata",
"creation_date": "1667489155177",
"number_of_replicas": "5",
"uuid": "g5MQr--BQB6q2B8boKZWkA",
"version": {
"created": "8030399"
}
}
}
}
}
That made searching complicated. After removing the mappings, the index looks like this:
{
"metadata": {
"aliases": {},
"mappings": {
"properties": {
"doc": {
"properties": {
"contents": {
"properties": {
"alternateName": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"latitude": {
"type": "float"
},
"longitude": {
"type": "float"
}
}
},
"context_url": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"created": {
"type": "date"
},
"creator": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"reource_type": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"resource_id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
},
"settings": {
"index": {
"routing": {
"allocation": {
"include": {
"_tier_preference": "data_content"
}
}
},
"number_of_shards": "5",
"provided_name": "metadata",
"creation_date": "1667507228722",
"number_of_replicas": "5",
"uuid": "WO4zd7trTGyf4P_2wGpPOw",
"version": {
"created": "8030399"
}
}
}
}
}
I can now refer to the fields as doc.creator, doc.contents.latitude, etc. Sorry for the long post; I hope it makes sense.
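To make the dotted field paths concrete, here is a minimal sketch of a search body that targets fields nested under "doc". The function name and the sample values are hypothetical; only the field paths come from the mapping above.

```python
from typing import Any, Dict


def build_metadata_query(creator: str, min_latitude: float) -> Dict[str, Any]:
    """Build a bool query against fields nested under the "doc" object."""
    return {
        "query": {
            "bool": {
                "must": [
                    # "doc.creator" is mapped as text, so a match query applies.
                    {"match": {"doc.creator": creator}},
                    # "doc.contents.latitude" is mapped as float, so a range query applies.
                    {"range": {"doc.contents.latitude": {"gte": min_latitude}}},
                ]
            }
        }
    }


# Hypothetical values, for illustration only.
query = build_metadata_query("alice@example.com", 40.0)
```

The resulting dictionary could be sent as the body of a search request against the metadata index.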
Could we index enough information about the file or dataset in the metadata index? If a user searches for metadata, they probably just want to see the file or dataset it belongs to.

Right now, we have dataset_id and file_id in the metadata index. We can add more info or just retrieve the info from Mongo using the id.
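The second option (look up the resource by id after the search) could be sketched roughly as follows. This is only an illustration: the `resource_id` key, the `attach_resources` helper, and the in-memory dictionary standing in for a MongoDB collection are all assumptions, not code from this PR.

```python
from typing import Any, Callable, Dict, List, Optional


def attach_resources(
    metadata_docs: List[Dict[str, Any]],
    lookup: Callable[[str], Optional[Dict[str, Any]]],
) -> List[Dict[str, Any]]:
    """Pair each metadata search result with the file or dataset it belongs to."""
    results = []
    for doc in metadata_docs:
        # "resource_id" is an assumed key; the real index may use dataset_id or file_id.
        resource = lookup(doc["resource_id"])
        results.append({"metadata": doc, "resource": resource})
    return results


# Stand-in for a MongoDB collection keyed by id (hypothetical data).
fake_datasets = {"abc123": {"name": "Sensor readings"}}
docs = [{"resource_id": "abc123", "creator": "alice@example.com"}]
enriched = attach_resources(docs, fake_datasets.get)
```

The trade-off is one extra lookup per hit versus keeping denormalized file or dataset fields in the index up to date.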
lmarini
left a comment
@lmarini I addressed the above issue. Great find, thanks! We don't need 'doc' when inserting a record, but we do need it when updating one.
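This is consistent with how Elasticsearch shapes its request bodies: an index (insert) request takes the document itself, while a partial update request expects the changes under a "doc" key. The sketch below only builds the two bodies in plain Python; the helper names and sample values are hypothetical, and no client call is made.

```python
from typing import Any, Dict


def index_body(record: Dict[str, Any]) -> Dict[str, Any]:
    """Body for an index (insert) request: the document itself, unwrapped."""
    return dict(record)


def update_body(changes: Dict[str, Any]) -> Dict[str, Any]:
    """Body for a partial update request: the changes go under "doc"."""
    return {"doc": dict(changes)}


# Hypothetical record, for illustration only.
record = {"resource_id": "abc123", "creator": "alice@example.com"}

# Correct insert: the record's fields sit at the top level of the document.
insert_fields = sorted(index_body(record))

# Wrapping the record on insert makes "doc" the only top-level field,
# which is how a "doc" object can end up in the index mapping.
wrapped_fields = sorted(index_body({"doc": record}))

# Update: the "doc" wrapper is required here.
update = update_body({"creator": "bob@example.com"})
```

Here `insert_fields` holds the record's own keys, while `wrapped_fields` contains only "doc".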

