
ddey2 (Member) commented Nov 1, 2022

  1. Metadata indexing
  2. Also updated the syntax of the Elasticsearch dependency injection

ddey2 requested a review from max-zilla as a code owner November 1, 2022 21:18
ddey2 linked an issue Nov 1, 2022 that may be closed by this pull request
ddey2 marked this pull request as draft November 1, 2022 21:18
}

metadata_mappings = {}
# "properties": {
ddey2 (Member Author) commented Nov 3, 2022

I removed the mappings because I noticed that when mappings are defined for the static fields, duplicate fields somehow get created under a field named "doc". The index looked like this:

{
    "metadata": {
        "aliases": {},
        "mappings": {
            "properties": {
                "contents": {
                    "type": "object"
                },
                "context": {
                    "type": "text"
                },
                "context_url": {
                    "type": "text"
                },
                "created": {
                    "type": "date"
                },
                "creator": {
                    "type": "keyword"
                },
                "doc": {
                    "properties": {
                        "contents": {
                            "properties": {
                                "alternateName": {
                                    "type": "text",
                                    "fields": {
                                        "keyword": {
                                            "type": "keyword",
                                            "ignore_above": 256
                                        }
                                    }
                                },
                                "latitude": {
                                    "type": "float"
                                },
                                "longitude": {
                                    "type": "float"
                                }
                            }
                        },
                        "context_url": {
                            "type": "text",
                            "fields": {
                                "keyword": {
                                    "type": "keyword",
                                    "ignore_above": 256
                                }
                            }
                        },
                        "created": {
                            "type": "date"
                        },
                        "creator": {
                            "type": "text",
                            "fields": {
                                "keyword": {
                                    "type": "keyword",
                                    "ignore_above": 256
                                }
                            }
                        },
                        "reource_type": {
                            "type": "text",
                            "fields": {
                                "keyword": {
                                    "type": "keyword",
                                    "ignore_above": 256
                                }
                            }
                        },
                        "resource_id": {
                            "type": "text",
                            "fields": {
                                "keyword": {
                                    "type": "keyword",
                                    "ignore_above": 256
                                }
                            }
                        }
                    }
                },
                "resource_id": {
                    "type": "text"
                },
                "resource_type": {
                    "type": "text"
                }
            }
        },
        "settings": {
            "index": {
                "routing": {
                    "allocation": {
                        "include": {
                            "_tier_preference": "data_content"
                        }
                    }
                },
                "number_of_shards": "5",
                "provided_name": "metadata",
                "creation_date": "1667489155177",
                "number_of_replicas": "5",
                "uuid": "g5MQr--BQB6q2B8boKZWkA",
                "version": {
                    "created": "8030399"
                }
            }
        }
    }
}

This made searching complicated. After I removed the mappings, the index now looks like this:

{
    "metadata": {
        "aliases": {},
        "mappings": {
            "properties": {
                "doc": {
                    "properties": {
                        "contents": {
                            "properties": {
                                "alternateName": {
                                    "type": "text",
                                    "fields": {
                                        "keyword": {
                                            "type": "keyword",
                                            "ignore_above": 256
                                        }
                                    }
                                },
                                "latitude": {
                                    "type": "float"
                                },
                                "longitude": {
                                    "type": "float"
                                }
                            }
                        },
                        "context_url": {
                            "type": "text",
                            "fields": {
                                "keyword": {
                                    "type": "keyword",
                                    "ignore_above": 256
                                }
                            }
                        },
                        "created": {
                            "type": "date"
                        },
                        "creator": {
                            "type": "text",
                            "fields": {
                                "keyword": {
                                    "type": "keyword",
                                    "ignore_above": 256
                                }
                            }
                        },
                        "reource_type": {
                            "type": "text",
                            "fields": {
                                "keyword": {
                                    "type": "keyword",
                                    "ignore_above": 256
                                }
                            }
                        },
                        "resource_id": {
                            "type": "text",
                            "fields": {
                                "keyword": {
                                    "type": "keyword",
                                    "ignore_above": 256
                                }
                            }
                        }
                    }
                }
            }
        },
        "settings": {
            "index": {
                "routing": {
                    "allocation": {
                        "include": {
                            "_tier_preference": "data_content"
                        }
                    }
                },
                "number_of_shards": "5",
                "provided_name": "metadata",
                "creation_date": "1667507228722",
                "number_of_replicas": "5",
                "uuid": "WO4zd7trTGyf4P_2wGpPOw",
                "version": {
                    "created": "8030399"
                }
            }
        }
    }
}

I can now refer to the fields as doc.creator, doc.contents.latitude, etc. Sorry for the long post; I hope it makes sense.
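To see which dotted names a query can target, one can walk the index mapping. This is a hypothetical sketch (the helpers `field_paths` and `build_query` are not from the PR): it lists every leaf field in a mapping's `properties`, including the dynamic `.keyword` sub-fields, and builds a request body that filters on `doc.creator.keyword` and a `doc.contents.latitude` range. The mapping fragment is trimmed from the second index dump above; the query values are made up.

```python
from typing import Any, Dict, Iterator


def field_paths(properties: Dict[str, Any], prefix: str = "") -> Iterator[str]:
    """Yield the dotted path of every leaf field in an Elasticsearch mapping."""
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        if "properties" in spec:
            # Object field: recurse into its sub-properties.
            yield from field_paths(spec["properties"], prefix=f"{path}.")
        else:
            yield path
            # Multi-fields, e.g. the "keyword" sub-field added by dynamic mapping.
            for sub in spec.get("fields", {}):
                yield f"{path}.{sub}"


def build_query(creator: str, min_lat: float, max_lat: float) -> Dict[str, Any]:
    """Hypothetical search body: exact creator match plus a latitude range."""
    return {
        "query": {
            "bool": {
                "filter": [
                    # Exact matches go against the keyword sub-field, not the analyzed text.
                    {"term": {"doc.creator.keyword": creator}},
                    {"range": {"doc.contents.latitude": {"gte": min_lat, "lte": max_lat}}},
                ]
            }
        }
    }


# Trimmed from the "doc" mapping shown above.
mapping = {
    "doc": {
        "properties": {
            "contents": {
                "properties": {
                    "latitude": {"type": "float"},
                    "longitude": {"type": "float"},
                }
            },
            "creator": {
                "type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
            },
        }
    }
}

print(sorted(field_paths(mapping)))
print(build_query("alice", 10.0, 20.0))
```

Using the `.keyword` sub-field for the creator filter avoids matching on analyzed tokens of the `text` field.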

ddey2 marked this pull request as ready for review November 7, 2022 19:46
longshuicy (Member) commented

If possible, we might want to think about what the search interface for metadata will look like. Right now the search query syntax works, but because a metadata record is so different from a "dataset" or "file" record, how should we display the results?

ddey2 (Member Author) commented Nov 14, 2022

I agree that it looks different. We can discuss this and address it in a separate PR.

ddey2 requested a review from longshuicy November 14, 2022 18:26
ddey2 mentioned this pull request Nov 15, 2022
lmarini (Member) commented Nov 15, 2022

Could we index enough information about the file or dataset in the metadata index? If a user searches for metadata, they probably just want to see the file or dataset it belongs to.

ddey2 (Member Author) commented Nov 15, 2022

Right now, the metadata index stores dataset_id and file_id. We can add more information there, or just retrieve it from MongoDB using the ID.
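One way to follow the second option is to collect the ids from a page of metadata hits and look up the parent records with one query per collection. This is a hypothetical sketch: it assumes each hit's `_source` carries `dataset_id` and/or `file_id` at the top level (field names from the comment above), and the collection names `datasets` and `files` are placeholders, not taken from the codebase.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Hypothetical mapping from the id field stored in a metadata record
# to the MongoDB collection holding the parent resource.
PARENT_COLLECTIONS = {"dataset_id": "datasets", "file_id": "files"}


def parent_lookups(hits: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Group metadata search hits into one MongoDB `$in` filter per collection."""
    ids: Dict[str, List[Any]] = defaultdict(list)
    for hit in hits:
        source = hit.get("_source", {})
        for field, collection in PARENT_COLLECTIONS.items():
            value = source.get(field)
            # Skip missing ids and avoid fetching the same parent twice.
            if value is not None and value not in ids[collection]:
                ids[collection].append(value)
    return {coll: {"_id": {"$in": values}} for coll, values in ids.items()}


hits = [
    {"_source": {"dataset_id": "d1"}},
    {"_source": {"dataset_id": "d1", "file_id": "f1"}},
]
print(parent_lookups(hits))
```

Each returned filter could then be passed to `find()` on the matching collection. If the ids are stored in MongoDB as `ObjectId` values, the strings would need converting with `bson.ObjectId` first.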

lmarini (Member) left a comment

When I first add metadata to a dataset, the Elasticsearch document has the contents under doc.contents. But if I update that entry, the new value is stored under contents instead. For example, this is what it looks like for a field called alternateName:

[Screenshot 2022-11-16 at 1:51:57 PM]

ddey2 (Member Author) commented Nov 16, 2022

Looking into this.

ddey2 (Member Author) commented Nov 16, 2022

@lmarini I addressed the above issue. Great find, thanks! We don't need 'doc' when inserting a record, but we do need it when updating one.
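The difference comes from the Elasticsearch APIs themselves: the index API stores the request body as the document, while the `_update` API expects the partial document nested under a `doc` key. A minimal sketch, assuming the 8.x `elasticsearch` Python client (`Elasticsearch.index(..., document=...)` and `Elasticsearch.update(..., doc=...)`); the helper names `insert_metadata` and `update_metadata`, the `RecordingClient` stand-in, and the sample ids and values are hypothetical, not code from this PR.

```python
from typing import Any, Dict, List, Tuple


def insert_metadata(es: Any, index: str, doc_id: str, record: Dict[str, Any]) -> None:
    # Index API: send the record itself as the document, with no "doc" wrapper.
    # Wrapping it as {"doc": record} would make dynamic mapping create doc.* fields.
    es.index(index=index, id=doc_id, document=record)


def update_metadata(es: Any, index: str, doc_id: str, changes: Dict[str, Any]) -> None:
    # Update API: a partial update travels as {"doc": changes}; the 8.x Python
    # client builds that request body from the `doc` keyword argument.
    es.update(index=index, id=doc_id, doc=changes)


class RecordingClient:
    """Stand-in for elasticsearch.Elasticsearch that only records calls (illustration only)."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Dict[str, Any]]] = []

    def index(self, **kwargs: Any) -> None:
        self.calls.append(("index", kwargs))

    def update(self, **kwargs: Any) -> None:
        self.calls.append(("update", kwargs))


es = RecordingClient()
insert_metadata(es, "metadata", "m1", {"resource_id": "d1", "contents": {"alternateName": "A"}})
update_metadata(es, "metadata", "m1", {"contents": {"alternateName": "B"}})
for name, kwargs in es.calls:
    print(name, kwargs)
```

With a real client, the same two helpers would keep both inserts and updates writing to the same top-level `contents` field, instead of splitting them between `doc.contents` and `contents`.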

ddey2 requested a review from lmarini November 16, 2022 20:52
lmarini merged commit f75e559 into main Nov 21, 2022
lmarini deleted the 143-implement-metadata-in-elasticsearch branch November 21, 2022 15:39

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement metadata in elasticsearch

5 participants