zyberg2091/BlogTagger

Blog Tagger

This package extracts keywords from a web page to create tags for blogs, news articles, or any other textual content on the page. These tags highlight the topic of the content and give a quick overview of a large volume of text. Tag generation is an important feature in many sectors of IT; for example, Amazon uses tags for customer segmentation.

Prerequisites:

  • Install the packages listed in requirements.txt using pip install -r requirements.txt

  • Download the spaCy English model after installation:
    python -m spacy download en_core_web_sm

  • Follow the instructions below to load the albert-base-v2 model from the Hugging Face model hub. You can use a different model, but it may require changes to the source code, so the ALBERT model is recommended.

    from transformers import TFAutoModel, AutoTokenizer
    model = TFAutoModel.from_pretrained('albert-base-v2')
    tokenizer = AutoTokenizer.from_pretrained('albert-base-v2')

Usage Instructions

  1. Clone the repository to your local system.

  2. Collect web data.

  • For example:

    from web_data import Blog_Data
    data = Blog_Data("https://influencermarketinghub.com/12-best-food-blogs/")  # pass the website URL
    Text_data = data.text_prep(req=['h1', 'h2', 'h3', 'h4', 'p'])  # pass the HTML tags to collect text from
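To illustrate what collecting text from a chosen set of HTML tags can look like, here is a minimal sketch using only the Python standard library. It is not the implementation of Blog_Data.text_prep; the names TagTextCollector and extract_text are hypothetical, and the sketch assumes well-formed HTML in which every wanted tag is explicitly closed.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect text found inside a chosen set of HTML tags (hypothetical helper)."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.depth = 0      # number of wanted tags currently open
        self.chunks = []    # text pieces gathered so far

    def handle_starttag(self, tag, attrs):
        if tag in self.wanted:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.wanted and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep text only while at least one wanted tag is open
        if self.depth > 0 and text:
            self.chunks.append(text)


def extract_text(html, wanted=("h1", "h2", "h3", "h4", "p")):
    """Return the text inside the wanted tags, joined by single spaces."""
    parser = TagTextCollector(wanted)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


sample = (
    "<html><body><h1>Best Food Blogs</h1>"
    "<div>menu</div><p>Recipes and <b>reviews</b></p></body></html>"
)
print(extract_text(sample))
```

Text inside the div is skipped because div is not in the wanted list, while text inside the nested b element is kept because its enclosing p is.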

  3. Use the main class Blog_Tagger to generate the top k tags.
  • For example:

    tagger = Blog_Tagger(Text_data, maxlen=<int num>)  # maxlen: an integer
    tagger.token_embedding_gen(model, tokenizer)
    top_tokens = tagger.tag_gen(k)  # k: number of tags to return
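The README does not describe how tag_gen ranks tokens. One common approach to embedding-based keyword selection, shown here purely as an assumption, is to score each token by the cosine similarity between its vector and the mean vector of the document, then keep the k best. The function names and the toy three-dimensional vectors below are hypothetical; real ALBERT embeddings would have many more dimensions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_tokens(token_vectors, k):
    """Return the k tokens whose vectors are most similar to the mean vector."""
    if not token_vectors:
        return []
    vectors = list(token_vectors.values())
    dim = len(vectors[0])
    # Mean vector stands in for a document-level embedding (an assumption)
    mean = [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]
    ranked = sorted(
        token_vectors,
        key=lambda tok: cosine(token_vectors[tok], mean),
        reverse=True,
    )
    return ranked[:k]


# Toy vectors for illustration only
toy = {
    "recipe": [0.9, 0.1, 0.0],
    "baking": [0.8, 0.2, 0.1],
    "kitchen": [0.7, 0.3, 0.0],
    "football": [0.0, 0.1, 0.9],
}
print(top_k_tokens(toy, 2))
```

In this toy data the off-topic token points away from the mean vector, so it ranks last. That is the behaviour one would want from a tag ranker.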

Source Repository that contains package
