This project aims to build an intelligent search engine that understands the semantic meaning of research queries and ranks papers based on the importance of their abstracts and on extracted key concepts.
- Sentence Transformers – Understanding query semantics
- LSTM, RNN – Document ranking based on abstract importance
- CBOW Embeddings – Extracting key concepts from papers
- Named Entity Recognition (NER) – Extracting citations, authors, and key entities
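The semantic-search step can be pictured as ranking papers by the cosine similarity between a query embedding and each abstract embedding. The sketch below assumes the embeddings have already been produced (for example by a Sentence Transformers model) and uses small hand-written vectors in their place. The names cosine_similarity and rank_by_similarity, and the sample data, are hypothetical and not taken from this repository.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query_vec: Sequence[float],
    doc_vecs: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (doc_id, score) pairs, highest similarity first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings"; real Sentence Transformer vectors have hundreds of dimensions.
    docs = {
        "paper_a": [0.9, 0.1, 0.0],
        "paper_b": [0.1, 0.9, 0.2],
        "paper_c": [0.7, 0.3, 0.1],
    }
    query = [1.0, 0.0, 0.0]
    for doc_id, score in rank_by_similarity(query, docs, top_k=2):
        print(f"{doc_id}: {score:.3f}")
```

With this toy query, paper_a points almost the same way as the query and ranks first, followed by paper_c. The LSTM-based ranking described above would replace or refine this plain similarity score.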
✅ Semantic Search: Uses Sentence Transformers to improve query understanding
✅ Intelligent Ranking: LSTM-based ranking of research papers
✅ Concept Extraction: CBOW embeddings identify key topics
✅ Entity Recognition: NER extracts authors, citations, and key entities
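CBOW (continuous bag-of-words) learns word embeddings by predicting each word from the words around it. The sketch below shows only the data-preparation step: turning a tokenized abstract into (context, target) pairs with a fixed window. The function name cbow_pairs and the default window size of 2 are illustrative assumptions; training the embeddings themselves would happen in PyTorch or TensorFlow.

```python
from typing import List, Tuple


def cbow_pairs(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """Build (context_words, target_word) pairs for CBOW training.

    For each position, the context is up to `window` tokens on each side
    of the target; positions near the edges get a shorter context.
    """
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        context = left + right
        if context:  # a single-token input has no context to learn from
            pairs.append((context, target))
    return pairs


if __name__ == "__main__":
    abstract = "we propose a semantic search engine".split()
    for context, target in cbow_pairs(abstract, window=2):
        print(context, "->", target)
```

A model trained on such pairs averages the context-word vectors to predict the target, so words that share contexts end up with similar embeddings, which is what makes them useful for grouping key concepts.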
- Language Models: BERT, Sentence Transformers
- Deep Learning: LSTM, RNN, PyTorch/TensorFlow
- NLP Techniques: Named Entity Recognition (NER), Word Embeddings (CBOW)
- Database: PostgreSQL / MongoDB for storing research papers
- Backend: FastAPI / Flask
- Frontend: Next.js, TypeScript, Tailwind CSS
- Make sure Node.js and Python are installed before following the next steps.
- To check that both are installed, run these commands in a terminal (Command Prompt on Windows):
node -v
python --version
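If you would rather script this check, the sketch below looks each tool up on PATH and prints its version, or reports that it is missing. The helper name tool_version is hypothetical. On some systems Python is installed as python3 rather than python, so both names are tried.

```python
import shutil
import subprocess
from typing import Optional


def tool_version(command: str, flag: str = "--version") -> Optional[str]:
    """Return the version string printed by `command flag`, or None if not found."""
    path = shutil.which(command)
    if path is None:
        return None
    result = subprocess.run([path, flag], capture_output=True, text=True)
    # Very old Python versions print --version to stderr, so fall back to it.
    output = (result.stdout or result.stderr).strip()
    return output or None


if __name__ == "__main__":
    checks = [("Node.js", "node", "-v"), ("Python", "python", "--version"), ("Python", "python3", "--version")]
    for label, cmd, flag in checks:
        version = tool_version(cmd, flag)
        print(f"{label} ({cmd}): {version or 'not found'}")
```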
- If anything about the folder structure is unclear, ask me.
- arxiv_metadata.json is listed in .gitignore because it is about 4 GB. Download it locally and place it in the search_engine/data/raw folder.
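Because the file is around 4 GB, reading it with a single json.load call can exhaust memory. The sketch below assumes arxiv_metadata.json is in JSON Lines format (one JSON object per line, as in the arXiv metadata snapshot distributed on Kaggle) and that records carry id, title, and abstract fields; check your copy before relying on either assumption. iter_papers is a hypothetical helper, not part of this repository.

```python
import json
from typing import Dict, Iterator


def iter_papers(path: str) -> Iterator[Dict[str, str]]:
    """Yield one paper at a time from a JSON Lines file without loading it all."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            # `or ""` also covers fields that are present but null.
            yield {
                "id": record.get("id") or "",
                "title": (record.get("title") or "").strip(),
                "abstract": (record.get("abstract") or "").strip(),
            }


if __name__ == "__main__":
    # Path relative to the repository root; adjust if your layout differs.
    path = "search_engine/data/raw/arxiv_metadata.json"
    for i, paper in enumerate(iter_papers(path)):
        print(paper["id"], "|", paper["title"][:60])
        if i == 4:  # preview only the first five records
            break
```

Since iter_papers is a generator, memory use stays roughly constant no matter how large the file is.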
- Clone the repository:
git clone https://github.com/Augnik03/ResearchAI.git
cd ResearchAI
- For working on the frontend:
cd frontend
npm i
- If npm i fails with a dependency error, run one of these commands instead:
npm i --legacy-peer-deps
npm i --force
- To start the development server:
npm run dev
- For working on the backend:
cd search_engine
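- Before installing the Python dependencies, create a virtual environment and activate it (the second command is for Windows Command Prompt):
python -m venv venv
venv\Scripts\activate
- On macOS/Linux, activate it with this command instead:
source venv/bin/activate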
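Before running pip, you can confirm that the virtual environment is actually active by comparing sys.prefix with sys.base_prefix: inside a venv they differ, outside one they are equal. This is a minimal standard-library sketch; in_virtualenv is a hypothetical name.

```python
import sys


def in_virtualenv() -> bool:
    """True when running inside a venv (sys.prefix points at the venv, not the base install)."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment active; activate venv before running pip install.")
```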
- Install dependencies:
pip install -r requirements.txt
- To run the preprocessing script on the dataset:
python preprocess.py
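The contents of preprocess.py are not described here. As a hypothetical illustration only, a pass that prepares abstracts for the embedding and CBOW steps might lowercase the text and split it into word tokens. The function name preprocess_abstract and the tokenization pattern below are assumptions, not the script's actual behavior.

```python
import re
from typing import List

# Hypothetical rule: runs of ASCII letters/digits, keeping internal hyphens ("transformer-based").
TOKEN_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def preprocess_abstract(text: str) -> List[str]:
    """Lowercase an abstract and return its word tokens, dropping punctuation."""
    return TOKEN_RE.findall(text.lower())


if __name__ == "__main__":
    raw = "  We propose a\n  Transformer-based   ranking model.  "
    print(preprocess_abstract(raw))
```

Note that this pattern discards non-ASCII letters; a real pipeline for multilingual or LaTeX-heavy abstracts would need a more careful tokenizer.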
Contributions are welcome! Fork the repo, create a feature branch, and submit a PR.