This project was completed as part of the I304 – Data Analytics and Intelligence course.
It applies data analytics and machine learning techniques using Python to solve real-world business intelligence problems.
The assignment consists of four major tasks:
- Linear Regression – Stock price prediction using Yahoo Finance API
- Data Clustering – Credit card dataset clustering using K-Means
- Data Classification – Breast Cancer Wisconsin dataset classification
- Principal Component Analysis (PCA) – Dimensionality reduction with classification
Technologies used:
- Python 3.x
- Pandas, NumPy – Data manipulation
- Matplotlib, Seaborn – Data visualization
- scikit-learn – Machine learning (Regression, Classification, Clustering, PCA)
- yfinance – Stock market data retrieval
Task 1 – Linear Regression:
- Retrieved stock data through the Yahoo Finance API (via yfinance)
- Performed exploratory data analysis on the stock features
- Built a multivariate linear regression model to predict stock closing prices
- Evaluated model performance using the R² score
- Tested predictions on custom input values
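
The regression steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the price table is synthetic random data standing in for what yfinance would return, and the column names (`Open`, `High`, `Low`, `Volume`, `Close`), the 80/20 split, and the custom input values are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for downloaded price history (hypothetical values,
# not real market data); a real run would load this table via yfinance.
rng = np.random.default_rng(0)
n_days = 250
open_ = 100 + np.cumsum(rng.normal(0, 1, n_days))
high = open_ + rng.uniform(0.5, 2.0, n_days)
low = open_ - rng.uniform(0.5, 2.0, n_days)
volume = rng.integers(1_000_000, 5_000_000, n_days)
close = (high + low) / 2 + rng.normal(0, 0.3, n_days)
prices = pd.DataFrame(
    {"Open": open_, "High": high, "Low": low, "Volume": volume, "Close": close}
)

# Several input features, one target: a multivariate linear regression.
features = ["Open", "High", "Low", "Volume"]
X_train, X_test, y_train, y_test = train_test_split(
    prices[features], prices["Close"], test_size=0.2, random_state=0
)

model = LinearRegression().fit(X_train, y_train)

# R² on held-out rows measures how much of the variance in Close is explained.
r2 = r2_score(y_test, model.predict(X_test))

# Predict the close for one hand-written (assumed) set of input values.
custom = pd.DataFrame(
    [{"Open": 101.0, "High": 103.0, "Low": 100.0, "Volume": 2_000_000}]
)
predicted_close = float(model.predict(custom[features])[0])

print(f"R² on test set: {r2:.3f}")
print(f"Predicted close for custom input: {predicted_close:.2f}")
```

Because the synthetic close is built directly from the high and low prices, the R² here says nothing about how well such a model performs on real market data.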
Task 2 – Data Clustering:
- Applied K-Means clustering to the credit card dataset, varying the number of clusters from 3 to 15
- Visualized the results with color-coded clusters
- Determined the most suitable number of clusters
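
One way to compare cluster counts from 3 to 15 is sketched below. The text above does not say how the most suitable number was chosen, so the silhouette score used here is an assumption (an elbow plot of inertia is a common alternative). The data is generated with `make_blobs` as a stand-in for the credit card dataset.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the credit card features (assumed, not the real data).
X, _ = make_blobs(n_samples=600, centers=4, n_features=5, random_state=42)

# K-Means is distance-based, so features are put on a common scale first.
X_scaled = StandardScaler().fit_transform(X)

# Fit one model per candidate cluster count and record its silhouette score.
scores = {}
for k in range(3, 16):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X_scaled)
    scores[k] = silhouette_score(X_scaled, labels)

# The candidate with the highest silhouette score is taken as the best fit.
best_k = max(scores, key=scores.get)

for k, score in scores.items():
    print(f"k={k:2d}  silhouette={score:.3f}")
print(f"Best k by silhouette score: {best_k}")
```

The labels from the chosen model can then be passed as the color argument of a scatter plot to produce the color-coded view.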
Task 3 – Data Classification:
- Preprocessed and scaled the Breast Cancer Wisconsin dataset
- Trained two classifiers:
  - Logistic Regression
  - K-Nearest Neighbours (KNN)
- Compared accuracy results between the two models
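
The comparison above might look like the following sketch. It assumes the copy of the Breast Cancer Wisconsin (Diagnostic) dataset bundled with scikit-learn, a stratified 80/20 split, and `n_neighbors=5`; the project's own data source and settings may differ.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Each pipeline fits the scaler on the training split only, which avoids
# leaking test-set statistics into the model.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

# Train both classifiers and score them on the same held-out rows.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in accuracies.items():
    print(f"{name}: {acc:.3f}")
```

Accuracy on a single split depends on the random seed, so a small gap between the two models is better confirmed with cross-validation.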
Task 4 – Principal Component Analysis (PCA):
- Performed PCA with n_components=2 on the Wine dataset
- Reduced dimensionality while preserving most of the key variance
- Trained a Logistic Regression classifier on the reduced data
- Visualized the results with a scatter plot
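
A minimal sketch of the PCA workflow above, assuming the Wine dataset that ships with scikit-learn, standardization before PCA, and a stratified 70/30 split (none of these details are stated in the list above):

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# PCA is sensitive to feature scale, so standardize first
# (fitted on the training split only).
scaler = StandardScaler().fit(X_train)
pca = PCA(n_components=2).fit(scaler.transform(X_train))

# Project both splits onto the two principal components.
X_train_2d = pca.transform(scaler.transform(X_train))
X_test_2d = pca.transform(scaler.transform(X_test))

# Fraction of the original variance retained by the two components.
explained = pca.explained_variance_ratio_.sum()

# Train and evaluate the classifier on the reduced data.
clf = LogisticRegression().fit(X_train_2d, y_train)
accuracy = clf.score(X_test_2d, y_test)

print(f"Variance retained by 2 components: {explained:.2%}")
print(f"Test accuracy on 2D data: {accuracy:.3f}")
```

The two columns of `X_train_2d` can be plotted against each other, colored by `y_train`, to obtain the scatter plot of the classes in the reduced space.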
To run the project:
- Clone this repository:
  git clone https://github.com/yourusername/data-analytics-assessment1.git
  cd data-analytics-assessment1
- Install dependencies:
  pip install -r requirements.txt
- Run any task script, e.g.:
  python Task1_LinearRegression/linear_regression_stock.py
Results:
- Regression: Achieved meaningful stock price predictions with acceptable R² scores.
- Clustering: K-Means formed well-separated customer groups in the credit card dataset.
- Classification: Both Logistic Regression and KNN achieved strong accuracy; Logistic Regression performed slightly better.
- PCA: Successfully reduced data to 2D while maintaining classification effectiveness.