🧠 Sentiment Analysis with Logistic Regression

This model performs multi-class sentiment analysis on tweets, classifying them into the following categories:

Positive
Negative
Neutral
Irrelevant

It uses a three-stage pipeline that handles both feature extraction and classification:

CountVectorizer
TF-IDF transformation
Logistic Regression classifier (max_iter=1000)

🏗 Model Architecture

CountVectorizer: Converts tweets into token count vectors.
TfidfTransformer: Reweights the counts with TF-IDF, so tokens that appear in many tweets carry less weight.
LogisticRegression: An interpretable linear classifier (max_iter=1000) that serves as a robust baseline.
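
The three stages above can be assembled roughly as follows. This is a minimal sketch, assuming a standard scikit-learn `Pipeline` with default vectorizer settings; the step names (`counts`, `tfidf`, `clf`), the `build_pipeline` helper, and the toy training data are illustrative assumptions, not the author's actual training code.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


def build_pipeline() -> Pipeline:
    """Assemble the three stages in the order listed above (hypothetical helper)."""
    return Pipeline([
        ("counts", CountVectorizer()),                # tweet text -> token counts
        ("tfidf", TfidfTransformer()),                # counts -> TF-IDF weights
        ("clf", LogisticRegression(max_iter=1000)),   # weights -> class label
    ])


# Toy data invented for illustration; the real model was trained on labeled tweets.
texts = ["love this game", "great update", "hate this bug", "terrible lag"]
labels = ["Positive", "Positive", "Negative", "Negative"]

model = build_pipeline().fit(texts, labels)
print(model.predict(["love this update"])[0])
```

Because the vectorizer lives inside the pipeline, `predict` accepts raw strings directly, which matches how the saved model is called in the usage section.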

🧪 Evaluation

Evaluated on a separate validation set of 999 tweets:

| Class | Precision | Recall | F1-score |
| --- | --- | --- | --- |
| Irrelevant | 0.88 | 0.85 | 0.87 |
| Negative | 0.87 | 0.94 | 0.91 |
| Neutral | 0.97 | 0.86 | 0.91 |
| Positive | 0.89 | 0.94 | 0.91 |

Overall accuracy: 0.90
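
Per-class precision, recall, and F1 figures like those above are typically produced with scikit-learn's `classification_report`. The sketch below shows the general approach only: `y_true` and `y_pred` are made-up labels standing in for the validation set and the model's predictions, so the numbers it prints will not match the reported results.

```python
from sklearn.metrics import accuracy_score, classification_report

# Hypothetical labels; in practice y_pred would come from model.predict(validation_texts).
y_true = ["Positive", "Negative", "Neutral", "Irrelevant", "Positive", "Negative"]
y_pred = ["Positive", "Negative", "Neutral", "Positive", "Positive", "Neutral"]

# zero_division=0 avoids warnings for classes that are never predicted.
report = classification_report(y_true, y_pred, zero_division=0)
accuracy = accuracy_score(y_true, y_pred)

print(report)
print(f"accuracy: {accuracy:.2f}")
```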

📦 Usage

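```python
import joblib

# Load the saved pipeline (vectorizer, TF-IDF transformer, and classifier).
model = joblib.load("sentiment_model_lr.pkl")
user_input = "This update is surprisingly good!"

# predict expects a list of raw strings and returns one label per input.
prediction = model.predict([user_input])
print(prediction[0])  # one of: Positive, Negative, Neutral, Irrelevant
```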

> ⚠️ Requires scikit-learn 1.6.1+ to avoid version mismatch warnings.
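
One way to check the installed version before loading the model is sketched below. It uses only the standard library; the helper names (`release_tuple`, `meets_minimum`, `installed_sklearn_ok`) are hypothetical, and the comparison is deliberately simple: it looks at the leading numeric components only and treats pre-release suffixes such as `rc1` as the release itself.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def release_tuple(version_string: str) -> tuple[int, ...]:
    """Turn '1.6.1' or '1.7.dev0' into a tuple of its leading integer parts."""
    parts = []
    for piece in version_string.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "1.6.1") -> bool:
    """Return True if the installed version is at least the minimum."""
    return release_tuple(installed) >= release_tuple(minimum)


def installed_sklearn_ok(minimum: str = "1.6.1") -> bool:
    """Check the scikit-learn distribution in the current environment."""
    try:
        return meets_minimum(version("scikit-learn"), minimum)
    except PackageNotFoundError:
        return False  # scikit-learn is not installed at all


print(installed_sklearn_ok())
```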

📚 Dataset

Tweets were preprocessed using a clean_text routine and labeled into
the four sentiment categories. If you’d like to experiment or re-train, contact
the author or fork this repo.
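
The actual clean_text routine is not included here. As a rough idea of what tweet cleaning of this kind often involves, here is a hypothetical stand-in that lowercases the text and strips URLs, @-mentions, and punctuation; every step in it is an assumption and may differ from what the author used.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")
SPACE_RE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Hypothetical stand-in for the author's routine, not the original code."""
    text = text.lower()
    text = URL_RE.sub(" ", text)        # drop links
    text = MENTION_RE.sub(" ", text)    # drop @-mentions
    text = NON_ALNUM_RE.sub(" ", text)  # drop punctuation, '#', and emoji
    return SPACE_RE.sub(" ", text).strip()  # collapse runs of whitespace


print(clean_text("@Dev This update is GREAT!! https://example.com #gaming"))
# -> this update is great gaming
```

If you re-train, apply the same cleaning at inference time as at training time, since the vectorizer's vocabulary depends on it.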

🧑‍💻 Author

Built by @arshvir
Model version: 1.0
License: MIT