Jan 2024 - May 2024
Sentiment Analysis using Naive Bayes
NLP classification pipeline for movie review sentiment.
This project builds a text classification pipeline that predicts review sentiment using preprocessing, tokenization, word frequency distributions, Laplace smoothing, and log-probability scoring.
NLP Classification Pipeline: Raw Review -> Preprocessing -> Tokenization -> Naive Bayes
Highlights
- NLP Pipeline
- Naive Bayes Model
- Laplace Smoothing
- CLI Workflow
Executive Summary
The pipeline cleans and tokenizes each movie review, builds per-class word frequency distributions from labeled data, and classifies new reviews as positive or negative by comparing Laplace-smoothed log-probability scores.
Problem Statement
Raw text needs structured preprocessing and robust probability scoring before it can be classified reliably. This project demonstrates a foundational NLP workflow from data processing to prediction.
What I Built
Text preprocessing
Tokenization
Laplace smoothing
Configurable datasets
CLI execution
How It Works
A conceptual workflow showing how the project moves from input to processing and output.
Step 1: Dataset
Step 2: Cleaning
Step 3: Tokenization
Step 4: Word Frequency Training
Step 5: Log Probability Scoring
Step 6: Sentiment Prediction
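The training, scoring, and prediction steps correspond to the standard multinomial Naive Bayes formulation with add-one (Laplace) smoothing. The notation below is introduced here for illustration and is not taken from the project code: c is a class (positive or negative), w_1 ... w_n are the tokens of a review, count(w, c) is how often word w appears in training reviews of class c, N_c is the total number of tokens in class c, and |V| is the vocabulary size. Whether the project includes the class prior P(c) in its score is an assumption.

```latex
\hat{P}(w \mid c) = \frac{\operatorname{count}(w, c) + 1}{N_c + |V|}
\qquad
\operatorname{score}(c) = \log P(c) + \sum_{i=1}^{n} \log \hat{P}(w_i \mid c)
\qquad
\hat{c} = \arg\max_{c} \, \operatorname{score}(c)
```

The "+ 1" in the numerator is the smoothing term, and the sum of logarithms replaces a product of probabilities.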
Architecture / System Design
A simplified system view of the major project components and how responsibilities connect.
Step 1: Text Input
Step 2: Preprocessor
Step 3: Feature Extractor
Step 4: Naive Bayes Classifier
Step 5: Prediction Output
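One way to picture how these components connect is a small wrapper that chains three interchangeable stages. This is a sketch only: the class name `SentimentPipeline`, the type aliases, and the stand-in functions are hypothetical and are not taken from the project. The toy classifier is a placeholder that shows the wiring; it is not Naive Bayes.

```python
from dataclasses import dataclass
from typing import Callable

Tokens = list[str]
Features = dict[str, int]


@dataclass
class SentimentPipeline:
    """Chains the three stages; each stage is a plain callable."""
    preprocess: Callable[[str], Tokens]
    extract: Callable[[Tokens], Features]
    classify: Callable[[Features], str]

    def run(self, text: str) -> str:
        # Text Input -> Preprocessor -> Feature Extractor -> Classifier -> Output
        return self.classify(self.extract(self.preprocess(text)))


def count_words(tokens: Tokens) -> Features:
    """Feature extractor stand-in: raw word counts for one review."""
    counts: Features = {}
    for tok in tokens:
        counts[tok] = counts.get(tok, 0) + 1
    return counts


def toy_classifier(features: Features) -> str:
    """Placeholder rule used only to demonstrate the wiring."""
    good = features.get("good", 0)
    bad = features.get("bad", 0)
    return "positive" if good >= bad else "negative"


pipeline = SentimentPipeline(
    preprocess=lambda text: text.lower().split(),
    extract=count_words,
    classify=toy_classifier,
)
print(pipeline.run("Good cast and a good story"))  # positive
```

Because each stage is passed in as a callable, any one of them can be swapped (for example, a different tokenizer) without touching the others.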
Technical Implementation
Preprocessing
- Lowercasing
- Punctuation removal
- Tokenization
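The three preprocessing steps above can be sketched in a few lines of standard-library Python. The function name `preprocess` is hypothetical, and whitespace splitting is assumed as the tokenization strategy; the project may tokenize differently.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text: str) -> list[str]:
    """Lowercase, strip ASCII punctuation, and split on whitespace."""
    lowered = text.lower()
    no_punct = lowered.translate(_PUNCT_TABLE)
    return no_punct.split()


tokens = preprocess("The movie was surprisingly emotional and well acted.")
print(tokens)
# ['the', 'movie', 'was', 'surprisingly', 'emotional', 'and', 'well', 'acted']
```

One consequence of deleting punctuation outright is that contractions collapse ("don't" becomes "dont"), which is usually acceptable for a bag-of-words model.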
Model
- Word frequency distributions
- Laplace smoothing
- Log-probability scoring
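The model components above fit together roughly as follows. This is a minimal sketch, not the project's code: the class name `NaiveBayesSketch`, its method names, and the tiny training set are hypothetical, and including a class prior based on document counts is an assumption.

```python
import math
from collections import Counter


class NaiveBayesSketch:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.word_counts: dict[str, Counter] = {}  # per-class word frequencies
        self.doc_counts: Counter = Counter()       # training documents per class
        self.vocab: set[str] = set()               # all words seen in training

    def train(self, examples: list[tuple[list[str], str]]) -> None:
        for tokens, label in examples:
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)

    def log_score(self, tokens: list[str], label: str) -> float:
        counts = self.word_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        # Log prior estimated from class frequencies in the training data.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for tok in tokens:
            # Counter returns 0 for unseen words; the +1 keeps the log finite.
            score += math.log((counts[tok] + 1) / denom)
        return score

    def predict(self, tokens: list[str]) -> str:
        return max(self.word_counts, key=lambda label: self.log_score(tokens, label))


model = NaiveBayesSketch()
model.train([
    (["great", "acting", "great", "story"], "positive"),
    (["boring", "plot", "bad", "acting"], "negative"),
])
print(model.predict(["great", "story"]))  # positive
```

In this sketch, words outside the training vocabulary contribute the same smoothed penalty to every class, so they do not change which class wins.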
Workflow
- Configurable datasets
- CLI execution
- Positive/negative classification
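A command-line entry point with a configurable dataset path might look like the sketch below, built on the standard-library `argparse` module. The flag names `--train-data` and `--review`, the program name, and the file name `reviews.tsv` are all hypothetical; the project's actual CLI may differ.

```python
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    # All flag names here are hypothetical; the project's real CLI may differ.
    parser = argparse.ArgumentParser(
        prog="sentiment",
        description="Classify a movie review as positive or negative.",
    )
    parser.add_argument("--train-data", required=True,
                        help="path to the labeled training dataset")
    parser.add_argument("--review", required=True,
                        help="review text to classify")
    return parser


def main(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    # A real entry point would load args.train_data, train the model,
    # and print the predicted label for args.review.
    return args


# Passing argv explicitly makes the parser easy to exercise without a shell.
args = main(["--train-data", "reviews.tsv", "--review", "Well acted."])
print(args.train_data, "|", args.review)  # reviews.tsv | Well acted.
```

Keeping the dataset path as an argument, rather than hard-coding it, is what lets the same script run against different datasets.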
Tools
- Python
- NLP fundamentals
- Probabilistic modeling
Screenshots & Visuals
Real project screenshots and outputs appear first. Where a project has no existing screenshots, the visuals are grounded diagrams or output previews based on the actual project structure.
NLP Pipeline Diagram
Grounded pipeline visual showing the implemented preprocessing, tokenization, Naive Bayes scoring, smoothing, and sentiment prediction flow.
Classification Output Example
Grounded output card based on the project's review classification workflow and probability-scoring approach.
Classification Preview
Input:
"The movie was surprisingly emotional and well acted."
Prediction:
Positive Review
Challenges & Solutions
Challenge
Raw text is noisy and cannot be modeled directly.
Solution
Built a preprocessing pipeline for lowercasing, punctuation removal, and tokenization.
Challenge
A word that never appears in a class's training data gets a probability of zero, which zeroes out the score for the entire review.
Solution
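Applied Laplace (add-one) smoothing so unseen words keep a small nonzero probability, and summed log-probabilities instead of multiplying raw probabilities to avoid floating-point underflow.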
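Both failure modes are easy to reproduce. The snippet below is an illustration under assumed numbers: the word counts, the vocabulary size, and the helper names `unsmoothed` and `smoothed` are hypothetical and chosen only to make the effect visible.

```python
import math

# Hypothetical counts for one class: word -> occurrences in training reviews.
counts = {"great": 3, "story": 1}
total = sum(counts.values())  # 4 tokens seen for this class
vocab_size = 5                # assumed vocabulary size across all classes


def unsmoothed(word: str) -> float:
    return counts.get(word, 0) / total


def smoothed(word: str) -> float:
    return (counts.get(word, 0) + 1) / (total + vocab_size)


review = ["great", "soundtrack"]  # "soundtrack" never appeared in training

raw_product = math.prod(unsmoothed(w) for w in review)
smoothed_log = sum(math.log(smoothed(w)) for w in review)

print(raw_product)   # 0.0: one unseen word wipes out the whole score
print(smoothed_log)  # a finite negative number

# Multiplying many small probabilities underflows to 0.0, while the
# equivalent sum of logs stays representable.
tiny = [1e-5] * 100
print(math.prod(tiny))                 # 0.0 (underflow)
print(sum(math.log(p) for p in tiny))  # about -1151.29
```

Smoothing addresses the zero-count problem and log-space scoring addresses underflow; the two fixes are independent, and a classifier over long reviews generally needs both.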
Results / Impact
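Delivers a working end-to-end classifier with a modular structure: preprocessing, feature extraction, and Naive Bayes scoring are separate components run from a CLI against configurable datasets.
Shows the ability to turn course and research concepts, such as Laplace smoothing and log-probability scoring, into a working system with real implementation constraints.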