Soojal Kumar
Jan 2024 - May 2024

Sentiment Analysis using Naive Bayes

NLP classification pipeline for movie review sentiment.

This project builds a text classification pipeline that predicts review sentiment using preprocessing, tokenization, word frequency distributions, Laplace smoothing, and log-probability scoring.

Python, NLP, Machine Learning, Text Processing, Naive Bayes

NLP Classification Pipeline (diagram generated from project structure)

Raw Review → Preprocessing → Tokenization → Naive Bayes

Highlights

  • NLP Pipeline
  • Naive Bayes Model
  • Laplace Smoothing
  • CLI Workflow


Problem Statement

Raw text needs structured preprocessing and robust probability scoring before it can be classified reliably. This project demonstrates a foundational NLP workflow from data processing to prediction.

What I Built

Text preprocessing

Tokenization

Laplace smoothing

Configurable datasets

CLI execution

How It Works

A conceptual workflow showing how the project moves from input to processing and output.

Step 1: Dataset
Step 2: Cleaning
Step 3: Tokenization
Step 4: Word Frequency Training
Step 5: Log Probability Scoring
Step 6: Sentiment Prediction
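The six steps above can be sketched end to end in a few dozen lines. This is a minimal illustration, not the project's actual code: the function names (`tokenize`, `train`, `predict`), the toy dataset, and the choice to smooth with a constant of 1 over the full vocabulary are all assumptions made for the example.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Steps 2-3: lowercase, replace punctuation with spaces, split on whitespace."""
    return re.sub(r"[^a-z0-9\s]", " ", text.lower()).split()

def train(labeled_reviews):
    """Step 4: count word frequencies per class from (text, label) pairs."""
    word_counts = {}        # label -> Counter of word frequencies
    doc_counts = Counter()  # label -> number of training reviews
    for text, label in labeled_reviews:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab

def predict(text, word_counts, doc_counts, vocab):
    """Steps 5-6: sum log-probabilities per class and return the best label."""
    total_docs = sum(doc_counts.values())
    tokens = tokenize(text)
    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)  # log prior
        for w in tokens:
            # Add-one (Laplace) smoothing keeps unseen words above zero.
            score += math.log((counts[w] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Step 1: a tiny hypothetical dataset, invented for this sketch.
toy_data = [
    ("A moving, well acted film.", "positive"),
    ("Great story and great cast.", "positive"),
    ("Boring plot and terrible acting.", "negative"),
    ("A dull, terrible waste of time.", "negative"),
]
model = train(toy_data)
print(predict("well acted and moving", *model))
```

On this toy data the call prints `positive`, because three of the four tokens were seen only in positive reviews.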

Architecture / System Design

A simplified system view of the major project components and how responsibilities connect.

Step 1: Text Input
Step 2: Preprocessor
Step 3: Feature Extractor
Step 4: Naive Bayes Classifier
Step 5: Prediction Output

Technical Implementation

Preprocessing

  • Lowercasing
  • Punctuation removal
  • Tokenization
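The three preprocessing steps listed above might look like the following sketch. The function name `preprocess` is hypothetical, and whitespace splitting is assumed as the tokenization rule since the page does not specify one.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def preprocess(text):
    """Lowercase the text, delete punctuation, and split on whitespace."""
    lowered = text.lower()
    stripped = lowered.translate(_PUNCT_TABLE)
    return stripped.split()

print(preprocess("The movie was surprisingly emotional and well acted."))
# ['the', 'movie', 'was', 'surprisingly', 'emotional', 'and', 'well', 'acted']
```

Deleting punctuation outright merges contractions and hyphenated words ("don't" becomes "dont"). Replacing punctuation with a space instead is an equally simple alternative if those should be split.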

Model

  • Word frequency distributions
  • Laplace smoothing
  • Log-probability scoring
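For reference, the three model components above combine into the standard multinomial Naive Bayes decision rule with add-one smoothing. The notation is an assumption for illustration: $w_1, \dots, w_n$ are the tokens of a review, $\operatorname{count}(w, c)$ is how often word $w$ occurs in training reviews of class $c$, $N_c$ is the total token count for class $c$, and $|V|$ is the vocabulary size. The project may differ in details such as the smoothing constant.

```latex
\hat{c} = \arg\max_{c \in \{\text{pos},\,\text{neg}\}}
  \left[ \log P(c) + \sum_{i=1}^{n} \log P(w_i \mid c) \right],
\qquad
P(w \mid c) = \frac{\operatorname{count}(w, c) + 1}{N_c + |V|}
```

The $+1$ in the numerator and $+|V|$ in the denominator are the Laplace smoothing terms. The sum of logarithms replaces a product of probabilities.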

Workflow

  • Configurable datasets
  • CLI execution
  • Positive/negative classification
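A command-line entry point for such a workflow could be sketched with `argparse` from the standard library. The flag names (`--dataset`, `--text`, `--alpha`) and the file name are hypothetical. The page does not document the project's actual CLI options.

```python
import argparse

def build_parser():
    """Build a parser for a hypothetical sentiment-classification CLI."""
    parser = argparse.ArgumentParser(
        description="Classify a movie review as positive or negative."
    )
    # Assumed flag: path to a labeled training file (format not specified here).
    parser.add_argument("--dataset", required=True,
                        help="Path to the labeled training data.")
    # Assumed flag: the review text to classify.
    parser.add_argument("--text", required=True,
                        help="Review text to classify.")
    # Assumed flag: 1.0 corresponds to Laplace (add-one) smoothing.
    parser.add_argument("--alpha", type=float, default=1.0,
                        help="Additive smoothing constant.")
    return parser

# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["--dataset", "reviews.tsv", "--text", "Great film."]
)
print(args.dataset, args.alpha)
```

Keeping the dataset path as an argument rather than a hard-coded constant is one way to get the "configurable datasets" behavior listed above.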

Tools

  • Python
  • NLP fundamentals
  • Probabilistic modeling

Screenshots & Visuals

Real project screenshots and outputs appear first. Where a project has no existing screenshots, the visuals are grounded diagrams or output previews based on the actual project structure.

NLP Pipeline Diagram (workflow visual, generated from project structure)

Pipeline diagram showing the implemented flow: preprocessing, tokenization, Naive Bayes scoring with smoothing, and sentiment prediction.

Classification Output Example (output preview, generated from project structure)

Output preview based on the project's review classification workflow and probability-scoring approach.

Classification Preview

Input:
"The movie was surprisingly emotional and well acted."

Prediction:
Positive Review

Challenges & Solutions

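Challenge: Raw review text is noisy (mixed case, punctuation), so identical words can be counted as different tokens.
Solution: Built a preprocessing pipeline for lowercasing, punctuation removal, and tokenization.

Challenge: A word never seen with a class during training gets an estimated probability of zero, which zeroes out that class's entire score.
Solution: Used Laplace smoothing so every word keeps a nonzero probability, and log-probability scoring so long reviews do not underflow to zero.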
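The numerical reason for scoring in log space can be shown with a short sketch. The probability value and review length below are invented for illustration and are not taken from the project's data.

```python
import math

# Hypothetical review: 1000 tokens, each with a per-word probability of 1e-4.
word_probs = [1e-4] * 1000

# Multiplying raw probabilities underflows double precision to exactly 0.0.
product = 1.0
for p in word_probs:
    product *= p

# Summing log-probabilities stays finite and still ranks classes correctly.
log_sum = sum(math.log(p) for p in word_probs)

print(product)                 # 0.0
print(math.isfinite(log_sum))  # True
```

Because the logarithm is monotonic, the class with the highest log score is also the class with the highest probability, so the prediction is unchanged.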

Results / Impact

Demonstrates practical software engineering through modular structure, readable workflows, and clear technical documentation.

Shows ability to convert course and research concepts into working systems with real implementation constraints.