🤖 NLP Chatbot Using Cosine Similarity

Welcome to the NLP Chatbot Project! This first project demonstrates how to build a simple question-answering chatbot using cosine similarity.

🚀 Project Overview

Purpose: A chatbot that matches user queries to predefined questions and returns the corresponding answers.
Technologies: Python, NLTK, NumPy, scikit-learn

📝 Problem Statement

This chatbot:

- Tokenizes and normalizes user input (lemmatization and stemming; the preprocessing function in this README keeps stopwords).
- Matches the input to a list of predefined questions using cosine similarity.
- Returns the corresponding answer if a match is found.
- Responds with "I can't answer this question." if no match is found.
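
To make the matching step concrete, here is a minimal, dependency-free sketch of cosine similarity on plain word counts. It is not the project's code: the project uses TF-IDF vectors from scikit-learn, and the `bag` helper and the sample sentences are assumptions made up for illustration.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    # Only terms present in both texts contribute to the dot product
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text matches nothing
    return dot / (norm_a * norm_b)

def bag(text: str) -> Counter:
    # Hypothetical helper: lowercase whitespace tokens, no stemming
    return Counter(text.lower().split())

same = cosine(bag("what is data analytics"), bag("what is data analytics"))
partial = cosine(bag("what is data analytics"), bag("what is machine learning"))
unrelated = cosine(bag("what is data analytics"), bag("who won yesterday"))
print(same, partial, unrelated)
```

Identical texts score 1, texts with no shared words score 0, and everything else falls in between, which is why a cutoff can separate a match from a miss.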

🛠️ Requirements

Python 3.x
Google Colab or Jupyter Notebook
Libraries:
- nltk
- numpy
- scikit-learn
- pandas

📂 Dataset

Source: CSV file containing questions and answers regarding data analytics.
Path: test.csv (adjust the path to your own location)
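
The code in this README reads two columns, `Questions` and `Answers`. The sketch below shows the layout that implies, using only the standard library. The two sample rows are invented for illustration and are not taken from test.csv.

```python
import csv
import io

# Made-up sample rows; only the column names come from this README
sample_csv = io.StringIO(
    "Questions,Answers\n"
    "What is data analytics?,The practice of examining data to draw conclusions.\n"
    "What is a dashboard?,A visual summary of key metrics.\n"
)

rows = list(csv.DictReader(sample_csv))
questions = [row["Questions"] for row in rows]
answers = [row["Answers"] for row in rows]

# The two lists stay aligned: answers[i] belongs to questions[i]
for q, a in zip(questions, answers):
    print(f"{q} -> {a}")
```

Keeping the two lists index-aligned is what lets the chatbot look up an answer from the position of the best-matching question.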

🔧 Setup

Mount Google Drive:

from google.colab import drive
drive.mount('/content/drive')

Import Libraries:

import numpy as np
import pandas as pd
import nltk
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

Download NLTK Data:

nltk.download('punkt')
nltk.download('punkt_tab')  # newer NLTK releases look for this tokenizer resource
nltk.download('wordnet')
nltk.download('stopwords')

Read Dataset:

path = r"/content/drive/MyDrive/IMP1DS INTERVIEW PREP2024/15.DSPROJECT2024/1.NLPPROJECTS2024/test.csv"
df = pd.read_csv(path, encoding='unicode_escape')
questions_list = df['Questions'].tolist()
answers_list = df['Answers'].tolist()

🔍 Preprocessing

Initialize Tools:

from nltk.stem import WordNetLemmatizer, PorterStemmer
from nltk.corpus import stopwords
import re

Preprocess Function:

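def preprocess_with_stopwords(text):
    # Stopwords are kept (as the function name says); only punctuation is stripped
    lemmatizer = WordNetLemmatizer()
    stemmer = PorterStemmer()
    # Remove every character that is neither a word character nor whitespace
    text = re.sub(r'[^\w\s]', '', text)
    # Lowercase, then split into tokens
    tokens = nltk.word_tokenize(text.lower())
    # Lemmatize first, then stem the lemmas
    lemmatized_tokens = [lemmatizer.lemmatize(token) for token in tokens]
    stemmed_tokens = [stemmer.stem(token) for token in lemmatized_tokens]
    # Return a single string so it can be passed to the vectorizer
    return ' '.join(stemmed_tokens)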
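
To see what the punctuation regex and lowercasing do on their own, here is a small standard-library sketch. It is a simplification, not a replacement: `basic_normalize` is a hypothetical helper, a whitespace split stands in for `nltk.word_tokenize`, and lemmatization and stemming are left out.

```python
import re

def basic_normalize(text: str) -> list:
    # Same punctuation-stripping pattern as in the preprocessing function
    cleaned = re.sub(r"[^\w\s]", "", text)
    # Whitespace split stands in for nltk.word_tokenize (a simplification)
    return cleaned.lower().split()

tokens = basic_normalize("Who is MS Dhoni?")
hyphenated = basic_normalize("Data-driven, really!")
print(tokens)
print(hyphenated)
```

Note that removing punctuation before tokenizing merges hyphenated words ("Data-driven" becomes "datadriven"), which may or may not be what you want for your own dataset.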

📈 Vectorization

Setup Vectorizer:

vectorizer = TfidfVectorizer(tokenizer=nltk.word_tokenize)
X = vectorizer.fit_transform([preprocess_with_stopwords(q) for q in questions_list])
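
For intuition about what the vectorizer computes, the following dependency-free sketch applies the smoothed IDF formula that scikit-learn documents for its default settings, idf = ln((1 + n) / (1 + df)) + 1, followed by L2 normalization. The helper name `tfidf_vectors` and the two sample sentences are assumptions for illustration, and the whitespace tokenizer is a simplification of `nltk.word_tokenize`.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, L2-normalised."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF: rarer terms get a larger weight
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1
           for term, count in df.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights = {term: tf * idf[term] for term, tf in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return vectors

docs = ["what is data analytics", "what is machine learning"]
vecs = tfidf_vectors(docs)
print(vecs[0])
```

Words shared by every question ("what", "is") end up with smaller weights than words that distinguish one question from another ("data", "analytics"), so the distinguishing words dominate the similarity score.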

🤔 Response Generation

Get Response Function:

def get_response(text):
    processed_text = preprocess_with_stopwords(text)
    vectorized_text = vectorizer.transform([processed_text])
    similarities = cosine_similarity(vectorized_text, X)[0]
    # Indices of all questions whose similarity clears the threshold
    candidates = [i for i, s in enumerate(similarities) if s > 0.6]
    if not candidates:
        return "I can't answer this question."
    # Re-rank the candidates with a separate vectorizer; refitting the global
    # one would change its vocabulary and no longer match X or vectorized_text
    local_vectorizer = TfidfVectorizer(tokenizer=nltk.word_tokenize)
    Z = local_vectorizer.fit_transform(
        [preprocess_with_stopwords(questions_list[i]) for i in candidates])
    final_similarities = cosine_similarity(
        local_vectorizer.transform([processed_text]), Z)
    # Map the best candidate back to its index in the full lists
    closest = candidates[int(np.argmax(final_similarities))]
    return answers_list[closest]
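
The threshold-and-fallback rule can be looked at in isolation. The sketch below is a simplified, single-stage version that works on a plain list of similarity scores. `pick_answer`, its parameters, and the sample scores are hypothetical and only meant to show how the 0.6 cutoff behaves.

```python
def pick_answer(similarities, answers, threshold=0.6,
                fallback="I can't answer this question."):
    """Return the answer with the highest score, or the fallback."""
    if len(similarities) == 0:
        return fallback
    # Index of the highest similarity score
    best = max(range(len(similarities)), key=lambda i: similarities[i])
    # Strict ">" mirrors the comparison in get_response
    return answers[best] if similarities[best] > threshold else fallback

answers = ["Answer A", "Answer B", "Answer C"]
hit = pick_answer([0.10, 0.82, 0.65], answers)
miss = pick_answer([0.10, 0.30, 0.60], answers)
print(hit)
print(miss)
```

Because the comparison is strict, a score of exactly 0.6 still falls through to the fallback message. Raising the threshold makes the bot more cautious; lowering it makes it answer more often at the risk of wrong matches.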

📊 Usage Example

Example Query:
```
get_response('Who is MS Dhoni?')
```

📚 Additional Tools

GingerIt for Grammar Check:

!pip install gingerit
from gingerit.gingerit import GingerIt
text = 'What is Data Anlytics'
parser = GingerIt()
corrected_text = parser.parse(text)
print(corrected_text['result'])

TextBlob for Spelling Correction:

!pip install textblob
from textblob import TextBlob
text = 'What is Data Anlytics'
blob = TextBlob(text)
corrected_text = blob.correct()
print(corrected_text)
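
If installing an extra package is not an option, a rough alternative is to snap each misspelled word to the closest word already in the dataset vocabulary with the standard library's `difflib`. This is a different technique from GingerIt or TextBlob, not an equivalent; `correct_tokens`, the `cutoff` value, and the tiny vocabulary are assumptions for illustration.

```python
import difflib

def correct_tokens(text, vocabulary, cutoff=0.8):
    """Replace each token with its closest vocabulary word, if any."""
    corrected = []
    for token in text.lower().split():
        # Best single match at or above the cutoff, or an empty list
        matches = difflib.get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else token)
    return " ".join(corrected)

vocabulary = ["what", "is", "data", "analytics"]
fixed = correct_tokens("What is Data Anlytics", vocabulary)
print(fixed)
```

In practice the vocabulary could be built from the words in the question list, so corrections always land on terms the chatbot can actually match.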

Feel free to explore and contribute to the project! 🚀
