NLP

Natural Language Processing (NLP) is a subfield of Artificial Intelligence that focuses on the interaction between computers and humans using natural language. The goal of NLP is to enable computers to understand, interpret, and generate human language in a way that is meaningful and useful.
NLP involves a variety of tasks, such as text classification, sentiment analysis, named entity recognition, machine translation, question answering, and speech recognition. These tasks are often achieved using machine learning and deep learning techniques, such as neural networks.

NLP has many practical applications, including chatbots, virtual assistants, sentiment analysis for social media monitoring, automated translation services, and text-to-speech systems. NLP is also used in the field of healthcare for clinical decision support and electronic health record management.
As the amount of text data continues to grow, the importance of NLP in business and research is becoming increasingly evident. The development of new algorithms and techniques is advancing the field, and there is a growing need for professionals with NLP expertise in a variety of industries.

Topics

The guessing game is an AI game in which the user is allocated 5 points at the start and presented with a set of blanks indicating the length of the word to be guessed. The user guesses letters that could be in the word: each correct guess (a letter present in the word) adds a point to the user's score, and each incorrect guess deducts a point. If the score reaches zero, the user loses; if the user guesses the whole word, they win.
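
A minimal sketch of that game loop, assuming the secret word has already been chosen (the word list here is hypothetical; in the project it would come from a preprocessed corpus):

```python
import random

def guessing_game(word):
    score = 5                                   # the user starts with 5 points
    guessed = set()                             # letters guessed so far
    while score > 0:
        # show blanks for letters not yet guessed
        display = " ".join(c if c in guessed else "_" for c in word)
        print(display, f"(score: {score})")
        if set(word) <= guessed:                # whole word uncovered: win
            print("You win!")
            return
        letter = input("Guess a letter: ").strip().lower()
        if letter in word and letter not in guessed:
            score += 1                          # correct guess adds a point
        elif letter not in word:
            score -= 1                          # incorrect guess deducts a point
        guessed.add(letter)
    print("Your score reached zero - you lose. The word was:", word)

guessing_game(random.choice(["language", "corpus", "token"]))
```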

Topics covered

  • Natural Language Processing (NLP) fundamentals
  • Data pre-processing and preparation
  • Training and testing machine learning models
  • Performance evaluation and optimization
  • Removing non-alpha tokens
  • Removing stopwords
  • Lemmatizing tokens
  • POS tagging

| View on GitHub |

A web crawler, also known as a spider, is a software program that systematically browses the World Wide Web to index and retrieve information. The process of web crawling involves automatically following links from one web page to another, and extracting information from each page visited.

Web crawlers are commonly used by search engines to collect data on web pages and create an index that can be used to respond to search queries. The crawler starts with a list of URLs to visit, and then follows the links on each page to discover new URLs to visit. As it visits each page, the crawler extracts information such as the page title, URL, and content. Web crawlers can also be used for other purposes, such as data mining, web scraping, and monitoring changes to a website.
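
As an illustration of that crawl loop, here is a minimal sketch using the requests and BeautifulSoup libraries (the libraries, the start URL, and the page limit are assumptions; the project's actual implementation may differ):

```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

def crawl(start_url, max_pages=10):
    frontier, visited = [start_url], set()
    while frontier and len(visited) < max_pages:
        url = frontier.pop(0)
        if url in visited:
            continue
        visited.add(url)
        try:
            page = requests.get(url, timeout=5)
        except requests.RequestException:
            continue
        soup = BeautifulSoup(page.text, "html.parser")
        # extract basic information from the visited page
        title = soup.title.string if soup.title else "(no title)"
        print(url, "-", title)
        # follow links on the page to discover new URLs
        for a in soup.find_all("a", href=True):
            frontier.append(urljoin(url, a["href"]))

crawl("https://example.com")
```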

Topics covered

  • Web scraping and data extraction
  • Natural Language Processing (NLP) fundamentals
  • Data pre-processing and preparation
  • Text analysis and visualization
  • Filtering and crawling relevant links
  • Scraping and cleaning website data
  • Calculate TF-IDF
  • Create Knowledge Base for the chatbot
  • Storing Knowledge Base to a database

| View on GitHub | Report |

WordNet and SentiWordNet summary

WordNet is a lexical database that organizes English words into groups of synonyms called synsets. Each synset represents a distinct concept, and each word in the synset is considered a possible synonym of the others. WordNet also includes relationships between synsets, such as hyponymy (where one synset is a subtype of another), meronymy (where one synset is a part of another), and antonymy (where one synset is the opposite of another). WordNet is widely used in natural language processing (NLP) and computational linguistics for tasks such as text classification, word sense disambiguation, and information retrieval.

SentiWordNet is an extension of WordNet that assigns sentiment scores to synsets. Each synset is given three scores between 0 and 1, representing the degree of positivity, negativity, and neutrality associated with the concept it represents. These scores are generated by combining the scores of its component words, which are manually annotated with positive and negative sentiment values. SentiWordNet is commonly used in sentiment analysis applications, where it can be used to determine the overall sentiment of a piece of text based on the sentiment scores of its component words.
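
Both resources are available through NLTK's corpus readers; a short sketch of looking up synsets and sentiment scores (assuming the wordnet and sentiwordnet corpora have been downloaded):

```python
import nltk
nltk.download("wordnet")
nltk.download("sentiwordnet")
from nltk.corpus import wordnet as wn, sentiwordnet as swn

# WordNet: synsets, definitions, and relations for a word
for syn in wn.synsets("good")[:3]:
    print(syn.name(), "-", syn.definition())
    print("  hypernyms:", [h.name() for h in syn.hypernyms()])

# SentiWordNet: positivity/negativity/objectivity scores for a synset
senti = swn.senti_synset("good.a.01")
print(senti.pos_score(), senti.neg_score(), senti.obj_score())
```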

Topics covered

  • Lexical semantics and ontologies
  • Word sense disambiguation
  • Semantic similarity and relatedness
  • NLP applications such as text classification and information retrieval
  • Synset

| View on GitHub | Report |

In natural language processing (NLP), n-grams are contiguous sequences of n items (usually words) in a text. For example, a 2-gram (also called a bigram) might be "natural language," and a 3-gram (also called a trigram) might be "machine learning algorithms."

N-grams are commonly used in NLP for a variety of tasks, such as language modeling, text classification, and information retrieval. One of the main benefits of using n-grams is that they capture the local context of words in a text. For example, a 2-gram like "ice cream" provides more specific information than the individual words "ice" and "cream" would on their own.
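
A quick sketch of extracting n-grams and estimating a simple bigram probability with NLTK (the toy sentence is an assumption; plain Python slicing would work just as well):

```python
from collections import Counter
from nltk import ngrams, word_tokenize  # word_tokenize needs the 'punkt' models

text = "natural language processing captures the local context of words"
tokens = word_tokenize(text)

bigrams = list(ngrams(tokens, 2))
trigrams = list(ngrams(tokens, 3))
print(bigrams[:2])  # [('natural', 'language'), ('language', 'processing')]

# a simple bigram language-model estimate: P(w2 | w1) = count(w1 w2) / count(w1)
unigram_counts = Counter(tokens)
bigram_counts = Counter(bigrams)
print(bigram_counts[("natural", "language")] / unigram_counts["natural"])
```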

Topics covered

  • Text processing and representation
  • Language modeling and prediction
  • Statistical methods in NLP
  • Applications such as machine translation and speech recognition
  • Unigrams
  • Bigrams

| View on GitHub | Report |

Sentence parsing, also known as syntactic parsing or simply parsing, is a natural language processing (NLP) task that involves analyzing the grammatical structure of a sentence to determine its underlying syntactic relationships. The goal of sentence parsing is to produce a structured representation of a sentence that captures its grammatical structure.

There are two main approaches to sentence parsing in NLP: rule-based parsing and statistical parsing. Rule-based parsing involves using a set of predefined rules or grammar to analyze the structure of a sentence. These rules can be based on formal grammars, such as context-free grammars or dependency grammars. Statistical parsing, on the other hand, involves using machine learning algorithms to automatically learn the rules or patterns that govern the syntactic structure of language.
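
A brief sketch of both approaches: a toy context-free grammar parsed with NLTK's chart parser for the rule-based side, and spaCy's pretrained dependency parser for the statistical side (the grammar and the en_core_web_sm model are assumptions):

```python
import nltk

# Rule-based: define a small CFG and parse with a chart parser
grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> Det N
    VP -> V NP
    Det -> 'the' | 'a'
    N -> 'dog' | 'ball'
    V -> 'chased'
""")
for tree in nltk.ChartParser(grammar).parse("the dog chased a ball".split()):
    tree.pretty_print()

# Statistical: spaCy's pretrained dependency parser
import spacy
nlp = spacy.load("en_core_web_sm")
for token in nlp("The dog chased a ball."):
    print(token.text, token.dep_, token.head.text)
```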

Topics covered

  • Syntax and grammatical rules in NLP
  • Parsing algorithms and techniques
  • Dependency and constituency parsing
  • Applications such as named entity recognition and sentiment analysis
  • Constituent parsing
  • Part-of-speech tagging
  • Semantic role labeling
  • PSG Tree
  • Dependency parsing
  • SRL Parser

| View on GitHub | Report |

A chatbot, also known as a conversational agent, is a software program designed to simulate human-like conversations with users, either through text or voice interactions. Chatbots use natural language processing (NLP) and artificial intelligence (AI) technologies to understand and respond to user queries and requests. Chatbots can be designed for a variety of applications, such as customer service, e-commerce, entertainment, and personal assistance. They can be deployed on websites, mobile apps, messaging platforms, and voice assistants, such as Amazon Alexa and Google Assistant.

Chatbots can be categorized into two types: rule-based and AI-based. Rule-based chatbots follow predefined rules and scripts to respond to user inputs. They are typically limited in their capabilities and require manual intervention to update or modify their responses.

AI-based chatbots, on the other hand, use machine learning and natural language processing algorithms to understand and generate responses to user queries. They can learn from user interactions and improve their responses over time, without the need for manual intervention.
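
As a contrast to the AI-based approach, a rule-based chatbot can be sketched in a few lines of pattern matching (the patterns and replies here are hypothetical):

```python
import re

# each rule pairs a regular-expression pattern with a canned reply
rules = [
    (r"\b(hi|hello|hey)\b", "Hello! How can I help you?"),
    (r"\bhours?\b", "We are open 9am-5pm, Monday to Friday."),
    (r"\b(bye|goodbye)\b", "Goodbye!"),
]

def respond(user_input):
    for pattern, reply in rules:
        if re.search(pattern, user_input.lower()):
            return reply
    return "Sorry, I don't understand. Could you rephrase?"

print(respond("Hi there"))
print(respond("What are your hours?"))
```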

Topics covered

  • Natural Language Understanding (NLU)
  • Intent classification
  • Named Entity Recognition
  • Dialogue systems and conversational agents
  • Language generation and response planning
  • Evaluation and optimization of chatbots

| View on GitHub | Report |

Adaptive Testing and Debugging of NLP Models is an academic paper on the process of iteratively testing and improving an NLP model based on feedback from users or other sources.

In adaptive testing, the NLP model is modified based on the results of previous tests. For example, if the model consistently fails to correctly identify a certain type of entity, such as a person's name, the model can be adjusted to improve its performance in that area. Adaptive testing can be used to improve the accuracy and efficiency of NLP models over time, as the model is updated to better handle real-world data and use cases.

Debugging NLP models involves identifying and fixing errors or bugs in the model's code or design. This can involve manually reviewing the model's output and identifying areas where it is not performing as expected, or using automated debugging tools to identify and fix issues more quickly.

Both adaptive testing and debugging are important processes in the development and optimization of NLP models, as they help to ensure that the model is accurate, efficient, and effective at handling real-world data and use cases.
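
A sketch of the error-analysis loop that underlies both processes: collect the model's failures, tally the most common confusions, and use them to decide what to test or retrain next (the model object and its scikit-learn-style predict method are hypothetical):

```python
from collections import Counter

def error_analysis(model, texts, gold_labels):
    # collect every example the model gets wrong
    failures = [(t, g, p) for t, g, p in
                zip(texts, gold_labels, model.predict(texts)) if g != p]
    # tally which (gold, predicted) confusions are most common
    confusions = Counter((g, p) for _, g, p in failures)
    for (gold, pred), n in confusions.most_common(5):
        print(f"{gold} -> {pred}: {n} errors")
    return failures  # feed these back into targeted tests or training data
```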

Topics covered

  • Model evaluation and quality assessment
  • Error analysis and diagnosis
  • Data-driven approaches to model debugging
  • Practical tips and tools for debugging NLP models

| Report |

Text classification is the process of assigning predefined categories to text data. In this project, we use the scikit-learn library to perform text classification on a dataset of fake news. The program can be used for various applications such as sentiment analysis, spam detection, and topic modeling.
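
A minimal sketch of such a scikit-learn pipeline, pairing TF-IDF features with Naive Bayes (the CSV file and its column names are assumptions):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report

df = pd.read_csv("fake_news.csv")  # hypothetical dataset file
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42)

# convert raw text to TF-IDF features
vectorizer = TfidfVectorizer(stop_words="english")
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)

clf = MultinomialNB().fit(X_train_tfidf, y_train)
print(classification_report(y_test, clf.predict(X_test_tfidf)))
```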

Topics covered

  • WordCloud
  • CountVectorizer
  • TfidfVectorizer
  • Naive Bayes
  • LogisticRegression
  • MLPClassifier
  • Feature extraction
  • Model selection
  • Evaluation metrics: accuracy, precision, recall, F1-score

| View on GitHub | Report |

Keras is a high-level neural network API that allows for rapid prototyping of deep learning models. In the context of NLP, Keras can be used for tasks such as sentiment analysis, named entity recognition, and machine translation.
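
A minimal Keras sketch for one such task: an Embedding layer feeding an LSTM for binary text classification (the vocabulary size and the padded training arrays are assumptions):

```python
from tensorflow import keras
from tensorflow.keras import layers

vocab_size = 10000  # assumed vocabulary size

model = keras.Sequential([
    layers.Embedding(vocab_size, 64),      # learned word embeddings
    layers.LSTM(64),                       # recurrent layer over the sequence
    layers.Dense(1, activation="sigmoid")  # binary prediction
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
# model.fit(X_train, y_train, epochs=5, validation_split=0.1)  # hypothetical arrays
```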

Topics covered

  • Deep learning for NLP
  • Convolutional Neural Networks (CNNs) for text classification
  • Recurrent Neural Networks (RNNs) for text classification
  • LSTM for text classification
  • Embedding
  • Predefined Embedding
  • Transfer learning
  • Model interpretation and visualization

| View on GitHub | Report |

Technical Skills

Soft Skills

About Me

I am passionate about NLP and constantly seek to learn more about this rapidly changing field. I have plans to work on personal projects that involve updating the chatbot and building a recommendation system for online news articles. To keep up with the latest developments, I read research papers, participate in online communities, and attend conferences and workshops.

I am also interested in possible employment opportunities that allow me to apply my skills and contribute to the NLP community.

Special Thanks

Grateful to Dr. Karen Mazidi for her exceptional dedication to teaching and for creating a supportive learning environment.