Natural Language Processing (1)


In this course you will learn how to solve common NLP problems using classical and deep learning approaches.

Prerequisites

  • Product rule, sum rule, Bayes’s theorem, likelihood maximization
  • Classification, clustering, and regression tasks in machine learning
  • Loss functions, training vs inference, overfitting problem
  • Optimization techniques, e.g. (stochastic) gradient descent
  • Deep Learning architectures, e.g. Recurrent and Convolutional Neural Networks
  • Python programming and willingness to learn new tools, e.g. TensorFlow.

Main approaches in NLP

There are three main groups of methods in NLP: rule-based approaches, traditional machine learning, and deep learning.

Rule-based approaches

Regular expressions belong to this group, as do context-free grammars (CFGs). A CFG specifies the rules that produce words and phrases; once you have such a grammar, you can use it to parse your data.
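
As a rough illustration (not from the original course), here is a minimal sketch of both ideas in Python: a regular expression that extracts date-like strings, and a toy context-free grammar parsed with NLTK. The grammar, the pattern, and the sample sentences are invented for this example.

```python
import re
import nltk

# Rule-based extraction with a regular expression:
# match simple dates such as "12/05/2021" (a hand-written, high-precision rule).
date_pattern = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")
print(date_pattern.findall("The meeting is on 12/05/2021, not 1/6/2021."))

# A toy context-free grammar: rules that produce (and parse) simple sentences.
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N
VP -> V NP
Det -> 'the' | 'a'
N -> 'dog' | 'cat'
V -> 'chased' | 'saw'
""")
parser = nltk.ChartParser(grammar)
for tree in parser.parse("the dog chased a cat".split()):
    print(tree)
```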

Advantages and disadvantages: the rules are usually written manually, so you have to craft and maintain every one of them. The payoff is precision: rule-based approaches typically have high precision but low recall.

Machine learning system

First of all, you need training data: a corpus with some markup (labels). Next comes feature engineering, for example a feature such as "is the word capitalized?". Then you define your model and train its parameters by fitting the model to the training data. Finally, with the parameters fixed, you apply the model to new text to find the most probable labels; this stage is called inference, test, or deployment.
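
A minimal sketch of this pipeline, assuming scikit-learn: hand-crafted features (such as "is the word capitalized?") are extracted per token, a logistic regression model is fit to labeled data, and inference is then run with the parameters fixed. The tiny toy dataset is invented for illustration.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Feature engineering: turn each token into a dictionary of hand-crafted features.
def features(word):
    return {
        "is_capitalized": word[0].isupper(),
        "length": len(word),
        "has_digit": any(c.isdigit() for c in word),
    }

# Training data: a toy corpus with markup (1 = proper name, 0 = other).
words = ["Paris", "the", "Anna", "runs", "London", "and"]
labels = [1, 0, 1, 0, 1, 0]

vectorizer = DictVectorizer()
X = vectorizer.fit_transform(features(w) for w in words)

model = LogisticRegression()
model.fit(X, labels)  # training: fit the model parameters to the data

# Inference / test / deployment: parameters are fixed, we just apply the model.
X_new = vectorizer.transform(features(w) for w in ["Berlin", "walks"])
print(model.predict(X_new))
```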

Deep learning

Deep learning approaches go through the same training and inference stages, but usually skip the feature-generation stage: the network learns its own features from the (tokenized) text.
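
As a hedged sketch (assuming TensorFlow/Keras, which the prerequisites mention), a network like the one below consumes raw token ids directly and learns its own features through an embedding layer and a recurrent layer, so no hand-crafted feature stage is needed. The vocabulary size and layer dimensions are arbitrary placeholders.

```python
import tensorflow as tf

# Raw token ids go in; the embedding and LSTM layers learn the features themselves.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=10000, output_dim=64),   # learned word vectors
    tf.keras.layers.LSTM(32),                                     # learned sequence features
    tf.keras.layers.Dense(1, activation="sigmoid"),               # e.g. binary classification
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(padded_token_ids, labels, ...) would then train it end to end.
```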

Text preprocessing

  • We can think of text as a sequence of tokens
  • Tokenization is the process of extracting those tokens
  • We can normalize tokens using stemming or lemmatization
  • We can also normalize casing and acronyms (see the sketch below)
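
A minimal sketch of these steps, assuming NLTK with its tokenizer and WordNet data already downloaded; the example sentence is made up.

```python
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

text = "The striped bats were hanging on their FEET."

# Tokenization: extract tokens, then normalize casing.
tokens = [t.lower() for t in nltk.word_tokenize(text)]

# Stemming: crude suffix chopping ("hanging" -> "hang", "bats" -> "bat").
stemmer = PorterStemmer()
print([stemmer.stem(t) for t in tokens])

# Lemmatization: dictionary-based normalization ("were" -> "be" when treated as a verb).
lemmatizer = WordNetLemmatizer()
print([lemmatizer.lemmatize(t, pos="v") for t in tokens])
```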

Feature extraction from text

  • n-grams: besides single tokens, we can use contiguous sequences of n tokens as features
  • Let’s remove some n-grams based on their occurrence frequency in our document corpus (the number of documents containing a particular n-gram divided by the total number of documents). Which can be removed? High-frequency n-grams are most likely words like "are", "is", "the", which carry little meaning. Low-frequency n-grams are often typos and rarely used words; keeping them may lead to overfitting. A sketch of this filtering follows below.
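
A hedged sketch of this filtering with scikit-learn's CountVectorizer: `max_df` drops n-grams that appear in too large a fraction of documents (stop-word-like), and `min_df` drops the rarest ones (typos, overfitting-prone). The thresholds and the corpus are invented.

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "the movie is good",
    "the movie is bad",
    "the plot is good but the acting is bad",
]

# Count 1- and 2-grams, dropping n-grams seen in more than 90% of documents
# (too frequent) or in fewer than 2 documents (too rare).
vectorizer = CountVectorizer(ngram_range=(1, 2), max_df=0.9, min_df=2)
X = vectorizer.fit_transform(docs)
print(vectorizer.get_feature_names_out())
print(X.toarray())
```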

Summary

  • We’ve made simple counter features in a bag-of-words manner
  • You can add n-grams
  • You can replace counters with TF-IDF values (a short sketch follows below)
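
A short sketch, again assuming scikit-learn, of swapping raw counters for TF-IDF values; the documents are invented toy examples.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the movie is good", "the movie is bad"]

# TF-IDF down-weights terms that occur in many documents ("the", "movie", "is")
# and up-weights terms that are distinctive for a document ("good", "bad").
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(docs)
print(dict(zip(vectorizer.get_feature_names_out(), X.toarray()[0].round(2))))
```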

Linear models for sentiment analysis

  • Let’s consider a bag-of-words representation for text. Which models are better suited for such sparse features? Linear models and Naive Bayes (see the sketch below).
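
As a hedged illustration, here is a Naive Bayes classifier on top of bag-of-words features, using scikit-learn; the labeled reviews are made up.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

reviews = ["a great and enjoyable film", "boring and terrible plot",
           "enjoyable plot, great acting", "terrible film, boring acting"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

# Sparse bag-of-words features work well with Naive Bayes and linear models.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(reviews)

classifier = MultinomialNB()
classifier.fit(X, labels)
print(classifier.predict(vectorizer.transform(["great acting", "boring film"])))
```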

Hashing trick in spam filtering

  • Using the code of a string’s first character as its hash is a bad hash function: we will get tons of collisions, since all words starting with the same character will share the same hash value! A sketch of a better hashing trick follows below.
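
A minimal sketch of the hashing trick with scikit-learn's HashingVectorizer: a proper hash function spreads words over a fixed number of feature indices instead of colliding on the first character. The number of features and the messages are invented.

```python
from sklearn.feature_extraction.text import HashingVectorizer

messages = ["win a free prize now", "meeting moved to friday"]

# Each token is hashed into one of 2**10 buckets; no vocabulary has to be stored,
# and a good hash function keeps collisions rare (unlike hashing by first letter).
vectorizer = HashingVectorizer(n_features=2**10, alternate_sign=False)
X = vectorizer.transform(messages)
print(X.shape, X.nnz)
```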
