Natural language processing

Ing. Daniel Hládek PhD.

daniel.hladek@tuke.sk

Natural language is highly ambiguous

We can say the same thing in different ways
One statement can have many different meanings
We often transmit non-verbal information during communication:
- Feelings
- Gestures
- Accent and style of speech

Homonyms:

    I'm sitting at school right now. I am not familiar with civil law.
    That car costs 10,000 euros. The car is standing on the side of the road.

Synonyms:

    I went to Bratislava. I went to Blava.

Indeterminate order of words in a sentence:

    Today is a nice day. It's a nice day today. The day is nice today.

Neologisms and slang terms:

    Google it and then post it on fb.

Emotions and social conventions:

    Sir! You did a great job!

Typos:

    See the lecture.

Computer language is unambiguous

We need methods for working with uncertainty

There is a growing need to process large amounts of human-generated text and speech

Natural Language Processing (NLP)

A combination of techniques from several fields:

Machine learning
Linguistics
Theory of formal languages
Statistics
Psychology

Natural language processing helps in common activities by acquiring knowledge

data => information => knowledge

text => features => findings

Knowledge is useful information

(can be converted into money).

Typical NLP tasks

Your everyday Google, Facebook, Apple

Some Google NLP services:

Question answering
Full text search
Advertising targeting
Machine translation

Some Facebook NLP Services:

Sentiment analysis (for ad targeting)
Hate speech detection
Spam detection

Some Apple NLP services:

Siri assistant

Working with uncertainty in NLP

Classification of contexts or their sequences
Rewriting the sequence of symbols

Classification of contexts

Mapping:

    C => S

C: context: sentence, document
S: symbol: some knowledge about the context: morphological tag, lemma, clause...

Tokenization

Identifying the atomic units of meaning:

punctuation
words
subword units
letters, phones
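
A minimal sketch of tokenization at two of the levels listed above: words with punctuation, and subword units. The regular expression and the use of character trigrams as subword units are assumptions made for illustration; real tokenizers handle abbreviations, numbers and language-specific rules.

```python
import re

# Assumed pattern: a run of word characters, or a single punctuation mark.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")

def tokenize(text):
    """Split text into word tokens and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)

def char_ngrams(word, n=3):
    """Return overlapping character n-grams as crude subword units."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]

tokens = tokenize("Google it and then post it on fb.")
print(tokens)
print(char_ngrams("today"))
```

Character n-grams are only one possible choice of subword unit; learned segmentations are also common.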

Feature function

Classification is easier if we know which parts of the context are important for the decision.

Feature function

A binary function of the context that is true only if the given feature occurs in the context. A suitable set of feature functions helps us solve the problem.

Word
Ending, Root of the word
Previous word, Next word
First letter type
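
The feature types listed above could be sketched as follows. Each string in the returned set stands for one binary feature function that is true for the given context. The feature names, the two-letter suffix used as the "ending", and the boundary markers `<s>` and `</s>` are assumptions for illustration; the root of the word is left out because it would need a stemmer or lemmatizer.

```python
def make_features(words, i):
    """Return the set of binary features that are true for words[i].

    Assumes every token in `words` is a non-empty string.
    """
    word = words[i]
    features = {
        "word=" + word.lower(),
        "suffix=" + word[-2:].lower(),  # crude stand-in for the ending
        "prev=" + (words[i - 1].lower() if i > 0 else "<s>"),
        "next=" + (words[i + 1].lower() if i + 1 < len(words) else "</s>"),
    }
    # Type of the first letter: uppercase, digit, or anything else.
    if word[0].isupper():
        features.add("first=upper")
    elif word[0].isdigit():
        features.add("first=digit")
    else:
        features.add("first=other")
    return features

sentence = ["I", "went", "to", "Bratislava"]
print(sorted(make_features(sentence, 3)))
```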

Feature function

Mapping

    Symbol => unit vector

    today => 0000100000
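
A minimal sketch of the symbol-to-unit-vector mapping, assuming the vocabulary is built from a small token list in order of first appearance. The helper names `build_vocabulary` and `one_hot` are hypothetical.

```python
def build_vocabulary(tokens):
    """Assign each distinct symbol an index, in order of first appearance."""
    vocab = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab

def one_hot(symbol, vocab):
    """Return a vector with a single 1 at the index of the symbol."""
    vec = [0] * len(vocab)
    vec[vocab[symbol]] = 1
    return vec

tokens = "today is a nice day".split()
vocab = build_vocabulary(tokens)
print(one_hot("today", vocab))
```

The vector length equals the vocabulary size, so for real vocabularies these vectors are usually stored sparsely.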

Classifier of contexts

Feature extraction, classification

    symbol => feature vector => class

Classifier of contexts

Human knowledge in the form of rules
Statistical information from training corpora
A combination of both approaches

Rules

Dictionaries
Formal grammar
Regular expressions
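
A rule-based classifier can combine a dictionary lookup with regular expressions. The sketch below is an illustration only: the dictionary contents, the patterns and the class labels are all assumptions, not part of any particular library.

```python
import re

# Hypothetical dictionary of known city names (lowercased).
CITY_DICTIONARY = {"bratislava", "blava"}

# Hypothetical regular-expression rules, tried in order.
RULES = [
    (re.compile(r"\d+([.,]\d+)*"), "NUMBER"),
    (re.compile(r"[A-Z][a-z]+"), "CAPITALIZED"),
]

def classify_token(token):
    """Return a class label for a token using the dictionary, then the rules."""
    if token.lower() in CITY_DICTIONARY:
        return "CITY"
    for pattern, label in RULES:
        if pattern.fullmatch(token):
            return label
    return "OTHER"

for tok in ["Blava", "10,000", "Sir", "euros"]:
    print(tok, classify_token(tok))
```

The dictionary is checked first so that a known city is not swallowed by the more general capitalization rule.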

Statistical approaches

Hidden Markov Models
N-gram model
Support Vector Machine
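
As an illustration of the statistical approach, here is a minimal bigram (2-gram) model estimated by counting over a toy corpus. The corpus, the boundary markers `<s>` and `</s>`, and the absence of smoothing are simplifying assumptions; practical n-gram models smooth the counts so that unseen pairs do not get zero probability.

```python
from collections import Counter, defaultdict

def train_bigram_model(sentences):
    """Count how often each word follows each previous word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, cur in zip(words, words[1:]):
            counts[prev][cur] += 1
    return counts

def bigram_probability(counts, prev, cur):
    """Maximum-likelihood estimate of P(cur | prev); 0.0 if prev is unseen."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    return followers[cur] / total if total else 0.0

corpus = [
    "today is a nice day",
    "it is a nice day today",
    "the day is nice today",
]
model = train_bigram_model(corpus)
print(bigram_probability(model, "is", "a"))
```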

Deep neural networks

LSTM, Convolutional networks, Transformers

Computationally demanding

Rewriting the sequence of symbols

Mapping:

    sequence => another sequence

Rewriting the sequence of symbols

machine translation
correction of typos and grammar
dialogue systems
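
To make the sequence-to-sequence idea concrete, the sketch below rewrites a token sequence with typos into a corrected sequence. It is a plain dictionary lookup using `difflib.get_close_matches` from the Python standard library, not a learned model; the vocabulary is an assumption chosen for the example.

```python
import difflib

# Hypothetical vocabulary of known correct words.
VOCABULARY = ["see", "the", "lecture", "today", "nice", "day"]

def correct_sequence(tokens, vocabulary=VOCABULARY):
    """Map each token to its closest vocabulary word, if one is close enough."""
    corrected = []
    for tok in tokens:
        if tok in vocabulary:
            corrected.append(tok)
            continue
        # Default similarity cutoff of difflib (0.6); unknown words stay as-is.
        matches = difflib.get_close_matches(tok, vocabulary, n=1)
        corrected.append(matches[0] if matches else tok)
    return corrected

print(correct_sequence(["see", "teh", "lectuer"]))
```

This version keeps the output the same length as the input; machine translation and dialogue systems generally need models whose output length can differ from the input length.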

Encoder-Decoder

Encoder:

symbols => features => meaning vector

Decoder:

model and meaning vector => output symbols

Encoder-Decoder

Deep neural networks

You too can do NLP

General programming language

Python

General libraries for machine learning

keras
pytorch

General libraries for NLP

spaCy
Flair

Machine translation

fairseq

Extraction of semantic features

heads
fasttext
word2vec

Information retrieval

ation and log processing

Elasticsearch

Dialogue systems and language understanding

Rasa

Bibliography

Jurafsky, Martin: Natural Language Processing
Christopher Manning: Natural Language Processing, Stanford University Online Video Lectures