Syntactic Analysis: an Overview

  • Bhumika Dutta
  • Sep 21, 2021
  • NLP

Natural Language Processing (NLP) is a very interesting field of study under machine learning that enables computers to understand the natural language of humans. To understand the complexity of language, complex patterns must be studied and analyzed through several stages: processing noisy, incomplete voice input; lexical identification; syntactic and semantic analysis; and interpreting language in context. 

 

In this article, we are mainly going to focus on Syntactic Analysis, which is a crucial part of NLP. We are going to discuss the following in brief:

 

  • Syntactic Analysis

  • Parsers

  • Grammar

  • Syntactic Analysis vs Lexical Analysis

 

(More to learn: NLP guide for beginners)


 

Syntactic Analysis

 

The first question that everyone is bound to ask is: what exactly is syntactic analysis? Syntactic analysis is the study of how words are arranged into grammatically well-formed phrases and sentences. 

 

  • It is the process of analyzing natural language against the rules of formal grammar to determine the grammatical structure of a sentence. 

  • It is the third phase of NLP and it only works on a group of words or sentences. 

  • It does not work on individual words as individual words do not determine the overall grammar of any sentence. 

 

Syntactic analysis is also known as Syntax analysis or Parsing. To implement the task of parsing, we use parsers. Now let us learn what parsers are.

 

(Must read: NLP interview questions)

 

 

About Parser

 

We already know that parsers are used to implement parsing, but what is the definition of a parser? A parser is a software component that takes input text and produces a structural representation of it, after validating its syntax against a formal grammar. 

 

It also creates a data structure, often in the form of a parse tree, an abstract syntax tree, or another hierarchical structure. It searches over the space of possible trees and attempts to identify an optimal tree for a given text.

 

What are the types of Parsing?

 

Generally, there are two types of parsing: top-down parsing and bottom-up parsing.

 

In top-down parsing, the parser builds the parse tree from the start symbol and then attempts to derive the input from the start symbol. The most popular form of top-down parsing processes the input recursively, but it has one major drawback: backtracking. 

 

In bottom-up parsing, by contrast, the parser begins with the input symbols and works its way up to the start symbol, attempting to build the parse tree. These parsing strategies are used by different parsers. 

 

(Also read: Applications of NLP)

 

What are the different types of Parsers?

 

The following are the types of parsers that are available:

 

  1. Recursive Descent Parser:

 

A recursive descent parser is a straightforward, frequently used parser. It follows a top-down process, checking whether the syntax of the input is correct by scanning the text from left to right. 

 

Its basic operation is to read symbols from the input stream and match them against the terminals of the grammar. We will learn about grammar later in this article. 
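To make the idea concrete, here is a minimal recursive descent parser for a toy grammar. The grammar, the function names, and the vocabulary are all invented for this sketch; a real parser would handle arbitrary grammars and build a tree rather than just accept or reject.

```python
# Toy grammar:  S -> NP VP ;  NP -> "the" N ;  N -> "dog" | "cat" ;  VP -> "sleeps"
# One function per non-terminal, scanning left to right (top-down).

def parse_sentence(tokens):
    """Return True if tokens match S -> NP VP and all input is consumed."""
    pos = parse_np(tokens, 0)
    if pos is None:
        return False
    pos = parse_vp(tokens, pos)
    return pos == len(tokens)          # success only if nothing is left over

def parse_np(tokens, pos):
    # NP -> "the" N
    if pos < len(tokens) and tokens[pos] == "the":
        if pos + 1 < len(tokens) and tokens[pos + 1] in ("dog", "cat"):
            return pos + 2             # position after the matched NP
    return None                        # rule failed

def parse_vp(tokens, pos):
    # VP -> "sleeps"
    if pos < len(tokens) and tokens[pos] == "sleeps":
        return pos + 1
    return None

print(parse_sentence(["the", "dog", "sleeps"]))   # True
print(parse_sentence(["dog", "the", "sleeps"]))   # False
```

Each non-terminal becomes a function that either consumes part of the input and returns the new position, or fails; this is why recursive descent parsers are easy to write by hand.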

 

 

  2. Shift-reduce Parser:

 

Shift-reduce parsers use a bottom-up process, unlike recursive descent parsers. The goal is to locate words and phrases that correspond to the right-hand side of a grammatical production, replace them with the left-hand side, and repeat until the entire sentence has been reduced. 

 

Thus this parser starts with the input symbols and builds the parse tree all the way up to the start symbol. 
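The shift and reduce steps can be sketched in a few lines of Python. The grammar, lexicon, and rule format below are invented for illustration; real shift-reduce parsers use a parse table to decide when to shift and when to reduce, while this sketch simply reduces greedily whenever it can.

```python
# Toy grammar:  NP -> Det N ;  VP -> V ;  S -> NP VP
LEXICON = {"the": "Det", "dog": "N", "barks": "V"}
RULES = [(("Det", "N"), "NP"), (("NP", "VP"), "S"), (("V",), "VP")]

def shift_reduce(tokens):
    """Return True if the tokens reduce all the way to the start symbol S."""
    stack = []
    buffer = list(tokens)
    while buffer or len(stack) > 1 or (stack and stack[0] != "S"):
        # reduce: does the top of the stack match some rule's right-hand side?
        for rhs, lhs in RULES:
            n = len(rhs)
            if tuple(stack[-n:]) == rhs:
                stack[-n:] = [lhs]     # replace RHS with LHS on the stack
                break
        else:
            if not buffer:
                return False           # stuck: cannot shift or reduce
            stack.append(LEXICON[buffer.pop(0)])   # shift the next token
    return stack == ["S"]

print(shift_reduce(["the", "dog", "barks"]))   # True
print(shift_reduce(["dog", "the", "barks"]))   # False
```

Note how the parser never looks at the start symbol until the very end: it works purely bottom-up from the words.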

 

 

  3. Chart Parser:

 

Chart parsers are mainly used for ambiguous grammars, such as the grammars of natural languages. They handle parsing difficulties using dynamic programming: partial hypothesized results are saved in a structure called a 'chart', which can then be reused in a variety of situations instead of being recomputed.

 

 

  4. Regexp Parser:

 

It is one of the most popular parsers out there. It applies a regular expression, defined in the form of a grammar, on top of a POS-tagged string. In essence, it parses the input phrases using regular expressions and produces a parse tree as a result.
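The following sketch imitates this idea with plain `re`: it runs a regular expression over the tag sequence of a POS-tagged sentence to pull out noun-phrase chunks. The tag names and the chunking pattern are illustrative assumptions, not a standard.

```python
import re

def chunk_nps(tagged):
    """tagged: list of (word, tag) pairs; return the NP word-chunks found.

    Chunk rule (as a regex over tags): NP -> optional determiner,
    any number of adjectives, then a noun.
    """
    tags = "".join(f"<{t}>" for _, t in tagged)   # e.g. "<DT><JJ><NN><VB>"
    chunks = []
    for m in re.finditer(r"(<DT>)?(<JJ>)*<NN>", tags):
        # map character offsets in the tag string back to token indices
        start = tags[: m.start()].count("<")
        end = start + m.group().count("<")
        chunks.append(" ".join(w for w, _ in tagged[start:end]))
    return chunks

sent = [("the", "DT"), ("big", "JJ"), ("dog", "NN"), ("barked", "VB")]
print(chunk_nps(sent))   # ['the big dog']
```

The key point is that the regular expression matches over part-of-speech tags, not over the raw words, which is what makes this a (shallow) syntactic technique.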

 

Now that we know the types of parsing and the types of parsers, let us learn about another important topic: parse trees. (Source)

 

(Suggested blog: Text mining techniques)

 

Parse Trees

 

A parse tree is a graphical representation of a derivation. The root node of the parse tree is the start symbol of the derivation, the leaf nodes are terminals, and the inner nodes are non-terminals. The most useful property of a parse tree is that traversing its leaves in order reproduces the original input string.
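This property is easy to see with a parse tree written as nested tuples: `(label, children...)` for inner nodes and plain strings for the terminal leaves. The tree and labels below are invented for illustration.

```python
# Parse tree for "the dog sleeps":
#   root = start symbol S, inner nodes = non-terminals, leaves = terminals
tree = ("S",
        ("NP", ("Det", "the"), ("N", "dog")),
        ("VP", ("V", "sleeps")))

def leaves(node):
    """Collect the terminal leaves left to right."""
    if isinstance(node, str):        # a terminal: just a word
        return [node]
    words = []
    for child in node[1:]:           # node[0] is the non-terminal label
        words.extend(leaves(child))
    return words

print(" ".join(leaves(tree)))        # the dog sleeps
```

Reading the leaves in order gives back exactly the input sentence, which is the property described above.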


 

About Grammar

 

Parsing is done to analyze the grammar of a sentence, so we must have a basic idea of the concept of grammar. Grammar is highly significant for describing the syntactic structure of well-formed programs. In the linguistic sense, a grammar states the syntactic rules for conversation in a natural language. 

 

Linguists have sought to define the grammars of natural languages such as English and Hindi since those languages first emerged. The theory of formal languages is also useful in computer science, particularly in the areas of programming languages and data structures. 

 

In the ‘C' programming language, for example, precise grammar rules specify how functions are built from declarations and statements. 

 

(Recommended blog: Text Cleaning & Preprocessing in NLP)

 

What are the types of grammar?

 

There are three types of grammar that we will discuss here: constituency grammar, dependency grammar, and context-free grammar.

 

  1. Constituency grammar:

 

Constituency grammar, also known as phrase structure grammar, was proposed by Noam Chomsky. It is based on the constituency relation (hence the name), and is the opposite of dependency grammar. 

 

In this type of grammar, sentence structure is seen through the lens of constituency relations in all relevant frameworks. The constituency relation derives from the subject-predicate division of Latin and Greek grammar. 

 

The noun phrase (NP) and verb phrase (VP) are used to understand the basic sentence structure. A parse tree that uses constituency grammar is known as a constituency-based parse tree.

 

 

  2. Dependency grammar:

 

The following are the most important aspects of dependency grammar (DG) and the dependency relationship:

 

  • The linguistic units, i.e. words, are linked together via directed connections in DG.

  • The verb takes center stage in the sentence structure.

  • All other syntactic units are related to the verb via directed connections; these syntactic units are called dependencies.

 

Parse trees that use dependency grammar are called dependency-based parse trees. 
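A dependency parse can be represented simply as a set of head-to-dependent arcs, with the verb as the root, as the bullets above describe. The sentence, arc labels, and variable names below are invented for illustration.

```python
# Dependency parse of "the dog chased a cat" as (head, dependent, relation) arcs
arcs = [
    ("chased", "dog", "nsubj"),   # subject depends on the verb
    ("chased", "cat", "obj"),     # object depends on the verb
    ("dog", "the", "det"),        # determiner depends on its noun
    ("cat", "a", "det"),
]

heads = {h for h, _, _ in arcs}
deps  = {d for _, d, _ in arcs}
root  = heads - deps              # the only word that is never a dependent
print(root)                       # {'chased'}
```

The computation confirms the second bullet: the verb is the one word that heads other words but depends on nothing, so it takes center stage in the structure.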

 

 

  3. Context-free grammar:

 

Context-free grammar (CFG) is a superset of regular grammar and a notation for describing languages. A CFG consists of a finite set of grammar rules built from the following four components:

 

  • Set of Non-terminals:

 

It is denoted by the letter V. Non-terminals are syntactic variables that represent sets of strings, which help define the language generated by the grammar.

 

  • Set of Terminals:

 

Terminals, also known as tokens, are denoted by Σ. They are the basic symbols from which strings are formed.

 

  • Set of productions:

 

It is denoted by P. The set specifies how terminals and non-terminals can be combined. Every production consists of a non-terminal on the left side, an arrow, and a sequence of terminals and non-terminals on the right side.

 

  • Start symbol:

 

The production process begins with the start symbol, denoted by S. The start symbol is always a non-terminal.
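The four components above can be written down directly as plain Python data. The toy grammar here is invented for illustration; the sanity checks at the end simply restate the definitions (the left side of each production is a non-terminal, the right side is drawn from V ∪ Σ, and S is a non-terminal).

```python
# A CFG as its four components (toy grammar, invented for illustration)
V     = {"S", "NP", "VP"}                      # non-terminals
SIGMA = {"the", "dog", "sleeps"}               # terminals (tokens)
P     = {                                      # productions: LHS -> list of RHSs
    "S":  [("NP", "VP")],
    "NP": [("the", "dog")],
    "VP": [("sleeps",)],
}
S = "S"                                        # start symbol

# sanity checks tying the four components together
assert S in V                                  # start symbol is a non-terminal
for lhs, rhss in P.items():
    assert lhs in V                            # left side is a non-terminal
    for rhs in rhss:
        assert all(sym in V | SIGMA for sym in rhs)  # right side uses V and Σ
print("grammar is well-formed")
```

Writing the components out this way makes it clear that a grammar is finite data, even though the language it generates may be infinite.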


 

Syntactic analysis vs Lexical analysis:

 

The main difference between syntactic analysis and lexical analysis is that lexical analysis is concerned with data cleaning and feature extraction with techniques like stemming, lemmatization, correcting misspelled words, and many more. Whereas in syntactic analysis, the roles played by words in a sentence are analyzed, the relationship between different words in the sentence is determined, and the grammatical structure of the sentence is interpreted.

 

For example, if we look into two sentences:

 

“Delhi is the capital of India” and “Is Delhi the capital of India?” 

 

In these two sentences the words are identical, yet the difference in word order makes the first a statement and the second a question. Using basic lexical processing approaches, we are unable to make this distinction. 
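A two-line check makes the point concrete: a bag-of-words (lexical) view of the two sentences is identical, while the token sequence (which syntactic analysis works with) differs.

```python
# The two example sentences contain exactly the same words in a different order
s1 = "delhi is the capital of india".split()
s2 = "is delhi the capital of india".split()

print(sorted(s1) == sorted(s2))   # True  -> same vocabulary, lexically identical
print(s1 == s2)                   # False -> different word order
```

Any technique that discards word order, such as a bag-of-words model, therefore cannot tell a statement from the corresponding question.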

 

As a result, more advanced syntactic processing techniques are required to understand the relationships between the individual words in a sentence. The following diagram shows the relation between lexical analysis and syntactic analysis:


Interaction between lexical analyzer and a parser


Syntactic analysis uses the following techniques that lexical analysis does not.

 

  • Word Order and Meaning:

 

The goal of syntactic analysis is to extract the relationships between the words in a document. If the words were rearranged into a different sequence, the statement would be difficult to understand.

 

 

  • Retaining Stop-words:

 

Removing stop-words can completely change the meaning of a phrase, so in syntactic analysis stop-words must be retained.

 

 

  • Morphology of Words:

 

Stemming and lemmatization will reduce the words to their simplest form, changing the sentence's syntax.

 

 

  • Parts-of-speech of Words in a sentence:

 

It is crucial to determine each word's correct part of speech.

 

(Top reading: Text Analytics and Models in NLP)


 

Conclusion

 

NLP is getting more popular every day, as it has many applications such as chatbots, voice assistants, speech recognition, and more. Syntactic analysis is a very important part of NLP that helps in understanding the grammatical structure of any sentence. In this article, we have discussed the definition of syntactic analysis or parsing, talked about the types of parsers, and covered the basic concept of grammar. We have also learned the difference between syntactic analysis and lexical analysis.
