Extractive Text Summary using TextRank

For someone familiar with NLP and deep learning, Seq2Seq models are the first thing that comes to mind when we talk about summarization. The main issue with them is that training such networks requires a significant amount of data and resources. In this post, I’ll cover TextRank, which doesn’t require any training at all.

The main purpose of this blog post is to provide an understanding of TextRank, which is a very intuitive way of summarizing text.

Before going through TextRank, let us first look at the ways we can summarize a text. There are two main ones:

Extractive: It is similar to highlighting. We pick the relevant sentences from the document, and together they make up the summary.

Abstractive: It is similar to reading the whole document and then making notes in our own words, which are then used as the summary.

Quoting from the paper, TextRank “is a graph-based unsupervised method for keyword and sentence extraction”. Because it is an unsupervised ranking method, it does not require any training or labeled data.

Focusing only on summarization, that is, sentence extraction, the basic intuition behind TextRank is that we want to extract sentences that cover a major part of the text, in other words sentences that are highly similar to many other sentences. To extract them, we need to rank the sentences by this similarity criterion, and this is where PageRank comes into the picture: it is used to rank the sentences based on their similarity.
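To make the ranking step concrete, here is a minimal sketch of a weighted PageRank-style iteration over a sentence similarity matrix. The function names (rank_sentences, top_sentences), the damping factor of 0.85 and the fixed iteration count are assumptions chosen for illustration, not something prescribed by this post.

```python
def rank_sentences(similarity, damping=0.85, iterations=50):
    """Score sentences with a weighted PageRank-style iteration.

    similarity[i][j] is the similarity between sentences i and j
    (assumed symmetric and non-negative).
    """
    n = len(similarity)
    # Total edge weight attached to each sentence, ignoring self-loops.
    out_weight = [
        sum(similarity[i][j] for j in range(n) if j != i) for i in range(n)
    ]
    scores = [1.0] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on a share of its score proportional
            # to how much of its total edge weight points at sentence i.
            incoming = sum(
                similarity[j][i] / out_weight[j] * scores[j]
                for j in range(n)
                if j != i and out_weight[j] > 0
            )
            new_scores.append((1 - damping) + damping * incoming)
        scores = new_scores
    return scores


def top_sentences(scores, k):
    """Return the indices of the k best-scoring sentences in document order."""
    best = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(best)
```

Returning the selected indices in document order keeps the extracted summary readable, since the chosen sentences appear in the same sequence as in the original text.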

Algorithm

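In short, each sentence becomes a node in a graph, pairs of sentences are connected by edges weighted by their similarity, PageRank is run over that graph, and the top-ranked sentences form the summary. Basically, that is it. Nothing complex, and it still provides decent results.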

The weight of an edge is a value representing the similarity between the two sentences it connects. In the original paper, it is computed from the number of words the two sentences have in common. To avoid favoring long sentences, this count is normalized by the sentence lengths: the paper divides it by the sum of the logarithms of the two lengths.

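Formally, given two sentences Si and Sj, with a sentence being represented by the set of Ni words that appear in the sentence, Si = w1, w2, w3, …, wNi, the similarity of Si and Sj is defined as:

$$\mathrm{Similarity}(S_i, S_j) = \frac{\left|\{\, w_k \mid w_k \in S_i \text{ and } w_k \in S_j \,\}\right|}{\log(|S_i|) + \log(|S_j|)}$$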
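As a rough sketch, the word-overlap similarity from the TextRank paper (shared words divided by the sum of the logarithms of the two sentence lengths) could be written as below. The tokenize helper, its regular expression, and the choice to return 0.0 when the denominator is zero are assumptions made for this example.

```python
import math
import re


def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens (a simplistic assumption)."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def sentence_similarity(words_i, words_j):
    """Number of shared words divided by log(|Si|) + log(|Sj|)."""
    if not words_i or not words_j:
        return 0.0
    overlap = len(set(words_i) & set(words_j))
    denominator = math.log(len(words_i)) + math.log(len(words_j))
    # Two one-word sentences give log(1) + log(1) = 0; treat them as dissimilar.
    if denominator <= 0:
        return 0.0
    return overlap / denominator


def similarity_matrix(sentences):
    """Build the symmetric matrix of pairwise sentence similarities."""
    tokens = [tokenize(s) for s in sentences]
    n = len(tokens)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = sentence_similarity(tokens[i], tokens[j])
    return matrix
```

The resulting matrix holds the edge weights of the sentence graph, with zeros on the diagonal so that no sentence votes for itself.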

Thanks :)
