Tools & Technologies

DR.GEEK
Oct 11, 2019
( 12th October 2019 )

The following tools and technologies can be used for sentiment analysis:

1. Jericho HTML Parser (for web scraping in Java)

2. jsoup HTML Parser (for web scraping in Java)

3. SentiWordNet (contains sentiment scores for WordNet entries)

4. Rita WordNet

5. Stanford NLP (for performing NLP in Java)

6. Apache OpenNLP (for performing NLP in Java)

7. NLTK (for performing NLP in Python)

8. TextBlob (in Python)

9. Pandas (data analysis library in Python)

10. Tweepy (library for extracting tweets in Python)

11. Twitter4J (for extracting tweets in Java)

Below we provide more details regarding the above items.

1) Jericho HTML Parser

Jericho is a popular open-source HTML parser for the Java language.

2) jsoup

jsoup is an open-source HTML parser for the Java language. It is released under the MIT license, which is permissive and friendly to commercial use. It is used in applications such as sentiment analysis and web scraping. In addition to desktop systems, it also supports the Android platform.

3) SentiWordNet

SentiWordNet assigns sentiment scores (positivity, negativity and objectivity) to the entries (synsets) of the WordNet database. This makes it a good fit for sentiment analysis applications. It is licensed under CC BY-SA 3.0.

Fig-1: Senti-WordNet Illustration (http://ontotext.fbk.eu/sentiwn.html)
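
To make the idea of a sentiment lexicon concrete, here is a minimal Python sketch of lexicon-based scoring. It does not load the real SentiWordNet data: `TOY_LEXICON`, its scores, and the helper names `score_tokens` and `label` are all made up for illustration.

```python
# Hypothetical mini-lexicon in the spirit of SentiWordNet: each word maps to
# (positive_score, negative_score). The numbers are invented for illustration.
TOY_LEXICON = {
    "good": (0.75, 0.0),
    "happy": (0.875, 0.0),
    "bad": (0.0, 0.625),
    "terrible": (0.0, 0.875),
}


def score_tokens(tokens, lexicon=TOY_LEXICON):
    """Sum (positive - negative) over the tokens found in the lexicon."""
    total = 0.0
    for token in tokens:
        # Unknown words contribute nothing to the score.
        pos, neg = lexicon.get(token.lower(), (0.0, 0.0))
        total += pos - neg
    return total


def label(score):
    """Map a numeric score to a coarse sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(label(score_tokens(["The", "food", "was", "good"])))
```

A real application would replace the toy dictionary with scores read from the SentiWordNet distribution and would also need to decide which sense of a word applies.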

4) Rita WordNet

Rita WordNet can be used to access the WordNet database.

5) Stanford NLP

It is one of the most popular NLP libraries for the Java language. It supports NLP operations such as Named Entity Recognition (NER), Part of Speech (POS) tagging and tokenization. Popular applications of Stanford NLP include chatbot design and sentiment analysis. It supports multiple languages.

6) Apache OpenNLP

Apache OpenNLP is an open-source, machine-learning-based NLP toolkit for the Java language. Supported operations include sentence detection, tokenization and Named Entity Recognition (NER). It also supports natural language text in languages other than English.

7) NLTK

NLTK stands for Natural Language Toolkit. It is arguably the most popular library for performing NLP in Python. It supports NLP operations such as tokenization, Named Entity Recognition (NER) and Part of Speech (POS) tagging. One application of NLTK is summarizing large amounts of text. It also supports languages other than English.

Fig-2: Sample Source Code (https://cloud.archivesunleashed.org/derivatives/text-sentiment)

8) TextBlob

TextBlob is a Python library for processing textual data. It is built on top of NLTK, the Natural Language Toolkit for Python, and includes a built-in sentiment analyzer. TextBlob can also perform other NLP tasks such as Part of Speech (POS) tagging. It can be installed easily with the pip command.

9) Pandas

Pandas is a data analysis library for Python and one of the most popular Python libraries. It works with data in tabular format and helps with filtering that data. It can also be used to generate graphs from it.

Fig-3: Python Pandas Data Frame (https://www.shanelynn.ie/)
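
As a small illustration of tabular data and filtering, the sketch below builds a data frame and keeps only the rows with a positive score. The rows and the column names `text`, `polarity` and `label` are made up for this example, and pandas needs to be installed.

```python
import pandas as pd

# Made-up example rows; the column names are assumptions for illustration.
df = pd.DataFrame({
    "text": ["great phone", "battery died fast", "it is okay"],
    "polarity": [0.8, -0.6, 0.0],
})

# Boolean-mask filtering: keep only the rows with a positive polarity.
positive = df[df["polarity"] > 0]


def to_label(polarity):
    """Map a polarity value to a coarse sentiment label."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


# Add a derived column and count how many rows fall under each label.
df["label"] = df["polarity"].apply(to_label)
counts = df["label"].value_counts()

print(positive)
print(counts)
```

The same `counts` series could then be passed to a plotting call to draw a bar chart of the label distribution.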

10) Tweepy

Tweepy can be used in Python to extract tweets. It can be installed with pip, the package manager for Python. Note that Twitter applies rate limiting: only a limited number of requests can be made within a given time window, after which we have to wait before fetching more tweets.
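
The wait-and-retry pattern that rate limiting forces on a client can be sketched as follows. This is not Tweepy code: `RateLimitError`, `fetch_all` and the fake fetcher are hypothetical names used only to show the control flow, and the 900-second pause is an arbitrary example value.

```python
import time


class RateLimitError(Exception):
    """Hypothetical error a client raises when the rate limit is hit."""

    def __init__(self, retry_after_seconds):
        super().__init__(f"rate limited, retry after {retry_after_seconds}s")
        self.retry_after_seconds = retry_after_seconds


def fetch_all(fetch_page, pages, sleep=time.sleep):
    """Call fetch_page(page) for each page, pausing whenever the limit is hit."""
    results = []
    for page in pages:
        while True:
            try:
                results.extend(fetch_page(page))
                break
            except RateLimitError as err:
                # Wait for the period the service asked for, then retry the page.
                sleep(err.retry_after_seconds)
    return results


def make_fake_fetcher(limit):
    """Fake client: allows `limit` calls, then raises once and resets."""
    state = {"calls": 0}

    def fetch_page(page):
        if state["calls"] >= limit:
            state["calls"] = 0
            raise RateLimitError(retry_after_seconds=900)
        state["calls"] += 1
        return [f"tweet-{page}"]

    return fetch_page


# Record the requested waits instead of really sleeping, so the demo is instant.
waits = []
tweets = fetch_all(make_fake_fetcher(limit=2), pages=range(1, 5), sleep=waits.append)
print(tweets, waits)
```

In practice this loop rarely has to be written by hand: Tweepy's `API` class accepts a `wait_on_rate_limit` option that makes the library pause automatically.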

11) Twitter4J

Twitter4J can be used in Java applications to extract tweets from Twitter. Both commercial and non-commercial use are allowed. Developers need a Twitter account. They also need to create an application under that account and generate credentials, which are then used in their Java application. More information is available at http://twitter4j.org/en/

Sentiment Analyzer Flow Diagram

The following diagram describes the flow of a sentiment analyzer. First, words that carry little sentiment, such as is, am, are, was and were, can be removed. Second, short forms need to be expanded into their full forms, for example n’t into not. At this stage the text is in a clean form and sentiment analysis can be performed on it.

Fig-4: Sentimental Analysis Flow Diagram
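
The two preprocessing steps described above can be sketched in plain Python as follows. The stop word set contains only the words named in the text, while the `IRREGULAR` table and the function names are assumptions made for this example. Tokenization is a simple whitespace split, so punctuation is not handled.

```python
# Stop words taken from the examples in the text above.
STOP_WORDS = {"is", "am", "are", "was", "were"}

# Hypothetical table for short forms that do not follow the plain "n't" rule.
IRREGULAR = {"can't": ["can", "not"], "won't": ["will", "not"]}


def remove_stop_words(tokens):
    """Step 1: drop words that carry little sentiment."""
    return [t for t in tokens if t not in STOP_WORDS]


def expand_short_forms(tokens):
    """Step 2: expand short forms, e.g. "don't" becomes "do", "not"."""
    expanded = []
    for t in tokens:
        if t in IRREGULAR:
            expanded.extend(IRREGULAR[t])
        elif t.endswith("n't") and len(t) > 3:
            expanded.extend([t[:-3], "not"])
        else:
            expanded.append(t)
    return expanded


def preprocess(text):
    """Lowercase, normalize curly apostrophes, split, then run both steps."""
    tokens = text.lower().replace("\u2019", "'").split()
    return expand_short_forms(remove_stop_words(tokens))


print(preprocess("The food was great"))
print(preprocess("I don\u2019t like it"))
```

The sketch keeps the order given in the text. Running the expansion first would be a reasonable alternative, since a word like "wasn't" would then be split before the stop word filter runs and its "was" part could be removed as well.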
