(2nd July 2019)
Sentiment analysis has become an important field in recent years; a simple search on any popular search engine reveals numerous resources on the topic. An important application of sentiment analysis is determining public opinion about different crypto-currencies on social networking services (SNS). By applying sentiment analysis techniques to SNS data, the reputation of crypto-currencies can be judged from the public's point of view.
This document proposes a Sentiment Analysis method for crypto-currencies using multiple input sources.
Data Sources for Analysis:
The following data sources can be utilized for performing crypto-currency related sentiment analysis:
· YouTube Comments
· Tweets (extracted via Tweepy or Twitter4J, as listed under Tools & Technologies)
Task List (Sub-module Identification):
Topic Identification (Text Categorization)
Topic identification is one of the important subtasks of sentiment analysis for crypto-currencies. Social media can contain data on many topics, such as sports, religion, and crypto-currency. Therefore, we first need to shortlist the relevant data using topic identification (text categorization) techniques. This categorization can be done using semantic or machine learning techniques. Some of the subtasks involved are tokenization, Natural Language Processing (NLP), and inference. A simple keyword-based sketch is given below.
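As a rough illustration only, the sketch below shortlists crypto-currency related posts with a plain keyword match after NLTK tokenization; the keyword list and the example posts are placeholders, and a trained text classifier could replace this filter in the final system.

import nltk
from nltk.tokenize import word_tokenize

# One-time download of the tokenizer model: nltk.download('punkt')

# Placeholder keyword list for the crypto-currency topic (illustrative, not exhaustive)
CRYPTO_KEYWORDS = {"bitcoin", "btc", "ethereum", "eth", "crypto", "blockchain", "altcoin"}

def is_crypto_related(text):
    """Return True if the post mentions any crypto-currency keyword."""
    tokens = {token.lower() for token in word_tokenize(text)}
    return bool(tokens & CRYPTO_KEYWORDS)

posts = [
    "Bitcoin price is surging after the latest exchange news",  # crypto-related, kept
    "Great match yesterday, what a goal!",                      # sports, filtered out
]
relevant_posts = [post for post in posts if is_crypto_related(post)]
print(relevant_posts)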
Custom Positive & Negative Words List
In addition to Senti-WordNet, we can create a custom list of positive and negative words to improve the accuracy of the sentiment score. For this, we need to collect a list of words related to the crypto-currency domain, as sketched below.
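As a minimal sketch, the custom lists below are plain Python sets; the words shown are only placeholder examples of crypto-domain vocabulary, and the actual lists would be curated for this project.

# Placeholder crypto-domain word lists (to be curated for the project)
CUSTOM_POSITIVE = {"bullish", "moon", "rally", "adoption", "hodl"}
CUSTOM_NEGATIVE = {"bearish", "scam", "dump", "crash", "hack"}

def custom_word_score(token):
    """Return +1.0 / -1.0 / 0.0 for a token based on the custom domain lists."""
    token = token.lower()
    if token in CUSTOM_POSITIVE:
        return 1.0
    if token in CUSTOM_NEGATIVE:
        return -1.0
    return 0.0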
Sentiment Analyzer:
After the identification of relevant data, each item will be fed into the sentiment analyzer. This sub-module will tokenize the provided input. Next, the sentiment of each token will be calculated using the created positive and negative lists as well as Senti-WordNet. Finally, the per-token sentiment scores will be summed to calculate the overall sentiment of the given input; a sketch of this scoring loop is given below.
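A minimal sketch of the analyzer, assuming NLTK's SentiWordNet corpus is used for the dictionary lookups; the custom_scores dictionary stands in for the custom positive/negative lists from the previous section, and the scoring rule (custom list first, then the first SentiWordNet synset) is one reasonable choice, not a fixed requirement.

import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import sentiwordnet as swn

# One-time downloads:
# nltk.download('punkt'); nltk.download('wordnet'); nltk.download('sentiwordnet')

def sentiwordnet_score(token):
    """Positive minus negative score of the first SentiWordNet synset, or 0.0 if unknown."""
    synsets = list(swn.senti_synsets(token))
    if not synsets:
        return 0.0
    return synsets[0].pos_score() - synsets[0].neg_score()

def analyze_sentiment(text, custom_scores):
    """Overall sentiment = sum of per-token scores (custom lists first, then SentiWordNet)."""
    total = 0.0
    for token in word_tokenize(text.lower()):
        score = custom_scores.get(token)
        if score is None:
            score = sentiwordnet_score(token)
        total += score
    return total

# Compact stand-in for the custom positive/negative lists
custom_scores = {"bullish": 1.0, "hodl": 1.0, "scam": -1.0, "crash": -1.0}
print(analyze_sentiment("Bitcoin looks bullish after the rally", custom_scores))
print(analyze_sentiment("Another scam caused a market crash", custom_scores))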
A flow diagram for the sentiment analysis process has also been provided in this proposal document.
Front End Design
A user interface needs to be designed for the sentiment analysis tool. Either a web-based user interface or a desktop application can be created.
Integration & Testing
After the unit testing of each sub-module, the last step is integration and overall testing to find as many bugs as possible.
Tools & Technologies:
The following tools & technologies can be utilized for the sentiment analysis task:
· Jericho HTML Parser (for web scraping in Java)
· J-Soup HTML Parser (for web scraping in Java)
· Senti-WordNet (contains sentiment scores for various tokens)
· RiTa WordNet
· Stanford NLP (for performing NLP in Java)
· Apache OpenNLP
· NLTK (for performing NLP in Python)
· TextBlob (in Python)
· Tweepy (Tweets library in Python)
· Pandas (data analysis and plotting library in Python)
· Twitter4J (for extracting tweets in Java)
Below we provide more details regarding the above items.
Jericho HTML Parser:
It is a popular, open-source HTML parser for the Java language.
J-Soup:
It is an HTML parser for the Java language. The source code of J-Soup is open source and released under the MIT license, which is a commercially friendly license. It is used in applications such as sentiment analysis and web scraping. In addition to desktop systems, it also supports the Android platform.
Senti-WordNet:
Senti-WordNet assigns sentiment scores to the synsets of the WordNet database. It can be an ideal resource for sentiment analysis applications. It is licensed under CC BY-SA 3.0.
Senti-WordNet Illustration (http://ontotext.fbk.eu/sentiwn.html)
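As a small illustration, the snippet below reads Senti-WordNet scores through NLTK's corpus interface (an assumption; the scores can also be read directly from the Senti-WordNet data file).

from nltk.corpus import sentiwordnet as swn
# One-time downloads: nltk.download('wordnet'); nltk.download('sentiwordnet')

# Inspect the sentiment scores attached to the synsets of a word
for synset in swn.senti_synsets("profit"):
    print(synset, synset.pos_score(), synset.neg_score(), synset.obj_score())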
RiTa WordNet:
RiTa WordNet can be utilized for accessing the WordNet database.
Stanford NLP:
It is one of the most popular NLP libraries for the Java language. It supports NLP operations such as Named Entity Recognition (NER), Part of Speech (POS) tagging, and tokenization. Popular applications of Stanford NLP include chat bot design and sentiment analysis. It supports multiple languages.
Apache OpenNLP:
Apache OpenNLP is an open-source NLP library for the Java language. It is a machine-learning based toolkit. Common NLP operations include sentence detection, tokenization, and Named Entity Recognition (NER). It supports natural language text in languages other than English as well.
NLTK:
NLTK stands for Natural Language Toolkit. It is arguably the most popular library for performing NLP in Python. It supports various NLP operations such as tokenization, Named Entity Recognition (NER), and Part of Speech (POS) tagging. One application of NLTK is summarizing large amounts of text. It supports languages other than English as well.
Sample Source Code (https://cloud.archivesunleashed.org/derivatives/text-sentiment)
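As a brief example of the operations listed above (the sentence is a placeholder):

from nltk import word_tokenize, pos_tag, ne_chunk

# One-time downloads:
# nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')
# nltk.download('maxent_ne_chunker'); nltk.download('words')

sentence = "Ethereum was proposed by Vitalik Buterin in 2013."
tokens = word_tokenize(sentence)   # tokenization
tagged = pos_tag(tokens)           # Part of Speech tagging
entities = ne_chunk(tagged)        # Named Entity Recognition
print(tagged)
print(entities)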
TextBlob:
It is a Python library for processing textual data. It includes a built-in sentiment analyzer. It is built on top of NLTK, the Natural Language Toolkit for Python. TextBlob can also perform other NLP-related tasks such as Part of Speech (POS) tagging. It can be easily installed using the pip command.
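A minimal usage sketch (the example sentence is a placeholder; the corpora used by TextBlob are downloaded once with "python -m textblob.download_corpora"):

from textblob import TextBlob   # pip install textblob

blob = TextBlob("Bitcoin adoption is growing and the outlook is great.")
print(blob.sentiment)   # Sentiment(polarity=..., subjectivity=...) from the built-in analyzer
print(blob.tags)        # Part of Speech tags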
Pandas:
This library can be utilized for generating graphs in Python, although it is primarily a data analysis library. It is one of the most popular Python libraries. It works with data in tabular format and can also help in filtering data.
Python Pandas Data Frame (https://www.shanelynn.ie/)
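A minimal sketch with hypothetical per-day sentiment scores (placeholder data); plotting relies on matplotlib being installed alongside pandas.

import pandas as pd   # pip install pandas matplotlib

# Hypothetical daily sentiment scores for two coins (placeholder data)
df = pd.DataFrame({
    "date": pd.to_datetime(["2019-07-01", "2019-07-02", "2019-07-03"]),
    "coin": ["BTC", "BTC", "ETH"],
    "sentiment": [0.42, -0.10, 0.25],
})

btc = df[df["coin"] == "BTC"]   # filtering tabular data
print(btc)

# Plot the filtered data; call matplotlib.pyplot.show() to display the chart
btc.plot(x="date", y="sentiment", kind="line", title="BTC sentiment over time")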
Tweepy:
Tweepy can be utilized in Python for extracting tweets. This library can be installed using the pip package manager. It should be noted that Twitter applies rate limiting, i.e. we can only extract tweets for a certain time period; afterwards we have to wait for a certain time period before fetching more tweets.
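A minimal sketch, assuming API credentials have already been generated in the Twitter developer portal (the placeholder strings must be replaced); the search call shown matches Tweepy 3.x, which was current at the time of writing, while Tweepy 4.x renames it to api.search_tweets.

import tweepy   # pip install tweepy

# Placeholder credentials from the Twitter developer portal
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")

# wait_on_rate_limit makes Tweepy sleep automatically when the rate limit is reached
api = tweepy.API(auth, wait_on_rate_limit=True)

# Fetch recent English tweets mentioning a crypto-currency
for tweet in tweepy.Cursor(api.search, q="bitcoin", lang="en").items(50):
    print(tweet.text)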
Twitter4J:
Twitter4J can be utilized in Java applications for extracting tweets from Twitter. Both commercial and non-commercial usage is allowed. Developers need to have a Twitter account; additionally, they need to create an application under that account and generate credentials, which are then used in the Java application. More information is available at http://twitter4j.org/en/