DR.GEEK

Semantic categorization

(27th-March-2020)


Once the text of the speech has been obtained, we can begin semantic categorization of the text to determine the speech category. The first important step in text categorization is natural language processing (NLP). Its main parts are tokenization, named entity recognition (NER), and stop-word removal. We tokenize the text because the subsequent semantic enrichment step works on tokens, and that enrichment in turn helps with milestone-based categorization. Three-class NER identifies the names of persons, organizations, and places. Finally, we remove the tokens that have no effect on the category of the text.
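As a rough illustration of the tokenization and stop-word removal steps, here is a minimal Python sketch. The regular expression, the tiny stop-word list, and the function names are assumptions made for this example, not part of the proposed system; a real pipeline would likely use a full NLP toolkit and would also run NER, which is omitted here.

```python
import re

# Illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "on", "and", "we", "for"}

def tokenize(text):
    """Split text into word tokens, keeping the original casing
    (capitalization is a useful cue for a later NER step)."""
    return re.findall(r"[A-Za-z0-9']+", text)

def remove_stop_words(tokens):
    """Drop tokens that carry no category information."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]

tokens = remove_stop_words(tokenize("We are going to the election rally in Lahore"))
print(tokens)
```

Only the content-bearing tokens are passed on to the later enrichment and categorization steps.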


Next, we query the YAGO2s knowledge base, an upper-level ontology, for each token. This query gives us the category of the token, so at this point we have a category for every token of the text. The category with the highest count is then taken as the final category of the speech text. The final category can be further classified as 'Personal' or 'General', since at a broader level an individual speech is either personal, such as a family-related topic, or general, such as a political discussion. The categorized speech of the person is then uploaded to a server over the Internet. This completes part one of the proposed architecture.
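The majority-vote step can be sketched as follows. This is a minimal Python example under stated assumptions: the `TOKEN_CATEGORY` dictionary is a hypothetical stand-in for the YAGO2s query, and the category names and the `BROAD_CLASS` mapping to 'Personal' or 'General' are invented for illustration.

```python
from collections import Counter

# Hypothetical token-to-category table standing in for a YAGO2s query.
TOKEN_CATEGORY = {
    "election": "politics",
    "parliament": "politics",
    "minister": "politics",
    "mother": "family",
    "birthday": "family",
}

# Hypothetical mapping from fine-grained categories to the two broad classes.
BROAD_CLASS = {"politics": "General", "family": "Personal"}

def lookup_category(token):
    """Return the category of a token, or None if it is unknown."""
    return TOKEN_CATEGORY.get(token.lower())

def categorize(tokens):
    """Pick the most frequent token category and its broad class.
    Ties go to the category encountered first."""
    counts = Counter(c for c in map(lookup_category, tokens) if c is not None)
    if not counts:
        return None, None
    category, _ = counts.most_common(1)[0]
    return category, BROAD_CLASS.get(category)

print(categorize(["election", "minister", "birthday", "rally"]))
```

Tokens with no known category are simply ignored in the count, so a speech with no recognized tokens yields no category rather than a guess.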
