Natural Language Processing

This repository reviews useful tools for mining text, also infamously known as text mining.

About Text Mining

Text mining and/or natural language processing is a huge field. It covers many topics, among which we find the following:

Tools

An interesting slide on several available open-source text mining tools. Here.

Below you will find details on a non-exhaustive list of available tools:

R-Project (R)

The Text Mining Infrastructure in R book summaries.

Gate (Java)

Weka (Java)

Weka does not seem to support raw text as such, so text will require preprocessing before we can use Weka for our goal. I currently see three packages available, for text classification only: DMNBtext, SparseGenerativeModel and bayesianLogisticRegression.

Note: the package manager has been available since v3.7.2, so be sure to download the latest version (not the “stable” one).
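
As a rough idea of what such preprocessing can look like, here is a minimal sketch in plain Python that turns raw documents into a document-term matrix (one row of word counts per document). It does not use Weka or any of its packages; the function names and the sample documents are hypothetical and only serve as an illustration.

```python
import re

def tokenize(text):
    """Lowercase the text and keep alphabetic word tokens only."""
    return re.findall(r"[a-z]+", text.lower())

def document_term_matrix(documents):
    """Return (vocabulary, rows): each row holds one document's word counts."""
    tokenized = [tokenize(doc) for doc in documents]
    # Sorted vocabulary so that every row uses the same column order.
    vocabulary = sorted({tok for doc in tokenized for tok in doc})
    rows = [[doc.count(term) for term in vocabulary] for doc in tokenized]
    return vocabulary, rows

# Hypothetical sample documents.
docs = ["Great taste, great price", "Bad taste"]
vocabulary, rows = document_term_matrix(docs)
print(vocabulary)  # ['bad', 'great', 'price', 'taste']
print(rows)        # [[0, 2, 1, 1], [1, 0, 0, 1]]
```

A numeric table like this is the kind of input a classifier usually expects; how it is best exported for Weka is left open here.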

OpenNLP (Java)

jtopia

https://github.com/srijiths/jtopia

Term extractor.

POS

https://github.com/brendano/ark-tweet-nlp
https://gate.ac.uk/wiki/twitie.html

http://factorie.cs.umass.edu/
https://github.com/factorie/factorie

NLTK (Python)

[more to come]

Pattern (Python)

Includes many API access points (including Twitter, Facebook and Wikipedia) and many interesting text mining and analysis methods.

Running pip install pattern should be enough to use the Python module. You should then be able to run the following code:

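from pattern.web import Twitter, plaintext
from pattern.en import sentiment

# Search Twitter for recent English tweets that mention the brand.
# Note: the Twitter API may require your own credentials.
brand = "coca-cola"
twitter = Twitter(language='en')
for tweet in twitter.search(brand, cached=False, count=5):
    # Strip markup from the tweet, then score it as (polarity, subjectivity).
    print(plaintext(tweet.text))
    print(sentiment(tweet.text))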

scikit-learn (Python)

[more to come]

Other Miscellaneous

Distributed Text Mining

See:

Text mining and/or natural language processing is a huge field that covers many topics. Before using the tools, we need to understand what we are talking about…

Vocabulary

Grammar Basics

Sentences are made up of words. Each word has a syntactic role (noun, verb, adjective, …) that depends on its position in the sentence. For example, "can" can be a verb or a noun depending on the context ("the can", "I can").
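
To make the "can" example concrete, here is a toy sketch in Python that guesses the role of "can" from the word just before it. The word lists and the tag_can function are hypothetical and deliberately simplistic; real part-of-speech taggers, such as those in the tools listed further down, rely on far richer context.

```python
# Toy context rules (hypothetical, for illustration only).
DETERMINERS = {"the", "a", "an", "this", "that"}
PRONOUNS = {"i", "you", "he", "she", "we", "they"}

def tag_can(sentence):
    """Return a (word, tag) pair for each occurrence of 'can' in the sentence."""
    words = sentence.lower().split()
    tags = []
    for i, word in enumerate(words):
        if word != "can":
            continue
        previous = words[i - 1] if i > 0 else ""
        if previous in DETERMINERS:
            tags.append((word, "NOUN"))     # "the can"
        elif previous in PRONOUNS:
            tags.append((word, "VERB"))     # "I can"
        else:
            tags.append((word, "UNKNOWN"))  # not covered by the toy rules
    return tags

print(tag_can("I can open the can"))  # [('can', 'VERB'), ('can', 'NOUN')]
```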

Metrics

Processing: Text > Preparation > Exploration > Structure Analysis > Semantic Analysis > Meaning
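
As a rough illustration of the first two stages of this chain only, the sketch below prepares a text (lowercasing and tokenizing) and then explores it (counting the most frequent tokens). The function names and the sample sentence are assumptions made for this example; structure and semantic analysis are not covered.

```python
import re
from collections import Counter

def prepare(text):
    """Preparation: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def explore(tokens, top=3):
    """Exploration: return the `top` most frequent tokens with their counts."""
    return Counter(tokens).most_common(top)

# Hypothetical sample text.
text = "The can is red. I can open the can."
tokens = prepare(text)
top_terms = explore(tokens)
print(tokens)
print(top_terms)  # [('can', 3), ('the', 2), ('is', 1)]
```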

Text Mining Step-by-Step

!!! We may consider identifying intermediate classes for each client/date

Pipeline

Tools

An interesting slide on several available open-source text mining tools. Here.

Below you will find details on a non-exhaustive list of available tools:

OpenNLP (Java)

StanfordNLP (Java)

Chalk (Scala)

R-Project (R)

The Text Mining Infrastructure in R book summaries.

Libraries for Semantic Role Labeling (Java)

Other verticals:

NLTK (Python)

Clips-Pattern and Clips-MBSP (Python)

Includes many API access points (including Twitter, Facebook and Wikipedia) and many interesting text mining and analysis methods.

scikit-learn (Python)

Gate (Java)

Language Detection

Weka (Java)

Other useful libraries

Distributed Text Mining

See the following information related to distributed text mining:

Tools by Language

Natural Language Processing in Java

Natural Language Processing in Scala

Natural Language Processing in Python

Natural Language Processing in JavaScript