11 projects tagged "NLP"

No download Website Updated 26 Apr 2010 Okapi Framework

Pop 35.58
Vit 1.46

The Okapi project’s main purpose is to provide a set of building blocks for the creation of larger open source localization and translation tools. Many Okapi components are nevertheless generic enough to be of interest to the text mining, natural language processing, and text retrieval communities. Okapi’s many text filters (HTML, Properties, XML with ITS XPath-based rules, OpenXML, ODF, Regex, etc.) provide a straightforward way to access the text of multiple document formats. Its document events and pipeline can be integrated with other frameworks such as UIMA, LingPipe, OpenPipeline, OpenNLP, GATE, and Lucene. The advantage of Okapi’s text filters is that they not only extract text but also preserve all non-textual formatting. It is possible to decompose a document into events, process them via the pipeline, and then rebuild the input document without loss. Structural information can be added to Okapi document events so that tables, lists, links, titles, etc. are grouped together and treated as a unit. This is useful when context based on a “universal” document structure is needed. The Okapi event model supports user-configurable annotations, similar to UIMA but simpler and more restricted in scope. Users can annotate spans of text or add new resources such as translation memory matches, terminology, token types, or part-of-speech information.
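The decompose, process, and rebuild cycle described above can be illustrated with a minimal sketch. This is not Okapi's API (Okapi is a Java framework); the names `Event`, `to_events`, `run_pipeline`, and `rebuild` are hypothetical, and the single regular expression stands in for a real format filter purely as an assumption for the example.

```python
import re
from typing import Callable, List, NamedTuple


class Event(NamedTuple):
    kind: str  # "markup" or "text" (hypothetical event kinds)
    data: str


# Assumption: anything between "<" and ">" is markup. Real filters are far more careful.
_TAG = re.compile(r"(<[^>]*>)")


def to_events(document: str) -> List[Event]:
    """Decompose a document into a flat list of markup and text events."""
    events = []
    for piece in _TAG.split(document):
        if not piece:
            continue  # re.split yields empty strings around adjacent tags
        kind = "markup" if _TAG.fullmatch(piece) else "text"
        events.append(Event(kind, piece))
    return events


def run_pipeline(events: List[Event], steps: List[Callable[[Event], Event]]) -> List[Event]:
    """Apply each step to every event, in order."""
    for step in steps:
        events = [step(event) for event in events]
    return events


def rebuild(events: List[Event]) -> str:
    """Reassemble the document from its events."""
    return "".join(event.data for event in events)


def uppercase_text(event: Event) -> Event:
    """Example step: change text events only, pass markup through untouched."""
    if event.kind == "text":
        return Event("text", event.data.upper())
    return event


doc = '<p class="intro">Hello <b>world</b>!</p>'
events = to_events(doc)
processed = rebuild(run_pipeline(events, [uppercase_text]))
```

Because markup events pass through every step unchanged, an empty pipeline reproduces the input exactly, which is the "without loss" property in miniature.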

Download Website Updated 29 Nov 2011 Apache OpenNLP

Pop 85.18
Vit 1.49

Apache OpenNLP is a machine learning based toolkit for the processing of natural language text. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution. These tasks are usually required to build more advanced text processing services.

No download Website Updated 15 Dec 2011 foma

Pop 53.85
Vit 1.00

foma is a compiler, programming language, and C library for constructing finite-state automata and transducers for various uses. It has specific support for many natural language processing applications such as producing morphological analyzers. Although NLP applications are probably the main use of foma, it is generic enough to serve many other purposes. It comes with an xfst-compatible interface and regular expression language. The library contains efficient implementations of all classical automata/transducer algorithms: determinization, minimization, epsilon-removal, composition, and boolean operations. More advanced construction methods are also available: context restriction, quotients, first-order regular logic, transducers from replacement rules, etc.
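As an illustration of one of the classical algorithms named above, here is a minimal sketch of determinization by subset construction. It is written in Python for brevity and does not reflect foma's C API or its internal data structures; the function names and the dictionary-based automaton encoding are assumptions made for the example, and epsilon transitions are not handled.

```python
from typing import Dict, FrozenSet, Set, Tuple

# NFA transitions: (state, symbol) -> set of next states (no epsilon moves).
NfaDelta = Dict[Tuple[int, str], Set[int]]
DfaState = FrozenSet[int]
DfaDelta = Dict[Tuple[DfaState, str], DfaState]


def determinize(start: int, finals: Set[int], delta: NfaDelta):
    """Subset construction: each DFA state is a frozenset of NFA states."""
    alphabet = {symbol for (_, symbol) in delta}
    start_set = frozenset([start])
    dfa_delta: DfaDelta = {}
    dfa_finals: Set[DfaState] = set()
    seen = {start_set}
    todo = [start_set]
    while todo:
        current = todo.pop()
        if current & finals:
            dfa_finals.add(current)  # final if it contains any NFA final state
        for symbol in alphabet:
            target = frozenset(
                nxt for state in current for nxt in delta.get((state, symbol), ())
            )
            if not target:
                continue  # no transition; the DFA is left partial
            dfa_delta[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    return start_set, dfa_finals, dfa_delta


def accepts(dfa, word: str) -> bool:
    """Run the (partial) DFA on a word."""
    state, finals, delta = dfa
    for symbol in word:
        state = delta.get((state, symbol))
        if state is None:
            return False
    return state in finals


# NFA for strings over {a, b} that end in "ab".
nfa = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
dfa = determinize(0, {2}, nfa)
```

With this encoding, `accepts(dfa, "aab")` is true and `accepts(dfa, "aba")` is false; the resulting DFA has three states.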

No download No website Updated 14 Feb 2014 TreeTagger for Java

Pop 72.69
Vit 5.87

TreeTagger for Java (TT4J) is a Java wrapper around the popular TreeTagger package by Helmut Schmid, a language independent part-of-speech tagger and lemmatizer. It was written with a focus on platform-independence and easy integration into applications.

No download No website Updated 29 Apr 2014 DKPro Core

Pop 100.12
Vit 7.68

DKPro Core is a collection of software components for natural language processing (NLP) based on the Apache UIMA framework. Many powerful and state-of-the-art NLP components are already freely available in the NLP research community. New and improved components are being developed and released continuously. The components cover the whole range of NLP-related processing tasks. DKPro Core provides wrappers for such third-party tools as well as original NLP components. DKPro Core builds heavily on uimaFIT, which allows for rapid and easy development of NLP processing pipelines.

No download No website Updated 28 Nov 2013 TWSI

Pop 36.08
Vit 1.69

TWSI is software that produces lexical substitutions in context for over 1000 frequent nouns. It processes English text. This functionality is realized by a supervised word sense disambiguation system, which is trained on sense-labeled occurrences of target words. A classification model is trained for each word and used to decide which sense an unseen occurrence most likely belongs to. Each sense has an associated list of substitutions, which are injected into the text using inline annotation.
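The per-word scheme described above can be sketched as follows. This is a toy under stated assumptions, not TWSI's actual classifier, features, sense labels, or annotation format: the word-overlap scoring, the `bank@finance` style sense identifiers, the substitution lists, and the `word[sub1|sub2]` inline notation are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


class WordSenseModel:
    """Toy classifier for ONE target word: scores senses by context-word overlap."""

    def __init__(self) -> None:
        self.context_counts: Dict[str, Counter] = defaultdict(Counter)

    def train(self, labeled: Iterable[Tuple[str, str]]) -> None:
        """labeled: (sentence containing the target word, sense id) pairs."""
        for sentence, sense in labeled:
            self.context_counts[sense].update(sentence.lower().split())

    def predict(self, sentence: str) -> str:
        """Return the sense whose training contexts share the most words."""
        words = sentence.lower().split()
        return max(
            self.context_counts,
            key=lambda sense: sum(self.context_counts[sense][w] for w in words),
        )


def annotate(sentence: str, target: str, model: WordSenseModel,
             substitutions: Dict[str, List[str]]) -> str:
    """Inject the predicted sense's substitutions after each target occurrence."""
    subs = "|".join(substitutions[model.predict(sentence)])
    return " ".join(
        f"{token}[{subs}]" if token.lower() == target else token
        for token in sentence.split()
    )


# One model per target word, as in the description; the data here is invented.
bank_model = WordSenseModel()
bank_model.train([
    ("she deposited money at the bank", "bank@finance"),
    ("the bank approved the loan", "bank@finance"),
    ("they fished from the river bank", "bank@river"),
    ("the muddy bank of the stream", "bank@river"),
])
bank_substitutions = {
    "bank@finance": ["lender", "financial institution"],
    "bank@river": ["shore", "riverside"],
}
annotated = annotate("he deposited money at the bank", "bank",
                     bank_model, bank_substitutions)
```

A real system would use richer features and a trained statistical classifier; the point here is only the shape of the data flow from labeled occurrences to a per-word model to inline substitutions.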

No download No website Updated 30 Nov 2013 DKPro WSD

Pop 56.25
Vit 2.08

DKPro WSD provides UIMA components which encapsulate corpus readers, linguistic annotators, lexical semantic resources, WSD algorithms, and evaluation and reporting tools. You configure the components, or write new ones, and arrange them into a data processing pipeline. DKPro WSD is modular and flexible. Components which provide the same functionality can be freely swapped. You can easily run the same algorithm on different data sets, or test several different algorithms on the same data set.
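The idea of swapping interchangeable algorithms over the same data can be sketched in a few lines. This does not use DKPro WSD or UIMA; the type aliases, the two toy algorithms (a first-listed-sense baseline and a simplified gloss-overlap method in the spirit of Lesk), the sense inventory, and the test data are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

SenseInventory = Dict[str, str]          # sense id -> gloss (hypothetical format)
Instance = Tuple[str, str]               # (context sentence, gold sense id)
Algorithm = Callable[[str, SenseInventory], str]


def first_sense(context: str, inventory: SenseInventory) -> str:
    """Baseline: always pick the first sense listed in the inventory."""
    return next(iter(inventory))


def gloss_overlap(context: str, inventory: SenseInventory) -> str:
    """Pick the sense whose gloss shares the most words with the context."""
    context_words = set(context.lower().split())
    return max(
        inventory,
        key=lambda sense: len(context_words & set(inventory[sense].lower().split())),
    )


def evaluate(algorithm: Algorithm, data: List[Instance],
             inventory: SenseInventory) -> float:
    """Accuracy of one algorithm on one data set."""
    correct = sum(algorithm(context, inventory) == gold for context, gold in data)
    return correct / len(data)


inventory = {
    "bank@finance": "an institution that accepts deposits and lends money",
    "bank@river": "sloping land beside a river or stream",
}
data = [
    ("the bank lends money to farmers", "bank@finance"),
    ("we sat on the bank beside the river", "bank@river"),
]

# Same data set, different algorithms: only the callable changes.
results = {
    algorithm.__name__: evaluate(algorithm, data, inventory)
    for algorithm in (first_sense, gloss_overlap)
}
```

Because both algorithms share one signature, adding a third means writing one more function and appending it to the tuple; the evaluation code stays untouched.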

No download Website Updated 28 Nov 2013 JobimText

Pop 35.89
Vit 1.01

JobimText provides a software solution for automatic text expansion using contextualized distributional similarity.

No download Website Updated 15 Sep 2013 JWKTL

Pop 38.08
Vit 1.01

JWKTL (Java-based Wiktionary Library) is an application programming interface for the free multilingual online dictionary Wiktionary. Wiktionary is collaboratively constructed by volunteers and continually growing. JWKTL enables efficient and structured access to the information encoded in the English, German, and Russian Wiktionary language editions, including sense definitions, part of speech tags, etymology, example sentences, translations, semantic relations, and many other lexical information types.

No download Website Updated 14 Sep 2013 JOWKL

Pop 31.40
Vit 16.08

JOWKL (Java OmegaWiki Library) is a Java-based application programming interface which allows the user to access all information in the free, multilingual online dictionary OmegaWiki.
