ngrams

This is an n-gram package in C++ for character or word n-gram analysis. It uses a ternary search tree instead of a hash table for faster n-gram frequency counting. Words are converted to unique IDs, which are encoded as more compact base-256 integers.

It is a simplified implementation of Dr. Vlado Keselj's Text-Ngrams 1.6, a very flexible n-gram package written in Perl. More information is available at http://users.cs.dal.ca/~vlado/srcperl/Ngrams/Ngrams.html

How to use it

Download and save the source code, then build and run it:
$> make
$> ngrams --type=word --n=3 --in=sample.txt
or
$> ngrams --type=character --n=3 --in=sample.txt

That's it. If you find a bug or have a suggestion, please send me an email at [email protected]. Thanks.

Zheyuan Yu, Feb 18, 2006

Revisions

Mar 28, 2005. Zheyuan Yu. Modified the ternary search tree to improve performance: the vector now stores pointers to TstItems instead of the TstItems themselves, which saves time when the vector grows.
Feb 18, 2006. Zheyuan Yu. Initial implementation.
