A node.js module that creates a term vector from a mixed text input. Supports stopword removal and customisable separators.

fergiemcdowall/term-vector

NPM version NPM downloads MIT License

term-vector

A Node.js module that creates a term vector from tokenized text. Use term-vector when implementing a vector space model.

Works with Unicode!

Does ngrams!

const tv = require('term-vector') 
// alternatively if you are all fancy and new-fangled:
// import tv from 'term-vector'
const tokens = 'this is really really really cool'.split(' ')

// just make a simple term vector
tv(tokens)
// [
//   { term: [ 'cool' ], positions: [ 5 ] },
//   { term: [ 'is' ], positions: [ 1 ] },
//   { term: [ 'really' ], positions: [ 2, 3, 4 ] },
//   { term: [ 'this' ], positions: [ 0 ] }
// ]

// make a term vector with ngrams of length 1 and 2
tv(tokens, { ngramLengths: [ 1, 2 ] })
// [
//   { term: [ 'cool' ], positions: [ 5 ] },
//   { term: [ 'is' ], positions: [ 1 ] },
//   { term: [ 'is', 'really' ], positions: [ 1 ] },
//   { term: [ 'really' ], positions: [ 2, 3, 4 ] },
//   { term: [ 'really', 'cool' ], positions: [ 4 ] },
//   { term: [ 'really', 'really' ], positions: [ 2, 3 ] },
//   { term: [ 'this' ], positions: [ 0 ] },
//   { term: [ 'this', 'is' ], positions: [ 0 ] }
// ]
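
To show where a term vector fits into a vector space model, here is a minimal sketch that compares two documents by cosine similarity. It does not call term-vector at all: `docA` is a literal copy of the first output shown above, and `docB` is a hand-written vector in the same `{ term, positions }` shape for the tokens `this is cool`. The helpers `toFrequencies` and `cosine` are hypothetical names invented for this example and are not part of the module. Using the number of positions as the term weight is just one simple choice.

```javascript
// Hypothetical helpers, NOT part of term-vector. They only rely on the
// { term, positions } shape shown in the examples above.

// Turn a term vector into a Map of term key -> frequency.
// The key is the JSON form of the term array, so that the ngram
// [ 'really', 'cool' ] cannot collide with a single token 'really cool'.
const toFrequencies = (vector) =>
  new Map(
    vector.map(({ term, positions }) => [JSON.stringify(term), positions.length])
  )

// Euclidean length of a frequency map
const norm = (freqs) =>
  Math.sqrt([...freqs.values()].reduce((sum, f) => sum + f * f, 0))

// Cosine similarity between two frequency maps (0 if either is empty)
const cosine = (a, b) => {
  let dot = 0
  for (const [key, freq] of a) dot += freq * (b.get(key) || 0)
  const denominator = norm(a) * norm(b)
  return denominator === 0 ? 0 : dot / denominator
}

// Literal copy of the tv(tokens) output shown above
const docA = [
  { term: ['cool'], positions: [5] },
  { term: ['is'], positions: [1] },
  { term: ['really'], positions: [2, 3, 4] },
  { term: ['this'], positions: [0] }
]

// Hand-written vector in the same shape for the tokens 'this is cool'
const docB = [
  { term: ['cool'], positions: [2] },
  { term: ['is'], positions: [1] },
  { term: ['this'], positions: [0] }
]

const similarity = cosine(toFrequencies(docA), toFrequencies(docB))
console.log(similarity.toFixed(3))
// 0.500
```

In real use you would probably weight terms with something like tf-idf rather than raw counts, but the overall flow (tokens, then a term vector, then a weighted vector, then a similarity score) stays the same.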
