Skip to content

Optimized implementation of suffix tree in python using Ukkonen's algorithm.

Notifications You must be signed in to change notification settings

kasravnd/SuffixTree

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

25 Commits
 
 
 
 
 
 
 
 

Repository files navigation

This module is an optimized implementation of Ukkonen's suffix tree algorithm in python which will be having most of the important text processing functionalities such as:

Search for strings:

Check if a string P of length m is a substring in O(m) time.
Find the first occurrence of the patterns P1,... ,Pq of total length m as substrings in O(m) time.

Find all z occurrences of the patterns P1,... ,Pq of total length m as substrings in O(m+z) time.

  • Search for a regular expression P in time expected sublinear in n
  • Find for each suffix of a pattern P the length of the longest match between a prefix of P[i... m] and a substring in D in image time. This is termed the matching statistics for P

Find properties of the strings:

  • Find the longest common substrings of the string Si and Sj in image time.
  • Find all maximal pairs, maximal repeats or supermaximal repeats in image time.
  • Find the Lempel–Ziv decomposition in image time.[10]
  • Find the longest repeated substrings in image time.
  • Find the most frequently occurring substrings of a minimum length in image time.
  • Find the shortest strings from image that do not occur in D in O(n+z) time, if there are z such strings.
  • Find the shortest substrings occurring only once in image time.
  • Find, for each i the shortest substrings of Si not occurring elsewhere in D in image time.

sources:

About

Optimized implementation of suffix tree in python using Ukkonen's algorithm.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages