pystringmatcher

description

A small utility for finding substrings and text patterns in an input file. The tool splits the file's text into chunks and processes each chunk in a separate process using Python's multiprocessing module.
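
The chunk-and-search idea described above can be sketched with the standard library alone. This is a minimal illustration and not the package's actual implementation: the function names `chunk_lines`, `search_chunk`, and `search_lines` are hypothetical, line numbers are zero-based, and plain `str.find` stands in for the matching algorithm. Patterns that span a line break are not detected in this sketch.

```python
from multiprocessing import Pool


def chunk_lines(lines, num_lines_per_chunk):
    """Group a list of lines into consecutive chunks of at most num_lines_per_chunk lines."""
    return [
        lines[start:start + num_lines_per_chunk]
        for start in range(0, len(lines), num_lines_per_chunk)
    ]


def search_chunk(task):
    """Return (pattern, line_number, column) for every occurrence in one chunk."""
    first_line_number, chunk, patterns = task
    found = []
    for offset, line in enumerate(chunk):
        for pattern in patterns:
            column = line.find(pattern)
            while column != -1:
                found.append((pattern, first_line_number + offset, column))
                column = line.find(pattern, column + 1)
    return found


def search_lines(lines, patterns, num_lines_per_chunk=1000, processes=None):
    """Search every chunk in a worker process and merge the results in chunk order."""
    chunks = chunk_lines(lines, num_lines_per_chunk)
    # Each task carries the number of its first line so results map back to the file
    tasks = [
        (index * num_lines_per_chunk, chunk, patterns)
        for index, chunk in enumerate(chunks)
    ]
    with Pool(processes=processes) as pool:
        results = pool.map(search_chunk, tasks)
    return [match for chunk_matches in results for match in chunk_matches]


if __name__ == "__main__":
    sample = ["alpha beta", "gamma", "beta alpha beta"]
    print(search_lines(sample, ["alpha", "beta"], num_lines_per_chunk=2))
```

Smaller chunks spread the work across more processes but add more inter-process overhead, which is presumably why the chunk size is exposed as an option.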

installation:

pip install pystringmatcher

usage:

using the python module

python -m pystringmatcher -h

Finding text patterns in input text file

optional arguments:
  -h, --help            show this help message and exit
  -f FILE_PATH, --file FILE_PATH
                        the input file to search the patterns in
  -p PATTERNS, --patterns PATTERNS
                        the pattern\s to search in the file separated by ,
  -n NUM_LINES_PER_CHUNK, --num-lines NUM_LINES_PER_CHUNK
                        the number of lines per chunk of text from the input file

or by using the included console script

stringmatcher -h

In your own program

from pystringmatcher.Algorithms import RabinKarp
from pystringmatcher.Types import TextFile


file_path = "/path/to/file"

try:
    text = TextFile(file_path=file_path)
    algorithm = RabinKarp()
    # Split the file into chunks of 1000 lines each
    chunks = text.divide_into_chunks(num_of_lines_each_chunk=1000)
    patterns = "alpha,beta,charlie,delta,echo,foxtrot".split(",")
    print(f"[X] - Start finding the patterns : {patterns} in the file: {text}")
    matches = text.find_matches(chunks=chunks, patterns=patterns, algorithm=algorithm)

    if matches:
        print("Found matches")
        print(matches)
    else:
        print("No matches were found")
except FileNotFoundError:
    # Use file_path here: `text` is unbound if TextFile() itself raised
    print(f"The file: {file_path} was not found and may not exist")

Implementing your own matching algorithm

from pystringmatcher.Algorithms import Algorithm
from pystringmatcher.Types import Match


class MyAlgorithm(Algorithm):
    
    def preprocess(self, pattern, text, *args, **kwargs):
        """some preprocess logic goes here if needed"""
    
    def run(self, pattern, text, *args, **kwargs):
        matches = []
        # The matching algorithm logic goes here.
        # For every match: matches.append(Match(char_offset=start_index_of_match))
        return matches
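
As an illustration of the kind of logic a `run` method might contain, here is a minimal, standalone sketch of the Rabin-Karp rolling-hash technique. It is not the library's `RabinKarp` implementation; the function name `rabin_karp_offsets` and the `base` and `modulus` defaults are assumptions made for this example. It returns plain integer offsets, which a custom `run` could wrap as `Match(char_offset=offset)`.

```python
def rabin_karp_offsets(pattern, text, base=256, modulus=1_000_000_007):
    """Return the start index of every occurrence of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []

    # Weight of the leading character in a window: base**(m - 1) mod modulus
    high_order = pow(base, m - 1, modulus)

    # Hash the pattern and the first window of the text
    pattern_hash = 0
    window_hash = 0
    for i in range(m):
        pattern_hash = (pattern_hash * base + ord(pattern[i])) % modulus
        window_hash = (window_hash * base + ord(text[i])) % modulus

    offsets = []
    for start in range(n - m + 1):
        # Compare the actual characters only when the hashes agree
        if window_hash == pattern_hash and text[start:start + m] == pattern:
            offsets.append(start)
        if start < n - m:
            # Roll the window: drop text[start], append text[start + m]
            window_hash = (window_hash - ord(text[start]) * high_order) % modulus
            window_hash = (window_hash * base + ord(text[start + m])) % modulus
    return offsets
```

The explicit substring comparison guards against hash collisions, so the result is exact even when two different windows happen to share a hash value.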

aviadtamir/pystringmatcher
