GitHub - ZeeshanAyub01/C_String_Search: String search algorithms

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
A3.h		A3.h
Makefile		Makefile
P11.c		P11.c
P12.c		P12.c
P21.c		P21.c
P22.c		P22.c
P23.c		P23.c
data_4.txt		data_4.txt
data_5.txt		data_5.txt
main.c		main.c
readMe		readMe

Repository files navigation

Student Name: Muhammad Zeeshan Ayub,
Student Id: 0994442,

Oveview of the program:
This program contains various string-search algorithms coded in C. The .c files are named according to the question numbers in the assignment, but their contents are bulleted below:
=> P11.c: The brute-force search method to find all the anagrams of a string,
=> P12.c: Finding all the anagrams of a string after pre-sorting,
=> P21.c: The brute-force string-search method to find all the instances of a string in a given text,
=> P21.c: The Horspool algorithm to find all the instances of a string in a given text,
=> P21.c: The Boyer's-Moore algorithm to find all the instances of a string in a given text
=> main.c: The main function, and verifyChoice() to return a corresponding number based on user's choice of the algorithm tey wish to view.

How compile and run the program:

To compile the program, type 'make' on the console. That will compile the program, and produce an executable file called 'run'.
To run type './run' on the console.When the program runs, it will ask the user what program they want to run, and the user can
select the program by entering the corresponding number. If the the user enters a wrong number they will be prompted to make
a valid entry again, and but if they make a correct entry, the program will run the relevant functions, display the output and
then exit.

*NOTE: THE NAMES OF THE FILES HAVE BEEN HARDCODED IN THE PROGRAM, because different methods were used to read them. 'data_4.txt'
(used for problems 11 and 12) is read in word by word, with each word stored in an array; whereas 'data_5.txt' (used for problems
21, 22 and 23) was read into a single string. If the user wishes to use different input files, their names can be changed in main.c
where the data files have been written.

Incase of any questions please contact the author of the code at: [email protected].
If the files are not included in the same folder, that will result in an error.

Analysis and comparison for Horspool's and Boyer-Moore's:

The two algorithms resemble in their method of operation, but Horspool is simpler than Boyer-Moore, because it
simply shifts based on the last character, whereas the Boyer-Moore algorithm shifts based on the first mismatched
character, and the repetition of the matched suffix in the pattern. In carrying out the testing I found that both
algorithms were quite reliable and efficient. However, a closer look at the two algorithms would reveal that the
Boyer-Moore usually results in more comparison operations taking place, so it takes more time. However, Boyer-Moore
has the effect of fewer total shift operations taking place, because it takes into consideration the repetition of the
suffix in the pattern. We tried out the algorithms using the following input patterns:

Amendment
Chapter
Calendar
diplomas
Universities
Ministry
Algonquin
Points
Recommendations

We obtained the following results:

Average shifts per time (in milliseconds) ratio for Horspool: 4.99,
Average shifts per time (in millisecods) ratio for Boyer-Moore: 4.94.

Although, the results for the two algortihms are not very far apart, we can observe that the Horspool algorithm is
slightly more efficient than Boyer-Moore. That is because the total number of operations is usually fewer. Both algorithms
can be considered to be O(nlogn).