open-source-search-engine

An open source web search engine and spider/crawler. This was once the codebase for a search engine called Gigablast, but the site is no longer operational. This is a fork of the original codebase located at https://github.com/gigablast/open-source-search-engine

Quick Start

To experiment, you can launch it quickly with Docker by running:

docker run -p 8000:8000 -it --rm moldybits/open-source-search-engine

If you wish to preserve data between runs, mount a host directory over the container's data directory:

docker run -p 8000:8000 -it --rm -v $(pwd)/data:/var/gigablast/data0 moldybits/open-source-search-engine
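The persistent-data command above can be wrapped in a small helper. This is only a sketch: `run_gb`, `GB_DATA_DIR`, `GB_PORT` and `DOCKER` are names made up for this example and are not part of the project. It creates the host directory first, since Docker typically creates a missing bind-mount source itself, owned by root.

```shell
# Sketch only: run_gb, GB_DATA_DIR, GB_PORT and DOCKER are hypothetical names.
run_gb() {
    data_dir="${GB_DATA_DIR:-$(pwd)/data}"   # host directory that keeps the index data
    port="${GB_PORT:-8000}"                  # host port mapped to the container's 8000
    # Create the host directory up front so it is owned by the current user.
    mkdir -p "$data_dir" || return 1
    # Same flags as the command above; only the host-side port and path vary.
    "${DOCKER:-docker}" run -p "${port}:8000" -it --rm \
        -v "${data_dir}:/var/gigablast/data0" \
        moldybits/open-source-search-engine
}
```

Calling `run_gb` with no variables set should behave like the command above. Setting `DOCKER=echo` prints the arguments instead of starting a container, which is a cheap way to check them first.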

Major changes in this fork

Cleanup: moved the sources that are actually used into the src directory. Everything else has been moved into the junkdrawer directory.
More cleanup: formatting, removing large amounts of commented-out code, and fixing some segfaults. This is ongoing.
I have replaced the original Makefile with CMake. The build now installs the required files into the build directory, so you can execute ./gb there and run a test server without cluttering your source directory.
Stubbed out some test scaffolding so that tests can be built if the code ever gets cleaned up enough to start making "real" changes.

Building

This does not build on ARM and does not work correctly on modern versions of macOS, although there appears to have been support at one time.
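A quick pre-flight check can catch an unsupported platform before spending time on a build. The following is a rough sketch, not something the project ships: `gb_platform_ok` is a hypothetical helper, and it is coarser than the statement above, since it rejects every macOS version rather than only modern ones.

```shell
# Sketch only: gb_platform_ok is a hypothetical helper, not part of the project.
# Arguments default to the values reported by uname on the current machine.
gb_platform_ok() {
    os="${1:-$(uname -s)}"
    arch="${2:-$(uname -m)}"
    # ARM machines usually report arm*, arm64 or aarch64.
    case "$arch" in
        arm*|aarch64*) echo "unsupported: ARM ($arch)"; return 1 ;;
    esac
    # macOS reports its kernel name as Darwin.
    case "$os" in
        Darwin) echo "unsupported: macOS"; return 1 ;;
    esac
    echo "ok: $os $arch"
}
```

For example, `gb_platform_ok && cmake -Bbuild` would only configure the build when the check passes.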

Install Catch2

git clone https://github.com/catchorg/Catch2.git
cd Catch2
cmake -Bbuild -H. -DBUILD_TESTING=OFF
sudo cmake --build build/ --target install

Debian or Ubuntu

sudo apt-get install make g++ libssl-dev libz-dev cmake

RedHat or AlmaLinux

Last tested on AlmaLinux 9.

sudo yum install gcc-c++ openssl-devel libz-devel cmake

Build

cd open-source-search-engine
cmake -Bbuild
cmake --build build/
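The two build steps can be combined into one helper that also builds in parallel. This is a sketch under assumptions: `build_gb`, `GB_JOBS` and `CMAKE` are hypothetical names introduced here, and the `-S` and `--parallel` options need a reasonably recent CMake (3.13 or later).

```shell
# Sketch only: build_gb, GB_JOBS and CMAKE are hypothetical names.
build_gb() {
    src_dir="${1:-.}"                 # path to the open-source-search-engine checkout
    build_dir="${src_dir}/build"      # out-of-source build directory, as above
    # Configure step, equivalent to running "cmake -Bbuild" inside the checkout.
    "${CMAKE:-cmake}" -S "$src_dir" -B "$build_dir" || return 1
    # Build step, with the number of parallel jobs taken from GB_JOBS (default 2).
    "${CMAKE:-cmake}" --build "$build_dir" --parallel "${GB_JOBS:-2}"
}
```

Running `build_gb open-source-search-engine` from the parent directory should leave the results in open-source-search-engine/build, where ./gb can then be started as described under the fork's changes.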

Issues & Pull Requests

Issues and pull requests should be filed at https://github.com/twistdroach/open-source-search-engine

Testing

Tests can be put in the tests directory. I have written a few simple examples just to make sure it (mostly) works.

Documentation

Various documents are located in the html directory. The FAQ and developer.html are particularly interesting.
