MietGraph creates a knowledge graph about rental topics. It is a fork of BauGraph.
Creating the graph takes two steps: first crawl a few websites, then build the graph from the crawled data.
- Clone the repository
git clone https://git.informatik.uni-leipzig.de/information_extraction_group/bauprofessor_crawler.git
- Change into the repository folder and install the necessary requirements with
pip install -r requirements.txt
- Change into the directory
cd MietGraph/crawling
and run the crawler with ./run.sh [output_folder]
- Change to MietGraph/graphing, move the output from step 3 to a folder for input and run
python MietGraph.py --input=[input_folder] --output=MietGraph.graphml
to generate the graph (run python MietGraph.py -h for more options).
- ???
- Profit
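
The crawling and graphing steps above can also be driven from a single Python script. The following is a minimal sketch, assuming the directory layout and command-line flags shown in the steps. The helper names `build_commands` and `run_pipeline` and the folder name `crawl_output` are made up for illustration and are not part of the repository. It also assumes that the crawler output can be passed directly as the graphing input, which may not hold if the output needs to be moved or cleaned first.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(repo_root, crawl_output, graph_file="MietGraph.graphml"):
    """Return (command, working_directory) pairs for the crawl and graph steps."""
    repo_root = Path(repo_root)
    # Use an absolute path because each step runs in a different directory.
    crawl_output = Path(crawl_output).resolve()
    crawl = (
        ["./run.sh", str(crawl_output)],
        repo_root / "MietGraph" / "crawling",
    )
    graph = (
        [
            sys.executable,  # assumption: same interpreter that has the requirements installed
            "MietGraph.py",
            f"--input={crawl_output}",
            f"--output={graph_file}",
        ],
        repo_root / "MietGraph" / "graphing",
    )
    return [crawl, graph]


def run_pipeline(repo_root, crawl_output):
    """Run both steps in order and stop at the first failure."""
    for command, cwd in build_commands(repo_root, crawl_output):
        subprocess.run(command, cwd=cwd, check=True)


# Hypothetical usage from the repository's base directory:
# run_pipeline(".", "crawl_output")
```

With `check=True`, a failing crawl raises `subprocess.CalledProcessError` before the graphing step starts, so a graph is never built from partial data.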
The output formats ".gml", ".gexf" and ".graphml" can all be opened with Gephi.
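
If you hold a networkx graph in Python and want to export it yourself, the file extension can select the writer. This is a sketch only: how MietGraph.py picks the format internally is not shown here, and `writer_name_for` and `save_graph` are hypothetical helpers. The three writer names are the standard networkx functions for these formats.

```python
from pathlib import Path

# networkx writer functions for the three formats mentioned above.
WRITERS = {
    ".gml": "write_gml",
    ".gexf": "write_gexf",
    ".graphml": "write_graphml",
}


def writer_name_for(path):
    """Map an output file name to the matching networkx writer name."""
    suffix = Path(path).suffix.lower()
    if suffix not in WRITERS:
        raise ValueError(f"unsupported output format: {suffix!r}")
    return WRITERS[suffix]


def save_graph(graph, path):
    """Write a networkx graph in the format implied by the file extension."""
    import networkx as nx  # imported lazily so writer_name_for works without networkx

    getattr(nx, writer_name_for(path))(graph, path)


# Hypothetical usage with a graph built as shown further down:
# save_graph(g.graph, "MietGraph.gexf")
```

Depending on the node and edge attributes, one writer may reject data that another accepts, so it can be worth trying a second format if an export fails.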
You should also be able to install the package with pip install .
from the repository's base directory.
After that, you can import the MietGraph class and build the graph in a Python environment.
>>> from MietGraph import MietGraph
>>> g = MietGraph("input/")
>>> g.add_keyword_in_text()
>>> g.add_crosslinks()
>>> g.graph
<networkx.classes.digraph.DiGraph at 0x7f38243927c0>
You can also get an igraph representation with
g.igraph()
You can use this to manipulate the graph further. More information is coming soon.
KnowledgeBase
import MietGraph
kb = MietGraph.KnowledgeBase("name of your knowledge base file")
# See a list of contained documents
docs = kb.list_documents()
# Get all neighbours of a document with all attributes
kb.get_neighbors(docs[0], True)
# See the content of the document
kb.get_attr(docs[0], "content")
# Show document names of similar documents
kb.most_similar(docs[0])
# You can also add new PDFs to the knowledge base
kb.add("path_to_new_pdf.pdf")
# And save the modified graph
kb.save("save.graphml")
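
To sanity-check a saved .graphml file without loading MietGraph or networkx, you can count its nodes and edges with the standard library, since GraphML is plain XML. This is a minimal sketch: the function name `count_nodes_and_edges` and the sample document are illustrative, and it assumes the file uses the standard GraphML namespace with a single top-level graph element.

```python
import xml.etree.ElementTree as ET

# Standard GraphML namespace.
GRAPHML_NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# Tiny hand-written GraphML document used only for illustration.
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph edgedefault="directed">
    <node id="a"/>
    <node id="b"/>
    <edge source="a" target="b"/>
  </graph>
</graphml>"""


def count_nodes_and_edges(root):
    """Return (node_count, edge_count) for a parsed GraphML root element."""
    nodes = root.findall("g:graph/g:node", GRAPHML_NS)
    edges = root.findall("g:graph/g:edge", GRAPHML_NS)
    return len(nodes), len(edges)


print(count_nodes_and_edges(ET.fromstring(SAMPLE)))  # (2, 1)

# For a file on disk, parse it first:
# root = ET.parse("save.graphml").getroot()
# print(count_nodes_and_edges(root))
```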