Commit
Merge pull request #29 from am9zZWY/josef-crawler-tokenizer-update
Update crawler and tokenizer
Showing 10 changed files with 311 additions and 174 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,41 +1,41 @@ | ||
tübingen | ||
tübingen university | ||
tübingen attractions | ||
food and drinks | ||
tübingen weather | ||
tübingen hotels | ||
tübingen traditional food | ||
tübingen coffee shops | ||
tübingen nightlife spots | ||
tübingen museums | ||
tübingen castles | ||
tübingen outdoor activities | ||
tübingen nightlife | ||
tübingen markets | ||
tübingen shopping centers | ||
tübingen local products | ||
Best cafes in Tübingen for students | ||
Upcoming events at the University of Tübingen | ||
History of Tübingen's old town | ||
Popular hiking trails near Tübingen | ||
Tübingen student housing options | ||
Vegan and vegetarian restaurants in Tübingen | ||
Cultural activities in Tübingen | ||
Tübingen public transportation map | ||
University of Tübingen research departments | ||
Tübingen nightlife spots | ||
Bookstores in Tübingen | ||
Tübingen local farmers' markets | ||
Tübingen weather forecast | ||
Student discounts in Tübingen | ||
Tübingen library hours and services | ||
Language exchange programs in Tübingen | ||
Top tourist attractions in Tübingen | ||
Cycling routes in Tübingen | ||
Tübingen sports clubs and gyms | ||
Tübingen local festivals and fairs | ||
Best places to study in Tübingen | ||
Tübingen historical landmarks | ||
Tübingen university application process | ||
Local art galleries in Tübingen | ||
Tübingen second-hand stores | ||
1 tübingen | ||
2 tübingen university | ||
3 tübingen attractions | ||
4 food and drinks | ||
5 tübingen weather | ||
6 tübingen hotels | ||
7 tübingen traditional food | ||
8 tübingen coffee shops | ||
9 tübingen nightlife spots | ||
10 tübingen museums | ||
11 tübingen castles | ||
12 tübingen outdoor activities | ||
13 tübingen nightlife | ||
14 tübingen markets | ||
15 tübingen shopping centers | ||
16 tübingen local products | ||
17 Best cafes in Tübingen for students | ||
18 Upcoming events at the University of Tübingen | ||
19 History of Tübingen's old town | ||
20 Popular hiking trails near Tübingen | ||
21 Tübingen student housing options | ||
22 Vegan and vegetarian restaurants in Tübingen | ||
23 Cultural activities in Tübingen | ||
24 Tübingen public transportation map | ||
25 University of Tübingen research departments | ||
26 Tübingen nightlife spots | ||
27 Bookstores in Tübingen | ||
28 Tübingen local farmers' markets | ||
29 Tübingen weather forecast | ||
30 Student discounts in Tübingen | ||
31 Tübingen library hours and services | ||
32 Language exchange programs in Tübingen | ||
33 Top tourist attractions in Tübingen | ||
34 Cycling routes in Tübingen | ||
35 Tübingen sports clubs and gyms | ||
36 Tübingen local festivals and fairs | ||
37 Best places to study in Tübingen | ||
38 Tübingen historical landmarks | ||
39 Tübingen university application process | ||
40 Local art galleries in Tübingen | ||
41 Tübingen second-hand stores |
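The query list above holds one query per line, and some entries differ only in capitalization (for example `tübingen nightlife spots` and `Tübingen nightlife spots`). The following is a minimal sketch of how such a list could be loaded and de-duplicated case-insensitively. The helper name `load_queries` and the handling of an optional leading line number are assumptions made for illustration; neither appears in the commit.

```python
import re


def load_queries(lines):
    """Return unique queries, preserving first-seen order.

    Hypothetical helper (not part of the commit): strips an optional
    leading line number and treats queries that differ only by case
    as duplicates.
    """
    seen = set()
    queries = []
    for line in lines:
        # Drop surrounding whitespace and an optional "12 " style prefix.
        # Caveat: a query that genuinely starts with a number would lose it.
        text = re.sub(r"^\d+\s+", "", line.strip())
        if not text:
            continue
        key = text.casefold()  # Unicode-aware, case-insensitive key
        if key not in seen:
            seen.add(key)
            queries.append(text)
    return queries


sample = [
    "9 tübingen nightlife spots",
    "13 tübingen nightlife",
    "26 Tübingen nightlife spots",
    "",
]
print(load_queries(sample))
```

Keeping the first-seen spelling makes the result deterministic. Whether case variants should really be merged depends on how the search pipeline normalizes queries, which this page does not show.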
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,8 +1,9 @@ | ||
# file to test the written functions | ||
import logging | ||
|
||
from custom_tokenizer import tokenize_data, tf_idf_vectorize, top_30_words | ||
|
||
CUSTOM_TEXT = "Lorem Ipsum is simply dummy text" + " " + " \n "+ "of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum." | ||
|
||
top_words = top_30_words([CUSTOM_TEXT]) | ||
print(top_words) | ||
logging.info(top_words) |
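The test script above imports `top_30_words` from `custom_tokenizer`, whose implementation is not shown on this page. As a rough illustration of what a "top N words" helper might do, here is a sketch that uses plain term counts from the standard library. The name `top_n_words`, its signature, and the choice of raw frequency instead of TF-IDF are all assumptions and not the repository's code.

```python
import re
from collections import Counter


def top_n_words(texts, n=30):
    """Return the n most frequent lowercase word tokens across texts.

    Hypothetical stand-in for the repository's helper: it ranks by raw
    term count, not by TF-IDF weight.
    """
    counts = Counter()
    for text in texts:
        # \w+ matches runs of letters, digits and underscores (Unicode-aware),
        # so repeated spaces and newlines between words are ignored.
        counts.update(re.findall(r"\w+", text.lower()))
    return counts.most_common(n)


sample = (
    "Lorem Ipsum is simply dummy text \n of the printing industry. "
    "Lorem Ipsum has been the standard dummy text."
)
result = top_n_words([sample], n=30)
print(result)
```

Note that `logging.info(...)` in the script only produces output once the logging level is set to `INFO` or lower, for instance via `logging.basicConfig(level=logging.INFO)`. With the default configuration the message is dropped.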