miguelgondu/latinamerican-philosophy-mining

Latin American Philosophy Mining

Authors: Juan R. Loaiza (URosario / HU Berlin) and Miguel González Duque (ITU Copenhagen)

In this repository we track progress on a research project in which we apply text mining to philosophy journals in Latin America. Our aim is to provide insights into the history of philosophy in Latin America using a data-driven approach.

We are starting with Ideas y Valores (Colombia) and articles from 2009 to 2017. We plan on expanding later to include more years and other journals such as Crítica (Mexico) and Análisis Filosófico (Argentina).

Structure

.
├── data                # Data files (omitted from Git repository for the moment)
│   ├── raw_html        # Raw HTML files directly as scraped with metadata
│   └── clean_json      # Parsed HTML files and metadata in JSON format
├── utils               # Helper utilities
├── notebooks           # Notebooks with preprocessing and analyses
│   └── wordlists       # Stopwords and protected words lists
└── README.md
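
As a rough illustration of how the files in `data/clean_json` might be consumed, the sketch below loads every JSON file in a directory and tallies documents by type and year. The JSON schema is not documented here, so the field names `type` and `year` are assumptions, as are the helper names `load_articles` and `count_by_type_and_year`.

```python
import json
from collections import Counter
from pathlib import Path


def load_articles(json_dir):
    """Load every *.json file in json_dir into a list of dicts."""
    articles = []
    for path in sorted(Path(json_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            articles.append(json.load(handle))
    return articles


def count_by_type_and_year(articles):
    """Count articles per (type, year) pair.

    "type" and "year" are hypothetical field names; adjust them to the
    actual keys used in the parsed JSON files.
    """
    counts = Counter()
    for article in articles:
        key = (article.get("type", "unknown"), article.get("year", "unknown"))
        counts[key] += 1
    return counts
```

Calling `count_by_type_and_year(load_articles("data/clean_json"))` would then give the raw numbers behind plots such as documents by type per year, provided the field names match.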

To-Do

  • Extract view information from main HTML page.
  • Calibrate the number of topics for the LDA model.
    • Implement LDA in gensim and use topic coherence measures to calibrate the number of topics.

Preliminary figures and visualizations

Figure 1. Documents by document type.

Figure 2. Documents by main type per year.

Figure 3. Word cloud of the most mentioned philosophers in the corpus.

Figure 4. Word cloud of the most frequent keywords in the corpus according to article metadata.

Figure 5. Word counts by year.

Using a provisional model

The following plots are proofs of concept only. We are using a temporary LDA model with 10 topics to find out which visualizations work best; the model itself still needs to be fully optimized. The table below lists the 10 most salient words for each topic of this provisional model.

| Topic 0 | Topic 1 | Topic 2 | Topic 3 | Topic 4 | Topic 5 | Topic 6 | Topic 7 | Topic 8 | Topic 9 |
|---|---|---|---|---|---|---|---|---|---|
| lenguaje | kant | religioso | ser | creencia | ser | político | acción | alma | político |
| interpretación | concienciar | religión | cuerpo | mundo | mundo | formar | moral | ser | derecho |
| teoría | ser | ciudad | formar | ser | hegel | vida | ser | platón | moral |
| experiencia | concepto | filosofía | heidegger | teoría | filosofía | ser | accionar | filosofía | ser |
| wittgenstein | objetar | historia | modo | propiedad | dios | filosofía | agente | conocimiento | justicia |
| filosofía | experiencia | siglo | aristóteles | término | bien | nietzsche | personar | sócrates | bien |
| ser | arte | cultura | ente | contener | vida | foucault | desear | hombre | social |
| problema | husserl | tradición | naturaleza | concepto | razón | social | intención | virtud | sociedad |
| autor | trascendental | ciencia | bien | físico | hombre | crítico | bien | bien | teoría |
| filosófico | modo | obrar | existencia | objeto | pensar | pensamiento | libertar | obrar | razón |

Figure 6. Proportion of articles by topic.

Figure 7. Word counts by topic.
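
How the per-topic proportions in Figure 6 were computed is not stated here. One plausible reading, sketched below, assigns each article to its highest-weight topic and reports the share of articles per topic. The function name `dominant_topic_proportions` and the input format (one list of topic weights per article) are assumptions made for illustration.

```python
from collections import Counter


def dominant_topic_proportions(doc_topic_weights):
    """Share of documents whose highest-weight topic is each topic.

    doc_topic_weights: one sequence of topic weights per document,
    all of the same length. Ties go to the lowest topic index.
    """
    if not doc_topic_weights:
        return {}
    dominant = [
        max(range(len(weights)), key=weights.__getitem__)
        for weights in doc_topic_weights
    ]
    counts = Counter(dominant)
    total = len(dominant)
    return {topic: counts[topic] / total for topic in sorted(counts)}


# Made-up weights for four documents and three topics.
example = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
]
proportions = dominant_topic_proportions(example)
# proportions == {0: 0.5, 1: 0.25, 2: 0.25}
```

An alternative would be to average the topic weights across documents instead of taking the dominant topic, which keeps information about articles that mix several topics.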
