-
Notifications
You must be signed in to change notification settings - Fork 0
/
intro.qmd
68 lines (35 loc) · 7.6 KB
/
intro.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
# Introduction
## A brief outline
The main goal of knowledge retrieval and knowledge discovery systems is to extract hidden patterns, trends, behaviours, or rules to solve large-impact real-world problems. Usually, these problems present heterogeneous data sources and are at the core of more general decision processes.
A fundamental principle is that the extracted knowledge should provide some understanding of the analyzed data. As computational systems get in critical and sensitive areas such as medicine, the justice system or financial markets, the knowledge becomes much more relevant to make predictions or recommendations or to detect common interest groups or leaders. However, in many cases, the inability of humans to understand the extracted patterns seems problematic.
To represent and retrieve knowledge from datasets, it has become more important to use formal methods based on logic tools. The formal representation of knowledge and the use of logic tools are more suitable for providing understandable answers. Therefore, it can help avoid the lack of interpretability and explainability of the results.
In particular, formal concept analysis (FCA) [@wille1982restructuring;@Ganter1999] is a well-founded mathematical tool, based on lattice theory and logic, which can retrieve and store the knowledge in the form of concepts (analogous to closed itemsets in transactional databases) and implications (association rules with confidence 1). From this perspective, FCA constitutes a framework that complements and extends the study of exact and approximate association rules.
The origins of FCA were devoted to the study of binary datasets (formal contexts) where variables are called attributes. A relevant extension of FCA uses fuzzy sets (see [@BeVyAddgI;@BeVyAddgII]) to model real-world problems since datasets may contain imprecise, graded or vague information that is not adequately represented as binary values. The fuzzy extension can also model problems with numerical and categorical attributes since these can be scaled to a truth value describing the degree of fulfilment of the attribute.
Some authors have considered the use of FCA in machine learning. @Kuznetsov04 relates FCA to some mathematical models of machine learning. @Ignatov2015 summarizes the main topics in machine learning and data mining where FCA has been applied: frequent itemset mining and association rules to make classification and clustering. A closer approach appears in [@Trabelsi17], where a method for supervised classification based on FCA is used. The authors extract rules based on concepts generated previously from data. These rules are used to compute the closure of a set of attributes to obtain a classification rule.
From a dataset, FCA can establish maximal clusters, named concepts, between objects and attributes. Each cluster consists of objects having common properties (attributes), which are only fulfilled for these objects. The hierarchy between the concepts and relationships between the attributes (rules or implications) are computed with the same computational cost in FCA.
Among all the techniques used in other areas to extract knowledge, we emphasize using rules for its theoretical and practical interest. The notion of if-then rules, with different names, appears in several areas (databases, machine learning, data mining, formal concept analysis) as a relationship between attributes, called items, properties or atomic symbols regarding the domain. Nowadays, the number of rules extracted even from medium-sized datasets is enormous in all these areas. Therefore, the intelligent manipulation of these rules to reason with them is a hot topic to be explored.
In this direction, @Cordero2002 introduced a logic, named simplification logic for functional dependencies ($SL_{FD}$), firmly based on a simplification rule, which allows us to narrow the functional dependency set by removing redundant attributes. Although the semantic of implications or if-then rules in other areas are different, the logic can be used too.
Using directly $SL_{FD}$, some automated deduction
methods directly based on this inference system have been developed for classical systems and fuzzy systems [@mora_efficient_2004;@Cordero2012;@Mora2012;@cla2014;@Lorenzo2015].
In the fuzzy framework, several approaches to the definition of fuzzy implications (functional dependencies, rules) are proposed in the literature (see [@JEZKOVA2017] for a survey). Our work considers that the definition of graded implication proposed in [@BeVyAddgI; @BeVyAddgII] generalizes all the previous definitions. Furthermore, for this general definition of graded implications, an axiomatic system named FASL (fuzzy attribute simplification logic [@belohlavek2016automated]) was developed, becoming a helpful reasoning tool. Note that FASL is a generalization of $SL_{FD}$ to the fuzzy framework.
The core of our proposal is to provide a user-friendly computational interface to the principal operators and methods of fuzzy FCA, including the mentioned logic tools. This interface is easy to extend to new functionalities and incorporate new methods, such as minimal generators or the computation of different implication bases quickly. The operators and methods implemented are designed to work in the general fuzzy setting, but they are also applicable in the classical binary case.
Thus, the focus of our proposal, the `fcaR` package, is to provide easy access to formal methods to extract all the implicit knowledge in a dataset in the form of concepts and implications, working natively with fuzzy sets and fuzzy implications. Our goal is to provide a unified computational framework for the theoretically-oriented FCA users to develop, test, and compare new methods and knowledge extraction strategies.
Other objectives of this work include presenting an FCA-based tool for knowledge discovery accessible to other scientific communities, allowing for the development of new packages in other fields using FCA techniques, especially in the construction of recommendation systems.
The work is organized as follows: Section \nameref{definitions} begins with a brief look of FCA and Simplification Logic. Section \nameref{related} presents other software libraries that implement FCA or related paradigms' core notions. In Section \nameref{design}, an explanation of the
data structures, classes and constructor methods is covered. Section \nameref{fcaR} shows how
to use the package, describing the implemented FCA methods and the use of the simplification logic. In Section \nameref{examples}, a real application of the package in developing a recommender system is illustrated. Finally, some conclusions and future works are presented in Section \nameref{conclusions}.
## History of FCA
---
Mejor no usar "quotes"
---
> > Port-Royal logic (traditional logic): formal notion of concept, Arnauld A., Nicole P.: La logique ou l'art de penser, 1662 (Logic Or The Art Of Thinking, CUP, 2003): concept = extent (objects) + intent (attributes)
> > G. Birkhoff (1940s): work on lattices and related mathematical structures, emphasizes applicational aspects of lattices in data analysis.
> > Barbut M., Monjardet B.: Ordre et classiffication, algebre et combinatoire. Hachette, Paris, 1970.
> > Wille R.: Restructuring lattice theory: an approach based on hierarchies of concepts. In: I. Rival (Ed.): Ordered Sets. Reidel, Dordrecht, 1982, pp. 445-470.
> > **Ganter B., Wille R.: Formal Concept Analysis. Springer, 1999.**
## Application of FCA
- knowledge extraction
- clustering and classification
- machine learning
- concepts, ontologies
- rules, association rules, attribute implications