CTGenerator
Solves the Contingency Table Problem described in Qian et al. CIKM 2014. Implements the solution in that paper, which uses the Fast Moebius Transform.
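The core idea behind computing counts for a relationship that is false can be illustrated with a small subtraction: the count for "relationship false" equals the count with the relationship left unspecified minus the count for "relationship true". The following is a minimal sketch of that idea for a single relationship, not the repository's implementation. The toy data and the names `ct_star`, `ct_true`, and `ct_false` are hypothetical and were chosen only to echo the `_star` and `_false` table suffixes mentioned further down.

```python
from collections import Counter

# Hypothetical toy data: three students, two courses, one student attribute.
students = {"s1": "hi", "s2": "hi", "s3": "lo"}   # student -> intelligence
courses = ["c1", "c2"]
registered = {("s1", "c1"), ("s1", "c2"), ("s3", "c1")}  # pairs where R is true

# Relationship unspecified: every (student, course) pair counts once.
ct_star = Counter()
for student, intelligence in students.items():
    ct_star[intelligence] += len(courses)

# Relationship true: only the pairs that actually occur in the data.
ct_true = Counter(students[student] for student, _ in registered)

# Relationship false: obtained by subtraction, so the (possibly huge) set of
# unrelated pairs never has to be enumerated.
ct_false = {value: ct_star[value] - ct_true[value] for value in ct_star}

print(ct_false)
```

The subtraction only touches the two count tables, which is why the approach scales better than materialising every non-existing link.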
The procedure is passed connection objects for different databases.
- `con_std` connects to a `data_db` database with the original data (e.g. `unielwin_std`). [This should be renamed `con_data`.]
- `con_setup` is a database connection that connects to a metadata database `setup_db` (e.g. `unielwin_std_setup`). The metadata comprise first-order random variables called functor nodes (e.g. 1Nodes, RNodes, FNodes). Optional arguments:
  - `FunctorSet`: a table in `setup_db`. Restricts the computation to the functor nodes listed in `FunctorSet`. Default setting: contains all functor nodes.
  - `Groundings`: a table in `setup_db`. Contains population variables (e.g. Student). The contingency tables are expanded with entity IDs (e.g. student-id), so that the computation returns counts for individuals. Default setting: empty.
- `con_bn` connects to a `bn_db` database that contains metadata for learning (e.g. the lattice of relationship chains).
- `con_ct` connects to a `ct_db` with the contingency tables that are constructed by a dynamic programming algorithm. [`db_db` and `ct_db` should be merged.]
- After running CTGenerator, `ct_db` contains the contingency table for the first-order random variables listed in `setup_db.FunctorSet` and the data listed in `data_db`. If `setup_db.Groundings` contains first-order population variables, then the contingency table lists counts for each tuple of population members.
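To make the effect of `Groundings` concrete, here is a minimal sketch that assumes a single population variable (Student) and two hypothetical attributes. The row data and the variable names are invented for illustration; the real tables live in `ct_db` and are built in SQL.

```python
from collections import Counter

# Hypothetical rows: (student_id, intelligence, grade).
rows = [
    ("s1", "hi", "A"),
    ("s2", "hi", "A"),
    ("s2", "hi", "B"),
    ("s3", "lo", "B"),
]

# Without groundings: one count per combination of attribute values.
ct = Counter((intelligence, grade) for _, intelligence, grade in rows)

# With Student listed as a grounding: the table is expanded with the entity ID,
# so each count refers to one individual.
ct_grounded = Counter(rows)

for key, count in sorted(ct.items()):
    print(key, count)
for key, count in sorted(ct_grounded.items()):
    print(key, count)
```

Both tables summarise the same four rows; the grounded one simply splits the `("hi", "A")` count of 2 across students `s1` and `s2`.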
Assumes the following steps have been taken:
- The script `transfer.sql` has been run, transferring metadata from `setup_db` to `bn_db`.
- The relationship chain lattice has been generated in `bn_db`.
- More metadata has been generated using `metadata_2.sql` or a variant, depending on which option was chosen (link analysis on or off).
Then does the following:
- Builds contingency tables for each population variable (`BuildCT_Pvars`).
- For each relationship chain length, builds contingency tables for that length (`BuildCT_Rnodes_join`).
- For each length less than `maxNumberofMembers`, it calls `BuildCT_Rnodes_counts(len)`. (NOTE: if Link Correlation is set to 0, `BuildCT_Rnodes_counts2(len)` is called.)
- If link correlation is not set, the procedure uses simple table joins. If it is set, it performs a virtual join using the Moebius Transform and calls
  - for length 1
    - a. `BuildCT_Rnodes_flat(len)` for building the `_flat` tables
    - b. `BuildCT_Rnodes_star(len)` for building the `_star` tables
    - c. `BuildCT_Rnodes_CT(len)` for building the `_false` tables first and then the `_CT` tables
  - for length 2 or more
    - d. `BuildCT_RChain_flat(len)`
      - SQL Queries
        - `select name as RChain from lattice_set where lattice_set.length = " + len + ";`
        - `SELECT distinct parent, removed as rnid FROM lattice_rel where child = '"+rchain+"' order by rnid ASC;`
        - `SELECT DISTINCT Entries FROM ADT_RChain_Star_Select_List WHERE rchain = '" + rchain + "' and '"+rnid+"' = rnid;`
        - `SELECT DISTINCT Entries FROM ADT_RChain_Star_Where_List WHERE rchain = '" + rchain + "' and '"+rnid+"' = rnid;`
      - Table creation for star tables
      - Table creation for flat tables
      - Table creation for false tables
- Deletes the rows where `MULT` = 0 in the table named after the biggest RChain (the most important CT table).
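The lattice lookups listed under "SQL Queries" are built by string concatenation. As a hedged alternative, the sketch below issues two of them with bound parameters and then performs the final `MULT = 0` clean-up. It uses Python's built-in `sqlite3` with an in-memory database purely as a stand-in for the real database connections. The table contents, the column types, the helper names `rchains_of_length` and `parents_of`, and the table name `ct` are all assumptions made for illustration, and the placeholder syntax (`?` here) differs between database drivers.

```python
import sqlite3

# In-memory stand-in for the metadata database; schema and rows are hypothetical.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE lattice_set (name TEXT, length INTEGER);
    CREATE TABLE lattice_rel (parent TEXT, child TEXT, removed TEXT);
    INSERT INTO lattice_set VALUES ('a', 1), ('b', 1), ('a,b', 2);
    INSERT INTO lattice_rel VALUES ('a', 'a,b', 'b'), ('b', 'a,b', 'a');
    CREATE TABLE ct (MULT INTEGER, grade TEXT);
    INSERT INTO ct VALUES (3, 'A'), (0, 'B');
""")


def rchains_of_length(con, length):
    """Names of the relationship chains with the given length."""
    cur = con.execute(
        "SELECT name AS RChain FROM lattice_set WHERE length = ?", (length,)
    )
    return [row[0] for row in cur]


def parents_of(con, rchain):
    """(parent, rnid) pairs for one chain, ordered by the removed node."""
    cur = con.execute(
        "SELECT DISTINCT parent, removed AS rnid FROM lattice_rel "
        "WHERE child = ? ORDER BY rnid ASC",
        (rchain,),
    )
    return cur.fetchall()


print(rchains_of_length(con, 2))   # ['a,b']
print(parents_of(con, "a,b"))      # [('b', 'a'), ('a', 'b')]

# Final step: drop combinations that never occur in the data.
con.execute("DELETE FROM ct WHERE MULT = 0")
remaining = con.execute("SELECT MULT, grade FROM ct").fetchall()
print(remaining)                   # [(3, 'A')]
```

Binding values instead of concatenating them avoids quoting problems when a chain name contains commas, quotes, or backticks.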
- Make this a self-contained repository.
- Add screenshots
- Add a gallery of examples