
CTGenerator

Vidhi Jain edited this page Jul 14, 2017 · 5 revisions

Functionality

Solves the Contingency Table Problem described in Qian et al. CIKM 2014. Implements the solution in that paper, which uses the Fast Moebius Transform.
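As a rough illustration of the subtraction idea behind the Moebius transform (a sketch, not the CTGenerator code): counts for entity pairs where a relationship is false are not stored in the data, but they can be derived by subtracting the counts where the relationship is true from the counts where the relationship is left unconstrained. The Python sketch below uses hypothetical toy data: an `intelligence` attribute for students, a `difficulty` attribute for courses, and a set of `registered` pairs. All names are assumptions made for the example.

```python
from collections import Counter
from itertools import product

# Hypothetical toy data (not from the paper or the FactorBase code).
students = {"s1": "hi", "s2": "lo", "s3": "hi"}            # student -> intelligence
courses = {"c1": "easy", "c2": "hard"}                     # course -> difficulty
registered = {("s1", "c1"), ("s2", "c1"), ("s3", "c2")}    # pairs where the relationship is true


def counts_unconstrained():
    """Counts over all (student, course) pairs, ignoring the relationship."""
    s_counts = Counter(students.values())
    c_counts = Counter(courses.values())
    # Product of per-entity counts: no enumeration of the pairs is needed.
    return {(i, d): s_counts[i] * c_counts[d] for i, d in product(s_counts, c_counts)}


def counts_true():
    """Counts over the pairs where the relationship holds (an ordinary join)."""
    return Counter((students[s], courses[c]) for s, c in registered)


def counts_false():
    """Counts where the relationship is false: unconstrained minus true."""
    unconstrained = counts_unconstrained()
    true = counts_true()
    return {key: unconstrained[key] - true[key] for key in unconstrained}


print(counts_false())
```

In this toy data the combination `("lo", "easy")` ends up with a count of 0, because the only low-intelligence student is registered in the only easy course.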

Input

Required Arguments

The procedure is passed connection objects for different databases:

  • con_std connects to a data_db database with the original data (e.g. unielwin_std). [This should be renamed con_data.]
  • con_setup connects to a metadata database setup_db (e.g. unielwin_std_setup). The metadata comprise first-order random variables called functor nodes (e.g. 1Nodes, RNodes, FNodes). Optional arguments:
    • FunctorSet: a table in setup_db. Restricts the computation to the functor nodes listed in FunctorSet. Default setting: contains all functor nodes.
    • Groundings: a table in setup_db. Contains population variables (e.g. Student). The contingency tables are expanded with entity IDs (e.g. student-id), so that the computation returns counts for individuals. Default setting: empty.
  • con_bn connects to a bn_db database that contains metadata for learning (e.g. the lattice of relationship chains).
  • con_ct connects to a ct_db database with the contingency tables that are constructed by a dynamic programming algorithm. [db_db and ct_db should be merged.]

Output

  • After running CTGenerator, ct_db contains the contingency table for the first-order random variables listed in setup_db.FunctorSet, computed from the data in data_db. If setup_db.Groundings contains first-order population variables, then the contingency table lists counts for each tuple of population members.
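To make the effect of Groundings concrete, here is a small Python sketch (not the CTGenerator implementation). It assumes a hypothetical list of facts, each with a student id, the student's intelligence, and the difficulty of a course the student takes. Without groundings, counts are aggregated over all students; with the Student population variable grounded, the student id becomes part of each row, so the counts are per individual.

```python
from collections import Counter

# Hypothetical facts: (student_id, intelligence, course_difficulty).
facts = [
    ("s1", "hi", "easy"),
    ("s1", "hi", "hard"),
    ("s2", "lo", "easy"),
    ("s3", "hi", "easy"),
]


def contingency_table(rows, ground_student=False):
    """Return {value combination: count}.

    With ground_student=True the student id is kept in the key,
    so each count refers to one individual.
    """
    if ground_student:
        keys = [(sid, intel, diff) for sid, intel, diff in rows]
    else:
        keys = [(intel, diff) for _, intel, diff in rows]
    return dict(Counter(keys))


ungrounded = contingency_table(facts)
grounded = contingency_table(facts, ground_student=True)
print(ungrounded)
print(grounded)
```

The grounded table has more rows with smaller counts; summing its counts over the student id recovers the ungrounded table.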

Program Flow

Assumes the following steps have already been taken:

  1. The script transfer.sql has been run to transfer metadata from setup_db to bn_db.
  2. The relationship chain lattice has been generated in bn_db.
  3. Additional metadata has been generated using metadata_2.sql or a variant, depending on which option was chosen (link analysis on or off).

Then does the following:

  1. Builds contingency tables for each population variable (BuildCT_Pvars).

  2. For each relationship chain length, builds contingency tables for that length (BuildCT_Rnodes_join).

  3. For each length less than maxNumberofMembers, calls BuildCT_Rnodes_counts(len). (NOTE: if Link Correlation is set to 0, BuildCT_Rnodes_counts2(len) is called.)

  4. If link correlation is not set, the procedure uses simple table joins. If it is set, it performs a virtual join using the Moebius transform and calls:

    • for length 1

      a. BuildCT_Rnodes_flat(len) for building the _flat tables

      b. BuildCT_Rnodes_star(len) for building the _star tables

      c. BuildCT_Rnodes_CT(len) for building the _false tables first and then the _CT tables

    • for length 2 or more

      d. BuildCT_RChain_flat(len)

    SQL Queries

      `SELECT name AS RChain FROM lattice_set WHERE lattice_set.length = " + len + ";`

      `SELECT DISTINCT parent, removed AS rnid FROM lattice_rel WHERE child = '" + rchain + "' ORDER BY rnid ASC;`

      `SELECT DISTINCT Entries FROM ADT_RChain_Star_Select_List WHERE rchain = '" + rchain + "' AND '" + rnid + "' = rnid;`

      `SELECT DISTINCT Entries FROM ADT_RChain_Star_Where_List WHERE rchain = '" + rchain + "' AND '" + rnid + "' = rnid;`
    

    Table creation for star tables

    Table creation for flat tables

    Table creation for false tables

  5. Deletes the rows where MULT = 0 from the table named after the biggest RChain (the most important CT table).
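The lattice queries listed under step 4 are assembled by string concatenation. The sketch below shows how the first two could be issued with bound parameters instead, which avoids quoting problems when an RChain name contains special characters. It is only an illustration: Python's `sqlite3` module with an in-memory database stands in for the real database server, the table and column names are taken from the queries above, and the column types, sample rows, and helper function names are assumptions.

```python
import sqlite3


def rchains_of_length(con, length):
    """Names of all relationship chains with the given length."""
    cur = con.execute(
        "SELECT name AS RChain FROM lattice_set WHERE lattice_set.length = ?",
        (length,),
    )
    return [row[0] for row in cur.fetchall()]


def parents_of(con, rchain):
    """(parent, rnid) pairs for a chain, ordered by the removed rnid."""
    cur = con.execute(
        "SELECT DISTINCT parent, removed AS rnid FROM lattice_rel "
        "WHERE child = ? ORDER BY rnid ASC",
        (rchain,),
    )
    return cur.fetchall()


# Hypothetical lattice over two relationship nodes `a` and `b`.
con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE lattice_set (name TEXT, length INTEGER);
    CREATE TABLE lattice_rel (parent TEXT, child TEXT, removed TEXT);
    INSERT INTO lattice_set VALUES ('a', 1), ('b', 1), ('a,b', 2);
    INSERT INTO lattice_rel VALUES ('a', 'a,b', 'b'), ('b', 'a,b', 'a');
    """
)

print(rchains_of_length(con, 2))
print(parents_of(con, "a,b"))
```

The placeholder syntax (`?` here) depends on the database driver, so it would need adjusting for the driver actually used by CTGenerator.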

TODO

  • Make this a self-contained repository.
  • Add screenshots
  • Add a gallery of examples