This demo emulates a fraud detection system for credit card transactions. The system works with three data sets that originate from three different sources.
Credit card account data is loaded from CSV files located in HDFS using the GridGain Spark Loader.
Every credit card has a country code associated with it to indicate where the card was issued.
Transactions are continuously streamed from Kafka via GridGain Kafka Connector.
Every transaction record holds a country code to indicate where this transaction occurred.
In addition, credit card users can use a web interface to notify the system about anticipated travel by providing a list of country codes.
This data is stored only in GridGain.
An incoming transaction is considered fraudulent if either of these conditions is true:
- The transaction is associated with a non-existent account (e.g., the provided credit card number is invalid).
- The transaction occurred outside the cardholder's home country and not in any of the countries specified for travel.
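The two rules above can be sketched as a small standalone check. This is only an illustration: the class, method, and parameter names below are hypothetical and not taken from the demo's code (the demo performs the equivalent checks with SQL, as shown at the end of this document); only the `NO_ACCOUNT` and `WRONG_COUNTRY` statuses mirror those used by the demo's queries.

```java
import java.util.Set;

// Hypothetical illustration of the fraud rules; names are not the demo's own.
public class FraudRules {

    /** Outcome of checking one transaction (statuses mirror the demo's SQL). */
    public enum Status { OK, NO_ACCOUNT, WRONG_COUNTRY }

    /**
     * @param homeCountry     country code the card was issued in, or null when
     *                        the card number matched no known account
     * @param travelCountries country codes the user announced travel to
     * @param txCountry       country code where the transaction occurred
     */
    public static Status check(String homeCountry, Set<String> travelCountries, String txCountry) {
        if (homeCountry == null)
            return Status.NO_ACCOUNT;    // rule 1: non-existent account
        if (!txCountry.equals(homeCountry) && !travelCountries.contains(txCountry))
            return Status.WRONG_COUNTRY; // rule 2: neither home nor a travel country
        return Status.OK;
    }
}
```

A transaction in the home country or in any announced travel country passes; everything else is flagged.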
- Download GridGain 8.7.23 Enterprise Edition ZIP from here: https://www.gridgain.com/resources/download
- Unzip the downloaded file to a preferred location (`$GRIDGAIN_HOME`).
- Prepare the Kafka Connector package:

  ```shell
  cd $GRIDGAIN_HOME/integration/gridgain-kafka-connect
  ./copy-dependencies.sh
  ```
- Download Kafka 2.4.1 from here: https://kafka.apache.org/downloads
- Unzip the downloaded file to a preferred location (`$KAFKA_HOME`).
- Open the `$KAFKA_HOME/config/connect-standalone.properties` file for editing and add the following line (replace `$GRIDGAIN_HOME` with the actual path to the GridGain installation):

  ```properties
  plugin.path=$GRIDGAIN_HOME/integration/gridgain-kafka-connect
  ```
- Create the `$KAFKA_HOME/config/gridgain-kafka-connect-sink.properties` file with the following content:

  ```properties
  name=gridgain-kafka-connect-sink
  topics=ignite.TRANSACTIONS
  topicPrefix=ignite.
  connector.class=org.gridgain.kafka.sink.IgniteSinkConnector
  igniteCfg=META-INF/ignite-client-config.xml
  shallProcessUpdates=true
  key.converter=org.apache.kafka.connect.json.JsonConverter
  value.converter=org.apache.kafka.connect.json.JsonConverter
  ```
- Start the ZooKeeper server:

  ```shell
  $KAFKA_HOME/bin/zookeeper-server-start.sh $KAFKA_HOME/config/zookeeper.properties
  ```
- Start the Kafka server:

  ```shell
  $KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties
  ```
- Create the Kafka topic:

  ```shell
  $KAFKA_HOME/bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic ignite.TRANSACTIONS
  ```
- Download Hadoop 2.10.0 from here: https://hadoop.apache.org/releases.html
- Unzip the downloaded file to a preferred location (`$HADOOP_HOME`).
- Update the `$HADOOP_HOME/etc/hadoop/core-site.xml` file with the following content:

  ```xml
  <configuration>
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://localhost:9000</value>
    </property>
  </configuration>
  ```
- Update the `$HADOOP_HOME/etc/hadoop/hdfs-site.xml` file with the following content:

  ```xml
  <configuration>
    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>
  </configuration>
  ```
- Format HDFS:

  ```shell
  $HADOOP_HOME/bin/hdfs namenode -format
  ```
- Start HDFS:

  ```shell
  $HADOOP_HOME/sbin/start-dfs.sh
  ```
- Copy the `data/accounts.csv` file located in this project to HDFS:

  ```shell
  $HADOOP_HOME/bin/hdfs dfs -put $DIH_DEMO/data/accounts.csv /
  ```
- Verify:

  ```shell
  $HADOOP_HOME/bin/hdfs dfs -ls /
  $HADOOP_HOME/bin/hdfs dfs -cat /accounts.csv
  ```
- Build the project (replace `$GRIDGAIN_HOME` with the actual path to the GridGain installation):

  ```shell
  mvn clean package -Dgridgain.home=$GRIDGAIN_HOME
  ```
- Open the project in your favorite IDE.
- Start the GridGain Kafka connector:

  ```shell
  $KAFKA_HOME/bin/connect-standalone.sh $KAFKA_HOME/config/connect-standalone.properties $KAFKA_HOME/config/gridgain-kafka-connect-sink.properties
  ```
- Run the `IgniteServer` class in your IDE to start an Ignite server node.
- Run the `AccountsLoader` class in your IDE to load the accounts data from HDFS.
- Run the `TransactionsProducer` class in your IDE to start streaming transactions to Kafka.
- Run the `AccountsWebApp` class in your IDE. You can now edit travel countries for all the accounts via the web interface: http://localhost:8080/accounts
- Run the `FraudChecker` class in your IDE. It connects to the Ignite cluster and periodically executes SQL queries to check for fraudulent transactions; any that are found are printed out. Here are the queries that are used for this:
  ```sql
  -- Transactions associated with a non-existent account
  SELECT id, ccNumber, amount
  FROM Transaction
  WHERE status = 'NO_ACCOUNT';

  -- Transactions that occurred in an unexpected country
  SELECT a.ccNumber, CONCAT(a.firstName, ' ', a.lastName) AS name, a.issueCountry, t.country, t.status
  FROM Account a, Transaction t
  WHERE a.ccNumber = t.ccNumber
    AND t.status = 'WRONG_COUNTRY';
  ```