vtiwari227/CSCI-599-Content-Detection-and-Big-Data-Analysis

CSCI-599-Content-Detection-and-Big-Data-Analysis

This repository contains the assignments we covered in CSCI-599 (Content Detection and Big Data Analytics) under Prof. Chris Mattmann.

The course provides an in-depth overview of various content detection approaches, metadata cataloging, language detection, and machine translation techniques.

Assignment-1 (MIME Diversity in TREC Polar DataSet)

The goal is to learn byte-based fingerprints of the data via Byte Frequency Analysis (BFA), Byte Frequency Distribution (BFD) Correlation, Byte Frequency Cross-Correlation (BFC), and File Header Trailer (FHT) analysis. We implement a set of MIME diversity programs and applications that help in better understanding the unknown types in a rich scientific domain. We compute the BFA, BFC, and FHT of these unknown (and other) Polar data types from the dataset, and build a system that allows visual interaction with and introspection of the MIME diversity in this dataset. Those classifications are intended to improve Tika’s overall ability by suggesting new MIME magic for its database, and to improve techniques for MIME detection in the Big Data present in the TREC-DD-Polar dataset. Read more here.
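As a rough illustration of the byte-based fingerprinting described above, here is a minimal Python sketch. It is not the code from this repository. The function names (`byte_frequency`, `pearson`, `header_trailer`), the peak normalization, and the use of Pearson correlation as the similarity measure are all assumptions made for illustration.

```python
import math


def byte_frequency(data: bytes) -> list[float]:
    """Return a 256-bin byte histogram scaled so the most frequent byte is 1.0.

    Peak normalization is an assumption; other schemes (e.g. dividing by
    the file length) are equally plausible.
    """
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    peak = max(counts)
    if peak == 0:  # empty input: return an all-zero fingerprint
        return [0.0] * 256
    return [c / peak for c in counts]


def pearson(a: list[float], b: list[float]) -> float:
    """Pearson correlation between two equal-length fingerprints.

    Used here as a stand-in similarity measure between a file's
    fingerprint and a reference fingerprint for a MIME type.
    """
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:  # constant fingerprint: correlation undefined
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def header_trailer(data: bytes, n: int = 8) -> tuple[bytes, bytes]:
    """Return the first and last n bytes of a file (the FHT idea)."""
    return data[:n], data[-n:]


if __name__ == "__main__":
    sample = b"%PDF-1.4 example body %%EOF"  # hypothetical file contents
    fingerprint = byte_frequency(sample)
    print(pearson(fingerprint, fingerprint))
    print(header_trailer(sample, 4))
```

In practice, the per-file fingerprints would be averaged per MIME type to build a reference fingerprint, and unknown files would be compared against each reference.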

Demo for MIME Diversity for various MIME types: BFA approach

Demo for MIME Diversity for various MIME types: BFC approach

Demo for MIME Diversity for various MIME types: FHT approach

Assignment-2 (Scientific Content Enrichment for TREC polar dataset)

The goal is to significantly enrich the metadata and the automatically extracted text and entities from the TREC Polar Dataset, and to make the dataset easier to relate to and interact with. To do so, you will apply and leverage knowledge gained from the lectures on context extraction, metadata, information similarity and clustering, and named entity recognition. Read more.

Demo for Scientific Content Enrichment for TREC polar dataset

Assignment-3 (Evaluating the content Analysis on TREC polar dataset)

The goal is to expand the analysis of the TREC-DD-Polar Dataset. Evaluating the efficacy, utility, and overall contribution of your content detection approach is an extremely important and difficult challenge. It raises questions such as: Is my MIME detection good? Are my parsers extracting the right text? Are we selecting the right parser? Is my metadata appropriate? What’s missing? How well is my language detection performing? Are there mixed languages? How well is my machine translation performing? Do my named entities make sense? Read more here.

Visualization Demo for Assignment 3
