Geospatial analysis using Hadoop Distributed File System (HDFS) and Apache Spark
▪ Performing geospatial analysis on large spatial data stored in HDFS using Apache Spark
▪ Retrieving geographical hotspots in a locality based on the data available in HDFS
Summary:
- Performed geospatial database operations on large datasets stored in distributed systems using Hadoop, Apache Spark, Scala, and the GeoSpark library on Linux
- Performed cluster performance analysis (efficiency, memory usage, and CPU usage for each node) using Ganglia
- Successfully implemented an algorithm for spatio-temporal hotspot analysis that determined the top 50 hotspots for taxi pickups in New York City in January 2015 using the Getis-Ord statistic
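
For illustration, the Getis-Ord Gi* score behind a hotspot ranking of this kind can be sketched in plain Scala. This is a minimal single-machine sketch, not the project's Spark/GeoSpark code. The grid layout (x index, y index, time step), the binary weights over a 3x3x3 space-time neighbourhood, the treatment of missing cells as zero pickups, and the names `giStar` and `topHotspots` are all assumptions made for the example.

```scala
object GetisOrdSketch {
  // (x index, y index, time step): a hypothetical space-time grid cell
  type Cell = (Int, Int, Int)

  /** Gi* z-score for every cell of an nx * ny * nt cube.
    * Assumes binary weights (1 for the cell itself and each in-bounds
    * neighbour in the 3x3x3 block, 0 otherwise). Cells missing from
    * `counts` are treated as having zero pickups.
    * If all counts are equal, or the cube is no larger than one
    * neighbourhood, the denominator is 0 and the score is NaN or infinite.
    */
  def giStar(counts: Map[Cell, Double], nx: Int, ny: Int, nt: Int): Map[Cell, Double] = {
    val n = (nx * ny * nt).toDouble
    val mean = counts.values.sum / n
    val sd = math.sqrt(counts.values.map(v => v * v).sum / n - mean * mean)

    val cells = for (x <- 0 until nx; y <- 0 until ny; t <- 0 until nt) yield (x, y, t)

    cells.map { case (x, y, t) =>
      // In-bounds neighbours, including the cell itself
      val neighbours = for {
        dx <- -1 to 1
        dy <- -1 to 1
        dt <- -1 to 1
        c = (x + dx, y + dy, t + dt)
        if c._1 >= 0 && c._1 < nx && c._2 >= 0 && c._2 < ny && c._3 >= 0 && c._3 < nt
      } yield c

      // With binary weights, sum(w) == sum(w^2) == number of neighbours
      val w = neighbours.size.toDouble
      val weightedSum = neighbours.map(c => counts.getOrElse(c, 0.0)).sum
      val denom = sd * math.sqrt((n * w - w * w) / (n - 1))
      (x, y, t) -> (weightedSum - mean * w) / denom
    }.toMap
  }

  /** The k cells with the highest Gi* score, highest first. */
  def topHotspots(counts: Map[Cell, Double], nx: Int, ny: Int, nt: Int, k: Int): Seq[(Cell, Double)] =
    giStar(counts, nx, ny, nt).toSeq.sortBy { case (_, score) => -score }.take(k)
}
```

Calling `GetisOrdSketch.topHotspots(counts, nx, ny, nt, 50)` would return the 50 highest-scoring cells. On a real cluster, the per-cell counts would come from a distributed aggregation rather than an in-memory `Map`.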