The course will explore the origins and features of large datasets, commonly referred to as "big data," along with methods for analyzing such data at scale. It will highlight the advantages that large-scale data analysis brings to different industry sectors and examine programming models and middleware that support scalable data solutions. Students will study algorithms designed for processing massive datasets and their application in specific domains. Key platforms and frameworks for big data analytics, such as Hadoop, MapReduce, Spark, and H2O, will be covered. Additionally, learners will gain hands-on experience with machine learning algorithms, using Spark MLlib and H2O for practical examples and visualizations. Topics on data storage, batch versus real-time analysis, and frameworks for interactive querying will also be introduced. Prerequisite: Admission to the graduate program in Electrical, Computer, or Software Engineering, or permission of the Engineering program advisor.