FIT5148 – Distributed and Big Data Processing, Semester 1, 2016
Big Data Report (40%) Group Assignment
− This is a group assignment. Groups must have exactly 2 members, both from the same tutorial.
− There is no interview for this assignment.
− You will present this work as a group in the Presentation of Big Data Report (10%). The
presentation covers Part 2 of this assignment.
This report consists of two parts:
• The first part is a performance evaluation. You will perform a number of tasks and queries in the Hortonworks environment using Hive and Pig. You need to write correct Pig and Hive queries that produce the results specified in the assignment. You will then record the details shown in the Hortonworks logs and reports, and use this information to compare the performance of Pig and Hive, for example the execution time of each and the number of MapReduce jobs executed. Include a table summarising these results, along with a brief but informative discussion written in paragraph form.
• The second part (see page 6) involves research. You will select a specific area in big data and read four seminal papers in that area. You will then discuss, analyse and compare these papers based on their approaches, contributions, methods, limitations, and any other relevant criteria. This part must be written using the specified template, to a high standard, and with correct APA referencing.