UEL-CN-7031 Big Data Analytics Coursework Assignment – UK

Module Code :-  UEL-CN-7031
Module title :- Big Data Analytics
Assignment title :- Big Data Analytics: Coursework
Assignment number :- 1
Weighting :- 100% (Final Project 60% and Presentation 40%)
This coursework (CRWK) must be attempted as individual work. It is divided into two sections: (1) Big Data analytics on a real case study and (2) a presentation.

The overall mark for the CRWK comes from two main activities:
1- Big Data Analytics report (around 5,000 words, with a tolerance of ± 10%) (60%)
2- Presentation (around 1,000 words, with a tolerance of + 10%) (40%)


(1) Understanding Dataset: UNSW-NB15
The raw network packets of the UNSW-NB15 data set were created by the IXIA PerfectStorm tool in the Cyber Range Lab of the Australian Centre for Cyber Security (ACCS) to generate a hybrid of real modern normal activities and synthetic contemporary attack behaviours.

The tcpdump tool was used to capture 100 GB of the raw traffic (e.g., pcap files).

This data set has nine types of attacks, namely Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode, and Worms. The Argus and Bro-IDS tools were used, and twelve algorithms were developed, to generate a total of 49 features with the class label.

a) The features are described here.
b) The number of attacks and their sub-categories is described here.
c) In this coursework, we use a total of 10 million records stored in a CSV file.

The total size is about 600 MB, which is big enough to employ big data methodologies for analytics. As big data specialists, we first want to read and understand the features and then apply modeling techniques. To see a few records of this dataset, you can import it into Hadoop HDFS and then run a Hive query that prints the first 5-10 records.
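Before loading the file into HDFS, a quick local look at the first few records can be useful. The sketch below is plain Python, not Hive, and only illustrates the idea of reading a handful of rows without loading the whole file. The function name preview_rows and the file name in the usage note are hypothetical.

```python
import csv
from itertools import islice


def preview_rows(csv_path, n=5):
    """Return the first n rows of a CSV file without reading the rest."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        # csv.reader is lazy, so islice stops as soon as n rows are read.
        return list(islice(csv.reader(handle), n))
```

For example, preview_rows("unsw_nb15.csv", n=10) would return up to ten rows as lists of strings, assuming a file with that (hypothetical) name exists locally.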

(2) Big Data Query & Analysis by Apache Hive
This task uses Apache Hive to convert big raw data into useful information for end users. To do so, first understand the dataset carefully.

Then write at least 4 Hive queries (refer to the marking scheme).

Apply appropriate visualization tools to present your findings numerically and graphically. Briefly interpret your findings.

Finally, include screenshots of your outcomes (e.g., tables and plots), together with the scripts and queries, in the report.

Tip: The mark for this section depends on the complexity of your Hive queries; for instance, a simple SELECT query on its own is not expected to earn full marks.
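To illustrate what a query beyond a simple SELECT can look like, here is a minimal sketch that combines filtering, grouping, a HAVING clause, and ordering. It runs against an in-memory SQLite table as a local stand-in for Hive, so it is not a Hive session; the SQL text uses only constructs (WHERE, GROUP BY, HAVING, ORDER BY, COUNT, AVG) that HiveQL also provides. The table name unsw_nb15, the column names, and the toy rows are assumptions made for illustration.

```python
import sqlite3

# Toy rows invented for illustration (not taken from UNSW-NB15).
# Assumed columns: attack_cat, sbytes (source bytes), label (0 normal, 1 attack).
TOY_ROWS = [
    ("Normal", 500, 0),
    ("Normal", 700, 0),
    ("DoS", 12000, 1),
    ("DoS", 8000, 1),
    ("DoS", 10000, 1),
    ("Exploits", 3000, 1),
    ("Exploits", 5000, 1),
    ("Worms", 900, 1),
]

# Attack categories with at least two records, ranked by mean source bytes.
QUERY = """
SELECT attack_cat,
       COUNT(*)    AS n_records,
       AVG(sbytes) AS avg_sbytes
FROM unsw_nb15
WHERE label = 1
GROUP BY attack_cat
HAVING COUNT(*) >= 2
ORDER BY avg_sbytes DESC
"""


def run_query(rows=TOY_ROWS, query=QUERY):
    """Load rows into an in-memory SQLite table and return the query result."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE unsw_nb15 (attack_cat TEXT, sbytes INTEGER, label INTEGER)"
        )
        conn.executemany("INSERT INTO unsw_nb15 VALUES (?, ?, ?)", rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for attack_cat, n_records, avg_sbytes in run_query():
        print(attack_cat, n_records, avg_sbytes)
```

On the toy rows, the Worms category is dropped by the HAVING clause and the Normal rows by the WHERE clause, which is the kind of multi-step logic the tip refers to.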

(3) Advanced Analytics using PySpark
In this section you will conduct advanced analytics using PySpark.

3.1. Analyze and Interpret Big Data
We need to learn and understand the data through at least 4 analytical methods (descriptive statistics, correlation, hypothesis testing, density estimation, etc.). You need to present your work numerically and graphically. Apply tooltip text, legends, titles, X-Y axis labels, etc. accordingly to help end users gain insights.
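Two of the methods named above, descriptive statistics and correlation, can be sketched on a small sample as follows. This is plain Python using only the standard library, intended as a way to check your understanding of the calculations and not as a substitute for the PySpark analysis the task asks for. The function names and the sample values are hypothetical.

```python
import math
import statistics


def describe(values):
    """Return count, mean, sample standard deviation, min and max."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample (n - 1) standard deviation
        "min": min(values),
        "max": max(values),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

For instance, describe([1.0, 2.0, 3.0, 4.0]) summarises one numeric feature, and pearson applied to two feature columns gives a value between -1 and 1 that can be reported alongside a scatter plot.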

3.2. Design and Build a Classifier
a) Design and build a binary classifier over the data set. Explain your algorithm and its configuration. Present your findings in both numerical and graphical form. Evaluate the performance of the model and verify its accuracy and effectiveness.
b) Apply a multi-class classifier to classify the data into ten classes: one normal and nine attacks (Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode, and Worms). Briefly explain your model with supporting statements on its parameters, accuracy, and effectiveness.
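Both parts ask for an evaluation of accuracy and effectiveness. As a minimal sketch of what such an evaluation computes, the plain-Python code below derives accuracy, per-class precision, recall and F1, and macro-averaged F1 from true and predicted labels; it works unchanged for two classes or ten. The function names are hypothetical, and the labels in the usage note are invented. In PySpark, the pyspark.ml.evaluation module provides evaluators that report comparable metrics.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count (true_label, predicted_label) pairs."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")
    return Counter(zip(y_true, y_pred))


def evaluate(y_true, y_pred):
    """Return accuracy, macro-F1 and per-class precision/recall/F1."""
    counts = confusion_counts(y_true, y_pred)
    labels = sorted(set(y_true) | set(y_pred))
    per_class = {}
    for label in labels:
        tp = counts[(label, label)]
        # Predicted as this label but truly another class.
        fp = sum(counts[(other, label)] for other in labels if other != label)
        # Truly this label but predicted as another class.
        fn = sum(counts[(label, other)] for other in labels if other != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class[label] = {"precision": precision, "recall": recall, "f1": f1}
    correct = sum(counts[(label, label)] for label in labels)
    return {
        "accuracy": correct / len(y_true),
        "macro_f1": sum(m["f1"] for m in per_class.values()) / len(labels),
        "per_class": per_class,
    }
```

Calling evaluate(["Normal", "DoS", "Worms"], ["Normal", "DoS", "Normal"]) shows why accuracy alone can mislead on imbalanced attack classes: a rare class that is never predicted scores zero recall, which lowers macro-F1 even when accuracy looks acceptable.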

(4) Individual Assessment 
Discuss (1) what alternative technologies are available for tasks 2 and 3 and how they differ (use academic references), and (2) what surprising new thinking was evoked and/or neglected at your end.

(5) Documentation
Document all your work. Your final report must follow the 5 sections detailed in the format of final submission section (refer to the next page). Your work must demonstrate appropriate understanding of academic writing and integrity.