Aim: Practical Application of Big Data and Cloud Computing
The aim of this assignment is to introduce a practical application of Big Data and Cloud Computing using a realistic big data problem. Students will implement a solution using an industry-leading Cloud computing provider together with appropriate distributed processing environments such as Apache Spark. This will involve provisioning and configuring appropriate Cloud Computing resources and selecting problem-appropriate algorithms and visualization methods.
Learning Outcomes Assessed
Knowledge & Understanding
LO1. Apply big data analytic algorithms, including those for visualization, and cloud computing techniques to multi-terabyte data sets.
LO2. Critically assess data analytic and machine learning algorithms to identify those that satisfy given big data problem requirements.
Intellectual / Professional skills & abilities
LO3. Critically evaluate and select appropriate big data analytic algorithms to solve a given problem, considering the processing time available and other aspects of the problem.
LO4. Design and develop advanced big data applications that integrate with third-party cloud computing services.
Personal Values Attributes (Global / Cultural awareness, Ethics, Curiosity) (PVA):
LO5. Critically assess and interpret primary research to identify its applicability to a given big data problem scenario.
Individual work – Big Data Product: ‘Bicycle Theft’. This activity assesses module learning outcomes 1, 2, & 4.
Individual work – Report: “A Critical Assessment of the Big Data Approach to Crime Analysis.” This activity assesses module learning outcomes 3 & 5.
Big Data Product: Bicycle Theft (Individual Work)
In this scenario you are a data scientist working with a marketing consultancy. Your client has developed a new bicycle lock that they wish to test market. Since it is hypothesised that customers who have experienced bicycle theft would be more interested in a new lock, the company needs to find out whether “Bicycle theft” is more frequent in particular areas of England. If that is the case, the company needs to determine whether these are areas of affluence, where a premium model could be sold, or areas of relative deprivation, where an economy model would be more appropriate.
To solve this problem you will use publicly available data sets that have been prepared for you and placed in Amazon S3. These include (but are not limited to):
- Street level crime data: Published by the UK Home Office, this data set contains 19 million data rows, each giving a crime type together with its location as a latitude and longitude.
- Land Registry Price Paid Data: This gives the postcode of a property, the property type from an enumeration of D (Detached), S (Semi-Detached), T (Terraced), F (Flats/Maisonettes) and the price paid.
- English Indices of Deprivation: The English Indices of Deprivation 2010 data set contains the rankings of measures of deprivation at small area level across England. The 32000 localities are ranked from the least to most deprived, scored on seven different dimensions of deprivation.
- Postcode data: This data set, provided by the Ordnance Survey, gives a latitude and longitude for every postcode. This is useful in the product to provide a relation between the postcodes in the Land Registry Price Paid data set and the latitude/longitude in the original crime data set.
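The relation between crime locations and postcodes described above could be prototyped locally as a nearest-neighbour lookup before it is scaled up in Spark. The following is a minimal sketch only, assuming that matching each crime to its closest postcode by great-circle (haversine) distance is acceptable. The function names and the sample postcodes and coordinates are hypothetical and are not taken from the supplied data sets.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_postcode(lat, lon, postcodes):
    """Return the postcode whose coordinates are closest to (lat, lon).

    `postcodes` is a list of (postcode, latitude, longitude) tuples.
    This is a brute-force scan, suitable only for small samples.
    """
    closest = min(postcodes, key=lambda p: haversine_km(lat, lon, p[1], p[2]))
    return closest[0]


# Hypothetical sample rows (illustrative values, not real data).
SAMPLE_POSTCODES = [
    ("AA1 1AA", 54.977, -1.607),
    ("ZZ9 9ZZ", 51.501, -0.142),
]

if __name__ == "__main__":
    print(nearest_postcode(54.98, -1.61, SAMPLE_POSTCODES))
```

A brute-force scan like this does not scale to 19 million rows; on the full data a coarser join key (for example, rounded coordinates) or a spatial index would probably be needed.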
1. Process the data prepared for you using Apache Spark.
2. Filter the dataset so that crimes refer to bicycle theft only.
3. Using appropriate software, determine whether bicycle theft is more closely associated with areas of affluence, relative deprivation, or neither.
4. Select and prepare no more than three visualizations to support your analytic findings from (3).
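The filtering and association steps above can be checked on a small local sample before they are run at scale. The sketch below is one possible approach, not the required method: it keeps only rows whose crime type is “Bicycle theft”, counts them per area, and computes a Spearman rank correlation between theft counts and a deprivation rank. The field names `crime_type` and `area`, and the choice of Spearman correlation, are assumptions made for illustration.

```python
from collections import Counter


def bicycle_thefts_per_area(crimes):
    """Count rows whose crime type is 'Bicycle theft', grouped by area.

    `crimes` is an iterable of dicts with hypothetical keys
    'crime_type' and 'area'.
    """
    return Counter(
        row["area"]
        for row in crimes
        if row["crime_type"].strip().lower() == "bicycle theft"
    )


def ranks(values):
    """Return 1-based ranks of `values`, averaging ranks across ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

A coefficient near +1 or -1 would suggest a monotonic association between theft counts and deprivation rank, and a value near 0 would suggest neither; the direction depends on how the deprivation ranking is ordered. The same filter and group-by logic maps naturally onto Spark DataFrame operations.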
Report: “A Critical Assessment of the Big Data Approach to Crime Analysis” (Individual Work)
- You are each required to write a critical report with the title “A Critical Assessment of the Big Data Approach to Crime Analysis.”
- The idea of this research report is to critically analyze the Big Data approach to Crime Analysis undertaken in the practical work. The report should identify advantages and disadvantages of the approach from technical, social and ethical perspectives. It should not just be a description.
- Your report should recommend (or not) the technique you have studied giving appropriate reasoning.
- The report should consider as many factors as you consider important.
- Your report should be a maximum of 2000 words (plus any amount for references). We stop reading after that point.
- References must be used and your discussion must be your own words. DO NOT CUT AND PASTE FROM THE INTERNET. Please avoid ALL literal quotations, as they will not gain marks. If you require guidance on citation, please use the booklet ‘Cite Them Right’ available from the City Campus library (or online at http://www.citethemrightonline.com).
Students will receive brief written feedback on the final submission together with the option to receive detailed verbal feedback on request.
- Submit your critical discussion using the Turnitin assignment on the eLP (eLearning Portal). Further information on how to do this will be given on the eLP.
- Submit your programming work as ONE ZIP file (compressed with WINZIP, not an alternative) using the assignment submission on the eLP, where you will find further instructions. You must also include a text file that gives your application prerequisites and explains how your application should be run. Short samples of output, such as visualizations produced or data extracts that indicate that the application is running as designed, would also be helpful.
Question – Practical Application Of Big Data And Cloud Computing
The following marking scheme will be used for this assignment
|Big Data Implementation:
Practical Instructions that describe the solution : Specifically
1. Component Selection and Data Pipeline Implementation
2. Data Extraction and Filtering
3. System running, test and diagnostics.
Design and implement ML solutions for the given big data set
Appropriateness of selected visualization method in relation to:
1. data size
2. data type
3. data pattern examined (i.e. clusters, correlation…)
4. Appropriate use of principles of visualization and perception (use of pre-attentive features, colour, gestalt theory…)
5. Successful display of findings (i.e. visibility and clarity of data findings and data patterns, how well the visualization supports the findings)
Note: The marks awarded will be converted to a percentage in order to obtain the final mark for the assignment.
|Introduction: The Crime Analysis task|
|Approaches taken to the problem|
|Detailed Analysis and consideration of the appropriateness of the solution for the initial problem|
|Evaluation and Conclusion|
|Style and Referencing|
Since the elements above are wide ranging, general criteria are given that are applied as a percentage to each component of the portfolio. In the following, ‘writing’ is understood to apply both to coding and English.