Unit Code :- CE705
Title :- Introduction To Programming In Python
Assessment Type :- Assignment
1) Can I submit anything other than a working .py file?
Absolutely not. You must submit a .py file that works out of the box in Idle. If you submit anything else (e.g. ipynb, pyw, pyi, etc.), this will be an automatic fail.
CE705 Introduction To Programming In Python Assignment – UK
2) Can I import modules?
You can import any modules that come with Python (e.g. math, os, etc). You cannot use any module that requires extra installation (e.g. Pandas). The only exception to this rule is NumPy.
3) Can I make a small change in the return type of a function or method?
No. If a function or method is supposed to return a number, say 5, and you return '5', [5], or anything other than just 5, you will lose all marks related to this function or method.
4) Can I make a small change in the data type of a parameter?
No. Such changes will lead to you losing all marks related to the function or method in question.
5) Can I add or remove a parameter?
No. Such changes will lead to you losing all marks related to the function or method in question.
6) Can I make a minor change in the name of a function or method?
No. Such changes will lead to you losing all marks related to the function or method in question.
Please note that Python is case-sensitive. For instance, the name run_test is not the same thing as Run_Test.
7) Can I implement extra functions or methods to make my code easier/cleaner?
Yes. Please note you must implement all the functions and methods described in the assignment brief. If you’d like to implement more, you are welcome to do so.
8) Can I implement the algorithm in this assignment in any other way than what the assignment brief describes?
No. In this assignment we are trying to measure your ability to code a programme following a specification. Hence, you must follow this specification.
9) Does my code need to work only for the data set provided?
No, it should work for any data set. In other words, do not hard code values such as the number of rows, the number of columns, etc.
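As a small illustration of avoiding hard-coded sizes, the sketch below reads the number of rows and columns from the data itself. The helper column_means is hypothetical (it is not one of the functions required by this brief) and is only meant to show the idea.

```python
import numpy as np


def column_means(array_2d):
    # Hypothetical helper, not part of the brief.
    # The sizes come from the data, so any data set works.
    n_rows, n_cols = array_2d.shape
    means = []
    for j in range(n_cols):
        total = 0.0
        for i in range(n_rows):
            total += array_2d[i, j]
        means.append(total / n_rows)
    return means
```

The same function works unchanged for a data set with 3 rows or 3000 rows, because nothing about the size is written into the code.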
10) Why am I not allowed to make changes?
Large pieces of software (e.g. Windows) are not written by a single programmer, but by many. All programmers will be working on different parts of the software, but all of these parts are likely to interact in some way. The programme specification makes sure everybody knows what each function expects to receive and what each function should return. If one programmer unilaterally decides to make a small change that goes against the specification… then the software will not work as expected.
Assignment: identifying groups of similar wines
Make sure you read the below carefully as there are key differences between this and the previous assignment!
A sommelier is a trained professional who spends his or her day tasting different wines and identifying similarities, or sometimes dissimilarities, between them. Given this is clearly an exhausting task, you have been hired to develop software capable of grouping similar wines together. Your software will load a data set containing information about each wine (alcohol content, alkalinity of ash, proanthocyanins, colour intensity, etc.) and identify which wines are similar.
Luckily, your employer has already identified a suitable algorithm and designed the software for you. All you are required to do is to write the actual source code (with comments).
Technical details:
You’ll be using different data structures to accomplish the below. Your assignment must contain the code for the functions and methods below. If you wish you can write more functions and methods but those described below must be present. Any reference to matrix relates to the matrix class below (you have to figure out how it relates).
1) Class: matrix
You will code a class called matrix, which will have an attribute called array_2d. This attribute is supposed to be a NumPy array containing numbers in two dimensions. The class matrix must have the following methods (in these, the parameters are in addition to self):
load_from_csv
This method should have one parameter, a file name (including, if necessary, its path and extension).
This method should read this CSV file and load its data to the array_2d of matrix. Each column in this file should be a row in array_2d. Notice that in CSV files a comma separates columns (CSV = comma-separated values).
You should also write code so that
m = matrix('validfilename.csv')
creates a matrix m with the data in the file above in array_2d.
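One possible way to let the constructor load a file is to have it call load_from_csv itself. The sketch below is a minimal illustration, not the required solution. It assumes a purely numeric, comma-separated file with no header line, it assumes the file name may default to None, and it uses numpy.loadtxt for the parsing. It stores the values in the orientation in which they are read, so any reorientation required by the brief still has to be applied.

```python
import numpy as np


class matrix:
    def __init__(self, file_name=None):
        # Start with an empty 2-D array; load data only if a file is given.
        # (The None default is an assumption made for this sketch.)
        self.array_2d = np.empty((0, 0))
        if file_name is not None:
            self.load_from_csv(file_name)

    def load_from_csv(self, file_name):
        # Assumption: numeric values only, comma-separated, no header line.
        # ndmin=2 keeps the result two-dimensional even for a one-line file.
        # The data is stored as read; adjust the orientation (e.g. with .T)
        # if the specification requires it.
        self.array_2d = np.loadtxt(file_name, delimiter=',', ndmin=2)
```

With this design, matrix('validfilename.csv') and calling load_from_csv on an existing matrix share the same loading code.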
standardise
This method should have no parameters. It should standardise the array_2d in the matrix calling this method. For details on how to standardise a matrix, read the appendix.
get_distance
This method should have three mandatory parameters: two matrices (let us call them other_matrix and weights) and a number (let us call it beta). It should also have one optional parameter, let us call it i (it must have a default value, which you must figure out). If the matrix calling this method and the matrix weights both have only one row, this method should return a matrix containing the weighted distance between the row in the matrix calling this method and each of the rows in other_matrix. If the matrix calling this method has more than one row (and the matrix weights has only one row), this method should return a matrix containing the weighted distance between row i in the matrix calling this method and each of the rows in other_matrix (using the weights in the matrix weights). For details about how to calculate this distance, read the appendix. To be clear: if other_matrix has n rows, the matrix returned by this method will have n rows and 1 column.
get_frequency_count
This method should have no parameters, and it should work if the array_2d of the matrix calling this method has only one column. This method should return a list with as many elements as there are different values in array_2d. For instance, if array_2d has elements 1, 2, 2, 3, 3, 5, then there are 4 different elements.
Each element of this list should be a list with two values. The first value should be a unique value of array_2d. The second value should be how many times this unique value appears in array_2d. So if array_2d has the elements 1, 2, 2, 3, 3, 5, this method should return [[1,1], [2,2], [3,2], [5,1]].
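The counting described above can be sketched with numpy.unique, which returns the sorted unique values and, with return_counts=True, how often each one occurs. The minimal class below exists only to make the example self-contained; its parameterless constructor is an assumption of this sketch and not the constructor described in the brief.

```python
import numpy as np


class matrix:
    def __init__(self):
        # Simplified constructor, for illustration only.
        self.array_2d = np.empty((0, 0))

    def get_frequency_count(self):
        # np.unique flattens the array and returns sorted unique values;
        # return_counts=True also returns the number of occurrences of each.
        values, counts = np.unique(self.array_2d, return_counts=True)
        # tolist() converts NumPy scalars to plain Python numbers.
        return [[v, c] for v, c in zip(values.tolist(), counts.tolist())]
```

For a one-column array_2d holding 1, 2, 2, 3, 3, 5, this sketch returns [[1, 1], [2, 2], [3, 2], [5, 1]], matching the example in the text.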
2) Functions
The code should also have the functions (i.e. not methods, so not part of the class matrix) below. No code should be outside any function or method in this assignment.
get_initial_weights
This function should have one parameter, an integer m. This function should return a matrix with 1 row and m columns containing random values, each between zero and one. The sum of these m values should be equal to m divided by 2.
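One way to satisfy both constraints at once (every value between zero and one, and a total of m divided by 2) is sketched below. The helper name initial_weight_values is hypothetical, and it returns a plain NumPy array with 1 row and m columns rather than a matrix object, so wrapping the result in the matrix class is still needed. The mirroring trick is one choice among several and is not taken from the brief.

```python
import numpy as np


def initial_weight_values(m):
    # Hypothetical helper: returns a NumPy array, not a matrix object.
    # Assumes m is a positive integer.
    r = np.random.rand(1, m)  # m random values in [0, 1)
    if r.sum() < m / 2:
        # Mirror the values: 1 - r is still within [0, 1],
        # and its sum is m - r.sum(), which is above m / 2.
        r = 1.0 - r
    # The sum is now at least m / 2, so the scaling factor is at most 1
    # and no value can leave the interval [0, 1].
    return r * ((m / 2) / r.sum())
```

Scaling down (never up) is the reason for the mirroring step: scaling up could push a value above one.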
get_centroids
This function should have three parameters: (i) a matrix containing the data, (ii) the matrix S, (iii) the value of K. This function should implement the Step 9 of the algorithm described in the appendix. It should return a matrix containing K rows and the same number of columns as the matrix containing the data.
get_groups
This function should have three parameters: a matrix containing the data, the number of groups to be created (K), and a number beta (for the distance calculation). This function follows the algorithm described in the appendix. It should return a matrix S (defined in the appendix). This function should use the other functions you wrote as much as possible. Do not keep repeating code you already wrote.
get_new_weights
This function takes four parameters: a matrix containing the data, a matrix containing the centroids, a matrix S (see the algorithm in the Appendix), and a number beta. This function should return a new matrix weights with 1 row and as many columns as the matrix containing the data (and the matrix containing the centroids). Follow Step 10 of the algorithm in the Appendix.
run_test
Your code must contain the function below (do not change anything)
def run_test():
    m = matrix('Data.csv')
    for k in range(2,5):
        for beta in range(11,25):
            S = get_groups(m, k, beta/10)
            print(str(k)+'-'+str(beta)+'='+str(S.get_count_frequency()))
The aim of this function is just to run a series of tests. Consequently, here and only here, you can use hard-coded values for the strings containing the file names of data and the values for K.
More details
You will implement a data-driven algorithm that creates groups of similar entities (here, an entity is a wine, described as a row in our data matrix). If two entities are assigned to the same group by the algorithm, it means they are similar. This will create groups of similar wines. Your software just needs the number of groups the user wants to partition the data into, the data itself, and a numeric value for Beta.
The number of partitions (K) is clearly a positive integer. Your software should only allow values in the interval [2, n-1], where n is the number of rows in the data. This way you’ll avoid trivial partitions. You can test values of Beta that are higher than 1.
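The restriction on K can be checked before the algorithm starts. The sketch below uses a hypothetical helper, check_k, that is not one of the functions required by this brief. Raising ValueError for an invalid K is an assumption, since the brief does not say how invalid values should be rejected.

```python
def check_k(k, n):
    # Hypothetical helper: k is the requested number of groups,
    # n is the number of rows in the data.
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(k, bool) or not isinstance(k, int):
        raise ValueError('K must be an integer')
    # Only values in the interval [2, n-1] are allowed.
    if k < 2 or k > n - 1:
        raise ValueError('K must be in the interval [2, ' + str(n - 1) + ']')
    return k
```

For data with 5 rows, check_k(2, 5) and check_k(4, 5) pass, while K = 1 and K = 5 are rejected as trivial partitions.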