(a) You may use any platform, but you need to convert all 8 files to the format that platform requires.
(b) Run the DBSCAN clusterer on each dataset and find the parameter values that yield the optimal number of clusters (if such parameter values exist). The optimal number of clusters is provided on the site above.
(c) Provide screenshots of the clustering result figures, together with your code if you have any. Run each of the 8 shapes datasets separately; a scatter plot should be fine. This can be helpful for Project No. 3.
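For part (b), one possible way to search for parameter values is sketched below. It is only an illustration under some assumptions: it uses Python with scikit-learn (any platform is allowed), the two-moons data from `make_moons` is a stand-in for one of the 8 shapes files, and the helper names `count_clusters` and `search_dbscan` as well as the parameter ranges are made up for this example.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons


def count_clusters(labels):
    """Number of clusters found, ignoring the noise label -1."""
    return len(set(labels)) - (1 if -1 in labels else 0)


def search_dbscan(X, target_k, eps_values, min_samples_values):
    """Return (eps, min_samples, n_noise) for every run that yields target_k clusters."""
    hits = []
    for eps in eps_values:
        for min_samples in min_samples_values:
            labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
            if count_clusters(labels) == target_k:
                n_noise = int(np.sum(labels == -1))
                hits.append((float(eps), min_samples, n_noise))
    return hits


# Stand-in data (assumption): replace with one converted shapes dataset,
# and set target_k to the optimal number of clusters listed for it.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

hits = search_dbscan(
    X,
    target_k=2,
    eps_values=np.linspace(0.05, 0.5, 10),
    min_samples_values=[3, 5, 10],
)
for eps, min_samples, n_noise in hits:
    print(f"eps={eps:.2f}  min_samples={min_samples}  noise points={n_noise}")
```

If the list comes back empty for a dataset, that suggests no setting in the searched range reaches the target number of clusters. For part (c), the labels returned by `fit_predict` can be passed as the `c` argument of matplotlib's `plt.scatter` to colour the scatter plot by cluster.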
Project 1: Basic Classification
Provided data:
The training sets with label information
The testing sets with no labels. The lecturer holds the hidden labels for the testing sets.
Goal: train a good classifier and apply it to the testing set to obtain the predicted labels.
Data preprocessing before you start:
You need to convert the data from the left side to the right side.
Also check whether there are any missing values and, if there are, handle them with the techniques we have already covered in the course.
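As an illustration of the missing-value check, here is a minimal sketch in Python with pandas. The toy table and its column names (`age`, `colour`) are hypothetical, and mean and most-frequent-value imputation are assumed here only as examples; use whichever techniques from the course fit your data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the real columns come from the provided training set.
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "colour": ["red", "blue", None, "blue"],
})

# Count the missing values in each column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Assumed strategy: numeric columns get the column mean,
# all other columns get their most frequent value.
filled = df.copy()
for col in filled.columns:
    if pd.api.types.is_numeric_dtype(filled[col]):
        filled[col] = filled[col].fillna(filled[col].mean())
    else:
        filled[col] = filled[col].fillna(filled[col].mode().iloc[0])

print(filled)
```

Whatever values you compute from the training set (such as the mean) should also be the ones used to fill the testing set, so that both sets are preprocessed the same way.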
You need to explore:
(You should also discuss these points in both your presentation and your project report.)
There are many classifiers (you should try at least 4 of the classifiers you have learned).
Which one is better?
Training the classifiers may involve some parameters. How do you determine their optimal values?
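One common way to approach these questions is cross-validation on the training set. The sketch below assumes Python with scikit-learn and uses synthetic data from `make_classification` as a stand-in for the provided training set. The four classifiers and the `n_neighbors` grid are chosen only for illustration and may differ from the ones covered in the course.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): replace with the preprocessed training set.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Four example classifiers; swap in the ones you have learned.
candidates = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "k-nearest neighbours": KNeighborsClassifier(),
    "naive Bayes": GaussianNB(),
    "logistic regression": LogisticRegression(max_iter=1000),
}

# Compare the classifiers by mean 5-fold cross-validated accuracy.
scores = {}
for name, model in candidates.items():
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {scores[name]:.3f}")

# Tune one parameter of one classifier with a cross-validated grid search.
param_grid = {"n_neighbors": [1, 3, 5, 7, 9, 11]}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Because the testing labels are hidden, the cross-validated score on the training set is the only estimate of performance available to you, which is why both the comparison and the parameter search are run there.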
Submission:
1. You should submit a ‘res.csv’ or ‘res.xlsx’ file as the result, and the column must be named ‘pred’. When you open the file, it should look like this:
https://newlearn.govst.edu/bbcswebdav/pid XXXXXXXXXXdt-content-rid-15839000_1/xid-15839000_1
The file for Assignment 2 (Project 1) is attached.
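For the result file described under Submission, a minimal sketch of writing ‘res.csv’ with pandas could look like the following. The `predictions` list is hypothetical, and leaving out the index column is an assumption, since only the ‘pred’ column name is specified.

```python
import pandas as pd

# Hypothetical predicted labels; in practice these come from your classifier,
# one per row of the testing set and in the same order.
predictions = [0, 1, 1, 0, 2]

# A single column named 'pred', written without an index column (assumption).
result = pd.DataFrame({"pred": predictions})
result.to_csv("res.csv", index=False)

# Read the file back to confirm the header and the values.
check = pd.read_csv("res.csv")
print(check.head())
```

If you prefer the ‘res.xlsx’ option, `result.to_excel("res.xlsx", index=False)` writes the same table, provided an Excel writer such as openpyxl is installed.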