
For the following assignments, please provide as much evidence of the results as possible, including the code, screenshots (plots only, not text or code), and documentation. Submit a single PDF file together with the .ipynb / .py files containing the documented code.

Choose any cleaned dataset such as the ones here: https://www.kaggle.com/search?q=cleaned+datasets+datasetFileTypes%3Acsv

1.a. [10 points]

Ignore the label column and apply the AgglomerativeClustering method from sklearn.cluster to this dataset. Use the min (single-linkage), average, and ward linkage methods explained in class to perform the hierarchical clustering. Please feel free to refer to https://scikit-learn.org/stable/auto_examples/cluster/plot_digits_linkage.html#sphx-glr-auto-examples-cluster-plot-digits-linkage-py
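A minimal sketch of this step follows. It assumes scikit-learn's built-in breast cancer data as a stand-in for a cleaned Kaggle CSV, and it assumes that the "min" method from class corresponds to scikit-learn's "single" linkage. Scaling the features and asking for two clusters are illustrative choices, not requirements of the assignment.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler

# Assumption: a built-in binary-labelled dataset stands in for the Kaggle CSV.
# y (the label column) is set aside and never passed to the clustering.
X, y = load_breast_cancer(return_X_y=True)

# Standardize so that features with large ranges do not dominate the distances.
X_scaled = StandardScaler().fit_transform(X)

# Map the lecture names to scikit-learn linkage names ("min" -> "single").
linkages = {"min": "single", "average": "average", "ward": "ward"}

cluster_labels = {}
for name, linkage in linkages.items():
    model = AgglomerativeClustering(n_clusters=2, linkage=linkage)
    cluster_labels[name] = model.fit_predict(X_scaled)

for name, labels in cluster_labels.items():
    # Report how many points landed in each of the two clusters.
    sizes = [int((labels == k).sum()) for k in (0, 1)]
    print(f"{name:8s} cluster sizes: {sizes}")
```

Printing the cluster sizes is a quick sanity check: a linkage that places almost every point in one cluster is unlikely to match a roughly balanced label column.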

1.b. [10 points]

Generate visualizations like those in the above tutorial, as well as dendrograms (please feel free to refer to https://scikit-learn.org/stable/search.html?q=dendrogram), for each of the methods.
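One possible way to draw the dendrograms is with SciPy's hierarchy module rather than scikit-learn, as sketched below. The helper name plot_dendrograms, the output file name, and the use of the first 60 rows of the breast cancer data are all assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: the figure is saved, not shown
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler


def plot_dendrograms(X, methods=("single", "average", "ward"), path="dendrograms.png"):
    """Draw one dendrogram per linkage method and save them side by side."""
    fig, axes = plt.subplots(1, len(methods), figsize=(5 * len(methods), 4), squeeze=False)
    results = {}
    for ax, method in zip(axes[0], methods):
        Z = linkage(X, method=method)  # (n - 1) x 4 merge table
        results[method] = dendrogram(Z, ax=ax, no_labels=True)
        ax.set_title(f"{method} linkage")
        ax.set_ylabel("merge distance")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
    return results


if __name__ == "__main__":
    # Assumption: a small slice of a built-in dataset keeps the tree readable.
    X, _ = load_breast_cancer(return_X_y=True)
    X_small = StandardScaler().fit_transform(X[:60])
    plot_dendrograms(X_small)
```

Using a small subset keeps the leaves legible; with several hundred samples the lower part of the tree tends to become a solid block.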

1.c. [10 points]

Which method produces clusters that are most closely aligned with the labels in the dataset? Explain.
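Alignment with the labels can be quantified rather than judged by eye. The sketch below uses the adjusted Rand index, which is one reasonable choice among several and is not prescribed by the assignment. The helper name alignment_scores and the stand-in dataset are assumptions.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler


def alignment_scores(X, y, linkages=("single", "average", "ward"), n_clusters=2):
    """Return the adjusted Rand index between each clustering and the labels."""
    scores = {}
    for linkage in linkages:
        model = AgglomerativeClustering(n_clusters=n_clusters, linkage=linkage)
        predicted = model.fit_predict(X)
        # The score is invariant to how cluster ids are numbered:
        # 1.0 means identical partitions, values near 0 mean chance-level agreement.
        scores[linkage] = adjusted_rand_score(y, predicted)
    return scores


# Assumption: built-in data stands in for the chosen cleaned dataset.
X, y = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

scores = alignment_scores(X_scaled, y)
for linkage, ari in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{linkage:8s} ARI = {ari:.3f}")
```

Because cluster ids are arbitrary, a permutation-invariant score such as this avoids having to match cluster 0 or 1 to a particular class by hand.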

1.d. [20 points]

Using the k-means algorithm with k=2 and corresponding visualizations, explain whether it fares better than the agglomerative approaches in terms of alignment with the labels.

Hint:

(a) Choose a smaller dataset for easier and better visualization and analysis

(b) Cut the dendrogram at an appropriate level to result in just two clusters, in order to see how aligned these two clusters are with the assigned labels.
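The comparison in 1.d and the cut described in hint (b) could be sketched as follows. Cutting with SciPy's fcluster and the maxclust criterion is one way to obtain exactly two clusters from the tree. The stand-in dataset, the ward linkage, the random seed, and the adjusted Rand index as the alignment measure are assumptions.

```python
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Assumption: built-in binary-labelled data stands in for the chosen dataset.
X, y = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# k-means with k = 2; a fixed seed makes the run repeatable.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_scaled)

# Hierarchical tree cut so that at most two clusters remain (ids are 1 and 2).
Z = linkage(X_scaled, method="ward")
ward_labels = fcluster(Z, t=2, criterion="maxclust")

comparison = {
    "k-means (k=2)": adjusted_rand_score(y, kmeans_labels),
    "ward, cut at 2 clusters": adjusted_rand_score(y, ward_labels),
}
for name, ari in comparison.items():
    print(f"{name:24s} ARI = {ari:.3f}")
```

Repeating the fcluster call with method="single" and method="average" in linkage gives the remaining agglomerative baselines for the same comparison.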

2. [25 points]

The wine data set at https://archive.ics.uci.edu/ml/datasets/wine has 13 features. Develop in Python and apply your own version of the PCA algorithm to this data set, to visualize how PCA helps with dimensionality reduction. Explain how many Principal Components you will choose and why. What percent of the variance in the data do the selected Principal Components cover?

For the implementation, you may use any objects, modules, and functions in NumPy, SciPy, and other Python libraries to perform operations such as computing eigenvalues and eigenvectors or any other math / linear algebra operation, but you may not use the PCA function available in scikit-learn directly.
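A minimal PCA sketch built on NumPy's eigendecomposition is shown below. It assumes that scikit-learn's load_wine, used only to load data, is an acceptable stand-in for downloading the UCI file, and it assumes the features are standardized first because their scales differ widely. The function name pca is hypothetical.

```python
import numpy as np
from sklearn.datasets import load_wine  # used for data loading only, not for PCA


def pca(X, n_components):
    """Project X onto its top principal components via the covariance matrix."""
    # Center and scale each feature (assumption: standardization is desired).
    X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

    # Covariance of the standardized data (features are columns).
    cov = np.cov(X_std, rowvar=False)

    # eigh is meant for symmetric matrices; it returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]  # reorder to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    explained_ratio = eigvals / eigvals.sum()
    projected = X_std @ eigvecs[:, :n_components]
    return projected, explained_ratio


X, y = load_wine(return_X_y=True)
projected, explained_ratio = pca(X, n_components=2)

# Cumulative share of variance covered by the first k components.
print(np.round(np.cumsum(explained_ratio), 3))
```

The cumulative ratio is what supports the choice of how many components to keep: pick the smallest k whose cumulative share reaches the chosen threshold, then scatter-plot the first two columns of the projection coloured by class.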

3.a. [20 points]

Refer to online tutorials on regularization such as

https://medium.com/coinmonks/regularization-of-linear-models-with-sklearn-f88633a93a2

and

https://towardsdatascience.com/ridge-and-lasso-regression-a-complete-guide-with-python-scikit-learn-e20e34bcbf0b

Apply the techniques from the above tutorials to the student dataset at https://archive.ics.uci.edu/ml/datasets/student+performance

Does regularization help improve the accuracy of predicting the final Math grade of the students?
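One way to frame the comparison is to cross-validate an unregularized linear model against Ridge and Lasso on the same features. The sketch below assumes scikit-learn's built-in diabetes data as an offline stand-in for the student file; for the actual assignment, X would hold the encoded student features and y the final Math grade. The helper name compare_regularizers and the value alpha=1.0 are illustrative assumptions.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_regularizers(X, y, alpha=1.0, cv=5):
    """Return the mean cross-validated R^2 for OLS, Ridge, and Lasso."""
    models = {
        "ols": LinearRegression(),
        "ridge": Ridge(alpha=alpha),  # L2 penalty
        "lasso": Lasso(alpha=alpha),  # L1 penalty
    }
    results = {}
    for name, model in models.items():
        # Scaling inside the pipeline keeps the penalty comparable across features
        # and avoids leaking test-fold statistics into training.
        pipeline = make_pipeline(StandardScaler(), model)
        r2 = cross_val_score(pipeline, X, y, cv=cv, scoring="r2")
        results[name] = float(r2.mean())
    return results


# Assumption: stand-in regression data, not the student performance file.
X, y = load_diabetes(return_X_y=True)
results = compare_regularizers(X, y)
for name, score in results.items():
    print(f"{name:6s} mean R^2 = {score:.3f}")
```

Whether regularization helps depends on the chosen alpha, so in practice the comparison is repeated over a range of alpha values before drawing a conclusion.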

3.b. [5 points]

For regularization, we added the regularizer to the loss function. Does it make sense to multiply or subtract the term, instead? Explain.
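A small numerical experiment can help in reasoning about this question. The sketch below uses a toy one-parameter loss, (w - 3)^2, and an L2 penalty, w^2. Both functions, the weight lam=2.0, and the search grid are assumptions chosen only for illustration.

```python
import numpy as np


def data_loss(w):
    # Toy data-fit term, minimized at w = 3 (stands in for a real training loss).
    return (w - 3.0) ** 2


def penalty(w):
    # L2 (ridge-style) penalty on the single weight.
    return w ** 2


def combine(w, lam):
    """Evaluate three ways of combining the loss with the penalty."""
    return {
        "added": data_loss(w) + lam * penalty(w),
        "subtracted": data_loss(w) - lam * penalty(w),
        "multiplied": data_loss(w) * lam * penalty(w),
    }


w = np.linspace(-10.0, 10.0, 2001)  # grid of candidate weights
objectives = combine(w, lam=2.0)

# Grid point with the smallest objective value for each combination.
minimizers = {name: float(w[np.argmin(values)]) for name, values in objectives.items()}
print(minimizers)
```

In this toy setting the added form moves the minimizer from 3 toward 0, which is the intended shrinkage. The subtracted form rewards large weights, so its minimizer sits on the edge of the grid and moves outward as the grid widens. The multiplied form is zero at w = 0 regardless of the data, so the trivial solution always attains the minimum.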


Answered 3 days after May 11, 2021

{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 87,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import...
