Paul-Brenkus.github.io

DATA 310 Response - 07/08/2020

A.)

By splitting the data into separate groups (i.e training and testing), we are able to adjust parameters for the algorithm
without changing the entire code. It is also there to help eliminate bias amongst the data. The training data itself typically 
retains 70-80% of your data while your test data typically has ~20% of your remaining data. Ultimately you would want to use
the outcomes of your testing data for your answer. 

B.)

1st layer 
  - The first layer that was purposed was a 'Flatten' layer that will take a rectangle shape into a 1 dimension shape (shaping layer).
2nd layer
  - The second layer is a dense layer with 512 neurons activated. This line also includes 'relu' which means that if the neuron is less
    than 0, set it to 0.
3rd layer
  - The third and final layer is also a dense layer that has 10 neurons activated. Each will calculate each pariticular clothing
    type in the the list (10 items). This function 'relu' also finds the best candidate amongst the itmes in the list.

C.)

According to the video, the instructor explains that 'Adam' is the best optimizer for this application. Adam is a combination     of both
RMSprop and Stochastic Gradient Descent with a little 'umph!' behind it. According to 'TowardsDataScience.com', Adam uses the     squared
gradients to scale the data and learning rate such like RMSprop while taking the momentum using moving averages of the gradient. Adam is 
and adaptive learning method. According to the keras.io website, the loss function (SparseCategoricalCrossentropy) computes the 
crossentropy loss between the labels and predictions. If there are two or more label classes then this function should be used. 

D.)

1.What is the shape of the images training set (how many and the dimension of each)? (60000, 28,28)

2.What is the length of the labels training set? 60000

3.What is the shape of the images test set? (10000, 28,28)

4.Estimate a probability model and apply it to the test set in order to produce the array of probabilities that a randomly selected image is each of the possible numeric outcomes (look towards the end of the basic image classification exercises for how to do this — you can apply the same method applied to the Fashion MNIST dataset but now       apply it to the hand written letters MNIST dataset).

5.Use np.argmax() with your predictions object to return the numeral with the highest probability from the test labels dataset.

6.Produce the following plot for your randomly selected image from the test dataset