Building Neural Network Classifiers for Text and Images
This notebook shows how to build neural networks using Scikit-learn and Keras to classify text and images.
The Data
- Part 1 of this project uses categorized-comments.jsonl:
  - Contains 450,000 rows with two columns: category (cat) and text (txt)
- Part 2 of this project uses the MNIST dataset included in Keras:
  - You can import the data directly into testing and training sets using this code:
    from keras.datasets import mnist
    (X_train, y_train), (X_test, y_test) = mnist.load_data()
Tasks
Part 1: Text Classification
- Build a neural network with Scikit-learn
- Build a neural network with Keras
- Use the neural networks to classify Reddit comments into one of four categories:
  a. News
  b. Science and Technology
  c. Sports
  d. Video Games
- Report each model’s accuracy, precision, recall, F1, and confusion matrix
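The Scikit-learn half of these tasks can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code. The sample records, the label strings in `CATEGORIES`, the helper names (`load_comments`, `build_text_classifier`, `evaluate`), the TF-IDF features, and the layer size are all assumptions made for the example.

```python
import json

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_recall_fscore_support,
)
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

# Illustrative label strings; the values actually stored in `cat` may differ.
CATEGORIES = ["news", "science_and_technology", "sports", "video_games"]

# Made-up JSON Lines records standing in for categorized-comments.jsonl.
SAMPLE_LINES = [
    '{"cat": "news", "txt": "the senate passed the budget bill"}',
    '{"cat": "news", "txt": "election results were announced today"}',
    '{"cat": "science_and_technology", "txt": "new telescope images reveal galaxies"}',
    '{"cat": "science_and_technology", "txt": "researchers published a battery study"}',
    '{"cat": "sports", "txt": "the team won the championship game"}',
    '{"cat": "sports", "txt": "the striker scored twice in the final"}',
    '{"cat": "video_games", "txt": "the new console game has great graphics"}',
    '{"cat": "video_games", "txt": "i finished the boss fight on hard mode"}',
]


def load_comments(lines):
    """Parse JSON Lines records (any iterable of strings) into texts and labels."""
    texts, labels = [], []
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        record = json.loads(line)
        texts.append(record["txt"])
        labels.append(record["cat"])
    return texts, labels


def build_text_classifier():
    """TF-IDF features feeding a small multilayer perceptron (sizes are arbitrary)."""
    return make_pipeline(
        TfidfVectorizer(),
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
    )


def evaluate(y_true, y_pred, labels):
    """Return accuracy, macro-averaged precision/recall/F1, and the confusion matrix."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, labels=labels, average="macro", zero_division=0
    )
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "confusion_matrix": confusion_matrix(y_true, y_pred, labels=labels),
    }


texts, labels = load_comments(SAMPLE_LINES)
model = build_text_classifier().fit(texts, labels)
# Scoring on the training texts only demonstrates the calls;
# a held-out test split is needed for meaningful numbers.
report = evaluate(labels, model.predict(texts), CATEGORIES)
print(report["confusion_matrix"])
```

With the real file, an open file handle can be passed to `load_comments` in place of `SAMPLE_LINES`. The same `evaluate` helper would work on a Keras model's predictions once they are converted back to label strings.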
Part 2: Image Classification
- Load the MNIST dataset from Keras into testing and training sets
- Reshape and rescale the images to feed into the model
- Build a neural network with Keras
- Train and test the model
- Report the model’s accuracy and loss
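The image steps above might look something like the sketch below. It is not the notebook's architecture: the single 128-unit hidden layer, the optimizer, the epoch count, and the helper names (`preprocess`, `build_model`, `main`) are assumptions chosen for brevity, so the resulting accuracy will not necessarily match the figure reported in Results.

```python
import numpy as np


def preprocess(images):
    """Flatten 28x28 images to 784-vectors and rescale pixels from [0, 255] to [0, 1]."""
    images = np.asarray(images)
    return images.reshape(len(images), -1).astype("float32") / 255.0


def build_model(input_dim=784, num_classes=10):
    """A small dense network; the layer sizes here are arbitrary choices."""
    # Imported here so preprocess() can be used without Keras installed.
    from keras import layers, models

    model = models.Sequential([
        layers.Input(shape=(input_dim,)),
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # labels are integer digits 0-9
        metrics=["accuracy"],
    )
    return model


def main():
    """Load MNIST, train on the training set, and report test loss and accuracy."""
    from keras.datasets import mnist

    (X_train, y_train), (X_test, y_test) = mnist.load_data()
    model = build_model()
    model.fit(preprocess(X_train), y_train, epochs=5, batch_size=128, verbose=2)
    loss, accuracy = model.evaluate(preprocess(X_test), y_test, verbose=0)
    print(f"loss={loss:.4f} accuracy={accuracy:.4f}")
```

Calling `main()` downloads MNIST on first use and trains the model. `preprocess` on its own can be checked quickly on any array of shape (n, 28, 28).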
Results
The accuracy of each model is listed below. The remaining evaluation metrics are in the Jupyter Notebook located in this project’s GitHub repository.
Text Classification
Scikit-learn Accuracy: 0.61725
Keras Accuracy: 0.64225
Image Classification
Keras Accuracy: 0.9855999946594238
Author
Xander Hieken



