A Very Basic TensorFlow Sample Project
This project shows a simple use of a deep neural network with TensorFlow to classify the Iris dataset, using the clean, high-level APIs that TensorFlow exposes. Many of the "hello world" or introductory samples I've found start with TensorFlow's lower-level components, or involve a lot of "plumbing" before you get to the relevant part of the solution. This shows how to create a deep neural network to classify a well-known dataset in ~20 lines of code.
First, install TensorFlow. We also use pandas for reading in and manipulating the data file, so run:

```
pip install pandas
```
The Iris dataset we're using is a well-known machine learning dataset containing measurements of 150 iris flowers from three different species. The measurements are of sepal length, sepal width, petal length, and petal width. The idea is that we should be able to figure out the species from those measurements with some degree of accuracy. I've gotten 95-99% accuracy on most runs of this example.
So basically, given a bunch of data like this,
Sepal Length | Sepal Width | Petal Length | Petal Width | Species |
---|---|---|---|---|
4.8 | 3.1 | 1.6 | 0.2 | Iris-setosa |
5.8 | 2.7 | 4.1 | 1.0 | Iris-versicolor |
6.4 | 2.7 | 5.3 | 1.9 | Iris-virginica |
We want to be able to figure out the species of this iris:
Sepal Length | Sepal Width | Petal Length | Petal Width | Species |
---|---|---|---|---|
6.8 | 2.8 | 4.8 | 1.4 | ?????? |
We'll be using a pre-built Deep Neural Network classifier provided by TensorFlow. A neural network is a network of inputs (in this case four: one for each measurement), outputs (in this case three: one for each possible species), and some number of interconnected "hidden" layers in between. Anything with more than one hidden layer is a "deep" neural network. There is no real magic formula for how many hidden layers you should have or how many nodes each one should contain; you generally have to tweak this until you find a balance of performance and accuracy that fits your problem. We're arbitrarily using two hidden layers with 10 nodes each. That may be overkill here, but this is just an example.
The full source code is in main.py. The snippets below assume a couple of definitions up front: the imports, the list of measurement column names, and the list of species (both taken from the dataset itself).

```python
import pandas as pd
import tensorflow as tf

# Column names for the four measurements, and the three possible species.
measurements = ["SepalLength", "SepalWidth", "PetalLength", "PetalWidth"]
species = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]

data = pd.read_csv("./iris.data.csv", names=measurements + ["Species"])
test_features = data.sample(frac=0.2)            # hold out 20% for testing
train_features = data.drop(test_features.index)  # train on the rest
train_labels = train_features.pop("Species")     # separate labels from features
test_labels = test_features.pop("Species")
```
In this snippet, we're just setting up the data. We need to split the data into training and testing sets because one of the major concerns with a machine learning algorithm is over-fitting the training data such that the model doesn't generalize. In our case, we save 20% of the data for testing and use the rest for training. Then we separate the labels (the value we're trying to figure out, in our case the species of the iris) from the features (the values we'll use to figure out the species).
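One note: sample picks a random 20% of the rows on every run, which is part of why accuracy varies a bit between runs. If you want a reproducible split, pandas lets you pin the random seed (the seed value here is arbitrary):

```python
# Optional: fix the seed so the train/test split is the same on every run.
test_features = data.sample(frac=0.2, random_state=42)
```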
We're using pandas here because it's concise, and TensorFlow has some pre-built functions for reading in data from pandas which will make our lives easier later on.
```python
columns = [tf.feature_column.numeric_column(key=column) for column in measurements]
```
In this snippet, we're taking the names of the measurements (the columns of our dataset we'll use to determine the species) and mapping each one to a TensorFlow feature column. Feature columns tell the classifier how to interpret the data it's seeing. In this case, the measurements are all numeric, so this is pretty simple.
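If the dataset had a non-numeric column, the feature column would do more work. As a purely hypothetical illustration (the Iris dataset has no such column), a string-valued feature would be declared as a categorical column and one-hot encoded for the DNN with an indicator column:

```python
# Hypothetical: a string-valued "Color" column would need a categorical
# feature column, one-hot encoded for the DNN via an indicator column.
color = tf.feature_column.categorical_column_with_vocabulary_list(
    key="Color", vocabulary_list=["red", "blue", "purple"])
color_column = tf.feature_column.indicator_column(color)
```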
```python
classifier = tf.estimator.DNNClassifier(
    feature_columns=columns,
    hidden_units=[10, 10],        # two hidden layers of 10 nodes each
    label_vocabulary=species,
    n_classes=len(species))
```
In this snippet, we're creating a Deep Neural Network classifier, which is a pre-built high-level estimator provided by TensorFlow.
A classification algorithm is a machine learning algorithm where the result is the probability of an item being a member of each of several "classes". For the Iris dataset, this is exactly what we need, because we're determining the probability of a flower's species being setosa, versicolor, or virginica based on its measurements.
If we wanted to do the opposite, and determine a measurement based on the species and the other measurements, we'd use a regression algorithm. TensorFlow supplies several pre-built classifiers and regressors to choose from.
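For example, a minimal sketch of a regressor that predicts petal width from the other three measurements might look like this (an illustration, not part of this project's code):

```python
# Hypothetical: regress petal width from the other three measurements.
regressor_columns = [tf.feature_column.numeric_column(key=c)
                     for c in ["SepalLength", "SepalWidth", "PetalLength"]]
regressor = tf.estimator.DNNRegressor(
    feature_columns=regressor_columns,
    hidden_units=[10, 10])
```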
```python
train_input_fn = tf.estimator.inputs.pandas_input_fn(
    x=train_features, y=train_labels, shuffle=True, batch_size=50, num_epochs=100)
test_input_fn = tf.estimator.inputs.pandas_input_fn(
    x=test_features, y=test_labels, shuffle=True, batch_size=50, num_epochs=100)
```
In this snippet, we're defining two input functions. TensorFlow uses input functions to get data in batches from a data source. Fortunately, TensorFlow provides a built-in method for converting a pandas DataFrame into an input function. We specify the x (features) and y (labels) data sets we prepared earlier, instruct it to randomize the input, and iterate through it in batches of 50, repeating the original dataset up to 100 times. These are both somewhat arbitrary values, but since there are 150 rows in our initial data, this ensures that each batch is a somewhat different subset of the data, which helps reduce over-fitting.
If we weren't using pandas, we could use the tf.data Dataset API to create the input function instead.
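A minimal sketch of what that might look like, assuming the features are already in a dict mapping column names to arrays (the function and argument names here are illustrative):

```python
# Hypothetical: build an input function with tf.data instead of pandas.
def train_input_fn_from_arrays(features_dict, labels):
    dataset = tf.data.Dataset.from_tensor_slices((features_dict, labels))
    return dataset.shuffle(150).repeat(100).batch(50)
```

An estimator's input function may return a tf.data Dataset directly, so this could be passed to train just like the pandas-based one.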
```python
classifier.train(input_fn=train_input_fn)
```
This part is pretty self-explanatory. It runs through the data in the batches we set up above and trains the model based on the labels we supplied. We can also manually specify a "steps" parameter that tells it how many batches to run through, but by default it will just run until the input function has run to completion.
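For example, to cap training at 1,000 batches regardless of what the input function would supply:

```python
# Optional: stop after 1,000 batches even if the input function isn't exhausted.
classifier.train(input_fn=train_input_fn, steps=1000)
```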
```python
evaluation = classifier.evaluate(input_fn=test_input_fn)
```
This part is also pretty self-explanatory. It runs through the test data we set aside earlier, compares what the algorithm predicted against the known labels, and returns a result with the accuracy, among other statistics. If all goes well, this should be well above 95%.
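The result is a plain dict of metrics, so the accuracy can be printed directly ("accuracy" is one of the standard metrics DNNClassifier reports):

```python
# The evaluation result is a dict of metrics, including overall accuracy.
print("accuracy: {:.3f}".format(evaluation["accuracy"]))
```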
```python
predict_features = pd.DataFrame.from_dict({
    "SepalLength": [6.8],
    "SepalWidth": [2.8],
    "PetalLength": [4.8],
    "PetalWidth": [1.4]
})
predictions = classifier.predict(
    input_fn=tf.estimator.inputs.pandas_input_fn(x=predict_features, shuffle=False))
prediction = next(predictions)

class_id = prediction["class_ids"][0]                # index of the most likely class
probability = prediction["probabilities"][class_id]  # its probability
predicted_species = species[class_id]                # don't shadow the species list
print("prediction is {species} with {probability:.3f}% certainty".format(
    species=predicted_species, probability=probability * 100))
```
Now we can actually use the neural network to do something useful! It's trained and evaluated, so we'll use it to predict the species of a single iris based on its measurements. The predict method returns a generator of results. Since we just want a single result, we call next() on it and pull out the species and probability based on the class_id of the prediction.
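Because predict returns a generator, the same call handles several flowers at once; just pass longer lists and iterate over the results (the measurement values below are made up for illustration):

```python
# Hypothetical usage: classify several flowers in one predict call.
batch = pd.DataFrame.from_dict({
    "SepalLength": [5.0, 6.5],
    "SepalWidth": [3.4, 3.0],
    "PetalLength": [1.5, 5.5],
    "PetalWidth": [0.2, 2.0],
})
for p in classifier.predict(
        input_fn=tf.estimator.inputs.pandas_input_fn(x=batch, shuffle=False)):
    print(species[p["class_ids"][0]])
```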