Fruits Recognition using Neural Networks Techniques

Muhammad Imran Zaman
Published in The Startup · 11 min read · Jan 16, 2021


Introduction

Computer vision methods and strategies can help recognize fruits from basic features such as color, intensity, shape, and texture. The term “recognize” here means to predict the name of the fruit. In this project, we are going to use 81 different fruit classes and train the model using TensorFlow.

Problem description

The goal is to build a robust system that recognizes fruits according to their color, intensity, shape, and texture.

Evaluation measures

After training the model, we will apply evaluation measures to check how well it predicts. We will use the following measures to evaluate the model's performance:

  • Accuracy
  • Plots of training and validation scores

Technical Approach

We are using the Python language and Jupyter Notebook, which support machine learning and data science projects. We will build a TensorFlow-based model and use the Fruits-360 dataset, whose providers supply the training and test data separately. After training the model, we will evaluate it to check the performance of the trained model.

Source of Data

https://data.mendeley.com/datasets/rp73yg93n8/1

We are using the following versions of the libraries:

  • NumPy == 1.18.5
  • TensorFlow == 1.7.0
  • keras == 2.4.3
  • matplotlib == 3.3.2

How can we install the libraries in Python?

Installing a Python library is very easy:

  • pip install name_of_library
  • For example, to install TensorFlow: pip install tensorflow

Exploring the data is not necessary for training the model, but it is good practice to look at the dataset so that we can understand what type of data we are using and how to handle it.

Labels :

Labels are the targets; in this project, the names of the fruits are the labels.

Inputs :

Inputs are the data we feed into the machine learning model; in this project, the images are the inputs.

Training Data

We use training data to train the models. We feed it to machine learning and deep learning models so that the model can learn from the data.

Validation Data

We use validation data while training the model, to evaluate how the model is performing during training.

Testing Data

We use testing data after training the model, to evaluate how the model performs on data it has not seen. First we get predictions from the trained model without giving it the labels, and then we compare the true labels with the predictions to measure the model's performance.

  • We load all the training and testing data (a hedged loading sketch follows this list)
  • The inputs/images of the training data are saved in x_train and the labels in y_train
  • The inputs/images of the testing data are saved in x_test and the labels in y_test
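
The loading code itself is not shown here; the sketch below is one minimal way to do it, assuming the standard Fruits-360 folder layout (one sub-folder per class) and hypothetical fruits-360/Training and fruits-360/Test paths:

import numpy as np
from sklearn.datasets import load_files
from keras.preprocessing.image import load_img, img_to_array
from keras.utils import to_categorical

def load_dataset(path):
    data = load_files(path, load_content=False)  # collects file paths and integer labels per sub-folder
    files = np.array(data['filenames'])
    targets = to_categorical(np.array(data['target']), 81)  # one-hot vectors of length 81
    images = np.array([img_to_array(load_img(f, target_size=(100, 100)))
                       for f in files])  # 100x100x3 RGB arrays
    return images, targets

x_train, y_train = load_dataset('fruits-360/Training')  # hypothetical path
x_test, y_test = load_dataset('fruits-360/Test')        # hypothetical path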

Classes of fruits

  • There are 81 labels/classes, which are the names of the fruits, because we are using 81 types of fruit.

Vector of the first y_train record

  • Since we are using 81 classes/labels, each label is a vector of 81 values.
  • Only one entry is 1 and all the others are zeros; the position of the 1 indicates the label of the first image.
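
A quick way to inspect this (a minimal sketch, assuming the one-hot encoded y_train from the loading step above):

print(y_train[0])             # one-hot vector of length 81: a single 1, zeros elsewhere
print(np.argmax(y_train[0]))  # position of the 1, i.e. the class index of the first image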

Now, we have to divide the test set into test and validation sets

  • We are splitting part of the test data off as validation data. The validation data will be used while training the model to check performance during training, and the remaining test data will be used after training (see the sketch below).
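
The split itself is not shown here; this is a minimal sketch. The 7,000-sample validation size matches the training log further below, but is otherwise an assumption:

x_valid, y_valid = x_test[:7000], y_test[:7000]  # first 7,000 test samples become validation data
x_test, y_test = x_test[7000:], y_test[7000:]    # the rest remain as test data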

Until now the data consists of images. We need to convert them into array form for training and testing, because the machine learning model only understands numeric data.

Feature scaling from 0–255 to 0–1

  • After converting the images into array form, the feature/pixel values range from 0 to 255. We scale them from the 0–255 range to the 0–1 range.
  • Why are we doing this?
  • Scaling reduces the training time: the model could learn from the same features either way, but training takes longer when calculating with values in 0–255 than in 0–1. That is why we scale the features; a sketch follows this list.
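
A minimal sketch of the scaling step (assuming the x_train, x_valid, and x_test arrays from above):

x_train = x_train.astype('float32') / 255  # pixel values 0-255 become 0-1
x_valid = x_valid.astype('float32') / 255
x_test = x_test.astype('float32') / 255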

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Activation, Dropout, Flatten, Dense

def tensorflow_based_model():
    model = Sequential()  # step 1
    model.add(Conv2D(filters=16, kernel_size=2, input_shape=(100, 100, 3), padding='same'))  # step 2
    model.add(Activation('relu'))  # step 3
    model.add(MaxPooling2D(pool_size=2))  # step 4
    model.add(Conv2D(filters=32, kernel_size=2, activation='relu', padding='same'))  # repeating steps 2 and 3, with 32 filters
    model.add(MaxPooling2D(pool_size=2))  # repeating step 4
    model.add(Conv2D(filters=64, kernel_size=2, activation='relu', padding='same'))  # repeating steps 2 and 3, with 64 filters
    model.add(MaxPooling2D(pool_size=2))  # repeating step 4
    model.add(Conv2D(filters=128, kernel_size=2, activation='relu', padding='same'))  # repeating steps 2 and 3, with 128 filters
    model.add(MaxPooling2D(pool_size=2))  # repeating step 4
    model.add(Dropout(0.3))  # step 5
    model.add(Flatten())  # step 6
    model.add(Dense(150))  # step 7
    model.add(Activation('relu'))  # step 3
    model.add(Dropout(0.4))  # step 5
    model.add(Dense(81, activation='softmax'))  # steps 3 and 7, with softmax for 81 classes (use sigmoid for two classes)
    return model  # return the built model

Step 1

- We start by calling the base Sequential model, to which layers are added and parameters tuned for the image data. We must call it when working with Keras and other TensorFlow-based libraries.

Step 2

- Conv2D is the 2D convolutional layer, where filters are applied to the original image to produce feature maps that reduce the number of features. The Conv2D layer creates the convolution kernel (a fixed-size window that slides over the image) and here takes 16 filters to produce a tensor of outputs. We give it input images of width 100 and height 100, with 3 channels for RGB.
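
As a quick shape check (a minimal sketch, assuming the standalone Keras listed above), a 2x2 kernel with 'same' padding keeps the 100x100 spatial size while producing 16 feature maps:

from keras.models import Model
from keras.layers import Input, Conv2D

inp = Input(shape=(100, 100, 3))
out = Conv2D(filters=16, kernel_size=2, padding='same')(inp)
print(Model(inp, out).output_shape)  # (None, 100, 100, 16)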

Step 3

- An activation function is a node placed at the end of or between the layers of the neural network. It helps decide which neurons should fire, so the activation function of a node defines the output of that node given an input or set of inputs.
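
For example, the ReLU activation used above simply zeroes out negative inputs; a minimal NumPy sketch:

import numpy as np

def relu(x):
    return np.maximum(0, x)  # negative values become 0, positive values pass through

print(relu(np.array([-2.0, 0.0, 3.5])))  # [0.  0.  3.5]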

Step 4

- Max pooling is a pooling operation that keeps the maximum value in each patch of each feature map. It downsamples the input and prepares a smaller vector for the next layers.
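
A tiny numeric illustration of the pool_size=2 operation used above, where each 2x2 patch keeps only its largest value:

import numpy as np

patch = np.array([[1, 3],
                  [2, 4]])
print(patch.max())  # 4 -- max pooling reduces this 2x2 patch to a single value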

Step 5

- The dropout layer randomly drops some neurons from the previous layer. Why do we apply this? To avoid overfitting, where the model achieves good accuracy at training time but not at testing time.

Step 6

- The flatten layer converts the 2D arrays into a 1D array of all the features.

Step 7

  • The dense layer reduces the outputs by taking the inputs from the flatten layer. It uses all the inputs from the previous layer's neurons, performs its calculations, and sends out 150 outputs. A summary sketch follows.
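
To verify steps 1 through 7 and the output shape of each layer, we can print a summary (a minimal sketch using the function defined above):

model = tensorflow_based_model()
model.summary()  # prints each layer's output shape and parameter count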

Model compilation

  • First, we build the model by calling the function defined above.
  • We are using 81 classes, so we set the loss to 'categorical_crossentropy'; for two classes we would use 'binary_crossentropy'.
  • The optimizer is a function used to adjust the attributes of the neural network, such as the learning rate (how fast the model learns from the features), in order to reduce the loss.
  • We set metrics=['accuracy'] because we are going to calculate the percentage of correct predictions over all predictions on the validation set.
model = tensorflow_based_model()  # here we call the function that creates the model
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

Training the model with parameter tuning

  • We feed the training data and validation data to start training the model.
  • We set the following parameters:
  • batch_size=32, so the model takes 32 images in each iteration and trains on them. Batch size is a machine learning term for the number of training examples used in one iteration.
  • epochs=30, so the model passes over the training data 30 times. Epoch is a machine learning term for the number of passes over the entire training dataset the algorithm has completed.
  • We can choose batch_size and epochs as we want. Good practice is to set some values and train the model; if it does not give good results, we change them and train again. We can repeat this process until we get good results, and this process is called parameter tuning.
from keras.callbacks import ModelCheckpoint

# The checkpoint matches the log below: save the best weights by validation loss.
checkpointer = ModelCheckpoint('cnn_from_scratch_fruits.hdf5', verbose=1, save_best_only=True)
history = model.fit(x_train, y_train,
                    batch_size=32,
                    epochs=30,
                    validation_data=(x_valid, y_valid),
                    callbacks=[checkpointer],
                    verbose=2, shuffle=True)
Train on 41322 samples, validate on 7000 samples
Epoch 1/30
- 533s - loss: 1.3395 - acc: 0.6205 - val_loss: 0.3280 - val_acc: 0.8994
Epoch 00001: val_loss improved from inf to 0.32804, saving model to cnn_from_scratch_fruits.hdf5
Epoch 2/30
- 443s - loss: 0.2564 - acc: 0.9130 - val_loss: 0.1623 - val_acc: 0.9446
Epoch 00002: val_loss improved from 0.32804 to 0.16229, saving model to cnn_from_scratch_fruits.hdf5
Epoch 3/30
- 402s - loss: 0.1590 - acc: 0.9472 - val_loss: 0.0849 - val_acc: 0.9776
Epoch 00003: val_loss improved from 0.16229 to 0.08487, saving model to cnn_from_scratch_fruits.hdf5
Epoch 4/30
- 423s - loss: 0.1314 - acc: 0.9575 - val_loss: 0.1032 - val_acc: 0.9661
Epoch 00004: val_loss did not improve from 0.08487
Epoch 5/30
- 409s - loss: 0.1156 - acc: 0.9636 - val_loss: 0.0924 - val_acc: 0.9756
Epoch 00005: val_loss did not improve from 0.08487
Epoch 6/30
- 424s - loss: 0.1141 - acc: 0.9650 - val_loss: 0.1059 - val_acc: 0.9714
Epoch 00006: val_loss did not improve from 0.08487
Epoch 7/30
- 379s - loss: 0.1120 - acc: 0.9668 - val_loss: 0.0924 - val_acc: 0.9767
Epoch 00007: val_loss did not improve from 0.08487
Epoch 8/30
- 361s - loss: 0.1003 - acc: 0.9696 - val_loss: 0.0829 - val_acc: 0.9737
Epoch 00008: val_loss improved from 0.08487 to 0.08288, saving model to cnn_from_scratch_fruits.hdf5
Epoch 9/30
- 395s - loss: 0.1009 - acc: 0.9715 - val_loss: 0.0851 - val_acc: 0.9781
Epoch 00009: val_loss did not improve from 0.08288
Epoch 10/30
- 404s - loss: 0.1061 - acc: 0.9725 - val_loss: 0.1030 - val_acc: 0.9729
Epoch 00010: val_loss did not improve from 0.08288
Epoch 11/30
- 393s - loss: 0.1099 - acc: 0.9704 - val_loss: 0.0878 - val_acc: 0.9727
Epoch 00011: val_loss did not improve from 0.08288
Epoch 12/30
- 395s - loss: 0.1158 - acc: 0.9712 - val_loss: 0.0641 - val_acc: 0.9804
Epoch 00012: val_loss improved from 0.08288 to 0.06410, saving model to cnn_from_scratch_fruits.hdf5
Epoch 13/30
- 383s - loss: 0.1082 - acc: 0.9735 - val_loss: 0.0908 - val_acc: 0.9767
Epoch 00013: val_loss did not improve from 0.06410
Epoch 14/30
- 395s - loss: 0.1190 - acc: 0.9716 - val_loss: 0.0703 - val_acc: 0.9816
Epoch 00014: val_loss did not improve from 0.06410
Epoch 15/30
- 385s - loss: 0.1220 - acc: 0.9731 - val_loss: 0.1323 - val_acc: 0.9779
Epoch 00015: val_loss did not improve from 0.06410
Epoch 16/30
- 387s - loss: 0.1209 - acc: 0.9728 - val_loss: 0.1414 - val_acc: 0.9796
Epoch 00016: val_loss did not improve from 0.06410
Epoch 17/30
- 384s - loss: 0.1268 - acc: 0.9738 - val_loss: 0.0637 - val_acc: 0.9866
Epoch 00017: val_loss improved from 0.06410 to 0.06373, saving model to cnn_from_scratch_fruits.hdf5
Epoch 18/30
- 384s - loss: 0.1292 - acc: 0.9729 - val_loss: 0.1115 - val_acc: 0.9774
Epoch 00018: val_loss did not improve from 0.06373
Epoch 19/30
- 386s - loss: 0.1251 - acc: 0.9743 - val_loss: 0.0794 - val_acc: 0.9841
Epoch 00019: val_loss did not improve from 0.06373
Epoch 20/30
- 385s - loss: 0.1280 - acc: 0.9755 - val_loss: 0.1141 - val_acc: 0.9801
Epoch 00020: val_loss did not improve from 0.06373
Epoch 21/30
- 383s - loss: 0.1398 - acc: 0.9750 - val_loss: 0.1096 - val_acc: 0.9819
Epoch 00021: val_loss did not improve from 0.06373
Epoch 22/30
- 388s - loss: 0.1377 - acc: 0.9749 - val_loss: 0.1193 - val_acc: 0.9829
Epoch 00022: val_loss did not improve from 0.06373
Epoch 23/30
- 382s - loss: 0.1443 - acc: 0.9752 - val_loss: 0.2706 - val_acc: 0.9670
Epoch 00023: val_loss did not improve from 0.06373
Epoch 24/30
- 377s - loss: 0.1576 - acc: 0.9737 - val_loss: 0.1450 - val_acc: 0.9804
Epoch 00024: val_loss did not improve from 0.06373
Epoch 25/30
- 381s - loss: 0.1446 - acc: 0.9758 - val_loss: 0.0561 - val_acc: 0.9849
Epoch 00025: val_loss improved from 0.06373 to 0.05615, saving model to cnn_from_scratch_fruits.hdf5
Epoch 26/30
- 377s - loss: 0.1663 - acc: 0.9748 - val_loss: 0.1021 - val_acc: 0.9854
Epoch 00026: val_loss did not improve from 0.05615
Epoch 27/30
- 376s - loss: 0.1482 - acc: 0.9778 - val_loss: 0.1740 - val_acc: 0.9789
Epoch 00027: val_loss did not improve from 0.05615
Epoch 28/30
- 376s - loss: 0.1456 - acc: 0.9778 - val_loss: 0.0845 - val_acc: 0.9827
Epoch 00028: val_loss did not improve from 0.05615
Epoch 29/30
- 376s - loss: 0.1461 - acc: 0.9771 - val_loss: 0.1312 - val_acc: 0.9836
Epoch 00029: val_loss did not improve from 0.05615
Epoch 30/30
- 379s - loss: 0.1637 - acc: 0.9765 - val_loss: 0.0561 - val_acc: 0.9887
Epoch 00030: val_loss improved from 0.05615 to 0.05614, saving model to cnn_from_scratch_fruits.hdf5

We need all of the above configuration to train the model; if the settings are not correct, we will not get the desired results.

Accuracy score on the test data

  • Accuracy is the number of correctly recognized images out of all the images.
  • For example, if the trained model recognizes 90 images correctly and 10 images wrongly out of a total of 100 images, then the accuracy score is 90%.
  • Accuracy = total number of correct predictions / total number of predictions
acc_score = model.evaluate(x_test, y_test)  # we are testing the model here
print('\n', 'Test accuracy:', acc_score[1])
Test accuracy: 0.9853133634805368
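
Note that model.evaluate here uses the weights from the final epoch. Since the checkpoint above saved the best weights by validation loss, we could optionally reload them before testing (a minimal sketch):

model.load_weights('cnn_from_scratch_fruits.hdf5')  # restore the best checkpoint
print('Test accuracy:', model.evaluate(x_test, y_test)[1])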

Visualization with prediction

  • We use the trained model to get predictions on the test data
  • The name outside the parentheses is the predicted fruit
  • The name inside the parentheses is the true label
predictions = model.predict(x_test)
fig = plt.figure(figsize=(16, 9))
for i, idx in enumerate(np.random.choice(x_test.shape[0], size=16, replace=False)):
    ax = fig.add_subplot(4, 4, i + 1, xticks=[], yticks=[])
    ax.imshow(np.squeeze(x_test[idx]))
    pred_idx = np.argmax(predictions[idx])  # predicted class index
    true_idx = np.argmax(y_test[idx])       # true class index
    # target_labels is the list of the 81 fruit names in class-index order
    ax.set_title("{} ({})".format(target_labels[pred_idx], target_labels[true_idx]),
                 color=("green" if pred_idx == true_idx else "red"))

Visualization of the loss and accuracy with respect to epochs

  • We look at the history of the model at each epoch, as we trained our model for 30 epochs.
  • The blue line shows the training accuracy and the training loss.
  • The orange line shows the validation accuracy and the validation loss.
  • Accuracy on the training and validation data climbs close to 1 (100%), while the loss falls toward zero.
plt.figure(1)  
plt.subplot(211)
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.subplot(212)
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()

Conclusion 📝

  • We used the Fruits-360 dataset and explored the data in different ways.
  • We prepared the image data and extracted the features.
  • We trained a TensorFlow-based model with all the settings described above.
  • We evaluated the model with accuracy and looked at its performance with plots.
  • If you are interested in working on any image-based project, prepare the data as we did in this project; only small changes to the code may be needed, such as the number of classes or the loss function.
  • We worked on a classification problem, specifically multi-class classification, because we used 81 classes of fruits in total.

You can follow me at https://www.kaggle.com/imranzaman5202
