📱 Mobile Machine Learning

Detecting Age And Gender with TF-Lite On Android

Designing and training a custom TF Keras model

Shubham Panchal

Published in

Becoming Human: Artificial Intelligence Magazine

16 min read · May 18, 2021

We’ve all performed age, gender or emotion detection in Python with TensorFlow Keras. For most of us, a simple Keras model with Conv2D layers or a VGG-16 backbone might have given satisfactory results.

In this story, we implement two Keras models for age and gender estimation, whose sole purpose will be to run on Android. As we’ll deploy our models on an Android device, we’ll pay attention to making them fast while still expecting satisfactory results on our dataset.

Starting from the UTKFace dataset, we build our model and train it, finally exporting it to the TFLite format. Here’s the GitHub repo for the project,

shubham0204/Age-Gender_Estimation_TF-Android


The README section is all you need to get started with the project. The above project is an Android app, so in order to train the models, we’ve provided two notebooks, one for age estimation and the other for gender classification.

This story only discusses the Python implementation of our models. The Android implementation, i.e. the design of the app in the above GitHub repo, will be covered in the next blog.

Without wasting time, let’s get started!

📀 Dataset
🔨 Processing the Dataset
🤖 Model
🏋️‍♀️ Training the Model
✈️ Exporting the models to TensorFlow Lite format
📊 Results
👨‍💻 More projects/stories from the author ( More Mobile-ML projects! )
Final Words
References

📀 Dataset

To perform age and gender estimation, we’ll require a dataset which has human faces annotated with these two features. With a quick Google Search, you’ll come across the UTKFace dataset.

The UTKFace dataset has over 20K images annotated with facial landmarks, age, gender and ethnicity of the subjects. Best of all, the authors also provide cropped face images as a separate dataset, which are extracted using the dlib package.

dlib provides a super fast face detector which can be used to crop and align multiple faces present in an image.

The owners provide the dataset as a Google Drive folder, which we need to download on our machine. Each image has three labels written in its filename, like,

[age]_[gender]_[race]_[date&time].jpg

For our use-case, we’ll only use the first two labels i.e. age and gender .
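As a quick illustration of this naming scheme, here is a minimal sketch in plain Python ( no TensorFlow needed ) that pulls the age and gender out of a file name. The helper name parse_utkface_name and the example file name are made up for this illustration; the notebooks do the same job with tf.strings.split , as shown later.

```python
import os

def parse_utkface_name(path):
    """Return (age, gender) parsed from a UTKFace-style file name."""
    name = os.path.basename(path)   # drop the directory part of the path
    parts = name.split("_")         # [age, gender, race, date&time.jpg]
    age = int(parts[0])             # first field: age in years
    gender = int(parts[1])          # second field: gender label (0 or 1)
    return age, gender

# Hypothetical file name, made up for illustration only.
example = "data/utkface23k/25_0_2_20170116174525125.jpg"
age, gender = parse_utkface_name(example)
print(age, gender)
```

The remaining two fields ( race and date&time ) are simply ignored, which matches our use-case.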

To start working with the Colab notebook, make sure you’ve uploaded the dataset ( the UTKFace.tar.gz file from their Google Drive folder ) to your personal Google Drive account.
We upload the tar.gz file to a personal Google Drive because we’ll mount Google Drive in the Colab notebook; a separate code cell is provided for this. Extracting the archive and parsing the files both take place in the Colab notebook.

We’ll now move ahead and see how we can process our dataset so that it is ready to train the model.

🔨 Processing the Dataset

Once we’ve mounted Google Drive, we’re ready to parse the images. First, we’ll unzip the utkface_23k.zip file into a folder named data ,

# Replace with your path!
!unzip -q /content/drive/MyDrive/Datasets/utkface_23k.zip -d data

Next, we rename the unzipped folder, so as to remove the unwanted character from the name of the directory,

!mv data/utkface_23k data/utkface23k

From here, we have two separate procedures to process the data, one for age estimation and the other for gender classification. In both notebooks, this operation is performed in a single code cell, but they differ in how the target variable is encoded.

In both cases, we only use 20K images for our models: 14K images for training and 6K images for testing.
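The training snippets further below refer to train_ds and test_ds . As a rough sketch of how such a split works out, the plain-Python example below reproduces the 14K / 6K numbers from a 0.3 test fraction. The list stands in for the parsed dataset; with tf.data , the analogous calls would be dataset.take( num_test ) and dataset.skip( num_test ) . Which portion goes to testing is our assumption here, so treat the notebooks in the repo as the reference.

```python
# Values used in the notebooks.
NUM_SAMPLES = 20000
TRAIN_TEST_SPLIT = 0.3   # fraction of the samples reserved for testing

num_test = round(NUM_SAMPLES * TRAIN_TEST_SPLIT)
num_train = NUM_SAMPLES - num_test

# A plain list stands in for the dataset of ( image , label ) pairs.
samples = list(range(NUM_SAMPLES))
test_samples = samples[:num_test]    # analogous to dataset.take( num_test )
train_samples = samples[num_test:]   # analogous to dataset.skip( num_test )

print(len(train_samples), len(test_samples))   # 14000 6000
```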

* Processing the data for Age Estimation

We treat age estimation as a regression problem. We expect our model to output a normalized age value for a given sample. In the dataset, the maximum value for label “age” is 116 years. So, we divide each label by 116 to normalize it. Hence, our model will output a value whose range is ( 0 , 1 ] .

import tensorflow as tf

# Image size for our model.
MODEL_INPUT_IMAGE_SIZE = [ 200 , 200 ]

# Fraction of the dataset to be used for testing.
TRAIN_TEST_SPLIT = 0.3

# Number of samples to take from dataset
N = 20000

# This method will be mapped for each filename in `list_ds`. 
def parse_image( filename ):

    # Read the image from the filename and resize it.
    image_raw = tf.io.read_file( filename )
    image = tf.image.decode_jpeg( image_raw , channels=3 ) 
    image = tf.image.resize( image , MODEL_INPUT_IMAGE_SIZE ) / 255

    # Split the filename to get the age and the gender. Convert the age ( str ) and the gender ( str ) to dtype float32.
    parts = tf.strings.split( tf.strings.split( filename , '/' )[ 2 ] , '_' )

    # Normalize
    age = tf.strings.to_number( parts[ 0 ] ) / 116

    return image , age

# List all the image files in the given directory.
list_ds = tf.data.Dataset.list_files( 'data/utkface23k/*' , shuffle=True )

# Map `parse_image` method to all filenames.
dataset = list_ds.map( parse_image , num_parallel_calls=tf.data.AUTOTUNE )
dataset = dataset.take( N )

First, we list all filenames using tf.data.Dataset.list_files and store them in a tf.data.Dataset object.
We map the parse_image() method on each filename. The method parse_image() returns two tensors, image and age .
We use the tf.io.read_file() and tf.image.decode_jpeg() methods to read the image file as a tensor and then resize it to MODEL_INPUT_IMAGE_SIZE with the help of tf.image.resize . Also, we divide by 255 to normalize the pixels of the image.
Next, to parse the age from the filename, we split it with a suitable separator with tf.strings.split() .
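Since the model is trained on normalized ages, its output has to be multiplied by 116 again to get back a value in years. Here is a small plain-Python sketch of that round trip; the helper names normalize_age and denormalize_age are ours and do not appear in the notebooks.

```python
MAX_AGE = 116   # maximum "age" label in the dataset, as stated above

def normalize_age(age_years):
    """Map an age in years to the range ( 0 , 1 ]."""
    return age_years / MAX_AGE

def denormalize_age(model_output):
    """Map a normalized value back to years."""
    return model_output * MAX_AGE

y = normalize_age(29)
print(y)                                  # 0.25
print(denormalize_age(y))                 # 29.0

# The same scaling applies to errors: a normalized MAE of 0.05
# corresponds to roughly 5.8 years.
print(round(denormalize_age(0.05), 1))    # 5.8
```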

* Processing the data for Gender Classification

To perform gender classification, we expect our model to output a probability distribution for two labels male and female . In order to do so, we one-hot encode the two labels, just as we do for any other classification problem.

import tensorflow as tf

# Image size for our model.
MODEL_INPUT_IMAGE_SIZE = [ 128 , 128 ]

# Fraction of the dataset to be used for testing.
TRAIN_TEST_SPLIT = 0.3

# Number of samples to take from dataset
NUM_SAMPLES = 20000

# Trick to one-hot encode the label.
y1 = tf.constant( [ 1. , 0. ] , dtype='float32' ) 
y2 = tf.constant( [ 0. , 1. ] , dtype='float32' ) 

# This method will be mapped for each filename in `list_ds`. 
def parse_image( filename ):

    # Read the image from the filename and resize it.
    image_raw = tf.io.read_file( filename )
    image = tf.image.decode_jpeg( image_raw , channels=3 ) 
    image = tf.image.resize( image , MODEL_INPUT_IMAGE_SIZE ) / 255

    # Split the filename to get the age and the gender. Convert the age ( str ) and the gender ( str ) to dtype float32.
    parts = tf.strings.split( tf.strings.split( filename , '/' )[ 2 ] , '_' )

    # One-hot encode the label
    gender = tf.strings.to_number( parts[ 1 ] )
    gender_onehot = ( gender * y2 ) + ( ( 1 - gender ) * y1 )

    return image , gender_onehot

# List all the image files in the given directory.
list_ds = tf.data.Dataset.list_files( 'data/utkface23k/*' , shuffle=True )
# Map `parse_image` method to all filenames.
dataset = list_ds.map( parse_image , num_parallel_calls=tf.data.AUTOTUNE )
dataset = dataset.take( NUM_SAMPLES )

We’ll highlight some of the changes present in Snippet 2 ( differences from Snippet 1 ):

The gender classification model takes in 128 * 128 RGB images as input. This is evident from MODEL_INPUT_IMAGE_SIZE = [ 128 , 128 ] as observed in the first line of the snippet.
We create two variables y1 and y2 which represent two tensors ( with dtype=tf.float32 ) whose values are [ 1. , 0. ] and [ 0. , 1. ] respectively. As you might have guessed, these are one-hot encodings for the two labels male and female . In the parse_image() method, we assign a value of either y1 or y2 to gender_onehot based on the value of gender .
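To see why the expression ( gender * y2 ) + ( ( 1 - gender ) * y1 ) picks the right encoding, here is the same arithmetic written with plain Python lists instead of tensors. This is only a sketch for intuition; the function name one_hot_gender is ours. The appeal of the trick is that it selects between two vectors without any branching, which is convenient inside a function that is mapped over a tf.data.Dataset .

```python
Y1 = [1.0, 0.0]   # encoding selected when gender == 0
Y2 = [0.0, 1.0]   # encoding selected when gender == 1

def one_hot_gender(gender):
    # gender * y2 + ( 1 - gender ) * y1 , computed element by element
    return [gender * b + (1 - gender) * a for a, b in zip(Y1, Y2)]

print(one_hot_gender(0.0))   # [1.0, 0.0]
print(one_hot_gender(1.0))   # [0.0, 1.0]
```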

🤖 Model

* Motivation

On performing a quick Google Search ( like “age estimation keras” ), we come across several GitHub repositories which demonstrate age and gender estimation models in Keras. Here are some of those,

age-gender estimation by yu4u
face-prediction by ianforme ( to be checked )
face_age_gender by CVxTz
Gender and Age Detection using Keras and OpenCV —Techvidvan

Our problem statement is,

To implement two models in Android, one for age estimation and another for gender classification. As these models are to be deployed on an Android smartphone ( with relatively low computational power ), we need models with fewer parameters, thereby resulting in a lower inference time*. The models should produce satisfactory results on the UTKFace dataset, indicating that they generalize well.

* : We’ll make use of the term “inference time” frequently in this story. Informally, it means the time taken by our model to perform a single inference ( i.e. to execute a single forward pass ).

This motivated us to train a custom NN architecture, as the existing ones ( implemented by the developers mentioned above ) were larger, both in the no. of parameters and in file size, which did not fit our problem statement.

Another approach which most developers might suggest is Transfer Learning, which has been widely adopted by the machine learning community. In our case, this option had its own constraints, as attaching a different architecture ( as a backbone for our model ) like InceptionV3 ( Christian Szegedy et al, 2015 ), ResNet ( Kaiming He et al, 2015 ) or MobileNets ( Andrew G. Howard et al, 2017 ) would lead to a significant increase in the no. of parameters of the model. One could argue for a *frozen backbone, since its parameters are not updated during training and the no. of trainable parameters stays small. However, freezing does not reduce the inference time or the file size of the model, as the frozen parameters are still stored and used to make a prediction.

*: Same as setting model.trainable=False in Keras.

Interestingly, we use the same architecture for both age and gender estimation. The only difference is the output layer: the age estimation NN produces a continuous output ( in ( 0 , 1 ] ) whereas the gender classification NN outputs a probability distribution.

Here’s a high-level overview of what our model looks like,

Fig-1: A high-level overview of our model.

The “Convolutional Layers” in the diagram depict a set of convolution blocks whose structure is Conv2D -> BatchNorm -> LeakyReLU , which is described in the next section.

* The Convolutional Layers

As discussed above, the convolutional layers actually consist of blocks, where each block has the structure Conv2D -> BatchNorm -> LeakyReLU . The no. of blocks to be included in the model is determined by the variable num_blocks .

Another feature of our model is that we provide two versions of it, one which uses vanilla ( standard ) convolutions and another which uses separable convolutions. We refer to the model using separable convolutions as the “lite” model in the README of the GitHub repo.
Whether we train a “lite” model or a “vanilla” model ( the one which uses standard convolutions ) is determined by the variable lite_model .
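To get a feel for why the “lite” model is smaller, we can count the weights of a single layer by hand. The sketch below assumes 3 x 3 kernels, no bias terms and a depth multiplier of 1 ( the Keras default for SeparableConv2D ); the function names are ours and the channel sizes are just examples.

```python
def conv2d_params(c_in, c_out, k=3):
    # Standard convolution: one k x k x c_in filter per output channel.
    return k * k * c_in * c_out

def separable_conv2d_params(c_in, c_out, k=3):
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes the channels
    return depthwise + pointwise

for c_in, c_out in [(3, 16), (64, 128), (256, 256)]:
    standard = conv2d_params(c_in, c_out)
    separable = separable_conv2d_params(c_in, c_out)
    print(c_in, c_out, standard, separable)
```

For the 256 -> 256 case, this gives 589824 weights for the standard convolution against 67840 for the separable one, which is close to a 9x reduction for that layer.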

# Negative slope coefficient for LeakyReLU.
leaky_relu_alpha = 0.2

# A lite model uses Separable Convolutions
lite_model = True

# Define the conv block.
def conv( x , num_filters , kernel_size=( 3 , 3 ) , strides=1 ):
    if lite_model:
        x = tf.keras.layers.SeparableConv2D( num_filters ,
                                            kernel_size=kernel_size ,
                                            strides=strides, 
                                            use_bias=False ,
                                            kernel_initializer=tf.keras.initializers.HeNormal() ,
                                            kernel_regularizer=tf.keras.regularizers.L2( 1e-5 )
                                             )( x )
    else:
        x = tf.keras.layers.Conv2D( num_filters ,
                                   kernel_size=kernel_size ,
                                   strides=strides ,
                                   use_bias=False ,
                                   kernel_initializer=tf.keras.initializers.HeNormal() ,
                                   kernel_regularizer=tf.keras.regularizers.L2( 1e-5 )
                                    )( x )

    x = tf.keras.layers.BatchNormalization()( x )
    x = tf.keras.layers.LeakyReLU( leaky_relu_alpha )( x )
    return x

As observed, all parameters provided to Conv2D and SeparableConv2D are identical.
tf.keras.layers.BatchNormalization layer allows us to use Batch Normalization ( Sergey Ioffe et al., 2015 ), a technique which normalizes incoming signals ( from the convolutional layer, in our case ), so as to reduce internal covariate shift. Batch Normalization has been widely adopted in the ML community as it enables the utilization of larger learning rates and also regularizes the model.

Fig 2: Equations depicting the high-level overview of Batch Normalization. Source: Batch Normalization ( Sergey Ioffe et al., 2015 )
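For readers who cannot see the figure, the Batch Normalization transform can be written out as below. This is the standard form for a mini-batch of m values x_1 ... x_m , with learnable scale γ and shift β and a small constant ε for numerical stability; the notation is ours and may differ slightly from the figure.

```latex
\begin{aligned}
\mu_{\mathcal{B}} &= \frac{1}{m} \sum_{i=1}^{m} x_i \\
\sigma_{\mathcal{B}}^{2} &= \frac{1}{m} \sum_{i=1}^{m} \left( x_i - \mu_{\mathcal{B}} \right)^{2} \\
\hat{x}_i &= \frac{x_i - \mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^{2} + \epsilon}} \\
y_i &= \gamma \hat{x}_i + \beta
\end{aligned}
```

Note that adding a constant b to every x_i shifts the mean by the same b , so it cancels out in the numerator of the third equation. The learnable shift β already plays the role of a bias.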

It has been adopted in other popular architectures as well like MobileNets and DenseNets ( Gao Huang et al, 2016 ).

Fig 3: Implementation of Batch Normalization in MobileNets. Source: MobileNets ( Andrew G. Howard et al, 2017 )

As a side note, we set use_bias=False in the convolutional layers ( both standard and separable convolutions ) as the bias becomes redundant when the layer is followed by Batch Normalization [ 1 ].
We add L2 weight regularization, which helps reduce overfitting by directly penalizing the parameters of the convolutional layer ( i.e. the filters of the convolutional layer ). The weight decay constant ( 1e-5 ) was taken from [ 2 ].
LeakyReLU ( Bing Xu et al, 2015 ) is a variant of the ReLU ( Rectified Linear Unit ) activation function which returns x * alpha for inputs x < 0 and x otherwise. Setting alpha to 0 gives a standard ReLU function. It helps solve the dying-ReLU problem as described in [ 3 ].
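The LeakyReLU definition above fits in a couple of lines of plain Python. This is only a scalar sketch for intuition, using the same alpha = 0.2 as our model; the Keras layer applies the same rule element-wise to a tensor.

```python
def leaky_relu(x, alpha=0.2):
    # Positive inputs pass through; negative inputs are scaled by alpha.
    return x if x >= 0 else alpha * x

print([leaky_relu(v) for v in (-2.0, 0.0, 3.0)])   # [-0.4, 0.0, 3.0]

# With alpha = 0, negative inputs are clipped to zero, i.e. a standard ReLU.
print(leaky_relu(-2.0, alpha=0.0) == 0.0)          # True
```

Because negative inputs still produce a small non-zero output ( and gradient ), a unit cannot get permanently stuck at zero.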

This completes the discussion on the convolutional layers of our model. Next, we’ll discuss the Dense layers and the model compilation.

* Dense Layers and Compiling the model

# Dense layers
def dense( x , filters , dropout_rate ):
    x = tf.keras.layers.Dense( filters , kernel_regularizer=tf.keras.regularizers.L2( 0.1 ) , bias_regularizer=tf.keras.regularizers.L2( 0.1 ) )( x )
    x = tf.keras.layers.LeakyReLU( alpha=leaky_relu_alpha )( x )
    x = tf.keras.layers.Dropout( dropout_rate )( x )
    return x

# No. of convolution layers to be added.
num_blocks = 6
# Num filters for each conv layer.
num_filters = [ 16 , 32 , 64 , 128 , 256 , 256 ]
# Kernel sizes for each conv layer.
kernel_sizes = [ 3 , 3 , 3 , 3 , 3 , 3 ]

# Init a Input Layer.
inputs = tf.keras.layers.Input( shape=MODEL_INPUT_IMAGE_SIZE + [ 3 ] )

# Add conv blocks sequentially
x = inputs
for i in range( num_blocks ):
    x = conv( x , num_filters=num_filters[ i ] , kernel_size=kernel_sizes[ i ] )
    x = tf.keras.layers.MaxPooling2D()( x )

# Flatten the output of the last Conv layer.
x = tf.keras.layers.Flatten()( x )
conv_output = x

Similar to conv() , we implement a dense() method which creates a set of Dense -> LeakyReLU -> Dropout layers.

We add Dropout ( Nitish Srivastava et al, 2014 ) following every Dense layer in our model. Dropout is a regularization technique which randomly sets activations to 0 with a certain probability. It helps reduce interdependent learning among neurons.

Once we’ve defined the conv block as shown in Snippet 3, we’re ready to stack these sets of layers end-to-end. The no. of blocks is determined by the num_blocks variable. The no. of filters in each convolutional layer ( of each block ) is retrieved from the num_filters array. The same goes for the kernel size, as retrieved from the kernel_sizes array. The variable conv_output holds the flattened output of the convolutional layers, which is later passed to the Dense layers.
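It is worth checking how quickly the feature maps shrink as the blocks are stacked. The sketch below tracks only the spatial size, assuming the Keras defaults that the snippets rely on: 'valid' padding with stride 1 for the convolution and a 2 x 2 max-pooling with stride 2. The function name spatial_sizes is ours.

```python
def spatial_sizes(size, num_blocks, kernel_size=3, pool_size=2):
    """Spatial size after each ( conv -> max-pool ) block."""
    sizes = []
    for _ in range(num_blocks):
        size = size - kernel_size + 1   # 'valid' convolution, stride 1
        size = size // pool_size        # max-pooling, pool size 2, stride 2
        sizes.append(size)
    return sizes

sizes = spatial_sizes(200, num_blocks=6)
print(sizes)                     # [99, 48, 23, 10, 4, 1]

# Length of the flattened vector: height * width * filters of the last block.
print(sizes[-1] ** 2 * 256)      # 256
```

So for the 200 x 200 age model, conv_output is a vector of 256 values. Under the same assumptions, a 128 x 128 input is already down to 2 x 2 after five blocks, which is too small for a sixth 3 x 3 'valid' convolution, so the block count or padding for that input size would need to differ; the notebook in the repo is the reference there.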

Snippet 5 shows the outputs for the age estimation model.

# Add Dense layers ( Dense -> LeakyReLU -> Dropout )
x = dense( conv_output , 256 , 0.6 )
x = dense( x , 64 , 0.4 )
x = dense( x , 32 , 0.2 )
outputs = tf.keras.layers.Dense( 1 , activation='relu' )( x )

# Build the Model
model = tf.keras.models.Model( inputs , outputs )

Snippet 6 shows the outputs for the gender classification model. The softmax activation function outputs a probability distribution over the two labels male and female .

# Add Dense layers ( Dense -> LeakyReLU -> Dropout )
x = dense( conv_output , 256 , 0.6 )
x = dense( x , 64 , 0.4 )
x = dense( x , 32 , 0.2 )
outputs = tf.keras.layers.Dense( 2 , activation='softmax' )( x )

# Build the Model
model = tf.keras.models.Model( inputs , outputs )
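To make the softmax output concrete, here is a plain-Python sketch of how two raw scores turn into probabilities and then into a label. The index order ( 0 for male, 1 for female ) follows the one-hot encodings y1 and y2 from the data processing step; the function names and the example scores are made up for illustration.

```python
import math

LABELS = ["male", "female"]   # index 0 <-> [ 1 , 0 ] , index 1 <-> [ 0 , 1 ]

def softmax(logits):
    # Subtract the maximum first for numerical stability.
    shifted = [v - max(logits) for v in logits]
    exps = [math.exp(v) for v in shifted]
    total = sum(exps)
    return [e / total for e in exps]

def decode_gender(probabilities):
    # Pick the label with the highest probability.
    index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return LABELS[index], probabilities[index]

probs = softmax([0.2, 1.5])   # hypothetical raw scores from the last layer
print(probs)
print(decode_gender(probs))
```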

We’ve defined the architecture for both the models; we now head towards the training part of the model.

🏋️‍♀️ Training the Model

The code used for training the two models differs to some extent. Hence, we’ll discuss them separately.

* For the Age Estimation Model

# Snippet 7
import os

# Initial learning rate
learning_rate = 0.001

num_epochs = 50 #@param {type: "number"}
batch_size = 128 #@param {type: "number"}

# Batch `train_ds` and `test_ds`.
train_ds = train_ds.batch( batch_size )
test_ds = test_ds.batch( batch_size )

# Init ModelCheckpoint callback
save_dir = 'models/{epoch:02d}-{val_mae:.2f}.h5'
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint( 
    save_dir , 
    save_best_only=True , 
    monitor='val_mae' , 
    mode='min' , 
)

tb_log_name = 'normalize_50'  #@param {type: "string"}
# Init TensorBoard Callback
logdir = os.path.join( "tb_logs" , tb_log_name )
tensorboard_callback = tf.keras.callbacks.TensorBoard( logdir )

def scheduler( epochs , learning_rate ):
    if epochs < num_epochs * 0.25:
        return learning_rate
    elif epochs < num_epochs * 0.5:
        return 0.0005
    elif epochs < num_epochs * 0.75:
        return 0.0001
    else:
        return 0.000095

lr_schedule_callback = tf.keras.callbacks.LearningRateScheduler( scheduler )

early_stopping_callback = tf.keras.callbacks.EarlyStopping( monitor='val_mae' , patience=10 )

model.compile( 
    loss=tf.keras.losses.mean_absolute_error ,
    optimizer = tf.keras.optimizers.Adam( learning_rate ) , 
    metrics=[ 'mae' ]
)

We treat age estimation as a regression problem; we thereby use Mean Absolute Error ( MAE ) as the loss function to train our model. We conducted experiments with both the Mean Squared Error ( MSE ) and the MAE loss functions, and found that training with MAE as the loss reached a lower MAE ( our evaluation metric ).
We include a Learning Rate Schedule ( via tf.keras.callbacks.LearningRateScheduler ) to lower the learning rate ( LR ) after a certain number of epochs. Lowering the LR as the training progresses results in a smoother training process; an initially large LR accelerates training and decaying it ( after a certain no. of epochs ) avoids oscillations, resulting in faster convergence. See [ 4 ].
Early stopping is a regularization technique that stops the training of the model when the monitored validation metric ( val_mae here ) stops improving, which is a sign that the model has begun to overfit. See tf.keras.callbacks.EarlyStopping .
A tf.keras.callbacks.TensorBoard callback helps us visualize the training of our model, and a tf.keras.callbacks.ModelCheckpoint callback saves the model to the local disk at the end of an epoch, here only when val_mae improves ( save_best_only=True ).
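To see which learning rate the schedule actually produces in each epoch, we can run the scheduler function on its own, without Keras. The loop below mimics how the callback calls it: once per epoch, with a zero-based epoch index and the current learning rate. Only the loop and the printed epochs are our additions.

```python
num_epochs = 50

def scheduler(epoch, learning_rate):
    if epoch < num_epochs * 0.25:
        return learning_rate     # keep the initial learning rate
    elif epoch < num_epochs * 0.5:
        return 0.0005
    elif epoch < num_epochs * 0.75:
        return 0.0001
    else:
        return 0.000095

lr = 0.001   # initial learning rate
history = []
for epoch in range(num_epochs):
    lr = scheduler(epoch, lr)    # the returned value is used for this epoch
    history.append(lr)

# Print the learning rate around each boundary of the schedule.
for epoch in (0, 12, 13, 24, 25, 37, 38, 49):
    print(epoch, history[epoch])
```

With 50 epochs, the rate stays at 0.001 for epochs 0 to 12, drops to 0.0005 for epochs 13 to 24, to 0.0001 for epochs 25 to 37 and to 0.000095 from epoch 38 onwards.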

* For Gender Classification Model

# Snippet 8
import datetime
import os

learning_rate = 0.0001
num_epochs = 10 
batch_size = 128

train_ds = train_ds.batch( batch_size ).repeat( num_epochs )
test_ds = test_ds.batch( batch_size ).repeat( num_epochs )

save_dir = 'train-1/cp.ckpt'
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint( save_dir )

logdir = os.path.join( "tb_logs" , datetime.datetime.now().strftime("%Y%m%d-%H%M%S") )
tensorboard_callback = tf.keras.callbacks.TensorBoard( logdir )

early_stopping_callback = tf.keras.callbacks.EarlyStopping( monitor='val_accuracy' , patience=3 )

model.compile( 
    loss=tf.keras.losses.categorical_crossentropy , 
    optimizer = tf.keras.optimizers.Adam( learning_rate ) , 
    metrics =[ 'accuracy' ]
)

For our gender classification model, we use the Cross Entropy loss ( via tf.keras.losses.CategoricalCrossentropy), just as other classifiers do. To evaluate our model, we monitor the model’s accuracy over the testing data ( test_ds in our case ).
The tf.keras.callbacks.ModelCheckpoint and tf.keras.callbacks.TensorBoard callbacks have their respective functions, as described in the preceding section.
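For a one-hot label, the categorical cross-entropy reduces to the negative log of the probability assigned to the correct class. The plain-Python sketch below computes it for a single sample; the small eps guard against log( 0 ) and the example probabilities are our additions.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # -sum( y_true * log( y_pred ) ); eps guards against log( 0 ).
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

female = [0.0, 1.0]   # one-hot label

# Confident and correct prediction: small loss.
print(round(categorical_crossentropy(female, [0.1, 0.9]), 4))   # 0.1054

# Confident but wrong prediction: large loss.
print(round(categorical_crossentropy(female, [0.9, 0.1]), 4))   # 2.3026
```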

Once we’ve compiled our models with all suitable callbacks, we’re ready to call model.fit() on each of them to start the training.

# --------- Age Estimation model -----------------------

model.fit( 
    train_ds, 
    epochs=num_epochs,  
    validation_data=test_ds, 
    callbacks=[ checkpoint_callback , tensorboard_callback , lr_schedule_callback , early_stopping_callback ]
)

# --------- Gender classification model ----------------

model.fit( 
    train_ds, 
    epochs=num_epochs,  
    validation_data=test_ds, 
    callbacks=[ checkpoint_callback , tensorboard_callback , early_stopping_callback ]
)

✈️ Exporting the models to TensorFlow Lite format

Once we’ve trained both the models, we can now convert them to the TensorFlow Lite format, so that they can be used to perform inference on an Android device.

We’ll first save the Keras model as a .h5 file,

from google.colab import files  # Colab-only helper used for downloading

model_name = 'model_age' #@param {type: "string"}
model_name_ = model_name + '.h5'
model.save( model_name_ )
files.download( model_name_ )

We’ll use the tf.lite.TFLiteConverter API to convert our Keras models to the TF Lite format,

# ------------------ For a quantized model -----------------------

converter = tf.lite.TFLiteConverter.from_keras_model( model )
converter.optimizations = [ tf.lite.Optimize.DEFAULT ]
converter.target_spec.supported_types = [ tf.float16 ]
buffer = converter.convert()

open( '{}_q.tflite'.format( model_name ) , 'wb' ).write( buffer )
files.download( '{}_q.tflite'.format( model_name ) )


# ------------------ For a non-quantized model -------------------


converter = tf.lite.TFLiteConverter.from_keras_model( model )
buffer = converter.convert()

open( '{}_nonq.tflite'.format( model_name ) , 'wb' ).write( buffer )
files.download( '{}_nonq.tflite'.format( model_name ) )
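The “quantized” variant above asks the converter to store weights as float16 instead of float32. To get an intuition for what that trade-off means for a single weight, here is a plain-Python sketch using the struct module. It only illustrates the storage format and is not what the converter does internally; the weight value and the parameter count are made up.

```python
import struct

weight = 0.123456789   # hypothetical weight value

as_float32 = struct.pack("<f", weight)   # 32-bit float: 4 bytes
as_float16 = struct.pack("<e", weight)   # 16-bit float: 2 bytes

restored32 = struct.unpack("<f", as_float32)[0]
restored16 = struct.unpack("<e", as_float16)[0]

print(len(as_float32), len(as_float16))   # 4 2
print(restored32, restored16)

# Rough storage estimate for the weights of a hypothetical model.
num_params = 1_000_000
print(num_params * 4 / 1e6, "MB as float32")
print(num_params * 2 / 1e6, "MB as float16")
```

Each weight takes half the space at the cost of a small rounding error, which is why the float16 file should come out at roughly half the size of the non-quantized one.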