I have trained the official TensorFlow DeepLab model (https://github.com/tensorflow/models/tree/master/research/deeplab) on the CelebAMask-HQ dataset (https://github.com/switchablenorms/CelebAMask-HQ) to obtain a model that can semantically segment all facial regions (e.g. eyes, nose, etc.). I have spent little time on hyperparameter tuning and mostly use the default settings:
CUDA_VISIBLE_DEVICES=0 python "${WORK_DIR}"/deeplab/train.py \
  --logtostderr \
  --train_split="train" \
  --model_variant="xception_65" \
  --atrous_rates=6 \
  --atrous_rates=12 \
  --atrous_rates=18 \
  --output_stride=16 \
  --decoder_output_stride=4 \
  --train_crop_size="1025,1025" \
  --train_batch_size=2 \
  --training_number_of_steps=45000 \
  --fine_tune_batch_norm=false \
  --tf_initial_checkpoint="${WORK_DIR}/deeplab/pretrained_models/deeplabv3_pascal_trainval/model.ckpt" \
  --train_logdir="${WORK_DIR}/deeplab/logs" \
  --dataset="celeba" \
  --dataset_dir="${WORK_DIR}/deeplab/datasets/celeba/tfrecords_padded/"
The only thing I have adapted is the class weights: I calculate them from the fraction of all pixels in the dataset that belong to each class. The calculated ratios are:
class_ratio = {
    0: 0.287781127731224,     # BG
    1: 0.31428004829848194,   # hair
    2: 0.25334614328648697,   # face
    3: 0.008209905199792278,  # brows
    4: 0.0044636011242926155, # eyes
    5: 0.020564768086557928,  # nose
    6: 0.004150659950132944,  # u_lip
    7: 0.00680743101856918,   # l_lip
    8: 0.0030163743167156494, # mouth
    9: 0.040800302545885576,  # neck
    10: 0.008106960279456135, # ears
    11: 0.03355246488702522,  # clothes
    12: 0.009293231642880359, # hat
    13: 0,                    # ear_ring -> 0
    14: 0,                    # glasses -> 0
    15: 0                     # necklace -> 0
}
As the class weights I take 1/<class_ratio>, so the class weight for the background is about 3.47 and for the brows about 121.8 (the classes with ratio 0 keep weight 0).
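For concreteness, the weighting step can be sketched as follows (only a subset of the classes is shown here; the zero-ratio classes are mapped to weight 0 to avoid division by zero):

```python
# Inverse-frequency class weights from the pixel ratios above (subset shown).
class_ratio = {
    0: 0.287781127731224,     # BG
    1: 0.31428004829848194,   # hair
    3: 0.008209905199792278,  # brows
    13: 0,                    # ear_ring -> 0
}

# Classes with zero pixels would otherwise divide by zero; give them
# weight 0 so they are effectively ignored by the loss.
class_weights = {
    c: (1.0 / r if r > 0 else 0.0) for c, r in class_ratio.items()
}

print(round(class_weights[0], 2))  # background, ≈ 3.47
print(round(class_weights[3], 2))  # brows, ≈ 121.8
```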
Finally, I apply some data augmentation, such as rotation, flipping, and brightness changes.
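The augmentation logic looks roughly like the following pure-NumPy sketch (my actual pipeline applies equivalent TensorFlow ops inside the input pipeline; the function name and parameters here are just illustrative). The key point is that geometric transforms must be applied identically to the image and its label mask so they stay aligned, while brightness only touches the image:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image, label):
    """Random flip, 90-degree rotation, and brightness jitter.

    `image` is float32 in [0, 1] with shape (H, W, C); `label` is an
    integer mask with shape (H, W).
    """
    # Horizontal flip: one random decision, applied to both tensors.
    if rng.random() > 0.5:
        image = image[:, ::-1]
        label = label[:, ::-1]

    # Random 90-degree rotation (a simple stand-in for the
    # arbitrary-angle rotation used in the real pipeline).
    k = int(rng.integers(0, 4))
    image = np.rot90(image, k)
    label = np.rot90(label, k)

    # Brightness jitter on the image only; mask values are class IDs
    # and must not be changed.
    image = np.clip(image + rng.uniform(-0.2, 0.2), 0.0, 1.0)
    return image, label
```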
My results are fairly good, but something noticeable happens when I feed some of my training images into the model. Below is the original segmentation:
And here is the segmented result:
As you can see, the segmented result is quite good, but the smaller classes in particular (eyes, brows, nose) are not as 'tightly' segmented as I would like. In essentially every image the model segments, the eyes, nose, and brows come out larger than in the ground-truth segmentation. I would therefore like to change a few hyperparameters to obtain tighter segmentation results for the smaller classes.
Any suggestions on a possible approach to obtain tighter results for the smaller classes? My current approach of deriving the class weights from the absolute fraction of pixels belonging to each class works reasonably well, but perhaps an alternative way of calculating the class weights would work better? Or a different underlying model architecture that is capable of finer segmentation?
Any help is appreciated. Thanks!