Viraj Deshwal
3 min readJul 16, 2019

Japanese vs Korean girl Classification Challenge

Hello Everyone,

My name is Viraj Deshwal. I am sharing a blog for the classify the ethnicity of Japanese and Korean girls(no offense). The main motive to share this article is how to solve a real-world problem using deep learning and by making your own dataset from scrapping the images from google using the javascipt

urls = Array.from(document.querySelectorAll(‘.rg_di .rg_meta’)).map(el=>JSON.parse(el.textContent).ou);

window.open(‘data:text/csv;charset=utf-8,’ + escape(urls.join(‘\n’)));

(The above script is provided by Jeremy Howard in his Fast.ai course.)

I have started by scrapping the images of Korean and Japanese girls from image.google.com. Once I scrapped the images I prepared the dataset by-

Preprocessing step- crop center and normalize each image. Once the dataset is pre-processed the dataset looks something like this.

The main challenge in creating the dataset was huge variation in the dataset and as the dataset is prepared from the random images from google, there was huge age variation and animated images as well.

Once the dataset is prepared. I used CNN’s with Resnet-50 and trained it for 30 epochs on my slow and old computer :).

Once the training is done. I traced the learning_rate and figured it out that my learning rate is performing so poor.

I saved my initial weights. And re-trained the model with the learning rate range between

model.fit_one_cycle(30, max_lr = slice(3e-5,3e-4)

I got, little better Loss this time. Also, the model was bad(more false positive) due to badly strutured dataset.

Later, I checked the top_losses(N) and removed the images which was classified badly and was creating a mess in the model.

Once, these images are being removed from the dataset. I trained again and it performed little better.

In the end, the model was able to classify the ethnicity of the girls from Korea and Japan.

Results —

Conclusion — In solving the real-world problems the main challenge is to make a good dataset. Focusing more on the dataset rather than training helps alot in solving real-world challenge. After, tuning the dataset and observing what exactly happening inside the model, the model was able to classify the images correctly.

GitHub link — https://github.com/VirajDeshwal/DeepLearningWithPyTorch/tree/master/Custom_Dataset

Special thanks to Jeremy Howard.

No responses yet