
1. Download Datasets and Unzip

The WebFace42M dataset can be obtained from https://www.face-benchmark.org/download.html.
After extraction, the raw WebFace42M data consists of 10 directories, named 0 to 9, each holding one of the 10 sub-datasets. The smaller training sets are subsets of these: WebFace4M corresponds to directory 0 only, and WebFace12M to directories 0, 1, and 2.
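
If you only need WebFace4M or WebFace12M, one option is to gather the corresponding numbered directories under a single root before building the .rec file. The sketch below is a hypothetical helper (`make_subset_root` is not part of InsightFace or MXNet) that symlinks the chosen directories instead of copying them. It assumes your im2rec version follows symbolic links when walking the directory tree; if it does not, copy the directories instead.

```shell
# Sketch: assemble a training root for a WebFace subset by symlinking
# numbered sub-dataset directories (0 to 9) from an extracted WebFace42M.
# make_subset_root is a hypothetical helper name.
# Usage: make_subset_root SRC_DIR DST_DIR DIR_NUMBER...
make_subset_root() {
  local src="$1" dst="$2"
  shift 2
  mkdir -p "$dst"
  local d
  for d in "$@"; do
    if [ ! -d "$src/$d" ]; then
      echo "missing sub-dataset directory: $src/$d" >&2
      return 1
    fi
    # Use an absolute target so the link resolves from any working directory;
    # -n replaces an existing link instead of creating a link inside it.
    ln -sfn "$(cd "$src/$d" && pwd)" "$dst/$d"
  done
}
```

For example, `make_subset_root /data/WebFace42M WebFace12M_Root 0 1 2` (paths hypothetical) gathers WebFace12M, and passing only `0` gathers WebFace4M. The resulting directory can then be used in place of `WebFace42M_Root` in the im2rec commands.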

2. Create Shuffled Rec File for DALI

Shuffled .rec files are essential for DALI: training from a .rec file that is not shuffled can degrade model performance. DALI's reader only shuffles within a limited in-memory buffer, so records stored grouped by identity stay largely grouped when they are read. The original InsightFace-style .rec files are not compatible with NVIDIA DALI, so use the mxnet.tools.im2rec command to generate a shuffled .rec file instead.
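
To confirm that a list produced by `mxnet.tools.im2rec --list` is actually shuffled before packing it into a .rec file, one rough check is the fraction of consecutive lines that share a label. This is a sketch with a hypothetical helper name, assuming im2rec's tab-separated `index<TAB>label<TAB>relative_path` list layout.

```shell
# Sketch: estimate how shuffled an im2rec .lst file is by measuring the
# fraction of consecutive lines that share the same label (column 2).
# Assumes the tab-separated layout: index<TAB>label<TAB>relative_path.
# lst_adjacent_same_label is a hypothetical helper name.
lst_adjacent_same_label() {
  awk -F '\t' '
    NR > 1 { pairs++; if ($2 == prev) same++ }
    { prev = $2 }
    END {
      if (pairs == 0) { print "0.000"; exit }
      printf "%.3f\n", same / pairs
    }
  ' "$1"
}
```

On a list grouped by identity, `lst_adjacent_same_label train.lst` prints a high value, because nearly every image is followed by another image of the same identity. On a shuffled list over many identities it should print a value close to 0.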

```
# Expected layout of your dataset (one sub-directory per identity)
/WebFace42M_Root
├── 0_0_0000000
│   ├── 0_0.jpg
│   ├── 0_1.jpg
│   ├── 0_2.jpg
│   ├── 0_3.jpg
│   └── 0_4.jpg
├── 0_0_0000001
│   ├── 0_5.jpg
│   ├── 0_6.jpg
│   ├── 0_7.jpg
│   ├── 0_8.jpg
│   └── 0_9.jpg
├── 0_0_0000002
│   ├── 0_10.jpg
│   ├── 0_11.jpg
│   ├── 0_12.jpg
│   ├── 0_13.jpg
│   ├── 0_14.jpg
│   ├── 0_15.jpg
│   ├── 0_16.jpg
│   └── 0_17.jpg
├── 0_0_0000003
│   ├── 0_18.jpg
│   ├── 0_19.jpg
│   └── 0_20.jpg
├── 0_0_0000004
```
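
Before generating the list, a quick count of identities and images can catch an incomplete extraction. The sketch below (hypothetical helper name) counts the directories that directly contain images, which is how `im2rec --list --recursive` assigns one label per folder, together with the total number of images. It assumes images use the .jpg, .jpeg, or .png extensions, and `find -L` follows symlinked sub-datasets.

```shell
# Sketch: count image-containing directories ("identities") and images
# under a dataset root laid out as one sub-directory per identity.
# dataset_stats is a hypothetical helper name.
dataset_stats() {
  local root="$1"
  local images dirs
  images=$(find -L "$root" -type f \( -iname '*.jpg' -o -iname '*.jpeg' -o -iname '*.png' \) | wc -l)
  # Strip the file name from each path, then count unique parent directories.
  dirs=$(find -L "$root" -type f \( -iname '*.jpg' -o -iname '*.jpeg' -o -iname '*.png' \) \
    | sed 's|/[^/]*$||' | sort -u | wc -l)
  # Arithmetic expansion drops any padding that wc adds.
  echo "identities=$((dirs)) images=$((images))"
}
```

For example, `dataset_stats WebFace42M_Root` prints one line such as `identities=<N> images=<M>`. Both numbers should match what you expect for the sub-datasets you extracted.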


```shell
# 0) Install dependencies (MXNet provides mxnet.tools.im2rec)
pip install mxnet opencv-python
apt-get update
apt-get install ffmpeg libsm6 libxext6 -y

# 1) Create train.lst with the following command (im2rec shuffles the list by default)
python -m mxnet.tools.im2rec --list --recursive train WebFace42M_Root

# 2) Create train.rec and train.idx from train.lst with the following command
python -m mxnet.tools.im2rec --num-thread 16 --quality 100 train WebFace42M_Root
```
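
Once step 1 has run, train.lst can be inspected directly. Assuming im2rec's tab-separated `index<TAB>label<TAB>relative_path` layout, the sketch below (hypothetical helper name) counts list entries and distinct labels; the label count should match the number of identity folders in your dataset root.

```shell
# Sketch: summarize an im2rec .lst file.
# Assumes the tab-separated layout: index<TAB>label<TAB>relative_path.
# lst_summary is a hypothetical helper name.
lst_summary() {
  awk -F '\t' '
    # Count every entry, and count each label the first time it appears.
    { n++; if (!($2 in seen)) { seen[$2] = 1; k++ } }
    END { printf "images=%d labels=%d\n", n, k }
  ' "$1"
}
```

For example, `lst_summary train.lst` prints the number of images and labels found in the list.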

After both steps you will have three files: train.lst, train.rec, and train.idx. Only train.rec and train.idx are used for training.
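
Before starting training, a simple consistency check can be useful. MXNet's indexed RecordIO writes one `key<TAB>byte_offset` line to the .idx file per packed record, so the number of lines in train.idx is the number of records in train.rec. If it is smaller than the number of entries in train.lst, some images were probably skipped during packing (for example, unreadable files). The sketch below uses a hypothetical helper name and assumes all three files share one prefix.

```shell
# Sketch: check that PREFIX.rec, PREFIX.idx and PREFIX.lst exist and are
# non-empty, and that the .idx (one line per record) matches the .lst.
# check_rec_files is a hypothetical helper; PREFIX defaults to "train".
check_rec_files() {
  local prefix="${1:-train}"
  local f
  for f in "$prefix.rec" "$prefix.idx" "$prefix.lst"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      return 1
    fi
  done
  local n_idx n_lst
  n_idx=$(wc -l < "$prefix.idx")
  n_lst=$(wc -l < "$prefix.lst")
  if [ "$((n_idx))" -ne "$((n_lst))" ]; then
    echo "records=$((n_idx)) list_entries=$((n_lst)): some images were not packed" >&2
    return 1
  fi
  echo "records=$((n_idx))"
}
```

Running `check_rec_files train` in the output directory prints the record count on success and returns a non-zero status otherwise, so it can also guard a training script.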