Image classification models
Image classification is the task of assigning an image to one or more predefined classes. This document describes SambaStudio’s image classification model, Vit_B_Classification.
Data preparation
In an image classification task, the classification data typically consists of a set of images along with corresponding labels. The images can be in any standard format, such as JPEG (Joint Photographic Experts Group) or PNG (Portable Network Graphics).
Training dataset requirements
An uploaded dataset is expected to have the following components:
- A directory of images.
- A labels.csv file, which maps each image location to its label and identifies whether the image belongs to the train, test, or validation split.
- A class_to_idx.json file [Optional]. This file maps each class's verbose name to its class index, providing a way to retrieve the human-readable interpretation of the index number that corresponds to a specific class label.
The uploaded data should have a directory structure similar to the example below.
.
└── data_root/
├── images/
├── labels.csv
└── class_to_idx.json # Optional
Image formats
JPEG (.jpg extension) and PNG (.png extension) are the allowed formats. All images should be three-channel RGB with uint8 encoding. For example, if the initial images have a fourth alpha channel, it must be removed during the dataset processing step.
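As a minimal sketch of that processing step (the file paths here are hypothetical), the following snippet uses Pillow to drop an alpha channel so the saved image is three-channel RGB:
from PIL import Image

# Open an image that may contain a fourth alpha channel (hypothetical path).
img = Image.open('raw/sample_rgba.png')

# Convert to three-channel RGB (uint8) if the image is not already in that mode.
if img.mode != 'RGB':
    img = img.convert('RGB')

img.save('images/sample.png')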
Labels CSV
You are required to provide a .csv file specifying each image-label pair, along with an indicator of whether the data should be treated as training, test, or validation data. This information is denoted by the column headers described below:
- The image_path header denotes the relative path to a given image inside the dataset directory.
- The label header denotes the class ID for the image, ranging from [0..n-1], where n is the number of classes. In the case of multi-label classification, labels are separated by a space if a sample has multiple labels present.
- The subset header denotes one of train, test, or validation, indicating whether the image is in the training, test, or validation set.
- The metadata header [Optional] denotes information relating to the given input data.
$ column -s, -t caltech256.csv | head -n 4
image_path label subset metadata
./images/138.mattress/138_0117.jpg 0 train
./images/138.mattress/138_0103.jpg 0 3 11 validation
./images/138.mattress/138_0088.jpg 0 train
Here, the column command is used to pretty-print the .csv file for display purposes.
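The sketch below (hypothetical paths and labels) shows one way to assemble such a file with pandas; the column names match the headers described above. The CIFAR 100 example later in this document builds a complete labels.csv the same way.
import pandas as pd

# Hypothetical rows; image paths are relative to the dataset root.
rows = [
    {'image_path': './images/cat/cat_001.jpg', 'label': '0', 'subset': 'train', 'metadata': ''},
    {'image_path': './images/dog/dog_014.jpg', 'label': '1', 'subset': 'validation', 'metadata': ''},
    # Multi-label sample: labels separated by a space.
    {'image_path': './images/mixed/img_007.jpg', 'label': '0 1', 'subset': 'train', 'metadata': ''},
]

df = pd.DataFrame(rows, columns=['image_path', 'label', 'subset', 'metadata'])
df.to_csv('labels.csv', index=False)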
The class index mapping file
The class_to_idx.json file maps each human-interpretable class name to its class index. The expected format of this file is a dictionary of string-to-index mappings.
$ python -m json.tool imagenet1000.json | head
{
"tench, Tinca tinca": 0,
"goldfish, Carassius auratus": 1,
"great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias": 2,
"tiger shark, Galeocerdo cuvieri": 3,
"hammerhead, hammerhead shark": 4,
"electric ray, crampfish, numbfish, torpedo": 5,
"stingray": 6,
"cock": 7,
"hen": 8,
The app does not check for the existence of this file or validate its correctness.
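As a small illustration, assuming a class_to_idx.json in the format shown above, the mapping can be inverted to look up the human-readable name for a class index:
import json

with open('class_to_idx.json') as f:
    class_to_idx = json.load(f)

# Invert the mapping so a class index can be resolved to its verbose name.
idx_to_class = {idx: name for name, idx in class_to_idx.items()}

print(idx_to_class[6])  # "stingray" with the ImageNet mapping shown above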
CIFAR 100 example
Download the CIFAR 100 data from https://www.cs.toronto.edu/~kriz/cifar.html
import asyncio
import aiofiles
from io import BytesIO
from PIL import Image
import pickle
from pathlib import Path
import pandas as pd

# Change to False if only the labels.csv file needs to be processed
SAVE_IMAGE = True

# The async code will open too many files at one time. Let's limit this
num_of_max_files_open = 200

data_dir = Path('./data')
data_dir.mkdir(exist_ok=True)

def unpickle(file):
    with open(file, 'rb') as fo:
        data = pickle.load(fo, encoding='bytes')
    return data

def load_subset(subset):
    data = unpickle(f'./cifar-100-python/{subset}')
    filenames = data[b'filenames']
    labels = data[b'fine_labels']
    images = data[b'data']
    assert len(labels) == len(images)
    assert len(filenames) == len(images)
    return filenames, labels, images

async def save_image(path: str, image: memoryview) -> None:
    async with aiofiles.open(path, "wb") as file:
        await file.write(image)

async def write_image(filename, label, image, subset, row, sem):
    subset_dir = data_dir / subset
    filepath = subset_dir / filename.decode()
    async with sem:
        # Each row of the array stores a 32x32 colour image. The first 1024 entries contain the red channel values,
        # the next 1024 the green, and the final 1024 the blue. The image is stored in row-major order, so that the
        # first 32 entries of the array are the red channel values of the first row of the image.
        if SAVE_IMAGE:
            image = image.reshape(3, 32, 32).transpose(1, 2, 0)
            img = Image.fromarray(image)
            buffer = BytesIO()
            img.save(buffer, format='png')
            await save_image(filepath, buffer.getbuffer())
        if row % 100 == 0:
            print(f"{row:05d}", flush=True)
    if subset == 'test':
        # We use the ``test`` set as the ``validation`` set in this example
        subset = 'validation'
    return [str(filepath.relative_to(data_dir)), str(label), subset, str('')]

async def process_subset(subset):
    subset_dir = data_dir / subset
    subset_dir.mkdir(exist_ok=True)
    tasks = []
    sem = asyncio.Semaphore(num_of_max_files_open)
    filenames, labels, images = load_subset(subset)
    for row, sample in enumerate(zip(filenames, labels, images)):
        tasks.append(asyncio.ensure_future(write_image(*sample, subset=subset, row=row, sem=sem)))
    results = await asyncio.gather(*tasks)
    df = pd.DataFrame(results, columns=["image_path", "label", "subset", "metadata"])
    return df

async def main():
    print("Processing training images")
    train_df = await process_subset('train')
    print("Processing test images")
    test_df = await process_subset('test')
    df = pd.concat([train_df, test_df])
    df.to_csv(data_dir / 'labels.csv', index=False)

asyncio.run(main())
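The script above writes only the images and labels.csv. If you also want the optional class_to_idx.json, one possible sketch (assuming the same extracted cifar-100-python directory, whose meta file lists the verbose fine label names) is:
import json
import pickle

# The CIFAR-100 python archive includes a 'meta' file with the verbose fine label names.
with open('./cifar-100-python/meta', 'rb') as fo:
    meta = pickle.load(fo, encoding='bytes')

fine_label_names = [name.decode() for name in meta[b'fine_label_names']]

# Map each verbose class name to its class index, matching the labels written by the script above.
class_to_idx = {name: idx for idx, name in enumerate(fine_label_names)}

with open('./data/class_to_idx.json', 'w') as f:
    json.dump(class_to_idx, f, indent=4)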
Batch inference dataset requirements
A batch prediction dataset is expected to have the following components:
- A directory of images.
- A predictions.csv file, which lists the location of each image.
A dataset formatted for batch prediction should have a directory structure similar to the example below.
.
└── data_root/
├── images/
└── predictions.csv
Predictions CSV
The predictions.csv file has the same format as labels.csv; however, the label, subset, and metadata columns are ignored.
$ column -s, -t predictions.csv | head -n 4
image_path label subset metadata
138.mattress/138_0117.jpg 0 train
138.mattress/138_0103.jpg 0 3 11 validation
138.mattress/138_0088.jpg 0 train
Outputs
The outputs returned for a given prediction sample will be the probabilities over all output classes of the model.
[
{'input': 'path/relative/to/root_dir', 'predictions': [.1, .3, .6]}
...
]
There will be one input-prediction pair for each image passed to the infer API.
The probabilities associated with a prediction will only sum to 1 in the multi-class (single-label) case. For multi-label classification, the classification probability of each class is returned independently, so the outputs are not guaranteed to sum to 1.
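As an illustrative sketch (the response structure follows the example above; the class_to_idx.json lookup is an assumption about your own mapping file), the top predicted class for each input can be recovered like this:
import json

# Example response in the format shown above (hypothetical values).
response = [
    {'input': 'images/sample_001.png', 'predictions': [0.1, 0.3, 0.6]},
]

with open('class_to_idx.json') as f:
    class_to_idx = json.load(f)
idx_to_class = {idx: name for name, idx in class_to_idx.items()}

for item in response:
    probs = item['predictions']
    # Index of the highest-probability class for this input.
    best_idx = max(range(len(probs)), key=probs.__getitem__)
    print(item['input'], idx_to_class.get(best_idx, best_idx), probs[best_idx])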
Hyperparameters and settings
The hyperparameters and settings for the Image classification models when creating a training job are described below.
Parameter | Definition | Default value | Allowed values
---|---|---|---
 | Dropout rate used in the attention block during training. | 0.0 | 0 ≤ x < 1
 | Number of samples in a batch. | 64 | 1 or 64
 | Dropout rate used in the multilayer perceptron (MLP) during training. | 0.0 | 0 ≤ x < 1
 | Max learning rate used in the OneCycleLR scheduler. | 0.0001 | 0 < x
 | Number of steps between logging metrics. | 50 | 1 ≤ x
 | Enables multilabel classification instead of the original multiclass classification. | false | true or false
 | Number of times to loop over the training dataset. | 3 | 1 ≤ x
 | Total number of classes to predict. | 257 | 1 ≤ x ≤ 1000
 | Weight decay throughout the training run. | 0.0 | 0 ≤ x
Inference settings
The inference settings for Image classification models when creating a batch inference job are described below.
Parameter | Definition | Value | Allowed values
---|---|---|---
 | Number of samples in a batch. | 64 | 1 or 64
 | Enables multilabel classification instead of the original multiclass classification. | false | true or false
 | Total number of classes to predict. | 257 | 1 ≤ x ≤ 1000