Deep Learning · Computer Vision · Developmental Psychology

Do Infants Smile Like Adults? Modelling the Development of Emotional Expressions Using Deep Learning

A comparative deep learning study investigating how facial emotional expressions evolve from infancy to adulthood. Two ResNet-18 models — one trained on infant faces, one on adult faces — are evaluated across five developmental age groups to identify when childhood expressions begin to resemble adult-like patterns.

Course: Machine Learning Practical (MLP)
Institution: University of Edinburgh
Year: 2024–25
Role: Facial Age Estimation Module
Abstract

Understanding the developmental transition from infant to adult facial emotional expression is crucial for both neurodevelopmental psychology and practical caregiving. This study trains two separate deep neural networks on infant and adult facial expression datasets respectively. Using the Tromsø Infant Faces Database (TIF) and AffectNet, the data are preprocessed and segmented into distinct age groups via a ResNet-50 age estimator. Two modified ResNet-18 architectures incorporating a combined distance loss enhance discriminative feature learning. Experiments reveal that the infant model's accuracy declines linearly with test-subject age, suggesting that distinct infant expression features diminish over time. Conversely, the adult model performs best on older children and teenagers, indicating a nonlinear trajectory toward adult-like expression.

Research Question

"At what developmental stage does a child's facial emotional expression become recognisably adult-like — and can deep learning models trained on infant or adult faces capture this transition?"

26,270 total images · 7 emotion classes · 5 age groups · 2 DNN models
Section 03 — Dataset & Task

Two Complementary Datasets

Primary — Infant Data
Tromsø Infant Faces Database (TIF)
Maack et al., 2017

Images of 119 infants aged 9–12 months, each displaying between 4 and 7 validated facial expressions (687 images in total). Captured in a controlled environment with expert-validated emotional labels across seven discrete categories: neutral, happy, sad, disgusted, angry, afraid, and surprised.

Secondary — Adult & Cross-Age Data
AffectNet
Mollahosseini et al., 2017

Over 1,000,000 facial images annotated for seven discrete emotions. No individual age labels are provided, only a population mean of 33.01 years (SD 16.96), so per-image age labels were assigned using a ResNet-50 age estimator, enabling developmental stratification.

Final combined dataset: 26,270 images across 5 age groups
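Because AffectNet carries no per-image ages, each image's estimated age must be binned into one of the five developmental groups. A minimal sketch of that binning, assuming cut-offs that follow the nominal ranges reported in the results (infant 1–3, younger child 4–8, older child 8–12, teenager 13–17, adult 18+); the project's exact thresholds may differ:

```python
def age_to_group(age: float) -> str:
    """Map a continuous predicted age to a developmental group.

    The cut-offs are an assumption based on the nominal ranges in the
    results table; the project's exact boundaries may differ.
    """
    if age < 4:
        return "infant"
    if age < 9:
        return "younger_child"
    if age < 13:
        return "older_child"
    if age < 18:
        return "teenager"
    return "adult"
```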
Figure 2 — TIF Database

Four infant faces from the Tromsø Infant Faces Database (A02F10HA1, A12M5SA, A15M12HA2, A18M7SU1) displaying happy, sad, happy, and surprise expressions. Ages range from 10 to 18 months.

Figure 3 — AffectNet

Sample images from AffectNet illustrating the wide variability in age, pose, lighting, and demographic background across all seven emotion categories (anger, disgust, fear, happy, neutral, sad, surprise).

Figure 4 — Age Estimation Pipeline

The pretrained ResNet-50 age estimator predicts a continuous age value (e.g. 31) from a facial image. The prediction is compared against the ground-truth label (e.g. 34) and loss is backpropagated to update the network weights.

Figure 5 — Emotion Distribution by Age Group

Per-emotion image counts across five developmental age groups (toddler, younger child, older child, teenager, adult). The adult split dominates with thousands of images per emotion; the toddler split has as few as 2 images for disgust, illustrating the severe class imbalance.

Section 04 — Methodology

Key Technical Contributions

The subsections below explore the methodology, design rationale, and trade-offs behind each contribution.

Section 04.1 — Model Architecture

Modified ResNet-18 Backbone

Figure 6 — ResNet-18 Architecture

ResNet-18 backbone used for both models. The network processes 224×224 px inputs through four groups of residual blocks (64→128→256→512 channels) with skip connections, followed by average pooling, a fully-connected layer, and a softmax output for 7 emotion classes.

Both models share the same ResNet-18 foundation, initialised with ImageNet pretrained weights. The final fully-connected layer is replaced with a custom classification head: a dropout layer followed by a linear layer outputting predictions for seven emotion classes.

Parameter          Infant Model            Adult Model
Backbone           ResNet-18               ResNet-18
Layer freezing     None (all trainable)    Layer 1 frozen
Dropout            0.5                     0.7
Initial LR         0.0005                  1e-5
Weight decay       0.0005                  2e-3
Label smoothing    α = 0.05                –
Distance loss λ    0.0035                  tuned separately
Early stopping     Epoch 28                Patience 8
Optimiser          Adam                    Adam
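The combined distance loss pairs cross-entropy (with label smoothing) with a λ-weighted distance term. One plausible form is sketched here as a centre-loss-style penalty; the exact formulation and the `CombinedDistanceLoss` name are assumptions:

```python
import torch
import torch.nn as nn

class CombinedDistanceLoss(nn.Module):
    """Cross-entropy plus a lambda-weighted distance penalty (sketch).

    Features are pulled toward a learnable per-class centre, as in
    centre loss; the project's exact distance term may differ.
    """

    def __init__(self, num_classes=7, feat_dim=512, lam=0.0035, smoothing=0.05):
        super().__init__()
        self.ce = nn.CrossEntropyLoss(label_smoothing=smoothing)
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.lam = lam

    def forward(self, logits, features, targets):
        # Mean squared distance between each feature and its class centre.
        dist = ((features - self.centers[targets]) ** 2).sum(dim=1).mean()
        return self.ce(logits, targets) + self.lam * dist

# Adult model only: freeze the first residual stage of the backbone.
# for p in model.layer1.parameters():
#     p.requires_grad = False
```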
Section 04.2–04.3 — Training Dynamics

Infant & Adult Model Training Curves

Figure 7 — Infant Model Training Curves

Training and validation loss (top) and accuracy (bottom) for the infant-trained model over 28 epochs. Training accuracy climbed from ~30% to 95.45%; validation accuracy plateaued around 52–54%. The loss spike at epoch 2 reflects the combined distance loss warm-up phase.

Figure 8 — Adult Model Training Curves

Training and validation loss (top) and accuracy (bottom) for the adult-trained model over 35 epochs. Validation accuracy (orange) briefly exceeded training accuracy (blue) in early epochs — a sign of rapid generalisation — before stabilising at 73.75% by epoch 35.

Infant Model — Final Metrics
Train Acc 95.45% · Val Acc 52.94% · Train Loss 0.9402 · Val Loss 2.1895

Adult Model — Final Metrics
Train Acc 83.34% · Val Acc 73.75% · Train Loss 1.61 · Val Loss 1.86
Section 05 — Experiments & Results

Cross-Age Generalisation

Each model was evaluated on all five age groups. The contrasting generalisation curves across these groups are the central finding of the study.

Test Accuracy (%) by Age Group — Infant-Trained Model
Interpretation

The infant-trained model achieves its best performance on infant images (55.71%) and degrades monotonically with age, reaching only 28.49% on adults. This linear decline supports the hypothesis that infant emotional expressions carry distinctive visual features that are progressively lost or transformed as a child develops.

Notably, performance remained above chance (14.29%) across all age groups, suggesting that some expressive features are preserved across development — though these shared characteristics appear limited compared to age-specific patterns.

Age Group              Infant Model    Adult Model
Infant (1–3 yrs)       55.71%          59.12%
Younger Child (4–8)    50.90%          72.57%
Older Child (8–12)     48.64%          77.68%
Teenager (13–17)       41.49%          75.50%
Adult (18+)            28.49%          70.30%
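The cross-age evaluation behind these figures can be sketched as a loop over per-group test loaders. The group names follow the results; the `loaders` mapping (group name to test DataLoader) is an assumed convention:

```python
import torch

# Age-group names as used in the results; the `loaders` argument maps
# each name to an iterable of (images, labels) batches.
GROUPS = ["infant", "younger_child", "older_child", "teenager", "adult"]

@torch.no_grad()
def accuracy_by_group(model, loaders, device="cpu"):
    """Test accuracy (%) of one trained model on every age group."""
    model.eval()
    results = {}
    for group in GROUPS:
        correct = total = 0
        for images, labels in loaders[group]:
            preds = model(images.to(device)).argmax(dim=1)
            correct += (preds == labels.to(device)).sum().item()
            total += labels.size(0)
        results[group] = 100.0 * correct / total
    return results
```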
Figure 12 — Infant Model Test Accuracy

Test accuracy of the infant-trained model across five developmental age groups. The monotonic decline from 55.71% (toddler) to 28.49% (adult) confirms that infant-specific expression features are progressively lost with age.

Figure 13 — Adult Model Test Accuracy

Test accuracy of the adult-trained model. The nonlinear trajectory peaks at older child (77.68%) and teenager (75.50%), indicating that adult-like expressive features emerge during late childhood — not abruptly at adulthood.

Figure 14 — Confusion Matrices

Full confusion matrices for (A) infant-trained and (B) adult-trained models across all five age groups. Each sub-matrix shows true vs predicted emotion labels. The infant model achieves strong diagonal alignment for 'surprise' in toddlers; the adult model shows robust classification for teenagers and older children.

Section 06 — Discussion & Conclusion

Findings & Limitations

The experiments provide computational evidence that emotional expressions evolve in a continuous and nonlinear manner. The infant model's linear decline confirms that infant facial expressions are visually distinct and become progressively more adult-like over time. The adult model's nonlinear trajectory — peaking at older children and teenagers — suggests that key features of adult-like expression emerge during late childhood and adolescence.

These findings align with developmental psychology literature suggesting that while basic expressions appear in infancy, how we express, perceive, and interpret emotion becomes more complex over time. The results offer a useful computational perspective alongside behavioural and observational studies.

Key Limitations
Adult-assigned labels

Emotional labels for infant images were assigned by adult raters and may not reflect infants' internal states. Subtle facial movements may not map cleanly onto adult-defined categories.

Small infant dataset

The dataset contains only 687 infant images across seven classes, creating severe class imbalance and limiting the model's ability to learn robust features.

Demographic bias

TIF was collected in Norway with Northern European participants; AffectNet underrepresents Mandarin, Hindi, and French speakers. Cultural norms around emotional expression differ significantly.

Age boundary noise

ResNet-50 age estimation errors near group boundaries can misassign images, introducing noise into developmental comparisons.

Source Code

Facial Age Estimation Module

GitHub — Saravut Lin
MLPEmotionDetection / Facial_Age_Estimation

Full implementation of the ResNet-50 facial age estimation pipeline: dataset preprocessing, model architecture (AgeEstimationModel), training loop with Adam optimiser, hyperparameter tuning, inference, and visualisation utilities. Written in PyTorch.

PyTorch · ResNet-50 · AffectNet · TIF · Age Estimation · Python
Repository Structure
model.py — AgeEstimationModel (ResNet-50/EfficientNet/ViT variants)
train.py — Training loop with Adam, early stopping, checkpointing
functions.py — Distance loss, center loss, evaluation utilities
custom_dataset_dataloader.py — Dataset class with augmentation pipeline
hyperparameters_tuning.py — Grid search over LR, dropout, weight decay
EDA.py — Exploratory data analysis and dataset statistics
inference_p.py — Inference on new images with pretrained weights
visualization.py — Training curves, confusion matrix, age distribution plots
create_csv.py — Dataset CSV generation from AffectNet directory