Date of Award
Fall 10-2025
Document Type
Dissertation
Degree Name
Doctor of Philosophy in Information Systems (PhDIS)
First Advisor
Omar El-Gayar
Second Advisor
Austin O’Brien
Third Advisor
Mohammad Tafiqur Rahman
Abstract
Modern agricultural vision systems must identify weeds reliably while working with very little labeled data and under strict limits on compute and memory. This dissertation introduces a few-shot weed recognition framework that pairs a mixture-of-experts student with a strong teacher through knowledge distillation and task-conditioned routing. The teacher is a partially fine-tuned EfficientNet-B7 that produces high-quality embeddings. The student consists of lightweight expert backbones drawn from MobileNetV3, ShuffleNetV2, and EfficientNet-B0. A routing gate summarizes each episode's support set into a compact context vector and uses that context, together with image features, to select the most suitable experts through top-K sparse routing.
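The top-K sparse routing step described above can be sketched in plain Python. The gate scores here stand in for logits that the gate would compute from the support-set context vector and image features; the scores, embeddings, and K value are illustrative assumptions, not the dissertation's actual implementation:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def top_k_route(gate_scores, expert_outputs, k=2):
    """Sparse top-K routing: keep the k highest-scoring experts,
    renormalize their gate weights, and mix only their outputs.

    gate_scores: one logit per expert (assumed already derived from
                 the episode context vector plus image features).
    expert_outputs: one embedding (list of floats) per expert.
    """
    # Indices of the k largest gate scores.
    top = sorted(range(len(gate_scores)),
                 key=lambda i: gate_scores[i], reverse=True)[:k]
    # Softmax restricted to the selected experts.
    weights = softmax([gate_scores[i] for i in top])
    # Weighted mix of the selected experts' embeddings only;
    # the unselected experts are never evaluated.
    dim = len(expert_outputs[0])
    mixed = [0.0] * dim
    for w, i in zip(weights, top):
        for d in range(dim):
            mixed[d] += w * expert_outputs[i][d]
    return top, weights, mixed
```

Restricting the softmax to the selected experts keeps the combination weights normalized while letting the unselected backbones be skipped entirely, which is where the compute savings of sparse routing come from.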
Training proceeds in two stages. A short distillation warm-up first aligns the student embeddings with the teacher's. The system then adds metric learning with a semi-hard triplet objective driven by a cosine-margin schedule. The implementation uses automatic mixed precision and true gradient accumulation, and is designed to remain numerically stable in FP16 floating point. For deployment, the trained student supports post-training dynamic quantization to INT8 for efficient CPU inference.
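The semi-hard triplet selection in the second stage can be illustrated as follows. A semi-hard negative lies farther from the anchor than the positive, but still inside the margin band, so the triplet loss max(0, d_ap − d_an + margin) is positive without being dominated by outliers. The fixed margin and the exact cosine-distance formulation below are illustrative assumptions; the dissertation drives the margin with a schedule:

```python
import math

def cosine_dist(u, v):
    """Cosine distance: 1 - cosine similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

def semi_hard_negative(anchor, positive, negatives, margin=0.2):
    """Pick a semi-hard negative for the triplet (anchor, positive, n):
    one with d_ap < d_an < d_ap + margin. Returns the index of the
    hardest such candidate, or None when no negative is in the band."""
    d_ap = cosine_dist(anchor, positive)
    dists = [(cosine_dist(anchor, n), idx) for idx, n in enumerate(negatives)]
    semi = [(d, i) for d, i in dists if d_ap < d < d_ap + margin]
    if not semi:
        return None
    # Hardest among the semi-hard candidates: smallest d_an.
    return min(semi)[1]
```

When no negative falls in the band the triplet is typically skipped for that anchor, which is the usual behavior of batch-level semi-hard mining.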
Evaluation follows standard N-way, K-shot episodic protocols. In the primary 5-way, 5-shot setting on our eight-class weed benchmark, the model achieves 97.13% ± 0.57 top-1 episodic accuracy averaged across three independent training seeds (1,000 episodes per seed). Accuracy is preserved after INT8 quantization, while stored model size is reduced from 46.86 MB to 29.90 MB. Comprehensive ablations isolate the contribution of each component: removing triplet mining, removing distillation, disabling task-conditioned routing, and freezing the teacher without partial fine-tuning. Additional studies compare uniform routing without a gate and a single-expert baseline.
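The prototype-based inference underlying this episodic protocol can be sketched as: average the support embeddings of each class into a prototype, then classify each query by its nearest prototype under cosine similarity. The class labels and two-dimensional embeddings below are hypothetical placeholders for the student's actual embedding space:

```python
import math

def normalize(v):
    """Scale a vector to unit length (no-op for the zero vector)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def prototypes(support):
    """Mean embedding per class from the episode's support set.
    support: {class_label: [embedding, ...]}"""
    protos = {}
    for label, embs in support.items():
        dim = len(embs[0])
        mean = [sum(e[d] for e in embs) / len(embs) for d in range(dim)]
        protos[label] = normalize(mean)
    return protos

def classify(query, protos):
    """Assign the query to the prototype with the highest
    cosine similarity (dot product of unit vectors)."""
    q = normalize(query)
    return max(protos, key=lambda lbl: sum(a * b for a, b in zip(q, protos[lbl])))
```

Because inference reduces to averaging support embeddings and a nearest-prototype lookup, adapting to new classes requires no weight updates, which is the forward-only adaptation the abstract refers to.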
The results show that task-conditioned routing is accuracy-neutral relative to image-only and uniform routing controls on the core dataset, indicating that most gains come from the quality of the distilled embedding and prototype-based inference rather than the specific gating policy. Combining distillation with metric learning consistently outperforms either component in isolation; removing distillation or triplet mining leads to notable drops in few-shot accuracy, confirming that the two objectives are complementary: distillation transfers class structure from the teacher, while triplets sharpen inter-class margins. Quantization to INT8 preserves accuracy within the confidence bounds of episodic evaluation while reducing model size by approximately 36% and improving CPU throughput, which is critical for edge deployment. Beyond the base benchmark, the same trained artifact adapts to novel species without weight updates. In episodic evaluation on previously unseen PlantSeedlings classes, the model attains 97.01% mean accuracy (95% CI ± 0.12 percentage points (pp)) under a 5-way, 5-shot, 15-query protocol, and in bank-based deployment across DeepWeeds and PlantSeedlings it achieves a mean accuracy of 93.34% with cosine matching. Taken together, these findings provide a practical path to accurate, robust, and efficient few-shot weed recognition that supports forward-only adaptation in resource-constrained field conditions.
Recommended Citation
Allalen, Abderrezak, "Adaptive and Scalable Edge-Based Weed Classification Through Distilled Mixture of Experts And Metric Learning" (2025). Masters Theses & Doctoral Dissertations. 500.
https://scholar.dsu.edu/theses/500