Publications

Object detection and tracking on UAV RGB videos for early extraction of grape phenotypic traits

Ariza-Sentís, Mar; Baja, Hilmy; Vélez Martín, S.V.; Pereira Valente, J.R.

Summary

Grapevine phenotyping is the process of determining the physical properties (e.g., size, shape, and number) of grape bunches and berries. This information helps monitor the sanitary status of the vine. Knowing the number and dimensions of bunches and berries at an early stage of development gives winegrowers relevant information about the yield to be harvested. However, counting and measuring are usually done manually, which is laborious and time-consuming. Previous studies have attempted bunch detection on red bunches in vineyards with leaf removal, with surveys carried out using ground vehicles and handheld cameras. Unmanned Aerial Vehicles (UAVs) equipped with RGB cameras, combined with computer vision techniques, offer a cheap, robust, and time-saving alternative. Therefore, multi-object tracking and segmentation (MOTS) is used in this study to determine the traits of individual white grape bunches and berries from RGB videos acquired by a UAV over a commercial vineyard with a high density of leaves. To achieve this goal, two datasets with labelled images and phenotyping measurements were created and made available in a public repository. The PointTrack algorithm was used to detect and track the grape bunches, and two instance segmentation algorithms, YOLACT and Spatial Embeddings, were compared to find the most suitable approach for detecting berries. Detection performed adequately for clusters, with a MODSA of 93.85. Tracking results, however, were not sufficient when the model was trained with 679 frames. This study provides an automated pipeline for the extraction of several grape phenotyping traits described by the International Organization of Vine and Wine (OIV) descriptors. The selected OIV descriptors are the bunch length, width, and shape (codes 202, 203, and 208, respectively) and the berry length, width, and shape (codes 220, 221, and 223, respectively). Lastly, a comparison of the number of detected berries per bunch indicated that Spatial Embeddings counted berries more accurately (79.5%) than YOLACT (44.6%).