Public Good
Primary technology
Services offered
Secondary users
Researchers, Others, NGO/CSO and Other, Implementing agencies
Primary target country
Global
Peskas delivers real-time data and insights for small-scale fisheries to enable evidence-based decision-making at different scales, from fishers to national government. Harnessing user-centered design, agile and open science principles, Peskas paves the way for thriving small-scale fisheries and healthier aquatic ecosystems.
Software description
Peskas is software designed to turn fisheries data into insights for decision-makers and stakeholders. Small-scale fisheries data can include information about the fishers and indicators of their livelihoods and income, the number and type of vessels involved, where these vessels and/or fishers go, the fishing gears they use to catch fish and invertebrates, and the composition of their catches, such as the species, number of individuals, and their market value. Peskas manages these data from their collection at landing sites to the visualisation and publication of fisheries trends, statistics, and insights on a decision dashboard and in automated reports. It uses an automated digital infrastructure to process and visualise fisheries data, helping researchers and stakeholders gain knowledge and insights that can inform data-driven decisions for sustainable fishery management and be used in publications such as articles, reports, and notes. The system is designed for a wide range of users, including fisheries managers, policymakers, researchers, and technical staff. The following subsections describe Peskas's architecture and functionalities in detail.

Software architecture
Peskas is composed of two core R packages: one dedicated to the data workflow, peskas.timor.pipeline [9], and one intended for visualisation, peskas.timor.portal [10]. The R language was selected for its powerful statistical computation and graphical capabilities, well suited to data analysis tasks. Docker containerisation is a cornerstone of the architecture, ensuring platform independence and streamlined deployment across diverse environments. Version control and automated workflows are managed through GitHub and GitHub Actions, supporting continuous integration and deployment practices, while Google Cloud Platform is used for data storage.
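As an illustration only, the containerised, stage-by-stage design can be pictured as a simple sequential pipeline. This is a minimal Python sketch; the actual implementation is the R package peskas.timor.pipeline, run daily by GitHub Actions inside Docker, and every function name and value below is hypothetical.

```python
# Hypothetical sketch of a staged data pipeline: each stage takes the output
# of the previous one, mirroring the collect -> preprocess -> validate flow.

def ingest_surveys():
    """Pull landing-site survey records (e.g. from a survey platform API)."""
    return [{"trip_id": 1, "catch_kg": 12.5}, {"trip_id": 2, "catch_kg": None}]

def preprocess(records):
    """Standardise the raw records into a tidy form; drop incomplete rows."""
    return [r for r in records if r.get("catch_kg") is not None]

def validate(records):
    """Screen out implausible values before analysis."""
    return [r for r in records if 0 < r["catch_kg"] < 1000]

def run_pipeline():
    # Each stage would run in its own Docker step in the real system.
    return validate(preprocess(ingest_surveys()))

print(run_pipeline())  # → [{'trip_id': 1, 'catch_kg': 12.5}]
```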
For interactive data visualisation, a Shiny web application based on a Bootstrap 5 UI kit (Tabler) and running on Google Cloud Run provides a dynamic, multilingual interface, while Rmarkdown is used to generate detailed reports integrating both analysis and outputs. The curated Peskas data are open access and archived automatically in the Harvard Dataverse repository via the Dataverse Application Programming Interface (API). The architecture was designed to minimise costs as far as possible and to be adaptable and scalable, allowing for flexibility and ease of maintenance. From data collection and preprocessing to analysis, validation, and dissemination, each step runs within a Docker environment, ensuring consistency and reproducibility. GitHub Actions automates the entire pipeline, enabling daily updates to the data and visualisations provided by the portal. Fig. 1 illustrates this architecture, which ensures all components work together seamlessly, providing reliable, scalable, and insightful analytics for fisheries management.

Software functionalities
The Peskas platform delivers a comprehensive array of functionalities that streamline the collection, processing, analysis, and presentation of fisheries data. This multifaceted approach not only enhances the utility of the platform for various stakeholders but also ensures the integrity and accessibility of the data, as the following subsections describe. Peskas consists of six core modules, each dedicated to a particular data flow step and consisting in turn of a series of functions:

1. Data collection: KoBoToolbox, an open-source suite of tools for field data collection, is used to run digital surveys in challenging environments with limited internet connectivity. Peskas pulls the survey data, collected at small-scale fisheries landing sites, from the KoBoToolbox database in near real time. In addition, solar-powered GPS vessel trackers collect continuous (every 6 s) geolocation data from assigned vessels.
2. Pre-processing: data formatting, shaping, and standardisation to prepare the raw data for analysis.
3. Validation: outlier detection and error identification, with an email alert system to ensure near-real-time maintenance of data quality.
4. Analytics: modelling of fisheries indicators, nutritional characterisation, and data mining to extract valuable insights.
5. Data export: automated dissemination of processed and analysed fisheries data to ensure accessibility and comprehension. This involves restructuring data for dashboard integration and open publication.
6. Visualisation: tools for data reporting and sharing of insights through a comprehensive dashboard.

Data collection and integration
A key functionality of Peskas is its capability to automate the retrieval of data from a wide range of sources. The entry points for data into Peskas include digital catch surveys with fishers as they return to shore, vessel trackers that record GPS data, and static fishery data, such as the number of boats per municipality, obtained from the government registry of fishing vessels. This combination of dynamic and static data initiates the data flow into the Peskas pipeline, enabling comprehensive fisheries data analysis. Peskas leverages APIs, such as the one provided by KoBoToolbox, to automate the retrieval of survey data (Table 1). While KoBoToolbox is a key tool for digital field data collection in diverse environments, Peskas is designed to work with any similar XForms-based data collection platform. For tracking vessel movements, Peskas integrates with Pelagic Data Systems (PDS), which offers a vessel tracking system (hardware) and a data-as-a-service solution for monitoring fishing vessels, yielding high-resolution data on vessel movements.
However, Peskas's architecture is built to accommodate data from any tracking system that provides compatible data formats, ensuring flexibility in sourcing vessel movement data. The automated integration of tracking data into Peskas enriches the dataset with high-resolution geolocation and movement information, enabling the calculation of fishing effort per boat and its extrapolation across municipal and national fleets. Beyond these specific integrations, Peskas currently uses Airtable as the preferred platform for its metadata registry, including static tables with vessel, catch, and regional information used in subsequent data processing steps. This approach ensures that Peskas remains a versatile tool for data collection and integration, capable of working with a broad spectrum of data sources and types.

Preprocessing
Once the data are collected, Peskas initiates a preprocessing phase in which the raw data undergo cleaning, normalisation, and transformation. This phase standardises data formats, ensuring uniformity, and converts data into tidy formats. It also involves a preliminary quality check for inconsistencies and missing values. One of the challenges addressed during this phase is the integration of data collected via three distinct survey versions of KoBoToolbox. These versions, each with its own structure, must be merged to achieve a unified data framework. After the KoBoToolbox data are merged, the pipeline triggers two pivotal data-mining functions: get_weight and, subsequently, calculate_nutrients. These functions extract information on the weight of each catch and its nutrient composition, respectively, pulling data from FishBase [10], a comprehensive external resource, through its API. This integration allows for the dynamic inclusion of length-weight relationships and nutrient concentration values based on the most up-to-date information.
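The length-to-weight step that a function like get_weight performs rests on the standard allometric relationship W = a · L^b, where a and b are species-specific parameters of the kind published on FishBase. A minimal sketch (in Python for illustration; the actual implementation is in R, and the parameter values below are illustrative, not real FishBase entries):

```python
# Allometric length-weight conversion, the standard model behind
# FishBase length-weight tables: weight (g) = a * length_cm ** b.
# a and b are species-specific; the values used below are made up.

def estimate_weight_g(length_cm: float, a: float, b: float) -> float:
    """Return the estimated individual weight in grams from total length in cm."""
    return a * length_cm ** b

# Hypothetical species with a = 0.012, b = 3.05:
w = estimate_weight_g(25.0, a=0.012, b=3.05)
print(round(w, 1))  # roughly 220 g for a 25 cm individual
```

Multiplying this per-individual weight by the counts recorded in the catch survey yields the estimated catch weight per trip.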
Moreover, this stage involves alignment with the established aquatic foods ontology [9], whereby variables are not only renamed for clarity but also mapped to a recognised and controlled vocabulary where applicable.

Validation
In the validation module of Peskas, data undergo a rigorous validation process to ensure their accuracy and consistency for subsequent analyses; it covers both catch and vessel movement data. For catch data, the process involves the examination of outliers and anomalous values, which, when identified, are excluded from further analysis. The validation procedure is organised through a systematic labelling process, wherein each data entry is assigned a specific code reflecting a particular validation status. For instance, data entries without outliers are tagged with the code "0", indicating "no alerts". Conversely, code "5" signals that the "trip duration is too long", and code "7" denotes that the "recorded length is too large for the catch type", among others. In total, 21 distinct alert flags have been established, each addressing a specific and critical dimension of the data to facilitate a comprehensive quality assessment. Decisions on how to structure the data alerts were made through dialogue with local stakeholders and fisheries experts to ensure they were context specific. Validation employs both univariate and multivariate methods to detect outliers and assess data precision. Univariate outlier detection is conducted using the median absolute deviation (MAD) method, implemented in the "univOutl" package [11]. Multivariate approaches are used to verify the accuracy of specific variables, such as catch weight, where outliers are identified using thresholds based on Cook's distance of the residuals between catch weight and catch value.
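The MAD-based univariate screen can be sketched as follows. This is a generic Python illustration of the method, not the "univOutl" R implementation; the threshold k = 3 and the sample values are assumptions.

```python
# Univariate outlier screening with the median absolute deviation (MAD):
# flag values more than k robust standard deviations from the median.
from statistics import median

def mad_outliers(values, k=3.0):
    """Return the indices of values flagged as outliers."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread; nothing can be flagged robustly
    # 1.4826 scales the MAD to be consistent with the standard
    # deviation under a normal distribution.
    return [i for i, v in enumerate(values)
            if abs(v - med) / (1.4826 * mad) > k]

catches = [8.2, 7.9, 9.1, 8.5, 120.0, 8.8]  # kg per trip; 120 kg is suspect
print(mad_outliers(catches))  # → [4]
```

Because the median and MAD are insensitive to the outliers themselves, this screen remains reliable even when a survey batch contains several anomalous entries, which is why it is preferred over mean/standard-deviation rules for field data of this kind.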
To refine outlier detection accuracy and fine-tune detection parameters, entries flagged as potential outliers undergo manual scrutiny on a specialised validation platform. This internal tool, a Google Sheet integrated with KoBoToolbox via Google Apps Script, streamlines the review process. This setup makes the manual validation step more efficient and effective, especially from the stakeholder's perspective, ensuring that outlier detection is both precise and adaptable to the nuances of the data. The final product of this validation module is a data frame that has been thoroughly cleaned and validated, ready for further analysis and processing in the modelling and metric extraction phases. Vessel movement data validation focuses on the unique challenges associated with global positioning data. Issues such as undetected, merged, or split trips, as well as potential delays or losses of information due to poor network coverage, necessitate a tailored validation approach. The validate_pds_data function addresses the complexities of GPS tracking data by evaluating each vessel trip for its duration and the distance covered. This evaluation uses specified parameters to assess durations and distances, along with the distance between the start and end points of a fishing trip. Importantly, it also incorporates quality metrics to refine data quality. Among these metrics, "outlier limits" identify and exclude data points that markedly deviate from expected patterns, such as anomalously high speeds, indicating potential inaccuracies in the data. Similarly, "signal trace dispersion" measures the consistency and reliability of GPS signal locations over time, where a high dispersion level could suggest issues such as poor GPS signal quality or errors in data transmission.

Analytics
In the analytics module of Peskas, following data validation, the focus shifts towards quantifying fisheries indicators.
This module estimates the average catch per trip and the average revenue per trip across municipalities, as well as the number of landings per fisher per month, derived from a generalised linear mixed model. At the heart of this module, the estimate_fishery_indicators function orchestrates the workflow, beginning with the ingestion of trip data, which is then enriched with metadata on registered boats. Since 2018, Peskas has deployed 479 GPS devices distributed across various municipalities in Timor-Leste, covering on average 15 % of the total fishing vessels within these areas. These devices continuously collect high-resolution geolocation data, enabling the modelling of fishing trips on weekly, monthly, and annual bases for each municipality. By integrating these trip data with the total number of fishing vessels per municipality from the MAF registry, Peskas estimates the overall fishing effort. This methodology allows average catch and revenue per trip to be scaled to the regional, and thus national, level, providing an overview of fisheries activities beyond the directly monitored vessels. Furthermore, catch estimates are produced not only for aggregated values but also for different fish groups and key species. This disaggregation adds a critical layer of granularity: by dissecting the catch data into specific taxa, the module enhances the understanding of species-specific trends and their implications for fisheries management.

Data export
The export module in Peskas disseminates the processed and analysed fisheries data to a wider audience, ensuring both accessibility and usability. This module undertakes a data restructuring process with dual objectives: first, to align with the dashboard framework for visualisation and user interaction, and second, to prepare the data for open publication.
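The scaling step described in the analytics module, from tracked vessels to the whole registered fleet, reduces to simple arithmetic. A back-of-envelope sketch (Python for illustration; the function name and all numbers are hypothetical, not taken from Peskas outputs):

```python
# Extrapolating observed effort from tracked boats to a municipality's
# registered fleet, assuming tracked boats are representative of the rest.

def extrapolate_catch(trips_per_tracked_boat: float,
                      registered_boats: int,
                      mean_catch_per_trip_kg: float) -> float:
    """Estimate the total municipal catch (kg) for the period."""
    estimated_trips = trips_per_tracked_boat * registered_boats
    return estimated_trips * mean_catch_per_trip_kg

# e.g. tracked boats averaged 12 trips/month, 200 boats registered,
# and the mean validated landing was 8.5 kg per trip:
print(extrapolate_catch(12, 200, 8.5))  # → 20400.0 kg/month
```

The key assumption, that effort on tracked vessels is representative of untracked ones, is why the roughly 15 % tracker coverage is spread across municipalities rather than concentrated in one area.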
For open publication, both raw and aggregated (national and municipal) data are converted from RDS to CSV format. This transformation caters to the needs of a general audience, facilitating broader access and understanding. Accompanying the data, an informative README document is automatically generated, offering detailed descriptions of the data content and fields. The data are uploaded automatically every month to the Harvard Dataverse portal under the CC BY-NC-SA 4.0 licence through the portal's API service (https://dataverse.harvard.edu/dataverse/peskas). This automated process ensures that the latest fisheries data are consistently made available to researchers, policymakers, and the public. Participation in Peskas digital catch surveys and vessel tracking is entirely voluntary on the part of the fisher. Peskas also ensures that sensitive data, such as personally identifiable information or commercially valuable data, are anonymised and aggregated before being shared.

Visualisation and reporting
To make the insights accessible and actionable, Peskas offers an advanced visualisation feature through an interactive Shiny dashboard. This dashboard serves as the primary interface for users to explore, analyse, and interpret the data in real time. It is complemented by the capability to generate detailed reports using Rmarkdown, which allows for the dynamic incorporation of analysis results, including charts and tables, into comprehensive documents (Fig. 2).

User interface
The Peskas portal (https://peskas.org/) provides general information on the Peskas initiative. The Timor-Leste dashboard (https://timor.peskas.org/) is a robust web application hosted on Google Cloud Run, ensuring scalability and reliability. It leverages a suite of R packages to deliver a dynamic, interactive user experience, particularly in visualising fisheries data.
The dashboard is updated daily with fresh data from the peskas.timor.pipeline, orchestrated through GitHub Actions, ensuring that the displayed information is current and actionable. Developed using the Shiny framework, the portal incorporates advanced visualisation tools such as "kepler.gl", a powerful JavaScript library for geospatial data analysis, which provides stakeholders with a visual understanding of fishing activity distributions across Timor-Leste. The user interface is designed to be intuitive and accessible, featuring a multilingual option that currently includes English, Portuguese, and Tetum. This feature is crucial for engaging local communities in sustainable fisheries management, allowing them to access and analyse data in their preferred language. Data interaction is facilitated by various R packages integrated into the portal, such as "reactable" for interactive tables [12] and "apexcharter" for responsive charting [13]. These tools enable users to drill down into specifics such as catch volumes, species distribution, and fishing effort, with the flexibility to customise views and download data according to their needs. The portal's backend is supported by Google services, with authentication managed via "googleAuthR" [14] and data storage provided through "googleCloudStorageR" [15], ensuring secure and scalable cloud storage (Fig. 3).