Model interpretation
Overview
Developing stock assessment models for management in the Gulf presents significant challenges for assessment scientists. Complex models are frequently required to accurately represent the large spatial extent of Gulf fisheries, the diverse array of fleets operating, and changes in management through time. However, the available data are generally insufficient to inform parameter estimation at this level of complexity unless some assumptions are fixed. As a result, a large part of an assessment scientist’s responsibility when developing a stock assessment model is to define the structural form of the model and decide which parameters to estimate. This can be considered the “art” of stock assessment modeling. It requires expert judgment to make sometimes subjective structural decisions, such as delineating fleets and spatial areas, choosing the mathematical forms of fleet selectivity patterns, specifying the time periods in which parameter values are allowed to change, and choosing which parameters to fix in the model and what values they should take. In the Gulf, these fixed decisions are believed to conceal a large proportion of the total uncertainty in model outputs and may bias mean estimates, though the exact impacts are not well understood.
Quantifying this concealed uncertainty could significantly improve the accuracy of OFL and ABC estimates, reduce inter-assessment estimate variability, and improve management outcomes in the Gulf. Research planning discussions identified two candidate methods for simulation testing that may be adaptable to the Gulf to quantify some of these hidden uncertainties:
Stock assessment models produced in the South Atlantic use a Monte-Carlo Bootstrap Ensemble (MCBE) approach to quantify uncertainty in benchmarks, current status, and quota limits (OFL and ABC). This approach incorporates uncertainty in select fixed model parameters, such as natural mortality and stock-recruitment curve steepness, based on external estimates or expert judgment. This uncertainty is used to produce several thousand bootstrapped random samples of these values along with resampled values for all data inputs, which can also include late-phase recruitment deviations. Each random sample set is then used to re-estimate the stock assessment model, producing more realistic confidence intervals for output values such as OFL and ABC. However, this method requires significant computing resources even for relatively simple assessment models and addresses only a limited subset of fixed parameter uncertainty sources. It does not incorporate structural uncertainty in model form, and more research is needed to develop model weighting approaches beyond the even weighting currently used.
Stock assessment models produced for highly migratory tunas and sharks use a Multivariate Normal Ensemble (MVNE) approach to quantify uncertainty in benchmarks and quota limits (OFL and ABC). Like the MCBE method above, the MVNE method is intended to account for the structural uncertainty that derives from fixed parameter assumptions. It uses a grid of 3 to 5 proposed values for each parameter, spanning the expected possible range. For each combination of values, a new assessment model is fit and fully diagnosed, in contrast with MCBE simulations, which are not diagnosed beyond confirming convergence of the parameter estimates. The MVNE approach therefore exchanges MCBE’s computational overhead for analyst diagnostic time. Although it is also relatively simple as currently used, the MVNE approach could be adapted to incorporate true structural differences between ensemble members, though at the cost of additional diagnostic effort.
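As a rough illustration of how the two workflows differ, the following minimal sketch runs both against a toy stand-in for an assessment model. Everything in it is an assumption made for illustration: the function `fit_toy_model`, the index values, the distributions placed on natural mortality and steepness, the nonparametric data resampling, and the equal weighting of grid members. It does not reproduce either region's production software, and the multivariate normal sampling step implied by the MVNE name is not shown because it is not described here.

```python
import itertools
import numpy as np

def fit_toy_model(index_obs, natural_mortality, steepness):
    """Hypothetical stand-in for re-estimating an assessment; returns a toy OFL."""
    biomass_proxy = np.exp(np.mean(np.log(index_obs)))  # geometric mean of the index
    return 1000.0 * natural_mortality * steepness * biomass_proxy

# Illustrative abundance index values (not real data)
index_obs = np.array([1.2, 0.9, 1.1, 0.8, 1.0, 0.95, 1.3, 0.85])

# --- MCBE-style: thousands of random parameter draws plus resampled data ---
rng = np.random.default_rng(0)
n_draws = 2000
mcbe_ofl = np.empty(n_draws)
for i in range(n_draws):
    # Monte Carlo step: draw the otherwise-fixed parameters (assumed distributions)
    m_draw = rng.lognormal(mean=np.log(0.2), sigma=0.1)
    h_draw = rng.uniform(0.7, 0.9)
    # Bootstrap step: resample the data with replacement (a simplification)
    boot = rng.choice(index_obs, size=index_obs.size, replace=True)
    # Each draw requires a full model re-estimation in a real assessment
    mcbe_ofl[i] = fit_toy_model(boot, m_draw, h_draw)
mcbe_lo, mcbe_med, mcbe_hi = np.percentile(mcbe_ofl, [2.5, 50.0, 97.5])

# --- Grid-style: a few values per parameter, every combination fit once ---
grid_m = [0.15, 0.20, 0.25]
grid_h = [0.7, 0.8, 0.9]
grid_ofl = np.array([fit_toy_model(index_obs, m, h)
                     for m, h in itertools.product(grid_m, grid_h)])
# Equal weights across members (an assumption; other weightings are possible)
weights = np.full(grid_ofl.size, 1.0 / grid_ofl.size)
grid_mean = float(weights @ grid_ofl)

print(f"MCBE-style 95% interval: {mcbe_lo:.1f} to {mcbe_hi:.1f}")
print(f"Grid-style members: {grid_ofl.size}, weighted mean OFL: {grid_mean:.1f}")
```

The contrast in cost is visible in the loop sizes: the first workflow calls the model thousands of times with no per-run review, while the second calls it only nine times, which in practice leaves room for an analyst to diagnose each member.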
Each of these methods will be simulation tested for use in the Gulf as part of this project. However, a significant limitation that both share is that they are only useful for incorporating known sources of fixed parameter uncertainty. Neither method provides any guidance to assessment scientists on how to select uncertainty sources, or on which model assumptions are least supported or most influential in the final OFL and ABC advice. A related hurdle anticipated in applying either method in the Gulf is the brittle structure of many existing stock assessment models. Many of the structural assumptions necessary to fit observed data patterns are co-dependent, such that changing any fixed parameter value or other assumption often requires many other changes to obtain stable estimates of model parameters. This brittle structure is an undesirable feature of current Gulf assessments, though it is currently considered unavoidable given the known complexity of regional fisheries and the limited data available. Given the complex interdependencies in stock assessment models, developing a stable model structure is often difficult and time-consuming even for experienced stock assessment scientists. Extensive trial and error with candidate models, using expert judgment to make each modification, is often required before a final stable model is ready for review. This trial-and-error approach to model development already limits the throughput of current stock assessments and is not feasible within an ensemble modeling framework. For these reasons, improving model interpretability was identified as a key research development that would enable the adoption of the ensemble modeling approaches needed in the Gulf.
The current best practices for understanding data influences in stock assessment are parameter profile analysis and model retrospective analysis. Parameter profile analysis quantifies the estimability of a parameter, the precision of its estimate, and the influence on that estimate of broad classes of observed data (i.e., catch data, CPUE index data, and population composition data). Retrospective analysis quantifies changes in model predictions as years of data are removed. Both methods require the stock assessment to be re-estimated many times, which makes them extremely time-consuming and often infeasible to run for each candidate model. Parameter profiling is often used to identify model misspecification, which is indicated when a parameter is poorly estimated or when its estimated value is a compromise between two conflicting data sources that support very different estimates. Retrospective analyses are used to identify unstable models, in which results change significantly from year to year as new data are added. Each of these methods can identify that a problem exists in a model but provides little guidance on the source of the misspecification.
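The kind of data conflict that a parameter profile exposes can be shown with a minimal sketch. The two-source Gaussian model below is purely hypothetical: `theta` is the profiled parameter, `phi` is a nuisance parameter with an assumed penalty, and the simulated sources `y_a` and `y_b` are built to disagree. In this toy model the nuisance parameter has a closed-form estimate; in a real assessment that step is a full numerical re-estimation at every grid value, which is where the cost arises.

```python
import numpy as np

rng = np.random.default_rng(2)
y_a = rng.normal(1.0, 1.0, 30)   # source A observes theta directly
y_b = rng.normal(3.0, 1.0, 30)   # source B observes theta + phi
phi_sd = 0.3                     # assumed penalty (prior SD) on nuisance phi

def profile_components(theta):
    """Negative log-likelihood by component with theta fixed and phi re-estimated."""
    # Closed-form minimizer of the B component plus the penalty, given theta
    phi_hat = np.sum(y_b - theta) / (y_b.size + 1.0 / phi_sd**2)
    nll_a = 0.5 * np.sum((y_a - theta) ** 2)
    nll_b = 0.5 * np.sum((y_b - theta - phi_hat) ** 2)
    nll_pen = 0.5 * (phi_hat / phi_sd) ** 2
    return nll_a, nll_b, nll_pen

theta_grid = np.linspace(0.0, 4.0, 81)
comps = np.array([profile_components(t) for t in theta_grid])  # shape (81, 3)

# Express each component relative to its own minimum across the profile
delta = comps - comps.min(axis=0)
total = comps.sum(axis=1)

theta_best_total = theta_grid[np.argmin(total)]
theta_best_a = theta_grid[np.argmin(comps[:, 0])]
theta_best_b = theta_grid[np.argmin(comps[:, 1])]
print(f"Source A prefers theta near {theta_best_a:.2f}")
print(f"Source B prefers theta near {theta_best_b:.2f}")
print(f"Combined estimate sits at {theta_best_total:.2f}")
```

The profile shows that the two sources disagree and that the combined estimate is a compromise, but it reports this only at the level of whole data components, which is the limitation noted above.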
To improve upon these methods, we propose developing and testing a novel approach for quantifying the influence of individual observed data points on the estimate of each model parameter. The approach quantifies the datum-specific gradient and second-order derivative components for all model parameters simultaneously. This can be done without re-estimating the model by using the same automatic differentiation machinery already used for stock assessment parameter fitting. Avoiding re-estimation will make this approach significantly faster than existing diagnostics, allowing it to be used more frequently to guide model development. In this approach, model parameter estimates will be held fixed at the global optimum, where the total gradient is zero, and gradients and second derivatives will then be calculated independently with respect to each individual data point. This approach will allow the results to be inspected:
Individually to identify potential outliers or data errors for additional QA/QC
By data source, similar to parameter profiling though for all model parameters and data sets
By year, to identify temporal trends and changes in parameters that can be used to specify time breaks or inform environmental covariate correlations
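The idea of decomposing the gradient by datum at a fixed optimum can be sketched with a toy model. The example below is a minimal illustration under stated assumptions, not the proposed software: it uses a linear Gaussian model whose per-datum derivatives are analytic, standing in for what automatic differentiation would supply in a real assessment. The data, the `source` and `year` labels, and the planted outlier are all hypothetical. The one-step approximation `delta_i` (the approximate shift in the estimates if a datum were dropped) is an added assumption and is only one possible way to summarize the per-datum derivatives.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
year = np.repeat(np.arange(2000, 2020), 2)            # two observations per year
source = np.tile(np.array(["index", "comps"]), 20)    # hypothetical source labels
X = np.column_stack([np.ones(n), (year - 2010) / 10.0])  # intercept and trend
sigma = 0.2
y = X @ np.array([1.0, 0.5]) + rng.normal(0.0, sigma, n)
y[7] += 3.0                                           # planted outlier

# Fit once; least squares is the maximum-likelihood estimate for this model
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Per-datum gradient and Hessian of the negative log-likelihood at the optimum
resid = y - X @ beta_hat
grad_i = -(resid[:, None] * X) / sigma**2             # shape (n, 2)
hess_i = np.einsum("ij,ik->ijk", X, X) / sigma**2     # shape (n, 2, 2)

total_grad = grad_i.sum(axis=0)   # approximately zero at the optimum
H = hess_i.sum(axis=0)            # total Hessian

# One-step approximation of the parameter shift if datum i were removed
delta_i = np.linalg.solve(H, grad_i.T).T              # shape (n, 2)

# Inspect individually, by data source, and by year, with no refitting
most_influential = int(np.argmax(np.abs(delta_i[:, 0])))
by_source = {str(s): grad_i[source == s].sum(axis=0) for s in np.unique(source)}
by_year = {int(yr): grad_i[year == yr].sum(axis=0) for yr in np.unique(year)}

print("Datum with the largest pull on the intercept:", most_influential)
print("Net gradient by source:", by_source)
```

Because the individual gradients sum to zero at the optimum, the grouped sums show which sources or years pull a parameter in opposite directions, and all three views come from a single fitted model.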
This approach will provide unprecedented interpretability of model dynamics, significantly increase the speed of model development, and potentially enable future development of automated model building procedures. In addition to speeding up the stock assessment process, understanding temporal residuals in parameter estimates will enable assessment scientists to produce more robust models by quantifying parameter stability and identifying informative and conflicting data sources. For this phase of the project, the team will develop software to automate calculation of the proposed gradient diagnostics and to interpret the results. The utility of these diagnostics for informing structural model development will then be simulation tested using the same approach as phase 1. Once validated, the method will be tested with the two candidate ensemble modeling approaches to verify their suitability for application in the Gulf. This phase is expected to span years 2 to 4 of the project. It has the potential to inform stock assessment development for a yet-to-be-determined research track assessment as early as 2026, with the specific species depending on the finalized SEDAR calendar.