Systematic review of training models for partial nephrectomy

Robot-assisted partial nephrectomy (PN) is a complex index procedure with a difficult learning curve that urologists need to learn to perform safely. We systematically evaluated the development and validation evidence underpinning PN training models (TMs) by extracting and reviewing data from the PubMed, Cochrane Library Central, EMBASE, MEDLINE, and Scopus databases.


INTRODUCTION
The difficult learning curve of laparoscopy [1-3] and the advent of robotic surgery reinforced this transition and led to an exponential increase in the number of robot-assisted partial nephrectomy (RAPN) procedures performed. RAPN is a complex index procedure that urologists need to learn to perform safely; its difficult learning curve requires a step-by-step training process. RAPN has several critical steps and requires achieving negative surgical margins and controlling bleeding to avoid potentially life-threatening hemorrhage [4,5].
The introduction of surgical innovations and the need to ensure patient safety motivated international experts to develop structured training programs [6,7] with validated curricula that include the acquisition of procedural skills on laboratory training models (TMs) rather than relying on caseload alone. The goal is for trainees to demonstrate a proficiency benchmark in the skills laboratory before performing the procedure on a patient [6].
Having access to a training center with animal-based ex vivo or in vivo TMs might be the best option [7]. Unfortunately, most trainees do not have access to this type of facility, and because many hospitals cannot afford a robotic platform dedicated to training, 3D printed models and virtual reality (VR) simulators are considered cost-effective solutions for acquiring partial nephrectomy (PN) procedural skills. Skills acquired using TMs can be transferred to the level required for safe surgical practice [8], especially if surgeons are enrolled in a proficiency-based progression (PBP) training program for PN [9]. However, this approach is contingent on high-level validation evidence supporting the use of a TM [10]. This review sought to evaluate the type and level of validation evidence in the literature on the efficacy of existing PN TMs and on their ability to deliver the skill acquisition and performance levels required for safe surgical practice.

Search strategy
A systematic review of the literature was conducted using the PubMed, Cochrane Library Central, EMBASE, MEDLINE, and Scopus databases. We searched from database inception until April 2023. All references in the included papers on TMs were also screened. The search terms were "Partial nephrectomy AND Training models". The search was limited to English-language publications. This systematic review was reported in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Protocol (PRISMA-P) guidelines [11,12].

Data extraction and analysis
After all eligible studies were identified, two independent reviewers (Farinha RJ and Mazzone E) screened all titles and abstracts, or full texts when further clarification was needed, for inclusion. Literature reviews, editorial commentaries, and non-PN TM studies were excluded at initial screening. Randomized controlled trials (RCTs) and nonrandomized observational (cohort) studies on validity and on skill transfer from the TM to clinical PN were included. Additional inclusion criteria were the use of objective metrics to measure task execution or of subjective assessments of PN performance using the Global Evaluative Assessment of Robotic Skills (GEARS) or the Global Operative Assessment of Laparoscopic Skills (GOALS) [9,13-29].
Disagreements regarding eligibility were resolved by discussion between the two investigators until consensus was reached on the studies to be included. The level of evidence was assigned according to the Oxford Center for Evidence-Based Medicine definitions [30]. This article does not contain any studies involving animals performed by any of the authors.

Study selection
Figure 1 shows the flow of studies through the screening process. A total of 331 papers were blindly screened by two reviewers (Farinha RJ and Mazzone E) on title and abstract, and 16 of these records were retained for further evaluation against the predefined eligibility criteria. The final evaluation for inclusion in the quantitative analysis was then carried out by three reviewers (Gallagher AG, Farinha RJ, and Mazzone E), who selected 14 manuscripts.

Training models
The final screened manuscripts included four animal-based, eight 3D printed, and two VR TM studies for PN procedural training. Animal TMs were used in vivo [14] but more commonly ex vivo, with models employing porcine kidneys [9,15,16]. Pseudo-tumors were created by percutaneous injection of liquid plastic [14], by gluing a styrofoam ball to the renal parenchyma [30], or simply by demarcating an area to be resected [9,15]. The pseudo-tumoral areas were placed in accessible portions of the renal parenchyma, with sizes ranging from 2 to 3.8 cm [9,14,16,31], and perfusion was emulated in two of the models [16,32] [Table 1].

Studies
The level of evidence of all included studies was ≥ 3b; different face, content, and construct validation studies were identified, and a summary is presented in Table 2.

In one animal study, participants answered questions scored on a ten-point Likert scale, and 96% reported that the model enhanced, and did not hinder, their learning experience [14]. In another animal study, all participants considered the model helpful in improving their confidence and skills in performing PN, based on a single question scored on a ten-point Likert scale [31]. In another study, the experts rated the TM as "very realistic" (median score 7/10, range 6-9) [27], and in another, the model was rated as having contributed to participants' skill (4/5) and confidence (4.1/5) in performing robotic surgery [9].
The VR TMs were evaluated with a questionnaire immediately after use [27,28], with questions scored on a ten-point [27] or five-point [28] scale. One VR TM study reported that the full-length AR platform was very realistic (median 8/10, range 5-10) compared with the in vivo porcine model (median 9/10, range 7-10; P = 0.07) [27], and another reported a mean score for anatomical integrity of 3.4 (± 1.1) on a five-point visual analog scale [28].
In the 3D printed TM studies, unspecified questionnaires combining qualitative evaluation and Likert scales were used to assess and report content validity. One model was "recommended as a teaching tool" for residents and fellows [18], another was considered "useful as a training tool" by 93.7% of the participants [19], and another study reported a total content score of 4.2 on a five-point Likert scale [29].
Using a non-validated questionnaire and a 0-100 scale anchored from "useless" to "useful", one model scored 90.7 for overall usefulness for training; it was considered most useful "for trainees to obtain new technical skills" (mean score 93.8) and less useful "for trainees to improve existing technical skills" (mean score 85.7) [17]. The only study in this group in which the assessment was performed exclusively by experts did not report data on content validity [26].
Using an unspecified questionnaire, experts rated the procedure-specific VR renorrhaphy exercise as highly useful for training residents and fellows, although less useful for experienced robotic surgeons new to RAPN.The model was highly rated for teaching surgical anatomy (median 9/10, range 4-10) and procedural steps (8.5/10, range 4-10).Technical skills training was rated slightly lower, although still favorably (7.5/10, range 1 to 10) [27] .Using a visual analog scale (score range 1-5), the surgeons evaluated the utility of the simulations, attributing a score of 4.2 (± 1.1) [28] .
Photo or video recordings of the surgeon's performance were collected, and experts were blinded to the experience level and the surgeon performing the task.The metrics used varied from GEARS [9,19,27] , GOALS [16,29] , and clinically relevant outcome measures (CROMS) [19] to different operation-specific metrics, namely, time (renal artery clamping [17,19] , tumor excision [9,34] , total operative [9,16] , and console time [19] ), estimated blood loss [19] , preserved renal parenchyma [17] , surgical margin status [16,17,19,29] , maximum gap between the two sides of the incision [29] , total split length [29] , and quality of PN (scored on a Likert scale) [9] .In one animal model, instrument and camera awareness and the precision of instrument action were subjectively scored using a Likert scale [27] .Built-in algorithm software metrics were used in one VR TM, scoring instrument collisions, instrument time out of view, excessive instrument force, economy of motion, time to task completion, and incorrect answers [27] [Table 3].

Concurrent validity
One AR/VR simulator study compared the performance of experts on a virtual and an in vivo porcine renorrhaphy task. The virtual task was rated comparable in realism and highly useful for teaching anatomy and procedural steps and for training the technical skills of residents and fellows, although less so for experienced robotic surgeons new to RAPN [27].
None of the studies generalized the test results to other tasks. Several authors report their models as realistic and useful training tools for residents and fellows, although they are usually not considered highly beneficial for training consultants [9,14,16-29,31]. The implications of using the diverse models differ across studies. These models are generally considered effective surgical education and training tools for learning the key steps of PN and developing advanced laparoscopic/robotic skills, and they involve fewer logistic concerns because they do not require dedicated teaching robots or wet-laboratory facilities [Table 4].

DISCUSSION
The aviation industry established the safety benefit of training on simulators many decades ago [35] , inspiring surgeons to pursue their training in the laboratory before entering the operating room [36,37] .Skills acquired using TMs can be transferred to the performance level required for safe surgical practice [8] , especially if surgeons are enrolled in a PBP training program for PN [10] , although this recommendation is contingent on a high level of evidence [10] .
Because PN is an index procedure that urologists need to learn, with a difficult learning curve and potentially life-threatening complications, the acquisition of skills for performing a safe PN should start in the skills laboratory. This review aimed to evaluate the type and level of validation evidence for the efficacy of existing PN TMs in acquiring and transferring surgical skills to the performance level required for safe surgical practice. No RCTs were found among the reviewed studies. Fourteen cohort studies on PN TMs based on animal tissue, 3D printing, and VR/AR technology were identified. According to the classification developed by the Oxford Center for Evidence-Based Medicine, the level of evidence was low [30].

Training models
Animal TMs closely emulate human tissues, allowing trainees to understand anatomical structures, natural tissue consistency, and tissue movement during dissection and suturing, which are critical features for training in tumor excision and renorrhaphy. The reviewed studies used different substances to create pseudo-tumors of consistent size. Although no cost-effectiveness studies have been conducted, these models were found to be economical and widely available.
Several potential advantages were identified with 3D printed TMs.They were derived from the patient's CT or MRI images and were, therefore, patient-specific.Furthermore, they provide the potential benefits of preoperative rehearsal.The technology used to print the mold produced durable, reliable, and repeatable models, and the created phantoms accurately represented the patient's anatomy and diverse tumor geometries.
Different substances were used to fill the mold to produce the final model. Silicone replicated kidney tissue in terms of tear strength, but polyvinyl alcohol cryogel (PVA-C) was the most frequently used [17,23,25,26]. PVA-C closely resembled real tissue, allowed the addition of contrast-enhancing agents (gadolinium and barium) for effective CT or MRI imaging, and could be recycled. Although the preparation and use of 3D printed models were labor intensive, and monofilament sutures were recommended because braided sutures easily tore the material [18,19], these models involved fewer logistic concerns than animal models [18,19]. They are simple and easy to set up, and they likely have a practically indefinite shelf life. Some studies reported the price of the models to support their economic value, but the cost of the 3D printer was not considered [17,19,23,26].
Develop patient-specific pre-surgical simulation protocol for RALPN; compare resection times between the model and the actual tumor in a patient-specific manner; none identified; can assist in surgical decision-making, provide preoperative rehearsals, and improve surgical training; predict feasibility of RALPN within an acceptable ischemia time.
Glybochko et al. [28]: evaluate effectiveness of personalized 3D printed models for pre-surgical planning; used time-based metrics and blood loss; none identified; elasticity and density similar to real kidney; can contribute to improvement of surgical skills and facilitate selection of optimal surgical tactics.
Ohtake et al. [33]: examine effectiveness of the model as a tool for practicing LPN.
The feasibility of incorporation into a training course was the focus when selecting clinically relevant steps to emulate.Therefore, most of the 3D printed models focused on simulating tumor resection and renorrhaphy.Some models include other anatomical structures, potentially increasing their realism and educational value [19,26,29] .
The exponential increase in computing power over the last decade makes VR/AR TMs very promising. They can include different teaching tasks, and patient-specific VR/AR TMs allow preoperative rehearsal. However, signal processing delays result in unrealistic tissue responsiveness during dissection of tissue planes, tissue excision, suturing, knot tying, and bleeding, which significantly compromises the capacity of VR simulation to accurately emulate the PN procedure and thus its value as a training tool [27,28].
Despite the advantages outlined herein, these TMs have several drawbacks. Areas identified as needing improvement include optimization of perfusion flow pressures and the lack of hilar dissection, clamping, and hemostasis management. Overcoming these shortcomings would accelerate the evolution from basic benchtop and part-task trainers toward a realistic and accurate recreation of the entire PN procedure, which would underpin effective surgical training.

Studies
The study populations were clinically heterogeneous, and the skill-level criteria used to differentiate novices, intermediates, and experts varied considerably between studies. These criteria were unclear, and expertise was defined by the total number of surgeries performed rather than by the number of PNs performed by the surgeon.
The face and content validity studies used qualitative (i.e., Likert scale-based) questionnaires that did not appear to be supported by validation evidence [9,19,29]. Responses were elicited from participants at variable time points, in one study up to one week after use of the TM [14]. The high ratings of realism and usefulness as a training tool came mainly from experts' evaluations. Furthermore, some studies enrolled novice surgeons with little or no PN operative experience [9,18,29,31].
One study used photographs of the models and the tasks performed to complete the evaluation [16] .The majority of the construct validity studies assessed video recordings [9,17,19,27,29] .They used expert assessors who were blinded to the experience level and surgeon performing the task.Time was employed as the main metric despite evidence demonstrating that it has a weak association with performance quality [38] .Only one concurrent validity study was conducted with one VR simulator, and no studies assessing the predictive aspect or transfer of skills were identified.
In the studies reviewed, Likert-type scales such as GEARS and GOALS were used to evaluate users' performance on the TMs, although such scales have consistently been shown to produce unreliable assessment measures [9,16,19,27,29,39]. No procedure-specific binary metrics were reported, and none of the tasks used performance errors as units of performance assessment. Furthermore, neither the methodology used to train assessors in the assessment scales nor the level of interrater reliability was reported.
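Because none of the reviewed studies reported interrater reliability, the following is offered only as a hedged illustration of how agreement between two blinded assessors scoring the same binary (occurred/did not occur) metric items could be quantified. It computes simple percent agreement and Cohen's kappa, which corrects observed agreement for agreement expected by chance. The function names and the ratings in the usage example are hypothetical and are not data from any reviewed study.

```python
def percent_agreement(rater_a, rater_b):
    """Proportion of items on which two raters gave the same binary score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same, non-empty set of items")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same items as 0 (absent) or 1 (present)."""
    p_o = percent_agreement(rater_a, rater_b)  # observed agreement
    n = len(rater_a)
    # Chance agreement expected from each rater's marginal proportions of 1s and 0s.
    a_yes = sum(rater_a) / n
    b_yes = sum(rater_b) / n
    p_e = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical scores from two blinded assessors on ten binary metric items.
    assessor_1 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
    assessor_2 = [1, 1, 0, 1, 1, 1, 1, 0, 1, 0]
    print(percent_agreement(assessor_1, assessor_2))       # 0.8
    print(round(cohens_kappa(assessor_1, assessor_2), 3))  # 0.524
```

In this hypothetical example, the assessors agree on 80% of items, but kappa is noticeably lower because both score most items as present, so much of their agreement would be expected by chance alone.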
All identified validation studies followed the nomenclature and methodology described by Messick [40] and Cronbach [41] rather than the framework described by Kane [18], reporting data on face, content, construct, and concurrent validity instead of using Kane's validation inferences (i.e., scoring, generalization, extrapolation, and implications) [18]. Regarding the scoring inference, the skill stations developed included different performance steps of PN, and fairness was partially ensured by the production of standardized TMs. However, the main problem was that scoring predominantly used global rating scales, with no reported attempt to demonstrate or address the reliability of performance scores.
Furthermore, no evidence was provided for the generalization inference. The items used to assess performance were ill-defined. The researchers did not evaluate the reproducibility of scores or investigate the magnitude of performance error; therefore, the sources of error were not identified.
Regarding the extrapolation inference, the studies reviewed here investigated whether the test domains reflected key aspects of real PN, but no analysis evaluated the relationship between performance on the TM and real-world performance. The same applies to the implications inference. Although a weak evaluation of the model's impact on its users was shown, its impact outside the study population was not addressed. Furthermore, no comparison between users and non-users of TMs was undertaken, and no analysis of relevant clinical outcomes was performed. All these observations make it very difficult to gather evidence supporting the integration of these TMs into PN training programs.
Several fundamental flaws pervaded the reviewed studies: considerable heterogeneity in the materials used to build the TMs, a lack of comparisons between the different models, and an absence of objective binary metrics demonstrating skill improvement. Although cost was described in some studies, no cost-effectiveness data were reported, and the level of evidence supporting their use for training purposes was weak. For all these reasons, the adoption of these TMs in PN training programs cannot be recommended.
Because TMs are a tool for delivering a metric-based training curriculum, future research should focus on improving the models, starting with the development of objective, transparent, and fair procedure-specific metrics [42]. A clear definition of expertise criteria, based on surgeons' performance level rather than the number of surgeries performed, should be a main concern. Kane's validation framework should be used, and comparisons should be made between models and between groups trained with and without the different TMs. Improvements will only emerge from the combined efforts of surgeons, human factors engineers, training experts, and behavioral scientists [43].
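To make the idea of a proficiency benchmark built on procedure-specific metrics more concrete, here is a minimal sketch under explicitly stated assumptions. It assumes the benchmark is the mean number of procedure-specific steps completed and the mean number of errors of experienced surgeons, and that a trainee is deemed proficient after meeting both on two consecutive trials, which mirrors how PBP benchmarks are often described. These rules, the hypothetical `Trial` type, and all numbers are illustrative only and are not metrics from any reviewed study or from an established PN curriculum.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trial:
    """One scored attempt on a training model (hypothetical binary-metric summary)."""
    steps_completed: int  # procedure-specific steps scored as performed
    errors: int           # procedure-specific errors scored as occurring


def proficiency_benchmark(expert_trials):
    """Assumed rule: benchmark = experts' mean steps completed and mean errors."""
    return (mean(t.steps_completed for t in expert_trials),
            mean(t.errors for t in expert_trials))


def meets_benchmark(trial, benchmark):
    """A trial meets the benchmark if steps are at least, and errors at most, the benchmark."""
    min_steps, max_errors = benchmark
    return trial.steps_completed >= min_steps and trial.errors <= max_errors


def is_proficient(trainee_trials, benchmark, consecutive=2):
    """True once the benchmark is met on `consecutive` trials in a row (assumed rule)."""
    streak = 0
    for trial in trainee_trials:
        streak = streak + 1 if meets_benchmark(trial, benchmark) else 0
        if streak >= consecutive:
            return True
    return False


if __name__ == "__main__":
    experts = [Trial(18, 2), Trial(19, 1), Trial(20, 0)]  # hypothetical expert scores
    benchmark = proficiency_benchmark(experts)            # 19 steps, 1 error
    trainee = [Trial(15, 4), Trial(19, 1), Trial(17, 2), Trial(19, 1), Trial(20, 0)]
    print(benchmark, is_proficient(trainee, benchmark))
```

Counting errors and completed steps as separate binary units, rather than folding them into a single global rating, is what allows each criterion to be checked and its interrater reliability to be assessed independently.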

CONCLUSION
This review highlights the absence of well-designed validation studies on PN TMs and the consequently low level of scientific evidence. Neither RCTs nor evidence addressing the implications inference were found to support the adoption of TMs in PN training curricula.

Figure 1.Study selection process, according to the PRISMA Statement.PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses.
specific kidney models for the purpose of pre-surgical resection and incorporation into simulation labs; no scoring; compare clinical results between patients from the study and similar studies from a RAPN database; none identified; patients who underwent the preoperative surgical model experienced lower estimated blood loss at the time of resection; use of this type of model may decrease the slope of the learning curve and improve patient outcomes; improved resection times; similar morphology and tumor volumes when compared with the real tumor.
von Rundstedt et al.