Validating the TALUS AI Mental Fitness Test: A Validation Study Using Gemini LLM Emulation

Tony Clark, CSO, 2-Dooz Research Labs: July 23, 2025

Abstract

TALUS™ is a foundational Mental Health AI model developed by 2-Dooz, designed for applications including suicide risk assessment and mental fitness estimation. This study focused on validating the TALUS AI model's mental fitness estimation for a 'typical' individual. The TALUS AI model, trained on high-quality, clinician-curated datasets, utilizes a three-level mental fitness scale: Thriving, Existing, and Fragile.

To simulate a typical person, the Gemini Large Language Model (LLM)—trained on a vast, diverse dataset of more than a trillion tokens—was prompted to answer the TALUS AI Mental Fitness Test survey questions. The experiment affirmed that, based on Gemini's emulated responses, a typical person would be categorized as 'Thriving' by the TALUS AI model. This finding aligns with 2-Dooz's unpublished real-world user data, where a large majority of voluntary participants were also estimated as 'Thriving.' This experimental validation supports the ability of the TALUS AI Mental Fitness Test to accurately estimate a typical user’s mental fitness level.

1. Introduction

Mental fitness is the capacity for resilience and effective functioning, enabling individuals to manage stress and regulate emotional responses even amidst life's inevitable challenges. Accurate and accessible tools for assessing mental fitness are crucial for early intervention and promoting overall well-being. This paper introduces the TALUS AI Mental Fitness Test, an innovative assessment tool developed by 2-Dooz, and presents an experiment designed to validate its estimations for a generalized "typical person" using a Large Language Model (LLM).

The TALUS AI Mental Fitness Test considers an individual's current stress, anxiety, depression, and trauma levels, assessing these through a series of survey questions. Based on user inputs, the TALUS AI model estimates one's overall mental fitness state, categorizing it into one of three distinct levels:

Thriving: Indicates good to great mental fitness, characterized by excellent resilience, positive energy, self-awareness, exceptional emotional regulation, superior attentiveness, and highly effective coping skills.
Existing: Represents below-average to average mental fitness, where individuals may occasionally experience ups and downs. Symptoms of anxiety, depression, and/or distress might moderately interfere with daily activities.
Fragile: Signals acute mental health distress, suggesting the need for immediate professional intervention.

The application of Large Language Models (LLMs) to emulate study cohort members in psychological research is nascent, making the current study's use of Gemini to validate the TALUS AI Mental Fitness Test a pioneering effort. The "typical person" in this experiment reflects the aggregate statistical profile of human experience embedded within the massive and diverse training dataset of the Gemini LLM. Leveraging Gemini's capacity to emulate human-like responses, this study’s primary objective is to validate the TALUS AI model’s mental fitness estimations by determining how a representative "typical person" would score on the TALUS AI Mental Fitness Test.

2. Background

2.1 The TALUS AI Model

TALUS (Trained Artificial Logic Used for Safety) is 2-Dooz’s foundational Mental Health Safety AI model. Its name is inspired by the talus bone, a small, instrumental component in the ankle responsible for balance and supporting the body's weight. Similarly, TALUS AI serves as a core model, forming the basis for a family of multimodal AI models designed to address specific aspects of mental health.

Assuming a simple definition where each vector is a numerical representation of a token, the theoretical size of the TALUS AI vector space is on the order of 5^55, implying a total number of possible TALUS AI tokens also on the order of 5^55. A key differentiator for the TALUS AI production model is its training on high-quality, clinician-curated datasets. This approach adheres to Clark’s Law[1], which postulates that for the same level of model performance, a higher quality training dataset correlates with a lower requirement for training and inferencing capacity. In practice, this allows for the implementation of a highly compact model without compromising prediction/estimation accuracy. The dataset used to train the current (January 2025) production version of the TALUS AI model is remarkably small, less than 4 × 10^-34 percent of the theoretical TALUS AI token space, and a tiny fraction of the trillions of tokens typically used to train Large Language Models (LLMs).
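The arithmetic behind these figures can be checked with a short sketch. It assumes the token space is 5^55, read here as 55 survey questions with up to five response options each; that reading is an interpretation and is not stated in the text. The variable names are illustrative.

```python
# Assumption (not stated in the source): the token space is 5**55, i.e. 55
# survey questions with up to five response options each.
QUESTIONS = 55
OPTIONS_PER_QUESTION = 5

# Python integers are arbitrary precision, so this value is exact.
token_space = OPTIONS_PER_QUESTION ** QUESTIONS

# The text gives the training set as "less than 4 x 10^-34 percent" of the space.
fraction = 4e-34 / 100  # convert percent to a plain fraction
implied_upper_bound = token_space * fraction

print(f"theoretical token space ~ {token_space:.3e}")
print(f"implied upper bound on training items ~ {implied_upper_bound:.0f}")
```

Under that assumption the space is roughly 2.8 × 10^38, and the stated percentage corresponds to a training set on the order of a thousand items.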

TALUS AI is specifically trained on a dataset comprising both synthetic and clinical trial-obtained data. Updates to the model are paced to the availability of new clinical and other experimental data, with all incoming data meticulously evaluated and annotated by clinicians before inclusion in training updates. This rigorous process results in an "expert" mental health assessment model that is highly accurate while requiring minimal processing and energy consumption for training and inferencing, making its memory, processing, and energy footprint tiny in comparison to that of general-purpose LLMs.

[1] Tony Clark’s ‘Clark’s Law’ is akin to Gordon Moore’s ‘Moore’s Law.’ Clark’s Law is an observation and prediction, not a classically defined scientific law. It highlights the importance of dataset quality in AI model training and inferencing, and provides a concise way of relating dataset quality to the ‘memory, processor, and power consumption’ footprint of an AI model.

2.2 The Gemini Large Language Model

Gemini is a sophisticated large language model developed by Google. It is engineered to process and understand complex information, generate human-like text, and engage in various conversational and analytical tasks. For the purpose of this experiment, Gemini’s capability to interpret survey questions, synthesize typical human behavioral patterns, and provide detailed, coherent explanations based on provided context was pivotal.

It is crucial to note that Gemini does not "feel" or "experience" emotions itself; rather, it predicts what a human would likely report based on its extensive training. Gemini's ability to emulate the responses of a "typical person" stems from its training on a vast and diverse dataset of text and code, estimated to be between 5.5 trillion and 11 trillion tokens. This training data encompasses a wide range of human communication, enabling it to perform:

Statistical Inference from Diverse Sources: The dataset comprises publicly available text, including scientific papers, psychological studies, surveys, news articles, books, and common conversational patterns. Gemini identifies statistical regularities in how people describe their experiences, emotions, and behaviors, particularly in response to common psychological inventories.
Pattern Recognition in Human Behavior: Within its training data, Gemini encounters descriptions of common human responses to stress, anxiety, life satisfaction, and mental health challenges. It learns to differentiate responses generally considered "normal" or "common" (e.g., occasional fatigue, minor worry) from those indicative of more significant clinical distress (e.g., chronic hopelessness, suicidal ideation, flashbacks).
Contextual Understanding: When presented with specific Likert scales and timeframes, Gemini applies its understanding of language and context to choose the most statistically probable and conceptually appropriate response for a generalized "typical person."
Absence of Personal Experience: As an AI, Gemini lacks personal experiences, emotions, or a "self." Its emulations are purely statistical predictions derived from aggregated human-generated texts, allowing it to reflect patterns without embodying them. For instance, when Gemini selects "Not at all" for a survey question, it reflects that the overwhelming majority of its training data, which represents general human experience as described in typical discourse, does not report those symptoms as common or typical.

3. Methods

3.1 Participant (AI Emulation)

The participant in this experiment was Google's Gemini LLM. Gemini was instructed to emulate the responses of a "typical person," defined for this study as an aggregate statistical profile of human experience derived from its extensive and diverse training dataset. This approach allowed for a controlled, replicable simulation of a generalized human response, without the time and expense required to recruit a large representative cohort of human participants.

3.2 Materials

The primary assessment tool was the online version of the TALUS AI Mental Fitness Test, accessible via its demo portal (https://demo.2-dooz.com/mental_fitness_demo_survey_proc.php). This test comprises 55 questions covering various aspects of mental well-being, including stress, anxiety, depression, and trauma. Questions utilize standard Likert scales (e.g., "Strongly Disagree" to "Strongly Agree"; "Not at all" to "Nearly every day") and specify relevant timeframes (e.g., "last 2 weeks," "last week"). The test provides a single categorical output of mental fitness: Thriving, Existing, or Fragile. The composite survey for the TALUS AI Mental Fitness Test comprises validated questions and scales pertaining to standardized clinical tests such as the PHQ-9, GAD-7, and PCL-5.
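As an illustration of the survey structure described above, a single Likert item could be represented as follows. This is a minimal sketch: the `LikertItem` class and its `encode` method are hypothetical names, and the zero-based ordinal coding follows the 0 to 3 convention of the PHQ-9 and GAD-7. How the TALUS AI model actually encodes responses is not described in this paper.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Scale labels quoted in this paper.
AGREEMENT_SCALE = ("Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree")
FREQUENCY_SCALE = ("Not at all", "Several days", "More than half the days", "Nearly every day")


@dataclass(frozen=True)
class LikertItem:
    """Hypothetical container for one survey question."""
    text: str
    scale: Tuple[str, ...]
    timeframe: Optional[str] = None  # e.g. "last 2 weeks"; None if unspecified

    def encode(self, answer: str) -> int:
        """Return the zero-based position of `answer` on this item's scale.

        Raises ValueError if the answer is not one of the scale labels.
        """
        return self.scale.index(answer)


item = LikertItem(
    text="Little interest or pleasure in doing things",
    scale=FREQUENCY_SCALE,
    timeframe="last 2 weeks",
)
print(item.encode("Several days"))
```

Keeping the scale and timeframe on the item itself means each question carries everything needed both to validate an answer and to phrase a prompt for it.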

3.3 Procedure

The experiment was conducted through a structured interaction with the Gemini LLM:

I. Priming Steps: The Gemini session was initiated with two preliminary prompts to establish its emulation capabilities and context.

a. Prompt 1: "Is it possible for you to emulate how a person might answer a question on a survey?"

i. Gemini's Response: "Yes, I can emulate how a person might answer a question on a survey. To do this effectively, I'll need a few things from you:

The Survey Question: What exactly is the question being asked?
The Context: What is the survey about? What kind of person should I emulate (e.g., age, background, general attitude – positive, neutral, skeptical)?
The Format: Is it a multiple-choice, open-ended text box, Likert scale, or something else?
Once I have those details, I’ll provide an answer that reflects a plausible human response in that specific survey scenario."

b. Prompt 2: "The survey is the one used for 2-Dooz's TALUS Mental Fitness Demo. The survey questions can be found at https://demo.2-dooz.com/mental_fitness_demo_survey_proc.php." This provided Gemini with the specific context for the questions.

II. Question Posing: Following the priming, Gemini was instructed to answer each of the 55 TALUS AI Mental Fitness Test questions. Each prompt explicitly stated the required Likert scale and timeframe, and reiterated the directive for Gemini to respond "in the way it believes a typical person would respond."

Examples of the prompt structure include:

a. "Assuming the Likert scale 'Strongly Disagree,' 'Disagree,' 'Neutral,' 'Agree,' and 'Strongly Agree,' please respond to the following question/statement in the way you believe a typical person would respond: 'If I do not enjoy doing something, I usually do not do it.'"

b. "Assuming a time-frame of the last 2 weeks and assuming the Likert scale 'Not at all,' 'Several days,' 'More than half the days,' and 'Nearly every day,' how might a typical person respond to the question 'how often have you been bothered by the following problem: “Little interest or pleasure in doing things?”'"
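The prompt pattern in the examples above could be generated programmatically rather than typed by hand. The sketch below is an assumption about how that might look; `build_prompt` is a hypothetical helper, and its wording approximates the first example rather than reproducing the study's prompts verbatim.

```python
from typing import Optional, Sequence

AGREEMENT_SCALE = ("Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree")
FREQUENCY_SCALE = ("Not at all", "Several days", "More than half the days", "Nearly every day")


def build_prompt(question: str, scale: Sequence[str], timeframe: Optional[str] = None) -> str:
    """Build one 'typical person' prompt from a question, its scale, and an optional timeframe."""
    labels = ", ".join(f"'{label}'" for label in scale)
    assumptions = []
    if timeframe:
        assumptions.append(f"a time-frame of the {timeframe}")
    assumptions.append(f"the Likert scale {labels}")
    return (
        "Assuming " + " and assuming ".join(assumptions) + ", "
        "please respond to the following question/statement in the way you "
        f"believe a typical person would respond: '{question}'"
    )


prompt_a = build_prompt(
    "If I do not enjoy doing something, I usually do not do it.",
    AGREEMENT_SCALE,
)
prompt_b = build_prompt(
    "Little interest or pleasure in doing things?",
    FREQUENCY_SCALE,
    timeframe="last 2 weeks",
)
print(prompt_a)
print(prompt_b)
```

Generating every prompt from one template keeps the scale, timeframe, and "typical person" directive identical across all 55 questions, which matters because prompt wording can change an LLM's answers.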

III. Data Input and Analysis: Each of Gemini's emulated responses to the 55 survey questions was manually entered into the TALUS AI Mental Fitness Demo. The TALUS AI model then processed these inputs and returned an overall mental fitness level for the Gemini-emulated "typical person."

4. Results

4.1 Primary Finding

Upon manual input of Gemini's emulated responses for a typical person into the TALUS AI Mental Fitness Demo, the TALUS AI model estimated the mental fitness level of this persona. The TALUS AI model concluded that a typical person taking the Mental Fitness Survey would be categorized as "Thriving." See accompanying Figure 1 for the visual representation of this result. This finding is consistent with 2-Dooz's unpublished observations that a large majority of real-world users who volunteered to take the TALUS Mental Fitness Test have also been estimated as possessing "Thriving" mental fitness.

4.2 Secondary Finding

To test Gemini’s understanding of the TALUS AI Mental Fitness Test and to gauge the consistency of Gemini’s emulated ‘typical persona,’ Gemini was additionally asked to predict the expected TALUS Mental Fitness Test result for a typical person and to provide its reasoning. Gemini replied: "Based on the responses to the 55 questions, which generally reflect the experiences of a 'typical person' (i.e., not experiencing severe or pervasive symptoms of mental distress or suicidal ideation), I would expect the TALUS Mental Fitness test result to be: 'Thriving.'"

Gemini further elaborated on its reasoning, providing insights into why the "Thriving" category was the expected outcome for a typical individual:

Resilient Emotional Landscape: Gemini's emulated responses consistently indicated general well-being, an acknowledgement of common daily demands without distress, and an absence of persistent low mood, suggesting that prolonged depressive episodes are not part of a typical person's regular experience.
Managed Stress and Anxiety: Responses implied occasional worry and stress, but not escalation to uncontrollable, pervasive anxiety. Physical manifestations of stress were minimal, indicating healthy physiological regulation.
Effective Trauma Processing (or Absence of Traumatic Impact): Crucially, questions related to re-experiencing symptoms (e.g., disturbing memories, dreams, flashbacks) received "Not at all" responses, highlighting that hallmarks of trauma disorders are not typical. Similarly, severe emotional numbing, avoidance, anhedonia, or negative beliefs about self/world were absent, indicating intact emotional capacity and a balanced cognitive outlook.
Strong Life Affirmation: Responses consistently showed an absence of suicidal ideation and a robust will to live, reflecting a fundamental drive for self-preservation. While acknowledging "brief passing thoughts" as part of the human condition, these did not indicate clinical risk.

In essence, Gemini concluded that its emulated "typical person" demonstrated high resilience, effective emotional regulation, cognitive clarity, healthy social functioning, and strong life affirmation—a holistic picture aligning perfectly with the "Thriving" category of mental fitness.

5. Discussion

The primary finding that a Gemini-emulated "typical person" was categorized as "Thriving" by the TALUS AI Mental Fitness Test provides significant validation for the TALUS AI model's functionality. This result is consistent with 2-Dooz's real-world observations, where a large proportion of voluntary users also exhibit "Thriving" mental fitness. This consistency suggests that the TALUS AI model accurately captures the parameters associated with a state of general well-being and effective functioning in the face of life's challenges, both when assessing human users and when processing responses from a statistically representative AI emulation.

The secondary finding underscores Gemini's advanced understanding of human psychological constructs derived from its vast training data. It suggests that LLMs can interpret and synthesize complex behavioral patterns to a degree that allows for a coherent, albeit simulated, understanding of human mental states. This goes beyond mere pattern matching, indicating a statistical grasp of the criteria that define different levels of mental wellness.

Limitations: It is crucial to acknowledge the inherent limitations of this study. While Gemini can expertly emulate human responses, it does not possess consciousness, emotions, or lived experience. Its "emulation" is a sophisticated statistical prediction based on patterns learned from its training data, not a genuine internal state. Therefore, the "typical person" assessed by TALUS AI in this study is a statistical construct, not a living individual. A living individual’s specific answers to the TALUS AI Mental Fitness Test questions may differ from those of the Gemini-emulated persona. Furthermore, variations in prompt engineering or the use of different LLMs could yield different results. Regarding 2-Dooz’s unpublished, self-reported TALUS AI Mental Fitness Test findings, self-reporting can underrepresent stigmatized symptoms. The author notes that, by the same token, Gemini’s emulation likely also underrepresents stigmatized symptoms.

Future Work: Future research could expand on this study by:

Conducting additional comparative studies where AI-emulated responses are directly compared to aggregated real-world human data for statistical validation.
Exploring the emulation of more nuanced or specific "personas" (e.g., typical responses of individuals in certain age groups, professions, or with specific pre-existing conditions) to test TALUS AI's granularity.
Utilizing AI to test the robustness of TALUS AI's assessment against responses that mimic specific mental health conditions (e.g., anxiety, depression) to ensure accurate categorization across the entire mental fitness spectrum.
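A first step toward the comparative studies listed above could be to tally category shares for two sets of TALUS AI outputs, for example repeated AI-emulated runs and a human cohort, and report the gap per category. The sketch below is hypothetical: the function names are illustrative, and the input lists are placeholders, not study data.

```python
from collections import Counter
from typing import Dict, Iterable

CATEGORIES = ("Thriving", "Existing", "Fragile")


def category_shares(results: Iterable[str]) -> Dict[str, float]:
    """Return the share of results falling in each TALUS category."""
    counts = Counter(results)
    unknown = set(counts) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unexpected categories: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no results supplied")
    return {category: counts[category] / total for category in CATEGORIES}


def share_gaps(emulated: Iterable[str], observed: Iterable[str]) -> Dict[str, float]:
    """Per-category difference in share (emulated minus observed)."""
    a, b = category_shares(emulated), category_shares(observed)
    return {category: a[category] - b[category] for category in CATEGORIES}


# Placeholder labels for illustration only; these are NOT study results.
placeholder_emulated = ["Thriving", "Thriving", "Existing", "Thriving"]
placeholder_observed = ["Thriving", "Existing", "Existing", "Thriving"]

print(category_shares(placeholder_emulated))
print(share_gaps(placeholder_emulated, placeholder_observed))
```

With real data, these shares would feed a formal test such as a chi-square goodness-of-fit test; the tally alone only describes the gap and does not establish statistical significance.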

6. Conclusion

This experiment successfully utilized the Gemini LLM to emulate a "typical person" and provide responses to the TALUS AI Mental Fitness Test survey questions. The TALUS AI model, upon processing the emulated inputs, categorized the "typical person" as "Thriving." This outcome is further supported by consistent "Thriving" estimations observed among 2-Dooz's real-world volunteer users. This finding robustly validates the TALUS AI Mental Fitness Test's capability to accurately assess the mental fitness level of a typical user and highlights the significant potential of advanced AI models like Gemini to simulate generalized human responses for preliminary research and validation in the field of mental health assessment.

7. References

[1] TALUS AI, 2-Dooz’s Mental Health Safety AI Model, https://www.2-dooz.com/resilience-community/talus-ai, accessed July 23, 2025.

[2] TALUS AI Mental Fitness Test, 2-Dooz Mental Fitness Demo, https://www.2-dooz.com/resilience-community/talus-ai/TALUSdemo, accessed July 20, 2025.

[3] Gemini, 2.5 Flash, Google’s LLM, https://gemini.google.com/app/582557843c0c29fc, accessed July 20, 2025.
