Metadata Schema
The Impact Data and Evidence Aggregation Library (IDEAL) aims to present consistent information on treatment effects from randomized controlled trials (RCTs) conducted in the social sciences in low- and middle-income countries. The goal is not to eliminate the need for meta-analytic researchers to read original studies. Instead, IDEAL will always present effect sizes that have been standardized across studies, together with information that lets library users restrict their comparisons by study attributes such as context, experimental design, measurement, and intervention design. It also indicates what information relevant for evidence aggregation a study may contain, such as estimates of heterogeneous treatment effects or a publicly available dataset.
To construct a minimum set of fields for this purpose, the IDEAL team developed the metadata schema through a desk review of the data collection tools used by researchers who had conducted meta-analyses, the fields used by existing evidence repositories (such as AidGrade and ClinicalTrials.gov), and guidelines used in evidence aggregation, such as PRISMA and GRADE.
The IDEAL Metadata Schema was developed to standardize the extraction of effect size data from randomized evaluations in international development. Despite the rapid growth of impact evaluations, the lack of consistent metadata formats has limited our ability to compare results across interventions, geographies, and outcomes. IDEAL fills this gap by providing a comprehensive schema (aligned with global standards such as DDI, ClinicalTrials.gov, and the AEA RCT Registry) to systematically document key design features, intervention details, outcome measures, and treatment effects from published studies.
This schema is the backbone of the Impact Data and Evidence Aggregation Library (IDEAL). By enabling consistent and structured data extraction, IDEAL supports large-scale meta-analyses, evidence reviews, and policy synthesis efforts. The schema is openly available here for public use by researchers, evidence aggregators, and developers building tools to advance evidence-informed policymaking in low- and middle-income countries.
1. Title and Publication Details
This section captures basic bibliographic information about the study, including the full title, citation, and DOI or permanent URL. These details allow coders to link each extracted record to its original source.
| Field | Level | Relevant Standard | Definition | Response Options | Controlled Vocabulary | Notes |
|---|---|---|---|---|---|---|
| Title | Paper | DDI 2.5: Full authoritative title for the work at the appropriate level: marked-up document; marked-up document source; study; other material(s) related to study description; other material(s) related to study. The study title will in most cases be identical to the title for the marked-up document. A full title should indicate the geographic scope of the data collection as well as the time period covered. Title of data collection (codeBook/stdyDscr/citation/titlStmt/titl) maps to Dublin Core Title element. This element is required in the Study Description citation. Clinical Trials: Official Title: The title of the clinical study, corresponding to the title of the protocol. | The full title of the paper. | Open text | — | - Can be automatically pulled into SurveyCTO. - Include subtitle in the coding instruction. - Tag if experiment was reported elsewhere. |
| Citation | Paper | DDI 2.5: bib_citation: Complete bibliographic reference containing all of the standard elements of a citation that can be used to cite the study. The bib_citation_format is provided to enable specification of the particular citation style used, e.g., APA, MLA, or Chicago. Clinical Trials: Citation: A bibliographic reference in NLM's MEDLINE format. | Complete bibliographic reference. | Open text | — | Can be automatically pulled into SurveyCTO. |
| DOI/URL | Paper | DDI 2.5: holdings: Information concerning either the physical or electronic holdings of the cited work. Attributes include: location--The physical location where a copy is held; callno--The call number for a work at the location specified; and URI--A URN or URL for accessing the electronic copy of the cited work. | DOI or permanent URL to access the paper. | Open text | — | Can be automatically pulled into SurveyCTO. |
2. Resources
This section tracks the supporting materials available for the study, such as datasets, analytic code, protocols, and ethics documentation.
| Field | Level | Relevant Standard | Definition | Response Options | Controlled Vocabulary | Notes |
|---|---|---|---|---|---|---|
| Type of resource | Paper | Clinical Trials: Available IPD/Information Type: The type of data set or supporting information being shared. (Individual Participant Data Set, Study Protocol, Statistical Analysis Plan, Informed Consent Form, Clinical Study Report, Analytic Code, Other (specify)) DDI 2.5: otherMat: This section allows for the inclusion of other materials that are related to the study as identified and labeled by the DTD/Schema users (encoders). The materials may be entered as PCDATA (ASCII text) directly into the document (through use of the "txt" element). This section may also serve as a "container" for other electronic materials such as setup files by providing a brief description of the study-related materials accompanied by the attributes "type" and "level" defining the material further. The "URI" attribute may be used to indicate the location of the other study-related materials. Other Study-Related Materials may include: questionnaires, coding notes, SPSS/SAS/Stata setup files (and others), user manuals, continuity guides, sample computer software programs, glossaries of terms, interviewer/project instructions, maps, database schema, data dictionaries, show cards, coding information, interview schedules, missing values information, frequency files, variable maps, etc. | The type of resource being shared. | Text-CV | a. Published data or data repository entry; b. Analytic code; c. Statistical Analysis Plan; d. Populated pre-analysis plan; e. Informed Consent Form or Consent Language; f. Research ethics documentation (IRB protocol, structured ethics appendix, description of consent process, etc.); g. Clinical Study Report; h. Study Protocol; i. Academic publication or working paper; j. Other (specify) | Used to identify shared data/code documentation for replication. |
| URL of resource | Study | Clinical Trials: Available IPD/Information URL: The web address used to request or access the data set or supporting information. DDI 2.5: holdings: Information concerning either the physical or electronic holdings of the cited work. Attributes include: location--The physical location where a copy is held; callno--The call number for a work at the location specified; and URI--A URN or URL for accessing the electronic copy of the cited work. | URL to access the resource. Permanent URL (e.g. DOI) if available. | Open text | — | Permanent URLs like DOIs preferred. |
| Registration ID | Study | Clinical Trials: An identifier(s) (ID), if any, other than the organization's Unique Protocol Identification Number or the NCT number that is assigned to the clinical study. This includes any unique clinical study identifiers assigned by other publicly available clinical trial registries. If the clinical study is funded in whole or in part by a U.S. Federal Government agency, the complete grant or contract number must be submitted as a Secondary ID. AEA Registry: Secondary Identifying Numbers: An identifier(s), if any, other than the DOI or the AEARCT ID that is assigned to the trial. This includes any unique identifiers assigned by other publicly available trial registries, funders, or sponsors (e.g. ClinicalTrials.gov, ISRCT, etc.) | Registration ID (unique identifier issued by the organization where the trial is registered). | Open text | — | Used to cross-reference external trial registries. |
| Registry URL | Study | AEA Registry: Links to any other related websites, documents, etc. DDI 2.5: holdings: Information concerning either the physical or electronic holdings of the cited work. Attributes include: location--The physical location where a copy is held; callno--The call number for a work at the location specified; and URI--A URN or URL for accessing the electronic copy of the cited work. | Permanent URL or DOI to access the registration. | Open text | — | Used to verify pre-registration details. |
| Review board name | Study | Clinical Trials: Full name of the approving human subjects review board. | The full name of the approving ethics review board. | Open text | — | Used to assess ethical oversight. |
| Review number | Study | Clinical Trials: Number assigned by the human subjects review board upon approval of the protocol. May be omitted if status is anything other than approved. | The identifying number assigned by the ethics review board upon the approval of the study's protocol. | Open text | — | May be omitted if protocol not approved. |
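
As an illustration of how the "Type of resource" controlled vocabulary could be enforced at data entry, here is a minimal Python sketch. The enum member names, the `parse_resource_type` helper, and the rule that "Other (specify)" requires free text are assumptions made for this example; they are not part of the IDEAL schema or of any IDEAL tool.

```python
from __future__ import annotations

from enum import Enum


class ResourceType(Enum):
    """Labels from the 'Type of resource' vocabulary (member names are hypothetical)."""
    PUBLISHED_DATA = "Published data or data repository entry"
    ANALYTIC_CODE = "Analytic code"
    STATISTICAL_ANALYSIS_PLAN = "Statistical Analysis Plan"
    POPULATED_PRE_ANALYSIS_PLAN = "Populated pre-analysis plan"
    INFORMED_CONSENT = "Informed Consent Form or Consent Language"
    ETHICS_DOCUMENTATION = "Research ethics documentation"
    CLINICAL_STUDY_REPORT = "Clinical Study Report"
    STUDY_PROTOCOL = "Study Protocol"
    PUBLICATION = "Academic publication or working paper"
    OTHER = "Other (specify)"


def parse_resource_type(label: str, other_text: str = "") -> tuple[ResourceType, str]:
    """Map a coder-entered label to the vocabulary, ignoring case and outer spaces.

    Assumption: choosing 'Other (specify)' must come with a free-text description.
    """
    normalized = label.strip().lower()
    for member in ResourceType:
        if member.value.lower() == normalized:
            if member is ResourceType.OTHER and not other_text.strip():
                raise ValueError("'Other (specify)' needs a free-text description")
            return member, other_text.strip()
    raise ValueError(f"{label!r} is not in the controlled vocabulary")
```

Rejecting unknown labels outright, rather than silently mapping them to "Other", keeps miscoded entries visible for later verification.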
3. Partners and Funders
This section records the organizations involved in funding, designing, or implementing the study.
| Field | Level | Relevant Standard | Definition | Response Options | Controlled Vocabulary | Notes |
|---|---|---|---|---|---|---|
| Name of agency involved | Study | Clinical Trials: CollaboratorName: Other organizations (if any) providing support. Support may include funding, design, implementation, data analysis or reporting. The responsible party is responsible for confirming all collaborators before listing them. | Name of the organization that supported or contributed to the study, including institutional affiliations of study authors. | Open text | — | Should include institutional affiliations. |
| Role of agencies | Study | No standard | Role(s) of the organization in the study. | Text-CV | Design, funding, data collection, etc. | CV for testing: intervention implementation; study recruitment; data collection implementation; data provision; research design or analysis; funding or material support; Other |
| Type of agency | Study | No standard | Type of organization. | Text-CV | CV needs work; should include: Government agency; Statistical agency; For-profit organization; Non-profit, charitable foundation, or NGO; Independent academic or research institution; Other | Pilot the CV with the "other, specify" option. |
4. Topics and Objectives
This section summarizes the substantive focus of the study. It includes both controlled topic classifications (e.g., development sector keywords) and author-selected keywords for flexible search and thematic grouping.
| Field | Level | Relevant Standard | Definition | Response Options | Controlled Vocabulary | Notes |
|---|---|---|---|---|---|---|
| Topic of study | Study | DDI 2.5: topcClas: The classification field indicates the broad substantive topic(s) that the data cover. Library of Congress subject terms may be used here. The "vocab" attribute is provided for specification of the controlled vocabulary in use, e.g., LCSH, MeSH, etc. The "vocabURI" attribute specifies the location for the full controlled vocabulary. Maps to Dublin Core Subject element. Inclusion of this element in the codebook is recommended. CESSDA Controlled Vocabulary for CESSDA Topic Classification (recommended by World Bank Metadata Guidelines) | The broad substantive topic(s) that the data cover. | Text-CV | CESSDA List of topics (codes) | Cardinality can be > 1. |
| Keywords | Study | No standard | List of keywords selected by the author. | Open text | — | Used for indexing and searchability. |
5. Sampling
This section provides key information about how the study population was selected, randomized, and analyzed. It includes the unit of randomization and analysis, subnational locations, and baseline demographics such as gender and age.
| Field | Level | Relevant Standard | Definition | Response Options | Controlled Vocabulary | Notes |
|---|---|---|---|---|---|---|
| Country | Treatment effect | DDI 2.5: nation: Indicates the country or countries covered in the file. Attribute "abbr" may be used to list common abbreviations; use of ISO country codes is recommended. Maps to Dublin Core Coverage element. Inclusion of this element is recommended. For forward-compatibility, DDI 3 XHTML tags may be used in this element. | The country or countries where the study was implemented. | Text-controlled vocabulary | ISO country codes, not reported, other | If single-country study: automatically copy the country entered at the study level to the treatment effect level. If multiple countries: enter the different countries at the treatment effect level. Add options for "other" and "did not report". Pilot to assess how many papers report pooled treatment effects for multi-country RCTs. |
| Sub-national location | Treatment effect | DDI 2.5: geogCover: Information on the geographic coverage of the data. Includes the total geographic scope of the data, and any additional levels of geographic coding provided in the variables. Maps to Dublin Core Coverage element. Inclusion of this element in the codebook is recommended. For forward-compatibility, DDI 3 XHTML tags may be used in this element. | Geographic location sampled within the country. | Open text (pilot) | — | If multiple countries are selected, the coder enters the different types of locations at the treatment effect level. The coding instructions can give examples such as villages in Busia, four cities in Indonesia, etc. |
| Unit of randomization | Arm | AEA Registry: Randomization unit: This field describes the level at which randomization will take place: (e.g., individual, firm, school, experimental sessions). If there are more than one level of randomization, it should be explained (e.g. group level randomization for some treatment, and individual randomization for some treatments). Clinical Trials: Type of Units Assigned: If assignment is based on a unit other than participants, a description of the unit of assignment (for example, eyes, lesions, implants). Limit: 40 characters. | Unit at which randomization was done and treatment assigned. | Text-controlled vocabulary and open-text | See CV A in appendix B of Cavanagh et al. (2023). | Start with controlled vocabulary from Cavanagh and amend in the pilot. |
| Unit of analysis | Treatment effect | Clinical Trials: Type of Units Analyzed: If the analysis is based on a unit other than participants, a description of the unit of analysis (for example, eyes, lesions, implants). | Unit at which treatment effect was measured. | Text-controlled vocabulary and open-text | See CV A in appendix B of Cavanagh et al. (2023). | Start with controlled vocabulary from Cavanagh and amend in the pilot. Pilot to assess cases where unit of randomization is different from unit of analysis. |
| Eligibility of experimental population | Study | Clinical Trials: Eligibility criteria: A limited list of criteria for selection of participants in the clinical study, provided in terms of inclusion and exclusion criteria and suitable for assisting potential participants in identifying clinical studies of interest. Use a bulleted list for each criterion below the headers "Inclusion Criteria" and "Exclusion Criteria". DDI 2.5: universe: The group of persons or other elements that are the object of research and to which any analytic results refer. Age, nationality, and residence commonly help to delineate a given universe, but any of a number of factors may be involved, such as sex, race, income, veteran status, criminal convictions, etc. The universe may consist of elements other than persons, such as housing units, court cases, deaths, countries, etc. In general, it should be possible to tell from the description of the universe whether a given individual or element (hypothetical or real) is a member of the population under study. A "level" attribute is included to permit coding of the level to which universe applies, i.e., the study level, the file level (if different from study), the record group, the variable group, the nCube group, the variable, or the nCube level. The "clusion" attribute provides for specification of groups included (I) in or excluded (E) from the universe. | Describes the criteria for eligibility of units in the study, provided in terms of inclusion and exclusion criteria. | Open text | — | Enter for: (1) unit of randomization; (2) unit of analysis (this is why cardinality is 1..n). Eligibility, recruitment, sampling, and deviation fields are all to be automatically triaged for verification. Open text for the pilot, with responses such as income, age, urban/rural. |
| Recruitment of experimental sample | Study or arm | Clinical Trials: Recruitment details: Key information relevant to the recruitment process for the overall study, such as dates of the recruitment period and types of location (For example, medical clinic), to provide context. | Description of the recruitment process for the overall study. | Open text | — | Enter for: (1) unit of randomization; (2) unit of analysis. Eligibility, recruitment, sampling, and deviation fields are all to be automatically triaged for verification. Add a field that first asks whether recruitment was random, or instruct coders in the coding protocol to specify in this field whether it was random. Random recruitment: open-text detail for the pilot. Non-random recruitment: open-text detail for the pilot, covering recruitment drives, applications, and media announcements. |
| Sampling from recruited | Treatment effect | Clinical Trials: Allocation: The method by which participants are assigned to arms in a clinical trial. N/A (not applicable): For a single-arm trial. Randomized: Participants are assigned to intervention groups by chance. Nonrandomized: Participants are expressly assigned to intervention groups through a non-random method, such as physician choice. | Describes how the unit was sampled from the eligible population/universe (after recruitment, if any). | Text-controlled vocabulary | Randomized, Non-randomized, N/A | Enter for: (1) unit of randomization; (2) unit of analysis. Eligibility, recruitment, sampling, and deviation fields are all to be automatically triaged for verification. |
| Deviation from sample inclusion | Treatment effect | Clinical Trials: Pre-assignment Details: Description of significant events in the study (for example, wash out, run-in) that occur after participant enrollment, but prior to assignment of participants to an arm or group, if any. For example, an explanation of why enrolled participants were excluded from the study before assignment to arms or groups. | Description of deviation from the sample, prior to assignment of participants to an arm or group, if any. | Open text | — | Enter for: (1) unit of randomization; (2) unit of analysis. Open text for the pilot, with information on units that dropped out of the sample after inclusion (to assess how many papers report this). Eligibility, recruitment, sampling, and deviation fields are all to be automatically triaged for verification. |
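
The note on the Country field describes a simple propagation rule: single-country studies pass their country down to each treatment effect, while multi-country studies need an explicit entry per treatment effect. A minimal Python sketch of that rule follows. The function name `resolve_effect_country`, the `NOT_REPORTED` constant, and the choice to return "not reported" when no country is recorded at the study level are assumptions for illustration only.

```python
from typing import List, Optional

NOT_REPORTED = "not reported"  # assumed label for the vocabulary's "not reported" option


def resolve_effect_country(
    study_countries: List[str],
    effect_country: Optional[str] = None,
) -> str:
    """Return the country value to store on one treatment-effect record."""
    # A value entered at the treatment-effect level always takes precedence.
    if effect_country is not None and effect_country.strip():
        return effect_country.strip()
    # Single-country study: copy the study-level country down.
    if len(study_countries) == 1:
        return study_countries[0]
    # Nothing recorded at the study level: fall back to "not reported" (assumption).
    if not study_countries:
        return NOT_REPORTED
    # Multi-country study with no entry at this level: the coder must supply one.
    raise ValueError("multi-country study: enter the country at the treatment effect level")


# Example with ISO 3166-1 alpha-3 codes (the schema does not say which ISO format is used).
print(resolve_effect_country(["KEN"]))                # single-country study
print(resolve_effect_country(["KEN", "IDN"], "IDN"))  # explicit entry in a multi-country study
```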
6. Intervention
This section describes the intervention(s) tested in the study, including their content, assignment strategy, and duration. By mapping interventions to arms, this section supports the clear identification of experimental conditions.
| Field | Level | Relevant Standard | Definition | Response Options | Controlled Vocabulary | Notes |
|---|---|---|---|---|---|---|
| Name of intervention | Treatment arm | Clinical Trials: Intervention Name: A brief descriptive name used to refer to the intervention(s) studied in each arm of the clinical study. A non-proprietary name of the intervention must be used, if available. If a non-proprietary name is not available, a brief descriptive name or identifier must be used. | A brief descriptive name used by the paper to refer to each intervention arm. | Open text | — | Build a controlled vocabulary for interventions after pilot with an "other" option. |
| Details of intervention | Treatment arm | Clinical Trials: Intervention Description: Details that can be made public about the intervention, other than the Intervention Name(s) and Other Intervention Name(s), sufficient to distinguish the intervention from other, similar interventions studied in the same or another clinical study. For example, interventions involving drugs may include dosage form, dosage, frequency, and duration. AEA Registry: This field provides a detailed summary of the intervention(s): what is the intervention(s), who is the target group of the intervention(s), and what is the intended purpose(s). This is a public field viewable to all, once the trial is registered. | A detailed summary of the intervention for each intervention arm that helps to distinguish one intervention arm from another intervention arm. | Open text | — | Includes dosage, duration, method, etc. |
| Name of arm | Treatment arm | Clinical Trials: Arm/Group Title: Descriptive label used to identify each arm or comparison group. Note: "Arm" means a pre-specified group or subgroup of participant(s) in a clinical trial assigned to receive specific intervention(s) (or no intervention) according to a protocol. | Descriptive label used in the paper to identify each intervention arm. | Open text | — | Pilot to assess how papers report this. |
| Number of randomization units per arm | Treatment arm | No standard | The number of randomization units assigned to each intervention arm across all periods/phases. | Numeric | — | Pilot to assess how papers report this. |
| Mapping of intervention to arms | Treatment arm | Clinical Trials: Arm or Group/Interventional Cross-Reference: If multiple Arms or Groups have been specified, indicate which Interventions (or exposures) are in each Arm or Group of the study, using the Cross-Reference check boxes. | The link between interventions and arms. | Format | — | — |
| Intervention assignment strategy | Study | Clinical Trials: Interventional Study Model: The strategy for assigning interventions to participants. Single Group: Clinical trials with a single arm; Parallel: Participants are assigned to one of two or more groups in parallel for the duration of the study; Crossover: Participants receive one of two (or more) alternative interventions during the initial phase of the study and receive the other intervention during the second phase of the study; Factorial: Two or more interventions, each alone and in combination, are evaluated in parallel against a control group; Sequential: Groups of participants are assigned to receive interventions based on prior milestones being reached in the study, such as in some dose escalation and adaptive design studies | Description of strategy used in the paper to assign participants to each intervention arm. | CV | Simple Randomization; Complete Randomization; Stratified Randomization; Cluster Randomization; Other | Pilot to assess how this is reported in the paper. |
| Timeline of intervention | Treatment arm | AEA Registry: Intervention Start Date: The date that the intervention starts, or the date by which you expect the intervention to start. In the case in which you have several interventions that start at different times, this field gives you the lower time limit of all of them. Field Type: date (YYYY/MM/DD). Intervention End Date: The date that the intervention ends, or the planned end date of administering the intervention. In the case in which you have several interventions that end at different times, this field gives you the upper time limit of all of them. Field Type: date (YYYY/MM/DD) | The time period defined by the start date of intervention and the end date of intervention. | Start and end date format | — | Pilot to assess whether papers more often report exact dates or month/year estimates. |
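
To make the "Mapping of intervention to arms" and "Number of randomization units per arm" fields concrete, the sketch below encodes a factorial design like the one quoted from the AEA Registry elsewhere in this schema (50 schools each in control, teacher training, scholarship, and both treatments). The `Arm` class, the arm labels, and the `arms_receiving` helper are hypothetical and only illustrate one possible way to store the cross-reference.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Arm:
    """One treatment arm: its label, its interventions, and units randomized to it."""
    name: str
    interventions: FrozenSet[str]
    n_randomization_units: int


# Factorial example: two interventions, each alone and in combination, plus a control.
ARMS: List[Arm] = [
    Arm("Control", frozenset(), 50),
    Arm("Teacher training", frozenset({"teacher training"}), 50),
    Arm("Scholarship", frozenset({"scholarship"}), 50),
    Arm("Both treatments", frozenset({"teacher training", "scholarship"}), 50),
]


def arms_receiving(intervention: str, arms: List[Arm]) -> List[str]:
    """Return the sorted names of all arms whose mapping includes the intervention."""
    return sorted(arm.name for arm in arms if intervention in arm.interventions)


total_units = sum(arm.n_randomization_units for arm in ARMS)
print(arms_receiving("scholarship", ARMS))  # arms that include the scholarship intervention
print(total_units)                          # total schools randomized across all arms
```

Storing interventions as a set per arm keeps the control arm (an empty set) and combined arms in the same structure, which mirrors the cross-reference check boxes described in the ClinicalTrials.gov definition.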
7. Outcome
This section documents all outcome measures used to estimate treatment effects, including their names, tools, categories, and statistical properties.
| Field | Level | Relevant Standard | Definition | Response Options | Controlled Vocabulary | Notes |
|---|---|---|---|---|---|---|
| Name of the outcome measure | Outcome | Clinical Trials: Outcome Measure Title: Name of the specific outcome measure. | The name the author(s) uses for the outcome measure. | Open text | — | Should match terminology used in results section. |
| Outcome measure category | Outcome | Clinical Trials: Outcome Measure Description: Additional information about the outcome measure, including a description of the metric used to characterize the specific outcome measure, if not included in the Outcome Measure Title. | Description of the metric used to characterize the specific outcome measure, if not included in the Outcome Measure Title. | Controlled vocabulary: Group to build. Examples such as: behavioral score, anthropometric measure, school test score | — | — |
| Outcome measurement tool | Outcome | Clinical Trials: Outcome Measure Description: Additional information about the outcome measure, including a description of the metric used to characterize the specific outcome measure, if not included in the Outcome Measure Title. | Name of the specific tool used for measurement. | Open text | — | Add citation of the tool used if available. |
| Outcome measure statistical type | Treatment effect | No standard: see notes for potential definition from Lipsey and Wilson (2000) | Indicates type of outcome measure: binary, continuous, categorical. | Controlled vocabulary: inherently dichotomous (proportions, odds ratio), inherently continuous (continuous, artificially dichotomized), dichotomous (probit), dichotomous (logit) | — | Lipsey and Wilson (2000). Practical Meta-Analysis. SAGE Publications, Inc. Figure 3.1 (p. 58): Effect size decision tree for studies involving group contrasts on dependent variables. Table 3.2 (p. 72): Effect size, standard error, and inverse variance weight formulas for each effect size type. |
| Outcome measure standardized | Outcome or treatment effect | No standard | Indicates whether the outcome measure is standardized. | Yes/No/Don't know | — | — |
| Standardization description | Outcome or treatment effect | No standard | Indicates how the outcome measure was standardized. | Open text extracted from the paper for the pilot. | — | Pilot to assess how this is reported in papers. |
| Outcome measure coding | Outcome | No standard | Indicates whether the outcome suggests an overall positive and desirable impact according to the author. | Select one: Yes/No/Ambiguous/Don't know | — | Pilot to assess how this is reported in papers. |
| Outcome data source | Outcome | DDI 2.5: dataKind: The type of data included in the file: survey data, census/enumeration data, aggregate data, clinical data, event/transaction data, program source code, machine-readable text, administrative records data, experimental data, psychological test, textual data, coded textual, coded documents, time budget diaries, observation data/ratings, process-produced data, etc. This element maps to Dublin Core Type element. The type attribute can be used for forward-compatibility with DDI 3, by providing a type for use of controlled vocabulary, as this is descriptive in DDI 2 and CodeValue in DDI 3. | Describes type of data used to measure the outcome. | Controlled vocabulary: survey, census, etc. (needs to be developed), or the DDI CV listed in the next column | Sample survey data, Census/enumeration data, Administrative records data, Aggregate data, Clinical data, Event/transaction data, Observation data/ratings, Process-produced data, Time-budget diaries, Choice experiment for preference elicitation, Economic games with participant interaction, Measurement and tests, Textual data, Others | Pilot to assess whether data sources are clearly reported. |
| Outcome measure collection start date | Outcome | DDI 2.5: collDate: Contains the date(s) when the data were collected. Use the event attribute to specify "start", "end", or "single" for each date entered. The ISO standard for dates (YYYY-MM-DD) is recommended for use with the "date" attribute. The "cycle" attribute permits specification of the relevant cycle, wave, or round of data. Maps to Dublin Core Coverage element. Inclusion of this element in the codebook is recommended. | Specifies start date of data collection for the outcome. | Open text | — | Coding protocol: add to the data source roster; add an option for "not provided". |
| Outcome measure collection end date | Outcome | DDI 2.5: collDate: Contains the date(s) when the data were collected. Use the event attribute to specify "start", "end", or "single" for each date entered. The ISO standard for dates (YYYY-MM-DD) is recommended for use with the "date" attribute. The "cycle" attribute permits specification of the relevant cycle, wave, or round of data. Maps to Dublin Core Coverage element. Inclusion of this element in the codebook is recommended. | Specifies end date of data collection for the outcome. | Open text | — | Coding protocol: add to the data source roster; add an option for "not provided". |
| Outcome mean | Outcome*Treatment Arm*round | Clinical Trials: Measure Type: The type of data for the outcome measure. Select one. (Count of Participants, Mean, Median, Least Squares Mean, Geometric Mean, Geometric Least Squares Mean, Number, Count of Units) | Specifies the reported outcome quantity such as average, size, or count. | Numeric | Select one. (Count of Participants, Mean, Median, Least Squares Mean, Geometric Mean, Geometric Least Squares Mean, Number, Count of Units) | Coding protocol should list what other information needs to be extracted if all three (mean, SD, N) not available. |
| Outcome SD | Outcome*Treatment Arm*round | Clinical Trials: Measure of Dispersion/Precision Select one. Not Applicable (only if Measure Type is "Number," "Count of Participants," or "Count of Units"), Standard Deviation, Standard Error, Inter-Quartile Range, Full Range, 80% Confidence Interval, 90% Confidence Interval, 95% Confidence Interval, 97.5% Confidence Interval, 99% Confidence Interval, Other Confidence Interval Level, Geometric Coefficient of Variation (only when Measure Type is "Geometric Mean") | Specifies amount of dispersion in the outcome. | Numeric | Select one. Not Applicable (only if Measure Type is "Number," "Count of Participants," or "Count of Units"), Standard Deviation, Standard Error, Inter-Quartile Range, Full Range, 80% Confidence Interval, 90% Confidence Interval, 95% Confidence Interval, 97.5% Confidence Interval, 99% Confidence Interval, Other Confidence Interval Level, Geometric Coefficient of Variation (only when Measure Type is "Geometric Mean") | Coding protocol should list what other information needs to be extracted if all three (mean, SD, N) not available. |
| Outcome sample size | Outcome*Treatment Arm*round | Clinical Trials: Number of Participants Analyzed: The number of participants analyzed for the outcome measure in the row and for each arm/group, if different from the overall Number of Participants Analyzed. Number of Units Analyzed: The number of units analyzed for the outcome measure in the row and for each arm/group, if different from the overall Number of Units Analyzed. | Number of units analyzed to measure the outcome. | Numeric | — | Coding protocol should list what other information needs to be extracted if all three (mean, SD, N) not available. |
| Additional information for clustered RCTs | Outcome*Treatment Arm*round*cluster | AEA Registry: Was the treatment clustered: This field asks whether the treatment was clustered into groups for the randomization (as opposed to being randomized by individual subjects). Options: Yes; No. Planned Number of Clusters (unit of randomization): This field describes how many groupings or clusters will be sampled, and asks to define the cluster unit (e.g. 200 schools). Sample size (or number of clusters) by treatment arms: This field describes the sample size or clusters by treatment arms. Arm type(s) identify the role of the intervention that participants receive. Types of arms include experimental arm(s) and no intervention arm. For example, the trial includes 50 schools in the control, 50 schools receive the teacher training treatment, 50 schools receive the scholarship treatment, and 50 schools receive both treatments. | Additional details on treatment clustering, if applicable. | Numeric: In addition to the mean, SD, and sample size by cluster for each arm and each wave, a subset of the following: number of clusters; number of observations per cluster; cluster-robust standard error; intra-cluster correlation (ICC), or statistics to retrieve the ICC; degrees of freedom (for a small number of clusters). | — | Coding protocols should set up a priority list and a stopping rule; not all of this information is required. Pilot to see how widely the ICC (or the statistics needed to retrieve it) is reported in papers. |
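
One common reason to extract the number of clusters, the number of observations per cluster, and the ICC is to adjust the sample size of a clustered trial for within-cluster correlation. The sketch below illustrates this with the standard design effect for equal-sized clusters, 1 + (m - 1) × ICC. It is an illustration only, not part of the IDEAL coding protocol; the function names and the example figures (20 pupils per school, ICC of 0.10) are assumptions.

```python
def design_effect(obs_per_cluster: float, icc: float) -> float:
    """Design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    if obs_per_cluster < 1:
        raise ValueError("obs_per_cluster must be at least 1")
    # Assumption: ICC is restricted to [0, 1] for this sketch.
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie in [0, 1]")
    return 1.0 + (obs_per_cluster - 1.0) * icc


def effective_sample_size(n_clusters: int, obs_per_cluster: float, icc: float) -> float:
    """Total observations divided by the design effect."""
    total_observations = n_clusters * obs_per_cluster
    return total_observations / design_effect(obs_per_cluster, icc)


if __name__ == "__main__":
    # Hypothetical arm: 50 schools, 20 pupils per school, ICC = 0.10.
    deff = design_effect(20, 0.10)
    n_eff = effective_sample_size(50, 20, 0.10)
    print(f"design effect = {deff:.2f}, effective n = {n_eff:.1f}")
```

When cluster sizes vary, the equal-size formula is only an approximation, which is one reason the schema also allows the cluster-robust standard error to be extracted directly.
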
8. Estimates
This section captures the effect sizes reported by the authors, along with information about the estimation strategy, statistical tests, precision, and subgroup analyses.
| Field | Level | Relevant Standard | Definition | Response Options | Controlled Vocabulary | Notes |
|---|---|---|---|---|---|---|
| Estimation parameter | Treatment effect | Clinical Trials: Estimation parameter: Select one. (Cox Proportional Hazard; Hazard Ratio (HR); Hazard Ratio, Log; Mean Difference (Final Values); Mean Difference (Net); Median Difference (Final Values); Median Difference (Net); Odds Ratio (OR); Odds Ratio, Log; Risk Difference (RD); Risk Ratio (RR); Risk Ratio, Log; Slope; Other). Other Parameter Name: The name of the estimation parameter, if "Other" Estimation Parameter is selected. Estimated Value: The calculated value for the estimation parameter. | The name of the estimation parameter used to calculate the treatment effect. | Text-controlled vocabulary | Cox Proportional Hazard; Hazard Ratio (HR); Hazard Ratio, Log; Mean Difference (Final Values); Mean Difference (Net); Median Difference (Final Values); Median Difference (Net); Odds Ratio (OR); Odds Ratio, Log; Risk Difference (RD); Risk Ratio (RR); Risk Ratio, Log; Slope; Cohen's d; Hedges' g; Other (specify) | Use the value options in Clinical Trials for the pilot and refine the CV afterwards. Some options are more common and some are specific to clinical trials. Pilot the stopping rule: start with the standard formulas for effect sizes to see how much information papers usually report. |
| Estimand | Treatment effect | No standard / [Standard found in an addendum to Clinical Trials E9(R1)]. Clinical Trials E9(R1): An estimand is a precise description of the treatment effect reflecting the clinical question posed by a given clinical trial objective. It summarizes at a population level what the outcomes would be in the same patients under different treatment conditions being compared. | Description of the treatment effect under which the outcome is being estimated. | Text-controlled vocabulary | Intent-to-Treat (ITT), Treatment-on-the-Treated (TOT), not stated (implied TOT), not stated (implied ITT) | Can add "Not sure/Not stated" to the pilot CV. |
| Null hypothesis | Treatment effect | Clinical Trials: Type of statistical test: Identifies the type of analysis (Superiority, Non-inferiority, Equivalence, Other (for example, single group or other descriptive analysis), Non-Inferiority or Equivalence (legacy selection), Superiority or Other (legacy selection)). | The mathematical expression of the null hypothesis being tested for the treatment effect. | Text-controlled vocabulary | Null=1, Null=0, Null>0, Null<0, Null=constant | This is often not reported explicitly in a paper. Coders may need additional training on how to identify the null hypothesis. |
| Statistical test of hypothesis | Treatment effect | Clinical Trials: Method: The statistical test used to calculate the p-value, if a P-Value is reported. Select one. (ANCOVA, ANOVA, Chi-Squared, Chi-Squared (Corrected), Cochran-Mantel-Haenszel, Fisher Exact, Kruskal-Wallis, Log Rank, Mantel Haenszel, McNemar, Mixed Models Analysis, Regression (Cox, Linear, Logistic), Sign Test, t-Test (1-Sided, 2-Sided), Wilcoxon (Mann-Whitney), Other). | The statistical test used to estimate the treatment effect. | Text-controlled vocabulary | ANCOVA, ANOVA, Chi-Squared, Chi-Squared (Corrected), Cochran-Mantel-Haenszel, Fisher Exact, Kruskal-Wallis, Log Rank, Mantel Haenszel, McNemar, Mixed Models Analysis, Regression (Cox, Linear, Logistic), Sign Test, t-Test (1-Sided, 2-Sided), Wilcoxon (Mann-Whitney), Other (specify). | Use the value options in Clinical Trials for the pilot and refine the CV afterwards. Some options are more common and some are specific to clinical trials. |
| Contrast for the treatment effect | Treatment effect | Clinical Trials: When describing the Statistical Analysis: Comparison Group Selection: The arms or comparison groups involved in the statistical analysis (check all to indicate an "omnibus" analysis). | Description of the treatment and control groups involved in the estimation of the treatment effect. | Open text | — | Pilot to assess how this is reported in papers. Consider building controlled vocabularies using previous fields on treatment arms. |
| Estimate of the treatment effect | Treatment effect | Clinical Trials: Estimated Value: The calculated value for the estimation parameter. | The reported numeric value for the estimation parameter of the treatment effect. | Numeric | — | Specify which “default” specification to include (e.g., no controls except strata indicators, or no controls except strata and baseline value). We also need a rule for cases where a coefficient is reported for raw values and another column converts that coefficient to d or an "effect size". |
| Unit of measure of the treatment effect | Treatment effect | Clinical Trials: Unit of Measure: An explanation of what is quantified by the data (for example, participants, mm Hg), for each outcome measure. | The unit of measure of the outcome estimated in the treatment effect. | Open text | — | A controlled vocabulary to be developed after the pilot. |
| Precision | Treatment effect | Clinical Trials: Measure of Dispersion/Precision. Select one. (Not Applicable (only if Measure Type is "Number," "Count of Participants," or "Count of Units"), Standard Deviation, Standard Error, Inter-Quartile Range, Full Range, 80% Confidence Interval, 90% Confidence Interval, 95% Confidence Interval, 97.5% Confidence Interval, 99% Confidence Interval, Other Confidence Interval Level, Geometric Coefficient of Variation (only when Measure Type is "Geometric Mean")). Other Confidence Interval Level: The numerical value for the confidence interval level, if "Other Confidence Interval Level" is selected. Provide a rationale for choosing this level in the Outcome Measure Description. | The measures of precision reported for the treatment effect. | Text-controlled vocabulary | Consensus on creating a stopping rule for the minimum set based on all possible values available. Specify a preference priority. | — |
| Precision value | Treatment effect | Clinical Trials: Measure of Dispersion/Precision. Select one. (Not Applicable (only if Measure Type is "Number," "Count of Participants," or "Count of Units"), Standard Deviation, Standard Error, Inter-Quartile Range, Full Range, 80% Confidence Interval, 90% Confidence Interval, 95% Confidence Interval, 97.5% Confidence Interval, 99% Confidence Interval, Other Confidence Interval Level, Geometric Coefficient of Variation (only when Measure Type is "Geometric Mean")). Other Confidence Interval Level: The numerical value for the confidence interval level, if "Other Confidence Interval Level" is selected. Provide a rationale for choosing this level in the Outcome Measure Description. | The numeric value of the precision measures. | Numeric | — | — |
| Subgroup of heterogeneous analysis | Treatment effect | CONSORT: Description of any other analyses performed, including subgroup analyses and adjusted analyses, distinguishing pre-specified from exploratory. | Variables used to characterize subgroups for heterogeneous analysis. | Text-controlled vocabulary | Age, gender, income, other (specify) | Develop a controlled vocabulary after the pilot. Select all variables that apply for each analysis. For example, select both gender and income for an analysis by "women with income > 100 USD per day". |
| Description of heterogeneous analysis | Treatment effect | CONSORT: Description of any other analyses performed, including subgroup analyses and adjusted analyses, distinguishing pre-specified from exploratory. | Description of the heterogeneous analysis performed by subgroup variables. | Open text | — | Describe the specifications used to estimate heterogeneous effects (such as interaction term or subsample) or any other information as needed. |
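
The notes above refer to "the standard formulas for effect sizes" and to several ways precision may be reported. As a hedged illustration of what a stopping rule might start from, the sketch below computes Cohen's d and Hedges' g from arm-level mean, SD, and N, and recovers a standard error from a reported confidence interval under a normal approximation. The function names and example values are assumptions made for illustration; this is not the IDEAL standardization procedure.

```python
import math
from statistics import NormalDist


def pooled_sd(sd_t: float, n_t: int, sd_c: float, n_c: int) -> float:
    """Pooled standard deviation of the treatment and control arms."""
    df = n_t + n_c - 2
    return math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)


def cohens_d(mean_t, sd_t, n_t, mean_c, sd_c, n_c) -> float:
    """Standardized mean difference using the pooled SD."""
    return (mean_t - mean_c) / pooled_sd(sd_t, n_t, sd_c, n_c)


def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c) -> float:
    """Cohen's d with the usual small-sample correction 1 - 3 / (4 * df - 1)."""
    df = n_t + n_c - 2
    correction = 1.0 - 3.0 / (4.0 * df - 1.0)
    return cohens_d(mean_t, sd_t, n_t, mean_c, sd_c, n_c) * correction


def se_from_ci(lower: float, upper: float, level: float = 0.95) -> float:
    """Standard error implied by a symmetric CI (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return (upper - lower) / (2.0 * z)


if __name__ == "__main__":
    # Hypothetical arms: treatment mean 12, control mean 10, SD 4, N 100 each.
    print(f"d = {cohens_d(12, 4, 100, 10, 4, 100):.3f}")
    print(f"g = {hedges_g(12, 4, 100, 10, 4, 100):.3f}")
    print(f"SE from 95% CI [0.5, 3.5] = {se_from_ci(0.5, 3.5):.3f}")
```

For clustered designs or small samples, a t-based critical value may be more appropriate than the normal quantile used here, which is why the degrees of freedom are listed among the fields to extract.
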
9. Quality and Robustness
This section captures how the study accounts for validity threats such as attrition, imbalance, and compliance, and whether key checks and covariates were reported in the analysis.
| Field | Level | Relevant Standard | Definition | Response Options | Controlled Vocabulary | Notes |
|---|---|---|---|---|---|---|
| Attrition | Treatment arm*round | Clinical Trials: Per arm and per period: Started: Number of participants initiating the period. In the first period, it is the number of participants assigned to each arm or group. If assignment is based on a unit other than participants, also include the number of units at the beginning of the period. Completed: Number of participants at the end of the period. If assignment is based on a unit other than participants, also include the number of units at the end of the period. Not Completed (calculated automatically): Number of participants (and units, if applicable) that did not complete the study or period. This is calculated automatically by subtracting Completed from Started. | Share of units of analysis that dropped out of study after treatment assignment, per arm and round. | Numeric | — | This incorporates two different concepts: survey nonresponse and item nonresponse. In the pilot, we should monitor how to address this. We should also monitor whether we might need to ask a question "Is attrition reported?" Some papers may not report attrition but still experience it. There are papers where: (1) there is no attrition, (2) there is attrition but it is not reported, and (3) there is attrition and it is reported. |
| Balance test | Study | No standard | Indicates whether authors include a balance table of treatment conditions at baseline. | Select one (Yes/No/Not reported/Don't know) | — | Monitor in pilot if authors state treatment conditions are balanced but do not include a balance test. If this occurs, we may need to ask two separate questions in the data entry mask: (1) Do authors state treatment conditions are balanced? (2) Do they include a balance table? Also monitor how this connects with steps taken to address attrition. |
| Compliance Control | Treatment arm | CONSORT: Compliance measures the share of the control group who actually received the treatment. | Share of control group who received any part of the treatment. | Numeric | — | We may need to break this into two questions in the data entry mask: (1) Did authors report compliance? (2) What was the share? |
| Compliance Treatment | Treatment arm | CONSORT: Compliance measures the share of the treatment group who actually received the treatment. | Share of treatment group who received any part of the treatment. | Numeric | — | In the pilot, monitor the various ways this can be reported and think about how to standardize. |
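
The attrition field is defined as a share per arm and round, while the Clinical Trials standard reports Started and Completed counts. A minimal sketch of the conversion, assuming both counts are available for each arm, is shown below. The function names and the example counts are hypothetical, and the differential between arms is an additional quantity that is not itself a schema field.

```python
def attrition_share(started: int, completed: int) -> float:
    """Share of units that started a round but did not complete it."""
    if started <= 0:
        raise ValueError("started must be positive")
    if not 0 <= completed <= started:
        raise ValueError("completed must lie between 0 and started")
    # Not Completed = Started - Completed, expressed as a share of Started.
    return (started - completed) / started


def differential_attrition(treat: tuple[int, int], control: tuple[int, int]) -> float:
    """Treatment-arm attrition minus control-arm attrition.

    Each argument is a (started, completed) pair for one arm in one round.
    """
    return attrition_share(*treat) - attrition_share(*control)


if __name__ == "__main__":
    # Hypothetical round: 500 units assigned to each arm.
    print(f"treatment attrition: {attrition_share(500, 450):.2f}")
    print(f"control attrition:   {attrition_share(500, 425):.2f}")
    print(f"differential:        {differential_attrition((500, 450), (500, 425)):+.2f}")
```

A paper that reports no attrition figures cannot be coded with this conversion, which is the case the "Is attrition reported?" question in the notes is meant to flag.
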
10. Coding Tool Version
This section tracks the version of the metadata collection tool used to extract the study's metadata, to ensure traceability and reproducibility of entries over time.
| Field | Level | Relevant Standard | Definition | Response Options | Controlled Vocabulary | Notes |
|---|---|---|---|---|---|---|
| Coding tool name and version | Study | No standard | The name and version of the metadata collection tool used to populate a particular record. | CV | a. IDEAL 1.0 2024 | — |