Data Catalog

v2.81.1

Complete catalog of educational content available through the Common Core Crawl API. This includes standardized tests, test questions with QTI XML, educational images, and curriculum from open educational resources.

Data Origin Legend

Symbol  Meaning
🌐  Extracted - Directly obtained from web sources or external APIs
🤖  AI-Generated - Created or enriched by LLM (GPT, Gemini)
📊  Computed - Derived through deterministic processing pipelines

Data Summary

Standardized Tests
Test Questions
With QTI 3.0 XML
Educational Images
Courses

Standardized Tests

Our database contains standardized tests from multiple states and sources, including original PDFs and parsed markdown versions.

Test Content Coverage

Tests are available in two formats depending on processing status:

parsedPdf Format: AI-parsed test documents containing structured markdown with coordinates and bounding boxes for each text block, figure, and table. The format includes page-level chunks that can be concatenated to build complete markdown documents, plus metadata like page count and processing time.

Structure: chunks[] → blocks[] → content + coordinates. Each block has a type (text/figure/table), a content string, a page number, and a bounding box. Concatenate the chunk.content fields to rebuild the full markdown.
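The concatenation described above can be sketched in a few lines of Python. This is a minimal illustration, not the API's own client code: the key names `chunks`, `blocks`, `content`, and `type` follow the structure described here, while `page`, `bbox`, the blank-line separator, and the sample payload are assumptions made for the example.

```python
from typing import Any, Dict, List


def rebuild_markdown(parsed_pdf: Dict[str, Any], separator: str = "\n\n") -> str:
    """Join page-level chunk content, in order, into one markdown document."""
    chunks = parsed_pdf.get("chunks", [])
    return separator.join(chunk.get("content", "") for chunk in chunks)


def blocks_of_type(parsed_pdf: Dict[str, Any], block_type: str) -> List[Dict[str, Any]]:
    """Collect every block of one type ("text", "figure", or "table")."""
    return [
        block
        for chunk in parsed_pdf.get("chunks", [])
        for block in chunk.get("blocks", [])
        if block.get("type") == block_type
    ]


# Hypothetical payload: the "page" and "bbox" key names are assumed.
sample = {
    "chunks": [
        {
            "content": "# Grade 5 Math",
            "blocks": [
                {"type": "text", "content": "# Grade 5 Math",
                 "page": 1, "bbox": [0, 0, 100, 20]},
            ],
        },
        {
            "content": "1. What is 3 + 4?",
            "blocks": [
                {"type": "text", "content": "1. What is 3 + 4?",
                 "page": 2, "bbox": [0, 0, 100, 20]},
                {"type": "figure", "content": "number line",
                 "page": 2, "bbox": [0, 30, 100, 80]},
            ],
        },
    ]
}

full_markdown = rebuild_markdown(sample)
figures = blocks_of_type(sample, "figure")
```

Keeping the block-level lookup separate from the concatenation lets a caller pull out figures or tables by bounding box without re-parsing the markdown.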

By Subject

Subject Tests Percentage
Mathematics
English Language Arts
Science
Social Studies

Featured: Bluebonnet Tests

Bluebonnet tests from Texas (grades K-9) are available in full QTI 3.0 XML format with standards alignment:

State Test Coverage

Tests available from the following states:

Full QTI Tests by Subject/Grade/State

Breakdown of tests with complete QTI 3.0 XML coverage and AI validation status:

Test Questions

Test questions parsed from standardized tests, with rich metadata and standards alignment.

With QTI 3.0 XML
With Standards Tags

Question Metadata

Each question includes:

Query Capabilities

Educational Images

Educational images extracted from standardized tests and analyzed with AI for concept extraction and tagging.

Image Types

Images classified by type:

AI Analysis

Each image includes GPT-5 Vision AI analysis with:

By Subject

Subject Images
Mathematics
Science
English Language Arts
Social Studies

Courses

Courses from multiple providers, containing thousands of lessons with video, reading, and quiz content.

🌐 Extracted Sources

Khan Academy

Middle school science courses with AI-enhanced alternate versions:

CK-12 FlexBooks

Courses built from comprehensive science textbooks:

Bill of Rights Institute (BRI)

Social studies courses:

Coursewave

State-aligned curriculum extracted from the Coursewave platform:

Crunchlabs

STEM video courses focused on engineering and maker education.

🤖 AI-Generated Sources

Incept Courses

⚠️ AI-GENERATED: Incept courses are created by AI for the Timeback platform.

AI-generated curriculum content including:

Alternate Courses (2HL Versions)

⚠️ AI-GENERATED: Alternate courses are AI-remixed versions of extracted courses.

AI-transformed versions of existing courses for content variety:

Course Structure

Each course contains:

Knowledge Graph 🤖

⚠️ AI-GENERATED CONTENT: All Knowledge Graph atoms are created by an LLM from standards decomposition.

Atoms Collection

Atomic learning objectives generated from educational standards decomposition. Each atom represents the smallest unit of teachable knowledge.

Atom Structure

Difficulty Rubrics 🤖

Each atom includes AI-generated difficulty rubrics describing what Easy, Medium, and Hard look like:

Use Cases

Exemplar Questions 🤖

⚠️ AI-ENRICHED CONTENT: Original questions from tests, but enrichments (difficulty, feedback) are AI-generated.

ExemplarQuestions Collection

Enriched versions of test questions that have been through the exemplar pipeline for enhanced difficulty tagging and feedback generation.

Enrichment Data (AI-Generated)

Enrichment Pipeline Stages

  1. Image Migration: Base64 images converted to S3 URLs
  2. Difficulty Tagging: Classified using Knowledge Graph atoms
  3. Feedback Generation: Inline hints and solution explanations
  4. LLM Validation: Quality assurance via AI review
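As a rough illustration of how a client might reason about the four stages above, the sketch below picks the next stage to run from a set of completed ones. The stage identifiers and the assumption that stages run strictly in the listed order are hypothetical; the actual tracking fields exposed by the API may differ.

```python
from typing import Optional, Set

# Hypothetical identifiers for the four stages, in the order listed above.
STAGES = (
    "image_migration",
    "difficulty_tagging",
    "feedback_generation",
    "llm_validation",
)


def next_stage(completed: Set[str]) -> Optional[str]:
    """Return the first stage, in pipeline order, that is not yet completed."""
    for stage in STAGES:
        if stage not in completed:
            return stage
    return None  # every stage has run


def is_fully_enriched(completed: Set[str]) -> bool:
    """True once all four stages are marked complete."""
    return next_stage(completed) is None
```

For example, a question whose images have been migrated but not yet tagged would yield `"difficulty_tagging"` from `next_stage({"image_migration"})`.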

Pipeline Tracking

Each exemplar question tracks which stages have been completed:

Available Queries

Tests

Test Questions

Images

Courses

Knowledge Graph (AI-Generated)

Exemplar Questions (AI-Enriched)

Data Sources

🌐 Extracted Sources

Bluebonnet Learning

High-quality elementary and middle school tests (K-9) from Texas, all with full QTI 3.0 XML and standards alignment. Well suited to building assessment banks aligned to Texas standards.

State Departments of Education

Released test items from NY, TX, MA, FL, CO, AZ, TN, and MI, plus NAEP. These are official standardized test questions made publicly available for review.

Khan Academy

Middle school science courses with lessons, videos, articles, and quizzes.

CK-12 Foundation

FlexBook science textbooks providing comprehensive open educational resources for science education.

Bill of Rights Institute

Social studies courses covering history, geography, and civics topics.

Coursewave

State-aligned curriculum from Louisiana and Texas education systems.

Crunchlabs

STEM video courses for engineering and maker education.

🤖 AI-Generated Content

Incept Courses

AI-created curriculum content for the Timeback platform with readings, quizzes, and standards alignment.

Knowledge Graph Atoms

Learning objectives decomposed from educational standards with difficulty rubrics and prerequisite relationships.

Exemplar Question Enrichments

Difficulty classifications, inline feedback, and solution explanations generated for test questions.

Alternate/2HL Courses

AI-remixed versions of extracted courses with rewritten content and question variations.

QTI XML Generation

Structured question format converted from HTML/PDF using GPT-5.

Image Analysis

Concepts, descriptions, tags, and educational filtering via GPT-5 Vision.

Static DI Content (BrainLift)

Simplified instructional content from interactive Khan Academy articles.

How to Access

Ready to explore? Use our GraphQL Playground to start querying the data.

The schema is also available for download in SDL and JSON formats.

Authentication

Include your API key in the x-api-key header with every request.
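A minimal sketch of attaching the key to a GraphQL request, using only the Python standard library. The endpoint URL and the `YOUR_API_KEY` placeholder are assumptions, not values from this catalog; the `{ __typename }` query is a generic GraphQL probe rather than one of the catalog's documented queries.

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the real GraphQL URL for your account.
GRAPHQL_URL = "https://api.example.com/graphql"


def build_graphql_request(
    api_key: str, query: str, url: str = GRAPHQL_URL
) -> urllib.request.Request:
    """Build a POST request carrying the API key in the x-api-key header."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


request = build_graphql_request("YOUR_API_KEY", "{ __typename }")

# To actually send it (requires a real endpoint and key):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Building the request separately from sending it keeps the header logic easy to check without network access.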

Example Queries

Visit the GraphQL Playground to see example queries for all data types, or check our main documentation page for integration guides.