Data Catalog
v2.81.1
Complete catalog of educational content available through the Common Core Crawl API. This includes standardized tests, test questions with QTI XML, educational images, and curriculum from open educational resources.
Data Origin Legend
| Symbol | Meaning |
|---|---|
| 🌐 | Extracted - Directly obtained from web sources or external APIs |
| 🤖 | AI-Generated - Created or enriched by LLM (GPT, Gemini) |
| 📊 | Computed - Derived through deterministic processing pipelines |
Data Summary
Standardized Tests
Our database contains standardized tests from multiple states and sources, including original PDFs and parsed markdown versions.
Test Content Coverage
Tests are available in two formats depending on processing status:
- Full QTI Coverage: tests where every question has been converted to QTI 3.0 XML format, making them fully machine-readable and ready for assessment platforms
- Parsed PDF (AI-ready): tests with structured markdown from AI document parsing, ideal for LLM processing and analysis
parsedPdf Format: AI-parsed test documents containing structured markdown with coordinates and bounding boxes for each text block, figure, and table. The format includes page-level chunks that can be concatenated to build complete markdown documents, plus metadata like page count and processing time.
Structure: chunks[] → blocks[] → content + coordinates. Each block has type (text/figure/table), content string, page number, and bounding box. Simply concatenate chunk.content fields to rebuild the full markdown.
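The concatenation described above can be sketched as follows. This is a minimal illustration, not the API's own client code: the `chunks`, `blocks`, `content`, and `type` names come from the structure description, while the `page` and `bbox` key names, the sample payload, and the blank-line separator are assumptions.

```python
from typing import Any


def rebuild_markdown(parsed_pdf: dict[str, Any], separator: str = "\n\n") -> str:
    """Join the page-level chunk content strings into one markdown document."""
    chunks = parsed_pdf.get("chunks", [])
    return separator.join(chunk.get("content", "") for chunk in chunks)


def blocks_of_type(parsed_pdf: dict[str, Any], block_type: str) -> list[dict[str, Any]]:
    """Collect every block of one type (text, figure, or table) across all chunks."""
    return [
        block
        for chunk in parsed_pdf.get("chunks", [])
        for block in chunk.get("blocks", [])
        if block.get("type") == block_type
    ]


# Hypothetical payload; the "page" and "bbox" key names are assumed.
sample = {
    "chunks": [
        {
            "content": "# Grade 5 Math\n\n1. What is 3 x 4?",
            "blocks": [
                {"type": "text", "content": "1. What is 3 x 4?", "page": 1,
                 "bbox": [72, 90, 540, 130]},
            ],
        },
        {
            "content": "| x | y |\n|---|---|\n| 1 | 2 |",
            "blocks": [
                {"type": "table", "content": "| x | y |\n|---|---|\n| 1 | 2 |",
                 "page": 2, "bbox": [72, 200, 300, 260]},
            ],
        },
    ]
}

markdown = rebuild_markdown(sample)
tables = blocks_of_type(sample, "table")
```

Keeping the block-level helper separate lets a caller rebuild the full markdown for an LLM prompt while still using coordinates for individual figures or tables.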
By Subject
| Subject | Tests | Percentage |
|---|---|---|
| Mathematics | | |
| English Language Arts | | |
| Science | | |
| Social Studies | | |
Featured: Bluebonnet Tests
Bluebonnet tests from Texas (grades K-9) are available in full QTI 3.0 XML format with standards alignment:
- Math (K-9): tests covering all elementary and middle school grades
- ELA (K-5): tests covering early elementary reading and language arts
State Test Coverage
Tests available from the following states:
- New York: Math, ELA, Science, Social Studies (grades 3-11)
- Texas: Bluebonnet and STAAR (grades 3-11)
- Massachusetts: MCAS (grades 3-10)
- Florida: grades 3-10
- Colorado: grades 3-11
- Other states: Arizona, Tennessee, Michigan
- National: NAEP Science (grades 4, 8, 12)
Full QTI Tests by Subject/Grade/State
Breakdown of tests with complete QTI 3.0 XML coverage and AI validation status:
Test Questions
Test questions parsed from standardized tests, with rich metadata and standards alignment.
Question Metadata
Each question includes:
- QTI 3.0 XML: Machine-readable question format for assessment platforms
- Standards Alignment: TEKS, NGSS, CCSS tags
- Cognitive Rigor: DOK (Depth of Knowledge) and Bloom's taxonomy levels
- Interaction Types: Multiple choice, constructed response, hotspot, etc.
- Estimated Time: Completion time in seconds
- Media Types: Images, diagrams, tables, graphs
Query Capabilities
- Search by exact standard codes (e.g., "8.EE.A.1", "MS-LS1-1")
- Search by standard prefixes (e.g., "MS-LS1" for all middle school life science)
- Filter by cognitive rigor level (1-24 scale combining DOK × Bloom)
- Filter by DOK level (1-4) or Bloom level
- Get highest-rigor question per standard for assessment banks
- Filter by QTI interaction type (choice, text, hotspot, etc.)
- Search across state, grade, subject, language
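A few of these filters can be mimicked locally to show the intended semantics. The sketch below assumes the 1-24 rigor value is the plain product of a DOK level (1-4) and a Bloom level numbered 1-6; the `Question` class, its field names, and the sample data are hypothetical and do not reflect the API's schema.

```python
from dataclasses import dataclass


@dataclass
class Question:
    """Hypothetical local record; field names are illustrative only."""
    id: str
    standards: list[str]
    dok: int    # Depth of Knowledge, 1-4
    bloom: int  # Bloom level, assumed to be numbered 1-6


def rigor(question: Question) -> int:
    """Combined rigor, assumed here to be DOK multiplied by Bloom (1-24)."""
    return question.dok * question.bloom


def matches_standard(question: Question, code: str, prefix: bool = False) -> bool:
    """Exact match on a standard code, or prefix match when prefix=True."""
    if prefix:
        return any(s.startswith(code) for s in question.standards)
    return code in question.standards


def highest_rigor_per_standard(questions: list[Question]) -> dict[str, Question]:
    """Keep the single highest-rigor question for each standard code."""
    best: dict[str, Question] = {}
    for q in questions:
        for standard in q.standards:
            if standard not in best or rigor(q) > rigor(best[standard]):
                best[standard] = q
    return best


questions = [
    Question("q1", ["MS-LS1-1"], dok=2, bloom=3),
    Question("q2", ["MS-LS1-1", "MS-LS1-2"], dok=3, bloom=4),
    Question("q3", ["8.EE.A.1"], dok=1, bloom=2),
]

life_science = [q.id for q in questions if matches_standard(q, "MS-LS1", prefix=True)]
bank = highest_rigor_per_standard(questions)
```

With this sample, the prefix "MS-LS1" selects q1 and q2, and q2 is kept for "MS-LS1-1" because its rigor (12) exceeds that of q1 (6).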
Educational Images
Educational images extracted from standardized tests and analyzed with AI for concept extraction and tagging.
Image Types
Images classified by type:
- Graphs: Line graphs, bar graphs, scatter plots with axes
- Diagrams: Labeled diagrams showing parts and relationships
- Tables: Tabular data from tests
- Charts: Pie charts, organizational charts
- Maps: Geographic and spatial representations
- Photographs: Real-world images for observation questions
- Illustrations: Drawings and artistic representations
AI Analysis
Each image includes GPT-5 Vision AI analysis with:
- Concept: Main educational concept (2-5 words)
- Detailed Description: 150-300 word analysis of what's shown
- Subject Branch: Specific academic field (e.g., biology, algebra, geometry)
- Searchable Tags: 5-10 keywords for discovery
- Educational Filter: Excludes logos, branding, non-educational content
By Subject
| Subject | Images |
|---|---|
| Mathematics | |
| Science | |
| English Language Arts | |
| Social Studies | |
Courses
Courses from multiple providers, containing thousands of lessons with video, reading, and quiz content.
🌐 Extracted Sources
Khan Academy
Middle school science courses with AI-enhanced alternate versions:
- Earth and Space Science: Original + 2HL AI-enhanced version
- Biology: Original + 2HL AI-enhanced version
- Physics: Original + 2HL AI-enhanced version
- Additional MS science courses
CK-12 FlexBooks
Comprehensive science textbooks:
- Various science subjects across grade levels
Bill of Rights Institute (BRI)
Social studies courses:
- History, geography, and civics courses
Coursewave
State-aligned curriculum extracted from Coursewave platform:
- Louisiana (LA): Math, ELA, Science, Social Studies
- Texas (TX): State-aligned courses
Crunchlabs
STEM video courses focused on engineering and maker education.
🤖 AI-Generated Sources
Incept Courses
⚠️ AI-GENERATED: Incept courses are created by AI for the Timeback platform.
AI-generated curriculum content including:
- Reading materials with simplifiedHtml segments
- Quiz questions with QTI XML format
- Standards-aligned structure
Alternate Courses (2HL Versions)
⚠️ AI-GENERATED: Alternate courses are AI-remixed versions of extracted courses.
AI-transformed versions of existing courses for content variety:
- Reading rewrites preserving learning objectives
- Question variations maintaining standards alignment
- Improved readability and accessibility
Course Structure
Each course contains:
- Units: Organizational chapters/modules
- Lessons: Individual learning sessions
- Content: Videos, articles, quizzes with standards alignment
- Questions: Quiz questions with QTI XML format
Knowledge Graph 🤖
⚠️ AI-GENERATED CONTENT: All Knowledge Graph atoms are created by LLM from standards decomposition.
Atoms Collection
Atomic learning objectives generated from educational standards decomposition. Each atom represents the smallest unit of teachable knowledge.
Atom Structure
- atomId: Unique identifier (e.g., "5-PS1-3_atom_2")
- title: Learning objective title
- description: Detailed objective description
- grades: Applicable grade levels
- supports: Standards this atom covers
- prerequisites: Prerequisite atom IDs for learning paths
- subject: Subject area (science, math, etc.)
- subjectBranch: Subdomain (biology, algebra, etc.)
Difficulty Rubrics 🤖
Each atom includes AI-generated difficulty rubrics describing what Easy, Medium, and Hard looks like:
- difficulty.easy: What constitutes an easy question for this concept
- difficulty.medium: What constitutes a medium question
- difficulty.hard: What constitutes a hard question
Use Cases
- Question difficulty tagging using atom rubrics
- Learning path construction via prerequisites
- Course generation aligned to specific learning objectives
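Learning path construction from prerequisites amounts to a topological sort over atoms. The sketch below uses Python's standard `graphlib` module and assumes each atom is a dict carrying the `atomId` and `prerequisites` fields described above; apart from "5-PS1-3_atom_2", the sample IDs are hypothetical.

```python
from graphlib import TopologicalSorter


def learning_path(atoms: list[dict]) -> list[str]:
    """Order atom IDs so that every prerequisite precedes the atoms that need it.

    Prerequisites pointing at atoms outside the given list are ignored.
    graphlib.CycleError is raised if the prerequisites form a cycle.
    """
    known = {atom["atomId"] for atom in atoms}
    graph = {
        atom["atomId"]: [p for p in atom.get("prerequisites", []) if p in known]
        for atom in atoms
    }
    # TopologicalSorter expects a mapping of node -> predecessors.
    return list(TopologicalSorter(graph).static_order())


# Hypothetical atoms forming a simple chain.
atoms = [
    {"atomId": "5-PS1-3_atom_3", "prerequisites": ["5-PS1-3_atom_2"]},
    {"atomId": "5-PS1-3_atom_1", "prerequisites": []},
    {"atomId": "5-PS1-3_atom_2", "prerequisites": ["5-PS1-3_atom_1"]},
]

path = learning_path(atoms)
```

For atoms with no ordering constraint between them, the relative order is whatever the sorter yields, so a caller that needs a stable order may want a secondary sort key such as grade.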
Exemplar Questions 🤖
⚠️ AI-ENRICHED CONTENT: Original questions from tests, but enrichments (difficulty, feedback) are AI-generated.
ExemplarQuestions Collection
Enriched versions of test questions that have been through the exemplar pipeline for enhanced difficulty tagging and feedback generation.
Enrichment Data (AI-Generated)
- difficulty: Easy/Medium/Hard classification
- difficultyAtom: Knowledge Graph atom used for evaluation
- difficultyReasoning: LLM reasoning for difficulty assignment
- difficultyScore: Numeric difficulty (1-10 scale)
- qtiXmlWithFeedback: QTI XML with inline hints and solution blocks
Enrichment Pipeline Stages
- Image Migration: Base64 images converted to S3 URLs
- Difficulty Tagging: Classified using Knowledge Graph atoms
- Feedback Generation: Inline hints and solution explanations
- LLM Validation: Quality assurance via AI review
Pipeline Tracking
Each exemplar question tracks which stages have been completed:
- pipelineStages.imagesMigrated
- pipelineStages.difficultyTagged
- pipelineStages.feedbackGenerated
- pipelineStages.validated
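A consumer might use these flags to find questions that are not yet fully enriched. The following is a small sketch that assumes each flag is a boolean nested under a `pipelineStages` object and that a missing flag means the stage has not run; the sample record is hypothetical.

```python
# Stage names as listed above, in pipeline order.
STAGES = ("imagesMigrated", "difficultyTagged", "feedbackGenerated", "validated")


def pending_stages(question: dict) -> list[str]:
    """Return the stages not yet completed (assumes boolean flags)."""
    stages = question.get("pipelineStages") or {}
    return [name for name in STAGES if not stages.get(name)]


def is_fully_enriched(question: dict) -> bool:
    """True when every pipeline stage is marked complete."""
    return not pending_stages(question)


# Hypothetical exemplar question record.
example = {
    "pipelineStages": {
        "imagesMigrated": True,
        "difficultyTagged": True,
        "feedbackGenerated": False,
    }
}

remaining = pending_stages(example)
```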
Available Queries
Tests
- tests: Query all tests with filtering by state, grade, subject, source
- test(id): Get single test by ID
- testSources: List all available test sources
Test Questions
- testQuestions(testId): Get all questions for a test
- testQuestion(id): Get single question by ID
- searchTestQuestions: Advanced search with standards, rigor, QTI filters
- testQuestionStandardsList: Discover all unique standards covered by available test questions, with question counts
- testQuestionStandardSetsList: Browse standard sets (TEKS, NGSS, CCSS) with total question counts per set
Images
- images: Browse images with filters (subject, grade, type, tags)
- image(id): Get single image by ID
- searchImages: Full-text search across concept, description, tags
Courses
- courses: Query courses with filters (source, subject, grade)
- course(id): Get single course by ID
- courseByUrl(url): Get course by URL
- lessons: Get lessons for a course or unit
- contents: Get content items for a lesson
- questions: Get questions for quiz content
Knowledge Graph (AI-Generated)
- atoms: Query learning objective atoms with filters
- atom(id): Get single atom by ID
- atomByAtomId(atomId): Get atom by unique atomId string
Exemplar Questions (AI-Enriched)
- exemplarQuestions: Query enriched questions with difficulty filters
- exemplarQuestion(id): Get single exemplar question by ID
Data Sources
🌐 Extracted Sources
Bluebonnet Learning
High-quality elementary and middle school tests (K-9) from Texas, all with full QTI 3.0 XML and standards alignment. Well suited to building assessment banks aligned to Texas standards.
State Departments of Education
Released test items from NY, TX, MA, FL, CO, AZ, TN, MI, and NAEP. These are official standardized test questions made publicly available for review.
Khan Academy
Middle school science courses with lessons, videos, articles, and quizzes.
CK-12 Foundation
FlexBook science textbooks providing comprehensive open educational resources for science education.
Bill of Rights Institute
Social studies courses covering history, geography, and civics topics.
Coursewave
State-aligned curriculum from Louisiana and Texas education systems.
Crunchlabs
STEM video courses for engineering and maker education.
🤖 AI-Generated Content
Incept Courses
AI-created curriculum content for the Timeback platform with readings, quizzes, and standards alignment.
Knowledge Graph Atoms
Learning objectives decomposed from educational standards with difficulty rubrics and prerequisite relationships.
Exemplar Question Enrichments
Difficulty classifications, inline feedback, and solution explanations generated for test questions.
Alternate/2HL Courses
AI-remixed versions of extracted courses with rewritten content and question variations.
QTI XML Generation
Structured question format converted from HTML/PDF using GPT-5.
Image Analysis
Concepts, descriptions, tags, and educational filtering via GPT-5 Vision.
Static DI Content (BrainLift)
Simplified instructional content from interactive Khan Academy articles.
How to Access
Ready to explore? Use our GraphQL Playground to start querying the data.
- Open GraphQL Playground
- Download SDL Schema
- Download JSON Schema
Authentication
Include your API key in the x-api-key header with every request.
Example Queries
Visit the GraphQL Playground to see example queries for all data types, or check our main documentation page for integration guides.