AnchorGrid Developer Docs


POST /v1/specs/content-extract

Queue extraction of section text (and optional LLM-structured requirement fields) for CSI-style sections from the same PDF as a completed TOC parse job. Job model: content-extractor. Send X-API-Key and Content-Type: application/json. The response is 202 with a new job id; poll GET /v1/jobs/{job_id} for the result.

Specs · Async · 202 · 1 credit / job · model: content-extractor

Prerequisite


Requires a toc-parser job for your account with status === "complete". Create it with POST /v1/specs/parse/document. The canonical path is /v1/specs/parse/document — not /v1/specs/parse.

Request


This endpoint requires a document_id. Upload your PDF first with POST /v1/documents.

Headers: X-API-Key, Content-Type: application/json. Body: ContentExtractRequest.

job_id (required)

string (UUID)

Must be a completed toc-parser job id for this account.

section_codes (optional)

string[]

Explicit 6-digit section codes.

division_codes (optional)

string[]

Expands to all 6-digit section codes under each division using result.divisions on the TOC job. Sections are pulled only from divisions present in the stored TOC result.

webhook_url (optional)

string

The worker delivers webhooks only on the developer, pro, and enterprise tiers.


Section codes: non-digits stripped; first 6 digits must form a full code (e.g. "10 29 00" → 102900). Invalid input → 422 INVALID_SECTION_CODE.
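The section-code rule can be mirrored client-side to catch bad input before submitting. This is a minimal sketch, assuming "first 6 digits must form a full code" means that fewer than 6 digits is invalid and extra digits are ignored; the function names are hypothetical and not part of the API.

```python
import re


def normalize_section_code(raw: str) -> str:
    """Strip non-digits and return the first 6 digits as the section code."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) < 6:
        # The API reports this case as 422 INVALID_SECTION_CODE.
        raise ValueError(f"cannot normalize {raw!r} to a 6-digit section code")
    # Assumption: digits beyond the sixth are ignored.
    return digits[:6]


def format_section_code(code: str) -> str:
    """Render a 6-digit code in spaced form, e.g. 102900 -> '10 29 00'."""
    return f"{code[0:2]} {code[2:4]} {code[4:6]}"
```

For example, normalize_section_code("10 29 00") yields "102900", matching the example in the note above.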


Division codes: non-digits stripped, then zero-padded to 2 digits to match a division (e.g. 01 and 1 both match division 01). Sections are taken only from divisions present in the stored TOC result. After expansion, codes are deduplicated (order preserved). If nothing resolves → 422 NO_SECTIONS_RESOLVED.
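The expansion and deduplication described above can be sketched as follows. Several things here are assumptions: that padding means left-padding with zeros to two digits, that explicit section codes come before expanded ones, and that the TOC result can be reduced to a mapping from division code to section codes (the real shape of result.divisions is not shown on this page). The names resolve_sections and toc_sections_by_division are hypothetical.

```python
import re
from typing import Dict, Iterable, List


def normalize_division_code(raw: str) -> str:
    """Strip non-digits and zero-pad to two digits (assumed padding rule)."""
    digits = re.sub(r"\D", "", raw)
    if not digits:
        raise ValueError(f"no digits in division code {raw!r}")
    return digits.zfill(2)


def resolve_sections(
    section_codes: Iterable[str],
    division_codes: Iterable[str],
    toc_sections_by_division: Dict[str, List[str]],  # hypothetical stand-in for result.divisions
) -> List[str]:
    """Combine explicit codes with division expansions, deduplicated in order."""
    resolved = list(section_codes)
    for raw in division_codes:
        division = normalize_division_code(raw)
        # Divisions absent from the stored TOC result contribute nothing.
        resolved.extend(toc_sections_by_division.get(division, []))
    deduped = list(dict.fromkeys(resolved))  # dict keys keep first-seen order
    if not deduped:
        # The API reports this case as 422 NO_SECTIONS_RESOLVED.
        raise ValueError("NO_SECTIONS_RESOLVED")
    return deduped
```

Using dict.fromkeys keeps the first occurrence of each code, which is one simple way to deduplicate while preserving order.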

Code examples

curl -X POST https://api.anchorgrid.ai/v1/specs/content-extract \
  -H "X-API-Key: <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "job_id": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
    "section_codes": ["102900", "033000"],
    "division_codes": ["03"]
  }'
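The same request can be built in Python with the standard library. This is a sketch rather than an official client: build_content_extract_request is a hypothetical helper, and it only constructs the request so that it can be inspected before sending.

```python
import json
import urllib.request
from typing import Optional, Sequence

API_BASE = "https://api.anchorgrid.ai"


def build_content_extract_request(
    api_key: str,
    toc_job_id: str,
    section_codes: Optional[Sequence[str]] = None,
    division_codes: Optional[Sequence[str]] = None,
    webhook_url: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) the POST /v1/specs/content-extract request."""
    if not section_codes and not division_codes:
        # The API rejects a body that has neither field with a 422.
        raise ValueError("provide section_codes and/or division_codes")
    body = {"job_id": toc_job_id}  # id of the completed toc-parser job
    if section_codes:
        body["section_codes"] = list(section_codes)
    if division_codes:
        body["division_codes"] = list(division_codes)
    if webhook_url:
        body["webhook_url"] = webhook_url
    return urllib.request.Request(
        f"{API_BASE}/v1/specs/content-extract",
        data=json.dumps(body).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen sends it; a successful submission should come back as 202 with the new job id.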

Response — 202 Accepted

The response job_id is a new content-extractor job. The request body job_id is the source TOC job; when the new job completes, result.source_job_id points back to that TOC job. Poll GET /v1/jobs/{job_id} for status; see the GET /v1/jobs/{job_id} reference for details.

job_id

string (UUID)

New content-extractor job id.

status

string

Always queued on this response.

poll_url

string

Path only — prepend https://api.anchorgrid.ai.
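Because poll_url is a path, a client has to prepend the host before polling. The loop below is a minimal sketch: fetch_job is a hypothetical callable that performs the authenticated GET and returns the decoded JSON, and the interval and attempt limit are arbitrary defaults, not documented values. It treats complete and failed as terminal, based on the statuses named on this page.

```python
import time
from typing import Any, Callable, Dict

API_BASE = "https://api.anchorgrid.ai"
TERMINAL_STATUSES = ("complete", "failed")


def absolute_poll_url(poll_url: str) -> str:
    """Turn the path-only poll_url into a full URL."""
    return API_BASE + poll_url


def wait_for_job(
    fetch_job: Callable[[str], Dict[str, Any]],  # hypothetical: GETs the URL, returns JSON
    poll_url: str,
    interval_s: float = 2.0,
    max_attempts: int = 150,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the job reaches a terminal status or attempts run out."""
    url = absolute_poll_url(poll_url)
    for _ in range(max_attempts):
        job = fetch_job(url)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        sleep(interval_s)
    raise TimeoutError(f"job at {url} did not finish after {max_attempts} polls")
```

Injecting fetch_job and sleep keeps the loop independent of any particular HTTP library and easy to exercise without network access.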

Result shape

When status === "complete" and model === "content-extractor", result comes from content_extractor/task.py per section.

source_job_id

string (UUID)

The toc-parser job uuid from the request.

sections

array

One object per requested section.

sections[].section_code

string

6-digit code.

sections[].section_formatted

string

e.g. 10 29 00

sections[].division_code

string

Division portion.

sections[].found

boolean

false with empty content if the section could not be located.

sections[].content

string

Full extracted text when found.

sections[].products

array

Structured when SpecNormalizer succeeds; else [].

sections[].required_items

array

Same.

sections[].compliance_requirements

array

Same.

sections[].mounting_rules

array

Same.

sections[].furnishing_rules

array

Same.

sections_requested

integer

Count requested.

sections_found

integer

Sections with content located.

model_version

string

e.g. content-extractor-v1.0.0

processing_time_ms

integer

Wall time for the task.


Structured arrays are filled when SpecNormalizer succeeds on non-empty content; on failure they default to empty arrays (logged server-side). GET /v1/jobs does not post-process content-extractor results (only door-detector is filtered on read).
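Since a section can be missing, or found but left without structured fields, it can help to triage a completed result before using it. The following sketch relies only on the field names listed above; summarize_sections and its output keys are hypothetical.

```python
from typing import Any, Dict, List

STRUCTURED_KEYS = (
    "products",
    "required_items",
    "compliance_requirements",
    "mounting_rules",
    "furnishing_rules",
)


def summarize_sections(result: Dict[str, Any]) -> Dict[str, List[str]]:
    """Group section codes by whether text and structured fields are present."""
    found: List[str] = []
    missing: List[str] = []
    unstructured: List[str] = []
    for section in result.get("sections", []):
        code = section["section_code"]
        if not section.get("found"):
            missing.append(code)  # found is false and content is empty
            continue
        found.append(code)
        # All structured arrays empty: normalization failed or found nothing.
        if not any(section.get(key) for key in STRUCTURED_KEYS):
            unstructured.append(code)
    return {"found": found, "missing": missing, "unstructured": unstructured}
```

An empty structured array does not by itself distinguish a SpecNormalizer failure from a section that simply has no such items, so the unstructured list is best read as "worth a manual look".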

Credits & rate limits

Cost

1 credit / job (SPEC)

Rate limit

Tier RPM (job-submit bucket)

Same quota behavior as parse/document: the free tier returns 402 and paid tiers return 429 when the monthly quota is exhausted; RPM limits return 429 from middleware.

Errors (synchronous, before queueing)

Codes appear in handler detail; Intelligence may still genericize JSON bodies.

401

Missing or invalid API key.

402

Free tier credit limit.

404

JOB_NOT_FOUND — no row with that id + account + model toc-parser.

422

Pydantic validation: missing both section_codes and division_codes, bad UUID, etc.

422

JOB_NOT_COMPLETE — TOC job still queued, processing, or failed.

422

JOB_NO_FILE — TOC job missing input_s3_key / bucket.

422

INVALID_SECTION_CODE — cannot normalize to 6 digits.

422

NO_SECTIONS_RESOLVED — expansion + explicit codes yielded nothing.

429

Monthly quota or RPM limit.

See Errors for HTTP exception mapping.
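One way a client might act on these responses is sketched below. The action labels are hypothetical, and because error bodies may be genericized, the sketch treats the error code as optional and falls back to the HTTP status alone.

```python
from typing import Optional


def classify_submit_error(status: int, code: Optional[str] = None) -> str:
    """Map a synchronous error from this endpoint to a suggested client action."""
    if status == 401:
        return "fix_credentials"  # missing or invalid API key
    if status == 402:
        return "add_credits"  # free tier credit limit
    if status == 404:
        return "check_job_id"  # no toc-parser job with that id for this account
    if status == 429:
        return "retry_later"  # monthly quota or RPM limit
    if status == 422:
        if code == "JOB_NOT_COMPLETE":
            # The TOC job may still be running, or it may have failed.
            return "check_toc_job"
        return "fix_request"  # validation, section code, or resolution errors
    return "unexpected"
```

A 429 does not say whether the monthly quota or the per-minute limit was hit, so a retry with backoff may still fail until the quota resets.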

Typical flow

POST /v1/documents → document_id
POST /v1/specs/parse/document → TOC job_id
GET /v1/jobs/{toc_job_id} until complete → read result.divisions / section codes
POST /v1/specs/content-extract with body job_id (TOC job) and section_codes and/or division_codes
GET /v1/jobs/{content_job_id} until complete → read result.sections

Response Preview

202 Accepted

{
  "job_id": "9faf778a-9636-52f0-b66d-e29fe3b12cf8",
  "status": "queued",
  "poll_url": "/v1/jobs/9faf778a-9636-52f0-b66d-e29fe3b12cf8"
}