POST /v1/specs/content-extract

Queues extraction of section text (and optional LLM-structured requirement fields) for CSI-style sections from the same PDF as a completed TOC parse job. Job model: content-extractor. Send X-API-Key and Content-Type: application/json. The response is 202 with a new job id; poll GET /v1/jobs/{job_id} for the result.

Specs · Async · 202 · 1 credit / job · model: content-extractor

Prerequisite

Requires a toc-parser job for your account with status === "complete". Create it with POST /v1/specs/parse/document. The canonical path is /v1/specs/parse/document — not /v1/specs/parse.

Request

Headers: X-API-Key, Content-Type: application/json. Body: ContentExtractRequest.

job_id (required)
string (UUID)
Must be a completed toc-parser job id for this account.
section_codes
string[]
Explicit 6-digit section codes. At least one of section_codes or division_codes is required (both allowed).
division_codes
string[]
Expands to all 6-digit section codes under each division using result.divisions on the TOC job. Sections are pulled only from divisions present in the stored TOC result.
webhook_url
string
Optional. The worker delivers webhooks on the developer, pro, and enterprise tiers only.

Section codes: non-digits stripped; first 6 digits must form a full code (e.g. "10 29 00" → 102900). Invalid input → 422 INVALID_SECTION_CODE.
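The section-code rule above can be mirrored client-side to catch bad input before spending a request. The following is a minimal sketch, assuming the server does exactly what the paragraph describes (strip non-digits, keep the first 6 digits). `normalize_section_code` and `format_section_code` are hypothetical helper names, not part of the API.

```python
import re


def normalize_section_code(raw: str) -> str:
    """Strip non-digits and return the first 6 digits as the section code.

    Assumption: mirrors the documented rule. Raises ValueError in the case
    where the API would answer 422 INVALID_SECTION_CODE.
    """
    digits = re.sub(r"[^0-9]", "", raw)
    if len(digits) < 6:
        raise ValueError(f"cannot normalize {raw!r} to a 6-digit section code")
    return digits[:6]


def format_section_code(code: str) -> str:
    """Render a 6-digit code in spaced form, e.g. 102900 becomes '10 29 00'."""
    return f"{code[0:2]} {code[2:4]} {code[4:6]}"


print(normalize_section_code("10 29 00"))
print(format_section_code("033000"))
```

Validating locally is optional, since the API performs the same check, but it lets a client report the offending input without a round trip.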

Division codes: non-digits are stripped, then the last 2 digits are kept and zero-padded so that inputs such as 01 and 1 both match division 01. Sections are taken only from divisions present in the stored TOC result. After expansion, codes are deduplicated (order preserved). If nothing resolves, the request fails with 422 NO_SECTIONS_RESOLVED.
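To preview which sections a request will resolve to, the expansion and deduplication described above can be approximated locally. This sketch rests on two assumptions that the page does not confirm: that the TOC job's result.divisions can be reduced to a mapping from 2-digit division code to a list of 6-digit section codes, and that explicit section codes are placed before expanded ones. All function names are hypothetical.

```python
def normalize_division_code(raw: str) -> str:
    """Strip non-digits, keep the last 2 digits, zero-pad to width 2."""
    digits = "".join(ch for ch in raw if ch in "0123456789")
    if not digits:
        raise ValueError(f"no digits in division code {raw!r}")
    return digits[-2:].zfill(2)


def resolve_sections(section_codes, division_codes, toc_divisions):
    """Combine explicit codes with division expansion, deduplicated in order.

    toc_divisions is an ASSUMED shape: {"03": ["033000", ...], ...}.
    Divisions absent from the TOC contribute nothing.
    """
    resolved = list(section_codes)
    for raw in division_codes:
        resolved.extend(toc_divisions.get(normalize_division_code(raw), []))
    # dict.fromkeys keeps the first occurrence of each code, preserving order.
    deduped = list(dict.fromkeys(resolved))
    if not deduped:
        raise ValueError("nothing resolved (API: 422 NO_SECTIONS_RESOLVED)")
    return deduped


toc = {"03": ["033000", "035000"], "10": ["102900"]}
print(resolve_sections(["102900", "033000"], ["3"], toc))
```

With the sample mapping, the explicit codes come first and the duplicate 033000 from division 03 is dropped.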

Code examples

curl -X POST https://api.anchorgrid.ai/v1/specs/content-extract \
  -H "X-API-Key: <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "job_id": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
    "section_codes": ["102900", "033000"],
    "division_codes": ["03"]
  }'

Response — 202 Accepted

The response job_id is a new content-extractor job. The request body job_id is the source TOC job; when the new job completes, result.source_job_id points back to that TOC job. Poll GET /v1/jobs/{job_id} until the job finishes; see the GET /v1/jobs/{job_id} reference.

job_id
string (UUID)
New content-extractor job id.
status
string
Always queued on this response.
poll_url
string
Path only — prepend https://api.anchorgrid.ai.
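As a Python counterpart to the curl example, the sketch below builds the request with the standard library and turns the path-only poll_url into a full URL. It only constructs the request and does not send it. `build_extract_request` and `absolute_poll_url` are hypothetical helper names; the host and headers are taken from this page.

```python
import json
import urllib.request

BASE_URL = "https://api.anchorgrid.ai"


def build_extract_request(api_key, toc_job_id, section_codes=None, division_codes=None):
    """Build (but do not send) the POST /v1/specs/content-extract request."""
    if not section_codes and not division_codes:
        raise ValueError("provide section_codes, division_codes, or both")
    body = {"job_id": toc_job_id}  # id of the completed toc-parser job
    if section_codes:
        body["section_codes"] = list(section_codes)
    if division_codes:
        body["division_codes"] = list(division_codes)
    return urllib.request.Request(
        BASE_URL + "/v1/specs/content-extract",
        data=json.dumps(body).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def absolute_poll_url(poll_url):
    """poll_url is path-only in the 202 response, so prepend the API host."""
    return BASE_URL + poll_url


req = build_extract_request("<your-api-key>", "7c9e6679-7425-40de-944b-e07fc1f90ae7", ["102900"])
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would submit it; a 4xx or 5xx response surfaces there as `urllib.error.HTTPError`.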

Result shape

When status === "complete" and model === "content-extractor", result comes from content_extractor/task.py per section.

source_job_id
string (UUID)
The toc-parser job uuid from the request.
sections
array
One object per requested section.
sections[].section_code
string
6-digit code.
sections[].section_formatted
string
e.g. 10 29 00
sections[].division_code
string
Division portion.
sections[].found
boolean
false with empty content if the section could not be located.
sections[].content
string
Full extracted text when found.
sections[].products
array
Structured when SpecNormalizer succeeds; else [].
sections[].required_items
array
Same.
sections[].compliance_requirements
array
Same.
sections[].mounting_rules
array
Same.
sections[].furnishing_rules
array
Same.
sections_requested
integer
Count requested.
sections_found
integer
Sections with content located.
model_version
string
e.g. content-extractor-v1.0.0
processing_time_ms
integer
Wall time for the task.

Structured arrays are filled when SpecNormalizer succeeds on non-empty content; on failure they default to empty arrays (logged server-side). GET /v1/jobs does not post-process content-extractor results (only door-detector is filtered on read).
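Because a missing section and a failed normalization both surface as empty fields rather than errors, a client may want to triage the result explicitly. The sketch below is one hypothetical way to do that using only the field names listed above; `summarize_result` is not part of the API. An all-empty set of structured arrays can mean either that SpecNormalizer failed or that there was nothing to extract, and the result alone does not distinguish the two.

```python
STRUCTURED_FIELDS = (
    "products",
    "required_items",
    "compliance_requirements",
    "mounting_rules",
    "furnishing_rules",
)


def summarize_result(result: dict) -> dict:
    """Split section codes into 'missing' (found is false) and 'text_only'
    (found, but every structured array is empty)."""
    missing, text_only = [], []
    for section in result.get("sections", []):
        code = section["section_code"]
        if not section.get("found"):
            missing.append(code)
        elif not any(section.get(field) for field in STRUCTURED_FIELDS):
            text_only.append(code)
    return {"missing": missing, "text_only": text_only}


# Illustrative data only, shaped after the field list above.
sample = {
    "sections": [
        {"section_code": "102900", "found": True, "content": "...", "products": [{"name": "x"}]},
        {"section_code": "033000", "found": True, "content": "...", "products": []},
        {"section_code": "035000", "found": False, "content": ""},
    ]
}
print(summarize_result(sample))
```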

Credits & rate limits

Cost
1 credit / job (SPEC)
Rate limit
Tier RPM (job-submit bucket)

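Same quota behavior as parse/document: the free tier returns 402 when credits are exhausted, paid monthly plans return 429 when the monthly quota is exhausted, and exceeding the RPM limit returns 429 from middleware.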

Errors (synchronous, before queueing)

Codes appear in handler detail; Intelligence may still genericize JSON bodies.

401
Missing or invalid API key.
402
Free tier credit limit.
404
JOB_NOT_FOUND — no row with that id + account + model toc-parser.
422
Pydantic validation: missing both section_codes and division_codes, bad UUID, etc.
422
JOB_NOT_COMPLETE — TOC job still queued, processing, or failed.
422
JOB_NO_FILE — TOC job missing input_s3_key / bucket.
422
INVALID_SECTION_CODE — cannot normalize to 6 digits.
422
NO_SECTIONS_RESOLVED — expansion + explicit codes yielded nothing.
429
Monthly quota or RPM limit.

See Errors for HTTP exception mapping.
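The table above suggests a small decision function on the client: some failures call for a retry, others for a corrected request. The sketch below is one hypothetical mapping; the action labels are invented for illustration. Since error bodies may be genericized, the code argument is treated as optional and 422 falls back to a generic label when it is absent.

```python
from typing import Optional


def classify_error(status: int, code: Optional[str] = None) -> str:
    """Map an HTTP status (and optional error code) to a suggested action."""
    if status == 401:
        return "check_api_key"
    if status == 402:
        return "out_of_credits"
    if status == 404:
        return "check_toc_job_id"
    if status == 429:
        return "retry_later"  # monthly quota or RPM limit
    if status == 422:
        if code == "JOB_NOT_COMPLETE":
            # The TOC job may be queued, processing, or failed: inspect it.
            return "check_toc_job_status"
        if code == "JOB_NO_FILE":
            return "reparse_document"
        return "fix_request"  # validation, bad section codes, nothing resolved
    return "unexpected"


print(classify_error(422, "JOB_NOT_COMPLETE"))
print(classify_error(429))
```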

Typical flow

  1. POST /v1/documents → document_id
  2. POST /v1/specs/parse/document → TOC job_id
  3. GET /v1/jobs/{toc_job_id} until complete → read result.divisions / section codes
  4. POST /v1/specs/content-extract with body job_id (TOC job) and section_codes and/or division_codes
  5. GET /v1/jobs/{content_job_id} until complete → read result.sections
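Steps 3 and 5 above are the same polling loop run against two different job ids. A minimal sketch follows, assuming the job JSON carries a top-level status field with the values named on this page (queued, processing, complete, failed). `poll_job` is a hypothetical helper; the HTTP call is injected as `fetch_job` so the loop stays independent of any particular client, and the interval and timeout defaults are arbitrary.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"complete", "failed"}


def poll_job(
    fetch_job: Callable[[str], dict],
    job_id: str,
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_job(job_id) until the job reaches a terminal status.

    fetch_job is expected to perform GET /v1/jobs/{job_id} and return the
    decoded JSON. Raises TimeoutError if the deadline passes first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_job(job_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} not finished after {timeout_s}s")
        sleep(interval_s)


# Simulated responses stand in for real HTTP calls.
fake = iter([{"status": "queued"}, {"status": "processing"}, {"status": "complete"}])
print(poll_job(lambda _id: next(fake), "demo", sleep=lambda s: None)["status"])
```

The caller still has to check whether the returned status is complete or failed before reading result.divisions or result.sections.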

Response Preview

202 Accepted
{
  "job_id": "9faf778a-9636-52f0-b66d-e29fe3b12cf8",
  "status": "queued",
  "poll_url": "/v1/jobs/9faf778a-9636-52f0-b66d-e29fe3b12cf8"
}