# Set schemas with mapping

Last updated 2025-08-29 UTC.

**Caution:** Document AI Warehouse is deprecated and will no longer be available on Google Cloud after January 16, 2025. To safeguard your data, migrate any documents currently saved in Document AI Warehouse to an alternative such as Cloud Storage, and verify that the migration is complete before the discontinuation date to prevent data loss. See [Deprecations](/document-warehouse/docs/deprecations) for details.

To convert extracted entities to Document AI Warehouse properties, you need to set or update the schema.

**Note:** Only ingestion and Process-with-Document AI pipelines support schema mapping. Regular document creation calls do not trigger mapping.

Before you set the schema with mapping, you need to know the Document AI processor types, their schemas, and their entity types. The pipeline flattens nested entities, so you also need to create mappings for the child entities.

For example, the processor `INVOICE_PROCESSOR` has the following entity types:

- `line_item`
- `line_item/amount`
- `total_amount`

**Note:** Entities whose entity types are not in the schema (as a property name or a `schema_source` name) are discarded.

    {
      "property_definitions": [
        {
          "name": "line_item",
          "display_name": "line_item",
          "is_searchable": true,
          "is_filterable": true,
          "text_type_options": {}
        },
        {
          "name": "my_new_receiver_name",
          "display_name": "my_new_receiver_name",
          "is_searchable": true,
          "is_filterable": true,
          "text_type_options": {},
          "schema_sources": [
            {
              "name": "receiver_name_in_invoice",
              "processor_type": "INVOICE_PROCESSOR"
            },
            {
              "name": "receiver_name_in_w2",
              "processor_type": "FORM_W2_PROCESSOR"
            }
          ]
        }
      ]
    }

To keep the property name the same as the entity type, use the entity type directly as the name, such as `line_item` in the example above. To convert all entities of type `receiver_name_in_invoice` from the invoice processor and of type `receiver_name_in_w2` from the Form W2 processor to a new name, `my_new_receiver_name`, add the mappings in the `schema_sources` field, as in the example above. After conversion, use `my_new_receiver_name` for searching and filtering. Property names and `schema_source` names should be unique.
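To make the mapping rules concrete, the following is a minimal local sketch in Python. It is not the pipeline's implementation and does not call any Document AI Warehouse API. It assumes entities arrive as plain dicts with `type`, `mention_text`, and nested `properties` keys (hypothetical input shaped loosely after Document AI entities), and it assumes that a property name matches an entity type from any processor while a `schema_sources` entry matches only its own `processor_type`. The helper names `duplicate_names`, `build_lookup`, `flatten`, and `to_properties` are illustrative only.

```python
from collections import Counter

# The schema from this page, reduced to the fields the sketch needs.
SCHEMA = {
    "property_definitions": [
        {"name": "line_item"},
        {
            "name": "my_new_receiver_name",
            "schema_sources": [
                {"name": "receiver_name_in_invoice",
                 "processor_type": "INVOICE_PROCESSOR"},
                {"name": "receiver_name_in_w2",
                 "processor_type": "FORM_W2_PROCESSOR"},
            ],
        },
    ]
}


def duplicate_names(schema):
    """Return property and schema_source names that appear more than once."""
    names = []
    for prop in schema["property_definitions"]:
        names.append(prop["name"])
        names.extend(src["name"] for src in prop.get("schema_sources", []))
    return sorted(n for n, count in Counter(names).items() if count > 1)


def build_lookup(schema):
    """Map (processor_type, entity_type) to a property name.

    A processor_type of None stands for a direct property-name match
    (assumed here to apply to any processor).
    """
    lookup = {}
    for prop in schema["property_definitions"]:
        lookup[(None, prop["name"])] = prop["name"]
        for src in prop.get("schema_sources", []):
            lookup[(src["processor_type"], src["name"])] = prop["name"]
    return lookup


def flatten(entities):
    """Yield (entity_type, text) for each entity and each nested child."""
    for entity in entities:
        yield entity["type"], entity.get("mention_text", "")
        yield from flatten(entity.get("properties", []))


def to_properties(processor_type, entities, schema):
    """Convert entities to {property_name: [values]}, dropping unmapped types."""
    lookup = build_lookup(schema)
    properties = {}
    for entity_type, text in flatten(entities):
        name = (lookup.get((processor_type, entity_type))
                or lookup.get((None, entity_type)))
        if name is None:
            continue  # entity type is not in the schema, so it is discarded
        properties.setdefault(name, []).append(text)
    return properties
```

Under these assumptions, an invoice with a `line_item` entity (containing a `line_item/amount` child), a `total_amount` entity, and a `receiver_name_in_invoice` entity yields only the `line_item` and `my_new_receiver_name` properties. The other two entity types are dropped because the example schema has no property or `schema_source` named after them. Running `duplicate_names(SCHEMA)` before uploading a schema is one way to check the uniqueness requirement locally.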