---
name: azure-search-documents-py
description: |
  Azure AI Search SDK for Python. Use for vector search, hybrid search, semantic ranking, indexing, and skillsets.
  Triggers: "azure-search-documents", "SearchClient", "SearchIndexClient", "vector search", "hybrid search", "semantic search".
package: azure-search-documents
---

# Azure AI Search SDK for Python

Full-text, vector, and hybrid search with AI enrichment capabilities.

## Installation

```bash
pip install azure-search-documents
```

## Environment Variables

```bash
AZURE_SEARCH_ENDPOINT=https://<service-name>.search.windows.net
AZURE_SEARCH_API_KEY=<your-api-key>
AZURE_SEARCH_INDEX_NAME=<your-index-name>
```

## Authentication

### API Key

```python
import os

from azure.search.documents import SearchClient
from azure.core.credentials import AzureKeyCredential

client = SearchClient(
    endpoint=os.environ["AZURE_SEARCH_ENDPOINT"],
    index_name=os.environ["AZURE_SEARCH_INDEX_NAME"],
    credential=AzureKeyCredential(os.environ["AZURE_SEARCH_API_KEY"])
)
```

### Entra ID (Recommended)

```python
import os

from azure.search.documents import SearchClient
from azure.identity import DefaultAzureCredential  # pip install azure-identity

client = SearchClient(
    endpoint=os.environ["AZURE_SEARCH_ENDPOINT"],
    index_name=os.environ["AZURE_SEARCH_INDEX_NAME"],
    credential=DefaultAzureCredential()
)
```

## Client Types

| Client | Purpose |
|--------|---------|
| `SearchClient` | Search and document operations |
| `SearchIndexClient` | Index management, synonym maps |
| `SearchIndexerClient` | Indexers, data sources, skillsets |

## Create Index with Vector Field

```python
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchIndex,
    SearchField,
    SearchFieldDataType,
    VectorSearch,
    HnswAlgorithmConfiguration,
    VectorSearchProfile,
    SearchableField,
    SimpleField
)

# endpoint and key hold the values from the environment variables above
index_client = SearchIndexClient(endpoint, AzureKeyCredential(key))

fields = [
    SimpleField(name="id", type=SearchFieldDataType.String, key=True),
    SearchableField(name="title", type=SearchFieldDataType.String),
    SearchableField(name="content", type=SearchFieldDataType.String),
    SearchField(
        name="content_vector",
        type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
        searchable=True,
        vector_search_dimensions=1536,
        vector_search_profile_name="my-vector-profile"
    )
]

# The profile name on the vector field must match a profile defined here
vector_search = VectorSearch(
    algorithms=[
        HnswAlgorithmConfiguration(name="my-hnsw")
    ],
    profiles=[
        VectorSearchProfile(
            name="my-vector-profile",
            algorithm_configuration_name="my-hnsw"
        )
    ]
)

index = SearchIndex(
    name="my-index",
    fields=fields,
    vector_search=vector_search
)

index_client.create_or_update_index(index)
```

## Upload Documents

```python
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

client = SearchClient(endpoint, "my-index", AzureKeyCredential(key))

documents = [
    {
        "id": "1",
        "title": "Azure AI Search",
        "content": "Full-text and vector search service",
        "content_vector": [0.1, 0.2, ...]  # 1536 dimensions
    }
]

result = client.upload_documents(documents)
print(f"Uploaded {len(result)} documents")
```

## Keyword Search

```python
results = client.search(
    search_text="azure search",
    select=["id", "title", "content"],
    top=10
)

for result in results:
    print(f"{result['title']}: {result['@search.score']}")
```

## Vector Search

```python
from azure.search.documents.models import VectorizedQuery

# Your query embedding (1536 dimensions)
query_vector = get_embedding("semantic search capabilities")

vector_query = VectorizedQuery(
    vector=query_vector,
    k_nearest_neighbors=10,
    fields="content_vector"
)

results = client.search(
    vector_queries=[vector_query],
    select=["id", "title", "content"]
)

for result in results:
    print(f"{result['title']}: {result['@search.score']}")
```

## Hybrid Search (Vector + Keyword)

```python
from azure.search.documents.models import VectorizedQuery

vector_query = VectorizedQuery(
    vector=query_vector,
    k_nearest_neighbors=10,
    fields="content_vector"
)

# Passing both search_text and vector_queries makes the query hybrid
results = client.search(
    search_text="azure search",
    vector_queries=[vector_query],
    select=["id", "title", "content"],
    top=10
)
```

## Semantic Ranking

```python
from azure.search.documents.models import QueryType

results = client.search(
    search_text="what is azure search",
    query_type=QueryType.SEMANTIC,
    semantic_configuration_name="my-semantic-config",
    query_caption="extractive",  # request captions so @search.captions is populated
    select=["id", "title", "content"],
    top=10
)

for result in results:
    print(f"{result['title']}")
    if result.get("@search.captions"):
        print(f"  Caption: {result['@search.captions'][0].text}")
```

## Filters

```python
results = client.search(
    search_text="*",
    filter="category eq 'Technology' and rating gt 4",
    order_by=["rating desc"],
    select=["id", "title", "category", "rating"]
)
```

## Facets

```python
results = client.search(
    search_text="*",
    facets=["category,count:10", "rating"],
    top=0  # Only get facets, no documents
)

for facet_name, facet_values in results.get_facets().items():
    print(f"{facet_name}:")
    for facet in facet_values:
        print(f"  {facet['value']}: {facet['count']}")
```

## Autocomplete & Suggest

```python
# Autocomplete
results = client.autocomplete(
    search_text="sea",
    suggester_name="my-suggester",
    mode="twoTerms"
)

# Suggest
results = client.suggest(
    search_text="sea",
    suggester_name="my-suggester",
    select=["title"]
)
```

## Indexer with Skillset

```python
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexerClient
from azure.search.documents.indexes.models import (
    SearchIndexer,
    SearchIndexerDataContainer,
    SearchIndexerDataSourceConnection,
    SearchIndexerSkillset,
    EntityRecognitionSkill,
    InputFieldMappingEntry,
    OutputFieldMappingEntry
)

indexer_client = SearchIndexerClient(endpoint, AzureKeyCredential(key))

# Create data source
data_source = SearchIndexerDataSourceConnection(
    name="my-datasource",
    type="azureblob",
    connection_string=connection_string,
    container=SearchIndexerDataContainer(name="documents")
)
indexer_client.create_or_update_data_source_connection(data_source)

# Create skillset
skillset = SearchIndexerSkillset(
    name="my-skillset",
    skills=[
        EntityRecognitionSkill(
            inputs=[InputFieldMappingEntry(name="text", source="/document/content")],
            outputs=[OutputFieldMappingEntry(name="organizations", target_name="organizations")]
        )
    ]
)
indexer_client.create_or_update_skillset(skillset)

# Create indexer
indexer = SearchIndexer(
    name="my-indexer",
    data_source_name="my-datasource",
    target_index_name="my-index",
    skillset_name="my-skillset"
)
indexer_client.create_or_update_indexer(indexer)
```

## Best Practices

1. **Use hybrid search** to combine vector and keyword relevance
2. **Enable semantic ranking** for natural language queries
3. **Index in batches** of 100-1000 documents for efficiency
4. **Use filters** to narrow results before ranking
5. **Configure vector dimensions** to match your embedding model
6. **Use HNSW algorithm** for large-scale vector search
7. **Create suggesters** at index creation time (a suggester cannot be added later for fields that already exist)

## Reference Files

| File | Contents |
|------|----------|
| [references/vector-search.md](references/vector-search.md) | HNSW configuration, integrated vectorization, multi-vector queries |
| [references/semantic-ranking.md](references/semantic-ranking.md) | Semantic configuration, captions, answers, hybrid patterns |
| [scripts/setup_vector_index.py](scripts/setup_vector_index.py) | CLI script to create vector-enabled search index |


---

## Additional Azure AI Search Patterns

# Azure AI Search Python SDK

Write clean, idiomatic Python code for Azure AI Search using `azure-search-documents`.

## Installation

```bash
pip install azure-search-documents azure-identity
```

## Environment Variables

```bash
AZURE_SEARCH_ENDPOINT=https://<search-service>.search.windows.net
AZURE_SEARCH_INDEX_NAME=<index-name>
# For API key auth (not recommended for production)
AZURE_SEARCH_API_KEY=<api-key>
```

## Authentication

**DefaultAzureCredential (preferred)**:
```python
from azure.identity import DefaultAzureCredential
from azure.search.documents import SearchClient

credential = DefaultAzureCredential()
client = SearchClient(endpoint, index_name, credential)
```

**API Key**:
```python
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

client = SearchClient(endpoint, index_name, AzureKeyCredential(api_key))
```

## Client Selection

| Client | Purpose |
|--------|---------|
| `SearchClient` | Query indexes, upload/update/delete documents |
| `SearchIndexClient` | Create/manage indexes, knowledge sources, knowledge bases |
| `SearchIndexerClient` | Manage indexers, skillsets, data sources |
| `KnowledgeBaseRetrievalClient` | Agentic retrieval with LLM-powered Q&A |

## Index Creation Pattern

```python
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchIndex, SearchField, VectorSearch, VectorSearchProfile,
    HnswAlgorithmConfiguration, AzureOpenAIVectorizer,
    AzureOpenAIVectorizerParameters, SemanticSearch,
    SemanticConfiguration, SemanticPrioritizedFields, SemanticField
)

index = SearchIndex(
    name=index_name,
    fields=[
        SearchField(name="id", type="Edm.String", key=True),
        SearchField(name="content", type="Edm.String", searchable=True),
        # Vector fields must be searchable; dimensions must match the embedding model
        SearchField(name="embedding", type="Collection(Edm.Single)",
                    searchable=True,
                    vector_search_dimensions=3072,
                    vector_search_profile_name="vector-profile"),
    ],
    vector_search=VectorSearch(
        profiles=[VectorSearchProfile(
            name="vector-profile",
            algorithm_configuration_name="hnsw-algo",
            vectorizer_name="openai-vectorizer"
        )],
        algorithms=[HnswAlgorithmConfiguration(name="hnsw-algo")],
        vectorizers=[AzureOpenAIVectorizer(
            vectorizer_name="openai-vectorizer",
            parameters=AzureOpenAIVectorizerParameters(
                resource_url=aoai_endpoint,
                deployment_name=embedding_deployment,
                model_name=embedding_model
            )
        )]
    ),
    semantic_search=SemanticSearch(
        default_configuration_name="semantic-config",
        configurations=[SemanticConfiguration(
            name="semantic-config",
            prioritized_fields=SemanticPrioritizedFields(
                content_fields=[SemanticField(field_name="content")]
            )
        )]
    )
)

index_client = SearchIndexClient(endpoint, credential)
index_client.create_or_update_index(index)
```

## Document Operations

```python
from azure.search.documents import SearchClient, SearchIndexingBufferedSender

# Batch upload with automatic batching
with SearchIndexingBufferedSender(endpoint, index_name, credential) as sender:
    sender.upload_documents(documents)

# Direct operations via SearchClient
search_client = SearchClient(endpoint, index_name, credential)
search_client.upload_documents(documents)           # Add new
search_client.merge_documents(documents)            # Update existing
search_client.merge_or_upload_documents(documents)  # Upsert
search_client.delete_documents(documents)           # Remove
```

## Search Patterns

```python
# Basic search
results = search_client.search(search_text="query")

# Vector search
from azure.search.documents.models import VectorizedQuery

results = search_client.search(
    search_text=None,
    vector_queries=[VectorizedQuery(
        vector=embedding,
        k_nearest_neighbors=5,
        fields="embedding"
    )]
)

# Hybrid search (vector + keyword) with semantic reranking
results = search_client.search(
    search_text="query",
    vector_queries=[VectorizedQuery(vector=embedding, k_nearest_neighbors=5, fields="embedding")],
    query_type="semantic",
    semantic_configuration_name="semantic-config"
)

# With filters
results = search_client.search(
    search_text="query",
    filter="category eq 'technology'",
    select=["id", "title", "content"],
    top=10
)
```

## Agentic Retrieval (Knowledge Bases)

For LLM-powered Q&A with answer synthesis, see [references/agentic-retrieval.md](references/agentic-retrieval.md).

Key concepts:
- **Knowledge Source**: Points to a search index
- **Knowledge Base**: Wraps knowledge sources + LLM for query planning and synthesis
- **Output modes**: `EXTRACTIVE_DATA` (raw chunks) or `ANSWER_SYNTHESIS` (LLM-generated answers)

## Async Pattern

```python
import asyncio

from azure.search.documents.aio import SearchClient

async def main():
    # For Entra ID, use DefaultAzureCredential from azure.identity.aio
    async with SearchClient(endpoint, index_name, credential) as client:
        results = await client.search(search_text="query")
        async for result in results:
            print(result["title"])

asyncio.run(main())
```

## Best Practices

1. **Use environment variables** for endpoints, keys, and deployment names
2. **Prefer `DefaultAzureCredential`** over API keys for production
3. **Use `SearchIndexingBufferedSender`** for batch uploads (handles batching/retries)
4. **Always define semantic configuration** for agentic retrieval indexes
5. **Use `create_or_update_index`** for idempotent index creation
6. **Close clients** with context managers or explicit `close()`

## Field Types Reference

| EDM Type | Python | Notes |
|----------|--------|-------|
| `Edm.String` | str | Searchable text |
| `Edm.Int32` | int | Integer |
| `Edm.Int64` | int | Long integer |
| `Edm.Double` | float | Floating point |
| `Edm.Boolean` | bool | True/False |
| `Edm.DateTimeOffset` | datetime | ISO 8601 |
| `Collection(Edm.Single)` | List[float] | Vector embeddings |
| `Collection(Edm.String)` | List[str] | String arrays |

## Error Handling

```python
from azure.core.exceptions import (
    HttpResponseError,
    ResourceNotFoundError,
    ResourceExistsError
)

try:
    result = search_client.get_document(key="123")
except ResourceNotFoundError:
    print("Document not found")
except HttpResponseError as e:
    print(f"Search error: {e.message}")
```
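
Exceptions cover failures of the whole request, but a batch upload can also fail for individual documents, which is reported in the returned results rather than raised. The sketch below is one way to combine the batching advice above with a per-document check when calling `SearchClient` directly instead of using `SearchIndexingBufferedSender`. It assumes each result exposes `.key` and `.succeeded`, as the SDK's `IndexingResult` objects do. The helper names `chunked` and `upload_in_batches` and the default batch size of 500 are illustrative choices, not part of the SDK.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Document = Dict[str, Any]


def chunked(items: List[Document], size: int) -> Iterator[List[Document]]:
    """Yield consecutive slices of `items` holding at most `size` documents."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def upload_in_batches(
    upload: Callable[[List[Document]], Iterable[Any]],
    documents: List[Document],
    batch_size: int = 500,  # assumption: any value in the 100-1000 range works
) -> List[str]:
    """Send documents batch by batch and return the keys that were rejected.

    `upload` is any callable shaped like SearchClient.upload_documents:
    it accepts a list of documents and returns one result per document,
    each with `.key` and `.succeeded` attributes.
    """
    failed_keys: List[str] = []
    for batch in chunked(documents, batch_size):
        for result in upload(batch):
            if not result.succeeded:
                failed_keys.append(result.key)
    return failed_keys
```

With a real client this would be called as `upload_in_batches(search_client.upload_documents, documents)`. Any returned keys can then be logged or retried. Request-level errors such as `HttpResponseError` still propagate to the caller.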