API Reference

Data representation

Solr documents are modeled as Python dictionaries with field names as keys and field values as values.

  • Multi-valued fields use list, tuple, or set as values.

  • datetime.datetime values are converted to UTC.

  • datetime.date values are converted to datetime.datetime at 00:00:00 UTC.

  • bool values are converted to 'true' or 'false'.

  • None values are omitted from the document sent to Solr.
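
As an illustration of these rules, here is a minimal sketch of the documented coercions (hypothetical helper name; the library's actual serializer may differ in detail):

```python
from datetime import date, datetime, timezone

def coerce_value(value):
    # Sketch of the documented coercions. Order matters: datetime is a
    # subclass of date, so it must be tested first.
    if isinstance(value, datetime):
        return value.astimezone(timezone.utc).isoformat()
    if isinstance(value, date):
        return datetime(value.year, value.month, value.day,
                        tzinfo=timezone.utc).isoformat()
    if isinstance(value, bool):
        return 'true' if value else 'false'
    return value

doc = {'id': '1', 'tags': ['a', 'b'], 'published': True, 'note': None}
# None-valued fields are dropped; top-level values are coerced (a full
# serializer would also coerce the elements of multi-valued fields).
prepared = {k: coerce_value(v) for k, v in doc.items() if v is not None}
# prepared == {'id': '1', 'tags': ['a', 'b'], 'published': 'true'}
```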

Exceptions

class solr.core.SolrVersionError(feature, required, actual)

Raised when a feature requires a higher Solr version than connected. Subclass of Exception.

feature

Name of the feature that was called (string).

required

Minimum version required as a tuple, e.g. (4, 0).

actual

Detected server version as a tuple, e.g. (3, 6, 2).

Solr class

class solr.Solr(url, persistent=True, timeout=None, ssl_key=None, ssl_cert=None, http_user=None, http_pass=None, post_headers=None, max_retries=3, retry_delay=0.1, always_commit=False, response_format='json', auth_token=None, auth=None, debug=False)

Connect to the Solr instance at url. If the Solr instance provides multiple cores, url should point to a specific core.

Constructor parameters:

Parameter

Description

url

URI pointing to the Solr instance (e.g. http://localhost:8983/solr/mycore). A UserWarning is issued if the path does not contain /solr.

persistent

Keep a persistent HTTP connection open. Defaults to True.

timeout

Timeout in seconds for server responses.

ssl_key

Path to PEM key file for SSL client authentication.

ssl_cert

Path to PEM certificate file for SSL client authentication.

http_user

Username for HTTP Basic authentication.

http_pass

Password for HTTP Basic authentication.

post_headers

Dictionary of additional headers to include in all requests.

max_retries

Maximum number of automatic retries on connection errors. Defaults to 3.

retry_delay

Base delay in seconds between retries. Uses exponential backoff: first retry waits retry_delay, second waits retry_delay * 2, etc. Defaults to 0.1. Each retry is logged at WARNING level.

always_commit

If True, all update methods (add, add_many, delete, etc.) will automatically commit changes. Individual calls can override this by passing commit=False. Defaults to False.

response_format

Response format for queries: 'json' (default) or 'xml'. When 'json', queries use wt=json and the JSON parser. Use 'xml' for legacy compatibility with older code.

auth_token

Bearer token string. Sends Authorization: Bearer <token> header. Takes priority over http_user/http_pass.

auth

A callable returning a dict[str, str] of headers. Called on every request, enabling dynamic token refresh (e.g., OAuth2). Takes priority over auth_token and http_user/http_pass.

debug

If True, log all requests and responses.

Attributes:

server_version

Tuple representing the detected Solr version, e.g. (9, 4, 1). Automatically populated during initialization.

always_commit

Boolean indicating whether update methods auto-commit by default.

select

A SearchHandler instance bound to the /select endpoint.

Health check:

ping()

Ping the Solr server to check if it is reachable.

Returns True if the server responds to /admin/ping, False otherwise. Tries both the core path and its parent path.

Works on all Solr versions (1.2+).

Example:

conn = solr.Solr('http://localhost:8983/solr/mycore')
if conn.ping():
    print('Solr is up')

Search methods:

The select attribute is the primary search interface. See SearchHandler for details:

response = conn.select('title:lucene')

Update methods:

add(doc, **kwargs)

Add a single document (a dict) to the index. Supports the commit-control arguments.

add_many(docs, **kwargs)

Add multiple documents in one request. Supports the commit-control arguments.

Atomic update methods (Solr 4.0+):

atomic_update(doc, commit=False)

Partial update of a single document. Field values can be plain values or dicts with a modifier key: set, add, remove, removeregex (Solr 5.0+), inc. Use {'set': None} to remove a field.

Example:

conn.atomic_update({
    'id': 'doc1',
    'title': {'set': 'New Title'},
    'count': {'inc': 1},
    'old_field': {'set': None},  # remove field
}, commit=True)

atomic_update_many(docs, commit=False)

Partial update of multiple documents. Same modifier syntax as atomic_update.

Real-time Get (Solr 4.0+):

get(id=None, ids=None, fields=None)

Retrieve documents directly from the transaction log without waiting for a commit. Returns a dict for single id (or None if not found), or a list for ids.

Parameters:
  • id – Single document ID.

  • ids – List of document IDs.

  • fields – Optional list of fields to return.
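
Usage sketch (assuming conn is a connected Solr instance, as in the earlier examples):

```python
# Single ID: returns a dict, or None if the document does not exist.
doc = conn.get(id='doc1', fields=['id', 'title'])

# Multiple IDs: returns a list of dicts.
docs = conn.get(ids=['doc1', 'doc2'])
```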

Cursor pagination (Solr 4.7+):

iter_cursor(q, sort, rows=100, **params)

Generator that yields Response objects for each batch of cursor-paginated results. Stops when all results are consumed.

Parameters:
  • q – Query string.

  • sort – Sort clause (must include uniqueKey field).

  • rows – Batch size per request.

Raises:

ValueError – If sort is not provided.
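
For example (assuming conn as above), iterating over all documents in sorted batches:

```python
total = 0
for resp in conn.iter_cursor('*:*', sort='id asc', rows=500):
    for doc in resp.results:
        total += 1
```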

MoreLikeThis (Solr 4.0+):

Create a MoreLikeThis instance:

from solr import MoreLikeThis

mlt = MoreLikeThis(conn)
response = mlt('interesting text', fl='title,body')

class solr.MoreLikeThis(conn)

Find similar documents using Solr’s /mlt handler.

Parameters:

conn – A Solr instance.

__call__(q=None, **params)

Query the MLT handler. Same parameters as SearchHandler.

raw(**params)

Issue a raw MLT query.

Delete methods:

delete(**kwargs)

Delete documents by ID or other criteria. Supports the commit-control arguments.

delete_many(ids, **kwargs)

Delete multiple documents by ID.

delete_query(query, **kwargs)

Delete all documents matching a query.

Commit and optimize:

commit(**kwargs)

Commit pending changes to the index. Accepts wait_flush and wait_searcher.

optimize(**kwargs)

Optimize the index (implies a commit).

Connection management:

close()

Close the underlying HTTP connection.

Commit-control arguments

Several update methods support optional keyword arguments to control commits. These arguments are always optional; when always_commit is False (the default), no commit is performed unless explicitly requested.

Argument

Description

commit

If True, commit changes before returning. When always_commit is True on the connection, this defaults to True but can be overridden with commit=False.

optimize

If True, optimize the index before returning (implies commit=True).

wait_flush

Block until the commit is flushed to disk. Defaults to True.

wait_searcher

Block until searcher objects are warmed. Defaults to True.

If wait_flush or wait_searcher is specified without commit or optimize, a TypeError is raised.

Methods that support commit-control arguments: add, add_many, delete, delete_many, delete_query.

All update methods and SearchHandler calls also accept a timeout keyword argument to override the connection-level timeout for that individual request.
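
The precedence between the connection-level always_commit flag and a per-call commit argument can be sketched as (illustrative only, not library code):

```python
def effective_commit(always_commit, commit=None):
    # An explicit per-call commit wins; otherwise fall back to the
    # connection-level always_commit default.
    return always_commit if commit is None else commit

effective_commit(True)                # True: connection default applies
effective_commit(True, commit=False)  # False: per-call override wins
```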

SearchHandler class

class solr.SearchHandler(connection, path='/select', arg_separator='_')

Provides access to a named Solr request handler. The select attribute on Solr instances is a SearchHandler bound to /select.

Create handlers for custom endpoints:

import solr

conn = solr.Solr('http://localhost:8983/solr/mycore')
my_handler = solr.SearchHandler(conn, '/my_handler')
response = my_handler('some query')

SearchHandler.__call__(q=None, fields=None, highlight=None, score=True, sort=None, sort_order='asc', **params)

Execute a search query against Solr.

Parameters:
  • q – Query string.

  • fields – Fields to return. String or iterable. Defaults to '*'.

  • highlight – False (default), True, or a list of field names. When enabled, the response gets a highlighting attribute — a dict keyed by document ID, where each value is a dict of field names to lists of highlighted snippets (e.g. {'doc1': {'title': ['<em>Lucene</em> in Action']}}). Customize with hl_simple_pre, hl_simple_post, hl_fragsize, hl_snippets, etc. via **params.

  • score – Include score in results. Defaults to True.

  • sort – Fields to sort by. String or iterable.

  • sort_order – Default sort direction ('asc' or 'desc').

  • json_facet – JSON Facet API dict (Solr 5.0+). Serialized to json.facet query parameter automatically.

  • params – Additional Solr parameters (use underscores for dots).

  • timeout – Per-request timeout in seconds (overrides connection-level timeout).

Returns:

A Response instance.

Raises:

ValueError – If highlight=True but no fields are specified, or if sort_order is invalid.
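
A highlighting query might look like this (assuming conn and a title field; hl_simple_pre and hl_simple_post are standard Solr highlighting parameters):

```python
resp = conn.select('title:lucene', fields='id,title', highlight=['title'],
                   hl_simple_pre='<b>', hl_simple_post='</b>')
for doc in resp:
    snippets = resp.highlighting.get(doc['id'], {}).get('title', [])
    print(doc['id'], ' ... '.join(snippets))
```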

SearchHandler.raw(**params)

Issue a raw query. No processing is performed on parameters or responses. Returns the raw response text.

Response class

class solr.Response

Container for query results.

Attributes:

header

Dictionary containing response header values (status, QTime, params).

results

A Results list of matching documents. Each document is a dictionary of field names to values.

numFound

Total number of matching documents.

start

Starting offset of the current result set.

maxScore

Maximum relevance score across all matches.

facet_counts

Dictionary containing traditional facet results when facet=true is used. Contains keys like facet_fields, facet_queries, facet_ranges, facet_pivots, etc.

facets

Dictionary containing JSON Facet API results when json.facet is used (Solr 5.0+). Contains the structured facet buckets returned by Solr’s JSON faceting.

stats

Dictionary containing field statistics when stats=true is used. Contains per-field stats such as min, max, count, mean, etc.

debug

Dictionary containing debug information when debug=true (or debugQuery=true) is used. Contains parsed query, explain info, timing data, etc.

Note

Any top-level key in the Solr JSON response that is not responseHeader or response is automatically set as an attribute on the Response object. This includes highlighting, facet_counts, facets, stats, debug, nextCursorMark, grouped, and any other component output. You can access them as response.key_name.
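
The attribute mapping described in the note can be sketched as (an illustration of the documented behavior, not the real Response class):

```python
class ResponseSketch:
    def __init__(self, raw):
        self.header = raw.get('responseHeader', {})
        body = raw.get('response', {})
        self.results = body.get('docs', [])
        self.numFound = body.get('numFound', 0)
        # Any other top-level key becomes an attribute.
        for key, value in raw.items():
            if key not in ('responseHeader', 'response'):
                setattr(self, key, value)

raw = {'responseHeader': {'status': 0},
       'response': {'numFound': 1, 'start': 0, 'docs': [{'id': '1'}]},
       'nextCursorMark': 'AoEjMQ=='}
resp = ResponseSketch(raw)
resp.nextCursorMark  # 'AoEjMQ=='
```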

Pydantic models (opt-in):

as_models(model)

Convert result documents to Pydantic BaseModel instances. Requires pydantic (pip install solrpy[pydantic]).

Parameters:

model – A Pydantic BaseModel subclass.

Returns:

List of model instances.

The model= parameter on select() and get() does this automatically:

resp = conn.select('*:*', model=MyDoc)  # results are list[MyDoc]
doc = conn.get(id='1', model=MyDoc)      # MyDoc | None

Cursor pagination (Solr 4.7+):

cursor_next()

Follow cursor-based pagination. Returns the next page of results, or None if no more results (nextCursorMark == cursorMark) or if the query did not use cursorMark.

Example:

resp = conn.select('*:*', sort='id asc', cursorMark='*', rows=100)
while resp:
    process(resp.results)
    resp = resp.cursor_next()

Offset pagination methods:

next_batch()

Fetch the next batch of results. Returns a new Response, or None if there are no more results.

previous_batch()

Fetch the previous batch of results. Returns a new Response, or None if this is the first batch.
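
For example (assuming conn), walking the full result set in fixed-size windows:

```python
resp = conn.select('*:*', rows=100)
while resp is not None:
    for doc in resp.results:
        print(doc['id'])
    resp = resp.next_batch()
```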

Grouping (Solr 3.3+):

grouped

A GroupedResult object when the response contains grouped results, otherwise not present. Enable grouping with group='true' and group_field='field'.

Example:

resp = conn.select('*:*', group='true', group_field='category',
                   group_limit=5, group_ngroups='true')
for group in resp.grouped['category'].groups:
    print(group.groupValue, len(group.doclist))

Spellcheck (Solr 1.4+):

spellcheck

A SpellcheckResult object if the response contains spellcheck data, otherwise None. Spellcheck data is returned when you include spellcheck='true' in the query parameters.

Example:

resp = conn.select('misspeled query', spellcheck='true',
                   spellcheck_collate='true')
if resp.spellcheck and not resp.spellcheck.correctly_spelled:
    print('Did you mean:', resp.spellcheck.collation)

Iteration:

Response objects support len() and iteration:

response = conn.select('*:*')
print(len(response))
for doc in response:
    print(doc['id'])

SpellcheckResult class (Solr 1.4+)

class solr.SpellcheckResult(raw)

Wrapper around the raw spellcheck response dict, exposed as Response.spellcheck when the query includes spellcheck=true.

Parameters:

raw – The raw spellcheck dict from the Solr response.

correctly_spelled

True if all query terms were spelled correctly.

collation

The corrected full query string suggested by Solr (collation), or None if not present. Requires spellcheck.collate=true on the request.

suggestions

List of per-word suggestion entries. Each entry is a dict that includes an 'original' key (the misspelled word) merged with the Solr info dict ('numFound', 'startOffset', 'endOffset', 'suggestion' list, etc.).

Example:

for entry in resp.spellcheck.suggestions:
    print(entry['original'], '->', entry.get('suggestion', []))

Extract class (Solr 1.4+)

class solr.Extract(conn)

Index or extract rich documents via Solr Cell (Apache Tika) using the /update/extract handler. The handler must be configured in solrconfig.xml.

Parameters:

conn – A Solr or AsyncSolr instance. With AsyncSolr, methods return coroutines.

__call__(file_obj, content_type='application/octet-stream', commit=False, **params)

Index a rich document.

Parameters:
  • file_obj – Binary file-like object (opened in 'rb' mode).

  • content_type – MIME type of the document. Defaults to 'application/octet-stream'.

  • commit – Commit to the index immediately. Defaults to False.

  • params – Additional Solr parameters. The first underscore in each key is replaced with a dot: literal_id='x' → literal.id=x. Field names with underscores are preserved: literal_my_field='v' → literal.my_field=v.

Returns:

Parsed JSON response dict (contains responseHeader).

Raises:

SolrVersionError – If the server is older than Solr 1.4.

Example:

from solr import Solr, Extract

conn = Solr('http://localhost:8983/solr/mycore')
ext = Extract(conn)

with open('report.pdf', 'rb') as f:
    ext(f, content_type='application/pdf',
        literal_id='report1', literal_title='Annual Report',
        commit=True)

extract_only(file_obj, content_type='application/octet-stream', **params)

Extract text and metadata without indexing.

Calls /update/extract with extractOnly=true.

Returns:

(text, metadata) tuple. text is the extracted plain text; metadata is a dict of Tika metadata (e.g. 'Content-Type', 'Author', 'title').

from_path(file_path, **params)

Index a document from a filesystem path. MIME type is guessed from the file extension via mimetypes; falls back to 'application/octet-stream'.

Parameters:
  • file_path – Path to the file.

  • params – Forwarded to __call__() (commit, literal_*, etc.).

extract_from_path(file_path, **params)

Extract text and metadata from a file path without indexing.

Returns:

(text, metadata) tuple (same as extract_only()).
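
Usage sketch (assuming ext = Extract(conn) from the example above; the path is hypothetical):

```python
# Index straight from a path; MIME type is guessed from the extension.
ext.from_path('report.pdf', literal_id='report1', commit=True)

# Extract text and Tika metadata without touching the index.
text, meta = ext.extract_from_path('report.pdf')
print(meta.get('Content-Type'))
```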

Suggest class (Solr 4.7+)

class solr.Suggest(conn)

Query Solr’s SuggestComponent via the /suggest handler.

The /suggest handler and at least one SuggestComponent must be configured in solrconfig.xml.

Parameters:

conn – A Solr or AsyncSolr instance. With AsyncSolr, methods return coroutines.

__call__(q, dictionary=None, count=10, **params)

Return a flat list of suggestion dicts for the query term.

Parameters:
  • q – Partial query string to suggest for.

  • dictionary – Name of the suggester dictionary to use. If None, Solr uses the default suggester.

  • count – Maximum number of suggestions to return. Defaults to 10.

  • params – Extra parameters forwarded verbatim to /suggest.

Returns:

List of suggestion dicts. Each dict typically has 'term', 'weight', and 'payload' keys.

Raises:

SolrVersionError – If the server is older than Solr 4.7.

Example:

from solr import Solr, Suggest

conn = Solr('http://localhost:8983/solr/mycore')
suggest = Suggest(conn)
results = suggest('que', dictionary='mySuggester', count=5)
for s in results:
    print(s['term'], s['weight'])

Grouping classes (Solr 3.3+)

class solr.GroupedResult(raw)

Wrapper around a Solr grouped response. Supports subscript access by field name, iteration, in, and len().

__getitem__(field)

Return a GroupField for the given field name.

class solr.GroupField(raw)

Grouped results for a single field.

matches

Total number of documents matching the query across all groups.

ngroups

Number of distinct groups, or None if group.ngroups was not requested.

groups

List of Group objects.

class solr.Group(raw)

A single group.

groupValue

The field value that defines this group, or None.

doclist

A Results list of matching documents, with numFound and start attributes.

KNN / Dense Vector Search (Solr 9.0+)

class solr.KNN(conn)

Dense Vector / KNN Search using Solr’s {!knn} and {!vectorSimilarity} query parsers. Supports top-K search, similarity threshold search, hybrid (lexical + vector) search, and re-ranking. Created explicitly by the user.

Parameters:

conn – A Solr or AsyncSolr instance. With AsyncSolr, execution methods return coroutines.

Execution methods:

search(vector, field, top_k=10, filters=None, early_termination=False, saturation_threshold=None, patience=None, ef_search_scale_factor=None, seed_query=None, pre_filter=None, **params)

Execute a {!knn} search query (top-K nearest neighbors).

Parameters:
  • vector – Dense vector as a sequence of floats.

  • field – Name of the DenseVectorField to search.

  • top_k – Number of nearest neighbors to retrieve. Defaults to 10.

  • filters – Filter query (fq parameter).

  • early_termination – Enable HNSW early termination optimization.

  • saturation_threshold – Queue saturation cutoff for early termination (float between 0 and 1).

  • patience – Iteration limit for early termination (integer).

  • ef_search_scale_factor – Candidate examination multiplier (Solr 10.0+). Raises SolrVersionError if Solr < 10.0.

  • seed_query – Lexical query string to guide the vector search entry point.

  • pre_filter – Explicit pre-filter query string(s). A single string or a list of strings.

  • params – Additional Solr parameters.

Returns:

A Response instance (sync) or coroutine (async).

Raises:

SolrVersionError – If connected Solr is older than 9.0.

similarity(vector, field, min_return, min_traverse=None, pre_filter=None, filters=None, **params)

Execute a {!vectorSimilarity} search. Returns all documents whose similarity to the vector exceeds min_return.

Parameters:
  • vector – Dense vector as a sequence of floats.

  • field – Name of the DenseVectorField.

  • min_return – Minimum similarity threshold for results (float).

  • min_traverse – Minimum similarity to continue graph traversal (float). Can improve performance by pruning low-similarity branches.

  • pre_filter – Explicit pre-filter query string(s).

  • filters – Filter query (fq parameter).

  • params – Additional Solr parameters.

Returns:

A Response instance.

Raises:

SolrVersionError – If connected Solr is older than 9.0.

hybrid(text_query, vector, field, min_return=0.5, **params)

Execute a hybrid (lexical OR vector) search. Combines a standard text query with a {!vectorSimilarity} query using an OR clause.

Parameters:
  • text_query – The lexical search query string.

  • vector – Dense vector for similarity matching.

  • field – Name of the DenseVectorField.

  • min_return – Minimum similarity threshold for the vector part. Defaults to 0.5.

  • params – Additional Solr parameters.

Returns:

A Response instance.

Raises:

SolrVersionError – If connected Solr is older than 9.0.

rerank(query, vector, field, top_k=10, rerank_docs=100, rerank_weight=1.0, **params)

Execute a lexical query re-ranked by vector similarity. Uses Solr’s {!rerank} query parser to re-score the top lexical results with a {!knn} query.

Parameters:
  • query – The base lexical query string.

  • vector – Dense vector for re-ranking.

  • field – Name of the DenseVectorField.

  • top_k – topK for the KNN re-rank query. Defaults to 10.

  • rerank_docs – Number of top lexical docs to re-rank. Defaults to 100.

  • rerank_weight – Weight of the vector score in the final ranking. Defaults to 1.0.

  • params – Additional Solr parameters.

Returns:

A Response instance.

Raises:

SolrVersionError – If connected Solr is older than 9.0.

__call__(vector, field, top_k=10, **params)

Shortcut for search().

Query builder methods:

build_knn_query(vector, field, top_k=10, early_termination=False, saturation_threshold=None, patience=None, ef_search_scale_factor=None, seed_query=None, pre_filter=None, include_tags=None, exclude_tags=None)

Build a {!knn} query string without executing it.

Parameters:
  • vector – Dense vector as a sequence of floats.

  • field – Name of the DenseVectorField to search.

  • top_k – Number of nearest neighbors to retrieve.

  • early_termination – Enable HNSW early termination.

  • saturation_threshold – Queue saturation cutoff (float).

  • patience – Iteration limit for early termination (int).

  • ef_search_scale_factor – Candidate examination multiplier (Solr 10.0+).

  • seed_query – Lexical query to guide vector search entry point.

  • pre_filter – Pre-filter query string(s) (string or list of strings).

  • include_tags – Only use fq filters with these tags.

  • exclude_tags – Exclude fq filters with these tags.

Returns:

The KNN query string, e.g. {!knn f=embedding topK=10}[0.1,0.2,...].

build_similarity_query(vector, field, min_return, min_traverse=None, pre_filter=None)

Build a {!vectorSimilarity} query string without executing it.

Parameters:
  • vector – Dense vector as a sequence of floats.

  • field – Name of the DenseVectorField.

  • min_return – Minimum similarity threshold for results.

  • min_traverse – Minimum similarity to continue traversal.

  • pre_filter – Pre-filter query string(s).

Returns:

The vectorSimilarity query string.

build_hybrid_query(text_query, vector, field, min_return=0.5)

Build a hybrid (lexical OR vector) query string without executing it.

Parameters:
  • text_query – The lexical search query.

  • vector – Dense vector for similarity matching.

  • field – Name of the DenseVectorField.

  • min_return – Minimum similarity threshold for the vector part.

Returns:

A combined OR query string.

build_rerank_params(vector, field, top_k=10, rerank_docs=100, rerank_weight=1.0)

Build re-ranking parameters for use with a lexical base query.

Parameters:
  • vector – Dense vector for re-ranking.

  • field – Name of the DenseVectorField.

  • top_k – topK for the KNN re-rank query.

  • rerank_docs – Number of top lexical docs to re-rank.

  • rerank_weight – Weight of vector score in final ranking.

Returns:

A dict with rq and rqq keys ready to pass as query parameters.

build_query(vector, field, top_k=10, ef_search_scale_factor=None)

Alias for build_knn_query() (backward compatibility).

Example:

from solr import Solr, KNN

conn = Solr('http://localhost:8983/solr/mycore')
knn = KNN(conn)

# Top-K nearest neighbors
response = knn.search([0.1, 0.2, 0.3], field='embedding', top_k=10)

# Similarity threshold
response = knn.similarity([0.1, 0.2, 0.3], field='embedding',
                          min_return=0.7)

# Hybrid (lexical + vector)
response = knn.hybrid('machine learning', [0.1, 0.2, 0.3],
                      field='embedding')

# Re-rank lexical results by vector similarity
response = knn.rerank('machine learning', [0.1, 0.2, 0.3],
                      field='embedding', rerank_docs=100)
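
The builder methods produce strings in the formats documented above; a minimal sketch of the {!knn} format (illustrative, not the library implementation):

```python
def knn_query(vector, field, top_k=10):
    # Sketch of the documented output format:
    # {!knn f=<field> topK=<k>}[v1,v2,...]
    vec = ','.join(str(v) for v in vector)
    return '{!knn f=%s topK=%d}[%s]' % (field, top_k, vec)

knn_query([0.1, 0.2, 0.3], 'embedding')
# '{!knn f=embedding topK=10}[0.1,0.2,0.3]'
```

A string built this way (or by build_knn_query()) can then be passed as the q of an ordinary select() call.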

SolrCloud (Solr 4.0+)

class solr.SolrCloud(zk, collection, retry_count=3, retry_delay=0.5, **solr_kwargs)

SolrCloud client with leader-aware routing and automatic failover. Two modes of operation: ZooKeeper mode (real-time node discovery) and HTTP mode (CLUSTERSTATUS polling, no ZooKeeper needed).

Parameters:
  • zk – A SolrZooKeeper instance.

  • collection – Solr collection name.

  • retry_count – Number of failover retries (default 3). On each failure, the client reconnects to a different node and retries.

  • retry_delay – Base delay in seconds between retries (default 0.5). Uses exponential backoff: retry_delay * 2^attempt.

  • solr_kwargs – Extra keyword arguments forwarded to the underlying Solr connection (e.g. timeout, http_user, http_pass, auth_token, auth, response_format).

classmethod from_urls(urls, collection, retry_count=3, retry_delay=0.5, **solr_kwargs)

Create without ZooKeeper, using HTTP-only CLUSTERSTATUS discovery. The client probes provided URLs to find active nodes and discovers shard leaders via the CLUSTERSTATUS admin API.

Parameters:
  • urls – List of Solr base URLs, e.g. ['http://solr1:8983/solr', 'http://solr2:8983/solr'].

  • collection – Solr collection name.

  • retry_count – Number of failover retries.

  • retry_delay – Base delay between retries.

  • solr_kwargs – Forwarded to Solr.

Returns:

A SolrCloud instance in HTTP mode.

Properties:

server_version

Detected Solr server version tuple, e.g. (9, 4, 1).

Read operations (routed to any active replica):

select(*args, **kwargs)

Execute a search query with automatic failover. Same parameters as Solr.select().

ping()

Ping the current Solr node.

Write operations (routed to a shard leader):

add(doc, **kwargs)

Add a document, routed to a shard leader.

add_many(docs, **kwargs)

Add multiple documents, routed to a shard leader.

delete(**kwargs)

Delete documents, routed to a shard leader.

delete_query(query, **kwargs)

Delete by query, routed to a shard leader.

delete_many(ids, **kwargs)

Delete multiple documents by ID, routed to a shard leader.

commit(**kwargs)

Commit changes, routed to a shard leader.

optimize(**kwargs)

Optimize the index, routed to a shard leader.

close()

Close the underlying Solr connection.

Failover behavior:

When any operation fails, the client:

  1. Logs a WARNING with the attempt number and error.

  2. Waits retry_delay * 2^attempt seconds (exponential backoff).

  3. Reconnects to a different node (leader for writes, any replica for reads).

  4. Retries the operation.

  5. After retry_count retries, raises the last exception.

Example (ZooKeeper mode):

from solr import SolrZooKeeper, SolrCloud

zk = SolrZooKeeper('zk1:2181,zk2:2181,zk3:2181')
cloud = SolrCloud(zk, collection='products',
                  timeout=10, auth_token='my-jwt-token')

# reads go to any active replica
response = cloud.select('category:books', rows=20)

# writes are routed to shard leaders
cloud.add({'id': '1', 'title': 'Solr in Action'}, commit=True)
cloud.delete(id='1', commit=True)

cloud.close()
zk.close()

Example (HTTP-only mode):

from solr import SolrCloud

cloud = SolrCloud.from_urls(
    ['http://solr1:8983/solr', 'http://solr2:8983/solr'],
    collection='products')

response = cloud.select('*:*')
cloud.close()

SolrZooKeeper

class solr.SolrZooKeeper(hosts, timeout=10.0)

ZooKeeper client for SolrCloud node discovery. Reads ZooKeeper state (/live_nodes, /collections/{name}/state.json, /aliases.json) to discover active Solr nodes, shard leaders, and collection aliases.

Requires the kazoo library:

pip install solrpy[cloud]

Parameters:
  • hosts – ZooKeeper connection string, e.g. 'zk1:2181,zk2:2181,zk3:2181'. Supports chroot paths: 'zk1:2181,zk2:2181/solr'.

  • timeout – Connection timeout in seconds (default 10.0).

Raises:

ImportError – If kazoo is not installed.

live_nodes()

Return a list of currently active Solr node identifiers as reported by ZooKeeper’s /live_nodes znode.

Returns:

list[str] — Node identifiers, e.g. ['solr1:8983_solr', 'solr2:8983_solr'].

collection_state(collection)

Return the full state dict for a collection. Contains shard info, replica info, router config, and more.

Tries per-collection state.json first (Solr 5+), then falls back to the legacy /clusterstate.json (Solr 4.x).

Parameters:

collection – Collection name (not an alias).

Returns:

dict — State dict with keys like 'shards', 'router', 'maxShardsPerNode', etc.

Raises:

RuntimeError – If the collection is not found in ZooKeeper.

aliases()

Return collection aliases as a dict.

Returns:

dict[str, str] — {alias_name: real_collection_name}. Empty dict if no aliases are defined.

replica_urls(collection)

Return base URLs of all active replicas for a collection. Aliases are resolved automatically.

Parameters:

collection – Collection name or alias.

Returns:

list[str] — Solr base URLs, e.g. ['http://solr1:8983/solr', 'http://solr2:8983/solr'].

leader_urls(collection)

Return base URLs of shard leaders for a collection (one per shard). Aliases are resolved automatically.

Parameters:

collection – Collection name or alias.

Returns:

list[str] — Leader Solr base URLs.

random_url(collection)

Return a random active replica URL for load balancing.

Parameters:

collection – Collection name or alias.

Returns:

str — A Solr base URL.

Raises:

RuntimeError – If no active replicas are found.

random_leader_url(collection)

Return a random shard leader URL for write operations.

Parameters:

collection – Collection name or alias.

Returns:

str — A leader Solr base URL.

Raises:

RuntimeError – If no leaders are found.

close()

Close the ZooKeeper connection. Always call this when done.

Example:

from solr import SolrZooKeeper

zk = SolrZooKeeper('zk1:2181,zk2:2181,zk3:2181')

# Discover nodes
nodes = zk.live_nodes()
print('Active nodes:', nodes)

# Get all replica URLs
replicas = zk.replica_urls('mycore')
print('Replicas:', replicas)

# Get shard leaders
leaders = zk.leader_urls('mycore')
print('Leaders:', leaders)

# Check aliases
aliases = zk.aliases()
print('Aliases:', aliases)  # e.g. {'prod': 'mycore_v2'}

# Get collection state
state = zk.collection_state('mycore')
for shard, data in state['shards'].items():
    print(shard, ':', len(data['replicas']), 'replicas')

zk.close()

Query builders

class solr.Field(name, alias=None)

Structured field expression for the fl parameter.

classmethod func(name, *args)

Function field, e.g. Field.func('sum', 'price', 'tax') → sum(price,tax).

classmethod transformer(name, **params)

Document transformer, e.g. Field.transformer('explain') → [explain].

classmethod score()

Score pseudo-field → score.

class solr.Sort(field, direction='asc')

Structured sort clause.

classmethod func(expr, direction='asc')

Function sort, e.g. Sort.func('geodist()', 'asc').

class solr.Facet

Builder for traditional Solr facet parameters.

classmethod field(name, **opts)

Field facet with per-field options (mincount, limit, sort, etc.).

classmethod range(name, start, end, gap, **opts)

Range facet.

classmethod query(name, q)

Query facet.

classmethod pivot(*fields, mincount=None)

Pivot facet.

to_params()

Convert to Solr query parameter dict.

All builders coexist with raw string parameters:

# Raw strings (always works)
conn.select('*:*', fl='id,title', sort='price desc',
            facet='true', facet_field='category')

# Builder objects (optional)
conn.select('*:*',
    fields=[Field('id'), Field('title')],
    sort=[Sort('price', 'desc')],
    facets=[Facet.field('category')],
)

Schema API (Solr 4.2+)

class solr.SchemaAPI(conn)

Created explicitly by the user. All methods require Solr 4.2+.

Parameters:

conn – A Solr or AsyncSolr instance. With AsyncSolr, all methods return coroutines.

Example:

from solr import Solr, AsyncSolr, SchemaAPI

# Sync
conn = Solr('http://localhost:8983/solr/mycore')
schema = SchemaAPI(conn)
fields = schema.fields()

# Async
async with AsyncSolr('http://localhost:8983/solr/mycore') as conn:
    schema = SchemaAPI(conn)
    fields = await schema.fields()

Full schema:

get_schema()

Return the full schema definition as a dict.

Field operations:

fields()

List all fields. Returns a list of field definition dicts.

add_field(name, field_type, **opts)

Add a new field. Example:

schema.add_field('title', 'text_general', stored=True, indexed=True)

replace_field(name, field_type, **opts)

Replace an existing field definition.

delete_field(name)

Delete a field by name.

Dynamic field operations:

dynamic_fields()
add_dynamic_field(name, field_type, **opts)
delete_dynamic_field(name)

Field type operations:

field_types()
add_field_type(**definition)
replace_field_type(**definition)
delete_field_type(name)

Copy field operations:

copy_fields()
add_copy_field(source, dest, max_chars=None)
delete_copy_field(source, dest)

Streaming Expressions (Solr 5.0+)

Build and execute Solr Streaming Expressions using Python builder functions. Each function returns a StreamExpression node. Nodes can be chained with the | (pipe) operator.

Core classes:

class solr.stream.StreamExpression(func_name, *args, **kwargs)

A Solr streaming expression node. Renders to a Solr expression string via str(). Supports the | (pipe) operator for chaining: the left-hand expression becomes the first positional argument of the right-hand expression.

Parameters:
  • func_name – The Solr streaming function name (e.g. "search").

  • args – Positional arguments (collection names, sub-expressions).

  • kwargs – Named parameters rendered as key=value pairs. String values containing spaces are auto-quoted.

__or__(other)

Pipe operator. Inserts self as the first argument of other and returns other.

__str__()

Render as a Solr streaming expression string, e.g. search(mycore,q="*:*",fl="id,title",sort="id asc").
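The pipe and rendering semantics can be illustrated with a minimal stand-in class (a sketch only, not the library's actual implementation; the real class only auto-quotes string values containing spaces, while this sketch quotes every keyword value for brevity):

```python
class Expr:
    """Minimal stand-in for solr.stream.StreamExpression (illustration only)."""

    def __init__(self, func_name, *args, **kwargs):
        self.func_name = func_name
        self.args = list(args)
        self.kwargs = kwargs

    def __or__(self, other):
        # Pipe: the left-hand expression becomes the first positional
        # argument of the right-hand expression.
        other.args.insert(0, self)
        return other

    def __str__(self):
        parts = [str(a) for a in self.args]
        parts += ['%s="%s"' % (k, v) for k, v in self.kwargs.items()]
        return '%s(%s)' % (self.func_name, ','.join(parts))


expr = Expr('search', 'mycore', q='*:*', sort='id asc') | Expr('unique', over='id')
print(expr)  # unique(search(mycore,q="*:*",sort="id asc"),over="id")
```

Note how the pipe nests the left-hand expression as the first argument rather than concatenating strings, which is what lets arbitrarily deep pipelines render correctly.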

class solr.stream.AggregateExpression(func_name, field)

An aggregate function for use inside rollup(), stats(), etc. Renders as func(field).

Parameters:
  • func_name – The aggregate function name (e.g. "sum").

  • field – The Solr field to aggregate over.

Source expressions:

solr.stream.search(collection, **kwargs)

Build a search() streaming expression.

Parameters:
  • collection – Solr collection name.

  • kwargs – Query parameters – q, fl, sort, rows, fq, etc.

solr.stream.facet(collection, **kwargs)

Build a facet() streaming expression.

Parameters:
  • collection – Solr collection name.

  • kwargs – Facet parameters – q, buckets, bucketSorts, bucketSizeLimit, etc.

solr.stream.topic(collection, **kwargs)

Build a topic() streaming expression.

Parameters:
  • collection – Solr collection name.

  • kwargs – Topic parameters – q, fl, id, checkpointEvery, etc.

Transform expressions:

solr.stream.unique(*args, **kwargs)

Build a unique() expression. De-duplicates a sorted stream by field.

Parameters:
  • over – The field to de-duplicate on.

solr.stream.top(*args, **kwargs)

Build a top() expression. Returns the top n tuples by sort order.

Parameters:
  • n – Number of tuples to return.

  • sort – Sort clause.

solr.stream.sort(*args, **kwargs)

Build a sort() expression. Re-sorts a stream.

Parameters:
  • by – Sort clause.

solr.stream.select(*args, **kwargs)

Build a select() expression. Projects / renames fields.

solr.stream.rollup(*args, **kwargs)

Build a rollup() expression. Groups a sorted stream and applies aggregate functions.

Parameters:
  • over – Field to group by. Additional keyword arguments are aggregate expressions (e.g. total=sum('bytes')).
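Conceptually, rollup() performs a streaming group-by over an already-sorted stream; its effect on the tuples can be sketched in plain Python (an illustration of the server-side behavior, not library code):

```python
from itertools import groupby

# Tuples as search('logs', fl='host,bytes', sort='host asc') would emit them.
stream = [
    {'host': 'a', 'bytes': 10},
    {'host': 'a', 'bytes': 5},
    {'host': 'b', 'bytes': 7},
]

# rollup(over='host', total=sum('bytes')) groups adjacent tuples on 'host'
# and sums 'bytes' within each group.
rolled = [
    {'host': host, 'total': sum(t['bytes'] for t in group)}
    for host, group in groupby(stream, key=lambda t: t['host'])
]
print(rolled)  # [{'host': 'a', 'total': 15}, {'host': 'b', 'total': 7}]
```

Because the grouping is streaming (only adjacent tuples are compared), the input must be sorted on the over field, which is why rollup() typically follows a sorted search().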

solr.stream.reduce(*args, **kwargs)

Build a reduce() expression. Groups a stream by field values.

Parameters:
  • by – Field to reduce on.

Join expressions:

solr.stream.merge(*args, **kwargs)

Build a merge() expression. Merges two or more sorted streams.

Parameters:
  • args – Two or more sub-expressions.

  • on – Merge key and direction (e.g. "id asc").

solr.stream.innerJoin(*args, **kwargs)

Build an innerJoin() expression.

Parameters:
  • args – Two sub-expressions (left, right).

  • on – Join key mapping (e.g. "left.id=right.id").

solr.stream.leftOuterJoin(*args, **kwargs)

Build a leftOuterJoin() expression.

solr.stream.hashJoin(*args, **kwargs)

Build a hashJoin() expression.

Parameters:
  • on – Join key.

  • hashed – The hashed side of the join.

solr.stream.intersect(*args, **kwargs)

Build an intersect() expression. Returns tuples present in both streams.

solr.stream.complement(*args, **kwargs)

Build a complement() expression. Returns tuples in the first stream that are not in the second.
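The set semantics of intersect() and complement(), both of which compare two streams sorted on a common key, can be sketched as follows (an illustration of the server-side behavior, not library code):

```python
# Two streams of tuples sharing the key field 'id'.
left = [{'id': 1}, {'id': 2}, {'id': 3}]
right = [{'id': 2}, {'id': 4}]
right_ids = {t['id'] for t in right}

# intersect(left, right, on="id asc"): tuples from the first stream whose
# key also appears in the second.
intersected = [t for t in left if t['id'] in right_ids]

# complement(left, right, on="id asc"): tuples from the first stream whose
# key does not appear in the second.
complemented = [t for t in left if t['id'] not in right_ids]

print(intersected)   # [{'id': 2}]
print(complemented)  # [{'id': 1}, {'id': 3}]
```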

Aggregate functions:

These return AggregateExpression instances for use inside rollup(), stats(), and similar expressions.

solr.stream.count(field)

Aggregate: count(field).

solr.stream.sum(field)

Aggregate: sum(field).

solr.stream.avg(field)

Aggregate: avg(field).

solr.stream.min(field)

Aggregate: min(field).

solr.stream.max(field)

Aggregate: max(field).

Control expressions:

solr.stream.fetch(*args, **kwargs)

Build a fetch() expression. Enriches tuples by fetching additional fields from a collection.

Parameters:
  • fl – Fields to fetch.

  • on – Join key.

solr.stream.parallel(*args, **kwargs)

Build a parallel() expression. Distributes a stream across workers.

Parameters:
  • workers – Number of parallel workers.

  • sort – Merge sort clause.

solr.stream.daemon(*args, **kwargs)

Build a daemon() expression. Wraps a stream to run continuously.

Parameters:
  • id – Daemon identifier.

  • runInterval – Run interval in milliseconds.

  • queueSize – Internal queue size.

solr.stream.update(*args, **kwargs)

Build an update() expression. Sends tuples to a collection as documents.

Parameters:
  • batchSize – Number of documents per batch.

solr.stream.commit(*args, **kwargs)

Build a commit() expression. Commits after an update stream.

Execution via Solr / AsyncSolr:

Solr.stream(expr, model=None)

Execute a streaming expression via the /stream handler (Solr 5.0+). Returns a synchronous iterator of result dicts. The final EOF marker tuple is automatically skipped.

Parameters:
  • expr – A StreamExpression or a raw expression string.

  • model – Optional Pydantic BaseModel subclass. When provided, each result dict is converted via model.model_validate().

Returns:

Iterator[dict[str, Any]] (or Iterator[Model]).

Raises:

SolrVersionError – If connected Solr is older than 5.0.

Example:

from solr.stream import search, rollup, sum

expr = (search('logs', q='*:*', fl='host,bytes', sort='host asc')
        | rollup(over='host', total=sum('bytes')))

for doc in conn.stream(expr):
    print(doc)
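Under the hood, the /stream handler returns a result-set whose final tuple is an EOF marker; the skipping described above amounts to the following (a sketch over the standard /stream JSON shape):

```python
import json

# Shape of a /stream response body (standard Solr format).
raw = '''{"result-set": {"docs": [
    {"host": "a", "total": 15},
    {"host": "b", "total": 7},
    {"EOF": true, "RESPONSE_TIME": 3}
]}}'''

docs = json.loads(raw)['result-set']['docs']
# The final marker tuple carries an "EOF" key and is not a real result.
results = [d for d in docs if 'EOF' not in d]
print(results)  # [{'host': 'a', 'total': 15}, {'host': 'b', 'total': 7}]
```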

async AsyncSolr.stream(expr, model=None)

Async version of Solr.stream(). Returns an async generator.

Parameters:
  • expr – A StreamExpression or a raw string.

  • model – Optional Pydantic model for automatic conversion.

Returns:

Async generator of result dicts (or model instances).

Example:

async for doc in await conn.stream(expr):
    print(doc)

PysolrCompat class

class solr.PysolrCompat(url, **kwargs)

A pysolr-compatible wrapper around Solr. Subclasses Solr so all native solrpy features remain available, while adding method aliases that match the pysolr library’s public API. This allows drop-in migration from pysolr with minimal code changes.

Constructor parameters are identical to Solr.

search(q, **kwargs)

Search for documents. Alias for Solr.select().

Parameters:
  • q – The query string.

  • kwargs – Additional Solr parameters (e.g. rows, fq).

Returns:

A Response instance.

add(docs, commit=True, **kwargs)

Add one or more documents. Unlike native Solr.add() (which takes a single dict), this method accepts either a list of dicts or a single dict, matching pysolr behavior.

Parameters:
  • docs – A list of document dicts, or a single document dict.

  • commit – Whether to auto-commit after adding. Defaults to True to match pysolr convention.
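The list-or-dict acceptance boils down to a small normalization step before delegating to the native bulk add; a sketch (illustrative only, the wrapper's internals may differ):

```python
def normalize_docs(docs):
    """Accept a single document dict or a list of dicts, pysolr-style,
    and always return a list suitable for a bulk add call."""
    if isinstance(docs, dict):
        return [docs]
    return list(docs)

print(normalize_docs({'id': '1'}))                 # [{'id': '1'}]
print(normalize_docs([{'id': '1'}, {'id': '2'}]))  # [{'id': '1'}, {'id': '2'}]
```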

delete(id=None, q=None, commit=True, **kwargs)

Delete documents by id and/or query. Supports both id= and q= keyword arguments in a single call, matching pysolr’s API.

Parameters:
  • id – Document id to delete.

  • q – Query string; all matching documents will be deleted.

  • commit – Whether to auto-commit after deleting. Defaults to True to match pysolr convention.

extract(file_obj, **kwargs)

Extract/index a rich document via Solr Cell. Creates an Extract companion internally and delegates.

Parameters:
  • file_obj – A file-like object containing the document bytes.

  • kwargs – Additional parameters forwarded to Extract.

Returns:

The extraction result.

Example:

from solr import PysolrCompat

conn = PysolrCompat('http://localhost:8983/solr/mycore')
results = conn.search('title:lucene', rows=10)
conn.add([{'id': '1', 'title': 'Hello'}])
conn.delete(id='1')
conn.delete(q='title:Hello')
conn.commit()

AsyncSolr class

class solr.AsyncSolr(url, timeout=None, http_user=None, http_pass=None, post_headers=None, max_retries=3, retry_delay=0.1, always_commit=False, response_format='json', auth_token=None, auth=None, debug=False)

Async Solr client built on httpx.AsyncClient. Provides the same API as Solr but with async/await methods.

Constructor parameters are the same as Solr (except persistent, ssl_key, and ssl_cert are not applicable).

Context manager usage:

AsyncSolr should be used as an async context manager to ensure the underlying HTTP client is properly closed:

from solr import AsyncSolr

async with AsyncSolr('http://localhost:8983/solr/mycore') as conn:
    response = await conn.select('*:*')
    for doc in response.results:
        print(doc['id'])

Attributes:

server_version

Tuple representing the detected Solr version, e.g. (9, 4, 1).

Search methods:

async select(q=None, **params)

Async search query. Same parameters as SearchHandler.

Parameters:
  • q – Query string.

  • params – Additional Solr parameters (underscores become dots).

  • model – Optional Pydantic model class for automatic conversion.

Returns:

A Response instance.

Update methods:

async add(doc, **kwargs)

Add a single document.

Parameters:
  • doc – Dictionary mapping field names to values.

  • commit – Force an immediate commit (bool).

  • timeout – Override the request timeout (float).

async add_many(docs, **kwargs)

Add multiple documents.

Parameters:
  • docs – Iterable of document dicts.

  • commit – Force an immediate commit (bool).

  • timeout – Override the request timeout (float).

Delete methods:

async delete(id=None, **kwargs)

Delete a document by id.

Parameters:
  • id – Unique identifier of the document to delete.

  • commit – Force an immediate commit (bool).

  • timeout – Override the request timeout (float).

async delete_query(q, **kwargs)

Delete documents by query.

Parameters:
  • q – Solr query string identifying documents to delete.

  • commit – Force an immediate commit (bool).

  • timeout – Override the request timeout (float).

Commit:

async commit(**kwargs)

Commit pending changes.

Parameters:

soft_commit – If True, perform a soft commit (bool).

Real-time Get (Solr 4.0+):

async get(id=None, ids=None, fields=None, model=None)

Retrieve documents from the transaction log.

Parameters:
  • id – Single document ID.

  • ids – List of document IDs.

  • fields – List of field names to return.

  • model – Optional Pydantic model class.

Returns:

A single document dict (or None if not found) when id is given; a list of document dicts when ids is given.

Streaming Expressions (Solr 5.0+):

async stream(expr, model=None)

Execute a streaming expression. Returns an async generator of result dicts (or model instances). Skips the final EOF marker.

Parameters:
  • expr – A StreamExpression or string.

  • model – Optional Pydantic model for automatic conversion.

Usage:

async for doc in await conn.stream(expr):
    print(doc)

Connection management:

async close()

Close the underlying httpx.AsyncClient.

ping()

Ping the Solr server (synchronous). Returns True if reachable.

Paginator

class solr.SolrPaginator(result, default_page_size=None)

Paginator for a Solr response object. Provides Django-like pagination without any Django dependency.

Parameters:
  • result – A Response instance from a query.

  • default_page_size – Override the page size. If not given, uses the rows parameter from the query, or the number of results returned.

count

Total number of matching documents.

num_pages

Total number of pages.

page_range

A range of valid page numbers.

page(page_num=1)

Return a SolrPage for the given page number.

Raises:
  • EmptyPage – If the requested page number is out of range.

  • PageNotAnInteger – If page_num is not an integer.

class solr.SolrPage

A single page of results.

object_list

List of documents on this page.

has_next()
has_previous()
has_other_pages()
next_page_number()
previous_page_number()
start_index()
end_index()
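The arithmetic behind these accessors follows Django's Paginator conventions; a self-contained sketch of the assumed page math (based on the Django-like API described above, not the class's exact code):

```python
import math

def num_pages(count, page_size):
    # Total pages; in Django's convention an empty result set still
    # exposes one (empty) page.
    return max(1, math.ceil(count / page_size))

def start_index(page_num, page_size):
    # 1-based index of the first document on the given page.
    return (page_num - 1) * page_size + 1

def end_index(page_num, page_size, count):
    # 1-based index of the last document on the given page, clamped
    # to the total count on the final page.
    return min(page_num * page_size, count)

print(num_pages(95, 10))      # 10
print(start_index(3, 10))     # 21
print(end_index(10, 10, 95))  # 95
```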

class solr.EmptyPage

Raised when the requested page is out of range. Subclass of ValueError.

class solr.PageNotAnInteger

Raised when the page number is not an integer. Subclass of TypeError.

Response parsing

solrpy provides two response parsers:

solr.core.parse_query_response(data, params, query)

Parse an XML response from Solr (wt=standard or wt=xml).

Parameters:
  • data – A file-like object containing the XML response.

  • params – Dictionary of query parameters used for the request.

  • query – The SearchHandler that issued the query (used for next_batch() / previous_batch()).

Returns:

A Response instance, or None if the response is empty.

This is the default parser used by SearchHandler.

solr.core.parse_json_response(data, params, query)

Parse a JSON response dict from Solr (wt=json).

Parameters:
  • data – A dictionary (already deserialized from JSON).

  • params – Dictionary of query parameters used for the request.

  • query – The SearchHandler that issued the query.

Returns:

A Response instance.

Handles all standard Solr response fields: responseHeader, response (docs, numFound, start, maxScore), and any additional top-level keys such as highlighting, facet_counts, stats, debug, etc. Extra keys are attached directly as Response attributes.

Example usage with a raw JSON query:

import json
import solr
from solr.core import parse_json_response

conn = solr.Solr('http://localhost:8983/solr/mycore')
raw = conn.select.raw(q='*:*', wt='json')
data = json.loads(raw)
response = parse_json_response(data, {'q': '*:*'}, conn.select)

Gzip compression

All requests include an Accept-Encoding: gzip header. When the Solr server returns a gzip-compressed response, it is transparently decompressed before parsing.

This reduces network transfer size, especially for large result sets. No configuration is needed; gzip support is always enabled.

solr.core.read_response(response)

Read an HTTP response body, decompressing gzip if the Content-Encoding header indicates compression.

Parameters:

response – An http.client.HTTPResponse object.

Returns:

Decoded string (UTF-8).
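The behavior can be sketched with the standard library's gzip module (an illustration of what read_response does, not its exact code):

```python
import gzip

def read_body(headers, body_bytes):
    """Decompress a response body when Content-Encoding says gzip,
    then decode as UTF-8."""
    if headers.get('Content-Encoding', '').lower() == 'gzip':
        body_bytes = gzip.decompress(body_bytes)
    return body_bytes.decode('utf-8')

# Round-trip check: compress a payload and read it back.
payload = '{"responseHeader": {"status": 0}}'
compressed = gzip.compress(payload.encode('utf-8'))
print(read_body({'Content-Encoding': 'gzip'}, compressed) == payload)  # True
print(read_body({}, payload.encode('utf-8')) == payload)               # True
```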