API Reference

Data representation

Solr documents are modeled as Python dictionaries with field names as keys and field values as values.

  • Multi-valued fields use list, tuple, or set as values.

  • datetime.datetime values are converted to UTC.

  • datetime.date values are converted to datetime.datetime at 00:00:00 UTC.

  • bool values are converted to 'true' or 'false'.

  • None values are omitted from the document sent to Solr.
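
As an illustration of these rules, here is a minimal sketch of the documented coercions (hypothetical helper name; the library's actual serializer may differ in detail):

```python
from datetime import date, datetime, timezone

def coerce_value(value):
    # Sketch of the documented coercions. Order matters: datetime is a
    # subclass of date, so it must be tested first.
    if isinstance(value, datetime):
        return value.astimezone(timezone.utc).isoformat()
    if isinstance(value, date):
        return datetime(value.year, value.month, value.day,
                        tzinfo=timezone.utc).isoformat()
    if isinstance(value, bool):
        return 'true' if value else 'false'
    return value

doc = {'id': '1', 'tags': ['a', 'b'], 'published': True, 'note': None}
# None-valued fields are dropped; top-level values are coerced (a full
# serializer would also coerce the elements of multi-valued fields).
prepared = {k: coerce_value(v) for k, v in doc.items() if v is not None}
# prepared == {'id': '1', 'tags': ['a', 'b'], 'published': 'true'}
```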

Exceptions

class solr.core.SolrVersionError(feature, required, actual)

Raised when a feature requires a higher Solr version than connected. Subclass of Exception.

feature

Name of the feature that was called (string).

required

Minimum version required as a tuple, e.g. (4, 0).

actual

Detected server version as a tuple, e.g. (3, 6, 2).

Solr class

class solr.Solr(url, persistent=True, timeout=None, ssl_key=None, ssl_cert=None, http_user=None, http_pass=None, post_headers=None, max_retries=3, retry_delay=0.1, always_commit=False, response_format='json', auth_token=None, auth=None, debug=False)

Connect to the Solr instance at url. If the Solr instance provides multiple cores, url should point to a specific core.

Constructor parameters:

Parameter

Description

url

URI pointing to the Solr instance (e.g. http://localhost:8983/solr/mycore). A UserWarning is issued if the path does not contain /solr.

persistent

Keep a persistent HTTP connection open. Defaults to True.

timeout

Timeout in seconds for server responses.

ssl_key

Path to PEM key file for SSL client authentication.

ssl_cert

Path to PEM certificate file for SSL client authentication.

http_user

Username for HTTP Basic authentication.

http_pass

Password for HTTP Basic authentication.

post_headers

Dictionary of additional headers to include in all requests.

max_retries

Maximum number of automatic retries on connection errors. Defaults to 3.

retry_delay

Base delay in seconds between retries. Uses exponential backoff: first retry waits retry_delay, second waits retry_delay * 2, etc. Defaults to 0.1. Each retry is logged at WARNING level.

always_commit

If True, all update methods (add, add_many, delete, etc.) will automatically commit changes. Individual calls can override this by passing commit=False. Defaults to False.

response_format

Response format for queries: 'json' (default) or 'xml'. When 'json', queries use wt=json and the JSON parser. Use 'xml' for legacy compatibility with older code.

auth_token

Bearer token string. Sends Authorization: Bearer <token> header. Takes priority over http_user/http_pass.

auth

A callable returning a dict[str, str] of headers. Called on every request, enabling dynamic token refresh (e.g., OAuth2). Takes priority over auth_token and http_user/http_pass.

debug

If True, log all requests and responses.

Attributes:

server_version

Tuple representing the detected Solr version, e.g. (9, 4, 1). Automatically populated during initialization.

always_commit

Boolean indicating whether update methods auto-commit by default.

select

A SearchHandler instance bound to the /select endpoint.

Health check:

ping()

Ping the Solr server to check if it is reachable.

Returns True if the server responds to /admin/ping, False otherwise. Tries both the core path and its parent path.

Works on all Solr versions (1.2+).

Example:

conn = solr.Solr('http://localhost:8983/solr/mycore')
if conn.ping():
    print('Solr is up')

Search methods:

The select attribute is the primary search interface. See SearchHandler for details:

response = conn.select('title:lucene')

Update methods:

add(doc, **kwargs)

Add a single document (a dict) to the index. Supports the commit-control arguments.

add_many(docs, **kwargs)

Add multiple documents in one request. Supports the commit-control arguments.

Atomic update methods (Solr 4.0+):

atomic_update(doc, commit=False)

Partial update of a single document. Field values can be plain values or dicts with a modifier key: set, add, remove, removeregex (Solr 5.0+), inc. Use {'set': None} to remove a field.

Example:

conn.atomic_update({
    'id': 'doc1',
    'title': {'set': 'New Title'},
    'count': {'inc': 1},
    'old_field': {'set': None},  # remove field
}, commit=True)

atomic_update_many(docs, commit=False)

Partial update of multiple documents. Same modifier syntax as atomic_update.

Real-time Get (Solr 4.0+):

get(id=None, ids=None, fields=None)

Retrieve documents directly from the transaction log without waiting for a commit. Returns a dict for single id (or None if not found), or a list for ids.

Parameters:
  • id – Single document ID.

  • ids – List of document IDs.

  • fields – Optional list of fields to return.
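
Usage sketch (assuming conn is a connected Solr instance, as in the earlier examples):

```python
# Single ID: returns a dict, or None if the document does not exist.
doc = conn.get(id='doc1', fields=['id', 'title'])

# Multiple IDs: returns a list of dicts.
docs = conn.get(ids=['doc1', 'doc2'])
```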

Cursor pagination (Solr 4.7+):

iter_cursor(q, sort, rows=100, **params)

Generator that yields Response objects for each batch of cursor-paginated results. Stops when all results are consumed.

Parameters:
  • q – Query string.

  • sort – Sort clause (must include uniqueKey field).

  • rows – Batch size per request.

Raises:

ValueError – If sort is not provided.
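
For example (assuming conn as above), iterating over all documents in sorted batches:

```python
total = 0
for resp in conn.iter_cursor('*:*', sort='id asc', rows=500):
    for doc in resp.results:
        total += 1
```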

MoreLikeThis (Solr 4.0+):

Create a MoreLikeThis instance:

from solr import MoreLikeThis

mlt = MoreLikeThis(conn)
response = mlt('interesting text', fl='title,body')

class solr.MoreLikeThis(conn)

Find similar documents using Solr’s /mlt handler.

Parameters:

conn – A Solr instance.

__call__(q=None, **params)

Query the MLT handler. Same parameters as SearchHandler.

raw(**params)

Issue a raw MLT query.

Delete methods:

delete(**kwargs)

Delete documents by ID or other criteria. Supports the commit-control arguments.

delete_many(ids, **kwargs)

Delete multiple documents by ID.

delete_query(query, **kwargs)

Delete all documents matching a query.

Commit and optimize:

commit(**kwargs)

Commit pending changes to the index. Accepts wait_flush and wait_searcher.

optimize(**kwargs)

Optimize the index (implies a commit).

Connection management:

close()

Close the underlying HTTP connection.

Commit-control arguments

Several update methods support optional keyword arguments to control commits. These arguments are always optional; when always_commit is False (the default), no commit is performed unless explicitly requested.

Argument

Description

commit

If True, commit changes before returning. When always_commit is True on the connection, this defaults to True but can be overridden with commit=False.

optimize

If True, optimize the index before returning (implies commit=True).

wait_flush

Block until the commit is flushed to disk. Defaults to True.

wait_searcher

Block until searcher objects are warmed. Defaults to True.

If wait_flush or wait_searcher is specified without commit or optimize, a TypeError is raised.

Methods that support commit-control arguments: add, add_many, delete, delete_many, delete_query.

All update methods and SearchHandler calls also accept a timeout keyword argument to override the connection-level timeout for that individual request.
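
The precedence between the connection-level always_commit flag and a per-call commit argument can be sketched as (illustrative only, not library code):

```python
def effective_commit(always_commit, commit=None):
    # An explicit per-call commit wins; otherwise fall back to the
    # connection-level always_commit default.
    return always_commit if commit is None else commit

effective_commit(True)                # True: connection default applies
effective_commit(True, commit=False)  # False: per-call override wins
```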

SearchHandler class

class solr.SearchHandler(connection, path='/select', arg_separator='_')

Provides access to a named Solr request handler. The select attribute on Solr instances is a SearchHandler bound to /select.

Create handlers for custom endpoints:

import solr

conn = solr.Solr('http://localhost:8983/solr/mycore')
my_handler = solr.SearchHandler(conn, '/my_handler')
response = my_handler('some query')

SearchHandler.__call__(q=None, fields=None, highlight=None, score=True, sort=None, sort_order='asc', **params)

Execute a search query against Solr.

Parameters:
  • q – Query string.

  • fields – Fields to return. String or iterable. Defaults to '*'.

  • highlight – False (default), True, or a list of field names. When enabled, the response gets a highlighting attribute — a dict keyed by document ID, where each value is a dict of field names to lists of highlighted snippets (e.g. {'doc1': {'title': ['<em>Lucene</em> in Action']}}). Customize with hl_simple_pre, hl_simple_post, hl_fragsize, hl_snippets, etc. via **params.

  • score – Include score in results. Defaults to True.

  • sort – Fields to sort by. String or iterable.

  • sort_order – Default sort direction ('asc' or 'desc').

  • json_facet – JSON Facet API dict (Solr 5.0+). Serialized to json.facet query parameter automatically.

  • params – Additional Solr parameters (use underscores for dots).

  • timeout – Per-request timeout in seconds (overrides connection-level timeout).

Returns:

A Response instance.

Raises:

ValueError – If highlight=True but no fields are specified, or if sort_order is invalid.
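
A highlighting query might look like this (assuming conn and a title field; hl_simple_pre and hl_simple_post are standard Solr highlighting parameters):

```python
resp = conn.select('title:lucene', fields='id,title', highlight=['title'],
                   hl_simple_pre='<b>', hl_simple_post='</b>')
for doc in resp:
    snippets = resp.highlighting.get(doc['id'], {}).get('title', [])
    print(doc['id'], ' ... '.join(snippets))
```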

SearchHandler.raw(**params)

Issue a raw query. No processing is performed on parameters or responses. Returns the raw response text.

Response class

class solr.Response

Container for query results.

Attributes:

header

Dictionary containing response header values (status, QTime, params).

results

A Results list of matching documents. Each document is a dictionary of field names to values.

numFound

Total number of matching documents.

start

Starting offset of the current result set.

maxScore

Maximum relevance score across all matches.

facet_counts

Dictionary containing traditional facet results when facet=true is used. Contains keys like facet_fields, facet_queries, facet_ranges, facet_pivots, etc.

facets

Dictionary containing JSON Facet API results when json.facet is used (Solr 5.0+). Contains the structured facet buckets returned by Solr’s JSON faceting.

stats

Dictionary containing field statistics when stats=true is used. Contains per-field stats such as min, max, count, mean, etc.

debug

Dictionary containing debug information when debug=true (or debugQuery=true) is used. Contains parsed query, explain info, timing data, etc.

Note

Any top-level key in the Solr JSON response that is not responseHeader or response is automatically set as an attribute on the Response object. This includes highlighting, facet_counts, facets, stats, debug, nextCursorMark, grouped, and any other component output. You can access them as response.key_name.
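
The attribute mapping described in the note can be sketched as (an illustration of the documented behavior, not the real Response class):

```python
class ResponseSketch:
    def __init__(self, raw):
        self.header = raw.get('responseHeader', {})
        body = raw.get('response', {})
        self.results = body.get('docs', [])
        self.numFound = body.get('numFound', 0)
        # Any other top-level key becomes an attribute.
        for key, value in raw.items():
            if key not in ('responseHeader', 'response'):
                setattr(self, key, value)

raw = {'responseHeader': {'status': 0},
       'response': {'numFound': 1, 'start': 0, 'docs': [{'id': '1'}]},
       'nextCursorMark': 'AoEjMQ=='}
resp = ResponseSketch(raw)
resp.nextCursorMark  # 'AoEjMQ=='
```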

Pydantic models (opt-in):

as_models(model)

Convert result documents to Pydantic BaseModel instances. Requires pydantic (pip install solrpy[pydantic]).

Parameters:

model – A Pydantic BaseModel subclass.

Returns:

List of model instances.

The model= parameter on select() and get() does this automatically:

resp = conn.select('*:*', model=MyDoc)  # results are list[MyDoc]
doc = conn.get(id='1', model=MyDoc)      # MyDoc | None

Cursor pagination (Solr 4.7+):

cursor_next()

Follow cursor-based pagination. Returns the next page of results, or None if no more results (nextCursorMark == cursorMark) or if the query did not use cursorMark.

Example:

resp = conn.select('*:*', sort='id asc', cursorMark='*', rows=100)
while resp:
    process(resp.results)
    resp = resp.cursor_next()

Offset pagination methods:

next_batch()

Fetch the next batch of results. Returns a new Response, or None if there are no more results.

previous_batch()

Fetch the previous batch of results. Returns a new Response, or None if this is the first batch.
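
For example (assuming conn), walking the full result set in fixed-size windows:

```python
resp = conn.select('*:*', rows=100)
while resp is not None:
    for doc in resp.results:
        print(doc['id'])
    resp = resp.next_batch()
```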

Grouping (Solr 3.3+):

grouped

A GroupedResult object when the response contains grouped results, otherwise not present. Enable grouping with group='true' and group_field='field'.

Example:

resp = conn.select('*:*', group='true', group_field='category',
                   group_limit=5, group_ngroups='true')
for group in resp.grouped['category'].groups:
    print(group.groupValue, len(group.doclist))

Spellcheck (Solr 1.4+):

spellcheck

A SpellcheckResult object if the response contains spellcheck data, otherwise None. Spellcheck data is returned when you include spellcheck='true' in the query parameters.

Example:

resp = conn.select('misspeled query', spellcheck='true',
                   spellcheck_collate='true')
if resp.spellcheck and not resp.spellcheck.correctly_spelled:
    print('Did you mean:', resp.spellcheck.collation)

Iteration:

Response objects support len() and iteration:

response = conn.select('*:*')
print(len(response))
for doc in response:
    print(doc['id'])

SpellcheckResult class (Solr 1.4+)

class solr.SpellcheckResult(raw)

Wrapper around the raw spellcheck response dict, exposed as Response.spellcheck when the query includes spellcheck=true.

Parameters:

raw – The raw spellcheck dict from the Solr response.

correctly_spelled

True if all query terms were spelled correctly.

collation

The corrected full query string suggested by Solr (collation), or None if not present. Requires spellcheck.collate=true on the request.

suggestions

List of per-word suggestion entries. Each entry is a dict that includes an 'original' key (the misspelled word) merged with the Solr info dict ('numFound', 'startOffset', 'endOffset', 'suggestion' list, etc.).

Example:

for entry in resp.spellcheck.suggestions:
    print(entry['original'], '->', entry.get('suggestion', []))

Extract class (Solr 1.4+)

class solr.Extract(conn)

Index or extract rich documents via Solr Cell (Apache Tika) using the /update/extract handler. The handler must be configured in solrconfig.xml.

Parameters:

conn – A Solr or AsyncSolr instance. With AsyncSolr, methods return coroutines.

__call__(file_obj, content_type='application/octet-stream', commit=False, **params)

Index a rich document.

Parameters:
  • file_obj – Binary file-like object (opened in 'rb' mode).

  • content_type – MIME type of the document. Defaults to 'application/octet-stream'.

  • commit – Commit to the index immediately. Defaults to False.

  • params – Additional Solr parameters. The first underscore in each key is replaced with a dot: literal_id='x' → literal.id=x. Field names with underscores are preserved: literal_my_field='v' → literal.my_field=v.

Returns:

Parsed JSON response dict (contains responseHeader).

Raises:

SolrVersionError – If the server is older than Solr 1.4.

Example:

from solr import Solr, Extract

conn = Solr('http://localhost:8983/solr/mycore')
ext = Extract(conn)

with open('report.pdf', 'rb') as f:
    ext(f, content_type='application/pdf',
        literal_id='report1', literal_title='Annual Report',
        commit=True)

extract_only(file_obj, content_type='application/octet-stream', **params)

Extract text and metadata without indexing.

Calls /update/extract with extractOnly=true.

Returns:

(text, metadata) tuple. text is the extracted plain text; metadata is a dict of Tika metadata (e.g. 'Content-Type', 'Author', 'title').

from_path(file_path, **params)

Index a document from a filesystem path. MIME type is guessed from the file extension via mimetypes; falls back to 'application/octet-stream'.

Parameters:
  • file_path – Path to the file.

  • params – Forwarded to __call__() (commit, literal_*, etc.).

extract_from_path(file_path, **params)

Extract text and metadata from a file path without indexing.

Returns:

(text, metadata) tuple (same as extract_only()).
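
Usage sketch (assuming ext = Extract(conn) from the example above; the path is hypothetical):

```python
# Index straight from a path; MIME type is guessed from the extension.
ext.from_path('report.pdf', literal_id='report1', commit=True)

# Extract text and Tika metadata without touching the index.
text, meta = ext.extract_from_path('report.pdf')
print(meta.get('Content-Type'))
```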

Suggest class (Solr 4.7+)

class solr.Suggest(conn)

Query Solr’s SuggestComponent via the /suggest handler.

The /suggest handler and at least one SuggestComponent must be configured in solrconfig.xml.

Parameters:

conn – A Solr or AsyncSolr instance. With AsyncSolr, methods return coroutines.

__call__(q, dictionary=None, count=10, **params)

Return a flat list of suggestion dicts for the query term.

Parameters:
  • q – Partial query string to suggest for.

  • dictionary – Name of the suggester dictionary to use. If None, Solr uses the default suggester.

  • count – Maximum number of suggestions to return. Defaults to 10.

  • params – Extra parameters forwarded verbatim to /suggest.

Returns:

List of suggestion dicts. Each dict typically has 'term', 'weight', and 'payload' keys.

Raises:

SolrVersionError – If the server is older than Solr 4.7.

Example:

from solr import Solr, Suggest

conn = Solr('http://localhost:8983/solr/mycore')
suggest = Suggest(conn)
results = suggest('que', dictionary='mySuggester', count=5)
for s in results:
    print(s['term'], s['weight'])

Grouping classes (Solr 3.3+)

class solr.GroupedResult(raw)

Wrapper around a Solr grouped response. Supports subscript access by field name, iteration, in, and len().

__getitem__(field)

Return a GroupField for the given field name.

class solr.GroupField(raw)

Grouped results for a single field.

matches

Total number of documents matching the query across all groups.

ngroups

Number of distinct groups, or None if group.ngroups was not requested.

groups

List of Group objects.

class solr.Group(raw)

A single group.

groupValue

The field value that defines this group, or None.

doclist

A Results list of matching documents, with numFound and start attributes.

KNN / Dense Vector Search (Solr 9.0+)

class solr.KNN(conn)

Dense Vector / KNN Search using Solr’s {!knn} and {!vectorSimilarity} query parsers. Supports top-K search, similarity threshold search, hybrid (lexical + vector) search, and re-ranking. Created explicitly by the user.

Parameters:

conn – A Solr or AsyncSolr instance. With AsyncSolr, execution methods return coroutines.

Execution methods:

search(vector, field, top_k=10, filters=None, early_termination=False, saturation_threshold=None, patience=None, ef_search_scale_factor=None, seed_query=None, pre_filter=None, **params)

Execute a {!knn} search query (top-K nearest neighbors).

Parameters:
  • vector – Dense vector as a sequence of floats.

  • field – Name of the DenseVectorField to search.

  • top_k – Number of nearest neighbors to retrieve. Defaults to 10.

  • filters – Filter query (fq parameter).

  • early_termination – Enable HNSW early termination optimization.

  • saturation_threshold – Queue saturation cutoff for early termination (float between 0 and 1).

  • patience – Iteration limit for early termination (integer).

  • ef_search_scale_factor – Candidate examination multiplier (Solr 10.0+). Raises SolrVersionError if Solr < 10.0.

  • seed_query – Lexical query string to guide the vector search entry point.

  • pre_filter – Explicit pre-filter query string(s). A single string or a list of strings.

  • params – Additional Solr parameters.

Returns:

A Response instance (sync) or coroutine (async).

Raises:

SolrVersionError – If connected Solr is older than 9.0.

similarity(vector, field, min_return, min_traverse=None, pre_filter=None, filters=None, **params)

Execute a {!vectorSimilarity} search. Returns all documents whose similarity to the vector exceeds min_return.

Parameters:
  • vector – Dense vector as a sequence of floats.

  • field – Name of the DenseVectorField.

  • min_return – Minimum similarity threshold for results (float).

  • min_traverse – Minimum similarity to continue graph traversal (float). Can improve performance by pruning low-similarity branches.

  • pre_filter – Explicit pre-filter query string(s).

  • filters – Filter query (fq parameter).

  • params – Additional Solr parameters.

Returns:

A Response instance.

Raises:

SolrVersionError – If connected Solr is older than 9.0.

hybrid(text_query, vector, field, min_return=0.5, **params)

Execute a hybrid (lexical OR vector) search. Combines a standard text query with a {!vectorSimilarity} query using an OR clause.

Parameters:
  • text_query – The lexical search query string.

  • vector – Dense vector for similarity matching.

  • field – Name of the DenseVectorField.

  • min_return – Minimum similarity threshold for the vector part. Defaults to 0.5.

  • params – Additional Solr parameters.

Returns:

A Response instance.

Raises:

SolrVersionError – If connected Solr is older than 9.0.

rerank(query, vector, field, top_k=10, rerank_docs=100, rerank_weight=1.0, **params)

Execute a lexical query re-ranked by vector similarity. Uses Solr’s {!rerank} query parser to re-score the top lexical results with a {!knn} query.

Parameters:
  • query – The base lexical query string.

  • vector – Dense vector for re-ranking.

  • field – Name of the DenseVectorField.

  • top_k – topK for the KNN re-rank query. Defaults to 10.

  • rerank_docs – Number of top lexical docs to re-rank. Defaults to 100.

  • rerank_weight – Weight of the vector score in the final ranking. Defaults to 1.0.

  • params – Additional Solr parameters.

Returns:

A Response instance.

Raises:

SolrVersionError – If connected Solr is older than 9.0.

__call__(vector, field, top_k=10, **params)

Shortcut for search().

Query builder methods:

build_knn_query(vector, field, top_k=10, early_termination=False, saturation_threshold=None, patience=None, ef_search_scale_factor=None, seed_query=None, pre_filter=None, include_tags=None, exclude_tags=None)

Build a {!knn} query string without executing it.

Parameters:
  • vector – Dense vector as a sequence of floats.

  • field – Name of the DenseVectorField to search.

  • top_k – Number of nearest neighbors to retrieve.

  • early_termination – Enable HNSW early termination.

  • saturation_threshold – Queue saturation cutoff (float).

  • patience – Iteration limit for early termination (int).

  • ef_search_scale_factor – Candidate examination multiplier (Solr 10.0+).

  • seed_query – Lexical query to guide vector search entry point.

  • pre_filter – Pre-filter query string(s) (string or list of strings).

  • include_tags – Only use fq filters with these tags.

  • exclude_tags – Exclude fq filters with these tags.

Returns:

The KNN query string, e.g. {!knn f=embedding topK=10}[0.1,0.2,...].

build_similarity_query(vector, field, min_return, min_traverse=None, pre_filter=None)

Build a {!vectorSimilarity} query string without executing it.

Parameters:
  • vector – Dense vector as a sequence of floats.

  • field – Name of the DenseVectorField.

  • min_return – Minimum similarity threshold for results.

  • min_traverse – Minimum similarity to continue traversal.

  • pre_filter – Pre-filter query string(s).

Returns:

The vectorSimilarity query string.

build_hybrid_query(text_query, vector, field, min_return=0.5)

Build a hybrid (lexical OR vector) query string without executing it.

Parameters:
  • text_query – The lexical search query.

  • vector – Dense vector for similarity matching.

  • field – Name of the DenseVectorField.

  • min_return – Minimum similarity threshold for the vector part.

Returns:

A combined OR query string.

build_rerank_params(vector, field, top_k=10, rerank_docs=100, rerank_weight=1.0)

Build re-ranking parameters for use with a lexical base query.

Parameters:
  • vector – Dense vector for re-ranking.

  • field – Name of the DenseVectorField.

  • top_k – topK for the KNN re-rank query.

  • rerank_docs – Number of top lexical docs to re-rank.

  • rerank_weight – Weight of vector score in final ranking.

Returns:

A dict with rq and rqq keys ready to pass as query parameters.

build_query(vector, field, top_k=10, ef_search_scale_factor=None)

Alias for build_knn_query() (backward compatibility).

Example:

from solr import Solr, KNN

conn = Solr('http://localhost:8983/solr/mycore')
knn = KNN(conn)

# Top-K nearest neighbors
response = knn.search([0.1, 0.2, 0.3], field='embedding', top_k=10)

# Similarity threshold
response = knn.similarity([0.1, 0.2, 0.3], field='embedding',
                          min_return=0.7)

# Hybrid (lexical + vector)
response = knn.hybrid('machine learning', [0.1, 0.2, 0.3],
                      field='embedding')

# Re-rank lexical results by vector similarity
response = knn.rerank('machine learning', [0.1, 0.2, 0.3],
                      field='embedding', rerank_docs=100)
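
The builder methods produce strings in the formats documented above; a minimal sketch of the {!knn} format (illustrative, not the library implementation):

```python
def knn_query(vector, field, top_k=10):
    # Sketch of the documented output format:
    # {!knn f=<field> topK=<k>}[v1,v2,...]
    vec = ','.join(str(v) for v in vector)
    return '{!knn f=%s topK=%d}[%s]' % (field, top_k, vec)

knn_query([0.1, 0.2, 0.3], 'embedding')
# '{!knn f=embedding topK=10}[0.1,0.2,0.3]'
```

A string built this way (or by build_knn_query()) can then be passed as the q of an ordinary select() call.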

SolrCloud (Solr 4.0+)

class solr.SolrCloud(zk, collection, retry_count=3, retry_delay=0.5, **solr_kwargs)

SolrCloud client with leader-aware routing and automatic failover. Two modes of operation: ZooKeeper mode (real-time node discovery) and HTTP mode (CLUSTERSTATUS polling, no ZooKeeper needed).

Parameters:
  • zk – A SolrZooKeeper instance.

  • collection – Solr collection name.

  • retry_count – Number of failover retries (default 3). On each failure, the client reconnects to a different node and retries.

  • retry_delay – Base delay in seconds between retries (default 0.5). Uses exponential backoff: retry_delay * 2^attempt.

  • solr_kwargs – Extra keyword arguments forwarded to the underlying Solr connection (e.g. timeout, http_user, http_pass, auth_token, auth, response_format).

classmethod from_urls(urls, collection, retry_count=3, retry_delay=0.5, **solr_kwargs)

Create without ZooKeeper, using HTTP-only CLUSTERSTATUS discovery. The client probes provided URLs to find active nodes and discovers shard leaders via the CLUSTERSTATUS admin API.

Parameters:
  • urls – List of Solr base URLs, e.g. ['http://solr1:8983/solr', 'http://solr2:8983/solr'].

  • collection – Solr collection name.

  • retry_count – Number of failover retries.

  • retry_delay – Base delay between retries.

  • solr_kwargs – Forwarded to Solr.

Returns:

A SolrCloud instance in HTTP mode.

Properties:

server_version

Detected Solr server version tuple, e.g. (9, 4, 1).

Read operations (routed to any active replica):

select(*args, **kwargs)

Execute a search query with automatic failover. Same parameters as Solr.select().

ping()

Ping the current Solr node.

Write operations (routed to a shard leader):

add(doc, **kwargs)

Add a document, routed to a shard leader.

add_many(docs, **kwargs)

Add multiple documents, routed to a shard leader.

delete(**kwargs)

Delete documents, routed to a shard leader.

delete_query(query, **kwargs)

Delete by query, routed to a shard leader.

delete_many(ids, **kwargs)

Delete multiple documents by ID, routed to a shard leader.

commit(**kwargs)

Commit changes, routed to a shard leader.

optimize(**kwargs)

Optimize the index, routed to a shard leader.

close()

Close the underlying Solr connection.

Failover behavior:

When any operation fails, the client:

  1. Logs a WARNING with the attempt number and error.

  2. Waits retry_delay * 2^attempt seconds (exponential backoff).

  3. Reconnects to a different node (leader for writes, any replica for reads).

  4. Retries the operation.

  5. After retry_count retries, raises the last exception.

Example (ZooKeeper mode):

from solr import SolrZooKeeper, SolrCloud

zk = SolrZooKeeper('zk1:2181,zk2:2181,zk3:2181')
cloud = SolrCloud(zk, collection='products',
                  timeout=10, auth_token='my-jwt-token')

# reads go to any active replica
response = cloud.select('category:books', rows=20)

# writes are routed to shard leaders
cloud.add({'id': '1', 'title': 'Solr in Action'}, commit=True)
cloud.delete(id='1', commit=True)

cloud.close()
zk.close()

Example (HTTP-only mode):

from solr import SolrCloud

cloud = SolrCloud.from_urls(
    ['http://solr1:8983/solr', 'http://solr2:8983/solr'],
    collection='products')

response = cloud.select('*:*')
cloud.close()

SolrZooKeeper

class solr.SolrZooKeeper(hosts, timeout=10.0)

ZooKeeper client for SolrCloud node discovery. Reads ZooKeeper state (/live_nodes, /collections/{name}/state.json, /aliases.json) to discover active Solr nodes, shard leaders, and collection aliases.

Requires the kazoo library:

pip install solrpy[cloud]

Parameters:
  • hosts – ZooKeeper connection string, e.g. 'zk1:2181,zk2:2181,zk3:2181'. Supports chroot paths: 'zk1:2181,zk2:2181/solr'.

  • timeout – Connection timeout in seconds (default 10.0).

Raises:

ImportError – If kazoo is not installed.

live_nodes()

Return a list of currently active Solr node identifiers as reported by ZooKeeper’s /live_nodes znode.

Returns:

list[str] — Node identifiers, e.g. ['solr1:8983_solr', 'solr2:8983_solr'].

collection_state(collection)

Return the full state dict for a collection. Contains shard info, replica info, router config, and more.

Tries per-collection state.json first (Solr 5+), then falls back to the legacy /clusterstate.json (Solr 4.x).

Parameters:

collection – Collection name (not an alias).

Returns:

dict — State dict with keys like 'shards', 'router', 'maxShardsPerNode', etc.

Raises:

RuntimeError – If the collection is not found in ZooKeeper.

aliases()

Return collection aliases as a dict.

Returns:

dict[str, str] — {alias_name: real_collection_name}. Empty dict if no aliases are defined.

replica_urls(collection)

Return base URLs of all active replicas for a collection. Aliases are resolved automatically.

Parameters:

collection – Collection name or alias.

Returns:

list[str] — Solr base URLs, e.g. ['http://solr1:8983/solr', 'http://solr2:8983/solr'].

leader_urls(collection)

Return base URLs of shard leaders for a collection (one per shard). Aliases are resolved automatically.

Parameters:

collection – Collection name or alias.

Returns:

list[str] — Leader Solr base URLs.

random_url(collection)

Return a random active replica URL for load balancing.

Parameters:

collection – Collection name or alias.

Returns:

str — A Solr base URL.

Raises:

RuntimeError – If no active replicas are found.

random_leader_url(collection)

Return a random shard leader URL for write operations.

Parameters:

collection – Collection name or alias.

Returns:

str — A leader Solr base URL.

Raises:

RuntimeError – If no leaders are found.

close()

Close the ZooKeeper connection. Always call this when done.

Example:

from solr import SolrZooKeeper

zk = SolrZooKeeper('zk1:2181,zk2:2181,zk3:2181')

# Discover nodes
nodes = zk.live_nodes()
print('Active nodes:', nodes)

# Get all replica URLs
replicas = zk.replica_urls('mycore')
print('Replicas:', replicas)

# Get shard leaders
leaders = zk.leader_urls('mycore')
print('Leaders:', leaders)

# Check aliases
aliases = zk.aliases()
print('Aliases:', aliases)  # e.g. {'prod': 'mycore_v2'}

# Get collection state
state = zk.collection_state('mycore')
for shard, data in state['shards'].items():
    print(shard, ':', len(data['replicas']), 'replicas')

zk.close()

Query builders

class solr.Field(name, alias=None)

Structured field expression for the fl parameter.

classmethod func(name, *args)

Function field, e.g. Field.func('sum', 'price', 'tax') → sum(price,tax).

classmethod transformer(name, **params)

Document transformer, e.g. Field.transformer('explain') → [explain].

classmethod score()

Score pseudo-field → score.

class solr.Sort(field, direction='asc')

Structured sort clause.

classmethod func(expr, direction='asc')

Function sort, e.g. Sort.func('geodist()', 'asc').

class solr.Facet

Builder for traditional Solr facet parameters.

classmethod field(name, **opts)

Field facet with per-field options (mincount, limit, sort, etc.).

classmethod range(name, start, end, gap, **opts)

Range facet.

classmethod query(name, q)

Query facet.

classmethod pivot(*fields, mincount=None)

Pivot facet.

to_params()

Convert to Solr query parameter dict.

All builders coexist with raw string parameters:

# Raw strings (always works)
conn.select('*:*', fl='id,title', sort='price desc',
            facet='true', facet_field='category')

# Builder objects (optional)
conn.select('*:*',
    fields=[Field('id'), Field('title')],
    sort=[Sort('price', 'desc')],
    facets=[Facet.field('category')],
)

Schema API (Solr 4.2+)

class solr.SchemaAPI(conn)

Created explicitly by the user. All methods require Solr 4.2+.

Parameters:

conn – A Solr or AsyncSolr instance. With AsyncSolr, all methods return coroutines.

Example:

from solr import Solr, AsyncSolr, SchemaAPI

# Sync
conn = Solr('http://localhost:8983/solr/mycore')
schema = SchemaAPI(conn)
fields = schema.fields()

# Async
async with AsyncSolr('http://localhost:8983/solr/mycore') as conn:
    schema = SchemaAPI(conn)
    fields = await schema.fields()

Full schema:

get_schema()

Return the full schema definition as a dict.

Field operations:

fields()

List all fields. Returns a list of field definition dicts.

add_field(name, field_type, **opts)

Add a new field. Example:

schema.add_field('title', 'text_general', stored=True, indexed=True)

replace_field(name, field_type, **opts)

Replace an existing field definition.

delete_field(name)

Delete a field by name.

Dynamic field operations:

dynamic_fields()
add_dynamic_field(name, field_type, **opts)
delete_dynamic_field(name)

Field type operations:

field_types()
add_field_type(**definition)
replace_field_type(**definition)
delete_field_type(name)

Copy field operations:

copy_fields()
add_copy_field(source, dest, max_chars=None)
delete_copy_field(source, dest)

Streaming Expressions (Solr 5.0+)

Build and execute Solr Streaming Expressions using Python builder functions. Each function returns a StreamExpression node. Nodes can be chained with the | (pipe) operator.

Core classes:

class solr.stream.StreamExpression(func_name, *args, **kwargs)

A Solr streaming expression node. Renders to a Solr expression string via str(). Supports the | (pipe) operator for chaining: the left-hand expression becomes the first positional argument of the right-hand expression.

Parameters:
  • func_name – The Solr streaming function name (e.g. "search").

  • args – Positional arguments (collection names, sub-expressions).

  • kwargs – Named parameters rendered as key=value pairs. String values containing spaces are auto-quoted.

__or__(other)

Pipe operator. Inserts self as the first argument of other and returns other.

__str__()

Render as a Solr streaming expression string, e.g. search(mycore,q="*:*",fl="id,title",sort="id asc").
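The pipe and rendering semantics can be illustrated with a minimal stand-in class (a sketch only, not the library's actual implementation; the real class only auto-quotes string values containing spaces, while this sketch quotes every keyword value for brevity):

```python
class Expr:
    """Minimal stand-in for solr.stream.StreamExpression (illustration only)."""

    def __init__(self, func_name, *args, **kwargs):
        self.func_name = func_name
        self.args = list(args)
        self.kwargs = kwargs

    def __or__(self, other):
        # Pipe: the left-hand expression becomes the first positional
        # argument of the right-hand expression.
        other.args.insert(0, self)
        return other

    def __str__(self):
        parts = [str(a) for a in self.args]
        parts += ['%s="%s"' % (k, v) for k, v in self.kwargs.items()]
        return '%s(%s)' % (self.func_name, ','.join(parts))


expr = Expr('search', 'mycore', q='*:*', sort='id asc') | Expr('unique', over='id')
print(expr)  # unique(search(mycore,q="*:*",sort="id asc"),over="id")
```

Note how the pipe nests the left-hand expression as the first argument rather than concatenating strings, which is what lets arbitrarily deep pipelines render correctly.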

class solr.stream.AggregateExpression(func_name, field)

An aggregate function for use inside rollup(), stats(), etc. Renders as func(field).

Parameters:
  • func_name – The aggregate function name (e.g. "sum").

  • field – The Solr field to aggregate over.

Source expressions:

solr.stream.search(collection, **kwargs)

Build a search() streaming expression.

Parameters:
  • collection – Solr collection name.

  • kwargs – Query parameters – q, fl, sort, rows, fq, etc.

solr.stream.facet(collection, **kwargs)

Build a facet() streaming expression.

Parameters:
  • collection – Solr collection name.

  • kwargs – Facet parameters – q, buckets, bucketSorts, bucketSizeLimit, etc.

solr.stream.topic(collection, **kwargs)

Build a topic() streaming expression.

Parameters:
  • collection – Solr collection name.

  • kwargs – Topic parameters – q, fl, id, checkpointEvery, etc.

Transform expressions:

solr.stream.unique(*args, **kwargs)

Build a unique() expression. De-duplicates a sorted stream by field.

Parameters:
  • over – The field to de-duplicate on.

solr.stream.top(*args, **kwargs)

Build a top() expression. Returns the top n tuples by sort order.

Parameters:
  • n – Number of tuples to return.

  • sort – Sort clause.

solr.stream.sort(*args, **kwargs)

Build a sort() expression. Re-sorts a stream.

Parameters:
  • by – Sort clause.

solr.stream.select(*args, **kwargs)

Build a select() expression. Projects / renames fields.

solr.stream.rollup(*args, **kwargs)

Build a rollup() expression. Groups a sorted stream and applies aggregate functions.

Parameters:
  • over – Field to group by. Additional keyword arguments are aggregate expressions (e.g. total=sum('bytes')).
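Conceptually, rollup() performs a streaming group-by over an already-sorted stream; its effect on the tuples can be sketched in plain Python (an illustration of the server-side behavior, not library code):

```python
from itertools import groupby

# Tuples as search('logs', fl='host,bytes', sort='host asc') would emit them.
stream = [
    {'host': 'a', 'bytes': 10},
    {'host': 'a', 'bytes': 5},
    {'host': 'b', 'bytes': 7},
]

# rollup(over='host', total=sum('bytes')) groups adjacent tuples on 'host'
# and sums 'bytes' within each group.
rolled = [
    {'host': host, 'total': sum(t['bytes'] for t in group)}
    for host, group in groupby(stream, key=lambda t: t['host'])
]
print(rolled)  # [{'host': 'a', 'total': 15}, {'host': 'b', 'total': 7}]
```

Because the grouping is streaming (only adjacent tuples are compared), the input must be sorted on the over field, which is why rollup() typically follows a sorted search().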

solr.stream.reduce(*args, **kwargs)

Build a reduce() expression. Groups a stream by field values.

Parameters:
  • by – Field to reduce on.

Join expressions:

solr.stream.merge(*args, **kwargs)

Build a merge() expression. Merges two or more sorted streams.

Parameters:
  • args – Two or more sub-expressions.

  • on – Merge key and direction (e.g. "id asc").

solr.stream.innerJoin(*args, **kwargs)

Build an innerJoin() expression.

Parameters:
  • args – Two sub-expressions (left, right).

  • on – Join key mapping (e.g. "left.id=right.id").

solr.stream.leftOuterJoin(*args, **kwargs)

Build a leftOuterJoin() expression.

solr.stream.hashJoin(*args, **kwargs)

Build a hashJoin() expression.

Parameters:
  • on – Join key.

  • hashed – The hashed side of the join.

solr.stream.intersect(*args, **kwargs)

Build an intersect() expression. Returns tuples present in both streams.

solr.stream.complement(*args, **kwargs)

Build a complement() expression. Returns tuples in the first stream that are not in the second.
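The set semantics of intersect() and complement(), both of which compare two streams sorted on a common key, can be sketched as follows (an illustration of the server-side behavior, not library code):

```python
# Two streams of tuples sharing the key field 'id'.
left = [{'id': 1}, {'id': 2}, {'id': 3}]
right = [{'id': 2}, {'id': 4}]
right_ids = {t['id'] for t in right}

# intersect(left, right, on="id asc"): tuples from the first stream whose
# key also appears in the second.
intersected = [t for t in left if t['id'] in right_ids]

# complement(left, right, on="id asc"): tuples from the first stream whose
# key does not appear in the second.
complemented = [t for t in left if t['id'] not in right_ids]

print(intersected)   # [{'id': 2}]
print(complemented)  # [{'id': 1}, {'id': 3}]
```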

Aggregate functions:

These return AggregateExpression instances for use inside rollup(), stats(), and similar expressions.

solr.stream.count(field)

Aggregate: count(field).

solr.stream.sum(field)

Aggregate: sum(field).

solr.stream.avg(field)

Aggregate: avg(field).

solr.stream.min(field)

Aggregate: min(field).

solr.stream.max(field)

Aggregate: max(field).

Control expressions:

solr.stream.fetch(*args, **kwargs)

Build a fetch() expression. Enriches tuples by fetching additional fields from a collection.

Parameters:
  • fl – Fields to fetch.

  • on – Join key.

solr.stream.parallel(*args, **kwargs)

Build a parallel() expression. Distributes a stream across workers.

Parameters:
  • workers – Number of parallel workers.

  • sort – Merge sort clause.

solr.stream.daemon(*args, **kwargs)

Build a daemon() expression. Wraps a stream to run continuously.

Parameters:
  • id – Daemon identifier.

  • runInterval – Run interval in milliseconds.

  • queueSize – Internal queue size.

solr.stream.update(*args, **kwargs)

Build an update() expression. Sends tuples to a collection as documents.

Parameters:
  • batchSize – Number of documents per batch.

solr.stream.commit(*args, **kwargs)

Build a commit() expression. Commits after an update stream.

Execution via Solr / AsyncSolr:

Solr.stream(expr, model=None)

Execute a streaming expression via the /stream handler (Solr 5.0+). Returns a synchronous iterator of result dicts. The final EOF marker tuple is automatically skipped.

Parameters:
  • expr – A StreamExpression or a raw expression string.

  • model – Optional Pydantic BaseModel subclass. When provided, each result dict is converted via model.model_validate().

Returns:

Iterator[dict[str, Any]] (or Iterator[Model]).

Raises:

SolrVersionError – If connected Solr is older than 5.0.

Example:

from solr.stream import search, rollup, sum

expr = (search('logs', q='*:*', fl='host,bytes', sort='host asc')
        | rollup(over='host', total=sum('bytes')))

for doc in conn.stream(expr):
    print(doc)
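Under the hood, the /stream handler returns a result-set whose final tuple is an EOF marker; the skipping described above amounts to the following (a sketch over the standard /stream JSON shape):

```python
import json

# Shape of a /stream response body (standard Solr format).
raw = '''{"result-set": {"docs": [
    {"host": "a", "total": 15},
    {"host": "b", "total": 7},
    {"EOF": true, "RESPONSE_TIME": 3}
]}}'''

docs = json.loads(raw)['result-set']['docs']
# The final marker tuple carries an "EOF" key and is not a real result.
results = [d for d in docs if 'EOF' not in d]
print(results)  # [{'host': 'a', 'total': 15}, {'host': 'b', 'total': 7}]
```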

async AsyncSolr.stream(expr, model=None)

Async version of Solr.stream(). Returns an async generator.

Parameters:
  • expr – A StreamExpression or a raw string.

  • model – Optional Pydantic model for automatic conversion.

Returns:

Async generator of result dicts (or model instances).

Example:

async for doc in await conn.stream(expr):
    print(doc)

PysolrCompat class

class solr.PysolrCompat(url, **kwargs)

A pysolr-compatible wrapper around Solr. Subclasses Solr so all native solrpy features remain available, while adding method aliases that match the pysolr library’s public API. This allows drop-in migration from pysolr with minimal code changes.

Constructor parameters are identical to Solr.

search(q, **kwargs)

Search for documents. Alias for Solr.select().

Parameters:
  • q – The query string.

  • kwargs – Additional Solr parameters (e.g. rows, fq).

Returns:

A Response instance.

add(docs, commit=True, **kwargs)

Add one or more documents. Unlike native Solr.add() (which takes a single dict), this method accepts either a list of dicts or a single dict, matching pysolr behavior.

Parameters:
  • docs – A list of document dicts, or a single document dict.

  • commit – Whether to auto-commit after adding. Defaults to True to match pysolr convention.
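The list-or-dict acceptance boils down to a small normalization step before delegating to the native bulk add; a sketch (illustrative only, the wrapper's internals may differ):

```python
def normalize_docs(docs):
    """Accept a single document dict or a list of dicts, pysolr-style,
    and always return a list suitable for a bulk add call."""
    if isinstance(docs, dict):
        return [docs]
    return list(docs)

print(normalize_docs({'id': '1'}))                 # [{'id': '1'}]
print(normalize_docs([{'id': '1'}, {'id': '2'}]))  # [{'id': '1'}, {'id': '2'}]
```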

delete(id=None, q=None, commit=True, **kwargs)

Delete documents by id and/or query. Supports both id= and q= keyword arguments in a single call, matching pysolr’s API.

Parameters:
  • id – Document id to delete.

  • q – Query string; all matching documents will be deleted.

  • commit – Whether to auto-commit after deleting. Defaults to True to match pysolr convention.

extract(file_obj, **kwargs)

Extract/index a rich document via Solr Cell. Creates an Extract companion internally and delegates.

Parameters:
  • file_obj – A file-like object containing the document bytes.

  • kwargs – Additional parameters forwarded to Extract.

Returns:

The extraction result.

Example:

from solr import PysolrCompat

conn = PysolrCompat('http://localhost:8983/solr/mycore')
results = conn.search('title:lucene', rows=10)
conn.add([{'id': '1', 'title': 'Hello'}])
conn.delete(id='1')
conn.delete(q='title:Hello')
conn.commit()

AsyncSolr class

class solr.AsyncSolr(url, timeout=None, http_user=None, http_pass=None, post_headers=None, max_retries=3, retry_delay=0.1, always_commit=False, response_format='json', auth_token=None, auth=None, debug=False)

Async Solr client built on httpx.AsyncClient. Provides the same API as Solr but with async/await methods.

Constructor parameters are the same as Solr (except persistent, ssl_key, and ssl_cert are not applicable).

Context manager usage:

AsyncSolr should be used as an async context manager to ensure the underlying HTTP client is properly closed:

from solr import AsyncSolr

async with AsyncSolr('http://localhost:8983/solr/mycore') as conn:
    response = await conn.select('*:*')
    for doc in response.results:
        print(doc['id'])

Attributes:

server_version

Tuple representing the detected Solr version, e.g. (9, 4, 1).

Search methods:

async select(q=None, **params)

Async search query. Same parameters as SearchHandler.

Parameters:
  • q – Query string.

  • params – Additional Solr parameters (underscores become dots).

  • model – Optional Pydantic model class for automatic conversion.

Returns:

A Response instance.

Update methods:

async add(doc, **kwargs)

Add a single document.

Parameters:
  • doc – Dictionary mapping field names to values.

  • commit – Force an immediate commit (bool).

  • timeout – Override the request timeout (float).

async add_many(docs, **kwargs)

Add multiple documents.

Parameters:
  • docs – Iterable of document dicts.

  • commit – Force an immediate commit (bool).

  • timeout – Override the request timeout (float).

Delete methods:

async delete(id=None, **kwargs)

Delete a document by id.

Parameters:
  • id – Unique identifier of the document to delete.

  • commit – Force an immediate commit (bool).

  • timeout – Override the request timeout (float).

async delete_query(q, **kwargs)

Delete documents by query.

Parameters:
  • q – Solr query string identifying documents to delete.

  • commit – Force an immediate commit (bool).

  • timeout – Override the request timeout (float).

Commit:

async commit(**kwargs)

Commit pending changes.

Parameters:

soft_commit – If True, perform a soft commit (bool).

Real-time Get (Solr 4.0+):

async get(id=None, ids=None, fields=None, model=None)

Retrieve documents from the transaction log.

Parameters:
  • id – Single document ID.

  • ids – List of document IDs.

  • fields – List of field names to return.

  • model – Optional Pydantic model class.

Returns:

A single document dict (or None if not found) when id is given; a list of document dicts when ids is given.

Streaming Expressions (Solr 5.0+):

async stream(expr, model=None)

Execute a streaming expression. Returns an async generator of result dicts (or model instances). Skips the final EOF marker.

Parameters:
  • expr – A StreamExpression or string.

  • model – Optional Pydantic model for automatic conversion.

Usage:

async for doc in await conn.stream(expr):
    print(doc)

Connection management:

async close()

Close the underlying httpx.AsyncClient.

ping()

Ping the Solr server (synchronous). Returns True if reachable.

Paginator

class solr.SolrPaginator(result, default_page_size=None)

Paginator for a Solr response object. Provides Django-like pagination without any Django dependency.

Parameters:
  • result – A Response instance from a query.

  • default_page_size – Override the page size. If not given, uses the rows parameter from the query, or the number of results returned.

count

Total number of matching documents.

num_pages

Total number of pages.

page_range

A range of valid page numbers.

page(page_num=1)

Return a SolrPage for the given page number.

Raises:
  • EmptyPage – If the requested page number is out of range.

  • PageNotAnInteger – If page_num is not an integer.

class solr.SolrPage

A single page of results.

object_list

List of documents on this page.

has_next()
has_previous()
has_other_pages()
next_page_number()
previous_page_number()
start_index()
end_index()
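The arithmetic behind these accessors follows Django's Paginator conventions; a self-contained sketch of the assumed page math (based on the Django-like API described above, not the class's exact code):

```python
import math

def num_pages(count, page_size):
    # Total pages; in Django's convention an empty result set still
    # exposes one (empty) page.
    return max(1, math.ceil(count / page_size))

def start_index(page_num, page_size):
    # 1-based index of the first document on the given page.
    return (page_num - 1) * page_size + 1

def end_index(page_num, page_size, count):
    # 1-based index of the last document on the given page, clamped
    # to the total count on the final page.
    return min(page_num * page_size, count)

print(num_pages(95, 10))      # 10
print(start_index(3, 10))     # 21
print(end_index(10, 10, 95))  # 95
```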

class solr.EmptyPage

Raised when the requested page is out of range. Subclass of ValueError.

class solr.PageNotAnInteger

Raised when the page number is not an integer. Subclass of TypeError.

Response parsing

solrpy provides two response parsers:

solr.core.parse_query_response(data, params, query)

Parse an XML response from Solr (wt=standard or wt=xml).

Parameters:
  • data – A file-like object containing the XML response.

  • params – Dictionary of query parameters used for the request.

  • query – The SearchHandler that issued the query (used for next_batch() / previous_batch()).

Returns:

A Response instance, or None if the response is empty.

This is the default parser used by SearchHandler.

solr.core.parse_json_response(data, params, query)

Parse a JSON response dict from Solr (wt=json).

Parameters:
  • data – A dictionary (already deserialized from JSON).

  • params – Dictionary of query parameters used for the request.

  • query – The SearchHandler that issued the query.

Returns:

A Response instance.

Handles all standard Solr response fields: responseHeader, response (docs, numFound, start, maxScore), and any additional top-level keys such as highlighting, facet_counts, stats, debug, etc. Extra keys are attached directly as Response attributes.

Example usage with a raw JSON query:

import json
import solr
from solr.core import parse_json_response

conn = solr.Solr('http://localhost:8983/solr/mycore')
raw = conn.select.raw(q='*:*', wt='json')
data = json.loads(raw)
response = parse_json_response(data, {'q': '*:*'}, conn.select)

Gzip compression

All requests include an Accept-Encoding: gzip header. When the Solr server returns a gzip-compressed response, it is transparently decompressed before parsing.

This reduces network transfer size, especially for large result sets. No configuration is needed; gzip support is always enabled.

solr.core.read_response(response)

Read an HTTP response body, decompressing gzip if the Content-Encoding header indicates compression.

Parameters:

response – An http.client.HTTPResponse object.

Returns:

Decoded string (UTF-8).
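The behavior can be sketched with the standard library's gzip module (an illustration of what read_response does, not its exact code):

```python
import gzip

def read_body(headers, body_bytes):
    """Decompress a response body when Content-Encoding says gzip,
    then decode as UTF-8."""
    if headers.get('Content-Encoding', '').lower() == 'gzip':
        body_bytes = gzip.decompress(body_bytes)
    return body_bytes.decode('utf-8')

# Round-trip check: compress a payload and read it back.
payload = '{"responseHeader": {"status": 0}}'
compressed = gzip.compress(payload.encode('utf-8'))
print(read_body({'Content-Encoding': 'gzip'}, compressed) == payload)  # True
print(read_body({}, payload.encode('utf-8')) == payload)               # True
```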