API Reference
Data representation
Solr documents are modeled as Python dictionaries with field names as keys and field values as values.
Multi-valued fields use list, tuple, or set as values. datetime.datetime values are converted to UTC. datetime.date values are converted to datetime.datetime at 00:00:00 UTC. bool values are converted to 'true' or 'false'. None values are omitted from the document sent to Solr.
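The conversion rules above can be sketched as a small encoder. This is a hypothetical illustration of the documented behavior, not the library's actual serializer:

```python
from datetime import date, datetime, timezone

def encode_field(value):
    """Apply the documented conversion rules to a single field value."""
    if isinstance(value, bool):          # bool before other checks
        return 'true' if value else 'false'
    if isinstance(value, datetime):      # normalize to UTC
        if value.tzinfo is None:
            return value.replace(tzinfo=timezone.utc)
        return value.astimezone(timezone.utc)
    if isinstance(value, date):          # date -> midnight UTC
        return datetime(value.year, value.month, value.day, tzinfo=timezone.utc)
    return value

def encode_document(doc):
    """Drop None values; map the rules over multi-valued fields."""
    out = {}
    for name, value in doc.items():
        if value is None:                # None fields are omitted
            continue
        if isinstance(value, (list, tuple, set)):
            out[name] = [encode_field(v) for v in value]
        else:
            out[name] = encode_field(value)
    return out

doc = encode_document({'id': '1', 'in_stock': True,
                       'tags': ['a', 'b'], 'obsolete': None})
```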
Exceptions
- class solr.core.SolrVersionError(feature, required, actual)
Raised when a feature requires a higher Solr version than the one connected to. Subclass of Exception.
- feature
Name of the feature that was called (string).
- required
Minimum version required as a tuple, e.g. (4, 0).
- actual
Detected server version as a tuple, e.g. (3, 6, 2).
Solr class
- class solr.Solr(url, persistent=True, timeout=None, ssl_key=None, ssl_cert=None, http_user=None, http_pass=None, post_headers=None, max_retries=3, retry_delay=0.1, always_commit=False, response_format='json', auth_token=None, auth=None, debug=False)
Connect to the Solr instance at url. If the Solr instance provides multiple cores, url should point to a specific core.
Constructor parameters:

| Parameter | Description |
|---|---|
| url | URI pointing to the Solr instance (e.g. http://localhost:8983/solr/mycore). A UserWarning is issued if the path does not contain /solr. |
| persistent | Keep a persistent HTTP connection open. Defaults to True. |
| timeout | Timeout in seconds for server responses. |
| ssl_key | Path to PEM key file for SSL client authentication. |
| ssl_cert | Path to PEM certificate file for SSL client authentication. |
| http_user | Username for HTTP Basic authentication. |
| http_pass | Password for HTTP Basic authentication. |
| post_headers | Dictionary of additional headers to include in all requests. |
| max_retries | Maximum number of automatic retries on connection errors. Defaults to 3. |
| retry_delay | Base delay in seconds between retries. Uses exponential backoff: the first retry waits retry_delay, the second retry_delay * 2, etc. Defaults to 0.1. Each retry is logged at WARNING level. |
| always_commit | If True, all update methods (add, add_many, delete, etc.) automatically commit changes. Individual calls can override this by passing commit=False. Defaults to False. |
| response_format | Response format for queries: 'json' (default) or 'xml'. When 'json', queries use wt=json and the JSON parser. Use 'xml' for legacy compatibility with older code. |
| auth_token | Bearer token string. Sends an Authorization: Bearer <token> header. Takes priority over http_user/http_pass. |
| auth | A callable returning a dict[str, str] of headers. Called on every request, enabling dynamic token refresh (e.g. OAuth2). Takes priority over auth_token and http_user/http_pass. |
| debug | If True, log all requests and responses. |

Attributes:
- server_version
Tuple representing the detected Solr version, e.g. (9, 4, 1). Automatically populated during initialization.
- always_commit
Boolean indicating whether update methods auto-commit by default.
- select
A SearchHandler instance bound to the /select endpoint.
Health check:
- ping()
Ping the Solr server to check if it is reachable.
Returns True if the server responds to /admin/ping, False otherwise. Tries both the core path and its parent path. Works on all Solr versions (1.2+).
Example:
conn = solr.Solr('http://localhost:8983/solr/mycore')
if conn.ping():
    print('Solr is up')
Search methods:
The select attribute is the primary search interface. See SearchHandler for details:
response = conn.select('title:lucene')
Update methods:
Atomic update methods (Solr 4.0+):
- atomic_update(doc, commit=False)
Partial update of a single document. Field values can be plain values or dicts with a modifier key: set, add, remove, removeregex (Solr 5.0+), inc. Use {'set': None} to remove a field.
Example:
conn.atomic_update({
    'id': 'doc1',
    'title': {'set': 'New Title'},
    'count': {'inc': 1},
    'old_field': {'set': None},  # remove field
}, commit=True)
- atomic_update_many(docs, commit=False)
Partial update of multiple documents. Same modifier syntax as atomic_update.
Real-time Get (Solr 4.0+):
- get(id=None, ids=None, fields=None)
Retrieve documents directly from the transaction log without waiting for a commit. Returns a dict for a single id (or None if not found), or a list for ids.
- Parameters:
id – Single document ID.
ids – List of document IDs.
fields – Optional list of fields to return.
Cursor pagination (Solr 4.7+):
- iter_cursor(q, sort, rows=100, **params)
Generator that yields Response objects for each batch of cursor-paginated results. Stops when all results are consumed.
- Parameters:
q – Query string.
sort – Sort clause (must include the uniqueKey field).
rows – Batch size per request.
- Raises:
ValueError – If sort is not provided.
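The cursor loop that iter_cursor runs can be sketched as follows. fetch_page stands in for the actual HTTP request and is purely illustrative; the stop condition (nextCursorMark equal to the mark just sent) is Solr's documented cursor protocol:

```python
def iter_cursor_sketch(fetch_page, q, sort, rows=100):
    """Yield pages until Solr returns the same cursorMark twice."""
    cursor = '*'                       # '*' starts a new cursor
    while True:
        page = fetch_page(q=q, sort=sort, rows=rows, cursorMark=cursor)
        yield page
        next_cursor = page['nextCursorMark']
        if next_cursor == cursor:      # no more results
            return
        cursor = next_cursor

# A fake fetcher serving two pages, then repeating the mark.
pages = {'*': {'docs': [1, 2], 'nextCursorMark': 'AoE'},
         'AoE': {'docs': [3], 'nextCursorMark': 'AoE'}}
results = [d
           for page in iter_cursor_sketch(lambda **p: pages[p['cursorMark']],
                                          '*:*', 'id asc')
           for d in page['docs']]
```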
MoreLikeThis (Solr 4.0+):
Create a MoreLikeThis instance:
from solr import MoreLikeThis
mlt = MoreLikeThis(conn)
response = mlt('interesting text', fl='title,body')
- class solr.MoreLikeThis(conn)
Find similar documents using Solr’s /mlt handler.
- Parameters:
conn – A Solr instance.
- __call__(q=None, **params)
Query the MLT handler. Same parameters as SearchHandler.
- raw(**params)
Issue a raw MLT query.
Delete methods:
Commit and optimize:
Connection management:
Commit-control arguments
Several update methods support optional keyword arguments to control
commits. These arguments are always optional; when always_commit is
False (the default), no commit is performed unless explicitly requested.
| Argument | Description |
|---|---|
| commit | If True, perform a commit after the update. |
| optimize | If True, optimize the index after the update. |
| wait_flush | Block until the commit is flushed to disk. Defaults to True. |
| wait_searcher | Block until searcher objects are warmed. Defaults to True. |

If wait_flush or wait_searcher are specified without commit or optimize, a TypeError is raised.
Methods that support commit-control arguments: add, add_many,
delete, delete_many, delete_query.
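The validation rule above can be sketched as a helper that maps commit-control keywords to Solr update parameters. This is a hypothetical illustration of the documented behavior, not the library's internal code; the waitFlush/waitSearcher parameter names are the ones Solr's update handlers use:

```python
def build_commit_params(commit=None, optimize=None,
                        wait_flush=None, wait_searcher=None):
    """Translate commit-control keywords into Solr update parameters."""
    if (wait_flush is not None or wait_searcher is not None) \
            and not (commit or optimize):
        # documented behavior: wait_* without commit/optimize is an error
        raise TypeError('wait_flush/wait_searcher require commit or optimize')
    params = {}
    if optimize:
        params['optimize'] = 'true'
    elif commit:
        params['commit'] = 'true'
    if commit or optimize:
        params['waitFlush'] = 'true' if wait_flush in (None, True) else 'false'
        params['waitSearcher'] = 'true' if wait_searcher in (None, True) else 'false'
    return params
```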
All update methods and SearchHandler calls also accept a timeout
keyword argument to override the connection-level timeout for that
individual request.
SearchHandler class
- class solr.SearchHandler(connection, path='/select', arg_separator='_')
Provides access to a named Solr request handler. The select attribute on Solr instances is a SearchHandler bound to /select. Create handlers for custom endpoints:
import solr
conn = solr.Solr('http://localhost:8983/solr/mycore')
my_handler = solr.SearchHandler(conn, '/my_handler')
response = my_handler('some query')
- SearchHandler.__call__(q=None, fields=None, highlight=None, score=True, sort=None, sort_order='asc', **params)
Execute a search query against Solr.
- Parameters:
q – Query string.
fields – Fields to return. String or iterable. Defaults to '*'.
highlight – False (default), True, or a list of field names. When enabled, the response gets a highlighting attribute: a dict keyed by document ID, where each value is a dict of field names to lists of highlighted snippets (e.g. {'doc1': {'title': ['<em>Lucene</em> in Action']}}). Customize with hl_simple_pre, hl_simple_post, hl_fragsize, hl_snippets, etc. via **params.
score – Include score in results. Defaults to True.
sort – Fields to sort by. String or iterable.
sort_order – Default sort direction ('asc' or 'desc').
json_facet – JSON Facet API dict (Solr 5.0+). Serialized to the json.facet query parameter automatically.
params – Additional Solr parameters (use underscores for dots).
timeout – Per-request timeout in seconds (overrides the connection-level timeout).
- Returns:
A Response instance.
- Raises:
ValueError – If highlight=True but no fields are specified, or if sort_order is invalid.
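Given the highlighting shape described above, snippets can be pulled out with a plain nested loop. The response dict below is hand-written for illustration; a real query would produce it via response.highlighting:

```python
# Shape documented above: {doc_id: {field: [snippet, ...]}}
highlighting = {'doc1': {'title': ['<em>Lucene</em> in Action']},
                'doc2': {'title': [], 'body': ['scoring in <em>Lucene</em>']}}

# Flatten to (doc_id, field, snippet) triples.
snippets = [(doc_id, field, snippet)
            for doc_id, fields in highlighting.items()
            for field, snips in fields.items()
            for snippet in snips]
```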
- SearchHandler.raw(**params)
Issue a raw query. No processing is performed on parameters or responses. Returns the raw response text.
Response class
- class solr.Response
Container for query results.
Attributes:
- header
Dictionary containing response header values (status, QTime, params).
- results
A Results list of matching documents. Each document is a dictionary of field names to values.
- numFound
Total number of matching documents.
- start
Starting offset of the current result set.
- maxScore
Maximum relevance score across all matches.
- facet_counts
Dictionary containing traditional facet results when facet=true is used. Contains keys like facet_fields, facet_queries, facet_ranges, facet_pivots, etc.
- facets
Dictionary containing JSON Facet API results when json.facet is used (Solr 5.0+). Contains the structured facet buckets returned by Solr’s JSON faceting.
- stats
Dictionary containing field statistics when stats=true is used. Contains per-field stats such as min, max, count, mean, etc.
- debug
Dictionary containing debug information when debug=true (or debugQuery=true) is used. Contains the parsed query, explain info, timing data, etc.
Note
Any top-level key in the Solr JSON response that is not responseHeader or response is automatically set as an attribute on the Response object. This includes highlighting, facet_counts, facets, stats, debug, nextCursorMark, grouped, and any other component output. You can access them as response.key_name.
Pydantic models (opt-in):
- as_models(model)
Convert result documents to Pydantic BaseModel instances. Requires pydantic (pip install solrpy[pydantic]).
- Parameters:
model – A Pydantic BaseModel subclass.
- Returns:
List of model instances.
The model= parameter on select() and get() does this automatically:
resp = conn.select('*:*', model=MyDoc)  # results are list[MyDoc]
doc = conn.get(id='1', model=MyDoc)     # MyDoc | None
Cursor pagination (Solr 4.7+):
- cursor_next()
Follow cursor-based pagination. Returns the next page of results, or None if there are no more results (nextCursorMark == cursorMark) or if the query did not use cursorMark.
Example:
resp = conn.select('*:*', sort='id asc', cursorMark='*', rows=100)
while resp:
    process(resp.results)
    resp = resp.cursor_next()
Offset pagination methods:
- next_batch()
Fetch the next batch of results. Returns a new Response, or None if there are no more results.
- previous_batch()
Fetch the previous batch of results. Returns a new Response, or None if this is the first batch.
Grouping (Solr 3.3+):
- grouped
A GroupedResult object when the response contains grouped results, otherwise not present. Enable grouping with group='true' and group_field='field'.
Example:
resp = conn.select('*:*', group='true', group_field='category',
                   group_limit=5, group_ngroups='true')
for group in resp.grouped['category'].groups:
    print(group.groupValue, len(group.doclist))
Spellcheck (Solr 1.4+):
- spellcheck
A SpellcheckResult object if the response contains spellcheck data, otherwise None. Spellcheck data is returned when you include spellcheck='true' in the query parameters.
Example:
resp = conn.select('misspeled query', spellcheck='true', spellcheck_collate='true')
if resp.spellcheck and not resp.spellcheck.correctly_spelled:
    print('Did you mean:', resp.spellcheck.collation)
Iteration:
Response objects support len() and iteration:
response = conn.select('*:*')
print(len(response))
for doc in response:
    print(doc['id'])
SpellcheckResult class (Solr 1.4+)
- class solr.SpellcheckResult(raw)
Wrapper around the raw spellcheck response dict. Returned by Response.spellcheck when the query includes spellcheck=true.
- Parameters:
raw – The raw spellcheck dict from the Solr response.
- correctly_spelled
True if all query terms were spelled correctly.
- collation
The corrected full query string suggested by Solr (collation), or None if not present. Requires spellcheck.collate=true on the request.
- suggestions
List of per-word suggestion entries. Each entry is a dict that includes an 'original' key (the misspelled word) merged with the Solr info dict ('numFound', 'startOffset', 'endOffset', 'suggestion' list, etc.).
Example:
for entry in resp.spellcheck.suggestions:
    print(entry['original'], '->', entry.get('suggestion', []))
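The merge described above can be sketched by hand. The raw dict below is a hand-written illustration of the word/info-dict alternation Solr typically returns in its suggestions list; treat the exact shape as an assumption:

```python
raw = {'suggestions': ['misspeled', {'numFound': 1, 'startOffset': 0,
                                     'endOffset': 9,
                                     'suggestion': ['misspelled']}]}

# Solr alternates word / info-dict in the suggestions list;
# merge each pair into one entry carrying an 'original' key.
words = raw['suggestions']
entries = [{'original': word, **info}
           for word, info in zip(words[::2], words[1::2])]
```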
Extract class (Solr 1.4+)
- class solr.Extract(conn)
Index or extract rich documents via Solr Cell (Apache Tika) using the /update/extract handler. The handler must be configured in solrconfig.xml.
- __call__(file_obj, content_type='application/octet-stream', commit=False, **params)
Index a rich document.
- Parameters:
file_obj – Binary file-like object (opened in 'rb' mode).
content_type – MIME type of the document. Defaults to 'application/octet-stream'.
commit – Commit to the index immediately. Defaults to False.
params – Additional Solr parameters. The first underscore in each key is replaced with a dot: literal_id='x' → literal.id=x. Field names with underscores are preserved: literal_my_field='v' → literal.my_field=v.
- Returns:
Parsed JSON response dict (contains responseHeader).
- Raises:
SolrVersionError – If the server is older than Solr 1.4.
Example:
from solr import Solr, Extract

conn = Solr('http://localhost:8983/solr/mycore')
ext = Extract(conn)
with open('report.pdf', 'rb') as f:
    ext(f, content_type='application/pdf',
        literal_id='report1', literal_title='Annual Report',
        commit=True)
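The underscore-to-dot mapping described for **params amounts to replacing only the first underscore in each key. A sketch of the rule (the helper name is illustrative):

```python
def map_extract_params(params):
    """literal_id -> literal.id; literal_my_field -> literal.my_field."""
    # str.replace with count=1 touches only the first underscore,
    # so underscores inside field names survive.
    return {key.replace('_', '.', 1): value for key, value in params.items()}

mapped = map_extract_params({'literal_id': 'x', 'literal_my_field': 'v'})
```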
- extract_only(file_obj, content_type='application/octet-stream', **params)
Extract text and metadata without indexing. Calls /update/extract with extractOnly=true.
- Returns:
(text, metadata) tuple. text is the extracted plain text; metadata is a dict of Tika metadata (e.g. 'Content-Type', 'Author', 'title').
- from_path(file_path, **params)
Index a document from a filesystem path. The MIME type is guessed from the file extension via mimetypes; falls back to 'application/octet-stream'.
- Parameters:
file_path – Path to the file.
params – Forwarded to __call__() (commit, literal_*, etc.).
- extract_from_path(file_path, **params)
Extract text and metadata from a file path without indexing.
- Returns:
(text, metadata) tuple (same as extract_only()).
Suggest class (Solr 4.7+)
- class solr.Suggest(conn)
Query Solr’s SuggestComponent via the /suggest handler. The /suggest handler and at least one SuggestComponent must be configured in solrconfig.xml.
- __call__(q, dictionary=None, count=10, **params)
Return a flat list of suggestion dicts for the query term.
- Parameters:
q – Partial query string to suggest for.
dictionary – Name of the suggester dictionary to use. If None, Solr uses the default suggester.
count – Maximum number of suggestions to return. Defaults to 10.
params – Extra parameters forwarded verbatim to /suggest.
- Returns:
List of suggestion dicts. Each dict typically has 'term', 'weight', and 'payload' keys.
- Raises:
SolrVersionError – If the server is older than Solr 4.7.
Example:
from solr import Solr, Suggest

conn = Solr('http://localhost:8983/solr/mycore')
suggest = Suggest(conn)
results = suggest('que', dictionary='mySuggester', count=5)
for s in results:
    print(s['term'], s['weight'])
Grouping classes (Solr 3.3+)
- class solr.GroupedResult(raw)
Wrapper around a Solr grouped response. Supports subscript access by field name, iteration, in, and len().
- __getitem__(field)
Return a GroupField for the given field name.
KNN / Dense Vector Search (Solr 9.0+)
- class solr.KNN(conn)
Dense vector / KNN search using Solr’s {!knn} and {!vectorSimilarity} query parsers. Supports top-K search, similarity threshold search, hybrid (lexical + vector) search, and re-ranking. Created explicitly by the user.
- Parameters:
conn – A Solr or AsyncSolr instance. With AsyncSolr, execution methods return coroutines.
Execution methods:
- search(vector, field, top_k=10, filters=None, early_termination=False, saturation_threshold=None, patience=None, ef_search_scale_factor=None, seed_query=None, pre_filter=None, **params)
Execute a {!knn} search query (top-K nearest neighbors).
- Parameters:
vector – Dense vector as a sequence of floats.
field – Name of the DenseVectorField to search.
top_k – Number of nearest neighbors to retrieve. Defaults to 10.
filters – Filter query (fq parameter).
early_termination – Enable the HNSW early termination optimization.
saturation_threshold – Queue saturation cutoff for early termination (float between 0 and 1).
patience – Iteration limit for early termination (integer).
ef_search_scale_factor – Candidate examination multiplier (Solr 10.0+). Raises SolrVersionError if Solr < 10.0.
seed_query – Lexical query string to guide the vector search entry point.
pre_filter – Explicit pre-filter query string(s). A single string or a list of strings.
params – Additional Solr parameters.
- Returns:
A Response instance (sync) or coroutine (async).
- Raises:
SolrVersionError – If the connected Solr is older than 9.0.
- similarity(vector, field, min_return, min_traverse=None, pre_filter=None, filters=None, **params)
Execute a {!vectorSimilarity} search. Returns all documents whose similarity to the vector exceeds min_return.
- Parameters:
vector – Dense vector as a sequence of floats.
field – Name of the DenseVectorField.
min_return – Minimum similarity threshold for results (float).
min_traverse – Minimum similarity to continue graph traversal (float). Can improve performance by pruning low-similarity branches.
pre_filter – Explicit pre-filter query string(s).
filters – Filter query (fq parameter).
params – Additional Solr parameters.
- Returns:
A Response instance.
- Raises:
SolrVersionError – If the connected Solr is older than 9.0.
- hybrid(text_query, vector, field, min_return=0.5, **params)
Execute a hybrid (lexical OR vector) search. Combines a standard text query with a {!vectorSimilarity} query using an OR clause.
- Parameters:
text_query – The lexical search query string.
vector – Dense vector for similarity matching.
field – Name of the DenseVectorField.
min_return – Minimum similarity threshold for the vector part. Defaults to 0.5.
params – Additional Solr parameters.
- Returns:
A Response instance.
- Raises:
SolrVersionError – If the connected Solr is older than 9.0.
- rerank(query, vector, field, top_k=10, rerank_docs=100, rerank_weight=1.0, **params)
Execute a lexical query re-ranked by vector similarity. Uses Solr’s {!rerank} query parser to re-score the top lexical results with a {!knn} query.
- Parameters:
query – The base lexical query string.
vector – Dense vector for re-ranking.
field – Name of the DenseVectorField.
top_k – topK for the KNN re-rank query. Defaults to 10.
rerank_docs – Number of top lexical docs to re-rank. Defaults to 100.
rerank_weight – Weight of the vector score in the final ranking. Defaults to 1.0.
params – Additional Solr parameters.
- Returns:
A Response instance.
- Raises:
SolrVersionError – If the connected Solr is older than 9.0.
Query builder methods:
- build_knn_query(vector, field, top_k=10, early_termination=False, saturation_threshold=None, patience=None, ef_search_scale_factor=None, seed_query=None, pre_filter=None, include_tags=None, exclude_tags=None)
Build a {!knn} query string without executing it.
- Parameters:
vector – Dense vector as a sequence of floats.
field – Name of the DenseVectorField to search.
top_k – Number of nearest neighbors to retrieve.
early_termination – Enable HNSW early termination.
saturation_threshold – Queue saturation cutoff (float).
patience – Iteration limit for early termination (int).
ef_search_scale_factor – Candidate examination multiplier (Solr 10.0+).
seed_query – Lexical query to guide the vector search entry point.
pre_filter – Pre-filter query string(s) (string or list of strings).
include_tags – Only use fq filters with these tags.
exclude_tags – Exclude fq filters with these tags.
- Returns:
The KNN query string, e.g. {!knn f=embedding topK=10}[0.1,0.2,...].
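The query string format shown above can be reproduced by hand. This is a simplified sketch covering only the f and topK local params, not the library's full builder:

```python
def knn_query(vector, field, top_k=10):
    """Render a {!knn} query string in the format shown above."""
    vec = '[' + ','.join(str(v) for v in vector) + ']'
    return '{!knn f=%s topK=%d}%s' % (field, top_k, vec)

q = knn_query([0.1, 0.2, 0.3], field='embedding', top_k=10)
```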
- build_similarity_query(vector, field, min_return, min_traverse=None, pre_filter=None)
Build a {!vectorSimilarity} query string without executing it.
- Parameters:
vector – Dense vector as a sequence of floats.
field – Name of the DenseVectorField.
min_return – Minimum similarity threshold for results.
min_traverse – Minimum similarity to continue traversal.
pre_filter – Pre-filter query string(s).
- Returns:
The vectorSimilarity query string.
- build_hybrid_query(text_query, vector, field, min_return=0.5)
Build a hybrid (lexical OR vector) query string without executing it.
- Parameters:
text_query – The lexical search query.
vector – Dense vector for similarity matching.
field – Name of the DenseVectorField.
min_return – Minimum similarity threshold for the vector part.
- Returns:
A combined OR query string.
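A hybrid query of the documented form can be sketched by OR-ing the lexical query with a {!vectorSimilarity} clause. The f and minReturn local params follow Solr's vectorSimilarity parser; the helper itself is illustrative and may differ from what build_hybrid_query emits:

```python
def hybrid_query(text_query, vector, field, min_return=0.5):
    """Combine a lexical query and a vector-similarity clause with OR."""
    vec = '[' + ','.join(str(v) for v in vector) + ']'
    sim = '{!vectorSimilarity f=%s minReturn=%s}%s' % (field, min_return, vec)
    return '(%s) OR (%s)' % (text_query, sim)

q = hybrid_query('machine learning', [0.1, 0.2], field='embedding')
```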
- build_rerank_params(vector, field, top_k=10, rerank_docs=100, rerank_weight=1.0)
Build re-ranking parameters for use with a lexical base query.
- Parameters:
vector – Dense vector for re-ranking.
field – Name of the DenseVectorField.
top_k – topK for the KNN re-rank query.
rerank_docs – Number of top lexical docs to re-rank.
rerank_weight – Weight of the vector score in the final ranking.
- Returns:
A dict with rq and rqq keys ready to pass as query parameters.
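The rq/rqq pair follows Solr's {!rerank} convention: rq holds the re-rank directive referencing $rqq, and rqq holds the KNN query. A sketch of what build_rerank_params might produce (illustrative, not the library's exact output):

```python
def rerank_params(vector, field, top_k=10, rerank_docs=100, rerank_weight=1.0):
    """Build rq/rqq query parameters for vector re-ranking."""
    vec = '[' + ','.join(str(v) for v in vector) + ']'
    return {
        # rq re-scores the top lexical docs using the query in $rqq
        'rq': '{!rerank reRankQuery=$rqq reRankDocs=%d reRankWeight=%s}'
              % (rerank_docs, rerank_weight),
        'rqq': '{!knn f=%s topK=%d}%s' % (field, top_k, vec),
    }

params = rerank_params([0.1], field='embedding')
```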
- build_query(vector, field, top_k=10, ef_search_scale_factor=None)
Alias for build_knn_query() (backward compatibility).
Example:
from solr import Solr, KNN

conn = Solr('http://localhost:8983/solr/mycore')
knn = KNN(conn)

# Top-K nearest neighbors
response = knn.search([0.1, 0.2, 0.3], field='embedding', top_k=10)

# Similarity threshold
response = knn.similarity([0.1, 0.2, 0.3], field='embedding', min_return=0.7)

# Hybrid (lexical + vector)
response = knn.hybrid('machine learning', [0.1, 0.2, 0.3], field='embedding')

# Re-rank lexical results by vector similarity
response = knn.rerank('machine learning', [0.1, 0.2, 0.3], field='embedding',
                      rerank_docs=100)
SolrCloud (Solr 4.0+)
- class solr.SolrCloud(zk, collection, retry_count=3, retry_delay=0.5, **solr_kwargs)
SolrCloud client with leader-aware routing and automatic failover. Two modes of operation: ZooKeeper mode (real-time node discovery) and HTTP mode (CLUSTERSTATUS polling, no ZooKeeper needed).
- Parameters:
zk – A SolrZooKeeper instance.
collection – Solr collection name.
retry_count – Number of failover retries (default 3). On each failure, the client reconnects to a different node and retries.
retry_delay – Base delay in seconds between retries (default 0.5). Uses exponential backoff: retry_delay * 2^attempt.
solr_kwargs – Extra keyword arguments forwarded to the underlying Solr connection (e.g. timeout, http_user, http_pass, auth_token, auth, response_format).
- classmethod from_urls(urls, collection, retry_count=3, retry_delay=0.5, **solr_kwargs)
Create a client without ZooKeeper, using HTTP-only CLUSTERSTATUS discovery. The client probes the provided URLs to find active nodes and discovers shard leaders via the CLUSTERSTATUS admin API.
Properties:
- server_version
Detected Solr server version tuple, e.g. (9, 4, 1).
Read operations (routed to any active replica):
- select(*args, **kwargs)
Execute a search query with automatic failover. Same parameters as Solr.select().
- ping()
Ping the current Solr node.
Write operations (routed to a shard leader):
- add(doc, **kwargs)
Add a document, routed to a shard leader.
- add_many(docs, **kwargs)
Add multiple documents, routed to a shard leader.
- delete(**kwargs)
Delete documents, routed to a shard leader.
- delete_query(query, **kwargs)
Delete by query, routed to a shard leader.
- delete_many(ids, **kwargs)
Delete multiple documents by ID, routed to a shard leader.
- commit(**kwargs)
Commit changes, routed to a shard leader.
- optimize(**kwargs)
Optimize the index, routed to a shard leader.
- close()
Close the underlying Solr connection.
Failover behavior:
When any operation fails, the client:
1. Logs a WARNING with the attempt number and error.
2. Waits retry_delay * 2^attempt seconds (exponential backoff).
3. Reconnects to a different node (leader for writes, any replica for reads).
4. Retries the operation.
After retry_count retries, the last exception is raised.
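The backoff schedule above (retry_delay * 2^attempt) works out as follows; the helper is purely illustrative:

```python
def backoff_delays(retry_delay=0.5, retry_count=3):
    """Delay before each retry attempt (attempt numbers start at 0)."""
    return [retry_delay * 2 ** attempt for attempt in range(retry_count)]

delays = backoff_delays()   # default SolrCloud settings
```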
Example (ZooKeeper mode):
from solr import SolrZooKeeper, SolrCloud

zk = SolrZooKeeper('zk1:2181,zk2:2181,zk3:2181')
cloud = SolrCloud(zk, collection='products', timeout=10,
                  auth_token='my-jwt-token')

# reads go to any active replica
response = cloud.select('category:books', rows=20)

# writes are routed to shard leaders
cloud.add({'id': '1', 'title': 'Solr in Action'}, commit=True)
cloud.delete(id='1', commit=True)

cloud.close()
zk.close()
Example (HTTP-only mode):
from solr import SolrCloud

cloud = SolrCloud.from_urls(
    ['http://solr1:8983/solr', 'http://solr2:8983/solr'],
    collection='products')
response = cloud.select('*:*')
cloud.close()
SolrZooKeeper
- class solr.SolrZooKeeper(hosts, timeout=10.0)
ZooKeeper client for SolrCloud node discovery. Reads ZooKeeper state (/live_nodes, /collections/{name}/state.json, /aliases.json) to discover active Solr nodes, shard leaders, and collection aliases. Requires the kazoo library: pip install solrpy[cloud]
- Parameters:
hosts – ZooKeeper connection string, e.g. 'zk1:2181,zk2:2181,zk3:2181'. Supports chroot paths: 'zk1:2181,zk2:2181/solr'.
timeout – Connection timeout in seconds (default 10.0).
- Raises:
ImportError – If kazoo is not installed.
- live_nodes()
Return a list of currently active Solr node identifiers as reported by ZooKeeper’s /live_nodes znode.
- Returns:
list[str] — Node identifiers, e.g. ['solr1:8983_solr', 'solr2:8983_solr'].
- collection_state(collection)
Return the full state dict for a collection. Contains shard info, replica info, router config, and more. Tries the per-collection state.json first (Solr 5+), then falls back to the legacy /clusterstate.json (Solr 4.x).
- Parameters:
collection – Collection name (not an alias).
- Returns:
dict — State dict with keys like 'shards', 'router', 'maxShardsPerNode', etc.
- Raises:
RuntimeError – If the collection is not found in ZooKeeper.
- aliases()
Return collection aliases as a dict.
- Returns:
dict[str, str] — {alias_name: real_collection_name}. Empty dict if no aliases are defined.
- replica_urls(collection)
Return base URLs of all active replicas for a collection. Aliases are resolved automatically.
- Parameters:
collection – Collection name or alias.
- Returns:
list[str] — Solr base URLs, e.g. ['http://solr1:8983/solr', 'http://solr2:8983/solr'].
- leader_urls(collection)
Return base URLs of shard leaders for a collection (one per shard). Aliases are resolved automatically.
- Parameters:
collection – Collection name or alias.
- Returns:
list[str] — Leader Solr base URLs.
- random_url(collection)
Return a random active replica URL for load balancing.
- Parameters:
collection – Collection name or alias.
- Returns:
str — A Solr base URL.
- Raises:
RuntimeError – If no active replicas are found.
- random_leader_url(collection)
Return a random shard leader URL for write operations.
- Parameters:
collection – Collection name or alias.
- Returns:
str — A leader Solr base URL.
- Raises:
RuntimeError – If no leaders are found.
- close()
Close the ZooKeeper connection. Always call this when done.
Example:
from solr import SolrZooKeeper

zk = SolrZooKeeper('zk1:2181,zk2:2181,zk3:2181')

# Discover nodes
nodes = zk.live_nodes()
print('Active nodes:', nodes)

# Get all replica URLs
replicas = zk.replica_urls('mycore')
print('Replicas:', replicas)

# Get shard leaders
leaders = zk.leader_urls('mycore')
print('Leaders:', leaders)

# Check aliases
aliases = zk.aliases()
print('Aliases:', aliases)  # e.g. {'prod': 'mycore_v2'}

# Get collection state
state = zk.collection_state('mycore')
for shard, data in state['shards'].items():
    print(shard, ':', len(data['replicas']), 'replicas')

zk.close()
Query builders
- class solr.Field(name, alias=None)
Structured field expression for the fl parameter.
- classmethod func(name, *args)
Function field, e.g. Field.func('sum', 'price', 'tax') → sum(price,tax).
- classmethod transformer(name, **params)
Document transformer, e.g. Field.transformer('explain') → [explain].
- classmethod score()
Score pseudo-field → score.
- class solr.Sort(field, direction='asc')
Structured sort clause.
- classmethod func(expr, direction='asc')
Function sort, e.g. Sort.func('geodist()', 'asc').
- class solr.Facet
Builder for traditional Solr facet parameters.
- classmethod field(name, **opts)
Field facet with per-field options (mincount, limit, sort, etc.).
- classmethod range(name, start, end, gap, **opts)
Range facet.
- classmethod query(name, q)
Query facet.
- classmethod pivot(*fields, mincount=None)
Pivot facet.
- to_params()
Convert to a Solr query parameter dict.
All builders coexist with raw string parameters:
# Raw strings (always works)
conn.select('*:*', fl='id,title', sort='price desc',
facet='true', facet_field='category')
# Builder objects (optional)
conn.select('*:*',
fields=[Field('id'), Field('title')],
sort=[Sort('price', 'desc')],
facets=[Facet.field('category')],
)
Schema API (Solr 4.2+)
- class solr.SchemaAPI(conn)
Created explicitly by the user. All methods require Solr 4.2+.
Example:
from solr import Solr, AsyncSolr, SchemaAPI

# Sync
conn = Solr('http://localhost:8983/solr/mycore')
schema = SchemaAPI(conn)
fields = schema.fields()

# Async
async with AsyncSolr('http://localhost:8983/solr/mycore') as conn:
    schema = SchemaAPI(conn)
    fields = await schema.fields()
Full schema:
- get_schema()
Return the full schema definition as a dict.
Field operations:
- fields()
List all fields. Returns a list of field definition dicts.
- add_field(name, field_type, **opts)
Add a new field. Example:
schema.add_field('title', 'text_general', stored=True, indexed=True)
- replace_field(name, field_type, **opts)
Replace an existing field definition.
- delete_field(name)
Delete a field by name.
Dynamic field operations:
- dynamic_fields()
- add_dynamic_field(name, field_type, **opts)
- delete_dynamic_field(name)
Field type operations:
- field_types()
- add_field_type(**definition)
- replace_field_type(**definition)
- delete_field_type(name)
Copy field operations:
- copy_fields()
- add_copy_field(source, dest, max_chars=None)
- delete_copy_field(source, dest)
Streaming Expressions (Solr 5.0+)
Build and execute Solr Streaming Expressions using Python builder functions.
Each function returns a StreamExpression node. Nodes can be chained
with the | (pipe) operator.
Core classes:
- class solr.stream.StreamExpression(func_name, *args, **kwargs)
A Solr streaming expression node. Renders to a Solr expression string via str(). Supports the | (pipe) operator for chaining: the left-hand expression becomes the first positional argument of the right-hand expression.
- Parameters:
func_name – The Solr streaming function name (e.g. "search").
args – Positional arguments (collection names, sub-expressions).
kwargs – Named parameters rendered as key=value pairs. String values containing spaces are auto-quoted.
- __or__(other)
Pipe operator. Inserts self as the first argument of other and returns other.
- __str__()
Render as a Solr streaming expression string, e.g. search(mycore,q="*:*",fl="id,title",sort="id asc").
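The rendering and pipe rules above can be mimicked with a tiny stand-in class. This is a sketch of the behavior, not the library's implementation; for simplicity it quotes every named value rather than only those containing spaces:

```python
class Expr:
    """Minimal stand-in mirroring StreamExpression's rendering rules."""

    def __init__(self, name, *args, **kwargs):
        self.name, self.args, self.kwargs = name, list(args), kwargs

    def __or__(self, other):
        # pipe: left side becomes the first positional argument of the right
        other.args.insert(0, self)
        return other

    def __str__(self):
        parts = [str(a) for a in self.args]
        parts += ['%s="%s"' % (k, v) for k, v in self.kwargs.items()]
        return '%s(%s)' % (self.name, ','.join(parts))

expr = Expr('search', 'mycore', q='*:*') | Expr('unique', over='id')
```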
- class solr.stream.AggregateExpression(func_name, field)
An aggregate function for use inside rollup(), stats(), etc. Renders as func(field).
- Parameters:
func_name – The aggregate function name (e.g. "sum").
field – The Solr field to aggregate over.
Source expressions:
- solr.stream.search(collection, **kwargs)
Build a search() streaming expression.
- Parameters:
collection – Solr collection name.
kwargs – Query parameters: q, fl, sort, rows, fq, etc.
- solr.stream.facet(collection, **kwargs)
Build a facet() streaming expression.
- Parameters:
collection – Solr collection name.
kwargs – Facet parameters: q, buckets, bucketSorts, bucketSizeLimit, etc.
- solr.stream.topic(collection, **kwargs)
Build a topic() streaming expression.
- Parameters:
collection – Solr collection name.
kwargs – Topic parameters: q, fl, id, checkpointEvery, etc.
Transform expressions:
- solr.stream.unique(*args, **kwargs)
Build a unique() expression. De-duplicates a sorted stream by field.
- Parameters:
kwargs – over: the field to de-duplicate on.
- solr.stream.top(*args, **kwargs)
Build a top() expression. Returns the top n tuples by sort order.
- Parameters:
kwargs – n: number of tuples to return; sort: sort clause.
- solr.stream.sort(*args, **kwargs)
Build a sort() expression. Re-sorts a stream.
- Parameters:
kwargs – by: sort clause.
- solr.stream.select(*args, **kwargs)
Build a select() expression. Projects / renames fields.
- solr.stream.rollup(*args, **kwargs)
Build a rollup() expression. Groups a sorted stream and applies aggregate functions.
- Parameters:
kwargs – over: field to group by; additional keyword arguments are aggregate expressions (e.g. total=sum('bytes')).
- solr.stream.reduce(*args, **kwargs)
Build a reduce() expression. Groups a stream by field values.
- Parameters:
kwargs – by: field to reduce on.
Join expressions:
- solr.stream.merge(*args, **kwargs)
Build a merge() expression. Merges two or more sorted streams.
- Parameters:
args – Two or more sub-expressions.
kwargs – on: merge key and direction (e.g. "id asc").
- solr.stream.innerJoin(*args, **kwargs)
Build an innerJoin() expression.
- Parameters:
args – Two sub-expressions (left, right).
kwargs – on: join key mapping (e.g. "left.id=right.id").
- solr.stream.leftOuterJoin(*args, **kwargs)
Build a leftOuterJoin() expression.
- solr.stream.hashJoin(*args, **kwargs)
Build a hashJoin() expression.
- Parameters:
kwargs – on: join key; hashed: the hashed side.
- solr.stream.intersect(*args, **kwargs)
Build an intersect() expression. Returns tuples present in both streams.
- solr.stream.complement(*args, **kwargs)
Build a complement() expression. Returns tuples in the first stream that are not in the second.
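merge() interleaves streams that are each already sorted on the merge key; in plain Python the same idea is heapq.merge (an analogy for intuition, not the library's code):

```python
# Plain-Python analogy for merge(left, right, on="id asc"): interleave two
# streams that are each already sorted on the merge key.
import heapq

left = [{'id': 1, 'src': 'L'}, {'id': 3, 'src': 'L'}]
right = [{'id': 2, 'src': 'R'}, {'id': 4, 'src': 'R'}]

merged = list(heapq.merge(left, right, key=lambda d: d['id']))
print([d['id'] for d in merged])  # [1, 2, 3, 4]
```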
Aggregate functions:
These return AggregateExpression instances for use inside
rollup(), stats(), and similar expressions.
- solr.stream.count(field)
Aggregate: count(field).
- solr.stream.sum(field)
Aggregate: sum(field).
- solr.stream.avg(field)
Aggregate: avg(field).
- solr.stream.min(field)
Aggregate: min(field).
- solr.stream.max(field)
Aggregate: max(field).
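Per the AggregateExpression docs above, each of these renders as func(field). The stand-in below illustrates the rendering (it is not library code):

```python
# Sketch of how an aggregate renders as func(field), mirroring the
# AggregateExpression documentation above.
def aggregate(func_name, field):
    return '%s(%s)' % (func_name, field)

print(aggregate('sum', 'bytes'))  # sum(bytes)
print(aggregate('avg', 'price'))  # avg(price)
```

Note that because count, sum, avg, min, and max share names with Python builtins, importing them unqualified (as in from solr.stream import sum) shadows the builtin of the same name in that namespace.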
Control expressions:
- solr.stream.fetch(*args, **kwargs)
Build a fetch() expression. Enriches tuples by fetching additional fields from a collection.
- Parameters:
kwargs – fl: fields to fetch; on: join key.
- solr.stream.parallel(*args, **kwargs)
Build a parallel() expression. Distributes a stream across workers.
- Parameters:
kwargs – workers: number of parallel workers; sort: merge sort clause.
- solr.stream.daemon(*args, **kwargs)
Build a daemon() expression. Wraps a stream to run continuously.
- Parameters:
kwargs – id: daemon identifier; runInterval: interval in milliseconds; queueSize: internal queue size.
- solr.stream.update(*args, **kwargs)
Build an update() expression. Sends tuples to a collection as documents.
- Parameters:
kwargs – batchSize: number of documents per batch.
- solr.stream.commit(*args, **kwargs)
Build a commit() expression. Commits after an update stream.
Execution via Solr / AsyncSolr:
- Solr.stream(expr, model=None)
Execute a streaming expression via the /stream handler (Solr 5.0+). Returns a synchronous iterator of result dicts. The final EOF marker tuple is automatically skipped.
- Parameters:
expr – A StreamExpression or a raw expression string.
model – Optional Pydantic BaseModel subclass. When provided, each result dict is converted via model.model_validate().
- Returns:
Iterator[dict[str, Any]] (or Iterator[Model]).
- Raises:
SolrVersionError – If the connected Solr server is older than 5.0.
Example:
from solr.stream import search, rollup, sum

expr = (search('logs', q='*:*', fl='host,bytes', sort='host asc')
        | rollup(over='host', total=sum('bytes')))

for doc in conn.stream(expr):
    print(doc)
- async AsyncSolr.stream(expr, model=None)
Async version of Solr.stream(). Returns an async generator.
- Parameters:
expr – A StreamExpression or a raw string.
model – Optional Pydantic model for automatic conversion.
- Returns:
Async generator of result dicts (or model instances).
Example:
async for doc in await conn.stream(expr):
    print(doc)
PysolrCompat class
- class solr.PysolrCompat(url, **kwargs)
A pysolr-compatible wrapper around Solr. Subclasses Solr so all native solrpy features remain available, while adding method aliases that match the pysolr library’s public API. This allows drop-in migration from pysolr with minimal code changes.
Constructor parameters are identical to Solr.
- search(q, **kwargs)
Search for documents. Alias for Solr.select().
- Parameters:
q – The query string.
kwargs – Additional Solr parameters (e.g. rows, fq).
- Returns:
A Response instance.
- add(docs, commit=True, **kwargs)
Add one or more documents. Unlike native Solr.add() (which takes a single dict), this method accepts either a list of dicts or a single dict, matching pysolr behavior.
- Parameters:
docs – A list of document dicts, or a single document dict.
commit – Whether to auto-commit after adding. Defaults to True to match pysolr convention.
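The list-or-dict acceptance described above can be sketched with a small normalization helper (illustrative, not PysolrCompat's actual internals):

```python
# Sketch of pysolr-style document normalization: accept either a single
# dict or an iterable of dicts, and always hand a list to the backend.
def normalize_docs(docs):
    """Return a list of documents whether given one dict or an iterable."""
    if isinstance(docs, dict):
        return [docs]
    return list(docs)

print(normalize_docs({'id': '1'}))                 # [{'id': '1'}]
print(normalize_docs([{'id': '1'}, {'id': '2'}]))  # [{'id': '1'}, {'id': '2'}]
```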
- delete(id=None, q=None, commit=True, **kwargs)
Delete documents by id and/or query. Supports both id= and q= keyword arguments in a single call, matching pysolr’s API.
- Parameters:
id – Document id to delete.
q – Query string; all matching documents will be deleted.
commit – Whether to auto-commit after deleting. Defaults to True to match pysolr convention.
- extract(file_obj, **kwargs)
Extract/index a rich document via Solr Cell. Creates an Extract companion internally and delegates.
- Parameters:
file_obj – A file-like object containing the document bytes.
kwargs – Additional parameters forwarded to Extract.
- Returns:
The extraction result.
Example:
from solr import PysolrCompat

conn = PysolrCompat('http://localhost:8983/solr/mycore')
results = conn.search('title:lucene', rows=10)
conn.add([{'id': '1', 'title': 'Hello'}])
conn.delete(id='1')
conn.delete(q='title:Hello')
conn.commit()
AsyncSolr class
- class solr.AsyncSolr(url, timeout=None, http_user=None, http_pass=None, post_headers=None, max_retries=3, retry_delay=0.1, always_commit=False, response_format='json', auth_token=None, auth=None, debug=False)
Async Solr client built on httpx.AsyncClient. Provides the same API as Solr but with async/await methods.
Constructor parameters are the same as Solr (except persistent, ssl_key, and ssl_cert are not applicable).
Context manager usage:
AsyncSolr should be used as an async context manager to ensure the underlying HTTP client is properly closed:

from solr import AsyncSolr

async with AsyncSolr('http://localhost:8983/solr/mycore') as conn:
    response = await conn.select('*:*')
    for doc in response.results:
        print(doc['id'])
Attributes:
- server_version
Tuple representing the detected Solr version, e.g.
(9, 4, 1).
Search methods:
- async select(q=None, **params)
Async search query. Same parameters as SearchHandler.
- Parameters:
q – Query string.
params – Additional Solr parameters (underscores become dots).
model – Optional Pydantic model class for automatic conversion.
- Returns:
A Response instance.
Update methods:
- async add(doc, **kwargs)
Add a single document.
- Parameters:
doc – Dictionary mapping field names to values.
commit – Force an immediate commit (bool).
timeout – Override the request timeout (float).
- async add_many(docs, **kwargs)
Add multiple documents.
- Parameters:
docs – Iterable of document dicts.
commit – Force an immediate commit (bool).
timeout – Override the request timeout (float).
Delete methods:
- async delete(id=None, **kwargs)
Delete a document by id.
- Parameters:
id – Unique identifier of the document to delete.
commit – Force an immediate commit (bool).
timeout – Override the request timeout (float).
- async delete_query(q, **kwargs)
Delete documents by query.
- Parameters:
q – Solr query string identifying documents to delete.
commit – Force an immediate commit (bool).
timeout – Override the request timeout (float).
Commit:
- async commit(**kwargs)
Commit pending changes.
- Parameters:
soft_commit – If
True, perform a soft commit (bool).
Real-time Get (Solr 4.0+):
- async get(id=None, ids=None, fields=None, model=None)
Retrieve documents from the transaction log.
- Parameters:
id – Single document ID.
ids – List of document IDs.
fields – List of field names to return.
model – Optional Pydantic model class.
- Returns:
A dict for a single id (or None), a list for ids.
Streaming Expressions (Solr 5.0+):
- async stream(expr, model=None)
Execute a streaming expression. Returns an async generator of result dicts (or model instances). Skips the final EOF marker.
- Parameters:
expr – A StreamExpression or string.
model – Optional Pydantic model for automatic conversion.
Usage:
async for doc in await conn.stream(expr):
    print(doc)
Connection management:
- async close()
Close the underlying
httpx.AsyncClient.
- ping()
Ping the Solr server (synchronous). Returns True if reachable.
Paginator
- class solr.SolrPaginator(result, default_page_size=None)
Paginator for a Solr response object. Provides Django-like pagination without any Django dependency.
- Parameters:
result – A Response instance from a query.
default_page_size – Override the page size. If not given, uses the rows parameter from the query, or the number of results returned.
- count
Total number of matching documents.
- num_pages
Total number of pages.
- page_range
A range of valid page numbers.
- page(page_num=1)
Return a SolrPage for the given page number.
- Raises:
PageNotAnInteger – If page_num cannot be converted to int.
EmptyPage – If page_num is out of range.
- class solr.SolrPage
A single page of results.
- object_list
List of documents on this page.
- has_next()
- has_previous()
- has_other_pages()
- next_page_number()
- previous_page_number()
- start_index()
- end_index()
- class solr.EmptyPage
Raised when the requested page is out of range. Subclass of ValueError.
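The pagination attributes above reduce to simple arithmetic over the total hit count and page size. The sketch below is illustrative (not the library's code), using Django-paginator conventions:

```python
# Sketch of the pagination arithmetic behind count, num_pages, start_index,
# and end_index: derive page structure from hit count and page size.
import math

def num_pages(count, page_size):
    """Total number of pages (at least 1, matching paginator convention)."""
    return max(1, math.ceil(count / page_size))

def page_bounds(page_num, page_size, count):
    """1-based start_index/end_index for a page, Django-paginator style."""
    start = (page_num - 1) * page_size + 1
    end = min(page_num * page_size, count)
    return start, end

print(num_pages(95, 10))        # 10
print(page_bounds(10, 10, 95))  # (91, 95)
```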
Response parsing
solrpy provides two response parsers:
- solr.core.parse_query_response(data, params, query)
Parse an XML response from Solr (wt=standard or wt=xml).
- Parameters:
data – A file-like object containing the XML response.
params – Dictionary of query parameters used for the request.
query – The SearchHandler that issued the query (used for next_batch()/previous_batch()).
- Returns:
A Response instance, or None if the response is empty.
This is the default parser used by SearchHandler.
- solr.core.parse_json_response(data, params, query)
Parse a JSON response dict from Solr (wt=json).
- Parameters:
data – A dictionary (already deserialized from JSON).
params – Dictionary of query parameters used for the request.
query – The SearchHandler that issued the query.
- Returns:
A Response instance.
Handles all standard Solr response fields: responseHeader, response (docs, numFound, start, maxScore), and any additional top-level keys such as highlighting, facet_counts, stats, debug, etc. Extra keys are attached directly as Response attributes.
Example usage with a raw JSON query:
import json
import solr
from solr.core import parse_json_response

conn = solr.Solr('http://localhost:8983/solr/mycore')
raw = conn.select.raw(q='*:*', wt='json')
data = json.loads(raw)
response = parse_json_response(data, {'q': '*:*'}, conn.select)
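The "extra top-level keys become attributes" behavior described above can be sketched with a stand-in class (illustrative only; the real Response carries more state):

```python
# Stand-in illustrating how extra top-level JSON keys (highlighting,
# facet_counts, ...) are attached directly as attributes on the parsed
# response. Not the library's actual Response class.

class FakeResponse:
    def __init__(self, data):
        body = data.get('response', {})
        self.results = body.get('docs', [])
        self.numFound = body.get('numFound', 0)
        self.start = body.get('start', 0)
        for key, value in data.items():
            if key not in ('response', 'responseHeader'):
                setattr(self, key, value)

data = {
    'responseHeader': {'status': 0},
    'response': {'numFound': 1, 'start': 0, 'docs': [{'id': '1'}]},
    'highlighting': {'1': {'title': ['<em>Lucene</em>']}},
}
r = FakeResponse(data)
print(r.numFound)      # 1
print(r.highlighting)  # {'1': {'title': ['<em>Lucene</em>']}}
```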
Gzip compression
All requests include an Accept-Encoding: gzip header. When the Solr
server returns a gzip-compressed response, it is transparently decompressed
before parsing.
This reduces network transfer size, especially for large result sets. No configuration is needed; gzip support is always enabled.
- solr.core.read_response(response)
Read an HTTP response body, decompressing gzip if the Content-Encoding header indicates compression.
- Parameters:
response – An http.client.HTTPResponse object.
- Returns:
Decoded string (UTF-8).
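The decompress-then-decode step can be sketched as follows (assumed behavior based on the description above; the helper name and signature here are illustrative, not the library's):

```python
# Sketch of gzip-aware body handling: decompress when Content-Encoding is
# gzip, then decode the bytes as UTF-8.
import gzip

def read_body(raw_bytes, content_encoding=None):
    """Decode a response body, decompressing when Content-Encoding is gzip."""
    if content_encoding == 'gzip':
        raw_bytes = gzip.decompress(raw_bytes)
    return raw_bytes.decode('utf-8')

compressed = gzip.compress('{"responseHeader":{"status":0}}'.encode('utf-8'))
print(read_body(compressed, 'gzip'))  # {"responseHeader":{"status":0}}
```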