Lifecycle and State Management#
This diagram shows the complete lifecycle and state transitions of the Ollama LLM.
stateDiagram-v2
[*] --> Uninitialized
Uninitialized --> Configured: __init__(model, base_url, ...)
state Configured {
[*] --> ClientNotCreated
note right of ClientNotCreated
State: Configuration stored
- model: str
- base_url: str
- timeout: float
- temperature: float
- json_mode: bool
- _client: None
- _async_client: None
end note
ClientNotCreated --> ClientInitialized: First chat/complete call
state ClientInitialized {
[*] --> Idle
note right of Idle
State: Ready for requests
- _client: Client instance
- _async_client: None or AsyncClient
end note
Idle --> ProcessingChat: chat(messages)
Idle --> ProcessingComplete: complete(prompt)
Idle --> ProcessingStream: chat(messages, stream=True)
Idle --> ProcessingAsync: achat(messages)
Idle --> ProcessingTools: generate_tool_calls(messages, tools)
state ProcessingChat {
[*] --> BuildingRequest
BuildingRequest --> ConvertingMessages: Convert Message objects
ConvertingMessages --> AddingOptions: Add temperature, etc.
AddingOptions --> AddingFormat: Add json format if enabled
AddingFormat --> SendingRequest: HTTP POST to server
SendingRequest --> WaitingResponse: Awaiting response
WaitingResponse --> ParsingResponse: Parse raw dict
ParsingResponse --> CreatingChatResponse: Create ChatResponse
CreatingChatResponse --> [*]
}
state ProcessingComplete {
[*] --> DecoratorWrap
DecoratorWrap --> ConvertToMessage: Wrap prompt as Message
ConvertToMessage --> DelegateToChat: Call chat([message])
DelegateToChat --> ProcessingChat
ProcessingChat --> DecoratorUnwrap: Extract text
DecoratorUnwrap --> CreateCompletionResponse: Create CompletionResponse
CreateCompletionResponse --> [*]
}
state ProcessingStream {
[*] --> BuildingStreamRequest
BuildingStreamRequest --> SendingStreamRequest: stream=True
SendingStreamRequest --> StreamLoop
state StreamLoop {
[*] --> WaitingChunk
WaitingChunk --> ReceivingChunk: Chunk arrives
ReceivingChunk --> ParsingChunk: Parse chunk dict
ParsingChunk --> AccumulatingContent: Accumulate content
AccumulatingContent --> AccumulatingTools: Accumulate tool_calls
AccumulatingTools --> YieldingResponse: Yield ChatResponse
YieldingResponse --> CheckDone: Check done flag
CheckDone --> WaitingChunk: done=False
CheckDone --> [*]: done=True
}
StreamLoop --> [*]
}
state ProcessingAsync {
[*] --> EnsureAsyncClient
EnsureAsyncClient --> BuildingAsyncRequest: Create AsyncClient if needed
BuildingAsyncRequest --> SendingAsyncRequest: await async_client.chat()
SendingAsyncRequest --> WaitingAsyncResponse: Awaiting
WaitingAsyncResponse --> ParsingAsyncResponse: Parse response
ParsingAsyncResponse --> CreatingAsyncChatResponse: Create ChatResponse
CreatingAsyncChatResponse --> [*]
}
state ProcessingTools {
[*] --> PreparingTools
PreparingTools --> ConvertingTools: Convert to Ollama format
ConvertingTools --> ExtractingSchemas: Extract fn_schema from metadata
ExtractingSchemas --> BuildingToolDicts: Build tool dicts
BuildingToolDicts --> MergingKwargs: Add tools to kwargs
MergingKwargs --> CallingChat: Call chat(messages, tools=...)
CallingChat --> ProcessingChat
ProcessingChat --> ValidatingToolResponse: Check tool_calls present
ValidatingToolResponse --> CheckParallel: Check allow_parallel_tool_calls
CheckParallel --> ForcingSingle: False - trim to 1
CheckParallel --> ReturningMultiple: True - keep all
ForcingSingle --> [*]
ReturningMultiple --> [*]
}
ProcessingChat --> Idle: Return ChatResponse
ProcessingComplete --> Idle: Return CompletionResponse
ProcessingStream --> Idle: Stream complete
ProcessingAsync --> Idle: Return ChatResponse
ProcessingTools --> Idle: Return ChatResponse with tools
Idle --> Error: Exception occurs
ProcessingChat --> Error: Network/Parse error
ProcessingComplete --> Error: Network/Parse error
ProcessingStream --> Error: Network/Parse error
ProcessingAsync --> Error: Network/Parse error
ProcessingTools --> Error: Tool validation error
state Error {
[*] --> LoggingError
LoggingError --> RaisingException: Raise appropriate exception
RaisingException --> [*]
}
Error --> Idle: Error handled by caller
}
}
Configured --> [*]: Delete instance
note left of Configured
Lifecycle Phases:
1. Uninitialized: Before __init__
2. Configured: After __init__, clients lazy
3. ClientInitialized: After first use
4. Idle: Ready for requests
5. Processing*: Handling request
6. Error: Exception state
end note
State Transitions#
1. Initialization → Configured#
import os
from serapeum.ollama import Ollama
llm = Ollama(
model="llama3.1",
api_key=os.environ.get("OLLAMA_API_KEY"),
base_url="http://localhost:11434",
timeout=180
)
# State: Configured
# - Configuration fields populated
# - Metadata created (is_chat_model=True, is_function_calling_model=True)
# - _client = None (not yet created)
# - _async_client = None (not yet created)
2. Configured → ClientInitialized (Lazy)#
```python notest from serapeum.core.llms import Message, MessageRole, TextChunk
First call triggers client creation#
response = llm.chat([Message(role=MessageRole.USER, chunks=[TextChunk(content="Hello")])])
Transition:#
- Access self.client property#
- Check if self._client is None → True#
- Create Client(host=self.base_url, timeout=self.timeout)#
- Store in self._client#
State: ClientInitialized → Idle#
- _client = Client instance#
- Ready to process requests#
### 3. Idle → ProcessingChat → Idle
```python notest
# Idle state: Ready for requests
response = llm.chat(messages)
# Transition to ProcessingChat:
# 1. BuildingRequest: Create request dict
# 2. ConvertingMessages: Message objects to dicts
# 3. AddingOptions: Merge temperature, etc.
# 4. AddingFormat: Add json format if enabled
# 5. SendingRequest: client.chat(**request)
# 6. WaitingResponse: Block until response
# 7. ParsingResponse: _chat_from_response(raw)
# 8. CreatingChatResponse: Build ChatResponse object
# Transition back to Idle:
# - Return ChatResponse to caller
4. Idle → ProcessingComplete → Idle#
```python notest
Complete uses decorator pattern#
response = llm.complete(prompt)
Transition to ProcessingComplete:#
1. DecoratorWrap: @chat_to_completion_decorator intercepts#
2. ConvertToMessage: prompt → Message(role=USER, chunks=[TextChunk(content=prompt)])#
3. DelegateToChat: Call self.chat([message])#
[Enters ProcessingChat state]#
4. DecoratorUnwrap: Extract message.content#
5. CreateCompletionResponse: Wrap in CompletionResponse#
Transition back to Idle:#
- Return CompletionResponse to caller#
### 5. Idle → ProcessingStream → Idle
```python notest
# Streaming maintains state across multiple yields
for chunk in llm.chat(messages, stream=True):
print(chunk.message.content)
# Transition to ProcessingStream:
# 1. BuildingStreamRequest: Create request with stream=True
# 2. SendingStreamRequest: client.chat(stream=True)
# 3. StreamLoop - for each chunk:
# a. WaitingChunk: Block for next chunk
# b. ReceivingChunk: Chunk dict arrives
# c. ParsingChunk: _chat_stream_from_response(chunk)
# d. AccumulatingContent: Append to content buffer
# e. AccumulatingTools: Append to tool_calls buffer
# f. YieldingResponse: Create and yield ChatResponse with delta
# g. CheckDone: If done=True, exit loop
# 4. StreamLoop exits when done=True
# Transition back to Idle:
# - Generator exhausted
6. Idle → ProcessingAsync → Idle#
```python notest
Async uses separate client and event loop#
response = await llm.achat(messages)
Transition to ProcessingAsync:#
1. EnsureAsyncClient: Check self._async_client#
- If None, create AsyncClient(host=base_url, timeout=timeout)#
2. BuildingAsyncRequest: Create request dict#
3. SendingAsyncRequest: await async_client.chat(**request)#
4. WaitingAsyncResponse: Coroutine awaits response#
5. ParsingAsyncResponse: _chat_from_response(raw)#
6. CreatingAsyncChatResponse: Build ChatResponse#
Transition back to Idle:#
- Return ChatResponse to caller#
### 7. Idle → ProcessingTools → Idle
```python notest
# Tool calling adds preparation and validation steps
response = llm.generate_tool_calls(messages, tools)
# Transition to ProcessingTools:
# 1. PreparingTools: _prepare_chat_with_tools(messages, tools)
# 2. ConvertingTools: For each tool:
# a. ExtractingSchemas: Get tool.metadata.fn_schema
# b. BuildingToolDicts: Create Ollama tool dict format
# 3. MergingKwargs: Add tools to kwargs
# 4. CallingChat: Delegate to chat(messages, **kwargs)
# [Enters ProcessingChat state]
# 5. ValidatingToolResponse: _validate_chat_with_tools_response
# 6. CheckParallel: Check allow_parallel_tool_calls flag
# - If False: ForcingSingle → trim to first tool call
# - If True: ReturningMultiple → keep all tool calls
# Transition back to Idle:
# - Return ChatResponse with tool_calls
8. Any State → Error → Idle#
```python notest try: response = llm.chat(messages) except Exception as e: # Handle error
Error transition can occur from:#
- ProcessingChat: Network timeout, invalid response#
- ProcessingComplete: Any chat error propagates#
- ProcessingStream: Chunk parsing error#
- ProcessingAsync: Async operation failure#
- ProcessingTools: Tool schema validation error#
Error state:#
1. LoggingError: Log exception details#
2. RaisingException: Raise appropriate exception type#
- TimeoutError: timeout exceeded#
- ConnectionError: Cannot reach server#
- ValueError: Invalid response format#
- KeyError: Missing required field in response#
Transition back to Idle:#
- Exception handled by caller#
- Instance still usable for next call#
## State Variables
### Configuration State (Immutable after init)
```python notest
# Set during __init__, never change
self.model: str = "llama3.1"
self.base_url: str = "http://localhost:11434"
self.timeout: float = 180.0
self.temperature: float = 0.75
self.context_window: int = 3900
self.prompt_key: str = "prompt"
self.json_mode: bool = False
self.additional_kwargs: dict[str, Any] = {}
self.keep_alive: Optional[str] = None
self._is_function_calling_model: bool = True
Client State (Mutable, lazy-initialized)#
```python notest
None until first use#
self._client: Client | None = None self._async_client: AsyncClient | None = None
After first sync call#
self._client: Client = Client(host=self.base_url, timeout=self.timeout)
After first async call#
self._async_client: AsyncClient = AsyncClient(host=self.base_url, timeout=self.timeout)
### Request State (Per-call, transient)
```python notest
# Created fresh for each call, not stored
request_dict = {
"model": self.model,
"messages": [...],
"options": {"temperature": self.temperature, ...},
"stream": False,
"format": "json" if self.json_mode else None,
"tools": [...] if tools else None,
"keep_alive": self.keep_alive
}
Streaming State (Per-stream, transient)#
```python notest
Maintained during stream, not stored on instance#
accumulated_content: str = "" accumulated_tool_calls: list[dict] = [] current_chunk: dict = {} done: bool = False
### Response State (Per-call, returned)
```python notest
# Created and returned, not stored
chat_response = ChatResponse(
message=Message(
role=MessageRole.ASSISTANT,
chunks=[TextChunk(content="...")],
additional_kwargs={"tool_calls": [...]}
),
raw={...},
additional_kwargs={...}
)
Lifecycle Diagram#
graph LR
A[Create Instance] --> B[Configured State]
B --> C{First Call?}
C -->|Yes| D[Initialize Client]
C -->|No| E[Use Existing Client]
D --> E
E --> F[Process Request]
F --> G{Success?}
G -->|Yes| H[Return Response]
G -->|No| I[Handle Error]
H --> J{More Calls?}
I --> J
J -->|Yes| E
J -->|No| K[Delete Instance]
style A fill:#e1f5ff
style B fill:#fff9c4
style D fill:#f3e5f5
style F fill:#e0f2f1
style H fill:#c8e6c9
style I fill:#ffcdd2
style K fill:#efebe9
Concurrency Considerations#
Thread Safety#
Ollama instance is NOT thread-safe by default:
- _client and _async_client are shared state
- Lazy initialization is not synchronized
Recommendation:
- Use separate Ollama instance per thread
- Or use locks around lazy initialization
Async Safety#
Ollama async methods are event-loop safe:
- Uses separate _async_client per event loop
- No shared mutable state in async methods
Safe to use:
- Multiple concurrent achat() calls in same loop
- Multiple event loops with same instance (separate clients)
Streaming State#
Each stream maintains its own state:
- Generator/async generator has local variables
- No shared state between streams
Safe to have:
- Multiple concurrent streams from same instance
State Management Best Practices#
1. Initialization#
```python notest import os from serapeum.ollama import Ollama
✓ Good: Initialize once, reuse#
llm = Ollama(model="llama3.1", api_key=os.environ.get("OLLAMA_API_KEY"), timeout=180)
✗ Bad: Create new instance per call#
def get_response(prompt): llm = Ollama(model="llama3.1", api_key=os.environ.get("OLLAMA_API_KEY")) # Inefficient return llm.complete(prompt)
### 2. Client Reuse
```python notest
import os
from serapeum.ollama import Ollama
# ✓ Good: Client automatically reused
llm = Ollama(model="llama3.1", api_key=os.environ.get("OLLAMA_API_KEY"))
response1 = llm.chat(messages1) # Creates client
response2 = llm.chat(messages2) # Reuses client
# ✗ Bad: Don't access _client directly
llm._client = None # Don't do this
3. Configuration#
```python notest import os from serapeum.ollama import Ollama
✓ Good: Set configuration at init#
llm = Ollama(model="llama3.1", api_key=os.environ.get("OLLAMA_API_KEY"), temperature=0.8, json_mode=True)
✗ Bad: Don't modify config after init#
llm.temperature = 0.5 # Config is immutable
### 4. Error Handling
```python notest
import os
from serapeum.ollama import Ollama
# ✓ Good: Instance remains usable after error
llm = Ollama(model="llama3.1", api_key=os.environ.get("OLLAMA_API_KEY"))
try:
response = llm.chat(messages)
except TimeoutError:
# Can still use llm for next call
response = llm.chat(messages, temperature=0.2)
# ✓ Good: Instance is reusable
5. Streaming#
```python notest
✓ Good: Complete stream before next call#
for chunk in llm.chat(messages1, stream=True): process(chunk) response = llm.chat(messages2) # Safe
⚠ Warning: Interleaving streams#
stream1 = llm.chat(messages1, stream=True) stream2 = llm.chat(messages2, stream=True) # Both use same client ```