Building an AI Chief of Staff with Claude Tool Use


How I turned Claude into a useful assistant using function calling and real tools. Architecture, code patterns, and what actually works.

Most AI assistants are toys.

You ask a question, get a hallucinated answer, maybe it’s helpful. But it’s not grounded in reality.

I wanted something different. An AI that could access my actual data, call real functions, and give me answers based on truth.

The Goal

Build a Chief of Staff that can:

  • Check my calendar and tell me what’s next
  • Look at my finances and answer “can I afford this?”
  • Search my email for specific conversations
  • Analyze transactions and explain where money went
  • Surface what actually matters without me asking

Not a chatbot. A functional tool.

Why Claude?

I tried GPT-4 first. It works, but Claude’s tool use pattern is cleaner:

  1. Typed tool calls: each call arrives as a tool_use content block with JSON input that follows the schema, separate from any text
  2. Multi-step reasoning: better at chaining tool calls together
  3. Consistent output format: fewer parsing headaches
  4. Context handling: better memory of tool results

The pattern just clicked for this use case.

The Architecture

FastAPI backend with three layers:

1. Tool Definitions — Python functions that do real things

2. Claude Integration — API calls with tool schemas

3. Conversational Interface — Chat endpoint that manages state

Here’s the structure:

# tools.py
def get_finance_status():
    """Get current financial health status."""
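    # Assumes fetch_balances() returns one total and the burn rate is per day
    balances = fetch_balances()  # from SimpleFIN
    burn_rate = calculate_burn_rate()
    runway = balances / burn_rate  # days of runway; fails if burn_rate is 0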
    color = get_traffic_light_color(runway)
    
    return {
        "color": color,
        "runway_days": runway,
        "total_balance": balances
    }

def check_calendar(days=1):
    """Check upcoming calendar events."""
    events = fetch_google_calendar(days)
    return [
        {"title": e.title, "start": e.start, "end": e.end}
        for e in events
    ]

def search_email(query, max_results=10):
    """Search email by query string."""
    messages = gmail_search(query)[:max_results]  # honor the max_results cap
    return [
        {"from": m.sender, "subject": m.subject, "snippet": m.snippet}
        for m in messages
    ]

Each function returns structured data. No free-form text. Claude interprets it.

The Tool Schema

Claude needs schemas to understand what tools exist and how to call them:

tools = [
    {
        "name": "get_finance_status",
        "description": "Get current financial health including runway days, balance, and traffic light color (green/yellow/red).",
        "input_schema": {
            "type": "object",
            "properties": {},
            "required": []
        }
    },
    {
        "name": "can_i_afford",
        "description": "Check if a specific purchase amount is affordable without dropping below safe thresholds.",
        "input_schema": {
            "type": "object",
            "properties": {
                "amount": {
                    "type": "number",
                    "description": "The dollar amount to check"
                }
            },
            "required": ["amount"]
        }
    },
    {
        "name": "check_calendar",
        "description": "Get upcoming calendar events for the next N days.",
        "input_schema": {
            "type": "object",
            "properties": {
                "days": {
                    "type": "integer",
                    "description": "Number of days to look ahead (default 1)"
                }
            },
            "required": []
        }
    }
]

The descriptions matter. Claude uses them to decide when to call what.
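The schema list shown here has no entry for search_email, and a tool without a schema is one Claude cannot call. A minimal sketch of what that entry could look like, plus a small check that a schema lines up with the function it describes. The schema_matches_function helper is hypothetical (it is not part of the Anthropic SDK), the description wording is mine, and the stub search_email only mirrors the signature from tools.py.

```python
import inspect


def search_email(query, max_results=10):
    """Stub with the same signature as the tools.py function."""
    return []


# Assumed schema entry for search_email; descriptions are illustrative.
search_email_schema = {
    "name": "search_email",
    "description": "Search email by query string. Returns sender, subject, and snippet for each match.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search terms to match"},
            "max_results": {
                "type": "integer",
                "description": "Maximum messages to return (default 10)",
            },
        },
        "required": ["query"],
    },
}


def schema_matches_function(schema, func):
    """Hypothetical helper: True if the schema's properties and required
    list agree with the function's parameters."""
    params = inspect.signature(func).parameters
    props = set(schema["input_schema"]["properties"])
    required = set(schema["input_schema"]["required"])
    # Parameters without a default value must be listed as required.
    needs_value = {
        name for name, p in params.items() if p.default is inspect.Parameter.empty
    }
    return props == set(params) and required == needs_value
```

Running a check like this at startup catches the easy mistake of renaming a function argument and forgetting the schema.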

The Execution Loop

Here’s the actual pattern:

import json

from anthropic import AsyncAnthropic

client = AsyncAnthropic()  # reads ANTHROPIC_API_KEY from the environment

async def chat(user_message: str, conversation_history: list):
    # Add user message to history
    messages = conversation_history + [
        {"role": "user", "content": user_message}
    ]

    # Initial Claude call (awaited so it doesn't block the event loop)
    response = await client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=4096,
        tools=tools,
        messages=messages
    )

    # Handle tool calls
    while response.stop_reason == "tool_use":
        # Extract tool calls from response
        tool_calls = [
            block for block in response.content
            if block.type == "tool_use"
        ]

        # Execute each tool
        tool_results = []
        for tool_call in tool_calls:
            result = execute_tool(
                tool_call.name,
                tool_call.input
            )
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": tool_call.id,
                "content": json.dumps(result)
            })

        # Send results back to Claude
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": tool_results})

        response = await client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=4096,
            tools=tools,
            messages=messages
        )

    # Return the text of the final response (the first block is not always text)
    return "".join(
        block.text for block in response.content
        if block.type == "text"
    )

Claude calls tools. We execute them. Claude gets results. Claude synthesizes an answer.

The loop continues until Claude has what it needs.
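The loop relies on execute_tool, which isn't shown. One way to write it, assuming a plain dict that maps tool names to functions. The TOOL_REGISTRY name and the stub tools below are assumptions for illustration. Failures come back as data instead of exceptions, so the loop can hand them to Claude and let it recover.

```python
def get_finance_status():
    """Stub standing in for the real tool."""
    return {"color": "green", "runway_days": 32, "total_balance": 12453}


def check_calendar(days=1):
    """Stub standing in for the real tool."""
    return []


# Hypothetical registry: tool name from the schema -> Python callable.
TOOL_REGISTRY = {
    "get_finance_status": get_finance_status,
    "check_calendar": check_calendar,
}


def execute_tool(name, tool_input):
    """Run a registered tool, passing Claude's JSON input as keyword arguments."""
    func = TOOL_REGISTRY.get(name)
    if func is None:
        return {"error": f"Unknown tool: {name}"}
    try:
        return func(**tool_input)
    except Exception as exc:
        # Report the failure to Claude instead of crashing the chat loop.
        return {"error": f"{type(exc).__name__}: {exc}"}
```

When a result carries an error, the tool_result block can also set "is_error": true so Claude knows the call failed.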

Real Examples

Query: “What’s on my calendar tomorrow?”

Claude calls: check_calendar(days=1)

Returns:

[
    {"title": "MagHugg inventory review", "start": "2026-02-07T10:00:00", "end": "2026-02-07T11:00:00"},
    {"title": "Brand.i client call", "start": "2026-02-07T14:00:00", "end": "2026-02-07T15:00:00"}
]

Claude responds: “You have two things tomorrow: MagHugg inventory review at 10am, and a Brand.i client call at 2pm.”


Query: “Can I afford a $300 Starlink upgrade?”

Claude calls: can_i_afford(amount=300)

Returns:

{
    "affordable": true,
    "current_runway": 32,
    "new_runway": 31,
    "color": "green",
    "warning": null
}

Claude responds: “Yes, you can afford it. Would drop runway from 32 to 31 days, still green.”
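For reference, a rough sketch of how a tool like can_i_afford could produce that result. The real implementation isn't shown in this post, so treat everything here as an assumption: the traffic light thresholds (30 and 14 days) are invented, and balance and daily_burn are passed in as arguments only to keep the sketch self-contained.

```python
def get_traffic_light_color(runway_days, green_at=30, yellow_at=14):
    """Map runway to a color. Thresholds are assumed, not the real cutoffs."""
    if runway_days >= green_at:
        return "green"
    if runway_days >= yellow_at:
        return "yellow"
    return "red"


def can_i_afford(amount, balance, daily_burn):
    """Compare runway before and after a purchase."""
    if daily_burn <= 0:
        raise ValueError("daily_burn must be positive")
    current_runway = int(balance / daily_burn)
    new_runway = int((balance - amount) / daily_burn)
    color = get_traffic_light_color(new_runway)
    warning = None
    if color != get_traffic_light_color(current_runway):
        warning = f"Runway color would change to {color}"
    return {
        "affordable": color != "red",  # assumed rule: red means no
        "current_runway": current_runway,
        "new_runway": new_runway,
        "color": color,
        "warning": warning,
    }
```

With a balance of $12,453 and an assumed burn of $389 per day, can_i_afford(300, 12453, 389) gives the same 32 to 31 day drop as the example.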


Query: “Where did all my money go last week?”

Claude calls: get_transactions(days=7) then categorize_spending(transactions)

Returns:

{
    "total_spent": 1847,
    "by_category": {
        "Van": 450,
        "Personal": 320,
        "Brand.i": 780,
        "MagHugg": 297
    }
}

Claude responds: “You spent $1,847 last week. Biggest categories: Brand.i ($780), Van ($450), Personal ($320), MagHugg ($297).”
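The categorize_spending step is plain aggregation. A minimal sketch, assuming each transaction is a dict with "category" and "amount" keys (the real transaction shape from SimpleFIN isn't shown here and will differ):

```python
from collections import defaultdict


def categorize_spending(transactions):
    """Sum amounts per category and return them largest first."""
    totals = defaultdict(int)
    for tx in transactions:
        totals[tx["category"]] += tx["amount"]
    # Sort so the biggest category comes first in the output.
    ordered = dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))
    return {"total_spent": sum(ordered.values()), "by_category": ordered}
```

Doing the arithmetic in Python rather than asking Claude to add up raw transactions keeps the totals exact. Claude only has to phrase the result.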

What Makes This Useful

The key difference between this and a chatbot:

Chatbot: “Your balance is probably around $10k based on what you told me last week.”

Tool-using AI: “Your current balance is $12,453. Runway is 32 days. You’re green.”

One is a guess. The other is real data.

The architecture gives you:

  • Fewer hallucinations: facts come from tool results, not from Claude’s memory (it can still misread a result, so spot-check)
  • Current data: every query reads the latest synced data
  • Multi-step reasoning: Claude can chain tools together
  • Context awareness: conversation history persists

The ADHD Angle

For my ADHD brain, this pattern works because:

  1. Conversations > menus — I ask questions, not navigate UIs
  2. Context is automatic — I don’t have to remember account numbers or dates
  3. Synthesis is handled — Claude connects the dots for me
  4. No mental calculation — Just ask and get an answer

It’s the difference between “I need to check three places and do math” and “Can I afford this? Yes.”

Deployment

Running on a Hetzner VPS:

  • FastAPI app on Uvicorn
  • Nginx reverse proxy
  • Systemd service for auto-restart
  • PostgreSQL for state and caching
  • Cron for background sync jobs

Total cost: $6/month.

Not AWS Lambda. Not serverless. Just a simple API on a cheap VPS. It works.

What I Learned

1. Tool descriptions are critical. Claude decides what to call based on the description. Be specific.

2. Structured data beats free text. Return JSON, not prose. Let Claude interpret.

3. The loop pattern is powerful. Multi-step tool use unlocks complex queries.

4. Context management matters. Every request resends the full history, so keep it trimmed or costs explode.

5. Read-only tools are safer. Start with functions that have no side effects. Write operations need confirmation.
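For the context management lesson, here is one possible way to cap history. It is a sketch, not what runs in my production code. It assumes messages use the same shape as in the chat loop: plain user turns have string content, and tool results are user turns whose content is a list of blocks. The trim_history name and the 20-message default are made up for illustration.

```python
def trim_history(messages, max_messages=20):
    """Keep the most recent messages without orphaning a tool_result.

    A tool_result has to follow the assistant message that holds the
    matching tool_use, so the kept window must start on a plain user turn.
    """
    if len(messages) <= max_messages:
        return messages
    start = len(messages) - max_messages
    for i in range(start, len(messages)):
        msg = messages[i]
        # Plain user turns carry a string; tool results carry a list of blocks.
        if msg["role"] == "user" and isinstance(msg["content"], str):
            return messages[i:]
    return messages  # no safe cut point found; leave history untouched
```

Calling this on conversation_history before each chat request keeps input tokens roughly bounded, at the price of Claude forgetting older turns.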

What’s Next

Features I’m adding:

  • Proactive briefings — “Here’s what you need to know today”
  • iOS Shortcuts integration — Voice commands via Siri
  • Email drafting — “Reply to that client with availability”
  • Task extraction — Pull TODOs from email and calendar

The pattern scales. Add a tool, write a schema, Claude figures out when to use it.

The Code

This isn’t theoretical. It’s running in production right now.

The full codebase is private (connects to my actual accounts), but the pattern is simple:

  1. Write Python functions
  2. Define tool schemas
  3. Build the execution loop
  4. Let Claude orchestrate

You can replicate this in a weekend. The tools you build will be different. But the architecture is the same.

The Takeaway

AI becomes useful when it stops being a chatbot and starts being a tool orchestrator.

Give it access to real data. Let it call real functions. Ground its responses in truth.

That’s how you build something that actually works.