Use this skill any time a spreadsheet file is the primary input or output. This means any task where the user wants to: open, read, edit, or fix an existing .xlsx, .xlsm, .csv, or .tsv file (e.g., adding columns, computing formulas, formatting, charting, cleaning messy data); create a new spreadsheet from scratch or from other data sources; or convert between tabular file formats. Trigger especially when the user references a spreadsheet file by name or path — even casually (like \"the xlsx in my downloads\") — and wants something done to it or produced from it. Also trigger for cleaning or restructuring messy tabular data files (malformed rows, misplaced headers, junk data) into proper spreadsheets. The deliverable must be a spreadsheet file. Do NOT trigger when the primary deliverable is a Word document, HTML report, standalone Python script, database pipeline, or Google Sheets API integration, even if tabular data is involved.
Add this skill
npx mdskills install anthropics/spreadsheets-xlsxComprehensive Excel automation with strong formatting standards and formula best practices
1---2name: xlsx3description: "Use this skill any time a spreadsheet file is the primary input or output. This means any task where the user wants to: open, read, edit, or fix an existing .xlsx, .xlsm, .csv, or .tsv file (e.g., adding columns, computing formulas, formatting, charting, cleaning messy data); create a new spreadsheet from scratch or from other data sources; or convert between tabular file formats. Trigger especially when the user references a spreadsheet file by name or path — even casually (like \"the xlsx in my downloads\") — and wants something done to it or produced from it. Also trigger for cleaning or restructuring messy tabular data files (malformed rows, misplaced headers, junk data) into proper spreadsheets. The deliverable must be a spreadsheet file. Do NOT trigger when the primary deliverable is a Word document, HTML report, standalone Python script, database pipeline, or Google Sheets API integration, even if tabular data is involved."4license: Proprietary. LICENSE.txt has complete terms5---67# Requirements for Outputs89## All Excel files1011### Professional Font12- Use a consistent, professional font (e.g., Arial, Times New Roman) for all deliverables unless otherwise instructed by the user1314### Zero Formula Errors15- Every Excel model MUST be delivered with ZERO formula errors (#REF!, #DIV/0!, #VALUE!, #N/A, #NAME?)1617### Preserve Existing Templates (when updating templates)18- Study and EXACTLY match existing format, style, and conventions when modifying files19- Never impose standardized formatting on files with established patterns20- Existing template conventions ALWAYS override these guidelines2122## Financial models2324### Color Coding Standards25Unless otherwise stated by the user or existing template2627#### Industry-Standard Color Conventions28- **Blue text (RGB: 0,0,255)**: Hardcoded inputs, and numbers users will change for scenarios29- **Black text (RGB: 0,0,0)**: ALL formulas and calculations30- **Green text (RGB: 0,128,0)**: Links pulling from other worksheets within same workbook31- **Red text (RGB: 255,0,0)**: External links to other files32- **Yellow background (RGB: 255,255,0)**: Key assumptions needing attention or cells that need to be updated3334### Number Formatting Standards3536#### Required Format Rules37- **Years**: Format as text strings (e.g., "2024" not "2,024")38- **Currency**: Use $#,##0 format; ALWAYS specify units in headers ("Revenue ($mm)")39- **Zeros**: Use number formatting to make all zeros "-", including percentages (e.g., "$#,##0;($#,##0);-")40- **Percentages**: Default to 0.0% format (one decimal)41- **Multiples**: Format as 0.0x for valuation multiples (EV/EBITDA, P/E)42- **Negative numbers**: Use parentheses (123) not minus -1234344### Formula Construction Rules4546#### Assumptions Placement47- Place ALL assumptions (growth rates, margins, multiples, etc.) in separate assumption cells48- Use cell references instead of hardcoded values in formulas49- Example: Use =B5*(1+$B$6) instead of =B5*1.055051#### Formula Error Prevention52- Verify all cell references are correct53- Check for off-by-one errors in ranges54- Ensure consistent formulas across all projection periods55- Test with edge cases (zero values, negative numbers)56- Verify no unintended circular references5758#### Documentation Requirements for Hardcodes59- Comment or in cells beside (if end of table). Format: "Source: [System/Document], [Date], [Specific Reference], [URL if applicable]"60- Examples:61 - "Source: Company 10-K, FY2024, Page 45, Revenue Note, [SEC EDGAR URL]"62 - "Source: Company 10-Q, Q2 2025, Exhibit 99.1, [SEC EDGAR URL]"63 - "Source: Bloomberg Terminal, 8/15/2025, AAPL US Equity"64 - "Source: FactSet, 8/20/2025, Consensus Estimates Screen"6566# XLSX creation, editing, and analysis6768## Overview6970A user may ask you to create, edit, or analyze the contents of an .xlsx file. You have different tools and workflows available for different tasks.7172## Important Requirements7374**LibreOffice Required for Formula Recalculation**: You can assume LibreOffice is installed for recalculating formula values using the `scripts/recalc.py` script. The script automatically configures LibreOffice on first run, including in sandboxed environments where Unix sockets are restricted (handled by `scripts/office/soffice.py`)7576## Reading and analyzing data7778### Data analysis with pandas79For data analysis, visualization, and basic operations, use **pandas** which provides powerful data manipulation capabilities:8081```python82import pandas as pd8384# Read Excel85df = pd.read_excel('file.xlsx') # Default: first sheet86all_sheets = pd.read_excel('file.xlsx', sheet_name=None) # All sheets as dict8788# Analyze89df.head() # Preview data90df.info() # Column info91df.describe() # Statistics9293# Write Excel94df.to_excel('output.xlsx', index=False)95```9697## Excel File Workflows9899## CRITICAL: Use Formulas, Not Hardcoded Values100101**Always use Excel formulas instead of calculating values in Python and hardcoding them.** This ensures the spreadsheet remains dynamic and updateable.102103### ❌ WRONG - Hardcoding Calculated Values104```python105# Bad: Calculating in Python and hardcoding result106total = df['Sales'].sum()107sheet['B10'] = total # Hardcodes 5000108109# Bad: Computing growth rate in Python110growth = (df.iloc[-1]['Revenue'] - df.iloc[0]['Revenue']) / df.iloc[0]['Revenue']111sheet['C5'] = growth # Hardcodes 0.15112113# Bad: Python calculation for average114avg = sum(values) / len(values)115sheet['D20'] = avg # Hardcodes 42.5116```117118### ✅ CORRECT - Using Excel Formulas119```python120# Good: Let Excel calculate the sum121sheet['B10'] = '=SUM(B2:B9)'122123# Good: Growth rate as Excel formula124sheet['C5'] = '=(C4-C2)/C2'125126# Good: Average using Excel function127sheet['D20'] = '=AVERAGE(D2:D19)'128```129130This applies to ALL calculations - totals, percentages, ratios, differences, etc. The spreadsheet should be able to recalculate when source data changes.131132## Common Workflow1331. **Choose tool**: pandas for data, openpyxl for formulas/formatting1342. **Create/Load**: Create new workbook or load existing file1353. **Modify**: Add/edit data, formulas, and formatting1364. **Save**: Write to file1375. **Recalculate formulas (MANDATORY IF USING FORMULAS)**: Use the scripts/recalc.py script138 ```bash139 python scripts/recalc.py output.xlsx140 ```1416. **Verify and fix any errors**:142 - The script returns JSON with error details143 - If `status` is `errors_found`, check `error_summary` for specific error types and locations144 - Fix the identified errors and recalculate again145 - Common errors to fix:146 - `#REF!`: Invalid cell references147 - `#DIV/0!`: Division by zero148 - `#VALUE!`: Wrong data type in formula149 - `#NAME?`: Unrecognized formula name150151### Creating new Excel files152153```python154# Using openpyxl for formulas and formatting155from openpyxl import Workbook156from openpyxl.styles import Font, PatternFill, Alignment157158wb = Workbook()159sheet = wb.active160161# Add data162sheet['A1'] = 'Hello'163sheet['B1'] = 'World'164sheet.append(['Row', 'of', 'data'])165166# Add formula167sheet['B2'] = '=SUM(A1:A10)'168169# Formatting170sheet['A1'].font = Font(bold=True, color='FF0000')171sheet['A1'].fill = PatternFill('solid', start_color='FFFF00')172sheet['A1'].alignment = Alignment(horizontal='center')173174# Column width175sheet.column_dimensions['A'].width = 20176177wb.save('output.xlsx')178```179180### Editing existing Excel files181182```python183# Using openpyxl to preserve formulas and formatting184from openpyxl import load_workbook185186# Load existing file187wb = load_workbook('existing.xlsx')188sheet = wb.active # or wb['SheetName'] for specific sheet189190# Working with multiple sheets191for sheet_name in wb.sheetnames:192 sheet = wb[sheet_name]193 print(f"Sheet: {sheet_name}")194195# Modify cells196sheet['A1'] = 'New Value'197sheet.insert_rows(2) # Insert row at position 2198sheet.delete_cols(3) # Delete column 3199200# Add new sheet201new_sheet = wb.create_sheet('NewSheet')202new_sheet['A1'] = 'Data'203204wb.save('modified.xlsx')205```206207## Recalculating formulas208209Excel files created or modified by openpyxl contain formulas as strings but not calculated values. Use the provided `scripts/recalc.py` script to recalculate formulas:210211```bash212python scripts/recalc.py <excel_file> [timeout_seconds]213```214215Example:216```bash217python scripts/recalc.py output.xlsx 30218```219220The script:221- Automatically sets up LibreOffice macro on first run222- Recalculates all formulas in all sheets223- Scans ALL cells for Excel errors (#REF!, #DIV/0!, etc.)224- Returns JSON with detailed error locations and counts225- Works on both Linux and macOS226227## Formula Verification Checklist228229Quick checks to ensure formulas work correctly:230231### Essential Verification232- [ ] **Test 2-3 sample references**: Verify they pull correct values before building full model233- [ ] **Column mapping**: Confirm Excel columns match (e.g., column 64 = BL, not BK)234- [ ] **Row offset**: Remember Excel rows are 1-indexed (DataFrame row 5 = Excel row 6)235236### Common Pitfalls237- [ ] **NaN handling**: Check for null values with `pd.notna()`238- [ ] **Far-right columns**: FY data often in columns 50+239- [ ] **Multiple matches**: Search all occurrences, not just first240- [ ] **Division by zero**: Check denominators before using `/` in formulas (#DIV/0!)241- [ ] **Wrong references**: Verify all cell references point to intended cells (#REF!)242- [ ] **Cross-sheet references**: Use correct format (Sheet1!A1) for linking sheets243244### Formula Testing Strategy245- [ ] **Start small**: Test formulas on 2-3 cells before applying broadly246- [ ] **Verify dependencies**: Check all cells referenced in formulas exist247- [ ] **Test edge cases**: Include zero, negative, and very large values248249### Interpreting scripts/recalc.py Output250The script returns JSON with error details:251```json252{253 "status": "success", // or "errors_found"254 "total_errors": 0, // Total error count255 "total_formulas": 42, // Number of formulas in file256 "error_summary": { // Only present if errors found257 "#REF!": {258 "count": 2,259 "locations": ["Sheet1!B5", "Sheet1!C10"]260 }261 }262}263```264265## Best Practices266267### Library Selection268- **pandas**: Best for data analysis, bulk operations, and simple data export269- **openpyxl**: Best for complex formatting, formulas, and Excel-specific features270271### Working with openpyxl272- Cell indices are 1-based (row=1, column=1 refers to cell A1)273- Use `data_only=True` to read calculated values: `load_workbook('file.xlsx', data_only=True)`274- **Warning**: If opened with `data_only=True` and saved, formulas are replaced with values and permanently lost275- For large files: Use `read_only=True` for reading or `write_only=True` for writing276- Formulas are preserved but not evaluated - use scripts/recalc.py to update values277278### Working with pandas279- Specify data types to avoid inference issues: `pd.read_excel('file.xlsx', dtype={'id': str})`280- For large files, read specific columns: `pd.read_excel('file.xlsx', usecols=['A', 'C', 'E'])`281- Handle dates properly: `pd.read_excel('file.xlsx', parse_dates=['date_column'])`282283## Code Style Guidelines284**IMPORTANT**: When generating Python code for Excel operations:285- Write minimal, concise Python code without unnecessary comments286- Avoid verbose variable names and redundant operations287- Avoid unnecessary print statements288289**For Excel files themselves**:290- Add comments to cells with complex formulas or important assumptions291- Document data sources for hardcoded values292- Include notes for key calculations and model sections
Full transparency — inspect the skill content before installing.