Skip to content

get_file_structure

Analyze file structure and retrieve comprehensive metadata and statistics.

Overview

The get_file_structure tool provides detailed information about a file's structure, including line count, size, encoding, and statistical analysis. This is essential for understanding file characteristics before processing.

Usage

json
{
  "tool": "get_file_structure",
  "arguments": {
    "filePath": "/data/large-dataset.csv"
  }
}

Parameters

ParameterTypeRequiredDefaultDescription
filePathstringYes-Absolute or relative path to the file

Response Format

typescript
{
  filePath: string;          // Absolute path to file
  fileName: string;          // File name without path
  fileSize: number;          // Size in bytes
  totalLines: number;        // Total line count
  encoding: string;          // Detected encoding (e.g., 'utf8')
  detectedType: string;      // File type (e.g., 'log', 'csv', 'json')
  chunkSize: number;         // Recommended lines per chunk
  totalChunks: number;       // Total number of chunks
  created: Date;             // File creation date
  modified: Date;            // Last modified date
  statistics: {
    avgLineLength: number;   // Average characters per line
    maxLineLength: number;   // Longest line length
    minLineLength: number;   // Shortest line length
    emptyLines: number;      // Count of empty lines
  };
}

Examples

Basic File Analysis

Analyze a log file:

json
{
  "tool": "get_file_structure",
  "arguments": {
    "filePath": "/var/log/system.log"
  }
}

Response:

json
{
  "filePath": "/var/log/system.log",
  "fileName": "system.log",
  "fileSize": 10485760,
  "totalLines": 25000,
  "encoding": "utf8",
  "detectedType": "log",
  "chunkSize": 500,
  "totalChunks": 50,
  "created": "2024-01-01T00:00:00.000Z",
  "modified": "2024-01-10T15:30:00.000Z",
  "statistics": {
    "avgLineLength": 120,
    "maxLineLength": 512,
    "minLineLength": 45,
    "emptyLines": 150
  }
}

CSV File Structure

Analyze a CSV dataset:

json
{
  "tool": "get_file_structure",
  "arguments": {
    "filePath": "/data/transactions.csv"
  }
}

Response:

json
{
  "filePath": "/data/transactions.csv",
  "fileName": "transactions.csv",
  "fileSize": 52428800,
  "totalLines": 500000,
  "encoding": "utf8",
  "detectedType": "csv",
  "chunkSize": 1000,
  "totalChunks": 500,
  "created": "2024-01-05T08:00:00.000Z",
  "modified": "2024-01-10T14:00:00.000Z",
  "statistics": {
    "avgLineLength": 105,
    "maxLineLength": 250,
    "minLineLength": 95,
    "emptyLines": 0
  }
}

Code File Analysis

Analyze a TypeScript file:

json
{
  "tool": "get_file_structure",
  "arguments": {
    "filePath": "/code/app.ts"
  }
}

Response:

json
{
  "filePath": "/code/app.ts",
  "fileName": "app.ts",
  "fileSize": 65536,
  "totalLines": 1250,
  "encoding": "utf8",
  "detectedType": "code",
  "chunkSize": 300,
  "totalChunks": 5,
  "created": "2024-01-01T10:00:00.000Z",
  "modified": "2024-01-10T16:45:00.000Z",
  "statistics": {
    "avgLineLength": 52,
    "maxLineLength": 180,
    "minLineLength": 0,
    "emptyLines": 85
  }
}

File Type Detection

The tool automatically detects file type based on extension:

ExtensionDetected TypeChunk SizeTypical Use Case
.txttext500Plain text files
.loglog500Application logs
.csvcsv1000Data exports
.jsonjson100Configuration, API data
.xmlxml200Structured data
.mdmarkdown500Documentation
.ts, .js, .py, .javacode300Source code
.yml, .yamlconfig300Configuration
.sqlsql300Database scripts
.sh, .bashshell300Shell scripts

Use Cases

1. Pre-Processing Assessment

Determine optimal processing strategy:

typescript
const structure = await get_file_structure({
  filePath: "/data/large.csv"
});

if (structure.fileSize > 100_000_000) {
  // Use streaming approach
  console.log("Large file detected, using streaming");
} else {
  // Can load into memory
  console.log("Small file, loading directly");
}

console.log(`Will process in ${structure.totalChunks} chunks`);

2. Resource Planning

Calculate processing time and memory requirements:

typescript
const structure = await get_file_structure({
  filePath: "/logs/app.log"
});

const estimatedMemory = structure.statistics.avgLineLength * structure.chunkSize;
const estimatedTime = structure.totalChunks * 100; // 100ms per chunk

console.log(`Memory per chunk: ${estimatedMemory} bytes`);
console.log(`Estimated processing time: ${estimatedTime}ms`);

3. Data Quality Check

Identify potential issues:

typescript
const structure = await get_file_structure({
  filePath: "/data/import.csv"
});

// Check for unusual line lengths
if (structure.statistics.maxLineLength > 10000) {
  console.warn("Unusually long lines detected");
}

// Check for empty lines
const emptyLinePercent =
  (structure.statistics.emptyLines / structure.totalLines) * 100;

if (emptyLinePercent > 10) {
  console.warn(`${emptyLinePercent}% empty lines`);
}

4. File Comparison

Compare multiple files:

typescript
const file1 = await get_file_structure({
  filePath: "/data/old.csv"
});

const file2 = await get_file_structure({
  filePath: "/data/new.csv"
});

console.log(`Line difference: ${file2.totalLines - file1.totalLines}`);
console.log(`Size difference: ${file2.fileSize - file1.fileSize} bytes`);

5. Archive Decision

Determine if file should be archived:

typescript
const structure = await get_file_structure({
  filePath: "/logs/old.log"
});

const daysSinceModified =
  (Date.now() - structure.modified.getTime()) / (1000 * 60 * 60 * 24);

if (daysSinceModified > 30 && structure.fileSize > 10_000_000) {
  console.log("Consider archiving this file");
}

Statistics Interpretation

Average Line Length

Indicates file structure:

  • < 50 chars: Likely structured data or code
  • 50-200 chars: Normal text/logs
  • > 200 chars: Verbose logs or JSON

Max Line Length

Warns about potential issues:

  • > 1000 chars: May cause performance issues
  • > 10000 chars: Consider pre-processing

Empty Lines

Indicates formatting:

  • 0%: Dense data files (CSV, JSON)
  • 5-10%: Normal code/text
  • > 20%: Sparse formatting or issues

Performance

File SizeAnalysis TimeMemory UsageNotes
< 1MB< 50msMinimalFull scan
1-10MB50-200ms< 10MBStreaming
10-100MB200-1000ms< 50MBLine counting
100MB-1GB1-5s< 100MBOptimized scan
> 1GB5-30s< 200MBProgressive

Error Handling

File Not Found

json
{
  "error": "File not found: /path/to/file.csv",
  "code": "ENOENT"
}

Permission Denied

json
{
  "error": "Permission denied: /root/protected.log",
  "code": "EACCES"
}

Unsupported File Type

json
{
  "error": "Binary file not supported: /data/image.png",
  "code": "UNSUPPORTED_TYPE"
}

Best Practices

1. Always Analyze Before Processing

Check file structure before heavy operations:

typescript
// Good: Analyze first
const structure = await get_file_structure({ filePath });
console.log(`Processing ${structure.totalChunks} chunks`);

// Then process
for (let i = 0; i < structure.totalChunks; i++) {
  await process_chunk(filePath, i);
}

2. Cache Structure Information

Structure rarely changes, cache it:

typescript
const structureCache = new Map();

async function getStructureCached(filePath) {
  if (!structureCache.has(filePath)) {
    const structure = await get_file_structure({ filePath });
    structureCache.set(filePath, structure);
  }
  return structureCache.get(filePath);
}

3. Validate File Size

Check size before processing:

typescript
const structure = await get_file_structure({ filePath });

if (structure.fileSize > MAX_FILE_SIZE) {
  throw new Error(`File too large: ${structure.fileSize} bytes`);
}

4. Use Statistics for Optimization

Adapt chunk size based on line length:

typescript
const structure = await get_file_structure({ filePath });

const optimalChunkSize = structure.statistics.avgLineLength < 100
  ? 1000  // Small lines, larger chunks
  : 300;  // Large lines, smaller chunks

See Also

Released under the MIT License.