get_file_structure

Analyze file structure and retrieve comprehensive metadata and statistics.

Overview

The get_file_structure tool reports a file's line count, size, encoding, detected type, recommended chunking, and line-length statistics. Use it to understand a file's characteristics before processing it.

Usage

json

{
  "tool": "get_file_structure",
  "arguments": {
    "filePath": "/data/large-dataset.csv"
  }
}

Parameters

Parameter	Type	Required	Default	Description
`filePath`	string	Yes	-	Absolute or relative path to the file

Response Format

typescript

{
  filePath: string;          // Absolute path to file
  fileName: string;          // File name without path
  fileSize: number;          // Size in bytes
  totalLines: number;        // Total line count
  encoding: string;          // Detected encoding (e.g., 'utf8')
  detectedType: string;      // File type (e.g., 'log', 'csv', 'json')
  chunkSize: number;         // Recommended lines per chunk
  totalChunks: number;       // Total number of chunks
  created: Date;             // File creation date
  modified: Date;            // Last modified date
  statistics: {
    avgLineLength: number;   // Average characters per line
    maxLineLength: number;   // Longest line length
    minLineLength: number;   // Shortest line length
    emptyLines: number;      // Count of empty lines
  };
}
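The sample responses in the Examples section are consistent with `totalChunks` being `totalLines` divided by `chunkSize`, rounded up. The sketch below assumes that relationship; it is inferred from the examples rather than documented, and `expectedChunks` and `chunkLineRange` are hypothetical helpers, as is the zero-based chunk index.

```typescript
// Hypothetical helper: assumes totalChunks = ceil(totalLines / chunkSize),
// which matches the sample responses in this document but is not a documented guarantee.
function expectedChunks(totalLines: number, chunkSize: number): number {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  return Math.ceil(totalLines / chunkSize);
}

// Line range (1-indexed, inclusive) covered by a chunk.
// Assumption: chunk indexes are zero-based.
function chunkLineRange(
  chunkIndex: number,
  chunkSize: number,
  totalLines: number
): { start: number; end: number } {
  const start = chunkIndex * chunkSize + 1;
  const end = Math.min(start + chunkSize - 1, totalLines);
  return { start, end };
}

console.log(expectedChunks(1250, 300));    // 5
console.log(chunkLineRange(4, 300, 1250)); // { start: 1201, end: 1250 }
```

The last chunk is usually shorter than `chunkSize`, so code that iterates over chunks should not assume every chunk holds the same number of lines.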

Examples

Basic File Analysis

Analyze a log file:

json

{
  "tool": "get_file_structure",
  "arguments": {
    "filePath": "/var/log/system.log"
  }
}

Response:

json

{
  "filePath": "/var/log/system.log",
  "fileName": "system.log",
  "fileSize": 10485760,
  "totalLines": 25000,
  "encoding": "utf8",
  "detectedType": "log",
  "chunkSize": 500,
  "totalChunks": 50,
  "created": "2024-01-01T00:00:00.000Z",
  "modified": "2024-01-10T15:30:00.000Z",
  "statistics": {
    "avgLineLength": 120,
    "maxLineLength": 512,
    "minLineLength": 45,
    "emptyLines": 150
  }
}

CSV File Structure

Analyze a CSV dataset:

json

{
  "tool": "get_file_structure",
  "arguments": {
    "filePath": "/data/transactions.csv"
  }
}

Response:

json

{
  "filePath": "/data/transactions.csv",
  "fileName": "transactions.csv",
  "fileSize": 52428800,
  "totalLines": 500000,
  "encoding": "utf8",
  "detectedType": "csv",
  "chunkSize": 1000,
  "totalChunks": 500,
  "created": "2024-01-05T08:00:00.000Z",
  "modified": "2024-01-10T14:00:00.000Z",
  "statistics": {
    "avgLineLength": 105,
    "maxLineLength": 250,
    "minLineLength": 95,
    "emptyLines": 0
  }
}

Code File Analysis

Analyze a TypeScript file:

json

{
  "tool": "get_file_structure",
  "arguments": {
    "filePath": "/code/app.ts"
  }
}

Response:

json

{
  "filePath": "/code/app.ts",
  "fileName": "app.ts",
  "fileSize": 65536,
  "totalLines": 1250,
  "encoding": "utf8",
  "detectedType": "code",
  "chunkSize": 300,
  "totalChunks": 5,
  "created": "2024-01-01T10:00:00.000Z",
  "modified": "2024-01-10T16:45:00.000Z",
  "statistics": {
    "avgLineLength": 52,
    "maxLineLength": 180,
    "minLineLength": 0,
    "emptyLines": 85
  }
}

File Type Detection

The tool automatically detects file type based on extension:

Extension	Detected Type	Chunk Size	Typical Use Case
.txt	text	500	Plain text files
.log	log	500	Application logs
.csv	csv	1000	Data exports
.json	json	100	Configuration, API data
.xml	xml	200	Structured data
.md	markdown	500	Documentation
.ts, .js, .py, .java	code	300	Source code
.yml, .yaml	config	300	Configuration
.sql	sql	300	Database scripts
.sh, .bash	shell	300	Shell scripts
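The table can be read as a simple lookup from extension to type and chunk size. The following is a minimal sketch of that mapping, not the tool's actual detection code; `lookupType` is a hypothetical name, and the fallback for unknown extensions is an assumption, since the table does not say what happens in that case.

```typescript
interface TypeInfo {
  detectedType: string;
  chunkSize: number;
}

// Values copied from the table above.
const TYPE_BY_EXTENSION: Record<string, TypeInfo> = {
  ".txt": { detectedType: "text", chunkSize: 500 },
  ".log": { detectedType: "log", chunkSize: 500 },
  ".csv": { detectedType: "csv", chunkSize: 1000 },
  ".json": { detectedType: "json", chunkSize: 100 },
  ".xml": { detectedType: "xml", chunkSize: 200 },
  ".md": { detectedType: "markdown", chunkSize: 500 },
  ".ts": { detectedType: "code", chunkSize: 300 },
  ".js": { detectedType: "code", chunkSize: 300 },
  ".py": { detectedType: "code", chunkSize: 300 },
  ".java": { detectedType: "code", chunkSize: 300 },
  ".yml": { detectedType: "config", chunkSize: 300 },
  ".yaml": { detectedType: "config", chunkSize: 300 },
  ".sql": { detectedType: "sql", chunkSize: 300 },
  ".sh": { detectedType: "shell", chunkSize: 300 },
  ".bash": { detectedType: "shell", chunkSize: 300 },
};

// Assumption: unknown extensions are treated as plain text.
const FALLBACK: TypeInfo = { detectedType: "text", chunkSize: 500 };

function lookupType(filePath: string): TypeInfo {
  const fileName = filePath.split(/[\\/]/).pop() ?? "";
  const dot = fileName.lastIndexOf(".");
  // dot <= 0 covers files with no extension and dotfiles such as ".bashrc".
  const ext = dot > 0 ? fileName.slice(dot).toLowerCase() : "";
  return TYPE_BY_EXTENSION[ext] ?? FALLBACK;
}

console.log(lookupType("/data/transactions.csv")); // { detectedType: 'csv', chunkSize: 1000 }
```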

Use Cases

1. Pre-Processing Assessment

Determine optimal processing strategy:

typescript

const structure = await get_file_structure({
  filePath: "/data/large.csv"
});

if (structure.fileSize > 100_000_000) {
  // Use streaming approach
  console.log("Large file detected, using streaming");
} else {
  // Can load into memory
  console.log("Small file, loading directly");
}

console.log(`Will process in ${structure.totalChunks} chunks`);

2. Resource Planning

Calculate processing time and memory requirements:

typescript

const structure = await get_file_structure({
  filePath: "/logs/app.log"
});

// avgLineLength is a character count, so this approximates bytes only for single-byte text
const estimatedMemory = structure.statistics.avgLineLength * structure.chunkSize;
const estimatedTime = structure.totalChunks * 100; // assumes roughly 100ms per chunk

console.log(`Memory per chunk: ~${estimatedMemory} bytes`);
console.log(`Estimated processing time: ${estimatedTime}ms`);
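The estimate above multiplies `avgLineLength`, which the response format describes as a character count, by `chunkSize`. For files with multi-byte characters, a byte-based figure may be closer. The sketch below derives bytes per line from `fileSize` and `totalLines` instead; `estimateChunkBytes` is a hypothetical helper, and the result still excludes any overhead of the processing code itself.

```typescript
interface StructureSummary {
  fileSize: number;   // bytes
  totalLines: number;
  chunkSize: number;  // lines per chunk
}

// Hypothetical helper: average bytes per line (line terminators included)
// multiplied by the number of lines in a full chunk.
function estimateChunkBytes(s: StructureSummary): number {
  if (s.totalLines === 0) {
    return 0; // empty file: nothing to load
  }
  const bytesPerLine = s.fileSize / s.totalLines;
  const linesInChunk = Math.min(s.chunkSize, s.totalLines);
  return Math.ceil(bytesPerLine * linesInChunk);
}

// Figures from the CSV sample response in this document.
console.log(
  estimateChunkBytes({ fileSize: 52428800, totalLines: 500000, chunkSize: 1000 })
); // 104858
```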

3. Data Quality Check

Identify potential issues:

typescript

const structure = await get_file_structure({
  filePath: "/data/import.csv"
});

// Check for unusual line lengths
if (structure.statistics.maxLineLength > 10000) {
  console.warn("Unusually long lines detected");
}

// Check for empty lines (guard against division by zero for empty files)
const emptyLinePercent = structure.totalLines > 0
  ? (structure.statistics.emptyLines / structure.totalLines) * 100
  : 0;

if (emptyLinePercent > 10) {
  console.warn(`${emptyLinePercent.toFixed(1)}% empty lines`);
}

4. File Comparison

Compare multiple files:

typescript

const file1 = await get_file_structure({
  filePath: "/data/old.csv"
});

const file2 = await get_file_structure({
  filePath: "/data/new.csv"
});

console.log(`Line difference: ${file2.totalLines - file1.totalLines}`);
console.log(`Size difference: ${file2.fileSize - file1.fileSize} bytes`);

5. Archive Decision

Determine if file should be archived:

typescript

const structure = await get_file_structure({
  filePath: "/logs/old.log"
});

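// A JSON response carries `modified` as an ISO 8601 string, so convert it before calling getTime()
const daysSinceModified =
  (Date.now() - new Date(structure.modified).getTime()) / (1000 * 60 * 60 * 24);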

if (daysSinceModified > 30 && structure.fileSize > 10_000_000) {
  console.log("Consider archiving this file");
}
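The response format declares `created` and `modified` as `Date`, while the sample JSON responses show ISO 8601 strings. A small helper that accepts either form keeps the archive check robust. This is a sketch under that assumption; `daysSince` and `shouldArchive` are hypothetical names, and the 30-day and 10,000,000-byte thresholds are taken from the example above.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Accepts either a Date or an ISO 8601 string, as a JSON response would carry.
function daysSince(timestamp: string | Date, now: Date = new Date()): number {
  const then = typeof timestamp === "string" ? new Date(timestamp) : timestamp;
  const thenMs = then.getTime();
  if (Number.isNaN(thenMs)) {
    throw new RangeError(`Invalid timestamp: ${String(timestamp)}`);
  }
  return (now.getTime() - thenMs) / MS_PER_DAY;
}

// Same rule as the example: older than 30 days and larger than 10,000,000 bytes.
function shouldArchive(
  modified: string | Date,
  fileSize: number,
  now: Date = new Date()
): boolean {
  return daysSince(modified, now) > 30 && fileSize > 10_000_000;
}

const referenceDate = new Date("2024-02-10T15:30:00.000Z");
console.log(daysSince("2024-01-10T15:30:00.000Z", referenceDate));               // 31
console.log(shouldArchive("2024-01-10T15:30:00.000Z", 10485760, referenceDate)); // true
```

Passing `now` explicitly makes the check deterministic in tests.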

Statistics Interpretation

Average Line Length

Indicates file structure:

< 50 chars: Likely structured data or code
50-200 chars: Normal text/logs
> 200 chars: Verbose logs or JSON

Max Line Length

Warns about potential issues:

> 1000 chars: May cause performance issues
> 10000 chars: Consider pre-processing

Empty Lines

Indicates formatting:

0%: Dense data files (CSV, JSON)
5-10%: Normal code/text
> 20%: Sparse formatting or issues
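The thresholds in this section can be applied mechanically. The sketch below turns them into a list of notes; `interpretStatistics` is a hypothetical helper, the note wording is illustrative, and only the "> 20%" empty-line band is flagged because the listed bands leave gaps between them.

```typescript
interface LineStatistics {
  avgLineLength: number;
  maxLineLength: number;
  minLineLength: number;
  emptyLines: number;
}

// Hypothetical helper applying the thresholds listed in this section.
function interpretStatistics(stats: LineStatistics, totalLines: number): string[] {
  const notes: string[] = [];

  // Average line length bands: < 50, 50-200, > 200.
  if (stats.avgLineLength < 50) {
    notes.push("avg: likely structured data or code");
  } else if (stats.avgLineLength <= 200) {
    notes.push("avg: normal text/logs");
  } else {
    notes.push("avg: verbose logs or JSON");
  }

  // Max line length warnings: report only the most severe one.
  if (stats.maxLineLength > 10000) {
    notes.push("max: consider pre-processing");
  } else if (stats.maxLineLength > 1000) {
    notes.push("max: may cause performance issues");
  }

  // Empty-line share, guarding against division by zero.
  const emptyPercent = totalLines > 0 ? (stats.emptyLines / totalLines) * 100 : 0;
  if (emptyPercent > 20) {
    notes.push("empty: sparse formatting or issues");
  }

  return notes;
}

// Statistics from the TypeScript sample response in this document.
console.log(
  interpretStatistics(
    { avgLineLength: 52, maxLineLength: 180, minLineLength: 0, emptyLines: 85 },
    1250
  )
); // a single note: the average falls in the normal text/logs band
```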

Performance

File Size	Analysis Time	Memory Usage	Notes
< 1MB	< 50ms	Minimal	Full scan
1-10MB	50-200ms	< 10MB	Streaming
10-100MB	200-1000ms	< 50MB	Line counting
100MB-1GB	1-5s	< 100MB	Optimized scan
> 1GB	5-30s	< 200MB	Progressive

Error Handling

File Not Found

json

{
  "error": "File not found: /path/to/file.csv",
  "code": "ENOENT"
}

Permission Denied

json

{
  "error": "Permission denied: /root/protected.log",
  "code": "EACCES"
}

Unsupported File Type

json

{
  "error": "Binary file not supported: /data/image.png",
  "code": "UNSUPPORTED_TYPE"
}
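Callers will usually want to tell an error payload apart from a successful response. The sketch below assumes errors arrive as plain objects with the `error` and `code` fields shown above, rather than as thrown exceptions; `isToolError` and `describeError` are hypothetical helpers and the suggested actions are illustrative.

```typescript
interface ToolError {
  error: string;
  code: string;
}

// Type guard: true when the value has the error shape shown in this section.
function isToolError(value: unknown): value is ToolError {
  if (typeof value !== "object" || value === null) {
    return false;
  }
  const candidate = value as Record<string, unknown>;
  return typeof candidate.error === "string" && typeof candidate.code === "string";
}

// Maps the three documented codes to a suggested next step.
function describeError(err: ToolError): string {
  switch (err.code) {
    case "ENOENT":
      return "Check that the path exists";
    case "EACCES":
      return "Check read permissions on the file";
    case "UNSUPPORTED_TYPE":
      return "Only text files can be analyzed";
    default:
      return `Unexpected error code: ${err.code}`;
  }
}

const response: unknown = { error: "File not found: /path/to/file.csv", code: "ENOENT" };
if (isToolError(response)) {
  console.log(describeError(response)); // Check that the path exists
}
```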

Best Practices

1. Always Analyze Before Processing

Check file structure before heavy operations:

typescript

// Good: Analyze first
const structure = await get_file_structure({ filePath });
console.log(`Processing ${structure.totalChunks} chunks`);

// Then process (process_chunk stands in for your own per-chunk logic)
for (let i = 0; i < structure.totalChunks; i++) {
  await process_chunk(filePath, i);
}

2. Cache Structure Information

A file's structure rarely changes, so cache the result and invalidate it when the file is modified:

typescript

const structureCache = new Map();

async function getStructureCached(filePath: string) {
  if (!structureCache.has(filePath)) {
    const structure = await get_file_structure({ filePath });
    structureCache.set(filePath, structure);
  }
  return structureCache.get(filePath);
}
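The cache above never expires, so a file that changes after the first call keeps returning stale structure. One option is a time-to-live on each entry. The following is a minimal sketch of that idea, not part of the tool; `createTtlCache` and its parameters are hypothetical, and the loader is passed in so the cache stays independent of how the tool is invoked.

```typescript
type Loader<T> = (filePath: string) => Promise<T>;

// Hypothetical helper: wraps a loader with a per-path cache whose entries
// expire after ttlMs milliseconds. The clock is injectable for testing.
function createTtlCache<T>(
  load: Loader<T>,
  ttlMs: number,
  now: () => number = Date.now
): Loader<T> {
  const entries = new Map<string, { value: T; expiresAt: number }>();

  return async function get(filePath: string): Promise<T> {
    const hit = entries.get(filePath);
    if (hit !== undefined && hit.expiresAt > now()) {
      return hit.value; // still fresh
    }
    const value = await load(filePath);
    entries.set(filePath, { value, expiresAt: now() + ttlMs });
    return value;
  };
}

// Demonstration with a stand-in loader that counts how often it is called.
let loadCount = 0;
const cachedLoad = createTtlCache(async (filePath: string) => {
  loadCount += 1;
  return { filePath, totalLines: 100 };
}, 60_000);

cachedLoad("/logs/app.log")
  .then(() => cachedLoad("/logs/app.log"))
  .then(() => console.log(`loader calls: ${loadCount}`)); // loader calls: 1
```

With the tool itself, the loader would be something like `(filePath) => get_file_structure({ filePath })`. A shorter TTL trades fewer stale reads for more analysis calls.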

3. Validate File Size

Check size before processing:

typescript

const MAX_FILE_SIZE = 100_000_000; // Example limit in bytes; adjust for your environment

const structure = await get_file_structure({ filePath });

if (structure.fileSize > MAX_FILE_SIZE) {
  throw new Error(`File too large: ${structure.fileSize} bytes`);
}

4. Use Statistics for Optimization

Adapt chunk size based on line length:

typescript

const structure = await get_file_structure({ filePath });

const optimalChunkSize = structure.statistics.avgLineLength < 100
  ? 1000  // Small lines, larger chunks
  : 300;  // Large lines, smaller chunks
