# get_file_structure

Analyze file structure and retrieve comprehensive metadata and statistics.

## Overview
The get_file_structure tool provides detailed information about a file's structure, including line count, size, encoding, and statistical analysis. This is essential for understanding file characteristics before processing.
## Usage

```json
{
  "tool": "get_file_structure",
  "arguments": {
    "filePath": "/data/large-dataset.csv"
  }
}
```

## Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| filePath | string | Yes | - | Absolute or relative path to the file |
## Response Format

```typescript
{
  filePath: string;        // Absolute path to file
  fileName: string;        // File name without path
  fileSize: number;        // Size in bytes
  totalLines: number;      // Total line count
  encoding: string;        // Detected encoding (e.g., 'utf8')
  detectedType: string;    // File type (e.g., 'log', 'csv', 'json')
  chunkSize: number;       // Recommended lines per chunk
  totalChunks: number;     // Total number of chunks
  created: Date;           // File creation date
  modified: Date;          // Last modified date
  statistics: {
    avgLineLength: number; // Average characters per line
    maxLineLength: number; // Longest line length
    minLineLength: number; // Shortest line length
    emptyLines: number;    // Count of empty lines
  };
}
```
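In the examples below, `totalChunks` is consistent with `totalLines` divided by `chunkSize`, rounded up. A minimal sketch of that relationship (the helper name is illustrative, not part of the tool's API):

```typescript
// Illustrative helper: derive the expected chunk count from a structure result.
// Matches the examples below, e.g. 1250 lines at 300 lines per chunk => 5 chunks.
function expectedChunkCount(totalLines: number, chunkSize: number): number {
  return Math.ceil(totalLines / chunkSize);
}
```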
## Examples

### Basic File Analysis

Analyze a log file:

```json
{
  "tool": "get_file_structure",
  "arguments": {
    "filePath": "/var/log/system.log"
  }
}
```

Response:

```json
{
  "filePath": "/var/log/system.log",
  "fileName": "system.log",
  "fileSize": 10485760,
  "totalLines": 25000,
  "encoding": "utf8",
  "detectedType": "log",
  "chunkSize": 500,
  "totalChunks": 50,
  "created": "2024-01-01T00:00:00.000Z",
  "modified": "2024-01-10T15:30:00.000Z",
  "statistics": {
    "avgLineLength": 120,
    "maxLineLength": 512,
    "minLineLength": 45,
    "emptyLines": 150
  }
}
```

### CSV File Structure
Analyze a CSV dataset:

```json
{
  "tool": "get_file_structure",
  "arguments": {
    "filePath": "/data/transactions.csv"
  }
}
```

Response:

```json
{
  "filePath": "/data/transactions.csv",
  "fileName": "transactions.csv",
  "fileSize": 52428800,
  "totalLines": 500000,
  "encoding": "utf8",
  "detectedType": "csv",
  "chunkSize": 1000,
  "totalChunks": 500,
  "created": "2024-01-05T08:00:00.000Z",
  "modified": "2024-01-10T14:00:00.000Z",
  "statistics": {
    "avgLineLength": 105,
    "maxLineLength": 250,
    "minLineLength": 95,
    "emptyLines": 0
  }
}
```

### Code File Analysis
Analyze a TypeScript file:

```json
{
  "tool": "get_file_structure",
  "arguments": {
    "filePath": "/code/app.ts"
  }
}
```

Response:

```json
{
  "filePath": "/code/app.ts",
  "fileName": "app.ts",
  "fileSize": 65536,
  "totalLines": 1250,
  "encoding": "utf8",
  "detectedType": "code",
  "chunkSize": 300,
  "totalChunks": 5,
  "created": "2024-01-01T10:00:00.000Z",
  "modified": "2024-01-10T16:45:00.000Z",
  "statistics": {
    "avgLineLength": 52,
    "maxLineLength": 180,
    "minLineLength": 0,
    "emptyLines": 85
  }
}
```

## File Type Detection
The tool automatically detects the file type based on its extension; a sketch of this mapping follows the table:
| Extension | Detected Type | Chunk Size | Typical Use Case |
|---|---|---|---|
| .txt | text | 500 | Plain text files |
| .log | log | 500 | Application logs |
| .csv | csv | 1000 | Data exports |
| .json | json | 100 | Configuration, API data |
| .xml | xml | 200 | Structured data |
| .md | markdown | 500 | Documentation |
| .ts, .js, .py, .java | code | 300 | Source code |
| .yml, .yaml | config | 300 | Configuration |
| .sql | sql | 300 | Database scripts |
| .sh, .bash | shell | 300 | Shell scripts |
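A minimal sketch of the extension-to-type mapping described above. It mirrors the table and is an illustration, not the tool's internal implementation; the fallback for unknown extensions is an assumption:

```typescript
// Illustrative mapping taken from the table above; not the tool's internal code.
const TYPE_BY_EXTENSION: Record<string, { type: string; chunkSize: number }> = {
  ".txt": { type: "text", chunkSize: 500 },
  ".log": { type: "log", chunkSize: 500 },
  ".csv": { type: "csv", chunkSize: 1000 },
  ".json": { type: "json", chunkSize: 100 },
  ".xml": { type: "xml", chunkSize: 200 },
  ".md": { type: "markdown", chunkSize: 500 },
  ".ts": { type: "code", chunkSize: 300 },
  ".js": { type: "code", chunkSize: 300 },
  ".py": { type: "code", chunkSize: 300 },
  ".java": { type: "code", chunkSize: 300 },
  ".yml": { type: "config", chunkSize: 300 },
  ".yaml": { type: "config", chunkSize: 300 },
  ".sql": { type: "sql", chunkSize: 300 },
  ".sh": { type: "shell", chunkSize: 300 },
  ".bash": { type: "shell", chunkSize: 300 },
};

// Unknown extensions fall back to plain text here; the tool's actual
// fallback behavior is not documented, so this default is an assumption.
function detectType(fileName: string): { type: string; chunkSize: number } {
  const dot = fileName.lastIndexOf(".");
  const ext = dot === -1 ? "" : fileName.slice(dot).toLowerCase();
  return TYPE_BY_EXTENSION[ext] ?? { type: "text", chunkSize: 500 };
}
```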
## Use Cases

### 1. Pre-Processing Assessment

Determine optimal processing strategy:

```typescript
const structure = await get_file_structure({
  filePath: "/data/large.csv"
});

if (structure.fileSize > 100_000_000) {
  // Use streaming approach
  console.log("Large file detected, using streaming");
} else {
  // Can load into memory
  console.log("Small file, loading directly");
}

console.log(`Will process in ${structure.totalChunks} chunks`);
```

### 2. Resource Planning
Calculate processing time and memory requirements:

```typescript
const structure = await get_file_structure({
  filePath: "/logs/app.log"
});

const estimatedMemory = structure.statistics.avgLineLength * structure.chunkSize;
const estimatedTime = structure.totalChunks * 100; // 100ms per chunk

console.log(`Memory per chunk: ${estimatedMemory} bytes`);
console.log(`Estimated processing time: ${estimatedTime}ms`);
```

### 3. Data Quality Check
Identify potential issues:

```typescript
const structure = await get_file_structure({
  filePath: "/data/import.csv"
});

// Check for unusual line lengths
if (structure.statistics.maxLineLength > 10000) {
  console.warn("Unusually long lines detected");
}

// Check for empty lines
const emptyLinePercent =
  (structure.statistics.emptyLines / structure.totalLines) * 100;
if (emptyLinePercent > 10) {
  console.warn(`${emptyLinePercent}% empty lines`);
}
```

### 4. File Comparison
Compare multiple files:

```typescript
const file1 = await get_file_structure({
  filePath: "/data/old.csv"
});
const file2 = await get_file_structure({
  filePath: "/data/new.csv"
});

console.log(`Line difference: ${file2.totalLines - file1.totalLines}`);
console.log(`Size difference: ${file2.fileSize - file1.fileSize} bytes`);
```

### 5. Archive Decision
Determine if a file should be archived:

```typescript
const structure = await get_file_structure({
  filePath: "/logs/old.log"
});

const daysSinceModified =
  (Date.now() - structure.modified.getTime()) / (1000 * 60 * 60 * 24);

if (daysSinceModified > 30 && structure.fileSize > 10_000_000) {
  console.log("Consider archiving this file");
}
```

## Statistics Interpretation
### Average Line Length

Indicates file structure:

- < 50 chars: Likely structured data or code
- 50-200 chars: Normal text/logs
- > 200 chars: Verbose logs or JSON

### Max Line Length

Warns about potential issues:

- > 1000 chars: May cause performance issues
- > 10000 chars: Consider pre-processing

### Empty Lines

Indicates formatting:

- 0%: Dense data files (CSV, JSON)
- 5-10%: Normal code/text
- > 20%: Sparse formatting or issues
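A minimal sketch that applies the thresholds above to a structure result; the wording of the notes is illustrative only:

```typescript
// Illustrative classification based on the thresholds above.
function interpretStatistics(structure: {
  totalLines: number;
  statistics: { avgLineLength: number; maxLineLength: number; emptyLines: number };
}): string[] {
  const notes: string[] = [];
  const { avgLineLength, maxLineLength, emptyLines } = structure.statistics;

  if (avgLineLength < 50) notes.push("Likely structured data or code");
  else if (avgLineLength <= 200) notes.push("Normal text/logs");
  else notes.push("Verbose logs or JSON");

  if (maxLineLength > 10000) notes.push("Very long lines: consider pre-processing");
  else if (maxLineLength > 1000) notes.push("Long lines may cause performance issues");

  const emptyPercent = (emptyLines / structure.totalLines) * 100;
  if (emptyPercent > 20) notes.push("Sparse formatting or formatting issues");

  return notes;
}
```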
## Performance
| File Size | Analysis Time | Memory Usage | Notes |
|---|---|---|---|
| < 1MB | < 50ms | Minimal | Full scan |
| 1-10MB | 50-200ms | < 10MB | Streaming |
| 10-100MB | 200-1000ms | < 50MB | Line counting |
| 100MB-1GB | 1-5s | < 100MB | Optimized scan |
| > 1GB | 5-30s | < 200MB | Progressive |
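These figures are indicative; a quick way to check analysis time in your own environment, assuming get_file_structure is callable as in the use cases above:

```typescript
// Illustrative timing check; actual numbers depend on disk speed and file contents.
const start = Date.now();
const structure = await get_file_structure({ filePath: "/data/large-dataset.csv" });
console.log(`Analyzed ${structure.totalLines} lines in ${Date.now() - start}ms`);
```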
## Error Handling

### File Not Found

```json
{
  "error": "File not found: /path/to/file.csv",
  "code": "ENOENT"
}
```

### Permission Denied

```json
{
  "error": "Permission denied: /root/protected.log",
  "code": "EACCES"
}
```

### Unsupported File Type

```json
{
  "error": "Binary file not supported: /data/image.png",
  "code": "UNSUPPORTED_TYPE"
}
```
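A minimal sketch of handling these errors when calling the tool programmatically. Whether the error arrives as a thrown exception or a returned object depends on the client, so treat the try/catch shape as an assumption; the codes match the responses above:

```typescript
// Illustrative error handling; the codes correspond to the error shapes above.
try {
  const structure = await get_file_structure({ filePath: "/data/import.csv" });
  console.log(`Ready to process ${structure.totalChunks} chunks`);
} catch (error: any) {
  switch (error.code) {
    case "ENOENT":
      console.error("File not found, check the path");
      break;
    case "EACCES":
      console.error("Permission denied, adjust file permissions");
      break;
    case "UNSUPPORTED_TYPE":
      console.error("Binary files are not supported");
      break;
    default:
      throw error;
  }
}
```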
## Best Practices

### 1. Always Analyze Before Processing

Check file structure before heavy operations:

```typescript
// Good: Analyze first
const structure = await get_file_structure({ filePath });
console.log(`Processing ${structure.totalChunks} chunks`);

// Then process
for (let i = 0; i < structure.totalChunks; i++) {
  await process_chunk(filePath, i);
}
```

### 2. Cache Structure Information
Structure rarely changes, so cache it:

```typescript
const structureCache = new Map();

async function getStructureCached(filePath) {
  if (!structureCache.has(filePath)) {
    const structure = await get_file_structure({ filePath });
    structureCache.set(filePath, structure);
  }
  return structureCache.get(filePath);
}
```

### 3. Validate File Size
Check size before processing:

```typescript
const structure = await get_file_structure({ filePath });

if (structure.fileSize > MAX_FILE_SIZE) {
  throw new Error(`File too large: ${structure.fileSize} bytes`);
}
```

### 4. Use Statistics for Optimization
Adapt chunk size based on line length:

```typescript
const structure = await get_file_structure({ filePath });

const optimalChunkSize = structure.statistics.avgLineLength < 100
  ? 1000  // Small lines, larger chunks
  : 300;  // Large lines, smaller chunks
```

## See Also
- Tools Overview - All available tools
- read_large_file_chunk - Read file chunks
- search_in_large_file - Search within files
- get_file_summary - Quick file overview
- Performance Guide - Optimization tips
- Best Practices - Usage recommendations