
Automated Metrics Documentation Generator

Overview

This documentation describes the scripted process used to extract and document InfluxDB metrics from the Brekz PrestaShop codebase. The script was used to build the metrics tables in the add-checkout-metrics branch.

What This Script Does

The metrics documentation generator performs the following tasks:

  1. Searches the codebase for all InfluxDb::writeMeasurement() calls using ripgrep with multiline pattern matching
  2. Parses measurement data including measurement names, fields, and tags from the PHP code
  3. Generates GitHub permalinks to the exact line of code where each metric is tracked
  4. Formats output as Markdown tables ready to be pasted directly into the documentation

This automated approach ensures that:

  • All metrics are consistently documented
  • GitHub links point to the correct source code locations
  • The documentation format matches the project standards
  • No metrics are accidentally overlooked during manual documentation

Prerequisites

Before running the script, ensure you have the following installed:

  • Ripgrep (rg) - A fast recursive search tool (installation guide)
  • PHP CLI - Command-line PHP interpreter (version 7.4 or higher)
  • Clipboard utility:
    • macOS: pbcopy (built-in)
    • Linux: xclip or xsel
    • Windows: clip (built-in) or WSL with xclip

How to Use

Step 1: Navigate to Your Codebase

cd /path/to/brekz-prestashop

Step 2: Create the PHP Script

Save the PHP script (shown in the PHP Script Source section below) as getInfluxMeasurementRowArray in a location accessible from your codebase. Make it executable:

chmod +x getInfluxMeasurementRowArray

Step 3: Run the Command

Execute the ripgrep command piped into the PHP script:

macOS:

rg -n --color=always -o 'InfluxDb::writeMeasurement(.|\n)*?\)\s*;' --multiline --with-filename | ./getInfluxMeasurementRowArray | pbcopy

Linux:

rg -n --color=always -o 'InfluxDb::writeMeasurement(.|\n)*?\)\s*;' --multiline --with-filename | ./getInfluxMeasurementRowArray | xclip -selection clipboard

Windows (WSL):

rg -n --color=always -o 'InfluxDb::writeMeasurement(.|\n)*?\)\s*;' --multiline --with-filename | ./getInfluxMeasurementRowArray | clip.exe

Step 4: Paste Into Documentation

The generated Markdown table rows are now in your clipboard. Paste them into your documentation file under the appropriate table header.

Command Breakdown

Let's break down the ripgrep command to understand what each part does:

rg -n --color=always -o 'InfluxDb::writeMeasurement(.|\n)*?\)\s*;' --multiline --with-filename
  • rg - Invokes ripgrep
  • -n - Show line numbers
  • --color=always - Force ANSI color codes even when the output is piped; the PHP script strips them before parsing
  • -o - Only show the matching part of lines
  • 'InfluxDb::writeMeasurement(.|\n)*?\)\s*;' - Regex pattern to match entire InfluxDB measurement calls spanning multiple lines
  • --multiline - Enable matching across line breaks
  • --with-filename - Prefix each match with its file path (the line number comes from -n)

The output is then piped (|) to the PHP script which parses and formats it.
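
To make the hand-off concrete, the sketch below splits one line of ripgrep output into the path, line number, and matched text, which is the shape the PHP script expects on stdin. The helper name `splitRipgrepLine` is hypothetical, and the sample line uses simplified color codes for illustration; real ripgrep output may carry more escape sequences.

```php
<?php

/**
 * Split one "path:line:text" record, as printed by `rg -n --with-filename`,
 * into its three parts. Hypothetical helper, for illustration only.
 */
function splitRipgrepLine(string $line): ?array
{
    // Strip the ANSI color sequences that --color=always adds.
    $line = preg_replace('/\x1B\[[0-9;]*[a-zA-Z]/', '', $line);

    // Lazy path match, so the first ":<digits>:" ends the path.
    if (!preg_match('/^(.+?):(\d+):(.*)$/', $line, $m)) {
        return null;
    }

    return ['path' => $m[1], 'line' => (int) $m[2], 'text' => $m[3]];
}

// Illustrative input with simplified color codes.
$sample = "\x1B[35mweb/override/controllers/front/CartController.php\x1B[0m:"
    . "\x1B[32m76\x1B[0m:InfluxDb::writeMeasurement(";

print_r(splitRipgrepLine($sample));
```

Only the first line of a multiline match starts with `InfluxDb::writeMeasurement`; the following lines carry the same kind of prefix but contain the arguments.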

Customizing for Your Project

If you need to update the GitHub repository URL or commit hash, modify the getPermaLink() function in the PHP script:

function getPermaLink(string $fileLine): string
{
    // Update this URL to match your repository and commit
    $githubUrlPart = "https://github.com/your-org/your-repo/blob/COMMIT_HASH/";
    $fileLines     = explode(":", $fileLine, 2);

    return '[Github Link](' . $githubUrlPart . $fileLines[0] . "#L" . ($fileLines[1] ?? '1') . ')';
}
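
Rather than editing the hard-coded URL for every commit, the repository and commit could be passed in as arguments. The sketch below shows one way to do that. `buildPermaLink` and its parameters are hypothetical names, and the commented-out `git rev-parse HEAD` call assumes the script is run inside a Git checkout.

```php
<?php

/**
 * Build a Markdown permalink from a "path:line" string.
 * Hypothetical variant of getPermaLink() that takes the repository URL
 * and commit hash as arguments instead of hard-coding them.
 */
function buildPermaLink(string $fileLine, string $repoUrl, string $commit): string
{
    // Fall back to line 1 when no line number is present.
    [$path, $line] = array_pad(explode(':', $fileLine, 2), 2, '1');

    return sprintf(
        '[Github Link](%s/blob/%s/%s#L%s)',
        rtrim($repoUrl, '/'),
        $commit,
        $path,
        $line
    );
}

// Assumption: the commit could be resolved from the current checkout.
// $commit = trim((string) shell_exec('git rev-parse HEAD'));
$commit = 'COMMIT_HASH';

echo buildPermaLink(
    'web/override/controllers/front/CartController.php:76',
    'https://github.com/your-org/your-repo',
    $commit
), PHP_EOL;
```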

Modifying the Table Format

To change the output table structure, edit the formatAsTableRows() function:

function formatAsTableRows(array $data): array
{
    $result = [];

    foreach ($data as $item) {
        $result[] = sprintf(
                '| <a id="MT-0" href="#MT-0"> MT-0 </a> | %s | Event | InfluxDB | %s | %s |',
                $item['title'],
                $item['permalink'],
                $item['filters/tags']
        );
    }

    return $result;
}
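
Note that the format string above emits the same `MT-0` anchor for every row, so the IDs presumably have to be renumbered by hand after pasting. If sequential IDs are preferred, a variant along these lines could number the rows as it goes. `formatAsNumberedTableRows` and the `$startAt` parameter are hypothetical, and the sample rows are placeholders.

```php
<?php

/**
 * Hypothetical variant of formatAsTableRows() that gives each row its own
 * MT-<n> anchor instead of the fixed MT-0.
 */
function formatAsNumberedTableRows(array $data, int $startAt = 1): array
{
    $result = [];

    foreach (array_values($data) as $offset => $item) {
        $id = 'MT-' . ($startAt + $offset);

        // %1$s reuses the same ID for the anchor id, href, and label.
        $result[] = sprintf(
            '| <a id="%1$s" href="#%1$s"> %1$s </a> | %2$s | Event | InfluxDB | %3$s | %4$s |',
            $id,
            $item['title'],
            $item['permalink'],
            $item['filters/tags']
        );
    }

    return $result;
}

// Placeholder rows; real rows come from processMeasurementBlock().
$rows = [
    ['title' => 'First title', 'permalink' => '[Github Link](#)', 'filters/tags' => ' - tag: value <br> '],
    ['title' => 'Second title', 'permalink' => '[Github Link](#)', 'filters/tags' => ' - tag: value <br> '],
];

echo implode(PHP_EOL, formatAsNumberedTableRows($rows)), PHP_EOL;
```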

Searching Different Patterns

To search for different metric patterns, adjust the regex in the ripgrep command. For example, to find Google Analytics events:

rg -n --color=always -o 'ga\((.|\n)*?\)\s*;' --multiline --with-filename
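
Changing the ripgrep pattern is only half of the job: `processRipgrepOutput()` also hard-codes `InfluxDb::writeMeasurement` when it detects the start of a new block, so the script needs a matching change. A minimal sketch of building that detection regex from a configurable call prefix follows. `blockStartPattern` is a hypothetical helper, not part of the script, and the sample lines are illustrative.

```php
<?php

/**
 * Build the regex that recognises the first line of a match,
 * e.g. "path/to/file.php:76:InfluxDb::writeMeasurement(".
 * Hypothetical helper; $callPrefix is whatever the ripgrep pattern starts with.
 */
function blockStartPattern(string $callPrefix): string
{
    // preg_quote() escapes regex metacharacters such as "(" and ":".
    return '/^(.*:\d+):' . preg_quote($callPrefix, '/') . '/';
}

$pattern = blockStartPattern('ga(');

// First line of a match: the call prefix follows the path:line prefix.
var_dump(preg_match($pattern, 'path/to/file.php:12:ga("send", "event");')); // int(1)

// Continuation line: no call prefix, so it belongs to the current block.
var_dump(preg_match($pattern, 'path/to/file.php:13:    "label"'));          // int(0)
```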

Troubleshooting

No Output Generated

  • Verify that ripgrep is installed: rg --version
  • Check that you're in the correct directory with PHP files
  • Test the ripgrep pattern alone first: rg 'InfluxDb::writeMeasurement' --multiline

PHP Errors

  • Ensure PHP CLI is available: php --version
  • Check script permissions: ls -l getInfluxMeasurementRowArray
  • Run the script directly with test input: echo "test" | ./getInfluxMeasurementRowArray

Incomplete Matches

  • Some measurements may be too complex for the regex pattern
  • Check the error output (stderr) for parsing warnings
  • Manually verify complex measurement calls
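
One common cause of incomplete parses is that `parseArrayString()` ends the fields array at the first `]` it finds, so a value such as `$row['qty']` or a nested array cuts the array short. A depth-counting scan is one possible way to find the matching bracket instead. The sketch below is a hypothetical helper (`extractBracketed`), the sample call is illustrative, and brackets inside string literals are still not handled.

```php
<?php

/**
 * Return the text between the first "[" at or after $from and its matching
 * "]", plus the position just past that "]". Returns null when unbalanced.
 * Hypothetical helper; brackets inside quoted strings are not handled.
 */
function extractBracketed(string $str, int $from = 0): ?array
{
    $start = strpos($str, '[', $from);
    if ($start === false) {
        return null;
    }

    $depth = 0;
    for ($i = $start, $len = strlen($str); $i < $len; $i++) {
        if ($str[$i] === '[') {
            $depth++;
        } elseif ($str[$i] === ']' && --$depth === 0) {
            return [
                'body' => substr($str, $start + 1, $i - $start - 1),
                'next' => $i + 1,
            ];
        }
    }

    return null;
}

// Illustrative call, not taken from the codebase.
$call = "InfluxDb::writeMeasurement('cart', ['add' => \$row['qty']], ['shop_id' => \$id]);";

$fields = extractBracketed($call);
$tags   = extractBracketed($call, $fields['next']);

echo $fields['body'], PHP_EOL; // 'add' => $row['qty']
echo $tags['body'], PHP_EOL;   // 'shop_id' => $id
```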

Example Output

The script generates Markdown table rows like this:

| <a id="MT-0" href="#MT-0"> MT-0 </a> | CartController: cart - add | Event | InfluxDB | [Github Link](https://github.com/brekz-group/brekz-prestashop/blob/3c9b697e7722eb52b8ed4c1981cc8091e7824013/web/override/controllers/front/CartController.php#L76) | - shop_id: (string)$this->context->shop->id <br> - shop_name: $this->context->shop->name <br> |

PHP Script Source

Below is the complete PHP script used to parse ripgrep output and generate documentation:

#!/usr/bin/env php
<?php

function errorPrint($message)
{
    file_put_contents('php://stderr', "ERROR: " . $message . "\n");
}

function getPermaLink(string $fileLine): string
{
    // Handle cases where fileLine might contain line numbers or fragments
    if (strpos($fileLine, '#L') !== false) {
        $fileLine = substr($fileLine, 0, strpos($fileLine, '#L'));
    }

    $githubUrlPart = "https://github.com/brekz-group/brekz-prestashop/blob/3c9b697e7722eb52b8ed4c1981cc8091e7824013/";
    $fileLines     = explode(":", $fileLine, 2);

    return '[Github Link](' . $githubUrlPart . $fileLines[0] . "#L" . ($fileLines[1] ?? '1') . ')';
}

function getTitle(string $fileLine, array $influxMeasurementArray): string
{
    // Extract filename from file path
    $matches = [];
    if (preg_match('/([^\/]+)\.php/', $fileLine, $matches)) {
        $filename = $matches[1];
    } else {
        // Fallback for unknown files
        $filename = 'unknown';
    }

    // Clean up filename if it contains #L
    $filename = preg_replace('/#L\d+$/', '', $filename);

    // Get the measurement name
    $measurement = $influxMeasurementArray['measurement'];

    $fieldString = '';
    $fields      = $influxMeasurementArray['fields'];
    foreach ($fields as $fieldName => $fieldValue) {
        $fieldString .= ' - ' . $fieldName;
    }

    $reasonString = '';
    if (array_key_exists('reason', $influxMeasurementArray['tags'])) {
        $reasonString = ' - reason: ' . $influxMeasurementArray['tags']['reason'];
    }

    // Don't duplicate field information in title
    return $filename . ': ' . $measurement . $fieldString . $reasonString;
}

function getFiltersAndTags(array $influxMeasurementArray): string
{
    $string = '';
    foreach ($influxMeasurementArray['tags'] as $fieldName => $fieldValue) {
        // Compare against '' because empty('0') is true and would drop a literal 0
        if ($fieldValue !== '') {
            $string .= ' - ' . $fieldName . ': ' . $fieldValue . ' <br> ';
        }
    }

    // If no tags were found but we have fields, use the first field as a tag
    if (empty($string) && ! empty($influxMeasurementArray['fields'])) {
        $firstField = array_slice($influxMeasurementArray['fields'], 0, 1);
        $string     = ' - ' . key($firstField) . ': ' . current($firstField) . ' <br> ';
    }

    return $string;
}

function parseArrayString(string $str): array
{
    // First, extract the measurement name
    if (! preg_match("/'([^']+)'\s*,/s", $str, $measurementMatch)) {
        errorPrint("Invalid measurement format: measurement name not found");
        errorPrint("String being parsed: " . substr($str, 0, 500) . (strlen($str) > 500 ? '...' : ''));
        throw new InvalidArgumentException("Invalid measurement format: measurement name not found");
    }
    $measurement = $measurementMatch[1];

    // Initialize arrays
    $fields = [];
    $tags   = [];

    // Try to extract arrays from the string
    $arrayStartPos = strpos($str, '[');
    if ($arrayStartPos === false) {
        errorPrint("Invalid array string format: no arrays found");
        errorPrint("String being parsed: " . substr($str, 0, 500) . (strlen($str) > 500 ? '...' : ''));

        return [
                'measurement' => $measurement,
                'fields'      => $fields,
                'tags'        => $tags,
        ];
    }

    // Try to find the first array (fields)
    $arrayEndPos = strpos($str, ']', $arrayStartPos);
    if ($arrayEndPos !== false) {
        $fieldsStr = substr($str, $arrayStartPos + 1, $arrayEndPos - $arrayStartPos - 1);
        preg_match_all("/\s*'([^']+)'\s*=>\s*([^,]+)/", $fieldsStr, $matches, PREG_SET_ORDER);
        foreach ($matches as $match) {
            $key   = $match[1];
            $value = trim($match[2]);
            // Compare against '' because empty('0') is true and would drop a literal 0
            if ($value !== '') {
                $fields[$key] = $value;
            }
        }

        // Try to find a second array (tags)
        $secondArrayStartPos = strpos($str, '[', $arrayEndPos);
        if ($secondArrayStartPos !== false) {
            $secondArrayEndPos = strpos($str, ']', $secondArrayStartPos);
            if ($secondArrayEndPos !== false) {
                $tagsStr = substr($str, $secondArrayStartPos + 1, $secondArrayEndPos - $secondArrayStartPos - 1);
                preg_match_all("/\s*'([^']+)'\s*=>\s*([^,]+)/", $tagsStr, $matches, PREG_SET_ORDER);
                foreach ($matches as $match) {
                    $key   = $match[1];
                    $value = trim($match[2]);
                    // Compare against '' because empty('0') is true and would drop a literal 0
                    if ($value !== '') {
                        $tags[$key] = $value;
                    }
                }
            }
        }
    }

    return [
            'measurement' => $measurement,
            'fields'      => $fields,
            'tags'        => $tags,
    ];
}

function processMeasurementBlock(string $fileLine, string $measurementString): array
{
    // Strip a trailing "| ..." segment and a trailing "..." from the end of the string
    $cleanString = preg_replace('/\s*\|\s*.*$/', '', $measurementString);
    $cleanString = preg_replace('/\.\.\.$/', '', $cleanString);

    // First, extract the measurement name
    if (! preg_match("/'([^']+)'\s*,/s", $cleanString, $measurementMatch)) {
        errorPrint("Invalid measurement format: measurement name not found");
        errorPrint("Measurement content: " . substr($cleanString, 0, 500) . (strlen($cleanString) > 500 ? '...' : ''));
        throw new InvalidArgumentException("Invalid measurement format: measurement name not found");
    }
    $measurement = $measurementMatch[1];

    // Try to parse the arrays
    try {
        $measurementArray = parseArrayString($cleanString);

        // If we have no fields but have the measurement name, create a default field
        if (empty($measurementArray['fields']) && strpos($measurement, '_') !== false) {
            $action                     = substr($measurement, strrpos($measurement, '_') + 1);
            $measurementArray['fields'] = [$action => '1'];
        }

        return [
                'permalink'    => getPermaLink($fileLine),
                'title'        => getTitle($fileLine, $measurementArray),
                'filters/tags' => getFiltersAndTags($measurementArray),
        ];
    } catch (Exception $e) {
        errorPrint("Error processing measurement at " . $fileLine . ": " . $e->getMessage());
        errorPrint("Measurement content: " . substr($cleanString, 0, 500) . (strlen($cleanString) > 500 ? '...' : ''));
        throw $e;
    }
}

function formatAsTableRows(array $data): array
{
    $result = [];

    foreach ($data as $item) {
        $result[] = sprintf(
                '| <a id="MT-0" href="#MT-0"> MT-0 </a> | %s | Event | InfluxDB | %s | %s |',
                $item['title'],
                $item['permalink'],
                $item['filters/tags']
        );
    }

    return $result;
}

function processRipgrepOutput(string $input): array
{
    // Remove ANSI color codes and debug messages
    $input = preg_replace('/\x1B\[[0-9;]*[a-zA-Z]/', '', $input);
    $input = preg_replace('/DEBUG:.*$/', '', $input);
    $input = preg_replace('/\.\.\.$/', '', $input);

    // Split by file:line prefixes to handle multiline matches
    $lines              = explode("\n", trim($input));
    $result             = [];
    $currentFileLine    = '';
    $currentMeasurement = '';
    $validMeasurements  = 0;

    foreach ($lines as $line) {
        $line = rtrim($line);

        if (empty($line)) {
            continue;
        }

        // Skip debug messages and incomplete lines
        if (strpos($line, 'DEBUG:') !== false || strpos($line, '...') !== false) {
            continue;
        }

        // Check for new file:line prefix
        if (preg_match('/^(.*:\d+):InfluxDb::writeMeasurement/', $line)) {
            // If we have a current measurement, process it
            if ($currentMeasurement !== '') {
                try {
                    $result[] = processMeasurementBlock($currentFileLine, $currentMeasurement);
                    $validMeasurements++;
                } catch (Exception $e) {
                    // Skip invalid measurements but continue processing
                }
            }

            // Start new measurement
            $parts = explode(':', $line, 3);
            if (count($parts) >= 3) {
                $currentFileLine    = $parts[0] . ':' . $parts[1];
                $currentMeasurement = trim($parts[2]);
            } else {
                $currentFileLine    = 'unknown:0';
                $currentMeasurement = $line;
            }
        } else {
            // If we're in a measurement, keep adding lines
            // Remove any leading file:line prefix if it exists
            if (preg_match('/^.*:\d+:/', $line)) {
                $line = preg_replace('/^.*:\d+:/', '', $line);
            }

            // If we're starting a new measurement with a different file, process the current one
            if (preg_match('/^InfluxDb::writeMeasurement/', $line) && $currentMeasurement !== '') {
                try {
                    $result[] = processMeasurementBlock($currentFileLine, $currentMeasurement);
                    $validMeasurements++;
                } catch (Exception $e) {
                    // Skip invalid measurements but continue processing
                }
                $currentMeasurement = $line;
            } else {
                $currentMeasurement .= "\n" . $line;
            }
        }
    }

    // Process the last measurement if it exists
    if ($currentMeasurement !== '') {
        try {
            $result[] = processMeasurementBlock($currentFileLine, $currentMeasurement);
            $validMeasurements++;
        } catch (Exception $e) {
            // Skip invalid measurements
        }
    }

    return formatAsTableRows($result);
}

// Get input from command line arguments or standard input
$input = '';
if ($argc > 1) {
    $input = $argv[1];
} else {
    $input = file_get_contents('php://stdin');
}

// Process the input
try {
    $processedData = processRipgrepOutput($input);
    foreach ($processedData as $row) {
        echo $row . PHP_EOL;
    }
} catch (Exception $e) {
    errorPrint("Fatal error: " . $e->getMessage());
    exit(1);
}

How the Script Works

Parsing Process

The script follows this parsing logic:

  1. Input Processing: Receives ripgrep output via stdin or command-line argument
  2. ANSI Cleanup: Removes color codes and debug messages
  3. Line-by-Line Parsing:
    • Identifies new measurement blocks by detecting file:line:InfluxDb::writeMeasurement patterns
    • Accumulates multiline measurement calls
    • Handles edge cases like truncated output
  4. Data Extraction:
    • Parses measurement names from the first parameter
    • Extracts fields array (metrics to track)
    • Extracts tags array (metadata/dimensions)
  5. Formatting:
    • Generates GitHub permalinks using file paths and line numbers
    • Creates descriptive titles from filename and measurement name
    • Formats output as Markdown table rows
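
The line-by-line parsing and name extraction steps above can be sketched in a few lines. The example below groups prefixed ripgrep lines into one block per call and then pulls out the measurement name. It is a simplified illustration that assumes every line carries a `path:line:` prefix; `groupMeasurementBlocks` is a hypothetical name rather than a function from the script, and the input is made up.

```php
<?php

/**
 * Group "path:line:text" lines into one block per writeMeasurement() call,
 * keyed by the "path:line" of the line where the call starts.
 * Simplified, hypothetical version of the loop in processRipgrepOutput().
 */
function groupMeasurementBlocks(string $input): array
{
    $blocks  = [];
    $current = null;

    foreach (explode("\n", trim($input)) as $line) {
        if (!preg_match('/^(.+?:\d+):(.*)$/', rtrim($line), $m)) {
            continue; // not a prefixed ripgrep line
        }

        if (strpos($m[2], 'InfluxDb::writeMeasurement') === 0) {
            $current          = $m[1]; // a new call starts here
            $blocks[$current] = $m[2];
        } elseif ($current !== null) {
            $blocks[$current] .= "\n" . $m[2]; // continuation line
        }
    }

    return $blocks;
}

// Illustrative input, not real project output.
$input = <<<'TXT'
src/Foo.php:10:InfluxDb::writeMeasurement(
src/Foo.php:11:    'cart',
src/Foo.php:12:    ['add' => 1]
src/Foo.php:13:);
src/Bar.php:5:InfluxDb::writeMeasurement('login', ['ok' => 1]);
TXT;

foreach (groupMeasurementBlocks($input) as $fileLine => $block) {
    // Same measurement-name regex the script uses: first quoted string before a comma.
    preg_match("/'([^']+)'\s*,/", $block, $name);
    echo $fileLine, ' => ', $name[1], PHP_EOL;
}
```

Keying each block by the `path:line` of its first line keeps the permalink pointing at the start of the call, which matches how the script builds its links.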

Key Functions

  • getPermaLink() - Constructs GitHub URLs pointing to exact source code lines
  • getTitle() - Generates human-readable metric titles
  • getFiltersAndTags() - Extracts and formats metric dimensions
  • parseArrayString() - Parses PHP array syntax from measurement calls
  • processMeasurementBlock() - Orchestrates parsing of individual metrics
  • processRipgrepOutput() - Main processing loop handling multiline matches

This script was used to generate the comprehensive metrics tables in:

  • Branch: add-checkout-metrics
  • File: docs/features/checkout-process/current-situation.md

The generated documentation includes over 100 InfluxDB metrics tracking various aspects of the checkout process, user authentication, contact forms, and payment flows.