Project Overview

This project features two powerful scripts for handling FIX (Financial Information eXchange) protocol messages. The first is a Python script, import_fix_to_db.py, designed to dynamically analyze FIX log files, create a corresponding MySQL database schema, and populate it with the message data. The second is a Bash script, fixfindreplace.sh, which allows for flexible modification of FIX messages, including tag value replacement and the insertion of new tags, while ensuring checksum validity.

Python Script: import_fix_to_db.py - Dynamic FIX to MySQL ETL

This Python script, named import_fix_to_db.py, implements a two-pass approach to process FIX log files and load the data into a MySQL database. The dynamic nature of the script allows it to adapt to different FIX message structures without prior knowledge of all possible tags.

Key Features:

  • Dynamic Schema Generation:
    • Pass 1 (Tag Discovery): Reads the FIX log file to identify all unique tags present in 'New Order - Single' (MsgType=D) and specific 'Execution Report' (MsgType=8 with OrdStatus=1 or 2) messages.
    • Pass 2 (Schema Creation & Data Load): Based on the discovered tags, the script dynamically creates two tables in MySQL: Orders and Executions. Each unique tag becomes a column in the respective table, with an automatically determined SQL data type. Indexes are strategically created for commonly queried tags (e.g., ClOrdID, OrderID, Symbol).
  • Intelligent Data Type Mapping: Includes a mapping of common FIX tags to appropriate SQL data types (e.g., timestamps to DATETIME, quantities/prices to DECIMAL, IDs to VARCHAR). A default VARCHAR(255) is used for unmapped tags.
  • Data Sanitization: Sanitizes tag numbers to ensure they are valid MySQL column names.
  • Robust Parsing: Efficiently parses FIX messages, handling potential variations in log formats.
  • Data Type Conversion: Attempts to convert extracted tag values to their corresponding Python types based on the determined SQL type before insertion.
  • Batch Processing: Inserts data into MySQL in batches (configurable BATCH_SIZE) to improve performance and manage transaction sizes.
  • Error Handling and Logging: Includes comprehensive logging of the process, including warnings for unparseable data and errors during database operations. Failed batch inserts are rolled back.
  • Idempotent Table Creation: Drops existing Orders and Executions tables before recreating them, ensuring a clean schema on each run (note: this will erase existing data).

Database Schema:

The script creates two tables:

  • Orders: Stores information from 'New Order - Single' messages (MsgType=D). Columns include core fields and dynamically added columns for each unique tag found in these messages.
  • Executions: Stores details from 'Execution Report' messages (MsgType=8 with OrdStatus=1 or 2). Columns include core fields and dynamically added columns for each unique tag found in these messages.

Bash Script: fixfindreplace.sh - FIX Message Manipulation

The fixfindreplace.sh Bash script provides a versatile command-line tool for modifying FIX messages. It allows users to replace existing tag values and insert new tags into FIX messages, while automatically recalculating the crucial checksum (tag 10) to maintain message validity.

Key Features:

  • Tag Value Replacement: Enables the replacement of specific tag values by providing the tag number, the old value, and the new value (e.g., 55=MSFT=CSCO to change the security symbol).
  • Tag Insertion: Allows the addition of new tags with specified values (e.g., +43=Y to add a new flag). New tags are inserted before the checksum.
  • Checksum Recalculation: Automatically recalculates the FIX checksum (tag 10) after any modifications to ensure message integrity.
  • SOH Delimiter Handling: Correctly handles the Standard-On-Header (SOH) delimiter (\x01) that separates FIX message fields.
  • Verbose Mode: Includes a -v or --verbose option to display processing details, such as replacements and insertions performed.
  • Clear Usage Instructions: Provides a helpful message with usage examples when the script is run with -h or --help, or with invalid arguments.
  • Input via Pipe: Designed to accept FIX messages via standard input (piping), making it easy to integrate with other command-line tools.

Example Usage:


# Replace MSFT with CSCO in tag 55 and insert tag 43=Y from a log file:
cat fix.log | ./fixfindreplace.sh 55=MSFT=CSCO +43=Y

# Multiple replacements and insertions:
cat another_fix.log | ./fixfindreplace.sh 38=100=200 54=1=2 +58="Custom Text"

# View processing details:
cat test.log | ./fixfindreplace.sh -v 11=OLD_ID=NEW_ID

Demonstrating SQL Queries

Once the FIX data is loaded into the MySQL database using the Python script, it becomes readily available for analysis using SQL. Below are examples of common and powerful SQL queries that can be used to explore and gain insights from the data.

Join Operations:

Demonstrating how to combine data from the Orders and Executions tables based on common fields.

INNER JOIN:

Retrieve all orders that have at least one corresponding execution:


SELECT
    o.tag_11 AS ClOrdID,
    o.tag_55 AS Symbol,
    e.tag_17 AS ExecID,
    e.tag_39 AS OrdStatus
FROM
    Orders o
INNER JOIN
    Executions e ON o.tag_37 = e.tag_37; -- Assuming tag_37 is the OrderID

LEFT JOIN:

Retrieve all orders and their associated executions (if any). Orders without executions will have NULL values for execution-related columns:


SELECT
    o.tag_11 AS ClOrdID,
    o.tag_55 AS Symbol,
    e.tag_17 AS ExecID,
    e.tag_39 AS OrdStatus
FROM
    Orders o
LEFT JOIN
    Executions e ON o.tag_37 = e.tag_37;

RIGHT JOIN:

Retrieve all executions and their associated orders (if any). Executions without matching orders will have NULL values for order-related columns:


SELECT
    o.tag_11 AS ClOrdID,
    o.tag_55 AS Symbol,
    e.tag_17 AS ExecID,
    e.tag_39 AS OrdStatus
FROM
    Orders o
RIGHT JOIN
    Executions e ON o.tag_37 = e.tag_37;

OUTER JOIN (Simulated with UNION):

MySQL does not have a direct FULL OUTER JOIN syntax, but it can be simulated using UNION of LEFT JOIN and RIGHT JOIN:


SELECT
    o.tag_11 AS ClOrdID,
    o.tag_55 AS Symbol,
    e.tag_17 AS ExecID,
    e.tag_39 AS OrdStatus
FROM
    Orders o
LEFT JOIN
    Executions e ON o.tag_37 = e.tag_37
UNION
SELECT
    o.tag_11 AS ClOrdID,
    o.tag_55 AS Symbol,
    e.tag_17 AS ExecID,
    e.tag_39 AS OrdStatus
FROM
    Orders o
RIGHT JOIN
    Executions e ON o.tag_37 = e.tag_37
WHERE o.tag_37 IS NULL; -- Avoid duplicates from the LEFT JOIN part

SELF JOIN:

Illustrating how to join a table with itself. For example, finding orders with the same ClOrdID but potentially different execution status (this might require more specific use-cases based on your data):


SELECT
    o1.tag_11 AS ClOrdID,
    o1.tag_37 AS OrderID1,
    o2.tag_37 AS OrderID2
FROM
    Orders o1
INNER JOIN
    Orders o2 ON o1.tag_11 = o2.tag_11 AND o1.id != o2.id; -- Find orders with the same ClOrdID but different internal IDs

These examples demonstrate the power of SQL joins in relating and analyzing the order and execution data loaded by your Python script.

Video Showcase

Watch demonstrations of my scripts in action:

Clicking these buttons will open a new tab to view the video.