Split lines on \n when generating the HTML diff

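The previous implementation used str.splitlines(keepends=True), which splits on \n, \r\n, \r, U+2028 LINE SEPARATOR, U+2029 PARAGRAPH SEPARATOR, and other Unicode line boundaries. The patch library reads the diff with readline(), which splits only on \n. The two disagree about where lines end, so parsing the generated diff fails with an error.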
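
A standard-library illustration of the disagreement. Here `io.StringIO(...).readlines()` stands in for python-patch's readline() loop; that stand-in is an assumption for illustration, not python-patch code.

```python
import io

text = "alpha\u2028beta\ngamma\r\ndelta"

# splitlines() also breaks on U+2028 and keeps \r\n together as one boundary.
by_splitlines = text.splitlines(keepends=True)
# ['alpha\u2028', 'beta\n', 'gamma\r\n', 'delta']

# With newline="\n", readline()/readlines() break on "\n" only.
by_readline = io.StringIO(text, newline="\n").readlines()
# ['alpha\u2028beta\n', 'gamma\r\n', 'delta']

print(len(by_splitlines), len(by_readline))  # 4 3
```
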

Observed 25 events in the error sink over ~3 days of data, all with this cause. Spot-checked a few of these events with this fix applied, and the diff-apply loop completed correctly.

Cause

  1. make_unified_diff used str.splitlines(keepends=True).
  2. splitlines() breaks on any Unicode line boundary: \n, \r\n, bare \r, U+2028 LINE SEPARATOR, U+2029 PARAGRAPH SEPARATOR, etc.
  3. difflib.unified_diff then treats each piece as a line and writes hunk headers @@ -a,b +c,d @@ whose counts b and d match that line count.
  4. The serialized diff, however, still has one row per \n (which is what readline() sees in python-patch).
  5. If the only separator between two “lines” for splitlines is e.g. U+2028, they become two logical lines for difflib but one physical \n-terminated row in the file → fewer rows than the hunk header says → the next @@ appears “too early” → parser error (“invalid unified diff format”).
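
A minimal reproduction of steps 2 to 5 with difflib alone. The input strings are made up for illustration and are not taken from the error sink.

```python
import difflib

old = "one\ntwo\u2028three\nfour\n"
new = "one\nTWO\u2028three\nfour\n"

# Old behaviour: splitlines() ends a "line" at U+2028, so difflib sees 4 lines.
diff = "".join(difflib.unified_diff(
    old.splitlines(keepends=True),
    new.splitlines(keepends=True),
    fromfile="a", tofile="b",
))

rows = diff.split("\n")              # how a \n-based reader sees the diff
header = rows[2]
body = [row for row in rows[3:] if row]
print(header)     # @@ -1,4 +1,4 @@
print(len(body))  # 3: "-two", "+TWO" and " three" share a single \n row
```

The header promises four old and four new lines, but a reader that splits on \n finds only three body rows, so it runs off the end of the hunk.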

So the diff was internally inconsistent: correct for difflib’s line model, wrong for LF-row-based parsers.

Fix

  • make_unified_diff now explicitly splits on \n instead of splitlines().
  • Adds the newline back at the end of each line to preserve the original string.
  • As before, adds a \n at the end of the whole string.
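
A minimal sketch of the fixed splitting, assuming a helper shaped like `make_unified_diff(old, new) -> str`; `split_on_lf` is a hypothetical name and the real signature may differ. It also assumes that "adds a \n at the end of the whole string" means each input is terminated with \n before diffing, so no diff row can be glued onto the next one.

```python
from __future__ import annotations

import difflib


def split_on_lf(text: str) -> list[str]:
    """Split only on LF and re-attach it, so "".join(result) == text."""
    parts = text.split("\n")
    lines = [part + "\n" for part in parts[:-1]]
    if parts[-1]:
        lines.append(parts[-1])  # unterminated final line, kept verbatim
    return lines


def make_unified_diff(old: str, new: str) -> str:
    """Hypothetical sketch of the fixed helper, not the project's code."""
    # Assumption: terminate each input so every diff row ends with "\n".
    old = old if old.endswith("\n") else old + "\n"
    new = new if new.endswith("\n") else new + "\n"
    return "".join(difflib.unified_diff(split_on_lf(old), split_on_lf(new)))
```

For `old = "one\ntwo\u2028three\nfour\n"` and the same text with `TWO`, the hunk header becomes `@@ -1,3 +1,3 @@`, which matches the four \n-terminated body rows (one context, one removed, one added, one context).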

With this change, one LF-terminated row in the HTML ⇒ one line for difflib ⇒ hunk counts match what python-patch reads.

Tests

  • New test: test_unified_diff_round_trip_with_line_separator_u2028 — text with U+2028 inside a row must round-trip through make_unified_diff + apply_unified_diff.
  • Existing test_diffs cases still pass.
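
The round-trip property the new test checks can be sketched without python-patch. `apply_unified_diff` below is a hypothetical toy applier, not the project's helper or python-patch's API; like python-patch it consumes the diff one \n-terminated row at a time and trusts the hunk header counts. As an assumption, inputs missing a final \n get one added.

```python
import difflib
import io
import re

# Matches "@@ -a,b +c,d @@"; b and d default to 1 when omitted.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def split_on_lf(text):
    """Split only on LF, keeping it; a missing final LF is added (assumption)."""
    if not text.endswith("\n"):
        text += "\n"
    return [part + "\n" for part in text.split("\n")[:-1]]


def make_unified_diff(old, new):
    return "".join(difflib.unified_diff(split_on_lf(old), split_on_lf(new)))


def apply_unified_diff(old, diff):
    """Hypothetical toy applier: reads the diff one LF-terminated row at a
    time and trusts the hunk header counts, as a readline()-based parser does."""
    src, out, pos = split_on_lf(old), [], 0
    rows = io.StringIO(diff, newline="\n").readlines()
    i = 0
    while i < len(rows):
        m = HUNK_RE.match(rows[i])
        i += 1
        if not m:
            continue  # "---" / "+++" file headers
        old_len, new_len = int(m.group(2) or "1"), int(m.group(4) or "1")
        # An empty old range ("-k,0") means "insert after line k".
        start = int(m.group(1)) - 1 if old_len else int(m.group(1))
        out.extend(src[pos:start])  # untouched lines before this hunk
        pos = start
        while old_len > 0 or new_len > 0:
            if i >= len(rows):
                raise ValueError("invalid unified diff format")
            tag, body = rows[i][:1], rows[i][1:]
            i += 1
            if tag not in (" ", "-", "+"):
                raise ValueError("invalid unified diff format")
            if tag in (" ", "-"):  # line must exist in the old text
                if pos >= len(src) or src[pos] != body:
                    raise ValueError("hunk does not match old text")
                pos, old_len = pos + 1, old_len - 1
            if tag in (" ", "+"):  # line goes into the new text
                out.append(body)
                new_len -= 1
            if old_len < 0 or new_len < 0:
                raise ValueError("hunk longer than its header says")
    out.extend(src[pos:])
    return "".join(out)


old = "one\ntwo\u2028three\nfour\n"
new = "one\nTWO\u2028three\nfour\n"
assert apply_unified_diff(old, make_unified_diff(old, new)) == new
```

Feeding the same applier a diff built with splitlines() raises ValueError, because the "-" row that swallowed the U+2028 pieces no longer matches the old text.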

Bug: T419969
