
Fix subtle bug in image_suggestions when resolving a varprop.

Xcollazo requested to merge hotfix-fix-bad-coalesce into main

Adding my discussion of this bug with @joal, for completeness and future reference:

@xcollazo: OK, I just found this bug in one of my Airflow DAGs and need another pair of eyes, because I can’t believe my own.

(1) I defined the following using a VariableProperty:

coalesce_partitions = var_props.get('coalesce_partitions', 6),

Notice the extra comma at the end, which I added inadvertently.

(2) coalesce_partitions is then passed to a Python function that formats a string like so:

def generate_spark_to_cassandra_insert_query(cassandra_table, coalesce_num, hive_columns, hive_db, hive_table, week):
    return "\"" + (
        f"INSERT INTO aqs.image_suggestions.{cassandra_table} "
        f"SELECT /*+ COALESCE({coalesce_num}) */ {hive_columns} "
        f"FROM {hive_db}.{hive_table} "
        f"WHERE snapshot='{week}'"
    ) + "\""

(3) And for some strange reason, it gets rendered like this:

== SQL ==
INSERT INTO aqs.image_suggestions.suggestions SELECT /*+ COALESCE((6,)) */ wiki,page_id,id,image,origin_wiki,confidence,found_on,kind,page_rev FROM ...
---------------------------------------------------------------------^^^
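The three steps above can be reproduced outside Airflow with a minimal sketch. It assumes a plain dict standing in for var_props (the real VariableProperty object is not used here), and the table, database, column and snapshot values are hypothetical placeholders; only the string-building function is copied from the DAG code.

```python
def generate_spark_to_cassandra_insert_query(cassandra_table, coalesce_num, hive_columns, hive_db, hive_table, week):
    # Same string building as in the DAG: coalesce_num is interpolated with str()
    return "\"" + (
        f"INSERT INTO aqs.image_suggestions.{cassandra_table} "
        f"SELECT /*+ COALESCE({coalesce_num}) */ {hive_columns} "
        f"FROM {hive_db}.{hive_table} "
        f"WHERE snapshot='{week}'"
    ) + "\""


# Hypothetical stand-in for var_props: a plain dict with the same .get() call shape.
var_props = {"coalesce_partitions": 6}

buggy = var_props.get("coalesce_partitions", 6),  # trailing comma: a one-element tuple
fixed = var_props.get("coalesce_partitions", 6)   # no comma: the int itself

# Hypothetical placeholder values for the remaining arguments.
args = dict(cassandra_table="suggestions", hive_columns="wiki,page_id",
            hive_db="example_db", hive_table="example_table", week="2022-01-01")

buggy_query = generate_spark_to_cassandra_insert_query(coalesce_num=buggy, **args)
fixed_query = generate_spark_to_cassandra_insert_query(coalesce_num=fixed, **args)

print(buggy_query)  # contains COALESCE((6,)), the hint Spark rejected
print(fixed_query)  # contains COALESCE(6)
```

Nothing fails until Spark parses the generated SQL, which is why the error surfaced as a parse error pointing at the hint rather than as a Python error.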

My question is: why does Python compile (1) without complaint, and why does (2) result in (3)?

... lamenting non-static languages etc... ...

@joal:

OK, I think I get it. In Python, writing

a = 6,

leads to the variable a holding a tuple of one element: it is the trailing comma, not parentheses, that creates a tuple. Proof:

type(a)
<class 'tuple'>
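Python's own parser confirms this reading. A minimal sketch using the standard ast module shows that statement (1) parses as an assignment whose right-hand side is a one-element Tuple wrapping the .get(...) call, which is why it compiles at all; var_props is only parsed here, never evaluated.

```python
import ast

# Statement (1) as source text; var_props is only parsed, never evaluated.
source = "coalesce_partitions = var_props.get('coalesce_partitions', 6),"
assign = ast.parse(source).body[0]

# The trailing comma turns the right-hand side into a tuple display,
# so the value node is a Tuple with a single element: the .get(...) call.
print(type(assign).__name__)                # Assign
print(type(assign.value).__name__)          # Tuple
print(type(assign.value.elts[0]).__name__)  # Call

# Without the comma, the right-hand side is the call itself.
fixed = ast.parse("coalesce_partitions = var_props.get('coalesce_partitions', 6)")
print(type(fixed.body[0].value).__name__)   # Call
```

The same rule explains why a = (6) is still an int: parentheses only group, and a parenthesized one-element tuple needs the comma, as in a = (6,).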

That tuple then gets rendered as (6,) when interpolated into a string:

print(f"test {a}")
test (6,)
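Removing the stray comma fixes the immediate bug. As an optional extra safeguard, the value read from var_props could be validated before it reaches generate_spark_to_cassandra_insert_query. This is a minimal sketch under that assumption; as_partition_count is a hypothetical helper, not part of the DAG code or of Airflow.

```python
def as_partition_count(value):
    """Hypothetical guard: return value if it is a positive int, else raise.

    A stray trailing comma upstream makes value a tuple such as (6,),
    which this check rejects before it is formatted into the SQL string.
    """
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(
            f"expected an int partition count, got {type(value).__name__}: {value!r}"
        )
    if value <= 0:
        raise ValueError(f"partition count must be positive, got {value}")
    return value


print(as_partition_count(6))  # 6

try:
    as_partition_count((6,))  # what the stray comma produced
except TypeError as err:
    print(err)  # expected an int partition count, got tuple: (6,)
```

With a check like this, a similar slip fails early in Python with a message naming the tuple, instead of surfacing later as a Spark SQL parse error.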
