Spark: Redesign API and remove session timeouts

Neil Shah-Quinn (WMF) requested to merge spark_api into main

Created by: nshahquinn

  • Breaking change: the get_session and get_custom_session functions have been renamed to create_session and create_custom_session. They now stop any existing session before creating the new one, so the returned session always reflects the passed settings; previously, the settings were silently ignored if a session already existed. Use the new get_active_session function if you want to non-destructively retrieve the active session.
  • Breaking change: the deprecated "raw" format and the non-deprecated format parameter have been removed from the run function.
  • Breaking change: the run function no longer has the ability to specify Spark settings, as the session_type and extra_settings parameters have been removed. If a session already exists, it will be used. Otherwise, a default "yarn-regular" session will be created and used.
  • Previously, in some cases, the package automatically closed sessions after 30 minutes of apparent inactivity. It no longer does this.
    • Breaking change: the get_application_id, cancel_session_timeout, stop_session, and start_session_timeout functions have been removed. These were intended only for internal use, so it's unlikely that you will have to change your code.
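
The difference between creating and retrieving a session can be illustrated with a toy model. This is only a sketch of the semantics described above, not the package's implementation: FakeSession is a hypothetical stand-in for a Spark session, the single settings argument is a simplified assumption rather than the real signature, and returning None when no session exists is likewise an assumption.

```python
from typing import Optional


class FakeSession:
    """Hypothetical stand-in for a Spark session (not a real Spark class)."""

    def __init__(self, settings: dict):
        self.settings = dict(settings)
        self.stopped = False

    def stop(self) -> None:
        self.stopped = True


# Module-level record of the session that is currently active, if any.
_active: Optional[FakeSession] = None


def get_active_session() -> Optional[FakeSession]:
    """Return the active session without modifying it (None in this toy model if absent)."""
    return _active


def create_custom_session(settings: dict) -> FakeSession:
    """Stop any existing session, then create a new one with the given settings."""
    global _active
    if _active is not None:
        _active.stop()
    _active = FakeSession(settings)
    return _active


first = create_custom_session({"spark.executor.memory": "4g"})
second = create_custom_session({"spark.executor.memory": "8g"})

# The second call replaced the first session instead of silently reusing it.
print(first.stopped, second.settings, get_active_session() is second)
```

In this model, code that only needs to read from whatever session is running would call get_active_session, while code that depends on specific settings would call the create function and accept that any running session is stopped.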
