    Export English messages to MediaWiki format · 966f578b
    Lucas Werkmeister authored
    This is the first step in migrating the tool’s translations to
    translatewiki.net. The py2mw function added to translations.py converts
    our custom Python syntax to MediaWiki syntax. This conversion is lossy
    in four aspects:
    
    1. the Python syntax has variable names, the MediaWiki syntax only
       numeric indices;
    2. MediaWiki has no equivalent of our !l (list) conversion;
    3. the MediaWiki syntax does not include the plural tag names (“one”,
       “many”, “other”, etc.); and
    4. the MediaWiki syntax does not include the gender markers (“m”, “f”,
       “n”).
    
    The first two aspects can be mitigated by recording the variable names
    and sets of list-type variables during the conversion and printing them
    out at the end; they will later be hard-coded:
    
        {'duplicates_warning': ['lexemes'],
         'duplicates_instructions': ['lexemes'],
         'description_with_forms_and_senses': ['description', 'forms', 'senses'],
         'edit_ambiguous_warning': ['forms'],
         'edit_unmatched_warning': ['forms'],
         'edit_form_list_item': ['form_link',
                                 'grammatical_feature_labels',
                                 'statements']}
    
        {'edit_form_list_item': {'grammatical_feature_labels'}}
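
    A minimal sketch of how such a recording pass could work, not the
    actual py2mw code: it only handles plain and !l placeholders, ignores
    any format spec (plural or gender), and assumes indices are assigned in
    order of first appearance. The function name and the sample message are
    hypothetical.

```python
import string


def named_to_numbered(message):
    """Replace {name} placeholders with $1, $2, ... (hypothetical helper).

    Returns the converted message, the variable names in index order,
    and the set of names that used the !l (list) conversion.
    """
    names = []          # position in this list + 1 is the MediaWiki index
    list_names = set()  # variables that were formatted with !l
    parts = []
    for literal, field, _spec, conversion in string.Formatter().parse(message):
        parts.append(literal)
        if field is None:
            continue  # trailing literal text, no placeholder
        if field not in names:
            names.append(field)
        if conversion == 'l':
            list_names.add(field)
        parts.append(f'${names.index(field) + 1}')
    return ''.join(parts), names, list_names


# Assumed example message; the real message text is not shown in this commit.
converted, names, list_names = named_to_numbered(
    '{form_link}: {grammatical_feature_labels!l} ({statements})'
)
```

    The two recorded values correspond to the two dictionaries printed
    above: one entry of variable names and one entry of list-type
    variables per message.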
    
    As for the plural tag names, MediaWiki takes them from the underlying
    CLDR data and uses their “natural” order. In Python, the Babel library
    gives us the tag names of a language as well – but it doesn’t give us
    their order: we get an unordered set of tag names. However, it turns out
    that the order is consistent across all languages, as shown by this
    script, which extracts the order from the MediaWiki source files:
    
        xq -r '
          .supplementalData.plurals.pluralRules
          | .[]
          | .pluralRule
          | if type == "array" then . else [.] end
          | map(."@count")
          | join(" ")
        ' languages/data/plurals*.xml \
            | while read -ra plurals; do
            for i in ${!plurals[@]}; do
                for ((j=i+1; j<${#plurals[@]}; j++)); do
                    printf '%s %s\n' "${plurals[i]}" "${plurals[j]}";
                done;
            done;
        done \
            | tsort
    
    So we can hard-code this order later and have that covered as well.
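
    The same consistency check can be sketched in Python, under the
    assumption that the per-language tag sequences have already been
    extracted; the sample sequences and function names below are
    illustrative, not taken from the tool. A cycle error would mean that
    two languages disagree about the order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Assumed sample of per-language tag sequences, in file order.
SAMPLE_SEQUENCES = [
    ['one', 'other'],
    ['one', 'few', 'many', 'other'],
    ['zero', 'one', 'two', 'few', 'many', 'other'],
]


def merged_order(sequences):
    """Merge per-language orders into one; raises CycleError on conflict."""
    sorter = TopologicalSorter()
    for tags in sequences:
        for i, earlier in enumerate(tags):
            for later in tags[i + 1:]:
                # "earlier" must come before "later", like one tsort input pair
                sorter.add(later, earlier)
    return list(sorter.static_order())


PLURAL_ORDER = merged_order(SAMPLE_SEQUENCES)


def ordered_tags(tags):
    """Sort an unordered set of tag names by the merged order."""
    return sorted(tags, key=PLURAL_ORDER.index)
```

    With an order like this hard-coded, an unordered set of tag names
    such as the one Babel returns can be passed through ordered_tags to
    line the tags up with the positions MediaWiki expects.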
    
    Finally, the gender markers are similar to the plural tag names in that
    they’re always “m”, “f”, and “n”, in that order. (MediaWiki only supports
    three gender distinctions.)
This project is licensed under the GNU Affero General Public License v3.0.