diff --git a/mingen/near_miss.py b/mingen/near_miss.py
index 2a9c5fa..099a4b5 100755
--- a/mingen/near_miss.py
+++ b/mingen/near_miss.py
@@ -38,13 +38,13 @@ def generate_wugs(rules):
     print(f'{len(rimes)} rimes')
 
     # Irregular rules
-    change = 'ɪ -> ʌ'
+    #change = 'ɪ -> ʌ'
     #change = 'a ɪ -> o'
     #change = 'i -> ɛ'
     #change = 'ɪ -> ɑ'
     #change = 'e -> o'
     #change = 'e -> ʊ'
-    #change = 'i p -> ɛ p t'
+    change = 'i p -> ɛ p t'
     A, B = change.split(' -> ') # xxx handle zeros
     rules = rules[(rules['rule'].str.contains(f'^{change} /'))]
     rules = rules \
diff --git a/sigmorphon2021_paper/__latexindent_temp.tex b/sigmorphon2021_paper/__latexindent_temp.tex
index 8b4d726..62438eb 100644
--- a/sigmorphon2021_paper/__latexindent_temp.tex
+++ b/sigmorphon2021_paper/__latexindent_temp.tex
@@ -249,7 +249,7 @@ \subsection{Extensions}
 
 \subsection{Near misses}
 
-As the organizers of the shared task have emphasized, implemented models can be used not only to predict the results of experiments but also to generate stimuli. Ideally, stimulus items would be designed to test the core tenets of a single model or to probe systematic differences in prediction among models. As part of our implementation, we have developed an automatic method of selecting wug items to investigate the main concern about minimal generalization: namely, that by learning rules in a strictly bottom-up way it will \emph{undergeneralize}, predicting sharp contrasts in inflectional behavior on the basis of slight differences in form.
+As the organizers of the shared task have emphasized, implemented models can be used not only to predict the results of experiments but also to generate stimuli. Ideally, stimulus items would be designed to test the core tenets of a single model or to probe systematic differences in prediction among models. As part of our implementation, we have developed an automatic method of selecting wug items to investigate a main concern about minimal generalization: namely, that by learning rules in a strictly bottom-up way it will \emph{undergeneralize}, predicting sharp contrasts in inflectional behavior on the basis of slight differences in form.
 
 We illustrate our method with the English irregular pattern \textipa{I} $\to$ \textipa{2}, which attracted new members in the history of English and has elicited relatively high production rates and acceptability ratings in previous wug tests \citep[\emph{e.g.},][]{bybee1983, albright2003}. We extracted all of the onsets and rimes that appear in the bare forms of monosyllabic English verbs and freely combined them to create a large pool of possible stimulus items. We eliminated items that are real verbs, then shrunk the pool to those items that are one (segmental) edit away from some existing irregular verb that undergoes \textipa{I} $\to$ \textipa{2}. We further required each item to share its rime with at least one such irregular verb.\footnote{Studies of English irregular verbs have focused primarily on vowels and codas of monosyllables, though see \citet{bybee1983} on the potential role of onsets.} All of the wugs in the final pool are highly similar, in this sense, to existing irregulars.
 
@@ -257,7 +257,7 @@ \subsection{Near misses}
 
 Some of the potential hits and near misses are minimal pairs. For example, \textipa{/lIN/} ($.67$) and \textipa{/SIN/} ($.61$) could potentially undergo \textipa{I} $\to$ \textipa{2} rules with the indicated confidence values. But \textipa{/fIN/} and \textipa{/vIN/} are ineligible for the change according to the model (because no existing irregular verb of this type has a non-coronal fricative immediately before the vowel). Other differences in the onset can also dramatically affect the model's predictions: \textipa{/T\*rINk/} ($.88$) and \textipa{/glIN/} ($.67$) are potential hits but \textipa{/smINk/} and \textipa{/smIN/} are near misses. The second two are phonotactically challenged \citep{davis-1989-cross}, but are \textipa{/T\*r2Nk/} and \textipa{/gl2N/} far superior to \textipa{/sm2Nk/} and \textipa{/sm2N/} when the phonotactic acceptability of their bare forms is factored out?
 
-The same procedure can be applied to any irregular (or indeed regular) change. For \textipa{i} $\to$ \textipa{Ept} (as in \emph{sleep} $\sim$ \emph{slept}), we find that the potential hits include \textipa{/gip/} ($.85$) and \textipa{/flip/} ($.73$, one of Albright \& Hayes's wug items) while \textipa{/fip/}, \textipa{/vip/}, \textipa{/nip/}, and \textipa{/snip/} are among the near misses. Would native English speakers rate the novel past form \textipa{/gEpt/} much higher than \textipa{/fEpt/}, as the model predicts? We look forward to future empirical tests of the minimal generalization model, along these lines and others, as part of the collective effort to find out where we are and how much further we have to go in cognitive modeling of inflection.
+The same procedure can be applied to any irregular (or indeed regular) change. For \textipa{i} $\to$ \textipa{Ept} (as in \emph{sleep} $\sim$ \emph{slept}), we find that the potential hits include \textipa{/gip/} ($.85$) and \textipa{/flip/} ($.73$, one of Albright \& Hayes's wug items) while \textipa{/fip/}, \textipa{/vip/}, \textipa{/nip/}, and \textipa{/snip/} are among the near misses. Would native English speakers rate the novel past form \textipa{/gEpt/} much higher than \textipa{/fEpt/}, as the model predicts? We look forward to future empirical tests of minimal generalization, along these lines and others, as part of the collective effort to find out where we are and how much further we have to go in cognitive modeling of inflection.
 
 % "attributes of the prototype of this [irregular] class of verbs are:
 % a final velar nasal (/N/ better than /Nk/)
diff --git a/sigmorphon2021_paper/wilsonlisigmorphon2021.fdb_latexmk b/sigmorphon2021_paper/wilsonlisigmorphon2021.fdb_latexmk
index f5171c8..ac70f90 100644
--- a/sigmorphon2021_paper/wilsonlisigmorphon2021.fdb_latexmk
+++ b/sigmorphon2021_paper/wilsonlisigmorphon2021.fdb_latexmk
@@ -1,16 +1,16 @@
 # Fdb version 3
-["bibtex wilsonlisigmorphon2021"] 1626290104 "wilsonlisigmorphon2021.aux" "wilsonlisigmorphon2021.bbl" "wilsonlisigmorphon2021" 1626293180
+["bibtex wilsonlisigmorphon2021"] 1626290104 "wilsonlisigmorphon2021.aux" "wilsonlisigmorphon2021.bbl" "wilsonlisigmorphon2021" 1626293424
   "./acl_natbib.bst" 1625764624 47709 d9d0ab22329e3318a3d1b2ae6abaab62 ""
   "anthology.bib" 1626102047 35309921 8480dcb46e31d5b50f6b1c43f1c763ff ""
-  "wilsonlisigmorphon2021.aux" 1626293180 4964 3e7671715f8ed99128aa4589cf7699b5 "pdflatex"
+  "wilsonlisigmorphon2021.aux" 1626293423 4964 3e7671715f8ed99128aa4589cf7699b5 "pdflatex"
   "wilsonlisigmorphon2021.bib" 1626187925 33311 491ac25d71642dc47d3868c1e4c81e75 ""
   "wilsonlisigmorphon2021extra.bib" 1626191846 845 05024d5221c114497bb19c7eb1aa5b8d ""
   (generated)
   "wilsonlisigmorphon2021.blg"
   "wilsonlisigmorphon2021.bbl"
-["pdflatex"] 1626293179 "/Users/colin/Code/Python/mingen/sigmorphon2021_paper/wilsonlisigmorphon2021.tex" "/Users/colin/Code/Python/mingen/sigmorphon2021_paper/wilsonlisigmorphon2021.pdf" "wilsonlisigmorphon2021" 1626293180
-  "/Users/colin/Code/Python/mingen/sigmorphon2021_paper/wilsonlisigmorphon2021.aux" 1626293180 4964 3e7671715f8ed99128aa4589cf7699b5 ""
-  "/Users/colin/Code/Python/mingen/sigmorphon2021_paper/wilsonlisigmorphon2021.tex" 1626293178 50105 66ebc0dfc9f860d55d012e401cddf9c4 ""
+["pdflatex"] 1626293422 "/Users/colin/Code/Python/mingen/sigmorphon2021_paper/wilsonlisigmorphon2021.tex" "/Users/colin/Code/Python/mingen/sigmorphon2021_paper/wilsonlisigmorphon2021.pdf" "wilsonlisigmorphon2021" 1626293424
+  "/Users/colin/Code/Python/mingen/sigmorphon2021_paper/wilsonlisigmorphon2021.aux" 1626293423 4964 3e7671715f8ed99128aa4589cf7699b5 ""
+  "/Users/colin/Code/Python/mingen/sigmorphon2021_paper/wilsonlisigmorphon2021.tex" 1626293422 50093 c061703749efb7821ec02868e5903b8c ""
   "/usr/local/texlive/2019/texmf-dist/fonts/enc/dvips/base/8r.enc" 1165713224 4850 80dc9bab7f31fb78a000ccfed0e27cab ""
   "/usr/local/texlive/2019/texmf-dist/fonts/map/fontname/texfonts.map" 1511824771 3332 103109f5612ad95229751940c61aada0 ""
   "/usr/local/texlive/2019/texmf-dist/fonts/tfm/adobe/courier/pcrr8r.tfm" 1136768653 1292 bd42be2f344128bff6d35d98474adfe3 ""
@@ -137,14 +137,14 @@
   "/usr/local/texlive/2019/texmf-var/web2c/pdftex/pdflatex.fmt" 1557342086 4259635 562fc36ec496993777a6c428d8bfb811 ""
   "/usr/local/texlive/2019/texmf.cnf" 1557341546 577 d150fef99ac436ad1156e86b0892f6ef ""
   "acl.sty" 1625764624 11158 b6ad1bbc953e18f357974bb34039deb0 ""
-  "wilsonlisigmorphon2021.aux" 1626293180 4964 3e7671715f8ed99128aa4589cf7699b5 "pdflatex"
+  "wilsonlisigmorphon2021.aux" 1626293423 4964 3e7671715f8ed99128aa4589cf7699b5 "pdflatex"
   "wilsonlisigmorphon2021.bbl" 1626290111 10114 1da617a785c3f9e971b63abd0d9ab79f "bibtex wilsonlisigmorphon2021"
-  "wilsonlisigmorphon2021.out" 1626293179 0 d41d8cd98f00b204e9800998ecf8427e "pdflatex"
-  "wilsonlisigmorphon2021.tex" 1626293178 50105 66ebc0dfc9f860d55d012e401cddf9c4 ""
+  "wilsonlisigmorphon2021.out" 1626293423 0 d41d8cd98f00b204e9800998ecf8427e "pdflatex"
+  "wilsonlisigmorphon2021.tex" 1626293422 50093 c061703749efb7821ec02868e5903b8c ""
   (generated)
-  "wilsonlisigmorphon2021.pdf"
-  "wilsonlisigmorphon2021.out"
-  "/Users/colin/Code/Python/mingen/sigmorphon2021_paper/wilsonlisigmorphon2021.pdf" "/Users/colin/Code/Python/mingen/sigmorphon2021_paper/wilsonlisigmorphon2021.log"
-  "wilsonlisigmorphon2021.aux"
+  "/Users/colin/Code/Python/mingen/sigmorphon2021_paper/wilsonlisigmorphon2021.pdf" "wilsonlisigmorphon2021.log"
+  "wilsonlisigmorphon2021.aux"
+  "wilsonlisigmorphon2021.out"
+  "wilsonlisigmorphon2021.pdf"
diff --git a/sigmorphon2021_paper/wilsonlisigmorphon2021.log b/sigmorphon2021_paper/wilsonlisigmorphon2021.log
index 21ee725..ca24802 100644
--- a/sigmorphon2021_paper/wilsonlisigmorphon2021.log
+++ b/sigmorphon2021_paper/wilsonlisigmorphon2021.log
@@ -1,4 +1,4 @@
-This is pdfTeX, Version 3.14159265-2.6-1.40.20 (TeX Live 2019) (preloaded format=pdflatex 2019.5.8) 14 JUL 2021 13:06
+This is pdfTeX, Version 3.14159265-2.6-1.40.20 (TeX Live 2019) (preloaded format=pdflatex 2019.5.8) 14 JUL 2021 13:10
 entering extended mode
  restricted \write18 enabled.
  file:line:error style messages enabled.
@@ -492,15 +492,15 @@ Package rerunfilecheck Info: File `wilsonlisigmorphon2021.out' has not changed.
 Package atveryend Info: Empty hook `AtVeryVeryEnd' on input line 284.
 )
 Here is how much of TeX's memory you used:
- 11106 strings out of 492616
- 166846 string characters out of 6129482
+ 11109 strings out of 492616
+ 166876 string characters out of 6129482
  294215 words of memory out of 5000000
  14564 multiletter control sequences out of 15000+600000
- 66947 words of font info for 244 fonts, out of 8000000 for 9000
+ 67172 words of font info for 246 fonts, out of 8000000 for 9000
  1141 hyphenation exceptions out of 8191
  43i,12n,40p,1662b,485s stack positions out of 5000i,500n,10000p,200000b,80000s
 {/usr/local/texlive/2019/texmf-dist/fonts/enc/dvips/base/8r.enc}
-Output written on /Users/colin/Code/Python/mingen/sigmorphon2021_paper/wilsonlisigmorphon2021.pdf (9 pages, 253104 bytes).
+Output written on /Users/colin/Code/Python/mingen/sigmorphon2021_paper/wilsonlisigmorphon2021.pdf (9 pages, 253128 bytes).
 PDF statistics:
  385 PDF objects out of 1000 (max. 8388607)
  351 compressed objects within 4 object streams
diff --git a/sigmorphon2021_paper/wilsonlisigmorphon2021.pdf b/sigmorphon2021_paper/wilsonlisigmorphon2021.pdf
index c29979e..2c0c2ea 100644
Binary files a/sigmorphon2021_paper/wilsonlisigmorphon2021.pdf and b/sigmorphon2021_paper/wilsonlisigmorphon2021.pdf differ
diff --git a/sigmorphon2021_paper/wilsonlisigmorphon2021.synctex.gz b/sigmorphon2021_paper/wilsonlisigmorphon2021.synctex.gz
index e3d5013..20cd69a 100644
Binary files a/sigmorphon2021_paper/wilsonlisigmorphon2021.synctex.gz and b/sigmorphon2021_paper/wilsonlisigmorphon2021.synctex.gz differ
diff --git a/sigmorphon2021_paper/wilsonlisigmorphon2021.tex b/sigmorphon2021_paper/wilsonlisigmorphon2021.tex
index 8b4d726..62438eb 100644
--- a/sigmorphon2021_paper/wilsonlisigmorphon2021.tex
+++ b/sigmorphon2021_paper/wilsonlisigmorphon2021.tex
@@ -249,7 +249,7 @@ \subsection{Extensions}
 
 \subsection{Near misses}
 
-As the organizers of the shared task have emphasized, implemented models can be used not only to predict the results of experiments but also to generate stimuli. Ideally, stimulus items would be designed to test the core tenets of a single model or to probe systematic differences in prediction among models. As part of our implementation, we have developed an automatic method of selecting wug items to investigate the main concern about minimal generalization: namely, that by learning rules in a strictly bottom-up way it will \emph{undergeneralize}, predicting sharp contrasts in inflectional behavior on the basis of slight differences in form.
+As the organizers of the shared task have emphasized, implemented models can be used not only to predict the results of experiments but also to generate stimuli. Ideally, stimulus items would be designed to test the core tenets of a single model or to probe systematic differences in prediction among models. As part of our implementation, we have developed an automatic method of selecting wug items to investigate a main concern about minimal generalization: namely, that by learning rules in a strictly bottom-up way it will \emph{undergeneralize}, predicting sharp contrasts in inflectional behavior on the basis of slight differences in form.
 
 We illustrate our method with the English irregular pattern \textipa{I} $\to$ \textipa{2}, which attracted new members in the history of English and has elicited relatively high production rates and acceptability ratings in previous wug tests \citep[\emph{e.g.},][]{bybee1983, albright2003}. We extracted all of the onsets and rimes that appear in the bare forms of monosyllabic English verbs and freely combined them to create a large pool of possible stimulus items. We eliminated items that are real verbs, then shrunk the pool to those items that are one (segmental) edit away from some existing irregular verb that undergoes \textipa{I} $\to$ \textipa{2}. We further required each item to share its rime with at least one such irregular verb.\footnote{Studies of English irregular verbs have focused primarily on vowels and codas of monosyllables, though see \citet{bybee1983} on the potential role of onsets.} All of the wugs in the final pool are highly similar, in this sense, to existing irregulars.
 
@@ -257,7 +257,7 @@ \subsection{Near misses}
 
 Some of the potential hits and near misses are minimal pairs. For example, \textipa{/lIN/} ($.67$) and \textipa{/SIN/} ($.61$) could potentially undergo \textipa{I} $\to$ \textipa{2} rules with the indicated confidence values. But \textipa{/fIN/} and \textipa{/vIN/} are ineligible for the change according to the model (because no existing irregular verb of this type has a non-coronal fricative immediately before the vowel). Other differences in the onset can also dramatically affect the model's predictions: \textipa{/T\*rINk/} ($.88$) and \textipa{/glIN/} ($.67$) are potential hits but \textipa{/smINk/} and \textipa{/smIN/} are near misses. The second two are phonotactically challenged \citep{davis-1989-cross}, but are \textipa{/T\*r2Nk/} and \textipa{/gl2N/} far superior to \textipa{/sm2Nk/} and \textipa{/sm2N/} when the phonotactic acceptability of their bare forms is factored out?
 
-The same procedure can be applied to any irregular (or indeed regular) change. For \textipa{i} $\to$ \textipa{Ept} (as in \emph{sleep} $\sim$ \emph{slept}), we find that the potential hits include \textipa{/gip/} ($.85$) and \textipa{/flip/} ($.73$, one of Albright \& Hayes's wug items) while \textipa{/fip/}, \textipa{/vip/}, \textipa{/nip/}, and \textipa{/snip/} are among the near misses. Would native English speakers rate the novel past form \textipa{/gEpt/} much higher than \textipa{/fEpt/}, as the model predicts? We look forward to future empirical tests of the minimal generalization model, along these lines and others, as part of the collective effort to find out where we are and how much further we have to go in cognitive modeling of inflection.
+The same procedure can be applied to any irregular (or indeed regular) change. For \textipa{i} $\to$ \textipa{Ept} (as in \emph{sleep} $\sim$ \emph{slept}), we find that the potential hits include \textipa{/gip/} ($.85$) and \textipa{/flip/} ($.73$, one of Albright \& Hayes's wug items) while \textipa{/fip/}, \textipa{/vip/}, \textipa{/nip/}, and \textipa{/snip/} are among the near misses. Would native English speakers rate the novel past form \textipa{/gEpt/} much higher than \textipa{/fEpt/}, as the model predicts? We look forward to future empirical tests of minimal generalization, along these lines and others, as part of the collective effort to find out where we are and how much further we have to go in cognitive modeling of inflection.
 
 % "attributes of the prototype of this [irregular] class of verbs are:
 % a final velar nasal (/N/ better than /Nk/)
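The "Near misses" subsection builds its stimulus pool in four steps. It freely combines attested onsets and rimes, drops real verbs, keeps items one segmental edit away from an existing verb that undergoes the change, and requires each item to share a rime with such a verb. That pipeline can be sketched in Python roughly as follows.

This is a minimal illustration under stated assumptions. It is not the code in `mingen/near_miss.py`. The sketch makes these assumptions:

- Forms are tuples of IPA segments, mirroring the space-separated `'i p -> ɛ p t'` change strings.
- The `VOWELS` inventory is illustrative.
- "One (segmental) edit" means a single substitution, insertion, or deletion.

The helper names (`split_syllable`, `undergoes`, `one_edit_apart`, `candidate_pool`) and the toy lexicon are hypothetical. The sketch only builds the pool. It does not score items with learned rules to separate potential hits from near misses.

```python
from itertools import product
from typing import Iterable, Set, Tuple

# A form is a tuple of IPA segments, e.g. ("s", "l", "i", "p") for "sleep".
Form = Tuple[str, ...]

# Assumed vowel symbols used to find the nucleus of a monosyllable.
# Illustrative only; not the segment inventory used by mingen.
VOWELS = {"i", "ɪ", "e", "ɛ", "æ", "a", "ɑ", "ʌ", "ɔ", "o", "ʊ", "u", "ə"}


def split_syllable(form: Form) -> Tuple[Form, Form]:
    """Split a monosyllable into (onset, rime) at its first vowel."""
    for i, seg in enumerate(form):
        if seg in VOWELS:
            return form[:i], form[i:]
    raise ValueError(f"no vowel in {form}")


def undergoes(bare: Form, past: Form, change: str) -> bool:
    """True if `past` is `bare` with one occurrence of A replaced by B,
    where `change` is written 'A -> B' with space-separated segments."""
    a, b = (tuple(side.split()) for side in change.split(" -> "))
    for i in range(len(bare) - len(a) + 1):
        if bare[i:i + len(a)] == a and bare[:i] + b + bare[i + len(a):] == past:
            return True
    return False


def one_edit_apart(x: Form, y: Form) -> bool:
    """True if x and y differ by exactly one segment substitution,
    insertion, or deletion."""
    if len(x) == len(y):
        return sum(s != t for s, t in zip(x, y)) == 1
    if abs(len(x) - len(y)) != 1:
        return False
    short, long_ = (x, y) if len(x) < len(y) else (y, x)
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))


def candidate_pool(pairs: Iterable[Tuple[Form, Form]], change: str) -> Set[Form]:
    """Hypothetical pool builder for one change, from (bare, past) verb pairs."""
    pairs = list(pairs)
    verbs = {bare for bare, _ in pairs}
    # Existing verbs whose past tense shows the change (e.g. sting ~ stung).
    targets = {bare for bare, past in pairs if undergoes(bare, past, change)}
    onsets = {split_syllable(v)[0] for v in verbs}
    rimes = {split_syllable(v)[1] for v in verbs}
    target_rimes = {split_syllable(v)[1] for v in targets}
    pool = set()
    # Freely combine every attested onset with every attested rime.
    for onset, rime in product(onsets, rimes):
        wug = onset + rime
        if wug in verbs:  # drop real verbs
            continue
        if rime not in target_rimes:  # must share a rime with some target
            continue
        if any(one_edit_apart(wug, t) for t in targets):
            pool.add(wug)
    return pool


# Toy lexicon (hypothetical): bare and past forms as segment tuples.
lexicon = [
    (("s", "t", "ɪ", "ŋ"), ("s", "t", "ʌ", "ŋ")),  # sting ~ stung
    (("k", "l", "ɪ", "ŋ"), ("k", "l", "ʌ", "ŋ")),  # cling ~ clung
    (("d", "ɪ", "g"), ("d", "ʌ", "g")),            # dig ~ dug
    (("l", "ɪ", "k"), ("l", "ɪ", "k", "t")),       # lick ~ licked
]

pool = candidate_pool(lexicon, "ɪ -> ʌ")
print(sorted(" ".join(w) for w in pool))
# ['d ɪ ŋ', 'k l ɪ g', 'l ɪ g', 'l ɪ ŋ', 's t ɪ g']
```

With this toy lexicon, the pool contains novel items such as /lɪŋ/ (one deletion from cling) and /dɪŋ/ (one substitution from dig). Items built on the rime of lick are excluded because no ɪ → ʌ verb in the toy lexicon shares that rime. Passing `"i p -> ɛ p t"` instead targets the sleep ~ slept pattern.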