READ ME file

Title of dataset: "Wannabe Approximatives: datasets"
Authors: Muriel Norde (Humboldt-Universität zu Berlin); Francesca Masini (Università di Bologna); Kristel Van Goethem (Université catholique de Louvain); Daniel Ebner (Humboldt-Universität zu Berlin)
Contact email: francesca.masini@unibo.it
License: Creative Commons: Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/

------------------------
Abstract

This item contains 6 datasets including annotated concordances of "wannabe" collocations in 6 different languages: Danish, Dutch, English, Finnish, French, Italian.
The concordances come from TenTen corpora on SketchEngine (https://www.sketchengine.eu/), specifically:
• daTenTen20 for Danish
• nlTenTen20 for Dutch
• enTenTen20 for English
• fiTenTen14 for Finnish
• frTenTen20 for French
• itTenTen20 for Italian
The datasets are structured in the same way, allowing for cross-linguistic comparability.
The datasets constitute the underlying data of the study by Norde et al. (2025), mentioned in the References.

------------------------
Content

The dataset includes the following files:
• README_wannabe.txt (this file)
• wannabe_Danish_datenten20_repository.csv
• wannabe_Danish_datenten20_repository.xlsx
• wannabe_Dutch_nltenten20_repository.csv
• wannabe_Dutch_nltenten20_repository.xlsx
• wannabe_English_ententen20_repository.csv
• wannabe_English_ententen20_repository.xlsx
• wannabe_Finnish_fitenten14_repository.csv
• wannabe_Finnish_fitenten14_repository.xlsx
• wannabe_French_frtenten2020_repository.csv
• wannabe_French_frtenten2020_repository.xlsx
• wannabe_Italian_ittenten20_repository.csv
• wannabe_Italian_ittenten20_repository.xlsx

------------------------
Details

Each file per language (provided in csv/Excel format) contains 500 concordance lines with examples sampled from TenTen corpora.
Each concordance is annotated according to a variety of parameters that are fully described in Norde et al. (2025).
The Excel/csv file contains the following information:
• "Reference" = the (partial) url provided by SketchEngine
• "Left" = left context of Kwic
• "Kwic" = Kwic corresponding to the wannabe construction
• "Right" = right context of Kwic
• "Cxn" = the type of construction; possible values: "collocation", "derivation", "embedded collocation", "predicative", "wannabe_ADJ", "wannabe_ADV", "wannabe_N", "wannabe_V"
• "Bonding" = the type of bonding; possible values: "//" (NA), "bound", "free", "hyphen"
• "Head" = the head of the wannabe construction (if wannabe is not the head)
• "PoShead" = the part of speech of the head; possible values: "//" (NA), "ADJ", "N", "N-prop", "NP", "PRO"
• "Order" = the position of wannabe w.r.t. the head; possible values: "//" (NA), "wannabe-X", "X-wannabe", "X-wannabe-X"
• "Inflection" = the presence/type of inflection; possible values: "no", "yes_EN", "yes_native"
• "Semcat" = the semantic/ontological class of the head; possible values: "//" (NA), "animate", "human", "inanimate"

CSV files use the Western Europe (Windows-1252) character set. The field separator is semicolon (;) and the string delimiter is double quote (").

------------------------
References

Muriel Norde, Francesca Masini, Kristel Van Goethem & Daniel Ebner. 2025. Wannabe Approximatives. Creativity, Routinization or Both? In Sabine Arndt-Lappe & Natalia Filatkina (eds.), Dynamics at the Lexicon-Syntax Interface. Creativity and Routine in Word-Formation and Multi-Word Expressions. De Gruyter Mouton.
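
------------------------
Example: reading a CSV file

The CSV settings given under Details (Windows-1252 character set, semicolon separator, double-quote string delimiter) can be passed directly to a CSV reader. The following is a minimal sketch in Python using only the standard library; it is not part of the original dataset. The function names load_concordances and count_values are hypothetical, and the sketch assumes that each CSV file starts with a header row whose column names match the ones listed above.

```python
import csv
from collections import Counter

# Column names as listed in this README. That every CSV file has a header
# row with exactly these names is an assumption of this sketch.
EXPECTED_COLUMNS = [
    "Reference", "Left", "Kwic", "Right", "Cxn", "Bonding",
    "Head", "PoShead", "Order", "Inflection", "Semcat",
]

NA = "//"  # marker used for "not applicable" in several columns


def load_concordances(path):
    """Read one language file into a list of dicts, one per concordance line."""
    # cp1252 is Python's name for the Windows-1252 character set.
    # newline="" lets the csv module handle line endings itself.
    with open(path, encoding="cp1252", newline="") as handle:
        reader = csv.DictReader(handle, delimiter=";", quotechar='"')
        return list(reader)


def count_values(rows, column):
    """Tally the values of one annotation column, skipping the NA marker."""
    return Counter(row[column] for row in rows if row[column] != NA)
```

For instance, rows = load_concordances("wannabe_English_ententen20_repository.csv") followed by count_values(rows, "Cxn") would give the frequency of each construction type in the English sample, provided the header assumption holds.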