Jessica Cristina Menghi Sartorio (Department of Enterprise Engineering “Mario Lucertini”, University of Rome “Tor Vergata”)

Eugenio Bortolini (Department of Cultural Heritage, University of Bologna; Department of Archaeology and Anthropology, Institució Milà i Fontanals de Investigación en Humanidades, CSIC)

Gregorio Oxilia (Department of Cultural Heritage, University of Bologna)


DESCRIPTION

The present digital archive accompanies the paper: Oxilia G., Bortolini E., Marciani G., Menghi Sartorio J.C., et al. Direct evidence that late Neanderthal occupation precedes a technological shift in southwestern Italy. American Journal of Biological Anthropology.

During the Middle to Upper Palaeolithic transition (between 50,000 and 40,000 years ago), interactions between Neanderthals and Homo sapiens varied across Europe. In southern Italy, the association between Homo sapiens fossils and non-Mousterian material culture, as well as the mode and tempo of Neanderthal demise, are still actively debated. This work presents two lower deciduous molars uncovered at Roccia San Sebastiano (Mondragone-Caserta, Italy), stratigraphically associated with Mousterian (RSS1) and Uluzzian (RSS2) artefacts. Using virtual morphometric methods and supervised learning algorithms, we show that RSS1, whose Mousterian context appears more recent than 44,800-44,230 cal BP, can be attributed to a Neanderthal, while RSS2, found in an Uluzzian context that we dated to 42,640-42,380 cal BP, is attributed to Homo sapiens. This site therefore yields the most recent direct evidence for a Neanderthal presence in southern Italy and confirms a later shift to Early Upper Palaeolithic technology in southwestern Italy compared with the earliest Uluzzian evidence at Grotta del Cavallo (Puglia, Italy).

The present repository contains the R code used to run the analyses of cervical and crown outlines presented in the paper, including dataset generation, Generalised Procrustes Analysis (GPA), Principal Component Analysis (PCA), Permutational Multivariate Analysis of Variance (PERMANOVA), and probabilistic taxonomic attribution through supervised learning algorithms. The latter comprise:

  1. Flexible Discriminant Analysis (FDA), a flexible extension of Linear Discriminant Analysis (LDA) that uses non-linear combinations of predictors and can therefore achieve a lower misclassification error when modelling non-linear, non-normal, and non-homogeneous data;

  2. Multivariate Adaptive Regression Splines (MARS; Friedman, 1991; Hastie, 2017). This algorithm identifies the predictor value intervals that best discriminate between groups by iteratively fitting linear regressions for each group and locating the predictor values (knots) that minimise the total within-group error. These knots are then used to join the individual linear functions into the final model (Hastie et al., 1994, 2009). Overfitting is controlled with generalized cross-validation (GCV), a criterion applied during stepwise pruning that weighs the goodness of fit of the model against the number of parameters it uses.

  3. a Random Forest (RF) classifier (Liaw & Wiener, 2002), which grows many classification trees by recursive binary splitting, each tree on a sample of specimens drawn with replacement and each split on a random subset of predictors, and assigns a specimen to the class predicted by the majority of trees (SOM S2).
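The three classifiers can be sketched in R with packages from the Dependencies list (mda, earth, randomForest). This is a minimal illustration under stated assumptions, not the code in code.R: the PC scores are simulated, the variable names (PC1-PC3, species, unknown) and group labels are hypothetical, FDA is fitted with a MARS regression basis (mda::mars) instead of its default polynomial basis, and MARS is turned into a classifier through a binomial GLM in earth(). ntree = 500 is the randomForest default, not a value taken from the paper.

```r
# Sketch only: simulated data and hypothetical names, not the paper's settings
library(mda)           # fda(); mda::mars is passed as its regression method
library(earth)         # earth(): Multivariate Adaptive Regression Splines
library(randomForest)  # randomForest()

set.seed(1)
n <- 40
# Hypothetical PC scores for two groups whose means differ on PC1 and PC2
train <- data.frame(
  PC1 = c(rnorm(n, -1), rnorm(n, 1)),
  PC2 = c(rnorm(n, 0), rnorm(n, 0.5)),
  PC3 = rnorm(2 * n),
  species = factor(rep(c("Neanderthal", "Homo_sapiens"), each = n),
                   levels = c("Neanderthal", "Homo_sapiens"))
)
# Hypothetical specimen of unknown attribution
unknown <- data.frame(PC1 = 0.8, PC2 = 0.3, PC3 = 0)

# 1. FDA: discriminant analysis built on a non-linear (MARS) regression fit
fda_fit  <- fda(species ~ PC1 + PC2 + PC3, data = train, method = mda::mars)
fda_post <- predict(fda_fit, newdata = unknown, type = "posterior")

# 2. MARS: hinge-function basis pruned with GCV, then a binomial GLM on it;
#    type = "response" gives P(second factor level), i.e. P(Homo_sapiens)
mars_fit  <- earth(species ~ PC1 + PC2 + PC3, data = train,
                   glm = list(family = binomial))
mars_prob <- predict(mars_fit, newdata = unknown, type = "response")

# 3. RF: 500 trees, each grown on a bootstrap sample; class = majority vote
rf_fit  <- randomForest(species ~ PC1 + PC2 + PC3, data = train, ntree = 500)
rf_prob <- predict(rf_fit, newdata = unknown, type = "prob")  # share of tree votes

print(fda_post)
print(mars_prob)
print(rf_prob)
```

Each model returns a class probability for the unknown specimen, which is the kind of output a probabilistic taxonomic attribution would be based on.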

Results obtained with all three methods (FDA, MARS, and RF) are validated through repeated 10-fold cross-validation.
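Repeated 10-fold cross-validation can be sketched with caret's trainControl() and train(). In this minimal example the data are again simulated, the number of repeats (10) is an assumption because the text does not state it, and caret's model codes "fda", "earth" and "rf" stand in for whatever calls code.R actually makes. The same seed is set before each train() call so that all three models are evaluated on identical folds.

```r
library(caret)  # trainControl(), train(); the model codes below need mda, earth, randomForest

set.seed(2)
n <- 40
# Simulated, hypothetical PC scores for two groups
train_df <- data.frame(
  PC1 = c(rnorm(n, -1), rnorm(n, 1)),
  PC2 = c(rnorm(n, 0), rnorm(n, 0.5)),
  PC3 = rnorm(2 * n),
  species = factor(rep(c("Neanderthal", "Homo_sapiens"), each = n),
                   levels = c("Neanderthal", "Homo_sapiens"))
)

# 10 folds, repeated 10 times (assumed); keep class probabilities of held-out cases
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 10,
                     classProbs = TRUE)

fits <- lapply(c(FDA = "fda", MARS = "earth", RF = "rf"), function(m) {
  set.seed(3)  # identical fold assignment for every model
  train(species ~ ., data = train_df, method = m, trControl = ctrl)
})

# Cross-validated accuracy of the best tuning value for each model
cv_acc <- sapply(fits, function(f) max(f$results$Accuracy))
print(cv_acc)
```

Averaging over repeats reduces the dependence of the accuracy estimate on any single random split into folds, which matters with small fossil samples.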


NB: the present digital archive does not include the raw cervical and crown outline data, as they are currently unpublished and were kindly provided by several stakeholders. They will be published separately and independently in forthcoming publications. Therefore, at present, the results of our paper cannot be fully reproduced. Nevertheless, we are making the full code available to facilitate the replication and further development of our methods to the best of our current ability.


CONTENTS

Main folder “DataTemplates_and_Codes_Mondragone”
  • A commented R script containing the code needed to reproduce all analyses and figures in the paper (code.R)

  • The present README text both in .html and .Rmd format

Folder “DataTemplates”
  • A .txt file listing individual sample labels paired with their respective species (label_specie.txt)

  • A .txt file providing the template for the list of cervical outline coordinates to be used in Generalised Procrustes Analysis and subsequent analyses (Lower_dm2_cervix_outline.txt)

  • A .txt file providing the template for the list of crown outline coordinates to be used in Generalised Procrustes Analysis and subsequent analyses (Lower_dm2_crown_outline.txt)

  • A .txt file providing the list of individual sample labels used in the analysis of cervical outlines (Lower_dm2_label.Cervix.txt)

  • A .txt file providing the list of individual sample labels used in the analysis of crown outlines (Lower_dm2_label.Crown.txt)

  • A .txt file providing the list of species labels attributed to each sample, in the same order as the sample labels, for cervical data (Lower_dm2_specie.Cervix.txt)

  • A .txt file providing the list of species labels attributed to each sample, in the same order as the sample labels, for crown data (Lower_dm2_specie.Crown.txt)
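The column layout of these templates is not described in this README, so the sketch below does not read them. Instead it simulates outlines directly in the p × k × n coordinate array used by geomorph (p points, k = 2 dimensions, n specimens) and runs GPA, PCA and a PERMANOVA on the result. Outline points are treated as fixed landmarks for simplicity; whether code.R slides them as semilandmarks (the curves argument of gpagen()) is not stated here. The group labels, the 30 points per outline, and the five retained PCs are hypothetical choices.

```r
library(geomorph)  # gpagen(), two.d.array()
library(vegan)     # adonis2()

set.seed(4)
p <- 30; n_per <- 15  # hypothetical: 30 outline points, 15 specimens per group
theta <- seq(0, 2 * pi, length.out = p + 1)[-1]

# Simulated outline: ellipse with radii a and b plus noise, randomly rotated and shifted
make_outline <- function(a, b) {
  xy <- cbind(a * cos(theta), b * sin(theta)) + matrix(rnorm(2 * p, sd = 0.02), p)
  r <- runif(1, 0, 2 * pi)
  rot <- matrix(c(cos(r), sin(r), -sin(r), cos(r)), 2)
  xy %*% rot + matrix(rnorm(2), p, 2, byrow = TRUE)
}

A <- array(NA_real_, dim = c(p, 2, 2 * n_per))
for (i in seq_len(2 * n_per)) {
  A[, , i] <- if (i <= n_per) make_outline(1, 0.8) else make_outline(1, 0.7)
}
meta <- data.frame(species = factor(rep(c("Neanderthal", "Homo_sapiens"),
                                        each = n_per)))

gpa <- gpagen(A, print.progress = FALSE)  # remove position, scale and rotation
X <- two.d.array(gpa$coords)              # one row of aligned coordinates per specimen
pca <- prcomp(X)                          # PCA of the aligned coordinates
scores <- pca$x[, 1:5]

# PERMANOVA on Euclidean distances between PC scores
perm <- adonis2(dist(scores) ~ species, data = meta, permutations = 999)
print(perm)
```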


Licences

Code: MIT (https://choosealicense.com/licenses/mit/); year: 2022; copyright holders: Jessica Cristina Menghi Sartorio, Eugenio Bortolini, and Gregorio Oxilia.


Dependencies

R version 4.1.0 (2021-05-18)

Packages:
  • rgl (v.0.106.8)
  • shapes (v.1.2.6)
  • tripack (v.1.3.9.1)
  • MASS (v.7.3.54)
  • lmtest (v.0.9.38)
  • ape (v.5.5)
  • ade4 (v.1.7.16)
  • pls (v.2.7.3)
  • Morpho (v.2.8)
  • geomorph (v.4.0.0)
  • geometry (v.0.4.5)
  • car (v.3.0.10)
  • grDevices (v.4.1.0)
  • factoextra (v.1.0.7)
  • vegan (v.2.5.7)
  • randomForest (v.4.6.14)
  • mda (v.0.5.2)
  • RColorBrewer (v.1.1.2)
  • caret (v.6.0.88)
  • earth (v.5.3.0)
  • RVAideMemoire (v.0.9.79)
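Results may depend on package versions, so a small base-R helper can report which of the listed packages are missing or installed at a different version. The function name check_deps() is hypothetical and not part of the repository; the pinned versions are copied from the list above.

```r
# Versions as listed under Dependencies (written with dots, as above)
pinned <- c(rgl = "0.106.8", shapes = "1.2.6", tripack = "1.3.9.1",
            MASS = "7.3.54", lmtest = "0.9.38", ape = "5.5", ade4 = "1.7.16",
            pls = "2.7.3", Morpho = "2.8", geomorph = "4.0.0",
            geometry = "0.4.5", car = "3.0.10", grDevices = "4.1.0",
            factoextra = "1.0.7", vegan = "2.5.7", randomForest = "4.6.14",
            mda = "0.5.2", RColorBrewer = "1.1.2", caret = "6.0.88",
            earth = "5.3.0", RVAideMemoire = "0.9.79")

check_deps <- function(pinned) {
  # Installed version as a string, or NA if the package is not installed
  installed <- vapply(names(pinned), function(pkg) {
    tryCatch(as.character(utils::packageVersion(pkg)),
             error = function(e) NA_character_)
  }, character(1))
  # numeric_version() treats "7.3.54" and "7.3-54" as the same version
  same <- !is.na(installed) &
    numeric_version(unname(pinned)) ==
      numeric_version(ifelse(is.na(installed), "0", unname(installed)))
  data.frame(package = names(pinned), pinned = unname(pinned),
             installed = unname(installed), matches = unname(same),
             stringsAsFactors = FALSE)
}

deps <- check_deps(pinned)
print(deps[!deps$matches, ])  # packages missing or at a different version
```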


References

Adams, D. C., M. L. Collyer, A. Kaliontzopoulou, and E.K. Baken. 2021. Geomorph: Software for geometric morphometric analyses. R package version 4.0.2. https://cran.r-project.org/package=geomorph.

Baken, E. K., M. L. Collyer, A. Kaliontzopoulou, and D. C. Adams. 2021. geomorph v4.0 and gmShiny: enhanced analytics and a new graphical interface for a comprehensive morphometric experience. Methods in Ecology and Evolution 12: 2355-2363.

Collyer, M. L. and D. C. Adams. 2021. RRPP: Linear Model Evaluation with Randomized Residuals in a Permutation Procedure. https://cran.r-project.org/web/packages/RRPP

Collyer, M. L. and D. C. Adams. 2018. RRPP: An R package for fitting linear models to high‐dimensional data using residual randomization. Methods in Ecology and Evolution 9(2): 1772-1779.

Bougeard S, Dray S (2018). “Supervised Multiblock Analysis in R with the ade4 Package.” Journal of Statistical Software, 86(1), 1-17. doi: 10.18637/jss.v086.i01 (URL: https://doi.org/10.18637/jss.v086.i01).

Chessel D, Dufour A, Thioulouse J (2004). “The ade4 Package - I: One-Table Methods.” R News, 4(1), 5-10. <URL: https://cran.r-project.org/doc/Rnews/>.

Dray S, Dufour A (2007). “The ade4 Package: Implementing the Duality Diagram for Ecologists.” Journal of Statistical Software, 22(4), 1-20. doi: 10.18637/jss.v022.i04 (URL: https://doi.org/10.18637/jss.v022.i04).

Dray S, Dufour A, Chessel D (2007). “The ade4 Package - II: Two-Table and K-Table Methods.” R News, 7(2), 47-52. <URL: https://cran.r-project.org/doc/Rnews/>.

Ian L. Dryden (2021). shapes: Statistical Shape Analysis. R package version 1.2.6. https://CRAN.R-project.org/package=shapes

John Fox and Sanford Weisberg (2019). An R Companion to Applied Regression, Third Edition. Thousand Oaks, CA: Sage. URL: https://socialsciences.mcmaster.ca/jfox/Books/Companion/

Kai Habel, Raoul Grasman, Robert B. Gramacy, Pavlo Mozharovskyi and David C. Sterratt (2019). geometry: Mesh Generation and Surface Tessellation. R package version 0.4.5. https://CRAN.R-project.org/package=geometry

Herve, M. 2020. “Aide-memoire de statistique appliquee a la biologie - Construire son etude et analyser les resultats a l’aide du logiciel R” (available on CRAN)

Alboukadel Kassambara and Fabian Mundt (2020). factoextra: Extract and Visualize the Results of Multivariate Data Analyses. R package version 1.0.7. https://CRAN.R-project.org/package=factoextra

Max Kuhn (2022). caret: Classification and Regression Training. R package version 6.0-91. https://CRAN.R-project.org/package=caret

S original by Trevor Hastie & Robert Tibshirani. Original R port by Friedrich Leisch, Kurt Hornik and Brian D. Ripley. Balasubramanian Narasimhan has contributed to the upgrading of the code. (2020). mda: Mixture and Flexible Discriminant Analysis. R package version 0.5-2. https://CRAN.R-project.org/package=mda

Kristian Hovde Liland, Bjørn-Helge Mevik and Ron Wehrens (2021). pls: Partial Least Squares and Principal Component Regression. R package version 2.8-0. https://CRAN.R-project.org/package=pls

Stephen Milborrow. Derived from mda:mars by Trevor Hastie and Rob Tibshirani. Uses Alan Miller’s Fortran utilities with Thomas Lumley’s leaps wrapper. (2021). earth: Multivariate Adaptive Regression Splines. R package version 5.3.1. https://CRAN.R-project.org/package=earth

Duncan Murdoch and Daniel Adler (2021). rgl: 3D Visualization Using OpenGL. R package version 0.108.3. https://CRAN.R-project.org/package=rgl

Erich Neuwirth (2014). RColorBrewer: ColorBrewer Palettes. R package version 1.1-2. https://CRAN.R-project.org/package=RColorBrewer

Jari Oksanen, F. Guillaume Blanchet, Michael Friendly, Roeland Kindt, Pierre Legendre, Dan McGlinn, Peter R. Minchin, R. B. O’Hara, Gavin L. Simpson, Peter Solymos, M. Henry H. Stevens, Eduard Szoecs and Helene Wagner (2020). vegan: Community Ecology Package. R package version 2.5-7. https://CRAN.R-project.org/package=vegan

Paradis E. & Schliep K. 2019. ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics 35: 526-528

Fortran code by R. J. Renka. R functions by Albrecht Gebhardt. With contributions from Stephen Eglen , Sergei Zuyev and Denis White (2020). tripack: Triangulation of Irregularly Spaced Data. R package version 1.3-9.1. https://CRAN.R-project.org/package=tripack

Schlager S (2017). “Morpho and Rvcg - Shape Analysis in R.” In Zheng G, Li S, Szekely G (eds.), Statistical Shape and Deformation Analysis, 217-256. Academic Press. ISBN 9780128104934.

Venables, W. N. & Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth Edition. Springer, New York. ISBN 0-387-95457-0

Thioulouse J, Dray S, Dufour A, Siberchicot A, Jombart T, Pavoine S (2018). Multivariate Analysis of Ecological Data with ade4. Springer. doi: 10.1007/978-1-4939-8850-1 (URL: https://doi.org/10.1007/978-1-4939-8850-1).

Achim Zeileis, Torsten Hothorn (2002). Diagnostic Checking in Regression Relationships. R News 2(3), 7-10. URL https://CRAN.R-project.org/doc/Rnews/