Crowdsourced Data Preprocessing with R and Amazon Mechanical Turk

Thomas J. Leeper

The R Journal: article published in 2016, volume 8:1

Thomas J. Leeper, The R Journal (2016) 8:1, pages 276-288.

Abstract: This article introduces the use of the Amazon Mechanical Turk (MTurk) crowdsourcing platform as a resource for R users to leverage crowdsourced human intelligence for preprocessing “messy” data into a form easily analyzed within R. The article first describes MTurk and the MTurkR package, then outlines how to use MTurkR to gather and manage crowdsourced data with MTurk using some of the package’s core functionality. Potential applications of MTurkR include construction of manually coded training sets, human transcription and translation, manual data scraping from scanned documents, content analysis, image classification, and the completion of online survey questionnaires, among others. As an example of massive data preprocessing, the article describes an image rating task involving 225 crowdsourced workers and more than 5500 images using just three MTurkR function calls.

Received: 2015-10-30; online 2016-06-13
CRAN packages: MTurkR, MTurkRGUI, tcltk, curl, XML
CRAN Task Views implied by cited CRAN packages: WebTechnologies

This article is licensed under a Creative Commons Attribution 3.0 Unported license.

@article{RJ-2016-020,
  author = {Thomas J. Leeper},
  title = {{Crowdsourced Data Preprocessing with R and Amazon Mechanical
          Turk}},
  year = {2016},
  journal = {{The R Journal}},
  doi = {10.32614/RJ-2016-020},
  url = {https://doi.org/10.32614/RJ-2016-020},
  pages = {276--288},
  volume = {8},
  number = {1}
}
