I’ve been using Amazon’s Mechanical Turk for some data collection and verification recently, and it’s a really amazing service. I can get simple rote tasks performed on datasets pretty quickly. So I thought I’d share some of the minor hurdles I had to overcome to get the import of my “input” data to work. I started out with an Excel worksheet, which I exported to CSV. When I tried to upload my first set of data (after creating my MT “Human Intelligence Task” template), the first error message I encountered was this one:
Header columns should not be blank.
This one is pretty simple: I had a column in my worksheet that looked empty but had some spaces in it. So when Excel created the CSV, it wrote that column out with a blank header, and AMT was barfing on this. Fixing it was easy. I just deleted the offending column in Excel and re-saved the worksheet as a CSV. The other error message I encountered was this one:
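If you'd rather catch this before uploading, here is a minimal Python sketch of the same check. It is not anything AMT provides: the function names `blank_header_columns` and `drop_columns` and the sample data are made up for illustration, and it assumes the first row of the CSV is the header row.

```python
import csv
import io

def blank_header_columns(csv_text):
    """Return 0-based positions of header cells that are empty or whitespace-only."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    return [i for i, name in enumerate(header) if not name.strip()]

def drop_columns(csv_text, positions):
    """Return the CSV text with the given column positions removed from every row."""
    skip = set(positions)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(csv_text)):
        writer.writerow([cell for i, cell in enumerate(row) if i not in skip])
    return out.getvalue()

# Hypothetical export: the third header cell contains only spaces.
sample = "name,url,  \nAlice,http://example.com,\n"
bad = blank_header_columns(sample)
print(bad)
print(drop_columns(sample, bad))
```

Deleting the column in Excel does the same job. The sketch is only handy if you export often and want the check automated.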
Could not create batch. Invalid input data on line 320. Click here to learn more about acceptable file formats.
So what’s happening here? Basically, Amazon’s Mechanical Turk barfs on special characters, or perhaps only on ones that aren’t properly encoded. I’m not sure if there’s a cleaner solution (saving the file in a different encoding, maybe), but what I did was a search and replace in my trusty text editor (UltraEdit FTW!), re-uploading until I had replaced all the offending characters. Here is my list of offending characters that I had to search for and replace:
’ (replace with a straight apostrophe: ')
ñ (replace with n)
“ (replace with a straight double quote: ")
” (replace with a straight double quote: ")
— (replace with -)
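The same search and replace can be scripted instead of done by hand. This is a rough Python sketch that assumes the five characters above are the only ones to swap; the names `REPLACEMENTS`, `clean_text` and `remaining_non_ascii` are made up for illustration. The second function lists any non-ASCII characters still left, with their line numbers, which saves the upload-and-retry loop. I'm assuming here that anything outside plain ASCII is a candidate for trouble, which may be stricter than what AMT actually rejects.

```python
# Map each offending character to its plain replacement.
REPLACEMENTS = {
    "\u2019": "'",   # right single quote -> straight apostrophe
    "\u00f1": "n",   # n with tilde -> plain n (this loses the accent)
    "\u201c": '"',   # left double quote -> straight double quote
    "\u201d": '"',   # right double quote -> straight double quote
    "\u2014": "-",   # em dash -> hyphen
}

def clean_text(text):
    """Apply every replacement in REPLACEMENTS to the text."""
    return text.translate(str.maketrans(REPLACEMENTS))

def remaining_non_ascii(text):
    """Return (line_number, character) pairs for non-ASCII characters still present."""
    found = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for ch in line:
            if ord(ch) > 127:
                found.append((line_no, ch))
    return found

cleaned = clean_text("It\u2019s a \u201cpi\u00f1ata\u201d \u2014 really")
print(cleaned)
print(remaining_non_ascii(cleaned))
```

Anything `remaining_non_ascii` reports is a character to add to the list, with whatever replacement makes sense for your data.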