A post on Matt Cutts’s blog reminded me to make this post. I guess I’ve been too busy eating gulab jamuns, ladus, halwa, cake and other sweets in the past week. Happy Diwali!
October 30, 2008
Nice summary of “New Business Models for News” event
I wish I had attended this event. I missed it, but Chris O’Brien’s summary of the event is a good one. An excerpt (boldfacing added by me):
My main takeaways from the day were far different what I would have expected going in. I’ve been in search of new ways to generate revenue to maintain the newsrooms we have (or some version of them). But the big lesson of the day was to focus on the other side of things: Cost. There was widespread agreement across the day that cost structures of newsrooms need to be dramatically lower. But before you think I’ve become a cheerleader for the rampant corporate cost cutting plaguing us, hear me out.
Update: Jeff Jarvis, one of the organizers for the “New Business Models for News” event at CUNY, just posted his summary of the event.
October 28, 2008
Getting Amazon’s Mechanical Turk to work
I’ve been using Amazon’s Mechanical Turk for some data collection and verification recently and it’s a really amazing service. I can get simple rote tasks performed on datasets pretty quickly. So I thought I’d share some of the minor hurdles I had to overcome to get the import of my “input” data to work correctly for me. I started out with an Excel worksheet and exported it from Excel to CSV. When I tried to upload my first set of data (after I created my MT “Human Intelligence Task” template), the first error message I encountered was this one:
Header columns should not be blank.
This one is pretty simple — I had a column in my worksheet that looked empty but had some spaces in it. So when Excel was creating the CSV, it was creating an empty “column” and AMT was barfing on this. Fixing this was easy — I just deleted the offending column in Excel and re-saved it as a CSV. The other error message I encountered was this one:
Could not create batch. Invalid input data on line 320. Click here to learn more about acceptable file formats.
So what’s happening here? Basically, Amazon’s Mechanical Turk barfs on special characters… perhaps, only if they aren’t properly encoded? I’m not sure if there’s another solution to this, but what I did was do a search and replace in my trusty text editor (UltraEdit FTW!) and kept re-uploading until I had replaced all the offending characters. My list of offending characters that I had to search and replace:
’ (replace with ')
ñ (replace with n)
“ (replace with ")
” (replace with ")
— (replace with -)
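If you'd rather not do the search and replace by hand, here's a minimal Python sketch of the same cleanup. The replacement table is just my list above. The function names are made up for illustration, and treating every non-ASCII character as a possible offender is my assumption, since I never confirmed exactly what AMT rejects.

```python
import csv
import io

# Replacements from the list above; add more as you find them.
REPLACEMENTS = {
    "\u2019": "'",   # right single curly quote
    "\u00f1": "n",   # n with tilde
    "\u201c": '"',   # left double curly quote
    "\u201d": '"',   # right double curly quote
    "\u2014": "-",   # em dash
}
TRANSLATION = str.maketrans(REPLACEMENTS)


def blank_header_columns(header):
    """Return indexes of header cells that are empty or only whitespace."""
    return [i for i, name in enumerate(header) if not name.strip()]


def clean_csv(text):
    """Drop blank-header columns and swap out the known offending characters.

    Note: any data sitting under a blank header is dropped along with it.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return text
    drop = set(blank_header_columns(rows[0]))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        writer.writerow(
            [cell.translate(TRANSLATION) for i, cell in enumerate(row) if i not in drop]
        )
    return out.getvalue()


def non_ascii_lines(text):
    """List (line number, characters) for lines that still contain non-ASCII."""
    found = []
    for number, line in enumerate(text.splitlines(), start=1):
        bad = sorted({ch for ch in line if ord(ch) > 127})
        if bad:
            found.append((number, bad))
    return found
```

Run clean_csv on the contents of the exported file, then non_ascii_lines on the result to see which lines might still trip the upload. The line numbers are plain text lines, so they may not match AMT's count if a cell contains a line break.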
Gmail tip: searching by the first post in a conversation
For those of you who are Gmail (or Google Apps Mail) power users, here’s a little hack that I figured out.
I use the “to:” and “from:” Gmail search parameters all the time to pinpoint an e-mail that I’m looking for. But have you ever wanted to find an e-mail that you know was the *start* of an e-mail thread (thread = what’s referred to in the official Gmail lingo as a conversation)? Well, even though Google Mail hides ’em from you in conversation view, all reply subject lines still contain the standard “Re:” string and all forwarded e-mail subject lines contain the standard “Fwd:”. So all you have to do is take the search you’re doing and have it exclude any subject line that contains Re: or Fwd:!
For example, if I’m looking for every e-mail from my friend Shashi that he originated (vs. an e-mail that he might have received and subsequently replied to), I’d do a search like this:
from:shashi -subject:Re: -subject:Fwd:
