Language, Speech and Multimedia Technologies Observatory

http://lingpipe-blog.com/2010/08/10/bing-translate-has-a-great-ui-and-some-nice-nlp-too/
09/13/2010 - 07:56

I’m really digging the user interfaces Bing has put up. Google’s now copying their image display and some of their results refinement. I’ve been working on tokenization in Arabic for the LingPipe book and was using the text from an Arabic Wikipedia page as an example.

Here are links to Bing’s and Google’s offerings:

Yahoo!’s still using Babel Fish in last year’s UI; it doesn’t do Arabic.

Language Detection

First, Bing Translate uses language detection to figure out what language you’re translating from. Obvious, but oh so much nicer than fiddling with a drop-down menu.

Side-by-Side Results

Second, it pops up the results side-by-side.

Sentence-Level Alignments

Even cooler, if you mouse over a region, it does sentence detection and shows you the corresponding region in the translation. Awesome.

Back at Google

I went back and looked at Google Translate and saw that they’ve added an auto-detect language feature since my last visit. Google only displays the translated page, but as you mouse over it, there’s a pop-up showing the original text and asking for a correction.

I don’t know who did what first, but these are both way better interfaces than I remember from the last time I tried Google translation. And the results are pretty good NLP-wise, too.


http://www.speechtechmag.com/Articles/News/News-Feature/ANSI-Publishes-Voice-Biometric-Data-Standard-69423.aspx
09/13/2010 - 07:56

The standard lays out the type and format of information that should accompany recordings.

http://www.speechtechmag.com/Articles/News/News-Feature/Worlds-First-Nationwide-Voice-Identification-System-Deployed-in-Mexico-69428.aspx
09/13/2010 - 07:56

The system will help Mexican law enforcement agencies collect, manage, and search the database of hundreds of thousands of voiceprints.

http://hunch.net/?p=1450
09/13/2010 - 07:56

There were several papers that seemed fairly interesting at KDD this year. The ones that caught my attention are:

  1. Xin Jin, Mingyang Zhang, Nan Zhang, and Gautam Das, Versatile Publishing For Privacy Preservation. This paper provides a conservative method for safely determining which data is publishable from any complete source of information (for example, a hospital) such that it does not violate privacy rules in a natural language. It is not differentially private, so no external sources of join information can exist. However, it is a mechanism for publishing data rather than (say) the output of a learning algorithm.
  2. Arik Friedman and Assaf Schuster, Data Mining with Differential Privacy. This paper shows how to create effective differentially private decision trees. Progress in differentially private data mining is pretty impressive, given that differential privacy was only defined in 2006.
  3. David Chan, Rong Ge, Ori Gershony, Tim Hesterberg, Diane Lambert, Evaluating Online Ad Campaigns in a Pipeline: Causal Models At Scale. This paper is about automated estimation of ad campaign effectiveness. The doubly robust estimation technique seems intuitively appealing and plausibly greatly enhances effectiveness.
  4. Naoki Abe et al., Optimizing Debt Collections Using Constrained Reinforcement Learning. This is an application paper about optimizing the New York State income tax collection agency. As you might expect, there are several kludgy aspects due to working within legal and organizational constraints. They deal with them, and expect to end up making NY state around $10^8/year. Too bad I live in NY :)
  5. Vikas C. Raykar, Balaji Krishnapuram, and Shipeng Yu, Designing Efficient Cascaded Classifiers: Tradeoff between Accuracy and Cost. This paper is about a continuization-based solution to designing a cost-efficient yet accurate classifier cascade. It’s a step beyond the Viola-Jones-style boosting with cutouts, but I suspect not yet a final solution.
  6. D. Sculley, Combined Regression and Ranking. There are lots of applications where you want both a correct ordering and an estimated value of each item. This paper shows a simple combined-loss approach to getting both which empirically improves on either metric.
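
The combined-loss idea in item 6 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the linear model, the choice of squared loss for both terms, the mixing weight alpha, and the toy data are all made up here. The objective simply mixes a pointwise regression loss with a pairwise ranking loss computed over pairs of items whose targets differ.

```python
from itertools import combinations


def predict(w, x):
    """Linear score: dot product of weights w and features x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def regression_loss(w, data):
    """Mean squared error over (features, target) examples."""
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)


def ranking_loss(w, data):
    """Mean squared error on pairwise differences.

    Each pair of examples with different targets becomes one
    pseudo-example: features x_a - x_b, target y_a - y_b.
    """
    pairs = [(a, b) for a, b in combinations(data, 2) if a[1] != b[1]]
    if not pairs:
        return 0.0
    total = 0.0
    for (xa, ya), (xb, yb) in pairs:
        diff = [p - q for p, q in zip(xa, xb)]
        total += (predict(w, diff) - (ya - yb)) ** 2
    return total / len(pairs)


def combined_loss(w, data, alpha=0.5):
    """alpha = 1 is pure regression; alpha = 0 is pure pairwise ranking."""
    return alpha * regression_loss(w, data) + (1 - alpha) * ranking_loss(w, data)


# Hypothetical toy data: (features, target) pairs.
data = [([1.0, 0.0], 1.0), ([0.0, 1.0], 2.0), ([1.0, 1.0], 3.0)]
print(combined_loss([0.5, 0.5], data, alpha=0.5))
```

Any gradient-based optimizer could minimize combined_loss over w; sweeping alpha between 0 and 1 trades value accuracy against ordering accuracy.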

In addition, I enjoyed Konrad Feldman’s invited talk on Quantcast’s data and learning systems which sounded pretty slick.

In general, it seems like KDD is substantially maturing as a conference. The work on empirically effective privacy-preserving algorithms and some of the stats work is ahead of what I’ve seen at other machine learning conferences. Presumably this is due to KDD being closer to the business side of machine learning and hence more aware of what the real problems are there. An annoying aspect of KDD as a publishing venue is that they don’t put the papers on the conference website, due to ACM constraints. A substantial compensation is that all talks are scheduled to appear on videolectures.net and, as you can see, most papers can be found on author webpages.

KDD also experimented with CrowdVine again this year so people could announce which talks they were interested in and set up meetings. My impression was that it worked a bit less well than last year, partly because it wasn’t pushed as much by the conference organizers. Small changes in the interface might make a big difference. For example, just providing a ranking of papers by interest might make it pretty compelling.

http://www.speechtechmag.com/Articles/News/News-Feature/A-New-eReader-Reads-to-You-69580.aspx
09/13/2010 - 07:56

Blio will have text-to-speech capabilities powered by Nuance.

http://feedproxy.google.com/~r/DataMining/~3/IB6oLSyEias/the-recorded-future-is-here.html
09/13/2010 - 07:56

Recorded Future is a new venture which mines the web for statements that are associated with some time expressions. It then uses this corpus to describe the future in various geographies for various topics. In addition to the application of information extraction methods, they also present this information in creative visual displays.


 

The site is plenty full of jQuery goodness, but I did find the newbie experience a little puzzling (how do I navigate to the data visualization? not clear...)

Finally, I loved this quote from a satisfied customer:

"This definitely reduces time in figuring out what may or may not be happening in the future based on what has been happening in the past. It cuts that time in half."

Advertising Executive

[HT Sundar]


 

http://www.lt-world.org/kb/communication_and_ipr/news/ltw_news.2009-06-30.8700093261
09/13/2010 - 07:56
http://www.semanticweb.com/natural_language_processing/wikiseer_lets_users_check_into_web_content_before_clicking_on_the_link_171405.asp?c=rss
09/13/2010 - 07:56

If you’re like most people these days, you simply don’t have a lot of time to scan entire articles to see if they actually have the information you’re looking for. WikiSeer wants to do the work for you.

“When you browse or search on the Net, you often come across so many different kinds of documents and you don’t know what they’re about,” says the co-founder of the new service, still in beta, Sameer Yami. “There’s so much information overload, and we want to reduce that so you can just read what you are interested in.”

http://www.speechtechmag.com/Articles/Editorial/FYI/Is-Natural-Language-Right-for-You-69667.aspx
09/13/2010 - 07:56

Speakers urge caution when choosing to roll out a natural language app.

http://www.semanticweb.com/news/is_hp_worried_that_former_ceo_hurd_will_spill_its_semantic_web_secrets_to_oracle_172926.asp?c=rss
09/13/2010 - 07:56

Former HP CEO Mark Hurd’s move to take on the co-president role at Oracle has HP in a tizzy. It’s suing Hurd to keep him out of the role, concerned that it could result in his violating his agreement “to protect HP's trade secrets and confidential information.”

Could any of that concern involve HP Labs’ recently disbanded Semantic Web research program (some ten years in the making), or other efforts at the company that involve semantic technologies? We're not talking open-sourced Jena. But as The Semantic Web Blog noted, Oracle may well be in need of incrementally enhancing its broad portfolio of Fusion Middleware data integration products with semantic technology, or even bringing out “an overarching new integration product that has semantic web technology at the core” to deliver greater cohesiveness around that crowded middleware lineup. And after all, in addition to the widely acclaimed Jena semantic web framework for building semantic web apps, HP had underway other initiatives that clearly added to its semantic web knowledge -- and that, at least conceivably, could contribute to Oracle achieving richer, semantically-enabled data integration order.
