http://semanticweb.com/going-mainstream-dell-adopts-semantic-web-technologies_b18672?c=rss
26/03/2011 - 01:25
You know Semantic Web technologies are going mainstream when the company so closely associated with making PCs mainstream is getting in on the action. That company is Dell, and who knows whether the work it is pursuing in the Semantic Web today will have as much impact as the supply chain innovations that helped drive its success in those early PC days?
The proof-of-concept Semantic Web work at Dell is taking place under the direction of Yijing (Jenna) Zhou, enterprise architecture consultant, and Chary Tamirisa, enterprise architecture senior consultant. What’s the impetus for Dell to pursue this? Zhou and Tamirisa provided some insight into the whys, whats, and hows in an email discussion with The Semantic Web Blog.
“The questions raised initially were: why Semantic Web and how can Dell benefit from its use?” Zhou and Tamirisa note. “Our answer is as follows: Semantic technology is a key enabler for Dell to model enterprise business objects to enable end-to-end mapping and reuse across current and future business models, processes, and systems. We are leveraging enterprise architecture management support for semantic technology and ontology modeling to build broader awareness and knowledge across our business and IT stakeholders. Our long-term plan is to provide tangible value propositions that address current and future business challenges and opportunities. We are also focused on developing the change management strategies required to enable and adopt the techniques and technologies related to semantic-based solutions.”
http://www.unibertsitatea.net/blogak/ixa/bilaketaz-haruntzago-ezagutza-biomedikoa-lortzen-hizkuntzaren-prozesaketaren-bidez-karin-verspoor-2011-03-18
18/03/2011 - 10:00
http://feedproxy.google.com/~r/blogspot/gJZg/~3/iv74XNcnMZM/building-resources-to-syntactically.html
11/03/2011 - 10:30
Posted by Slav Petrov and Ryan McDonald, Research Team
One major hurdle in organizing the world’s information is building computer systems that can understand natural, or human, language. Such understanding would advance if systems could automatically determine syntactic and semantic structures.
This analysis is an extremely complex inferential process. Consider for example the sentence, "A hearing is scheduled on the issue today." A syntactic parser needs to determine that "is scheduled" is a verb phrase, that the "hearing" is its subject, that the prepositional phrase "on the issue" is modifying the "hearing", and that "today" is an adverb modifying the verb phrase. Of course, humans do this all the time without realizing it. For computers, this is non-trivial as it requires a fair amount of background knowledge, typically encoded in a rich statistical model. Consider, "I saw a man with a jacket" versus "I saw a man with a telescope". In the former, we know that a "jacket" is something that people wear and is not a mechanism for viewing people. So syntactically, the "jacket" must be a property associated with the "man" and not the verb "saw", i.e., I did not see the man by using a jacket to view him. Whereas in the latter, we know that a telescope is something with which we can view people, so it can also be a property of the verb. Of course, the sentence remains ambiguous: maybe the man is carrying the telescope.
 Linguistically inclined readers will of course notice that this parse tree has been simplified by omitting empty clauses and traces.
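To make the telescope ambiguity concrete, here is a minimal Python sketch that encodes the two readings as nested tuples and reports which phrase the prepositional phrase attaches to. The bracketings and the helper names `pp_parent` and `words` are illustrative assumptions, not the output of any actual parser.

```python
# Trees are nested tuples: (label, child, child, ...); leaves are plain strings.
# Reading 1: the PP modifies "a man" (the man has the telescope).
NOUN_ATTACH = ("S", ("NP", "I"),
               ("VP", ("V", "saw"),
                ("NP", ("NP", "a", "man"),
                 ("PP", "with", ("NP", "a", "telescope")))))

# Reading 2: the PP modifies the verb phrase (the seeing is done with the telescope).
VERB_ATTACH = ("S", ("NP", "I"),
               ("VP", ("V", "saw"),
                ("NP", "a", "man"),
                ("PP", "with", ("NP", "a", "telescope"))))


def pp_parent(tree):
    """Return the label of the node directly dominating the first PP, or None."""
    if isinstance(tree, str):
        return None
    label, *children = tree
    # First check whether a PP is an immediate child of this node.
    for child in children:
        if not isinstance(child, str) and child[0] == "PP":
            return label
    # Otherwise search the subtrees from left to right.
    for child in children:
        found = pp_parent(child)
        if found is not None:
            return found
    return None


def words(tree):
    """Return the leaf words of a tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    out = []
    for child in tree[1:]:
        out.extend(words(child))
    return out


print(pp_parent(NOUN_ATTACH), pp_parent(VERB_ATTACH))
```

Both trees cover the same word sequence; only the attachment point differs, which is exactly the decision a parser has to make from background knowledge.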
Computer programs with the ability to analyze the syntactic structure of language are fundamental to improving the quality of many tools millions of people use every day, including machine translation, question answering, information extraction, and sentiment analysis. Google itself is already using syntactic parsers in many of its projects. For example, this paper describes a system where a syntactic dependency parser is used to make translations more grammatical between languages with different word orderings. Another paper uses the output of a syntactic parser to help determine the scope of negation within sentences, which is then used downstream to improve a sentiment analysis system.
To further this work, Google is pleased to announce a gift to the Linguistic Data Consortium (LDC) to create new annotated resources that can facilitate research progress in the area of syntactic parsing. The primary purpose of the gift is to generate data sets that language technology researchers can use to evaluate the robustness of new parsing methods in several web domains, such as blogs and discussion forums. The goal is to move parsing beyond its current focus on carefully edited text such as print news (for which annotated resources already exist) to domains with larger stylistic and topical variability (where spelling errors and grammatical mistakes are more common).
The Linguistic Data Consortium is a non-profit organization that produces and distributes linguistic data to researchers, technology developers, universities and university libraries. The LDC is hosted by the University of Pennsylvania and directed by Mark Liberman, Christopher H. Browne Distinguished Professor of Linguistics.
The LDC is the leader in building linguistic data resources and will annotate several thousand sentences with syntactic parse trees like the one shown in the figure. The annotation will be done manually by specially trained linguists who will also have access to machine analysis and can correct errors the systems make. Once the annotation is completed, the corpus will be released to the research community through the LDC catalog. We look forward to seeing what they produce and what the natural language processing research community can do with the rich annotation resource.


http://www.unibertsitatea.net/blogak/ixa/openmt-2-project
03/03/2011 - 22:55
We bring three pieces of news from the OPENMT-2 project (2010-2012):
Conclusions of Gorka Labaka's thesis
Translating into Basque from the neighbouring languages is not easy, whether by hand or automatically:
- Basque morphology is very rich, which makes statistical translation much harder. Because Basque has far more word forms (etxe, etxea, etxera, etxetik...), it is harder to find high occurrence counts for every word in bilingual corpora (previously translated texts).
- The word order is very different.
- Because Basque has few speakers, far fewer translated texts can be collected than for the neighbouring languages. And those texts are what the statistics rest on!
Given this situation, Gorka Labaka has developed two techniques to improve the quality of statistical translation:
- Segmenting words: separating lemmas from suffixes. He examined four different ways of doing so, in order to address the problem of rare word forms.
- Reordering the source-language words, at the noun-phrase level and at the sentence level, bringing them closer to the order their equivalents will have in Basque. This reordering greatly helps the statistical decoder when it searches for suitable translations.
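The segmentation idea can be illustrated with a toy Python sketch. It assumes a hypothetical three-entry suffix list and plain string matching; the thesis itself compares four segmentation schemes, none of which is reproduced here, and a real system would rely on a morphological analyser.

```python
from collections import Counter

# Hypothetical suffix list, for illustration only (longest suffix first).
SUFFIXES = ["tik", "ra", "a"]


def segment(word, lemmas):
    """Split a word into [lemma, '+suffix'] when it is a known lemma plus a known suffix."""
    for suffix in SUFFIXES:
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and stem in lemmas:
            return [stem, "+" + suffix]
    return [word]  # leave unknown or bare forms untouched


tokens = ["etxe", "etxea", "etxera", "etxetik"]
lemmas = {"etxe"}

# Before segmentation every surface form is a distinct, rarely seen token.
surface_counts = Counter(tokens)
# After segmentation the shared lemma accumulates all four occurrences.
segmented_counts = Counter(t for w in tokens for t in segment(w, lemmas))

print(surface_counts["etxe"], segmented_counts["etxe"])
```

The point of the sketch is the change in counts: the lemma is seen four times instead of once, which is the kind of sparsity reduction that helps a statistical model estimate translations for rare word forms.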
Lately most researchers give statistical translation systems the leading role, and many dismiss rule-based systems. According to the evaluation of Gorka Labaka's results, however, that is not the right approach.
Among other things, Gorka reached the following conclusions:
- A rule-based system (Matxin) and a standard statistical system trained on an 8-million-word corpus achieve results of the same level.
- The EUSMT statistical system, built with his improvements, achieves better results than the two systems above (10% better on the HTER measure).
- A better system can be built by combining the two. An "oracle" system that compared the outputs of both systems and returned the better one could be a further 10% better.
In 55% of cases it would have to take EUSMT's proposal, in 41% Matxin's, and in the remaining 4% a translation found through patterns in translation memories.
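The oracle combination can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function names `oracle_select` and `selection_shares` are hypothetical, the error scores stand in for a per-sentence HTER-like measure (lower is better), and the toy numbers are made up rather than taken from the thesis.

```python
from collections import Counter


def oracle_select(candidates):
    """Pick the system whose translation has the lowest error score.

    candidates maps a system name to (translation, error_score).
    """
    best = min(candidates, key=lambda name: candidates[name][1])
    return best, candidates[best][0]


def selection_shares(sentences):
    """Return, per system, the fraction of sentences for which the oracle picks it."""
    picks = Counter(oracle_select(c)[0] for c in sentences)
    return {name: count / len(sentences) for name, count in picks.items()}


# Made-up scores for four sentences; the hypotheses are placeholders.
toy_sentences = [
    {"EUSMT": ("hypothesis A1", 0.20), "Matxin": ("hypothesis B1", 0.35)},
    {"EUSMT": ("hypothesis A2", 0.50), "Matxin": ("hypothesis B2", 0.30)},
    {"EUSMT": ("hypothesis A3", 0.10), "Matxin": ("hypothesis B3", 0.40)},
    {"EUSMT": ("hypothesis A4", 0.25), "Matxin": ("hypothesis B4", 0.45)},
]

shares = selection_shares(toy_sentences)
print(shares)
```

An oracle needs the reference score for every candidate, so it only gives an upper bound; a deployable hybrid has to predict which system will do better without seeing that score.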
In view of these conclusions, we have set the focus of our research on hybridization and post-editing. We are looking for effective ways to combine the outputs of Matxin and EUSMT. The other two news items below follow from these lines of work.
Researcher Lluis Marquez will be visiting us until the summer.
To do research on hybridization for combining translation systems, Lluis Marquez, who leads the UPC team within the OPENMT-2 project, will be with us until the summer. He is an international expert in language technology, especially in the use of machine learning techniques. Gorka Labaka's experiments confirmed that there is scope for better results by combining the Matxin and EUSMT systems. We are now looking for the most suitable kind of combination.
Within the OpenMT-2 project we have launched an initiative to add 50 long Wikipedia articles about computing. The Matxin translation system will produce the first drafts, translated from the Spanish Wikipedia, and then a number of volunteers, coordinated by the eu.wikipedia association, will correct those drafts (using the OmegaT program) and publish them.
It will be an enriching experience in both directions. Wikipedia will benefit because 50 new articles will be created, and machine translation will benefit too, because a 100,000-word corpus of manually post-edited translations will be gathered. That corpus will be a key resource for improving the quality of the machine translation system using statistical techniques. (See the presentation submitted to IEB2011 or, in English, the one submitted to Wikimania2010.)
http://www.wikicfp.com/cfp/servlet/event.showcfp?eventid=15168&copyownerid=21753
01/03/2011 - 21:55
Semantic Web Journal special issue: Linked Data for Science and Education
http://www.eamt.org/news/news_freerbmt11_success.php
25/02/2011 - 16:45
Read about the FreeRBMT 2011 workshop which was held in Barcelona on January 20th and 21st...
http://www.unibertsitatea.net/blogak/ixa/santiago-de-cubako-centro-de-linguistica-aplicada-k-40-urte
25/02/2011 - 15:35
Iñaki Alegria represented the Ixa group at the XII Simposium organized this year by the Centro de Lingüística Aplicada (CLA) in Santiago de Cuba. Iñaki gave a 10-hour course on Foma, a tool that is very useful for easily implementing morphological tools.
The Centro de Lingüística Aplicada has just turned 40. Congratulations!
To mark the CLA research centre's 40th anniversary, they sent the IXA group the sculpture in the photo as a gift, apparently to celebrate our collaboration.
Thank you very much. And congratulations to Eloina, Julio Vitelio, Leonel and all the researchers who founded that research centre and keep it going!
The IXA group has collaborated with the CLA research centre over the last 10 years.
One result of that collaboration is, for example, the third edition of the Cuban Diccionario Básico Escolar (DBE), published last year.
The dictionary is encoded in XML, and it was edited with leXkit, a dictionary-editing environment developed in the Ixa group.
http://www.clarin.eu/events/3426
25/02/2011 - 10:55
Date:
Monday, 11 April, 2011 (All day) - Friday, 15 April, 2011 (All day)
Location:
RIL at the Hungarian Academy of Sciences
Benczúr u. 33
Budapest
Hungary
47° 30' 41.2488" N, 19° 4' 33.8808" E
A Thematic Training Course on Processing Morphologically Rich Languages will be hosted in Budapest by the University of Helsinki and the Hungarian Academy of Sciences. This PhD level course is a part of the thematic training programme offered by the Marie Curie ITN project CLARA. We also welcome other national and international course participants.
Role of CLARIN:
CLARA is a CLARIN-related training programme
CLARIN representative(s):
Csaba Oravecz
Tamás Váradi
Krister Lindén
Kimmo Koskenniemi
http://semanticweb.com/introduction-to-rdf_b17953?c=rss
23/02/2011 - 19:55
 RDF stands for Resource Description Framework and it is a flexible schema-less data model. Do not confuse or compare it with XML (more about this later)! It is one of the core technologies of the Semantic Web and the current W3C standard to represent data on the web. But what is RDF exactly?
As I mentioned, it is a data model. It can be compared to the relational model which is the way you organize data in a relational database: group related things in tables with attributes, create links between tables, etc. RDF is just another way of organizing your data. In which way? As a graph.
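To make the "graph" idea concrete, here is a minimal sketch in plain Python that stores data as subject-predicate-object triples, the basic unit of RDF. The `ex:` identifiers and the helper names `objects` and `neighbours` are made up for illustration; a real application would use full URIs and an RDF library rather than bare tuples.

```python
# Each triple is one labelled edge of the graph: subject --predicate--> object.
triples = {
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "ex:name", "Bob"),
    ("ex:bob", "ex:knows", "ex:carol"),
}


def objects(graph, subject, predicate):
    """Return all objects reachable from subject along edges labelled predicate."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def neighbours(graph, node):
    """Return every node connected to node by an edge in either direction."""
    outgoing = {o for s, _, o in graph if s == node}
    incoming = {s for s, _, o in graph if o == node}
    return sorted(outgoing | incoming)


print(objects(triples, "ex:alice", "ex:knows"))
print(neighbours(triples, "ex:bob"))
```

Unlike a relational table, nothing here fixes a schema in advance: adding a new kind of fact is just adding another triple with a new predicate.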
http://permalink.gmane.org/gmane.science.linguistics.corpora/12634
21/02/2011 - 15:35
META-NET (http://www.meta-net.eu/) aims to build the technological
foundations of a multilingual European information society.
In order to make this a success, we need the support and
participation of the Language Technology (LT) community.
By joining the Multilingual European Technology Alliance (META), you can
play a part in this exciting initiative, which aims to bring about a
large-scale increase in funding for Language Technology Research on the
national and international level.
META-NET is a Network of Excellence aiming to forge the Multilingual
Europe Technology Alliance. It has three main lines of action:
1. META-VISION: Building a community with a shared vision and strategic
research agenda for Europe's Language Technology landscape. The agenda
will contain high level recommendations, ideas for visionary LT-based
applications and suggestions for joint actions to be presented to the EC
and national as well as regional bodies.
2. META-SHARE: A sustainable network of repositories