Posted by: knightsbridge | May 7, 2008

Five translation examples by MT systems

According to an article by Alan K. Melby published in Revista tradumàtica, MT systems consist of three phases of processing:

  • Analysis of the source text
  • Transfer (to accommodate differences between the source and target languages)
  • Generation of the target text from an intermediate representation
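The three phases can be pictured with a toy sketch. It is only an illustration under stated assumptions: the tiny Spanish-English lexicon, the part-of-speech tags and the single reordering rule are all invented for this example and do not come from Melby's article or from any real MT system.

```python
# Toy pipeline: analysis -> transfer -> generation.
# LEXICON, the tags and the reordering rule are illustrative assumptions.

LEXICON = {
    "el": ("the", "DET"),
    "gato": ("cat", "NOUN"),
    "negro": ("black", "ADJ"),
    "duerme": ("sleeps", "VERB"),
}


def analyze(sentence):
    """Analysis: tokenize the source text and tag each known word."""
    tokens = sentence.lower().split()
    # Unknown words raise KeyError; a real system would need a fallback.
    return [(tok, LEXICON[tok][1]) for tok in tokens]


def transfer(tagged):
    """Transfer: reorder NOUN ADJ (Spanish order) into ADJ NOUN (English order)."""
    out = list(tagged)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "NOUN" and out[i + 1][1] == "ADJ":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2
        else:
            i += 1
    return out


def generate(tagged):
    """Generation: produce target words from the intermediate representation."""
    return " ".join(LEXICON[tok][0] for tok, _ in tagged)


def translate(sentence):
    return generate(transfer(analyze(sentence)))


print(translate("El gato negro duerme"))  # the black cat sleeps
```

Keeping the phases as separate functions mirrors the description above: only the transfer step knows about the word-order difference between the two languages.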

Below I provide five examples of MT systems, some applied to closely related languages (Spanish, Catalan, Portuguese, French, or Italian) and some to less closely related languages (English, Spanish, Japanese, German, Chinese, or Arabic):

MT systems that translate closely related languages:

  • Instituto Cervantes: Its aim is to translate closely related languages such as Spanish, Portuguese or Catalan.
  • ATS: This company’s goal is to translate languages such as English, Portuguese or Spanish.

The following MT systems translate less related languages:

  • Reverso: This webpage deals with the translation of languages such as Spanish, English, Russian…
  • Systran: It translates less related languages such as Chinese, Arabic, English or Portuguese.
  • Freetranslation: It translates languages such as Chinese, Arabic, Italian, Portuguese and English.

 


Posted by: knightsbridge | April 16, 2008

Differences between the following specialized terms (Q.3)

The terms are the following:

Machine translation: Sometimes referred to by the abbreviation MT, it is a sub-field of computational linguistics that investigates the use of computer software to translate text or speech from one natural language to another. At its most basic level, MT performs simple substitution of words in one natural language for words in another. Using corpus techniques, more complex translations may be attempted, allowing for better handling of differences in linguistic typology, phrase recognition, and translation of idioms, as well as the isolation of anomalies.

Machine-aided translation: It is a form of translation wherein a human translator translates texts using computer software designed to support and facilitate the translation process.

Multilingual content management: It deals with managing content in several languages, mostly in the form of more or less structured text documents, but potentially also including audio clips, video clips and images.

Translation technology: Translation is the interpretation of the meaning of a text and the subsequent production of an equivalent text, also called a translation, that communicates the same message in another language. The text to be translated is called the “source text,” and the language it is to be translated into is called the “target language”; the final product is sometimes called the “target text.”

Sources:

www.wikipedia.org

Posted by: knightsbridge | April 16, 2008

Characteristics of the translation task (Q.3)

According to the FEMTI report, the characteristics of the translation task refer to the information flow intended for the output, from the point of view of the agent (human or otherwise) who receives the translation. The main characteristics are the following:

Assimilation: The ultimate purpose of the assimilation task (of which translation forms a part) is to monitor a (relatively) large volume of texts produced by people outside the organization, in (usually) several languages.

Document routing or sorting: The purpose of document routing / sorting is to scan incoming translated documents quickly in order to send them to the appropriate points for further processing or storage.

Information extraction or summarization: The purpose of information extraction or summarization is to extract some portion(s) of the translated text, either manually or automatically, for subsequent processing or storage. Information extraction is typically concerned with filling templates by identifying atomic elements of events. In contrast, summarization aims to provide a self-contained and internally cohesive text which serves as a selective account of the original.
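The contrast between the two tasks can be sketched in a few lines. This is a rough illustration only: the acquisition pattern, the template fields and the "keep the first sentence" summarizer are assumptions made up for this example, not methods described in the FEMTI report.

```python
import re

# Hypothetical pattern for one kind of event (a company acquisition).
PATTERN = re.compile(
    r"(?P<buyer>[A-Z]\w+) acquired (?P<target>[A-Z]\w+) "
    r"for \$(?P<amount>\d+) million"
)


def fill_template(text):
    """Information extraction: fill a fixed template with atomic elements."""
    match = PATTERN.search(text)
    if match is None:
        return None
    return {
        "event": "acquisition",
        "buyer": match.group("buyer"),
        "target": match.group("target"),
        "amount_musd": int(match.group("amount")),
    }


def summarize(text, max_sentences=1):
    """Summarization (naive baseline): keep the leading sentence(s) as running text."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return ". ".join(sentences[:max_sentences]) + "."


translated = "Acme acquired Bolt for $40 million. Analysts were surprised."
print(fill_template(translated))
print(summarize(translated))
```

The first function returns structured fields ready for storage, while the second returns a short self-contained text, which is the distinction drawn above.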

Source:

http://www.issco.unige.ch:8080/cocoon/femti/st-home.html

Posted by: knightsbridge | March 24, 2008

Explanation of three of the topics (Q.2)

The first topic that I am going to explain is ‘Speech recognition and speech synthesis’, which is among the research areas of the Association for Computational Linguistics. Speech recognition converts spoken words to machine-readable input (for example, to the binary code for a string of character codes). The term ‘voice recognition’ may also be used to refer to speech recognition, but more precisely refers to speaker recognition, which attempts to identify the person speaking, as opposed to what is being said. Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and can be implemented in software or hardware. A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech. Microsoft has been researching and developing speech technologies for over a decade, and Windows Vista contains both speech recognition and speech synthesis. Speech recognition is even more complicated than speech synthesis.

The second topic on which I am going to focus is the ‘Cross-lingual Multi-agent Retail Comparison (CROSSMARC)’ project. CROSSMARC will develop next-generation technology for electronic-retail product comparison, drawing on techniques from language engineering, machine learning and user modelling. A number of commercial agent-based systems have been developed to help Internet shoppers decide what to buy and where to buy it from. The majority of these systems, however, assume that product names and features are expressed in a uniform and monolingual (English) manner. Furthermore, they need to be configured manually for new product types, a process that requires substantial effort and time. Unlike existing technology, the project will lead to systems able to process pages written in several languages (initially Greek, English, Italian, and French) and adaptable to new product types via semi-automatic tools. CROSSMARC will exploit language technology and machine learning techniques for information extraction, which will be extended and tailored to the characteristics of e-shopping.

The last topic on which I am going to focus is ‘Pragmatics’, which is being researched by the Association for Computational Linguistics and Natural Language Processing. Pragmatics is the study of the ability of natural language speakers to communicate more than that which is explicitly stated. The ability to understand another speaker’s intended meaning is called pragmatic competence. An utterance describing pragmatic function is described as metapragmatic. The ‘Journal of Pragmatics’ states that linguistic pragmatics has been able to formulate a number of questions over the years that are essential to our understanding of language as people’s main instrument of “natural” and “societal” interaction. By providing possible theoretical foundations for the study of linguistic practice, linguistic pragmatics has helped to increase our knowledge of the forms, functions, and foundations of human interaction. The ‘Journal of Pragmatics’ identifies with the above general scope and aims of pragmatics.

Sources:

www.wikipedia.org

www.iit.demokritos.gr

http://www.elsevier.com/wps/find/journaldescription.cws_home/505593/description#description

Posted by: knightsbridge | March 24, 2008

Recent Research Topics on HLT (Q.2)

This article will deal with the most recent research topics mentioned in major sites on Human Language Technologies.

At the German Research Center for Artificial Intelligence, research addresses the following themes:

  • Exploiting – and automatically extending – ontologies for content processing.
  • Tighter integration of shallow and deep techniques in processing.
  • Enriching deep processing with statistical methods.
  • Combining language checking with structuring tools in document authoring.
  • Document indexing for German and English.
  • Automatically associating recognized information with related information and thus building up collective knowledge.
  • Automatically structuring and visualizing extracted information.
  • Processing information encoded in multiple languages, among them Chinese and Japanese.

 

The Edinburgh Language Technology Group conducts research and development in the following areas:

  • Combining Shallow Semantics and Domain Knowledge (EASIE).
  • Text Mining for Biomedical Content Curation (TXM).
  • Cross-lingual Multi-agent Retail Comparison (CROSSMARC).
  • Smart Qualitative Data: Methods and Community tools for Data Mark-up (SQUAD).
  • Machine Learning for Named Entity Recognition (SEER).
  • Integrated Models and Tools for Fine-Grained Prosody in Discourse (Synthesis).
  • Joint Action Science and Technology (JAST).
  • AMI Consortium projects, which are developing technologies for meeting browsing and for assisting people who participate in meetings from a remote location.
  • Study of how pairs collaborate when planning a route on a map (Collaborating using diagrams).

 

The Common Language Resources and Technology Infrastructure (CLARIN) wants to achieve a number of goals, which are:

  • They need a broad and deep understanding of the goals of CLARIN by everyone involved. Yet they cannot assume that the knowledge is already sufficiently spread.
  • They need to start the interaction with everyone involved and interested and to take up the comments and ideas from all the experts.
  • They need to spread the relevant messages about the different layers of work involved in setting up a research infrastructure, in particular since it involves aspects that have not yet been a topic of general discussion in their field.
  • They need to create a positive atmosphere and an enthusiasm which will be important for meeting their challenging goals.
  • They need to start the actual work in the working groups and invite all experts to participate.
  • Of course those who are partners in the EC funded project need to understand the rules of the game. In particular the double funding scheme – national and EC funding – needs careful attention from all of them. Other members need to be informed about the national groups.

The Association for Computational Linguistics and Natural Language Processing (Columbus, Ohio) invites student researchers to submit their work to its workshop. The research presented can come from any topic area within computational linguistics, including the following:

  • Pragmatics, discourse, semantics, syntax and the lexicon.
  • Phonetics, phonology and morphology.
  • Linguistic, mathematical and psychological models of language.
  • Information retrieval, information extraction, question answering.
  • Summarization and paraphrasing.
  • Speech recognition, speech synthesis.
  • Corpus-based language modeling.
  • Multi-lingual processing, machine translation, translation aids.
  • Spoken and written natural language interfaces, dialogue systems.
  • Multi-modal language processing, multimedia systems.
  • Message and narrative understanding systems.

Posted by: knightsbridge | March 17, 2008

Definition of Human Language Technologies (Q.1)

Numerous definitions of Human Language Technologies can be found on the Net. I have chosen one from Wikipedia, which refers to the term as Natural language processing (NLP) and says:

‘It is a subfield of artificial intelligence and linguistics. It studies the problems of automated generation and understanding of natural human languages. Natural language generation systems convert information from computer databases into normal-sounding human language, and natural language understanding systems convert samples of human language into more formal representations that are easier for computer programs to manipulate’.

The other is by Hans Uszkoreit, from his 2007 study ‘What is Language Technology?’. Uszkoreit defines the term as:

‘Language technology — sometimes also referred to as human language technology — comprises computational methods, computer programs and electronic devices that are specialized for analyzing, producing or modifying texts and speech. These systems must be based on some knowledge of human language. Therefore language technology defines the engineering branch of computational linguistics’.

Uszkoreit studied Linguistics and Computer Science at the Technical University of Berlin and the University of Texas at Austin. During his time in Austin he also worked as a research associate in a large translation project at the Linguistics Research Center. In 1998 Uszkoreit was appointed to a newly created chair of Computational Linguistics at Saarland University and started at the Department of Computational Linguistics and Phonetics. Uszkoreit is a permanent member of the International Committee on Computational Linguistics (ICCL), a member of the European Academy of Sciences, Past President of the European Association for Logic, Language and Information, and a member of the Executive Board of the European Network of Language and Speech.

He is currently a professor of Computational Linguistics at the Department of Computational Linguistics and Phonetics of Saarland University in Saarbrücken. At the same time he serves as Scientific Director at the German Research Center for Artificial Intelligence (DFKI), where he leads the DFKI Language Technology Laboratory. By cooptation, he is also a professor in the Computer Science Department.

Among his most relevant projects and publications, we can cite:

  • Uszkoreit, H. (1999) ‘Hauptartikel Grammatikmodelle, sowie mehrere Lang- und Kurzartikel zum Themenbereich Grammatiktheorie’. In G. Strube (Ed.), Wörterbuch der Kognitionswissenschaft. Klett-Cotta, Stuttgart 1996.
  • Uszkoreit, H. (1997) ‘Overview: Formal Tools and Methods’. In R.A. Cole et al. (Eds.), Survey of the State of the Art in Human Language Technology, Cambridge University Press and Giardini.
  • Uszkoreit, H. (2007) ‘Methods and Applications for Relation Detection’. In: Proceedings of the Third IEEE International Conference on Natural Language Processing and Knowledge Engineering, Beijing 2007.

Some of the European research centres for Human Language Technologies are the following:

  • National Centre for Language Technology (NCLT): Located in Dublin, the centre ‘carries out basic research and develops applications’.
  • OFAI Language Technology Group: An Austrian centre that ‘conducts research in modelling and processing human languages, especially for German’.
  • Edinburgh Language Technology Group (LTG): The LTG has been working since the early 1990s and focuses on ‘building practical solutions to real problems in text processing’.
  • Language Technology Documentation Centre in Finland: This centre works ‘in order to make speech-to-speech translation real’.


Posted by: knightsbridge | January 9, 2008

Distance education: the Internet

Online education, or education via the Internet, is, according to Wikipedia, a mode of education in which students do not need to attend any classroom in person: course material is sent by e-mail directly from the institution to our personal computers, through which we can also put questions to the course tutors. Another option is to enrol in an online course whose information and contents are posted on the course web page. This is a significant advance that our ancestors, and indeed our own parents, could not even have dreamed of.

Distance education developed out of correspondence courses, which arose from the need to teach students in isolated places where it was not possible to build a school. It has now spread so widely that there are universities with more than 100,000 students who receive their classes over the Internet.

One of the advantages that make this mode particularly attractive is its flexible timetable. Students organise their own study time, which requires a certain degree of self-discipline. Its disadvantages concern the mistrust generated by the lack of communication between teacher and students, especially when it comes to assessing what the student has learned.

In Spain we have the UNED (Universidad Nacional de Educación a Distancia), the university with the largest number of enrolled students. The future of education lies in the Internet, the network of networks, and we should accept it as it comes rather than resist its advances.

Posted by: knightsbridge | January 3, 2008

XML: Some goals and terminology

As I said in the other article, XML is the abbreviated form of Extensible Markup Language; it describes a class of data objects called XML documents and partially describes the behaviour of computer programs which process them. Now that we know what it means and how it works, let’s list some of its goals:

  1. XML shall be straightforwardly usable over the Internet.
  2. XML shall support a wide variety of applications.
  3. XML shall be compatible with SGML.
  4. It shall be easy to write programs which process XML documents.
  5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
  6. XML documents should be human-legible and reasonably clear.
  7. The XML design should be prepared quickly.
  8. The design of XML shall be formal and concise.
  9. XML documents shall be easy to create.
  10. Terseness in XML markup is of minimal importance.

The terminology used to describe XML documents is defined in the body of the XML specification. The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when EMPHASIZED, are to be interpreted as described in RFC 2119. For example, MUST means that the definition is an absolute requirement of the specification, while MUST NOT and SHALL NOT mean that the definition is an absolute prohibition. Some of the terminology used is listed below:

  • Error: A violation of the rules of this specification; results are undefined. Unless otherwise specified, failure to observe a prescription of this specification indicated by one of the keywords MUST, REQUIRED, MUST NOT, SHALL and SHALL NOT is an error.
  • At user option: Conforming software MAY or MUST (depending on the modal verb in the sentence) behave as described; if it does, it MUST provide users a means to enable or disable the behavior described.
  • Well-formedness constraint: A rule which applies to all well-formed XML documents. Violations of well-formedness constraints are fatal errors.
  • For compatibility: Marks a sentence describing a feature of XML included solely to ensure that XML remains compatible with SGML.

The list could continue, but I will just finish by saying that the future is already among us and we had better get ready for the IT advances that are to come.

 

Posted by: knightsbridge | January 2, 2008

XML: a decade among us

XML stands for Extensible Markup Language. It is a simple, very flexible text format derived from SGML (Standard Generalized Markup Language), originally designed to meet the challenges of large-scale electronic publishing and used both to encode documents and to serialize data. XML is also playing an increasingly important role in the exchange of a wide variety of data on the Web and elsewhere. There are two levels of correctness of an XML document:

  1. Well-formed. A well-formed document conforms to all of XML’s syntax rules. For example, if a start-tag appears without a corresponding end-tag, it is not well-formed. A document that is not well-formed is not considered to be XML.
  2. Valid. A valid document additionally conforms to some semantic rules. For example, if a document contains an undefined element, then it is not valid.
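The first level can be checked with very little code. The sketch below uses Python's standard xml.etree.ElementTree parser; the two sample documents and the helper name is_well_formed are made up for illustration. This parser only checks well-formedness: checking validity would need a validating parser together with a DTD or schema, which is not shown here.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


good = "<note><to>Ana</to></note>"
bad = "<note><to>Ana</note>"  # the start-tag <to> has no matching end-tag

print(is_well_formed(good))  # True
print(is_well_formed(bad))   # False
```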

XML is not alone: it is surrounded by plenty of technologies and possibilities that offer easier and more interesting ways of working with data and, in short, an advance in working with information, which is in fact the aim of IT. XML is not one language but several, not one syntax but several, and not a new way of working but a more refined one that allows the previous ones to communicate with each other, because data are no longer meaningless.

It is a fact that most web pages that deal with news are based upon XML. The most common approach is to store the information in a database, then convert it into XML, and then transform it in order to serve it to the customer.
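The database-to-XML-to-page flow described above might look roughly like this. Everything in the sketch is an assumption made for illustration: the in-memory SQLite table, the element names (news, item, title, body) and the hand-written HTML transformation stand in for whatever database, vocabulary and stylesheet a real news site would use.

```python
import html
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(rows):
    """Turn (title, body) database rows into an XML tree."""
    root = ET.Element("news")
    for title, body in rows:
        item = ET.SubElement(root, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "body").text = body
    return root


def xml_to_html(root):
    """Transform the XML tree into a simple HTML list for the reader."""
    parts = ["<ul>"]
    for item in root.findall("item"):
        title = html.escape(item.findtext("title"))
        body = html.escape(item.findtext("body"))
        parts.append("<li>%s: %s</li>" % (title, body))
    parts.append("</ul>")
    return "".join(parts)


# Step 1: the information is stored in a database (in memory here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE news (title TEXT, body TEXT)")
conn.execute("INSERT INTO news VALUES (?, ?)", ("XML turns ten", "A short story."))
rows = conn.execute("SELECT title, body FROM news").fetchall()
conn.close()

# Step 2: convert it to XML. Step 3: transform it for delivery.
doc = rows_to_xml(rows)
print(ET.tostring(doc, encoding="unicode"))
print(xml_to_html(doc))
```

Because the XML sits in the middle, the same stored data could be transformed into a different output format without touching the database step.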

 

Posted by: knightsbridge | December 26, 2007

Hypermedia

The term “hypermedia” arises from the fusion of two concepts: hypertext and multimedia. According to Wikipedia, hypermedia refers to the set of methods or procedures for writing, designing, or composing content that combines text, video, audio, maps or other media, and that can also interact with users.

Links are the core of hypermedia systems. The ability to create hierarchical or associative structures gives the user a logical, and sometimes conceptual, structuring of the documents’ content. The notion of searching for information within hypermedia systems is somewhat ambiguous. Searching is implicit in navigation, since the user follows links or consults semantic networks precisely in order to find information.

Document authoring is the creative core of a hypermedia system. This aspect accounts for 25% of the environment’s success. Beyond functionality, an application of this type must offer a set of features that make the task of writing easier. The types of hypermedia include:

  1. Hypertext
  2. Hyperfilms
  3. Hypergrams

The first hypermedia system ever created was the Aspen Movie Map. Current examples of hypermedia include:

  • The World Wide Web.
  • Films stored on a DVD.
  • Presentations in PowerPoint or Flash, or similar software products.

 
