Laputan Logic

Thursday, July 31, 2003

IAA report: From the horse's mouth

This is an official summary of the Israel Antiquities Authority's final report into the Yehoash Inscription and the James Ossuary.

Word of the almost simultaneous discovery of the bone box known as the “James Ossuary” and the Yehoash inscription, from an unknown source (not from a methodical excavation), together with the emotions raised by the finds and extensive public interest amongst Jews and Christians, obliged the Israel Antiquities Authority (IAA), the body responsible for all archaeological activities in Israel, to take action, to examine the finds and formulate a position on the subject. The IAA agreed to a short exhibit of the ossuary in Canada.
Numerous articles, all appearing within a short period of time, either confirm or deny the authenticity of the items. If the pieces are authentic (particularly the Yehoash inscription), then they are of great scientific value. The IAA was thus bound to do everything possible to arrive at the truth and present its conclusions...
-- FINAL REPORT OF THE EXAMINING COMMITTEES FOR THE YEHOASH INSCRIPTION AND JAMES OSSUARY

¶ 12:26 pm

Statistical translation

"Give me enough parallel data, and you can have a translation system for any two languages in a matter of hours," said Dr. Och, a computer scientist in the USC School of Engineering's Information Sciences Institute.
Och spoke after the 2003 Benchmark Tests for machine translation carried out in May and June of this year by the U.S. Commerce Department's National Institute of Standards and Technology.
Och's translations proved best in the 2003 head-to-head tests against 7 Arabic systems (5 research and 2 commercial-off-the-shelf products) and 14 Chinese systems (9 research and 5 off-the-shelf). In the previous, 2002 evaluations they had proved similarly superior.
The researcher discussed his methods at a NIST post-mortem workshop on the benchmarking held July 22-23 at Johns Hopkins University in Baltimore, Maryland.
Och is a standout exponent of a newer method of using computers to translate one language into another that has become more successful in recent years as the ability of computers to handle large bodies of information has grown, and the volume of text and matched translations in digital form has exploded, on (for example) multilingual newspaper or government web sites.
Och's method uses matched bilingual texts, the computer-encoded equivalents of the famous Rosetta Stone inscriptions. Or, rather, gigabytes and gigabytes of Rosetta Stones.
"Our approach uses statistical models to find the most likely translation for a given input," Och explained.
"It is quite different from the older, symbolic approaches to machine translation used in most existing commercial systems, which try to encode the grammar and the lexicon of a foreign language in a computer program that analyzes the grammatical structure of the foreign text, and then produces English based on hard rules," he continued.
"Instead of telling the computer how to translate, we let it figure it out by itself. First, we feed the system with a parallel corpus, that is, a collection of texts in the foreign language and their translations into English.
"The computer uses this information to tune the parameters of a statistical model of the translation process. During the translation of new text, the system tries to find the English sentence that is the most likely translation of the foreign input sentence, based on these statistical models."
This method ignores, or rather rolls over, explicit grammatical rules and even traditional dictionary lists of vocabulary in favor of letting the computer itself find matchup patterns between given Chinese or Arabic (or any other language) texts and their English translations.
[Romancing the Rosetta Stone]
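
For the curious, here is roughly what "tuning the parameters of a statistical model" on a parallel corpus can look like. This is only a toy sketch of IBM Model 1, a classic word-alignment model from the statistical translation literature, and not Och's actual system, which the article doesn't describe in any detail. The function names, the three-sentence German-English corpus and the iteration count are all my own assumptions for illustration.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=20):
    """Estimate word translation probabilities t[(f, e)] = P(f | e) with EM.

    pairs: list of (foreign_tokens, english_tokens) sentence pairs.
    Toy version: no NULL word, no smoothing, no language model.
    """
    foreign_vocab = {f for fs, _ in pairs for f in fs}
    uniform = 1.0 / len(foreign_vocab)
    # Start out knowing nothing: every foreign word is equally likely.
    t = defaultdict(lambda: uniform)

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f aligned to e
        total = defaultdict(float)  # expected count of e aligned to anything

        # E-step: share each foreign word among the English words in its
        # sentence pair, in proportion to the current probabilities.
        for fs, es in pairs:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac

        # M-step: re-estimate the probabilities from the expected counts.
        new_t = defaultdict(float)
        for (f, e), c in count.items():
            new_t[(f, e)] = c / total[e]
        t = new_t

    return t


def best_translation(t, f, english_vocab):
    """Pick the English word e with the highest P(f | e) (a crude decoder)."""
    return max(english_vocab, key=lambda e: t.get((f, e), 0.0))


# Hypothetical three-sentence parallel corpus (German -> English).
corpus = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]

table = train_ibm_model1(corpus)
english_vocab = {e for _, es in corpus for e in es}
for word in ["das", "Haus", "Buch", "ein"]:
    print(word, "->", best_translation(table, word, english_vocab))
```

Nobody tells the program that "Haus" means "house". It works that out because "das" turns up alongside both "house" and "book" while "Haus" only ever turns up alongside "house". A real system would add a model of fluent English, handle phrases rather than single words, and train on millions of sentences rather than three.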

Speaking of Rosetta Stones: Egypt is now demanding that the original one be returned. The British, as always with these things, are unmoved.

Thanks Peter ¶ 12:24 am

Wednesday, July 30, 2003

Sanskrit dictionary

For three generations, they have compiled and argued, agonized and transcribed — toiling in monastic tedium to turn an intricate, 44-letter language into six volumes, so far, of word after long-forgotten word.
They have delved into the grammatical roots of "antahpravesakama" and debated the pun hidden in "anangada." They've done a brain-numbingly complete dissection of "anekakrta."
Now, 55 years after a group of scholars began composing the authoritative dictionary of Sanskrit, the long-dead language of India's ancient glory, they are almost done — with the first letter.
"Sanskrit," sighed Vinayaka Bhatta, chief editor of Deccan College's dictionary project, "is not easy to translate."
[After 55 years of toil, Sanskrit dictionary not even close]

¶ 12:05 pm

Linguist baiting

And here's a little snippet about the language spoken by Adam and Eve (well, not exactly, but...). It probably won't surprise anyone to learn that Adam was, in fact, a Basque.

Gene research is helping clear up the mystery of the origins of the Basque people, a culture that apparently came out of East Africa 50,000 years ago and passed through the Middle East on the way to Western Europe, a University of Nevada researcher says.

That's one of the reasons when reviewing documents written in the ancient Sumerian language, "you would swear you are reading Basque," said Joxe Mallea-Olaetxe, adjunct professor for the Center for Basque Studies at the University of Nevada, Reno.

It's also why some cities in the Middle East have names that could be Basque-related, such as Ur, Uruk and Mari, he said. The name of a Basque goddess is Mari.

"The Basque came out of East Africa 50,000 or so years ago and passed through the Middle East," Mallea-Olaetxe said during a recent presentation at Northeastern Nevada Museum as part of the National Basque Festival in Elko.

Mallea-Olaetxe said scientists traced the female gene back 150,000 years to East Africa but for research purposes followed the male Y chromosome that enables researchers to trace human whereabouts.

They started tracing the male gene to 60,000 years ago in East Africa, and then through the Middle East to Central Asia some 40,000 years ago, the professor said.

Linguists suspected long before the genetic research that an old language in Central Asia "looked suspiciously like Basque," Mallea-Olaetxe said. That language, Burushaski, is dead now, he said.

So genetic research is proving the linguists right, he said.

[Genes help solve mystery of Basque origins]

I'm sure they'll be greatly reassured, Joxe. ¶ 12:18 am

First Americans

An archaeological site in Siberia -- long thought to be the original jumping-off point for crossing the Bering land bridge into North America -- is actually much younger than previously believed, shaking the theory that the first Americans migrated overland during the final cold snap of the last great ice age.

Using radiocarbon dating, scientists found that the Ushki site, the remains of a community of hunters clustered around Ushki Lake in northeastern Russia, appears to be only about 13,000 years old -- 4,000 years younger than originally thought.

The new date places the Ushki settlement in the same time period as the Clovis site, an ancient community found in New Mexico, making it highly unlikely that people could have traversed the thousands of miles from Siberia in such a short period.

"This was the last site out there in Siberia that could have been an ancestor for the Clovis," said Michael Waters, co-author of the research appearing today in the journal Science. "We have to think bigger now and start thinking outside the box."

[New questions about migration of first Americans]

The subject of when humans first arrived in America is hotly contested by academics.

On one side of the argument are researchers who claim America was first populated around 13,000 years ago, toward the end of the last Ice Age. On the other are those who propose a much earlier date for colonisation of the continent - possibly around 30,000-40,000 years ago.

The authors of the latest study reject the latter theory, proposing that humans entered America no earlier than 18,000 years ago.

[Date limit set on first Americans]

¶ 12:08 am

Tuesday, July 29, 2003

Roman Cosmetics

A Museum of London conservator shows the contents of a Roman tin box after opening it for the first time since its discovery in London, July 28, 2003. Archaeologists excavating the site of a major Roman temple in London found the box containing a white cream still bearing the finger marks of the person who last used it, nearly 2,000 years ago, museum officials said. [more]

Almost 2,000 years ago, at a temple in Roman London, someone with slender fingers took a small tin box, scooped a blob of white paste into the lid, and used that as a palette to smear the paste on to ... a face? Hands? An image of a god? The archaeologists jostling for position yesterday, as the box was opened for the first time in almost 2,000 years, had no idea.
The beautifully made box was easier to open than a new jar of Marmite. There was a gasp as conservator Liz Barham gently twisted off the lid to reveal perfectly preserved fingerprints, so small they may have been those of a woman or even a child. There was a second gasp as the smell hit the company.
"Asses' milk?" wondered Francis Grew, the curator of archaeology at the Museum of London. "Asses' yoghurt," retorted Hedley Swaine, the keeper of early London archaeology.
"A somewhat sulphurous smell, highly characteristic of waterlogged deposits from that site," Ms Barham said carefully. "And cheesy," she added, unable to stop her nose from wrinkling as the paste warmed under the camera lights.
[2,000-year-old pot opened]

¶ 11:51 pm

Monday, July 28, 2003

A Poverty of Evidence

Linguistic nativism or the innateness hypothesis is the claim, advanced by Chomsky (1986) and Pinker (1994) amongst others, that human beings are endowed with an innately, presumably genetically, specified domain specific knowledge of language. This knowledge is tacit, that is to say not accessible to conscious thought, and it specifies in some detail the nature of possible human languages, including a set of syntactic categories, a set of possible phrase structure rules, constraints on admissible transformations and so on. The primary argument for this bold hypothesis is the so-called Argument from the Poverty of the Stimulus (APS), that the linguistic input or evidence available to the infant child is so impoverished and degenerate that no general, domain-independent learning algorithm could possibly learn a plausible grammar without assistance.
An obvious refutation of this argument is to demonstrate the existence of an algorithm that can learn a reasonable grammar from that amount of data. It is that issue that this thesis is intended to study. Nonetheless, the algorithms presented here are, I hope, of general interest as pieces of computational linguistics or machine learning research.
[more (PDF)]

¶ 11:27 am