Additional fields in a Glossary (CafeTran support)

Thread poster: Selcuk Akyuz

Selcuk Akyuz

Türkiye
Local time: 15:18
English to Turkish

Jan 17, 2012

http://www.cafetran.com/handbook.html#part5

Dictionaries and Glossaries

CafeTran offers a flexible interface to access and update your dictionaries in the workflow.

Glossaries are more specialized dictionaries such as terminology lists. They enable an automatic and fast look-up for any specific terminology that should be used in the translation.

This distinction between glossaries and dictionaries in CafeTran only affects the resource integration in the workflow. The lookup in glossaries is done "on the fly" each time you take a new segment, whereas the dictionary check happens when you click on the Search button in the main toolbar.

I want "on the fly" term recognition, so I need to convert my existing Term Base from another CAT tool into a Glossary. The information is clear, and it took less than 5 minutes to create my first Glossary.

Now I would like to test the pipe character, but it seems that it only works in TMX memories ("Memory for Terms", a different use of TMX files in CafeTran).

Perhaps I can convert my glossary file (a tab-delimited text file) into TMX; there should be some free tools for it. But I will test it later.

So back to my Glossary, but there is a problem. On the fly term recognition is fast but what happened to my additional fields, e.g. definition, context, subject, client, date? I cannot see them in the glossary window.

Perhaps it works with the Dictionary feature. So I created a Dictionary in the Library menu. Problem partly solved: now I can see most of the additional fields (provided that each term has one translation). But now I have another problem: the Dictionary search is not performed on the fly.

My Term Base was created in DVX (with several additional fields), but it could come from any CAT tool; MemoQ and MultiTerm term bases also store additional fields. So how can we benefit from this additional, valuable information in CafeTran?

Selcuk Akyuz

Türkiye
Local time: 15:18
English to Turkish

TOPIC STARTER

conversion: glossary into tmx

Jan 17, 2012

Selcuk Akyuz wrote:

Perhaps I can convert my glossary file (a tab-delimited text file) into TMX; there should be some free tools for it. But I will test it later.

It seems that CafeTran has a solution for it:

http://cafetran4mac.blogspot.com/2010/11/importing-glossary-ii.html

To make the integration of your base with CafeTran complete, you may convert it to a TMX file. Create a new memory (menu Memory | New memory), and then select Memory | Conversions | Import glossary entries. When the import is finished, save the memory to a TMX file (Memory | Save). This also solves the problem of duplicate entries, since CT adds only the latest duplicate to the TMX memory.

So far, so good! But what is a duplicate in CafeTran? Only two identical source segments (actually terms now) with identical (or maybe different) translations.

What happens in the case of two identical source terms with identical translations but with different subject or client information (yes, the additional information issue again)?
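To see how much a given definition of "duplicate" matters for a particular glossary, the rows can be counted under each definition before importing. The following is only a sketch: the sample rows are made up, the helper name `count_duplicates` is hypothetical, and it does not reproduce CafeTran's own duplicate rule, which is not documented in this thread.

```python
import csv
import io
from collections import Counter


def count_duplicates(rows):
    """Count rows that repeat an earlier row under three notions of 'duplicate'.

    Each row is a list: [source, target, additional fields...].
    """
    def extras(counter):
        # Every occurrence beyond the first one counts as a duplicate.
        return sum(n - 1 for n in counter.values())

    return {
        "same source": extras(Counter(r[0] for r in rows)),
        "same source+target": extras(Counter((r[0], r[1]) for r in rows)),
        "identical row": extras(Counter(tuple(r) for r in rows)),
    }


# Made-up sample: source TAB target TAB subject
sample = (
    "bank\tbanka\tfinance\n"
    "bank\tbanka\tlaw\n"
    "bank\tkıyı\tgeography\n"
    "loan\tkredi\tfinance\n"
)
rows = list(csv.reader(io.StringIO(sample), delimiter="\t", quoting=csv.QUOTE_NONE))
result = count_duplicates(rows)
print(result)
```

If the "same source" count is close to the number of term pairs lost in an import, the importer is probably keeping one entry per source term.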

I think I will find the answer myself but CT freezes when segments are loaded (after terms are successfully imported to the Memory).

-------------------

OK, 11k out of 14k term pairs were imported into the new Memory, which means a loss of approx. 3k term pairs with different translations (or identical translations but with different meta information). I am now searching for another tab-delimited text file to TMX converter (one which will not delete any segments). By the way, all meta information was saved as notes in the TMX file, but I could not find a way to display it (any solution?).
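A converter that keeps every row is short to write. The sketch below assumes a layout of source TAB target TAB additional fields, English to Turkish, and writes each additional field as a TMX `<note>` element on the translation unit. The function name `glossary_to_tmx`, the header values, and the sample rows are illustrative assumptions; whether CafeTran displays notes written this way has not been verified.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as the attribute xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def glossary_to_tmx(lines, src_lang="en", tgt_lang="tr"):
    """Build a TMX 1.4 tree from tab-delimited lines, one <tu> per line.

    No row is dropped as a duplicate; only rows without a source or
    target column are skipped.
    """
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "glossary2tmx-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "phrase",
        "o-tmf": "tab-delimited",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 2 or not fields[0]:
            continue
        source, target, extra = fields[0], fields[1], fields[2:]
        tu = ET.SubElement(body, "tu")
        # Notes must come before the <tuv> elements inside a <tu>.
        for value in extra:
            if value:
                ET.SubElement(tu, "note").text = value
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.ElementTree(tmx)


# Made-up sample rows: source, target, subject, client
sample = [
    "bank\tbanka\tfinance\tClient A\n",
    "bank\tbanka\tlaw\tClient B\n",
    "bank\tkıyı\tgeography\t\n",
]
tree = glossary_to_tmx(sample)
xml_text = ET.tostring(tree.getroot(), encoding="unicode")
print(xml_text)
# To save to disk: tree.write("terms.tmx", encoding="utf-8", xml_declaration=True)
```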

Honestly, these results are not satisfactory for me. I will continue testing other features of CafeTran (there are many good features indeed) and wait for improvements to be made in the Glossary/Dictionary/Memory for Terms features.

Selcuk

[Edited at 2012-01-17 02:31 GMT]

Selcuk Akyuz

Türkiye
Local time: 15:18
English to Turkish

TOPIC STARTER

continued...

Jan 17, 2012

I tested the Glossary feature in a new project and, to my surprise, additional information was displayed, at least for some terms.

I used the excellent tool UniCSVed to join all additional fields, separated by tabs, and tested again in my project. Additional fields were displayed for terms with a single translation, but if a term had several meanings, the additional fields were not displayed (img. 1). So I removed the tab between the target term and the additional information (img. 2). But I am aware...

Normally we do not need additional information for terms with a single meaning; we just use them. But when we add a second meaning for a term, we need such additional information: it may be the subject or a definition that helps us to select one meaning or the other. Unfortunately, CT does not display additional information when a term has several meanings.

Igor Kmitowski
Local time: 14:18
SITE STAFF

Additional fields in a Glossary

Jan 17, 2012

Hi Selcuk,

Selcuk Akyuz wrote:

I want "on the fly" term recognition, so I need to convert my existing Term Base from another CAT tool into a Glossary. The information is clear, and it took less than 5 minutes to create my first Glossary.

Now I would like to test the pipe character, but it seems that it only works in TMX memories ("Memory for Terms", a different use of TMX files in CafeTran).

The pipe character in TMX memories is used for the stemming (prefix matching) feature, whereas in simple tab-delimited glossaries it is used to separate multiple target meanings.
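For the glossary side of this distinction, a line can be read roughly as follows. This is a sketch, not CafeTran's parser: it assumes the first column is the source term, the second column holds one or more pipe-separated targets, and anything after that is additional information. The name `parse_glossary_line` and the sample line are made up.

```python
def parse_glossary_line(line):
    """Split one tab-delimited glossary line into source, targets and extras."""
    fields = line.rstrip("\r\n").split("\t")
    source = fields[0]
    # The pipe separates alternative target meanings within the target column.
    targets = []
    if len(fields) > 1:
        targets = [t.strip() for t in fields[1].split("|") if t.strip()]
    extra = fields[2:]
    return {"source": source, "targets": targets, "extra": extra}


entry = parse_glossary_line("bank\tbanka|kıyı\tfinance or geography\n")
print(entry)
```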

Perhaps I can convert my glossary file (a tab-delimited text file) into TMX; there should be some free tools for it. But I will test it later.

So back to my Glossary, but there is a problem. On the fly term recognition is fast but what happened to my additional fields, e.g. definition, context, subject, client, date? I cannot see them in the glossary window.

CafeTran assumes that all fields in a text file are separated by the same character (for example TAB). Are they?
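A quick way to answer this question for a large file is to count the fields on every line and list the lines that deviate from the most common count. The sketch below assumes TAB as the delimiter; the function name `check_fields` and the sample lines are hypothetical.

```python
from collections import Counter


def check_fields(lines, delimiter="\t"):
    """Return (most common field count, line numbers that deviate from it)."""
    per_line = [
        (number, len(line.rstrip("\r\n").split(delimiter)))
        for number, line in enumerate(lines, start=1)
        if line.strip()  # ignore blank lines
    ]
    if not per_line:
        return None, []
    expected = Counter(n for _, n in per_line).most_common(1)[0][0]
    odd = [number for number, n in per_line if n != expected]
    return expected, odd


# Made-up sample: line 3 uses semicolons, line 4 lacks the date column.
sample = [
    "bank\tbanka\tfinance\t2011-05-02\n",
    "loan\tkredi\tfinance\t2011-05-02\n",
    "shore;kıyı;geography;2011-06-10\n",
    "deposit\tmevduat\tfinance\n",
]
expected, odd = check_fields(sample)
print(expected, odd)
```

Empty additional fields are fine as long as their tabs are present, because the count is taken before any trailing tabs are stripped.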

Igor

Igor Kmitowski
Local time: 14:18
SITE STAFF

conversion: glossary into tmx

Jan 17, 2012

You can keep the duplicate entries when converting from a tab-delimited glossary to a TMX memory. Just check the Keep all duplicates box in Memory | New Memory | Filter when you create the new TMX memory for the conversion.

Selcuk Akyuz wrote:

So far, so good! But what is a duplicate in CafeTran? Only two identical source segments (actually terms now) with identical (or maybe different) translations.

What happens in the case of two identical source terms with identical translations but with different subject or client information (yes, additional information issue again).

I think I will find the answer myself but CT freezes when segments are loaded (after terms are successfully imported to the Memory).

CT may freeze when you load huge TMX or glossary files into the RAM assigned to CafeTran. You can increase this value: see Edit | Options | Memory tab | Java memory size (MB). Remember not to exceed the actual RAM on your system.

-------------------

OK, 11k out of 14k term pairs were imported into the new Memory, which means a loss of approx. 3k term pairs with different translations (or identical translations but with different meta information). I am now searching for another tab-delimited text file to TMX converter (one which will not delete any segments). By the way, all meta information was saved as notes in the TMX file, but I could not find a way to display it (any solution?).

When you see search results in the Memory tab, click the segment number and go to the Edit Tu menu to see the notes for this translation unit.

Selcuk Akyuz

Türkiye
Local time: 15:18
English to Turkish

TOPIC STARTER

on Edit Tu menu and others

Jan 18, 2012

Igor Kmitowski wrote:

The pipe character in TMX memories is used for the stemming (prefix matching) feature, whereas in simple tab-delimited glossaries it is used to separate multiple target meanings.

Clear information, thanks! My glossary structure is Source Term TAB Target Term TAB Additional fields (all separated with tabs). There are no pipe characters, but I don't know how CT treats my additional fields.

CafeTran assumes that all fields in a text file are separated by the same character (for example TAB). Are they?

Sure, they are. I am good with csv files thanks to UniCSVed.

Well, as I stated above in my third message, the Glossary displays additional fields only if you do not have duplicate source terms (with identical or different translations). I want to see the additional fields for duplicate terms, and therefore the Glossary is not so useful for me.

Thanks, it worked. No data loss now!

I have 2GB only and 1GB is assigned to Java. I assume it will not be a good idea to increase it to 2GB. Other programs may freeze.

When you see search results in the Memory tab, click the segment number and go to the Edit Tu menu to see the notes for this translation unit.

Scroll down the list with the mouse to find the term, click on the number, click on "Edit Tu", select "Edit note" to display it, then click on X (or press Esc three times) to close the window. IMO, this is time-consuming and requires excessive use of the mouse. (Generally speaking, after testing CT for 3 days, I feel that many operations in CT require use of the mouse.)

I have not yet tested the External DB function; I have to do some research before using H2, MySQL, Oracle 10g, HSQLDB 2.0 or Derby (Java DB). Using a term list with any of these databases may be better (for speed and hopefully for the GUI).

But as for the other features I tested for terminology management (Glossary, Dictionary, Memory for Terms), sorry, but I am really lost in them. IMO, Dictionary is useless because there is no on-the-fly search. Memory for Terms requires too much mouse and keyboard use. The most promising one is Glossary, but it needs some of the improvements that I have discussed in this thread. By the way, I loved the docking and undocking of tabs.

Kind regards,

Selcuk

[Edited at 2012-01-18 03:38 GMT]

Igor Kmitowski
Local time: 14:18
SITE STAFF

on Edit Tu menu and others

Jan 18, 2012

Is there any standard for fields in tab delimited text files? Currently, CT follows this scheme:

source TAB target|alternative target|alternative target TAB additional fields

I implemented the above based on users' requests. It seems to me that there is no common agreement on how to treat the fields. In your case, the issue is with alternative targets. They are not pipe-separated but set in other fields.

As for the mouse operations, all basic workflow operations have keyboard shortcuts. For example, press F2 to list the matched terms and press the term number to insert it. The same holds true for autotranslation and fuzzy matches (F1 key). Yes, to reach additional meta information such as segment/term notes in Memory, you need to use the mouse.
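A glossary that stores each alternative target on its own row can be rewritten into the pipe-separated scheme with a short script. The following is a sketch under assumptions: the input layout is source TAB target TAB additional fields, the additional fields of each row are folded into a single trailing field labelled with the target they belong to, and the function name `merge_alternatives` and the sample rows are made up. Whether CafeTran displays that trailing field for entries with several targets is not confirmed in this thread.

```python
def merge_alternatives(lines):
    """Merge rows sharing a source term into 'source TAB t1|t2 TAB notes' lines."""
    merged = {}  # dicts keep insertion order, so source terms stay in file order
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 2:
            continue
        source, target, extra = fields[0], fields[1], fields[2:]
        entry = merged.setdefault(source, {"targets": [], "notes": []})
        if target not in entry["targets"]:
            entry["targets"].append(target)
        note = "; ".join(f for f in extra if f)
        if note:
            # Label each note with its target so the meanings stay distinguishable.
            entry["notes"].append(f"{target}: {note}")
    return [
        "\t".join([source, "|".join(e["targets"]), " / ".join(e["notes"])])
        for source, e in merged.items()
    ]


# Made-up sample rows: source, target, subject
sample = [
    "bank\tbanka\tfinance\n",
    "bank\tkıyı\tgeography\n",
    "loan\tkredi\tfinance\n",
]
merged_lines = merge_alternatives(sample)
for merged_line in merged_lines:
    print(merged_line)
```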

Igor

