Additional fields in a Glossary Thread poster: Selcuk Akyuz
| Selcuk Akyuz Türkiye Local time: 03:39 English to Turkish + ...
http://www.cafetran.com/handbook.html#part5
Dictionaries and Glossaries
CafeTran offers a flexible interface to access and update your dictionaries in the workflow.
Glossaries are more specialized dictionaries such as terminology lists. They enable an automatic and fast look-up for any specific terminology that should be used in the translation.
This distinction between glossaries and dictionaries in CafeTran only affects the resource integration in the workflow. The lookup in glossaries is done "on the fly" each time you take a new segment, whereas the dictionary check happens when you click on the Search button in the main toolbar.
I want "on the fly" term recognition, so I need to convert my existing Term Base from another CAT tool into a Glossary. The information is clear, and it took less than 5 minutes to create my first Glossary.
Now I would like to test the pipe character, but it seems that it only works in TMX memories ("Memory for Terms", a different use of TMX files in CafeTran).
Perhaps I can convert my glossary file (a tab delimited text file) into TMX; there should be some free tools for it. But I will test it later.
So back to my Glossary, but there is a problem. On the fly term recognition is fast but what happened to my additional fields, e.g. definition, context, subject, client, date? I cannot see them in the glossary window.
Perhaps it works with the Dictionary feature. So I created a Dictionary in the Library Menu. Problem partly solved, now I can see most of the additional fields (provided that each term has one translation). But now I have another problem, Dictionary search is not performed on the fly.
My Term Base was created in DVX (with several additional fields), but it could come from any CAT tool; MemoQ and MultiTerm term bases also store additional fields. So how can we benefit from this additional and valuable information in CafeTran?

Selcuk Akyuz (TOPIC STARTER) | conversion: glossary into tmx | Jan 17, 2012
Selcuk Akyuz wrote:
Perhaps I can convert my glossary file (a tab delimited text file) into TMX; there should be some free tools for it. But I will test it later.
It seems that CafeTran has a solution for it:
http://cafetran4mac.blogspot.com/2010/11/importing-glossary-ii.html
To make the integration of your base with CafeTran complete, you may convert it to a TMX file. Create a new memory (menu Memory | New memory), and then select Memory | Conversions | Import glossary entries. When the import is finished, save the memory to a TMX file (Memory | Save). This also solves the problem of duplicate entries, since CT adds only the latest duplicate to the TMX memory.
So far, so good! But what is a duplicate in CafeTran? Only two identical source segments (actually terms now) with identical (or maybe different) translations.
What happens in the case of two identical source terms with identical translations but with different subject or client information (yes, the additional information issue again)?
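The question of what counts as a duplicate can be checked on the glossary file itself before importing. The sketch below is not CafeTran's logic; it only counts how many rows survive under three possible definitions of "duplicate" (same source, same source and target, same full row). The function name `duplicate_report` and the sample rows are invented for illustration.

```python
from collections import Counter

def duplicate_report(rows):
    """Count rows under three candidate definitions of 'duplicate'.

    rows: list of field lists, each [source, target, extra fields...].
    """
    by_source = Counter(r[0] for r in rows)
    by_pair = Counter((r[0], r[1]) for r in rows)
    by_full_row = Counter(tuple(r) for r in rows)
    return {
        "rows": len(rows),
        "unique_sources": len(by_source),
        "unique_pairs": len(by_pair),
        "unique_full_rows": len(by_full_row),
    }

# Hypothetical sample: one source term with two translations and
# differing subject/client information.
rows = [
    ["bank", "banka", "finance"],
    ["bank", "banka", "client A"],
    ["bank", "kıyı", "geography"],
    ["river", "nehir", ""],
]
report = duplicate_report(rows)
print(report)
```

Comparing the number of imported term pairs with these three counts gives a hint about which definition an importer applies.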
I think I will find the answer myself but CT freezes when segments are loaded (after terms are successfully imported to the Memory).
-------------------
OK, 11k out of 14k term pairs were imported into the new Memory, which means a loss of approx. 3k term pairs with different translations (or identical translations but with different meta information). I am now searching for another tab delimited text file to TMX converter (one which will not delete any segments). By the way, all meta information was saved as notes in the TMX file, but I could not find a way to display them (any solution?).
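A converter that keeps every row could look roughly like the sketch below. It assumes the layout Source TAB Target TAB additional fields, writes one TMX translation unit per row without removing duplicates, and joins the additional fields into a single `<note>` element. The function name, the header values, the default language codes and the choice to place the note at `<tu>` level are assumptions for illustration; whether CafeTran displays notes written this way has not been verified here.

```python
import csv
import io
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def glossary_to_tmx(tsv_text, src_lang="en", tgt_lang="tr"):
    """Convert tab delimited glossary text to a TMX string, keeping all rows."""
    root = ET.Element("tmx", version="1.4")
    ET.SubElement(root, "header", {
        "creationtool": "glossary2tmx-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "phrase",
        "o-tmf": "tab-delimited",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(root, "body")
    # QUOTE_NONE: quotation marks inside terms are kept as ordinary characters.
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2 or not row[0].strip():
            continue  # skip blank or incomplete lines
        source, target, extras = row[0], row[1], row[2:]
        tu = ET.SubElement(body, "tu")
        note_text = "; ".join(e for e in extras if e.strip())
        if note_text:
            # In TMX, notes come before the <tuv> elements of a unit.
            ET.SubElement(tu, "note").text = note_text
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(root, encoding="unicode")

sample = "bank\tbanka\tfinance\tclient A\nbank\tbanka\tfinance\tclient B\n"
print(glossary_to_tmx(sample))
```

Both sample rows end up as separate translation units even though source and target are identical, so no term pair is lost in the conversion itself.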
Honestly, these results are not satisfactory for me. I will continue testing other features of CafeTran (there are many good features indeed) and wait for improvements to be made in the Glossary/Dictionary/Memory for Terms features.
Selcuk
[Edited at 2012-01-17 02:31 GMT]

Selcuk Akyuz (TOPIC STARTER) | continued... | Jan 17, 2012
Tested the Glossary feature in a new project; to my surprise, additional information was displayed, at least for some terms.
Used the super tool UniCSVed to join all additional fields separated by tabs, and tested again in my project. Additional fields were displayed for terms with a single translation, but if a term has several meanings then additional fields were not displayed (img. 1). So I removed the tab between the target term and additional information (img. 2). But I am aware that this is not functional; additional information should be displayed but separated by a tab. Otherwise we cannot use it for "auto-completion".

Normally we do not need additional information for terms with a single meaning; we just use them. But when we add a second meaning for a term, we need such additional information. It may be the subject or the definition that helps us to select one or the other meaning. Unfortunately, CT does not display additional information when a term has several meanings.

Additional fields in a Glossary | Jan 17, 2012
Hi Selcuk,
Selcuk Akyuz wrote:
I want "on the fly" term recognition, so I need to convert my existing Term Base from another CAT tool into a Glossary. The information is clear, and it took less than 5 minutes to create my first Glossary.
Now I would like to test the pipe character, but it seems that it only works in TMX memories ("Memory for Terms", a different use of TMX files in CafeTran).
The pipe character in TMX memories is used for the stemming (prefix matching) feature, whereas in simple tab delimited glossaries it is used to separate multiple target meanings.
Perhaps I can convert my glossary file (a tab delimited text file) into TMX; there should be some free tools for it. But I will test it later.
So back to my Glossary, but there is a problem. On the fly term recognition is fast but what happened to my additional fields, e.g. definition, context, subject, client, date? I cannot see them in the glossary window.
CafeTran assumes that all fields in a text file are separated by the same character (for example TAB). Are they?
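One quick way to answer this for a large file is to count the fields per line. The sketch below is a generic check, not part of CafeTran; the function name `field_count_report` is invented. It reports how many lines have each field count and lists the lines that do not contain the delimiter at all.

```python
from collections import Counter

def field_count_report(lines, delimiter="\t"):
    """Return (Counter of fields per line, line numbers lacking the delimiter)."""
    counts = Counter()
    suspicious = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # ignore empty lines
        n_fields = len(line.split(delimiter))
        counts[n_fields] += 1
        if n_fields < 2:
            # No delimiter found: perhaps another separator was used here.
            suspicious.append(number)
    return counts, suspicious

# Hypothetical sample: line 2 uses semicolons instead of tabs.
sample_lines = ["bank\tbanka\tfinance\n", "river;nehir;geo\n", "\n", "sea\tdeniz\n"]
counts, suspicious = field_count_report(sample_lines)
print(dict(counts), suspicious)
```

A file where every line reports the same field count and the suspicious list is empty uses one separator consistently.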
Igor

conversion: glossary into tmx | Jan 17, 2012
You can keep the duplicate entries when converting from a tab delimited glossary to a TMX memory. Just check Keep all duplicates box in Memory | New Memory | Filter when you create the new TMX memory for the conversion.
Selcuk Akyuz wrote:
So far, so good! But what is a duplicate in CafeTran? Only two identical source segments (actually terms now) with identical (or maybe different) translations.
What happens in the case of two identical source terms with identical translations but with different subject or client information (yes, the additional information issue again)?
I think I will find the answer myself but CT freezes when segments are loaded (after terms are successfully imported to the Memory).
CT may freeze when you load huge TMX or glossary files into RAM memory assigned to CafeTran. You can increase this value. See Edit | Options | Memory tab | Java memory size (MB). Remember not to go over the actual RAM memory on your system.
-------------------
OK, 11k out of 14k term pairs were imported into the new Memory, which means a loss of approx. 3k term pairs with different translations (or identical translations but with different meta information). I am now searching for another tab delimited text file to TMX converter (one which will not delete any segments). By the way, all meta information was saved as notes in the TMX file, but I could not find a way to display them (any solution?).
When you see search results in the Memory tab, click the segment number and go to the Edit Tu menu to see the notes for this translation unit.

Selcuk Akyuz (TOPIC STARTER) | on Edit Tu menu and others | Jan 18, 2012
Igor Kmitowski wrote:
The pipe character in TMX memories is used for the stemming (prefix matching) feature, whereas in simple tab delimited glossaries it is used to separate multiple target meanings.
Clear information, thanks! My glossary structure is Source Term TAB Target Term TAB Additional fields (all separated with tabs). No pipe characters, but I don't know how CT treats my additional fields.
CafeTran assumes that all fields in a text file are separated by the same character (for example TAB). Are they?
Sure, they are. I am good with csv files thanks to UniCSVed.
Well, but as I have stated above in my third message, Glossary file displays additional fields provided that you do not have duplicate source terms (with identical or different translations). I want to see the additional fields for duplicate terms and therefore Glossary is not so useful for me.
You can keep the duplicate entries when converting from a tab delimited glossary to a TMX memory. Just check Keep all duplicates box in Memory | New Memory | Filter when you create the new TMX memory for the conversion.
Thanks, it worked. No data loss now!
CT may freeze when you load huge TMX or glossary files into RAM memory assigned to CafeTran. You can increase this value. See Edit | Options | Memory tab | Java memory size (MB). Remember not to go over the actual RAM memory on your system.
I have 2GB only and 1GB is assigned to Java. I assume it will not be a good idea to increase it to 2GB. Other programs may freeze.
When you see search results in the Memory tab, click the segment number and go to Edit Tu menu to see notes for this translation unit

Scroll down the list with the mouse to find the term, click on the number, click on "Edit Tu", select "Edit note" to display it, then click on X (or press Esc three times) to close the window. IMO, this is time consuming and requires excessive use of the mouse. (Generally speaking, after testing CT for 3 days, I feel that many operations in CT require the mouse.)
I have not yet tested the External DB function; I have to do some research before using H2, MySQL, Oracle 10g, HSQLDB 2.0 or Derby (Java DB). Using a term list with one of these databases may be better (for speed and hopefully for the GUI).
But as for the other features I tested for terminology management (Glossary, Dictionary, Memory for Terms), sorry, but I am really lost in them. IMO, Dictionary is useless because there is no on-the-fly search. Memory for Terms requires too much mouse and keyboard use. The most promising one is Glossary, but it needs some of the improvements that I have discussed in this thread. By the way, I loved the docking and undocking of tabs.
Kind regards,
Selcuk
[Edited at 2012-01-18 03:38 GMT]

on Edit Tu menu and others | Jan 18, 2012
Is there any standard for fields in tab delimited text files? Currently, CT follows this scheme:
source TAB target|alternative target|alternative target TAB additional fields
I implemented the above based on user requests. It seems to me that there is no common agreement on how to treat the fields. In your case the issue is with alternative targets: they are not pipe separated but set in separate rows or fields.
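A glossary with one row per translation can be reshaped into the scheme above with a short script. The sketch below groups rows by source term, joins the distinct targets with pipes, and collects the additional fields into one trailing field, each prefixed with the target it belongs to. That prefixing convention and the function name `merge_alternatives` are assumptions made for illustration; how CT displays the trailing field for entries with several targets is not confirmed here.

```python
def merge_alternatives(rows):
    """Merge rows [source, target, extras...] into 'source TAB t1|t2 TAB notes' lines.

    Every row must have at least a source and a target.
    """
    merged = {}  # source -> (targets, notes); dicts keep insertion order
    for source, target, *extras in rows:
        targets, notes = merged.setdefault(source, ([], []))
        if target not in targets:
            targets.append(target)
        extra_text = " ".join(e for e in extras if e)
        if extra_text:
            note = f"{target}: {extra_text}"  # assumed convention, not a CT rule
            if note not in notes:
                notes.append(note)
    lines = []
    for source, (targets, notes) in merged.items():
        fields = [source, "|".join(targets)]
        if notes:
            fields.append("; ".join(notes))
        lines.append("\t".join(fields))
    return lines

# Hypothetical sample rows: one source term with two meanings.
rows = [
    ["bank", "banka", "finance"],
    ["bank", "kıyı", "geography"],
    ["river", "nehir"],
]
for line in merge_alternatives(rows):
    print(line)
```

The output keeps one line per source term, so the subject or definition of each alternative stays attached to the entry instead of being spread over duplicate rows.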
As for the mouse operations, all basic workflow operations have keyboard shortcuts. For example, press F2 to list the matched terms and press the term number to insert it. The same holds true for autotranslation and fuzzy matches (F1 key). Yes, to reach additional meta information such as segment/terms notes in Memory you need to use the mouse.
Igor