Pages in topic:   < [1 2 3]
How to convert TMX to tab-delimited?
Thread poster: Hans Lenting
Hans Lenting
Netherlands
Member (2006)
German to Dutch
TOPIC STARTER
Nice Oct 16, 2022

Samuel Murray wrote:

Yes, so you have to first replace all whitespace characters (except spaces, duh) with replacement characters.
...
= horizontal tab


Thank you for reminding me of that one! I'll add a rule to the TextFactory.
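The replacement rule being discussed (protect every whitespace character except the plain space) can be sketched in a few lines of Python. This is not BBEdit Text Factory syntax, just a minimal sketch. It assumes that only newlines, tabs and carriage returns need protecting, because those are the characters that break a tab-delimited row. ① and ② follow the suggestion in this thread; ③ for carriage returns is an extra, hypothetical choice.

```python
# Placeholder characters. ① and ② follow the suggestion in this thread;
# ③ for carriage returns is an extra assumption (Windows line endings).
ESCAPES = {"\n": "①", "\t": "②", "\r": "③"}
UNESCAPES = {placeholder: ch for ch, placeholder in ESCAPES.items()}


def escape_whitespace(text: str) -> str:
    """Replace newlines, tabs and carriage returns with placeholders.

    Plain spaces are left alone, so the result fits in one column
    of one tab-delimited line.
    """
    return "".join(ESCAPES.get(ch, ch) for ch in text)


def unescape_whitespace(text: str) -> str:
    """Reverse escape_whitespace()."""
    return "".join(UNESCAPES.get(ch, ch) for ch in text)
```

The round trip only works if the placeholder characters never occur in the original segments, so it is worth searching the TMX for ①, ② and ③ before running the replacement.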


Or just replace \n with ① and \t with ② throughout the file; there's no need to restrict it to segments, since you're not going to use the TMX file after this.


I'll use the tab-delimited file for several purposes. One of them is ... creating a cleaned and smaller TMX. That TextFactory will be pretty straightforward.
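Going the other way, from a cleaned tab-delimited file back to a smaller TMX, could look roughly like the following. It is a minimal sketch, not Hans's Text Factory. It assumes two columns (source, target), the ①/② placeholders suggested earlier in the thread, and two hypothetical cleaning rules: rows with an empty source or target are dropped, and exact duplicate pairs are kept only once. The header values, including the `creationtool` name, are placeholders.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this namespace as the standard xml: prefix.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
UNESCAPES = {"①": "\n", "②": "\t"}  # assumed placeholders, see the thread


def unescape(text: str) -> str:
    for placeholder, ch in UNESCAPES.items():
        text = text.replace(placeholder, ch)
    return text


def tsv_to_tmx(lines, src_lang: str, tgt_lang: str) -> str:
    """Build a minimal TMX 1.4 document from 'source<TAB>target' lines.

    Rows with an empty column and exact duplicate pairs are skipped
    (hypothetical cleaning rules).
    """
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "tsv_to_tmx",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "tsv",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    seen = set()
    for line in lines:
        source, _, target = line.rstrip("\r\n").partition("\t")
        if not source or not target or (source, target) in seen:
            continue
        seen.add((source, target))
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = unescape(text)
    return ET.tostring(tmx, encoding="unicode")
```

The function returns the XML as a string without an XML declaration. To write a file with one, `ET.ElementTree(...).write(path, encoding="utf-8", xml_declaration=True)` on the built element is an alternative.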

BTW:
BBEdit introduces the Text Factory, which allows you to assemble a list of text transformations that will be applied in order to either the current document or selection (when invoked as a filter), or to a specified list of files and folders (when invoked via the Scripts menu).
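The Text Factory idea, an ordered list of transformations applied one after another, maps onto a simple regex pipeline. This is only an analogy in Python, not BBEdit's own format. The steps in `EXAMPLE_FACTORY` are hypothetical. The first two mirror the placeholder replacements discussed above, and the third is an illustrative extra.

```python
import re
from collections.abc import Iterable

# One step = (regex pattern, replacement). Steps run in list order,
# so each step sees the output of the previous one.
Step = tuple[str, str]

# Hypothetical example factory for preparing segment text.
EXAMPLE_FACTORY: list[Step] = [
    (r"\n", "①"),     # newline -> placeholder (as suggested in the thread)
    (r"\t", "②"),     # tab -> placeholder
    (r" {2,}", " "),  # collapse runs of spaces (illustrative extra step)
]


def run_factory(text: str, steps: Iterable[Step]) -> str:
    """Apply each (pattern, replacement) step to the text, in order."""
    for pattern, replacement in steps:
        text = re.sub(pattern, replacement, text)
    return text
```

Because the steps are plain data, the same list can be applied to a single string or looped over a folder of files. This is roughly the split between running a Text Factory as a filter and running it on a list of files.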


[Screenshot: Screen Shot 2022-10-16 at 07.54.11]


 
Hans Lenting
Netherlands
Member (2006)
German to Dutch
TOPIC STARTER
Solution for Mac Oct 27, 2022

For a solution for Mac, see: https://www.proz.com/post/2975518#2975518

 
Jean Dimitriadis
English to French
Why not use CafeTran Espresso? Oct 27, 2022

Could you simply use CafeTran Espresso for that conversion?

1. Create or open a project with the required language pair.
2. Open or Import the TMX file (or an SDLTB/TBX file, which will be automatically converted to TMX), possibly not as read-only and with fragments enabled.
3. Select the tab of the glossary you wish to import into (an empty Project Terms page will do, or you can create a new glossary and select its tab).
4. Memory menu > Export > Export segments to glossary. A dialog will ask you to select which memory to import segments from. And if the currently selected/opened tab is not a glossary, it will first ask you to select one.

That's it.

CafeTran also includes some TM Filter options, including one called "Clean and replace foreign codes": some TMX files from third-party tools have unusual codes in the segments, such as codes inside curly brackets, or em dash, en dash and tab codes. CafeTran clears them or replaces them with the equivalent Unicode characters.

https://github.com/idimitriadis0/TheCafeTranFiles/wiki/3-TM-options#tm-filter-options
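CafeTran's exact cleaning rules aren't spelled out here, so the following is only a rough analogue in Python. It drops the TMX 1.4 inline code elements (`<bpt>`, `<ept>`, `<it>`, `<ph>`, `<ut>`) from a segment and keeps the translatable text, including text inside `<hi>`. It does not attempt the curly-bracket or dash-code replacements, since their exact form isn't given. Note that any `<sub>` text nested inside a dropped code element is discarded along with it.

```python
import xml.etree.ElementTree as ET

# TMX 1.4 inline elements whose content is native markup rather than text.
CODE_TAGS = {"bpt", "ept", "it", "ph", "ut"}


def plain_text(elem: ET.Element) -> str:
    """Return the text of a <seg> (or <hi>) with inline codes removed."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag not in CODE_TAGS:
            parts.append(plain_text(child))  # e.g. <hi>: keep its text
        parts.append(child.tail or "")       # text after any child is kept
    return "".join(parts)
```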

If needed, prior TMX editing (including search and replace, with or without regular expressions) can also be done from within CafeTran.

[Edited at 2022-10-27 05:50 GMT]


 
Hans Lenting
Netherlands
Member (2006)
German to Dutch
TOPIC STARTER
Too slow Oct 27, 2022

Jean Dimitriadis wrote:

Could you simply use CafeTran Espresso for that conversion?

1. Create or open a project with the required language pair.
2. Open or Import the TMX file (or an SDLTB/TBX, which will be automatically converted to TMX), possibly not as read-only and with fragments enabled
3. Select the tab of the glossary you wish to import into (an empty Project Terms page will do, or you create a new glossary and select its tab)
4. Memory menu > Export > Export segments to glossary. A dialog will ask you to select which memory to import segments from. And if the currently selected/opened tab is not a glossary, it will first ask you to select one.

That's it.

CafeTran also includes some TM Filter options, including one called "Clean and replace foreign codes": Some TMX files from third-party tools have unusual codes in the segments such as codes inside the curly brackets or emdash, endash, tab code. CafeTran clears or replaces them with equivalent unicode characters.

https://github.com/idimitriadis0/TheCafeTranFiles/wiki/3-TM-options#tm-filter-options

If needed, prior TMX editing (including search and replace, with or without regular expressions) can also be done from within CafeTran.

[Edited at 2022-10-27 05:50 GMT]


I am familiar with this procedure. However, it is extremely slow.

[Screenshot: Screen Shot 2022-10-27 at 09.17.09]

This takes ages.

Besides that, I'd like to have an alternative solution that I can use as a framework and possibly integrate into my workflows.



[Edited at 2022-10-27 07:18 GMT]


 
Dan Lucas
United Kingdom
Member (2014)
Japanese to English
Huh Oct 27, 2022

Stepan Konev wrote:
If that macOS text editor can mark the match, you can use the following regex:
to mark and then copy all segments to the clipboard

Although I've only tried it on one file, this typically clever solution from Stepan seems to work well in Notepad++ here; much appreciated. Given that we already have the regex, it looks like an obvious candidate for a tiny script in the programming language of one's choice (probably just from the command line in Perl!). I have never actually needed to convert TMX to tab-delimited, but it's nice to know that it's possible. Thanks to Hans and the other contributors for the topic.

Dan


 