How to handle large XML file Thread poster: Peter Sass
| Peter Sass Germany Local time: 19:52 English to German + ...
Hi there,
From a client I've received a single large XML file (700,000 words according to Trados Studio) containing a whole website.
Surely this must be split up using an XML split programme.
1) Should the splitting preferably be done on the client side, to make sure they can piece it together again from the translation files, or could I do this just as well?
2) Which XML split programme (preferably freeware or shareware) would you recommend?
3) Is there any other way to 'shrink' the file?
Thanks for your advice in advance!
Samuel Murray Netherlands Local time: 19:52 Member (2006) English to Afrikaans + ... Post in the Trados forum | Jan 22, 2015 |
Peter Sass wrote:
From a client I've received a single large XML file (700,000 words according to Trados Studio) containing a whole website. Surely this must be split up using an XML split programme.
If your CAT tool can handle it, why do you need to split it? I suggest you post this question also in the Trados forum.
That said, if it was me, I would try to split it by section or by page, since it is a web site with (presumably) separate pages. Sorry, I know of no XML splitter (yet... as I would have been googling like crazy and installing a whole range of programs just to try it out).
Are you sure about the word count?
Sam, most of today's websites are database-driven, so it is quite difficult to split such a solid mass of data. However, any database can be exported into a flat file (ttx, xml). The topic starter seems to have this exported content at hand. I think any CAT tool can handle it today (all you need is a fast PC, like an i5 or i7 with an SSD and a lot of RAM, which is a must today to avoid latency, as the TMs are quite huge). It will take time and the PC will appear to hang, but it will do it (you can have a coffee or walk the dog in the meantime). Then, in the CAT tool, it will turn into a database again and will work much faster than the source flat file. Also, it can be split into smaller CAT files (there is a corresponding tool for SDL Studio).
[Edited at 2015-01-22 21:21 GMT]
Samuel Murray wrote:
That said, if it was me, I would try to split it by section or by page, since it is a web site with (presumably) separate pages. Sorry, I know of no XML splitter (yet... as I would have been googling like crazy and installing a whole range of programs just to try it out).
Are you sure about the word count?
XML is a TEXT file; it can be split into smaller text files even using a DOS command, and then merged back into one file afterwards.
As to the word count, it may turn out to be smaller in the end. I would not judge before I have the file at hand.
Samuel Murray Netherlands Local time: 19:52 Member (2006) English to Afrikaans + ... DOS command won't split XML smartly | Jan 22, 2015 |
Sergei Leshchinsky wrote:
Samuel Murray wrote:
Sorry, I know of no XML splitter...
XML is a TEXT file; it can be split into smaller text files even using a DOS command, and then merged back into one file afterwards.
No, a DOS command might split a piece of translatable text right down the middle (in fact, the DOS commands that I know will happily split a word in two). Or it might split a tag in two, which would cause the CAT tool to misinterpret the tag (or worse: try to fix it). And even if it doesn't split a segment or a tag in two, it might not split nested tags cleanly, which may also affect the way the CAT tool interprets the XML.
Peter Sass Germany Local time: 19:52 English to German + ... TOPIC STARTER Thanks so far | Jan 23, 2015 |
..for all your comments!
Actually, the problem is that Trados Studio cannot process the file properly because it is simply too big (and yes I do have a proper PC with i5 processor + 8 GB RAM).
From previous website translations I recollect that there would normally be a set of separate translation files that followed the structure of the website.
As far as I have looked into the matter so far, one needs a proper XML split programme to preserve this structure (header tags etc.), so I couldn't just split the XML file in a text editor.
I'll see what the client thinks.
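To illustrate what "preserving the structure" could mean in practice, here is a minimal sketch in Python. It is not a tool anyone in this thread has used; the function names `split_xml` and `merge_xml` are made up for the example. It assumes the file has one root element whose direct children are the pages or sections, and that the whole file fits in memory. Each chunk repeats the root tag and its attributes, so every chunk is a well-formed XML document on its own. Anything outside the root element (XML declaration, DOCTYPE, comments) is not carried over, and namespace prefixes may be rewritten, so a real export would need more care.

```python
import xml.etree.ElementTree as ET


def split_xml(source: str, max_children: int) -> list[str]:
    """Split an XML string into well-formed chunks of top-level children."""
    root = ET.fromstring(source)
    children = list(root)
    if not children:
        # Nothing to split: return the document as a single chunk.
        return [ET.tostring(root, encoding="unicode")]
    chunks = []
    for start in range(0, len(children), max_children):
        # Repeat the root tag and attributes so the chunk parses on its own.
        part = ET.Element(root.tag, dict(root.attrib))
        part.text = root.text
        part.extend(children[start:start + max_children])
        chunks.append(ET.tostring(part, encoding="unicode"))
    return chunks


def merge_xml(chunks: list[str]) -> str:
    """Rebuild one document from chunks produced by split_xml."""
    first = ET.fromstring(chunks[0])
    merged = ET.Element(first.tag, dict(first.attrib))
    merged.text = first.text
    for chunk in chunks:
        # Append the chunk's children in their original order.
        merged.extend(list(ET.fromstring(chunk)))
    return ET.tostring(merged, encoding="unicode")
```

Because the cut points always fall between complete child elements, no tag or segment is split in two, which is the problem a plain text editor or line-based split cannot avoid.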
I'd be wary about splitting a huge XML file with a random tool off the internet. After you stitch it back together at the end (I presume you plan to do that), it may not be exactly the same as before, and the client's software might complain about it. Perhaps you could ask the client to export the site in several reasonably sized chunks, and note that the alternative option is for you to use xml splitter XXX.
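One way to test for that risk before starting work is to split the untranslated file, stitch it back together straight away, and compare the result with the original. The sketch below assumes Python 3.8 or later and uses the standard library's C14N canonicalization, so that harmless differences (attribute order, quote style, `<b/>` versus `<b></b>`) are ignored while real differences in text or structure are caught. The function name `same_xml` is made up for the example, and the client's software may be stricter or more lenient than this check.

```python
import xml.etree.ElementTree as ET


def same_xml(original: str, rebuilt: str) -> bool:
    """Return True if both XML strings are identical after canonicalization."""
    # canonicalize() normalizes attribute order and empty-element syntax,
    # and drops the XML declaration; comments are ignored by default.
    return ET.canonicalize(original) == ET.canonicalize(rebuilt)
```

Note that whitespace between elements still counts as a difference here, so a splitter that re-indents the file would fail this check even if the content is otherwise intact.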