[Solved] Batch search / replace hyperlinks
I have 150 Writer documents that each has lots of hyperlinks in the text.
#1: Now I need to replace all hyperlinks (e.g. http://www.oldsite.com/112.pdf to http://www.newsite.com/112.pdf). Is there a way to perform this in a batch operation?
#2: Afterwards all Writer documents need to be converted to PDFs. Is this also possible to do in one fell swoop?
#1 is the most important issue
Thanks for any suggestions!
Last edited by Hagar Delest on Sun Apr 26, 2009 3:56 pm, edited 1 time in total.
Reason: tagged [Solved].
OOo 3.0.X on Ubuntu 8.x + XP
-
- Posts: 2
- Joined: Thu Feb 12, 2009 5:00 pm
Re: Batch search / replace [Hacking Open Document Files]
evking wrote: I have 150 Writer documents that each has lots of hyperlinks in the text.
#1: Now i need to replace all hyperlinks (e.g. http://www.oldsite.com/112.pdf to http://www.newsite.com/112.pdf). Is there a way to perform this in a batch operation?
Well, I have a method, but you may not like it! Hopefully, someone else will have written a macro that can do this for you, otherwise....
This is a general approach that I use to overcome various shortcomings in OOo. Your level of expertise with your operating system tools & utilities will determine how "automated" the process will be. If you are unable to "script-up" some or all of these operations, then this method will almost certainly NOT be quicker than editing the files manually!
My method relies on the fact that the OOo file format is actually a .zip file full of .xml (and other) files which contain the document's text, style information, pictures and other stuff.
In overview, my method is:
1. Unzip the .odt file(s)
2. Edit the .xml (text) files directly using a decent text editor or other tools that can do global change and replace across multiple files e.g. the excellent and free CONtext editor under Windows, or command line text editing tools like sed, awk, or perl scripts (*nix/cross-platform)
3. Zip 'em back up again.
Now in more detail - I'm assuming Windows (although the technique translates to *nix etc easily)
Also, I'd definitely experiment with a single .ODT file first to get the hang of it.
1. Make a backup of your files - Of course, you always keep a proper backup
anyway, don't you?
2. Rename your (e.g.) "Document.odt" file to be (e.g.) "Document.zip"
(This is not strictly necessary, particularly if you use a command line unzipper
like info-zip's unzip.exe, however many windows GUI unzippers won't easily unzip
the .odt files unless you rename them).
3. Unzip (e.g) Document.zip into a directory/folder called (e.g) "Document"
The built-in windows shell zip folder support works well for this because it creates the (e.g.) "Document" "root" folder for you. You want to end up with a folder/directory structure something like this:
Document\ <- The directory name corresponds to your "Document" name,
Document\Configurations2
Document\content.xml <- This file contains your document's text
Document\layout-cache
Document\META-INF
Document\meta.xml
Document\mimetype
Document\Pictures <- This directory/folder contains any images in your doc (e.g. 3 below)
Document\Pictures\1000000000000060000000509E4D9BDA.jpg
Document\Pictures\10000000000000640000007FACEF61E3.jpg
Document\Pictures\1000000000000082000000825D82B018.gif
Document\settings.xml
Document\styles.xml <- This file contains your style information
Document\Thumbnails
Document\Thumbnails\thumbnail.png <- This is the document's thumbnail image
Document\Configurations2\accelerator
Document\Configurations2\floater
Document\Configurations2\images
Document\Configurations2\menubar
Document\Configurations2\popupmenu
Document\Configurations2\progressbar
Document\Configurations2\statusbar
Document\Configurations2\toolbar
Document\Configurations2\accelerator\current.xml
Document\Configurations2\images\Bitmaps
Document\META-INF\manifest.xml
4. Delete or rename Document.zip (we're gonna recreate it in a minute)
5. Now you can edit the text in (all your) "content.xml" file(s)
Use a text editor or tools of your choice. Notepad will do for a single file, but can't do a global find & replace across multiple files, however all serious editors can.
(You can also easily and accurately edit styles and colours etc, by editing the text in the "styles.xml" files).
6. Zip up the "Document" directory/folder to recreate Document.zip.
YOU CANNOT USE WINDOWS' IN-BUILT ZIP FOLDER SUPPORT FOR THIS
Neither can you use 7-zip. OOo will NOT be able to understand the file! I have absolutely no idea why not!
I use infozip's command line zip.exe (which works, is free, and I can supply)
e.g. Run up a windows "Command Prompt" (cmd.exe)
cd to the (e.g.) "Document" directory/folder
zip -r ..\Document *.*
That will recreate the Document.zip file in the parent directory/folder (the -r option makes zip recurse into subfolders such as META-INF and Pictures)
7. Rename the new Document.zip back to Document.odt and open the file with OOo to check it's all ok!
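For *nix users, steps 6 and 7 can be sketched as a small shell function using Info-ZIP's zip. This is only a sketch: the function name repack_odt is made up for illustration, and the idea that the order and compression of the mimetype entry is what trips up other zippers is an assumption, not something confirmed in this thread. What is certain is that the OpenDocument packaging rules ask for mimetype to be the first entry in the archive, stored uncompressed and without extra fields, so the sketch writes it that way.

```shell
#!/bin/sh
# Sketch only: repack an unpacked ODF folder into an .odt file.
# repack_odt is a hypothetical helper name, not an existing tool.
# Usage: repack_odt /path/to/Document /absolute/path/to/Document.odt
repack_odt() {
    src_dir="$1"
    out_file="$2"    # must be an absolute path, because we cd below
    rm -f "$out_file"
    (
        cd "$src_dir" || exit 1
        # Store mimetype first, uncompressed (-0), without extra fields (-X).
        zip -q -X -0 "$out_file" mimetype &&
        # Add everything else recursively (-r), without directory entries (-D).
        zip -q -X -r -D "$out_file" . -x mimetype
    )
}
```

Because the archive is written straight to a .odt name, the rename in step 7 is not needed; open the result in OOo to check it before deleting anything.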
Now you're a fledgling Open Document File format hacker!
I find this technique particularly useful for changing and matching colours (by editing the hex RGB
values directly) and making fine adjustments to table and picture dimensions to match-up
accurately.
HTH
Regards
Luc
OOo 2.4.X on Ms Windows XP
Re: Batch search / replace
Luc, that was an amazing answer! Really kind of you to make such a write up to my question
I understand the method, and your article should probably be saved as a great crash course to the ODT file format.
It should not be too difficult to make a script to do it, but as you say someone may already have done it - an ODT search/replace batch machine....
Perhaps I should still let the subject stand open for a while; maybe someone will add to it.
Again, thank you for your explanation. Really nice of you!
OOo 3.0.X on Ubuntu 8.x + XP
Re: Batch search / replace
evking wrote: Luc, that was an amazing answer! Really kind of you to make such a write up to my question
No problem at all - it's nice to be appreciated
evking wrote: Perhaps I should still let the subject stand open for a while; maybe someone will add to it.
I would - my method is a little "non-optimal"!
Do you know any perl? I'm afraid I don't otherwise I'd be tempted to have a go (my scripting expertise is mainly confined to IBM's old REXX); there seem to be perl tools that would allow you to script this, but as before, I think it is likely to be more work to develop the scripts than to tediously edit each file!
If you register, you can vote for this enhancement at: or possibly: but I wouldn't hold your breath!
evking wrote: #2: Afterwards all Writer need to be converted to PDF's. Is this also possible to do in one fell swoop?
Check out Document Converter (about 1/3 of the way down the page)
- Document Converter
Author: Danny Brewer / Dan Horwood
DocConverter is a utility to convert a batch of documents from any supported OOo format into any other supported OOo format. It could, for example, be used to convert a batch of OOo Writer documents into PDFs. Simple to use, with an interface similar to OOo's AutoPilots.
Latest release: Version 2.0 (June 10, 2006)
Luc
OOo 2.4.X on Ms Windows XP
Re: Batch search / replace
If you're on Unix/Linux, this is quite easy to script, using only the standard command-line tools.
Just yell if you could use something like that.
AOO4/LO5 • Linux • Fedora 23
Re: Batch search / replace
Of course I'm using Linux, acknak
A batch search replace script would be great, and certainly useful for lots of people so please consider yourself yelled at
OOo 3.0.X on Ubuntu 8.x + XP
Re: Batch search / replace
Quick and dirty; no warranty; use at your own risk, and all that...
Especially beware, this will change any text in the file, including the document's xml encoding. You can seriously ruin a document with this if you don't know what you're doing. Be sure to check that the modified documents will still open before you toss the backup copies.
However, with something like a URL, which would never be part of the xml syntax--it's always going to be data, and unique--it should be ok.
So, for your example, you can do
$ fnr http://www.oldsite.com http://www.newsite.com *.odt
and get away with it.
Something like
$ fnr text context *.odt
would be certain death to those documents. That's why the script makes a backup copy of each file.
Code: Select all
#!/bin/sh
usage="usage: fnr find replace files..."
find="$1"
replace="$2"
if [ -n "$find" -a -n "$replace" ]
then
    shift; shift
else
    echo "missing argument"
    echo "$usage"
    exit 2
fi
# exit immediately if something fails
set -e
# sed delimiter: ctrl-v (octal 026), very unlikely to appear in $find or $replace
d=$(printf '\026')
for f in "$@"
do
    # keep a copy of the original file
    cp -p "$f" "$f.bak"
    # extract content.xml and make the changes
    unzip -p "$f" content.xml | sed -e "s${d}${find}${d}${replace}${d}g" > content.xml
    # update the document archive
    zip "$f" content.xml
    rm -f content.xml
done
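One more thing to watch for: sed treats the find string as a regular expression, so each dot in www.oldsite.com matches any character, and an & or a backslash in the replace string has a special meaning. For plain URLs that is rarely a problem, but a minimal sketch of escaping both strings first could look like this. The helper names below are made up for illustration and are not part of the fnr script; they assume single-line strings and the same ctrl-v delimiter.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the fnr script as posted.

# Backslash-escape the characters that are special in a sed basic regex.
escape_pattern() {
    printf '%s' "$1" | sed -e 's/[][\\.*^$]/\\&/g'
}

# Backslash-escape the characters that are special in a sed replacement.
escape_replacement() {
    printf '%s' "$1" | sed -e 's/[\\&]/\\&/g'
}

# Replace every literal occurrence of $1 with $2 in the text on stdin.
literal_replace() {
    d=$(printf '\026')    # ctrl-v as the sed delimiter
    sed -e "s${d}$(escape_pattern "$1")${d}$(escape_replacement "$2")${d}g"
}
```

With these, something like `unzip -p "$f" content.xml | literal_replace "$find" "$replace" > content.xml` changes www.oldsite.com but leaves a near-miss such as wwwXoldsite.com alone.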
AOO4/LO5 • Linux • Fedora 23
Re: Batch search / replace
Acknak and Luc, you are true gentlemen!
All documents are modified successfully, both the search/replace and conversion to PDF.
The sed command did not show correctly on my screen, so I had to google a bit. (The "#" characters display wrongly.)
Thanks again, you've saved me hours of tedious editing.
OOo 3.0.X on Ubuntu 8.x + XP
Re: Batch search / replace
Good!
evking wrote: The SED command did not show correctly on my screen so I had to google a bit. (The "#" characters displays wrongly)
I should have mentioned: those characters are ctrl-v, and they may look a little strange, but will be handled correctly by the shell, and any decent text editor.
The reason to use such a strange character is to avoid any potential for conflicts with the $find and $replace strings. You can use any character, so I picked something that's very unlikely to appear in the strings.
AOO4/LO5 • Linux • Fedora 23
Re: [Solved] Batch search / replace hyperlinks
I know this is old news by now, but I just found this method, which I was missing for a while. I have been able to update hundreds of links in a few minutes.
Luc, this is great, thank you so much!
OpenOffice 3.3.0 on Windows 7
Re: [Solved] Batch search / replace hyperlinks
This post is the gift that keeps giving. I was trying to work out on my own how to hack up a script and ran across this one by acknak, which worked great for me. For anyone interested: I dislike having backups mixed into a working folder, so I create a backup folder when declaring variables, like this:
Code: Select all
OUT_DIR=backup
# test whether the output directory exists; if not, create it
if [ ! -d "${OUT_DIR}" ] ; then
    mkdir "${OUT_DIR}"
fi
Then the line saving the backups looks like this:
Code: Select all
# keep a copy of the original file
cp -p "$f" "${OUT_DIR}/$f.bak"
As always, test any additions to your scripts because . . no warranties!
Cheers
Libre Office 5.1.6.2 Ubuntu 16.04