Batch convert PDF to Writer

johnbmatz
Posts: 5
Joined: Fri Sep 14, 2018 9:01 pm

Batch convert PDF to Writer

Post by johnbmatz »

How can I batch convert Adobe files to Open Office 4.1?

 Edit: Changed subject, was Batch converting Adobe files to Open Office 
Make your post understandable by others 
-- MrProgrammer, forum moderator 
Last edited by MrProgrammer on Mon Dec 25, 2023 6:24 pm, edited 2 times in total.
Reason: Edited topic's subject
Open Office 4.1.5 Windows 10
RoryOF
Moderator
Posts: 34791
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: Batch converting Adobe files to Open Office

Post by RoryOF »

What type of Adobe files?
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
johnbmatz
Posts: 5
Joined: Fri Sep 14, 2018 9:01 pm

Re: Batch converting Adobe files to Open Office

Post by johnbmatz »

I think they are Adobe Acrobat files.
Open Office 4.1.5 Windows 10
johnbmatz
Posts: 5
Joined: Fri Sep 14, 2018 9:01 pm

Re: Batch converting Adobe files to Open Office

Post by johnbmatz »

Just ordinary PDF files
Open Office 4.1.5 Windows 10
Hagar Delest
Moderator
Posts: 32857
Joined: Sun Oct 07, 2007 9:07 pm
Location: France

Re: Batch converting Adobe files to Open Office

Post by Hagar Delest »

AOO has limited capacity for that. I don't remember whether an extension is still needed. LibreOffice has this kind of extension built in and can import PDF documents, but page by page, IIRC. The result may look odd because it won't recognize paragraphs, only blocks of text, especially if the pages contain objects such as pictures and captions.
You may find it quicker to recreate the document by copy-paste.
LibreOffice 24.8 on Xubuntu 24.10 and 24.8 portable on Windows 10
RoryOF
Moderator
Posts: 34791
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: Batch converting Adobe files to Open Office

Post by RoryOF »

I normally put PDF files through an OCR application (usually gImageReader driving Tesseract, running on Xubuntu Linux) and reformat completely, but in my case such PDFs are plain text without illustrations or tables.

There is at least one Windows application that will attempt to preserve the original format, but I've forgotten its name as I don't use Windows. An OCR application that produces hOCR output may give reasonable XML-coded output that preserves the layout. I have no experience with hOCR output from PDF.
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
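
A minimal command-line sketch of the OCR route described above, for anyone who wants to script it. It is not RoryOF's setup (he uses gImageReader as the front end); it assumes Poppler's pdftoppm and the tesseract command-line program are both installed and on the PATH. The function names, the 300 dpi default, and the output naming are hypothetical choices made for illustration.

```python
from pathlib import Path
import subprocess
import tempfile


def rasterize_command(pdf_path, prefix, dpi=300):
    """Build the pdftoppm call that renders each PDF page to <prefix>-N.png."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf_path), str(prefix)]


def ocr_command(image_path, out_base, hocr=False):
    """Build the tesseract call; it writes <out_base>.txt, or .hocr if requested."""
    cmd = ["tesseract", str(image_path), str(out_base)]
    if hocr:
        cmd.append("hocr")  # tesseract's config name for hOCR output
    return cmd


def ocr_pdf(pdf_path, out_dir, hocr=False):
    """Rasterize one PDF into a temporary folder and OCR every page image."""
    pdf_path = Path(pdf_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        prefix = Path(tmp) / "page"
        subprocess.run(rasterize_command(pdf_path, prefix), check=True)
        # pdftoppm zero-pads page numbers, so a plain sort keeps page order.
        for image in sorted(Path(tmp).glob("page-*.png")):
            out_base = out_dir / f"{pdf_path.stem}-{image.stem}"
            subprocess.run(ocr_command(image, out_base, hocr), check=True)


if __name__ == "__main__":
    # Hypothetical folder names; one set of output files per page.
    for pdf in sorted(Path("pdf_in").glob("*.pdf")):
        ocr_pdf(pdf, "ocr_out", hocr=False)
```

The result is one text (or hOCR) file per page, which still has to be reformatted by hand in Writer, as described above.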
RoryOF
Moderator
Posts: 34791
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: Batch converting Adobe files to Open Office

Post by RoryOF »

I found this online:
There is a means to convert PDF files to Word files. You can then save them as ODT, though if there is special formatting in Word, the conversion may not be exact.

Foxit has a PDF Editor, as well as a free Reader version. The Editor is often available for a free trial period after you download the free Reader. A line from their web page describes the process of PDF to Word conversion:

      1.  Open the PDF file with Foxit PDF Editor, go to Convert tab > To MS Office > Word or File tab > Export > To MS Office > Word > Save As; the Save As window will pop up.
I do not have the Editor version, so I don't know whether it will also convert directly to ODT.
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
MrProgrammer
Moderator
Posts: 5100
Joined: Fri Jun 04, 2010 7:57 pm
Location: Wisconsin, USA

Re: Batch convert PDF to Writer

Post by MrProgrammer »

johnbmatz wrote: Tue Dec 19, 2023 9:34 pm
How can I batch convert Adobe files to Open Office 4.1?
You can't. OpenOffice does not provide that feature.

Portable Document Format (PDF) is intended to be a final format, suitable only for viewing or printing, though it is portable and can be reliably copied to other systems for viewing or printing. Attempts to convert PDF into some other document type (text, spreadsheet, presentation, etc.) are blocked because the information necessary to do that is not present in the PDF.

If this solved your problem, please go to your first post, use the Edit button, and add [Solved] to the start of the Subject field. Select the green checkmark icon at the same time.
Mr. Programmer
AOO 4.1.7 Build 9800, MacOS 13.7, iMac Intel.   The locale for any menus or Calc formulas in my posts is English (USA).
jep
Posts: 22
Joined: Wed Sep 29, 2010 4:53 pm

Re: Batch convert PDF to Writer

Post by jep »

Not Writer, but Draw!
Look for "PDF Import Extension for Apache OpenOffice 0.1.1" to import drawings (even quite complicated files), text in PDF files, etc., and adjust them as needed.
Yes, combined with macros (ooRexx) or used manually, it is very handy!
You can create a macro that opens PDF files and saves them as .odg.
Apache OpenOffice 4.1.13 on ArcaOS 5.0.7
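
A rough sketch of the "open each PDF and save it as .odg" idea, written as an external Python script rather than an ooRexx macro. It assumes LibreOffice's soffice program and its --headless, --convert-to and --outdir options; as far as I know AOO 4.1 does not offer --convert-to, so with AOO a macro as described above would still be needed. The function names, folder names, and the dry_run switch are hypothetical.

```python
from pathlib import Path
import shutil
import subprocess


def build_convert_command(pdf_path, out_dir, soffice="soffice"):
    """Return the argument list for one headless PDF -> ODG conversion."""
    return [
        soffice,
        "--headless",
        "--convert-to", "odg",
        "--outdir", str(out_dir),
        str(pdf_path),
    ]


def batch_convert(src_dir, out_dir, soffice="soffice", dry_run=False):
    """Convert every *.pdf in src_dir; return the commands that were built."""
    out_dir = Path(out_dir)
    commands = [
        build_convert_command(pdf, out_dir, soffice)
        for pdf in sorted(Path(src_dir).glob("*.pdf"))
    ]
    if dry_run:
        # Only show what would be run; nothing is created or converted.
        return commands
    if shutil.which(soffice) is None:
        raise FileNotFoundError(f"{soffice!r} not found on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    for cmd in commands:
        subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical folder names; print the commands before running them for real.
    for cmd in batch_convert("pdf_in", "odg_out", dry_run=True):
        print(" ".join(cmd))
```

The output is Draw files, not Writer documents, so the limitations mentioned elsewhere in this topic (text in separate blocks, no real paragraphs) still apply.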
RoryOF
Moderator
Posts: 34791
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: Batch convert PDF to Writer

Post by RoryOF »

Be aware that the PDF Import Extension is suitable only for minor cosmetic changes to PDF files, and may also only handle smaller PDF files.

Mr Programmer has published a Perl script to extract the text from PDF files that have such text embedded in them (not necessarily _all_ PDF files).

viewtopic.php?p=410366#p410366
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
Lupp
Volunteer
Posts: 3621
Joined: Sat May 31, 2014 7:05 pm
Location: München, Germany

Re: Batch convert PDF to Writer

Post by Lupp »

See also:
"Consolidate Text" in
https://wiki.documentfoundation.org/Rel ... ess_&_Draw ,
the enhancement discussion under
https://bugs.documentfoundation.org/sho ... ?id=118370
and the much older suggestions posted to
https://ask.libreoffice.org/t/pdf-to-dr ... iter/13805

Among these suggestions was a workaround of my own, which I sketched on a whim with no intention of using it myself. I never tried to build anything like a "batch conversion" on it. (I mention this old post here only because, judging from the upvotes, it seemingly had users.)

Anyway, we should be clear that none of this can accomplish impossible tasks. We cannot convert a PDF file into a Writer file because it lacks much of the information such a process would need. Nor can we convert a Writer file to PDF. We can only export the Writer document to a file capable of telling a printer what should be output on paper. That is essentially what PDF is made for. If you want really editable PDF files, you need to use a PDF editor (like Acrobat) and accept the shortcomings of that approach.

In short: we can convert water to ice and back. We cannot convert iron to gold. And a printer doesn't "convert" a PDF file to printed paper. It just prints. And what you may get from the print using OCR isn't a converted file.

Also: Don't wait for AOO to implement a feature like "Consolidate Text".
On Windows 10: LibreOffice 24.8.3 and older versions, PortableOpenOffice 4.1.7 and older, StarOffice 5.2
---
Lupp from München