Recovering Corrupted .docx Files

#LibreOffice #Word #Office

Did you know that files with the extension .docx are really just .zip packages? I knew that’s essentially what .apk files (Android Package) were, but I had never made the connection that the same could apply to other common file types.

But I recently ran into an issue where a document I had spent a significant amount of time on became corrupted. No matter, right? Just grab the backup. Unfortunately, I had overwritten my backed-up copy with the corrupted version of the file.

🤦

If I couldn’t find a way to recover it, I would have had to reproduce my changes all over again, effectively doubling my work.

[Screenshot: The error I was shown when attempting to open my .docx file in LibreOffice. Selecting No made LibreOffice Writer still try to open the file, then crash with another error. Selecting Yes recovers the document up to the point where the error was detected.]
[Screenshot: Selecting “No” from the previous error prompt shows this error message.]

Fortunately, I am very good at finding information on the Internet and so after a brief search, I came across this thread in the LibreOffice forums:

Take a copy of the ODT and open the copy using an archive manager (rename the file to a .zip extension if necessary). Extract the content.xml file. Open this file with a suitable XML editor. It will be likely that row 2 contains a very long line of XML. Scroll to character 18067. It is likely at this point there will be an obvious mistake in the XML. It may be minor or major. Once you fix it, simple reinsert the fixed version of content.xml into the ODT copy, again using an archive manager. Try and open the amended ODT using LO.

oweng, https://ask.libreoffice.org/t/how-do-i-fix-a-libreoffice-document-that-is-corrupted/17399/2
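The "open it with an archive manager and scroll to the character" step can also be done from a script. This is only a minimal sketch, assuming Python 3 is available and that the XML part is UTF-8 encoded. The function names `list_parts` and `context_at` are my own illustrations, not part of LibreOffice or any library, and the column in an error message may be counted slightly differently by the program that reported it.

```python
import zipfile


def list_parts(package_path):
    """Return the names of the files stored inside an .odt or .docx package."""
    with zipfile.ZipFile(package_path) as package:
        return package.namelist()


def context_at(package_path, part_name, line, column, width=80):
    """Return the text around a 1-based line and column inside one XML part.

    Assumption: the part is UTF-8 and the column counts characters.
    """
    with zipfile.ZipFile(package_path) as package:
        text = package.read(part_name).decode("utf-8")
    target_line = text.split("\n")[line - 1]
    start = max(0, column - 1 - width)
    return target_line[start:column - 1 + width]
```

For the forum example, a call might look like `context_at("copy-of-document.odt", "content.xml", 2, 18067)`, where the file name is a placeholder for a copy of the damaged document.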

If .odt files are packages of XML files, it seems likely that .docx files are as well, right? Turns out, they are! It is possible to simply change the extension of the document to .zip. From that point, the archived files can be opened and browsed like any normal directory, and I can use the information from the earlier error prompt to locate the troublesome XML. The error prompt showed that the issue was located in the file word/document.xml on line 2, column 19055. Sure enough, when I went there, I found the following:

<w:rPr><w:del w:id="9" w:author="Dylan Hildenbrand" w:date="2023-02-13T14:15:30Z"></w:del></w:ins></w:rPr>

Removing the extra closing tag </w:ins>, saving the file, and renaming the .zip back to .docx resolved the issue for me. My file was recovered and precious hours were not wasted. And yes, dear reader, the irony of an extra closing tag nearly costing me several hours is not lost on me, the operator of a website titled closingtags.com.
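I made the fix by hand, but the same edit can be sketched in Python for anyone who would rather not juggle file extensions. This is an illustration under assumptions, not the method I used: `repair_package` is a hypothetical helper name, the part is assumed to be UTF-8, and the broken and fixed strings must be adapted to whatever the error in your own file looks like. It writes a new file instead of touching the original, so the corrupted copy stays available if the repair goes wrong.

```python
import zipfile


def repair_package(src_path, dst_path, part_name, broken, fixed):
    """Copy a ZIP-based document, replacing one snippet of text in one part."""
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item)
            if item.filename == part_name:
                text = data.decode("utf-8")  # assumption: UTF-8 XML
                if broken not in text:
                    raise ValueError(f"{broken!r} not found in {part_name}")
                # Replace only the first occurrence to keep the change minimal.
                data = text.replace(broken, fixed, 1).encode("utf-8")
            # Reusing the original entry keeps its name, order and compression.
            dst.writestr(item, data)
```

For the error in this post, the call would be along the lines of `repair_package("broken.docx", "fixed.docx", "word/document.xml", "</w:del></w:ins></w:rPr>", "</w:del></w:rPr>")`, with both file names being placeholders.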

By Dylan Hildenbrand


9 replies on “Recovering Corrupted .docx Files”

Yes! The triumph of recovering data, and time, must have been fantastic.

Very nice of LibreOffice to offer such a specific error message. I wonder if MS Word would provide the same?

Great post, man.

It was a feeling of great success. I had to share it in case anyone else came across a similar problem.

As for Office, I have no clue what it offers these days. Switched to LibreOffice and haven’t looked back.

Thanks for the comment!

The corruption occurred on Sep 1 at 2:04am. My phone’s storage was full, which caused the MS Word app I use for work to stop saving my doc automatically after I edited a few things, prompting the message “file cannot be saved” (MS Word is set to auto-save). I tried to save the file manually to my SD card because the phone storage was full. It was at this point that MS Word force-closed. When I cleared my phone memory and returned to my doc, MS Word could no longer open it and showed the message “some content in your file are not readable. Fix if you trust the source.” I tried that, but I received the error “file cant be opened”.

Type of device: Phone

OS: Android 11

Interesting. I would recommend copying the corrupted file to a computer and attempting to open it with LibreOffice just as I mentioned in the article. From there, it may be able to automatically recover the file for you. But if not, you can read the error message, change the file extension to .zip, open the folder, and correct the error as shown in the message.
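If no program gives you a usable error message, Python’s built-in XML parser can report a line and column of its own. The sketch below is a rough illustration, with `find_xml_errors` being a made-up helper name. It assumes the file is still a readable ZIP package (a truncated file may fail before any XML is checked), and it only reports the first problem in each part. The column it prints comes from the expat parser and may not match the number LibreOffice shows.

```python
import zipfile
import xml.parsers.expat


def find_xml_errors(package_path):
    """Return (part, line, column, message) for each XML part that fails to parse."""
    problems = []
    with zipfile.ZipFile(package_path) as package:
        for name in package.namelist():
            # Only check the XML parts; images and other media are skipped.
            if not name.endswith((".xml", ".rels")):
                continue
            parser = xml.parsers.expat.ParserCreate()
            try:
                parser.Parse(package.read(name), True)
            except xml.parsers.expat.ExpatError as err:
                message = xml.parsers.expat.ErrorString(err.code)
                problems.append((name, err.lineno, err.offset, message))
    return problems
```

Running `find_xml_errors("broken.docx")` on a copy of the file (the name is a placeholder) returns an empty list when every XML part parses cleanly.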

If you don’t mind, can I send the file to you, please? I’m not a programmer, and in the attempt I made (though not with LibreOffice) I couldn’t even understand what I was doing. The copy of LibreOffice I have asked me to purchase it, which I’m not currently able to do. If you don’t mind, could you please share an email address I could send the file to? It’s a plea. Thanks so much.
