Use Subfile Information

File Content Extraction can provide subfile information in two ways:

  • Before extraction. In this case, File Content Extraction provides information that it can retrieve quickly. For example, it might provide an estimate of subfile size if the container does not store the uncompressed size.

  • After extraction. In this case, File Content Extraction includes additional information that it obtains during extraction, which can be more accurate.

For example, in some file formats, subfiles are embedded in an OLE wrapper inside the parent container. Some information about the subfile, such as its original file name and whether it is an external file, is stored only inside this OLE wrapper. Parsing the wrapper to get this information can be time-consuming compared to reading the information that is stored in the parent container. File Content Extraction therefore parses the wrapper only when it extracts the subfile, not when it retrieves subfile information. In these cases, you can see the actual name of the subfile only after extraction.
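One way to picture the two stages is as two views of the same subfile record, where the second view replaces provisional values with the ones learned during extraction. The following Python sketch is illustrative only: SubfileInfo, its fields, merge_after_extraction, and the sample values are hypothetical and are not part of the File Content Extraction API.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SubfileInfo:
    """Hypothetical record for one subfile (not a real API type)."""
    index: int
    name: str                            # may be a placeholder before extraction
    size: int                            # may be an estimate before extraction
    size_is_estimate: bool = True
    is_external: Optional[bool] = None   # unknown until the wrapper is parsed


def merge_after_extraction(before: SubfileInfo, real_name: str,
                           real_size: int, is_external: bool) -> SubfileInfo:
    """Return a copy of `before` updated with values learned during extraction."""
    return replace(before, name=real_name, size=real_size,
                   size_is_estimate=False, is_external=is_external)


# Made-up values: the name inside the parent container versus the name
# recovered from the OLE wrapper.
before = SubfileInfo(index=0, name="oleObject1.bin", size=40_000)
after = merge_after_extraction(before, "budget.xlsx", 41_312, False)
```

Keeping the pre-extraction record immutable and producing a new record after extraction makes it clear which values a given piece of code is allowed to trust.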

Retrieving information before extraction is faster, so this information is useful in workflows that can tolerate fast estimates. For example, you might use the pre-extraction file size to avoid extracting subfiles larger than 1 GB, or use the pre-extraction name to flag any container that has a subfile with a certain name.
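The two checks described above could be sketched as follows. This is a minimal Python illustration under stated assumptions: PreExtractionInfo, subfiles_to_extract, and has_subfile_named are hypothetical names, the sample values are made up, and 1 GB is taken to mean 1024^3 bytes. It does not call the File Content Extraction API.

```python
from dataclasses import dataclass
from typing import Iterable, List

ONE_GB = 1024 ** 3  # assumption: binary gigabyte


@dataclass
class PreExtractionInfo:
    """Hypothetical pre-extraction record (not a real API type)."""
    name: str
    estimated_size: int  # bytes; may be only an estimate


def subfiles_to_extract(infos: Iterable[PreExtractionInfo],
                        max_size: int = ONE_GB) -> List[int]:
    """Return the indexes of subfiles whose estimated size is within max_size."""
    return [i for i, info in enumerate(infos)
            if info.estimated_size <= max_size]


def has_subfile_named(infos: Iterable[PreExtractionInfo], wanted: str) -> bool:
    """Return True if any subfile name matches `wanted`, ignoring case."""
    wanted = wanted.lower()
    return any(info.name.lower() == wanted for info in infos)


# Made-up sample data for one container.
infos = [
    PreExtractionInfo("readme.txt", 2_048),
    PreExtractionInfo("backup.img", 3 * ONE_GB),
    PreExtractionInfo("Invoice.PDF", 150_000),
]
selected = subfiles_to_extract(infos)
flagged = has_subfile_named(infos, "invoice.pdf")
```

Because the size can be an estimate and the name can differ from the one stored in an OLE wrapper, checks like these are best treated as a fast first pass rather than a guarantee.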

If you need accurate information, you must use the structure returned by the extraction operation, by setting the allExternalSubfiles argument to True.