Process Subfiles

To filter all files in a container file, you can open the container as a Document object, then open its subfiles as Document objects. After you open a subfile as a Document, you can call its methods to filter the data.

You can iterate over subfile objects by calling the subfiles method on a document. Each element returned by the iterator contains information about the subfile, and a method to open it as a document.

In the following example, we open each subfile as a document, and filter some text from it.

for (const auto& subfile : doc.subfiles())
{
    auto [child, info] = subfile.open();
    if (child)
    {
        child->filter(output);
    }
}

Extract Subfiles

In some cases, you might need to access the subfiles directly, for example to archive them or to process them with a different tool. In this case, you can open the container as a Document object, then extract each subfile to an output of your choice instead of opening it as a Document.

You can iterate over subfile information by calling the subfiles method on a document. Each element returned by the iterator contains information about the subfile, and a method to extract it:

auto myinput = keyview::io::InputFile{ std::string("InputFile.zip") };
auto doc = session.open(myinput);

for (const auto& subfile : doc.subfiles())
{
    if (!subfile.is_folder())
    {
        auto myoutput = keyview::io::OutputFile{ generateOutputFilePath() };
        subfile.extract(myoutput);
    }
}

In this example, generateOutputFilePath() is a function that returns the name you want to use for the extracted subfile. If the name of the subfile does not matter (for example, because the subfile will be passed into File Content Extraction for further processing), you could use a unique identifier such as a GUID. If you instead choose to base the filename on subfile.rawname() (the path that the container file provides), ensure that you protect against directory traversal attacks, where the name of the subfile contains a relative or absolute path.
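The two naming strategies described above can be sketched with the standard library alone. This is a minimal illustration, not part of the KeyView API: safeOutputPath and uniqueOutputPath are hypothetical helper names, the "subfile_N.bin" naming scheme is an assumption, and a process-local counter stands in for a GUID. The first helper rejects names that are absolute or that climb out of the output directory; the second ignores the subfile name entirely.

```cpp
#include <algorithm>
#include <atomic>
#include <filesystem>
#include <optional>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper (not a KeyView function): maps an untrusted subfile
// name onto a path inside baseDir. Returns std::nullopt if the name is
// empty, absolute, names no file, or would escape baseDir.
std::optional<fs::path> safeOutputPath(const fs::path& baseDir, std::string rawName)
{
    // Treat backslashes as separators on every platform, so that names
    // such as "..\\..\\x" are checked the same way as "../../x".
    std::replace(rawName.begin(), rawName.end(), '\\', '/');

    const fs::path name(rawName);

    // Reject empty names and names with a root ("/etc/passwd", "C:/x" on Windows).
    if (name.empty() || name.has_root_path())
        return std::nullopt;

    // Collapse "." and "dir/.." sequences without touching the file system.
    const fs::path normalized = name.lexically_normal();

    // Reject names that resolve to a directory rather than a file.
    if (normalized == fs::path(".") || !normalized.has_filename())
        return std::nullopt;

    // Any ".." that survives normalization points outside baseDir.
    for (const auto& part : normalized)
    {
        if (part == fs::path(".."))
            return std::nullopt;
    }

    return baseDir / normalized;
}

// Hypothetical helper: returns a name that is unique within this process
// run only. A real GUID would be needed for uniqueness across runs.
fs::path uniqueOutputPath(const fs::path& baseDir)
{
    static std::atomic<unsigned long> counter{0};
    return baseDir / ("subfile_" + std::to_string(counter.fetch_add(1)) + ".bin");
}
```

In the extraction loop, generateOutputFilePath() could then be replaced by one of these helpers, for example by passing the value of subfile.rawname() to safeOutputPath (assuming that value is convertible to std::string) and skipping the subfile when the result is empty. The check is purely lexical, so it does not detect symbolic links that already exist inside the output directory.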

NOTE: This very simple example does not account for folders within container files. For a more complete example, see the extract sample program.

NOTE: The subfiles method returns an instance of the keyview::Container class, defined in Keyview_Container.hpp. This provides access to information about the container, and access to each subfile. The container maintains a reference to the input file, and so cannot be used after the input file has been destroyed.

NOTE: Some options change the order in which subfiles are retrieved, such as enabling the root node. However, for each combination of options, the subfile order is consistent across multiple runs.