A regular question on the Alteryx forums is how to unzip a file in Alteryx. This is another situation well suited to the Run Command tool. The Run Command tool is a way to run command line scripts in Alteryx, and it’s possible to unzip using the command line.
In an earlier post I covered a different Run Command example. That was how to use the Alteryx Run Command tool to publish a Tableau Data Extract to the Tableau Server. That Alteryx workflow called TabCmd from the a Run Command tool.
Unzipping a file in Alteryx using the Run Command is similar to publishing to Tableau Server.
The Run Command tool is able to call “exe” programs. Therefore any exe file type that can run in the command prompt can run in Alteryx using Run Command.
Credit to this needs to be given to jdunkerley79 who kindly supplied the solution in the Alteryx forum: http://community.alteryx.com/t5/Data-Preparation-Blending/Can-Alteryx-unzip-a-file-as-part-of-the-workflow/td-p/10604
How to unzip a file in Alteryx
Begin by installing software to unzip, such as 7-zip (http://www.7-zip.org/). This can run via the command prompt.
Using 7-zip in Alteryx
The following demonstrates how to use 7-zip in Alteryx.
To begin, use a Text Input tool to create a field FilePath. This field should be the full file path to the zip file.
Write a command to unzip
Next, use a formula tool to write the unzip command.
In the Expression write full command text to run in the command prompt. Name the Output Field “cmd”.
I strongly advise testing this in the command prompt first; it is simple to make a minor syntax error.
The unzip command calls 7 Zip (the unzipping software) and also provides the location of the file to unzip.
If you recall the location of the file to unzip is set above in the variable named FilePath.
“C:\Program Files\7-Zip\7z.exe” e “[FilePath]” (with FilePath containing the file path to the zipped file, created in Step 1)
Using only the above command text unfortunately isn’t the full answer.
This will unzip the file, however, once unzipped, the output will land in the same directory as the batch file – not necessarily where we want them.
However, the command line has a lot of flexibility, including creating new folders, meaning we can extract the files wherever we want.
Create a script to unzip AND manage the files
A combination of Text Input and Union tools enable writing of a larger script.
Remember in the command prompt we can’t refer to a folder via the UNC \\server\directory.
For this reason we use the “pushd” and “popd” commands.
This enables us to navigate up and down the current directory.
Firstly, set the command prompt to run from the folder where the .bat file will reside.
Use %~dp0 for this, which identifies the ‘current’ Alteryx Engine folder.
Select that directory as the location to create a new folder to store the extracted files.
Next, create a new folder for the extracted files. For this use the mkdir command.
In this example we name the folder “Extract”. As we are saving the extracted files to the “Extract” folder we need to move to the new folder. Do this using pushd.
Now we can run the unzip command, created in Step 2.
This will extract the unzipped files to the ‘current’ directory, which we set as the newly created Extract folder.
Therefore, the zip file contents will extract to the folder called Extract.
Once extracted, use popd to jump back up one level in the directory – back to the Alteryx Engine.
Get a list of the extracted file names
Part of the output we want is a list of the extracted file names.
Again, we can add another line to the .bat file.
This time the command is “dir /B Extract > ziplist.txt”. This command copies the contents of the folder called Extract into a file called ziplist.txt.
This file will be the output of the Run Command tool. It contains the list of all files unzipped in Alteryx.
To add these different commands to the script use Text Input tools both before and after the unzip command.
The Union tool combines the output, the commands, of these tools into one large block of text.
This text contains the commands to create the directories, unzip the file and record the zip file contents.
This text is the .bat file to execute in Run Command.
This is what each line of the script is doing:
Notice the names of the new fields from the Text Inputs and Select tool. They all share the same name, “cmd”, to enable the union to work properly.
Run the script using Run Command
We now have the full command line script. Load it into the Run Command tool.
Loading a script into the Run Command tool automatically creates and runs a .bat file. A bat file is the script in a file format.
Name and run this bat file in the Run Command tool…which is far simpler than I make it sound.
a. The union field is the only thing taken into the Run Command tool – i.e. only the script, no other fields.
b. The Write Source, prominently (and perhaps misleadingly…) called Output is where the script is ‘saved as’ a .bat file. Here we call the .bat file “zip.bat”.
c. In this example we save it to the Alteryx temp (%temp%) directory. Set file type as csv, first row doesn’t contain field names and Never Quote Output Fields
d. Next run the zip.bat file in the Run External Program – Command section.
e. In Line 6 of the script we named the output of the unzip command as ziplist.txt. This file contains the names of the extracted files, there it is the output we want from the Run Command tool. To open an output of the Run Command tool use the Read Results section. Enter the format of the extraction results file by clicking on the Input button. In this case it’s a text file so use a file type of CSV.
While creating the Run Command none of the files exist. Therefore you will be presented with errors; these errors should go when running the workflow.
Get the file paths of the extracted files
The next step is to output the full file paths of the unzipped files.
Using the Formula tool, create the file path to the location of the file containing these file paths.
If you remember we write the contents of the extracted folder and output the contents from the Run Command tool. We called the file ziplist.txt.
The zip contents is output as text from the Run Command tool in a field Alteryx names [Field_1].
This field contains the actual extracted / unzipped file names – i.e. outputfile1.txt, outputfile2.csv, etc.
Therefore our formula to create the full extracted file paths is:
[Engine.TempFilePath] + 'Extract\' + [Field_1]
This is what the Alteryx engine directory looks like once the workflow has run, with the actual file inside the Extract folder:
I created this as a macro, with the full workflow here: