A regular question on the Alteryx forums is how to unzip a file in Alteryx. This is another situation well suited to the Run Command tool.
In an earlier post I covered a different Run Command example. That was how to use the Alteryx Run Command tool to publish a Tableau Data Extract to the Tableau Server in an Alteryx workflow using TabCmd.
Unzipping a file in Alteryx using the Run Command is similar to publishing to Tableau Server. The Run Command tool is able to call “exe” programs. Therefore any exe file type that can run in the command prompt can run in Alteryx using Run Command.
Credit to this needs to be given to jdunkerley79 who kindly supplied the solution in the Alteryx forum: http://community.alteryx.com/t5/Data-Preparation-Blending/Can-Alteryx-unzip-a-file-as-part-of-the-workflow/td-p/10604
How to unzip a file in Alteryx
Begin by installing software to unzip, such as 7-zip (http://www.7-zip.org/). This can run via the command prompt.
The following guides how to use 7-zip in Alteryx:
1. Use a Text Input tool to create a field FilePath. This field should be the full file path to the zip file.
2. Next use a formula tool to write the unzip command. In the Expression write full command text to run in the command prompt. Name the field “cmd”. I strongly advise testing this in the command prompt first; it is simple to make a minor syntax error.
The unzip command calls 7 Zip (the unzipping software) and also provides the location of the file to unzip. If you recall the location of the file to unzip is set above in the variable named FilePath.
“C:\Program Files\7-Zip\7z.exe” e “[FilePath]” (with FilePath containing the file path to the zipped file, created in Step 1)
3. Using only the above command text unfortunately isn’t the full answer. This will unzip the file, however, once unzipped, the output will land in the same directory as the batch file – not necessarily where we want them. However the command line has a lot of flexibility, including creating new folders, meaning we can extract the files wherever we want.
Create a script to both unzip then manage the files
A combination of Text Input and Union tools enable writing of a larger script. Remember in the command prompt we can’t refer to a folder via the UNC \\server\directory. For this reason we use the “pushd” and “popd” commands to move up and down the current directory.
Firstly set the command prompt to run from the folder where the .bat file will reside. Use %~dp0 for this, which identifies the ‘current’ Alteryx Engine folder. Select that directory as the location to create a new folder to store the extracted files.
Next create a new folder for the extracted files. For this use the mkdir command. In this example we name the folder “Extract”. As we will save the extracted files to this location we next need to set the focus to the new folder. Do this using pushd again.
Now the unzip command, created in Step 2, can run. This will extract the unzipped files to the ‘current’ directory, which we have set as the new Extract folder. Therefore the zip file contents will extract to the folder called Extract. Once extracted use popd to jump back up one level in the directory – back to the Alteryx Engine.
Get a list of the extracted file names
As part of the output we want a list of the extracted file names. Again we can add another line to the .bat file. This time the command is “dir /B Extract > ziplist.txt”. This command copies the contents of the folder called Extract into a file called ziplist.txt. This file will be the output of the Run Command tool. It contains the list of all files unzipped in Alteryx.
To add these different commands to the script use Text Input tools both before and after the unzip command. The Union tool combines the output, the commands, of these tools into one large block of text. This text contains the commands to create the directories, unzip the file and record the zip file contents. This text is the .bat file to execute in Run Command.
This is what each line of the script is doing:
4. Notice the names of the new fields from the Text Inputs and Select tool. They all share the same name, “cmd”, to enable the union to work properly.
Run the script using Run Command
5. We now have the full command line script. Next load it into the Run Command tool. Loading a script into the Run Command tool automatically creates and runs a .bat file, which is the script in a file format. This file requires naming and subsequently running in the Run Command tool…which is far simpler than I make it sound.
a. The union field is the only thing taken into the Run Command tool – i.e. only the script, no other fields.
b. The Write Source, prominently (and perhaps misleadingly…) called Output is where the script is ‘saved as’ a .bat file. Here we call the .bat file “zip.bat”.
c. In this example we save it to the Alteryx temp (%temp%) directory. Set file type as csv, first row doesn’t contain field names and Never Quote Output Fields
d. Next run the zip.bat file in the Run External Program – Command section.
e. In Line 6 of the script we named the output of the unzip command as ziplist.txt. This file contains the names of the extracted files, there it is the output we want from the Run Command tool. To open an output of the Run Command tool use the Read Results section. Enter the format of the extraction results file by clicking on the Input button. In this case it’s a text file so use a file type of CSV.
While creating the Run Command none of the files exist. Therefore you will be presented with errors; these errors should go when running the workflow.
Get the file paths of the extracted files
6. The next step is to output the full file paths of the unzipped files. Use the Formula tool to create the file path to the location of the file contaning this information. If you remember we write the contents of the extracted folder and output the contents from the Run Command tool. We called the file ziplist.txt. This is output as text from the Run Command tool in a field Alteryx names [Field_1]. This contains the actual extracted / unzipped file names – i.e. outputfile1.txt, outputfile2.csv, etc. Therefore our formula to create the full extracted file paths is:
[Engine.TempFilePath] + ‘Extract\’ + [Field_1]
I created this as a macro, with the full workflow here.