[Tex/LaTex] Arara rule to delete files with wildcard

arara

How can I define an arara rule to delete files with a wildcard? The following should delete all log files in the folder of the LaTeX file.

%!TEX TS-program = Arara
% arara: pdflatex:  { synctex: yes }
% arara: cleanwildcard: { files: [*.log]}

\documentclass{minimal}

\begin{document}

Hello World!

\end{document}

Best Answer

Quack

Lasciate ogne speranza, voi ch'intrate.

(La Divina Commedia, Dante Alighieri)

Time for one of the most insane answers I ever came up with. :) You are right that arara does not allow wildcards, and the reason for this is a restriction of the underlying execution library. Wildcard usage might be understood as some sort of subshell expansion and there is no way to (directly) ask arara and its inner mechanisms to make a command like rm *.log work. Personally, I'd favour a more indulgent execution layer, but so far, the current execution library (namely Apache Commons Exec) works in the general case. There is actually a bug in the library when trying to parse command arguments with spaces (Nicola and I tracked the error down and found out it was an already well-known bug - I'm watching their bugtrackers very closely), but the library itself works fine in most cases. In the future, I might write my own execution layer, but time is short right now. :)

The arara userbase has been growing way out of my expectations and people are using the tool for tasks I've never imagined. For example, I never thought I'd need some sort of FileSystem helper methods, but apparently this would be a nice addition for rulewriters.

Now, back to your question. :)

Short answer: No, it's not possible to ask arara to perform wildcard expansion in the task execution context.

Long answer: As mentioned in the prologue, the underlying execution library does not allow this since it's a form of subshell expansion, and subshells (roughly speaking, calls inside calls) are definitely forbidden.
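To see what "no subshell expansion" means in practice, here is a minimal sketch in plain Java. It uses ProcessBuilder as a stand-in (arara itself goes through Apache Commons Exec, which is not shown here), and it assumes a Unix-like system where echo and sh are available on the PATH. The class and method names are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class NoShellExpansion {
    // Runs a command directly (no shell in between) and returns its trimmed output.
    static String run(String... command) {
        try {
            Process process = new ProcessBuilder(command).redirectErrorStream(true).start();
            String output = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
            process.waitFor();
            return output.trim();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Without a shell, the program receives the literal argument *.log.
        System.out.println(run("echo", "*.log"));
        // With an explicit shell, the shell expands the pattern
        // (if any .log files exist in the current directory).
        System.out.println(run("sh", "-c", "echo *.log"));
    }
}
```

The first call prints the pattern untouched, which is exactly what a command like rm would receive when no shell sits in between.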

Insane answer: I can make it work, in the sneakiest way known to mankind. Here's how. Beware, the following lines may get really complicated.

In the rule context, we have orb tags that allow an underlying expression language to be interpreted. What I can do is exploit this feature to run an arbitrary method from the Java API that does the task I want. The problem is that the expression language has a lot of restrictions, and I cannot inject code the way I want.

A way to deal with these restrictions is to provide a method chain that returns a set of valid elements in evaluation context. More precisely, arara makes use of some interesting libraries, and I can use their methods if I know their full namespaces.

Sadly, there is a big problem: arara does not have an IO library (at least, version 3.0 from CTAN). Then we need to make some magic happen.

Disclaimer: in order to make this answer work, we cannot use the CTAN (or installer) version alone, so some sort of new batch command that wraps the full command line is required (at least, for convenience). I'll do things locally here.

I will use the Apache Commons IO library for this trick. The binary download is a file named commons-io-2.4-bin.zip. We just need one file from the whole pack: commons-io-2.4.jar

Another way of running arara instead of the usual

$ java -jar arara.jar

is

$ java -cp arara.jar com.github.arara.Arara

in which we provide the main class of the application based on classpath lookup. Now, I will make Commons IO part of the application classpath by doing this:

$ java -cp commons-io-2.4.jar:arara.jar com.github.arara.Arara

If I recall correctly, the path separator in Linux is :, but in Windows, it's ;, so your mileage may vary. Note that I'm considering both .jar files to be in the same directory; it might be a good idea to write the full path or, if you prefer, you can add both to the CLASSPATH variable in your user/system environment variables, then the lookup becomes easier. Anyway.

Now that I injected an IO library, let's write a rule:

!config
identifier: cleanpattern
name: CleanPattern
command: <arara> @{remove} @{pattern}
arguments:
- identifier: remove
  default: <arara> @{isWindows("cmd /c del", "rm -f")}
- identifier: pattern
  flag: "@{'\"'.concat(org.apache.commons.lang3.StringUtils.join(org.apache.commons.io.FileUtils.listFiles(new java.io.File(\".\"), new org.apache.commons.io.filefilter.WildcardFileFilter(parameters.pattern), org.apache.commons.io.filefilter.FalseFileFilter.INSTANCE), \"\\\" \\\"\")).concat('\"').replaceAll(\"\\\"\\\\.\".concat(java.io.File.separator), \"\\\"\")}"

Beautiful, isn't it? :) And it's untested, woohoo! I didn't try it on Windows, so let's see if things get really bad pretty soon. :P

This rule injects Java code inside the expression language context, and it does:

  1. list all files in the current directory that match a certain pattern (provided by parameters.pattern which you'll set in your directive)
  2. join the list of filenames as a big string separated by " ".
  3. apply regex to remove leading dots and separators (OS stuff).
  4. return a list of files matching the provided pattern in the form "a" "b" "c".
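For readers who find the one-liner hard to parse, the steps above can be sketched as ordinary Java. This is not the code arara evaluates: it swaps Commons IO for the standard java.nio.file API, the class and method names are hypothetical, and the sorting is my own addition to make the result predictable. Step 3 (stripping the leading dot and separator) is not needed here because getFileName() already returns the bare name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class WildcardExpansion {
    // Returns the regular files in dir that match glob, in the form "a" "b" "c".
    // Like the rule, it yields a bare pair of quotes when nothing matches.
    static String expand(Path dir, String glob) {
        List<String> names = new ArrayList<>();
        // Step 1: list the files in the directory that match the pattern.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path entry : stream) {
                if (Files.isRegularFile(entry)) {
                    names.add(entry.getFileName().toString());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(names); // assumption: stable order, not part of the rule
        // Steps 2 and 4: join with quote-space-quote and wrap the result in quotes.
        return "\"" + String.join("\" \"", names) + "\"";
    }

    public static void main(String[] args) {
        String glob = args.length > 0 ? args[0] : "*.log";
        System.out.println(expand(Path.of("."), glob));
    }
}
```

The resulting string is what ends up after rm -f (or cmd /c del) in the expanded command.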

Long story short: I did the wildcard expansion instead of asking arara to do it (it wouldn't do it anyway). Quite scary and error-prone, but hey, it's funny! :)

Time to test it! :)

paulo@alexandria sandbox$ touch aaa.xml bbb.xml ccc.xml
paulo@alexandria sandbox$ cat test.tex 
% arara: cleanpattern: { pattern: '*.xml' }

paulo@alexandria sandbox$ java -cp commons-io-2.4.jar:arara.jar com.github.arara.Arara test.tex 
  __ _ _ __ __ _ _ __ __ _ 
 / _` | '__/ _` | '__/ _` |
| (_| | | | (_| | | | (_| |
 \__,_|_|  \__,_|_|  \__,_|

Running CleanPattern... SUCCESS
paulo@alexandria sandbox$ ls
arara.jar  commons-io-2.4.jar  test.tex

It even works with more specific patterns:

paulo@alexandria sandbox$ touch aaa.xml bbb.xml ccc.xml
paulo@alexandria sandbox$ cat test.tex 
% arara: cleanpattern: { pattern: 'a*.xml' }

paulo@alexandria sandbox$ java -cp commons-io-2.4.jar:arara.jar com.github.arara.Arara test.tex 
  __ _ _ __ __ _ _ __ __ _ 
 / _` | '__/ _` | '__/ _` |
| (_| | | | (_| | | | (_| |
 \__,_|_|  \__,_|_|  \__,_|

Running CleanPattern... SUCCESS
paulo@alexandria sandbox$ ls
arara.jar  bbb.xml  ccc.xml  commons-io-2.4.jar  test.tex

However, there is a problem. By design.

For version 3.0, rule expansion happens before the execution of all rules (It's worth noting that I want to change this behaviour in later versions). So, if you have

% arara: pdftex
% arara: cleanpattern: { pattern: '*.log' }
Hello world.
\bye

after running arara hello.tex, you will end up with hello.tex, hello.pdf and hello.log! Why? Let's see what arara does in this case:

  1. The tool finds two directives in the source file.
  2. It expands the pdftex rule, which will become pdftex hello.tex.
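  3. It expands the cleanpattern rule, which will become rm -f with no actual file names to delete, because at this moment there's no hello.log yet! After all, this rule depends on the former.
  4. Only then does it run the expanded commands: pdftex creates hello.log, but the clean command was already expanded without it.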
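The ordering problem can be reproduced with a toy model. The sketch below is not arara's code; it only simulates a directory as a set of names and compares expanding the clean rule before anything runs (the version 3.0 behaviour described here) against expanding it right before it runs. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class ExpandThenExecute {
    // Snapshot of the names that currently end with the given suffix.
    static List<String> matching(Set<String> files, String suffix) {
        List<String> result = new ArrayList<>();
        for (String name : files) {
            if (name.endsWith(suffix)) {
                result.add(name);
            }
        }
        return result;
    }

    // Simulates "pdftex" followed by "clean *.log" and returns the files left over.
    static Set<String> simulate(boolean expandEverythingFirst) {
        Set<String> files = new TreeSet<>(List.of("hello.tex"));

        // Early expansion: the clean rule's file list is fixed before pdftex runs.
        List<String> toDelete = expandEverythingFirst ? matching(files, ".log") : null;

        // Execution of pdftex: it creates the PDF and the log.
        files.add("hello.pdf");
        files.add("hello.log");

        // Late expansion: the file list is computed only now.
        if (toDelete == null) {
            toDelete = matching(files, ".log");
        }

        // Execution of the clean rule.
        files.removeAll(toDelete);
        return files;
    }

    public static void main(String[] args) {
        System.out.println("expand first:  " + simulate(true));
        System.out.println("expand lazily: " + simulate(false));
    }
}
```

With early expansion the log survives, because the list of files to delete was empty when it was computed.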

Nothing I can do about it, I am afraid. Sorry. :( At least with the current version 3.0. I'm working hard on changing this. Note that the original decision was not a bad idea per se, but as I said, arara grew up to a huge userbase with cases that the tool wasn't originally prepared to cover.

Summary: It's possible to get wildcard expansion through a (kinda hackish) code injection, but since the expansion happens when the rules are expanded and not when they are executed, it won't pick up files generated by previous rule executions.
