You can transform your XML using the luaxml-transform library, but the task is complicated by the requirement to split the table into two floats based on the tab-row-break attribute. We can use the luaxml-domobject library to preprocess the XML and split the table first; after that, the transform library can be applied easily.
This is the full code; I describe it below:
\documentclass{article}
\usepackage{fontspec}
\setmainfont{Linux Libertine O}
\usepackage{luacode}
% this is a modified version of \@makecaption from LaTeX classes
% it doesn't print : between table number and caption
\makeatletter
\newcommand\tablecaption[2]{%
\vskip\abovecaptionskip
\sbox\@tempboxa{\textbf{#1} #2}%
\ifdim \wd\@tempboxa >\hsize
\textbf{#1} #2\par
\else
\global \@minipagefalse
\hb@xt@\hsize{\hfil\box\@tempboxa\hfil}%
\fi
\vskip\belowcaptionskip
}
\makeatother
\begin{document}
\begin{luacode*}
local domobject = require "luaxml-domobject"
local transform = require "luaxml-transform"
sample = [[
<?xml version="1.0" encoding="utf-8"?>
<art>
<title>Scattering of flexural waves an electric current</title>
<p>From these observations, another important result is that the individual masses of such observed black hole–black hole (BBHs) binaries can be much larger than what were expected previously both theoretically and observationally [14], and various scenarios have been proposed [<xref ref-type="bibr" rid="cqgab7bbabib2">2</xref>, <xref ref-type="bibr" rid="cqgab7bbabib26">26</xref>]. In particular, observations of the same signal in two different detectors provides an efficient independent way to cross check and validate the instruments, which is particularly valuable for a space-based detector behavior of these parameters is presented in the table <xref ref-type="table" rid="cqgab7bbat1">1</xref>.</p>
<table-wrap id="tab1" position="float" tab-row-break="5"><label>Table 1.</label><caption id="tab1"><p>The large <italic>x</italic> behavior for different <italic>w</italic>, where <italic>F</italic> means finite.</p></caption><table><colgroup><col align="left"/><col align="center"/><col align="center"/><col align="center"/><col align="center"/><col align="center"/><col align="center"/></colgroup><thead><tr><th>Parameters:</th><th>e<sup>2<italic>γ</italic></sup></th><th><italic>r</italic><sup>2</sup></th><th><italic>ρ</italic></th><th><italic>L</italic></th><th><italic>V</italic></th><th><italic>E</italic></th></tr></thead><tbody><tr><td>A1</td><td>∞</td><td>∞</td><td>0</td><td>∞</td><td>∞</td><td><italic>F</italic></td></tr><tr><td>B2</td><td>0</td><td>∞</td><td><italic>F</italic></td><td><italic>F</italic></td><td><italic>F</italic></td></tr><tr><td>C3</td><td>∞</td><td>0</td><td><italic>F</italic></td><td><italic>F</italic></td><td><italic>F</italic></td></tr><tr><td>D4</td><td>0</td><td>∞</td><td><italic>F</italic></td><td><italic>F</italic></td><td><italic>F</italic></td><td><italic>F</italic></td></tr><tr><td>E5</td><td>0</td><td>∞</td><td>∞</td><td><italic>F</italic></td><td><italic>F</italic></td><td><italic>F</italic></td></tr><tr><td>F6</td><td>∞</td><td>∞</td><td>0</td><td>∞</td><td>∞</td><td><italic>F</italic></td></tr><tr><td>G7</td><td>∞</td><td>∞</td><td>0</td><td>∞</td><td>∞</td><td><italic>F</italic></td></tr><tr><td>H8</td><td>∞</td><td>∞</td><td>0</td><td>∞</td><td>∞</td><td><italic>F</italic></td></tr><tr><td>I9</td><td>∞</td><td>∞</td><td>0</td><td>∞</td><td>∞</td><td><italic>F</italic></td></tr><tr><td>J10</td><td>∞</td><td>∞</td><td>0</td><td>∞</td><td>∞</td><td><italic>F</italic></td></tr></tbody></table></table-wrap>
</art>
]]
local dom = domobject.parse(sample)
-- prepare tables
for _, wrap in ipairs(dom:query_selector("table-wrap[tab-row-break]")) do
  -- attribute values are strings, so convert the row break to a number
  local row_break = tonumber(wrap:get_attribute("tab-row-break"))
  local tables = wrap:query_selector("table") or {}
  -- we assume that <table-wrap> contains just one table
  local tbl = tables[1]
  if tbl then
    -- convert <col> elements to a LaTeX column specification
    -- and save it as an attribute
    local align = {}
    local align_convert = {left = "l", right = "r", center = "c"}
    for _, col in ipairs(tbl:query_selector("col")) do
      local al = col:get_attribute("align") or "left"
      align[#align+1] = align_convert[al]
    end
    tbl:set_attribute("align", table.concat(align, " "))
  end
  -- create floats
  -- first save children
  local children = wrap:get_children()
  -- it will contain floats
  wrap._children = {}
  -- this is needed to fix a bug in LuaXML
  local function fix_parents(el)
    for k,v in ipairs(el._children or {}) do
      if v:is_element() then
        v._parent = el
        fix_parents(v)
      end
    end
  end
  for i = 1, 2 do
    local float = wrap:create_element("float")
    wrap:add_child_node(float)
    -- add saved children
    for _, child in ipairs(children) do
      -- add a copy of the original child
      float:add_child_node(child:copy_node())
    end
    fix_parents(float)
  end
  -- now split tables
  local tbl = wrap:query_selector("table")
  -- there should be two tables now
  if #tbl == 2 then
    local tbody = tbl[1]:query_selector("tbody tr")
    -- remove spurious rows from the first table
    for i = row_break + 1, #tbody do
      tbody[i]:remove_node()
    end
    -- remove spurious rows from the second table
    local tbody = tbl[2]:query_selector("tbody tr")
    for i = 1, row_break do
      tbody[i]:remove_node()
    end
  end
  -- place (continued...) text to second caption
  local captions = wrap:query_selector("caption")
  local caption = captions[2]
  if caption then
    -- remove original text
    caption._children = {}
    local par = caption:create_element("p")
    local text = par:create_text_node("(continued...)")
    par:add_child_node(text)
    caption:add_child_node(par)
  end
end
transformer = transform.new()
transformer:add_action("title", "\\section{@<.>}")
transformer:add_action("p", "@<.>\n\n")
transformer:add_action("table-wrap float", "\\begin{table}\n@<.>\n\\end{table}\n")
-- you need to define this command in your TeX file
transformer:add_action("table-wrap label", "\\tablecaption{@<.>}")
-- this is a second argument to \tablecaption
transformer:add_action("table-wrap caption", "{@<.>}\n\n")
transformer:add_action("italic", "\\textit{@<.>}")
transformer:add_action("sub", "\\textsubscript{@<.>}")
transformer:add_action("sup", "\\textsuperscript{@<.>}")
transformer:add_action("table", "\\begin{tabular}{@{align}}\n@<.>\\end{tabular}\n")
transformer:add_action("tr", "@<.>\\\\\n")
transformer:add_action("th", "@<.> &")
transformer:add_action("th:last-of-type", "@<.>")
transformer:add_action("tbody td", "@<.> &")
transformer:add_action("tbody td:last-of-type", "@<.>")
local content = transformer:process_dom(dom)
-- print(content)
-- print the transformed XML to LaTeX
transform.print_tex(content)
-- print(dom:serialize())
\end{luacode*}
\end{document}
This is the part that deals with tables:
local dom = domobject.parse(sample)
-- prepare tables
for _, wrap in ipairs(dom:query_selector("table-wrap[tab-row-break]")) do
  -- attribute values are strings, so convert the row break to a number
  local row_break = tonumber(wrap:get_attribute("tab-row-break"))
  local tables = wrap:query_selector("table") or {}
  -- we assume that <table-wrap> contains just one table
  local tbl = tables[1]
  if tbl then
    -- convert <col> elements to a LaTeX column specification
    -- and save it as an attribute
    local align = {}
    local align_convert = {left = "l", right = "r", center = "c"}
    for _, col in ipairs(tbl:query_selector("col")) do
      local al = col:get_attribute("align") or "left"
      align[#align+1] = align_convert[al]
    end
    tbl:set_attribute("align", table.concat(align, " "))
  end
  -- create floats
  -- first save children
  local children = wrap:get_children()
  -- it will contain floats
  wrap._children = {}
  -- this is needed to fix a bug in LuaXML
  local function fix_parents(el)
    for k,v in ipairs(el._children or {}) do
      if v:is_element() then
        v._parent = el
        fix_parents(v)
      end
    end
  end
  for i = 1, 2 do
    local float = wrap:create_element("float")
    wrap:add_child_node(float)
    -- add saved children
    for _, child in ipairs(children) do
      -- add a copy of the original child
      float:add_child_node(child:copy_node())
    end
    fix_parents(float)
  end
  -- now split tables
  local tbl = wrap:query_selector("table")
  -- there should be two tables now
  if #tbl == 2 then
    local tbody = tbl[1]:query_selector("tbody tr")
    -- remove spurious rows from the first table
    for i = row_break + 1, #tbody do
      tbody[i]:remove_node()
    end
    -- remove spurious rows from the second table
    local tbody = tbl[2]:query_selector("tbody tr")
    for i = 1, row_break do
      tbody[i]:remove_node()
    end
  end
  -- place (continued...) text to second caption
  local captions = wrap:query_selector("caption")
  local caption = captions[2]
  if caption then
    -- remove original text
    caption._children = {}
    local par = caption:create_element("p")
    local text = par:create_text_node("(continued...)")
    par:add_child_node(text)
    caption:add_child_node(par)
  end
end
First, it converts the column alignment information to a LaTeX tabular specification and saves it as an attribute of the <table> element, which makes the transformation easier. It then creates two float elements and copies the original table content into each of them. Finally, it removes the superfluous table rows from both copies and sets (continued...) as the caption of the second float.
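The row-splitting step can be illustrated without any XML. The following is only a minimal sketch in plain Lua: split_rows is a hypothetical helper, not part of LuaXML, and it works on a plain list of strings rather than on DOM nodes. It assumes that a missing or non-numeric break value should leave all rows in the first part.

```lua
-- Hypothetical helper (not part of LuaXML): split a list of rows in two
-- at the position given by a tab-row-break-style attribute value.
local function split_rows(rows, row_break)
  -- attribute values arrive as strings, so convert explicitly;
  -- assumption: without a valid break, all rows stay in the first part
  local n = tonumber(row_break) or #rows
  local first, second = {}, {}
  for i, row in ipairs(rows) do
    if i <= n then
      first[#first + 1] = row
    else
      second[#second + 1] = row
    end
  end
  return first, second
end

local rows = {"A1", "B2", "C3", "D4", "E5", "F6", "G7", "H8", "I9", "J10"}
local first, second = split_rows(rows, "5")
print(table.concat(first, " "))
print(table.concat(second, " "))
```

The DOM code above reaches the same split differently: it copies the whole table twice and removes the rows that do not belong in each copy, so the header and column definitions are kept in both floats.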
This DOM object can then be transformed using the following rules:
transformer = transform.new()
transformer:add_action("title", "\\section{@<.>}")
transformer:add_action("p", "@<.>\n\n")
transformer:add_action("table-wrap float", "\\begin{table}\n@<.>\n\\end{table}\n")
-- you need to define this command in your TeX file
transformer:add_action("table-wrap label", "\\tablecaption{@<.>}")
-- this is a second argument to \tablecaption
transformer:add_action("table-wrap caption", "{@<.>}\n\n")
transformer:add_action("italic", "\\textit{@<.>}")
transformer:add_action("sub", "\\textsubscript{@<.>}")
transformer:add_action("sup", "\\textsuperscript{@<.>}")
transformer:add_action("table", "\\begin{tabular}{@{align}}\n@<.>\\end{tabular}\n")
transformer:add_action("tr", "@<.>\\\\\n")
transformer:add_action("th", "@<.> &")
transformer:add_action("th:last-of-type", "@<.>")
transformer:add_action("tbody td", "@<.> &")
transformer:add_action("tbody td:last-of-type", "@<.>")
There is nothing too special here. Just note that the last cell in each row needs separate handling, to prevent the insertion of an extra & character, which would cause a compilation error. This is why we need the th:last-of-type and tbody td:last-of-type rules.
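The reason for treating the last cell differently can be seen in a small plain-Lua sketch. It does not use the transform library; tabular_row is a hypothetical helper that builds one tabular row from a list of cell strings, placing & only between cells.

```lua
-- Hypothetical helper: join cells with " & " so that no separator
-- follows the last cell, then terminate the row with \\.
local function tabular_row(cells)
  return table.concat(cells, " & ") .. " \\\\"
end

print(tabular_row({"A1", "0", "F"}))
```

The transformer rules achieve the same effect declaratively: every cell emits a trailing &, except the one matched by the more specific :last-of-type selector.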
Also note that we use the align attribute, set earlier by the DOM processing code, to produce the correct tabular declaration.
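As an illustration of how that attribute value is built, here is a minimal plain-Lua sketch of the conversion. tabular_spec is a hypothetical helper, and falling back to "l" for unknown alignment values is my assumption; the code in this answer silently skips such values instead.

```lua
-- Map XML alignment names to LaTeX column types.
local align_convert = {left = "l", right = "r", center = "c"}

-- Hypothetical helper: turn a list of alignment names into
-- a tabular column specification.
local function tabular_spec(aligns)
  local spec = {}
  for _, al in ipairs(aligns) do
    -- assumption: unknown alignments fall back to left-aligned columns
    spec[#spec + 1] = align_convert[al] or "l"
  end
  return table.concat(spec, " ")
end

print(tabular_spec({"left", "center", "center"}))
```

The resulting string is what ends up between the braces of \begin{tabular}{...} through the @{align} placeholder in the table rule.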
We also expect the <label> element to be immediately followed by the <caption> element: the first produces the \tablecaption command with its first argument, and the second supplies the second argument, so the output would break if they were not in their expected places.
Lastly, note that you need to use a font that supports all the special characters, such as Linux Libertine in my example.
This is the result:
Best Answer
Try this:
It uses LuaXML's DOM object. The dom:traverse_elements function loops over all elements, detects whether the element name starts with step, and saves its text content to the \jobname-step.tex file. This is the result: