[Tex/LaTex] How to create a visually pleasing tree hierarchy from a difficult to read CSV File (MySQL Export)

csvdatabasefontspeclinguisticsxetex

Ok here is the deal: I am working on an endangered language project. The data is contained in a MySQL database. I can run a query to produce a hierarchy, but the hierarchy is difficult for humans to read.

See the MySQL query output here:

"Body Parts",NULL,"Human","Kopf",NULL,NULL
"Clothing","flechten",NULL,NULL,NULL,NULL
"Numbers","zwei",NULL,NULL,NULL,NULL
"Body Parts",NULL,"Human","Haar",NULL,NULL
"Body Parts",NULL,"Human","Auge",NULL,NULL

Output Structure

The structure of the CSV output is as follows, but the tree is arbitrarily large:

FOLDER, LEXEME, SUBFOLDER, LEXEME, SUBSUBFOLDER, LEXEME

The nodes function much like folders on a filesystem. They contain both folders and files (the lexemes/words). For the example output below, I will make folders bold and lexemes italicized. This would also be good to have in the latex output, in order to differentiate between lexemes and folders (categories).

Goal

I would like the output to be ordered alphabetically (although I might be able to do work that into my query, but it would still be nice to have). Keep in mind there could be lexemes and folders inside every folder:

Visually Pleasing Tree:

Body Parts
- Animal
- Human
  - Auge
  - Haar
  - Kopf
Clothing
- flechten
Numbers
- zwei

I should mention that I am using UTF-8 encoding and compiling with xelatex with fontspec. This is a project dealing with phonetics, therefore the IPA will be used.

Best Answer

I spent the last hour on the following Python script, which should give an overview how to tackle this problem.

I use SQLite, I could get Python to connect to my MySQL server. I just retrieve each row and check, if the corresponding column is "None" (strangely SQLite does not return NULL) and use easylist's simple list setup to avoid all the \begin{itemize}/\end{itemize} hazzle.

In your case this will be a bit more complicated as e.g. you need to select all the items below 'Human'. One way could be to sort the result set after all columns, then you'd have to check at each row if the last row contained an item at this level with the same name and omit this.

It is solveable however takes a lot of time & brain...

#! /usr/bin/python
# -*- coding: utf-8 -*- 
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()

# create the table
c.execute('''CREATE TABLE "lex" ("ID" INTEGER PRIMARY KEY  AUTOINCREMENT  NOT NULL  UNIQUE , "Folder" VARCHAR,
        "Lexeme1" VARCHAR, "Subfolder" VARCHAR, "Lexeme2" VARCHAR, "Subsubfolder" VARCHAR, "Lexeme3" VARCHAR)''')

c.execute("Insert into lex Values(1,'Body Parts',NULL,'Human','Kopf',NULL,NULL)")
c.execute("Insert into lex Values(2,'Clothing','flechten',NULL,NULL,NULL,NULL)")
c.execute("Insert into lex Values(3,'Numbers','zwei',NULL,NULL,NULL,NULL)")
c.execute("Insert into lex Values(4,'Body Parts',NULL,'Human','Haar',NULL,NULL)")
c.execute("Insert into lex Values(5,'Body Parts',NULL,'Human','Auge',NULL,NULL)")


# Commit
conn.commit()
c.execute("SELECT * FROM lex")

print("\\begin{easylist}[itemize]")
# print all the first cell of all the rows
for row in c.fetchall():
    c1 = str(row[1])
    c2 = str(row[2])
    c3 = str(row[3])
    c4 = str(row[4])
    c5 = str(row[5])
    c6 = str(row[6])
    if c1!="None":
        print("& " + c1)
    if c2!="None":
        print("&& " + c2)
    if c3!="None":
        print("&&& " + c3)
    if c4!="None":
        print("&&&& " + c4)
    if c5!="None":
        print("&&&&& " + c5)
    if c6!="None":
        print("&&&&&& " + c6)

print("\\end{easylist}")

# Close the connection

conn.close()

Output:

\begin{easylist}[itemize]
& Body Parts
&&& Human
&&&& Kopf
& Clothing
&& flechten
& Numbers
&& zwei
& Body Parts
&&& Human
&&&& Haar
& Body Parts
&&& Human
&&&& Auge
\end{easylist}

Related Solutions

[Tex/LaTex] How to read a csv file having #

As @wipet wrote, the problem is the catcode of the # sign. With the recent version 1.12 (2014/07/14) of csvsimple, you can neutralize the # sign with the respect sharp option:

\documentclass{article}
\usepackage{csvsimple}
\begin{document}
\csvautotabular[respect sharp]{initial.csv}
\end{document}

There are some more respect xy options which also may be interesting for other use cases, e.g. respect and (for &) and respect dollar (for $).

Read variables from csv file based on context

The following code works:

\documentclass{article}

\usepackage{csvsimple}

\begin{filecontents*}[overwrite]{630556.csv}
function,value
1. pos,some text
2. pos,some other text
\end{filecontents*}

\begin{document}

\newcommand{\firstvar}{tmpvalone}
\newcommand{\secondvar}{tmpvaltwo}

\csvreader[head to column names]{630556.csv}{}{
  \ifcsvstrequal{\function}{1. pos}{ \edef\firstvar{\value} } { }
  \ifcsvstrequal{\function}{2. pos}{ \edef\secondvar{\value} }{ }
}

\firstvar \quad \secondvar

\end{document}

Best Answer

Related Solutions

[Tex/LaTex] How to read a csv file having #

Read variables from csv file based on context

Related Question