The problem is given by:
$$ \arg \min_{X \in \mathcal{T} } \frac{1}{2} {\left\| X B - A \right\|}_{F}^{2} $$
Where $ \mathcal{T} $ is the set of Lower Triangular Matrices.
The set $ \mathcal{T} $ is a Convex Set.
Moreover, the orthogonal projection of a given matrix $ Y \in \mathbb{R}^{m \times n} $ onto the set is easy:
$$ X = \operatorname{Proj}_{\mathcal{T}} \left( Y \right) = \operatorname{tril} \left( Y \right) $$
Namely, zeroing all elements above the main diagonal of $ Y $.
By utilizing Projected Gradient Descent, it is easy to solve this problem:
$$
\begin{align*}
{X}^{k + \frac{1}{2}} & = {X}^{k} - \alpha \left( {X}^{k} B {B}^{T} - A {B}^{T} \right) \\
{X}^{k + 1} & = \operatorname{Proj}_{\mathcal{T}} \left( {X}^{k + \frac{1}{2}} \right)\\
\end{align*}
$$
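A minimal MATLAB sketch of this iteration (the data, the step size `alpha`, and the iteration count below are assumptions for illustration; the full, validated code is in the repository linked below):

```matlab
numRows = 5; numCols = 8; numIterations = 500;          % Hypothetical dimensions / iterations
A = randn(numRows, numCols); B = randn(numRows, numCols); % X * B should match the size of A
alpha = 1 / norm(B)^2;                  % Step size below 2 / ||B||_2^2 (the Lipschitz constant)
X = zeros(numRows, numRows);            % Initialization (X is square, lower triangular)
for kk = 1:numIterations
    X = X - alpha * (X * B - A) * B.';  % Gradient step
    X = tril(X);                        % Projection onto the set of lower triangular matrices
end
```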
The full MATLAB code with CVX validation is available in my StackExchange Mathematics Q2876283 GitHub Repository.
The solution is very similar to the solution in Q2421545 - Solve Least Squares (Frobenius Norm) Problem with Diagonal Matrix Constraint.
Remark
I think you can also get a closed form solution for each element of $ X $ if you go through deriving the derivative with respect to each element of $ X $.
Another approach would be developing the linear operator which operates on the $ \frac{ n \left( n + 1 \right) }{2} $ free elements and creates an $ n \times n $ lower triangular matrix.
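For instance, a minimal sketch of this operator based approach (variable names and dimensions are assumptions): stacking the columns, $ \operatorname{vec} \left( X B \right) = \left( {B}^{T} \otimes I \right) \operatorname{vec} \left( X \right) $, and keeping only the columns corresponding to the lower triangle reduces the problem to an ordinary least squares problem.

```matlab
numRows = 5; numCols = 8;                   % Hypothetical dimensions
A = randn(numRows, numCols); B = randn(numRows, numCols);
vMask = find(tril(true(numRows)));      % Linear indices of the lower triangle of X
mK    = kron(B.', speye(numRows));      % vec(X * B) = kron(B.', I) * vec(X)
vX    = mK(:, vMask) \ A(:);            % Least squares over the free elements only
X     = zeros(numRows); X(vMask) = vX;  % Embed the solution back into a triangular matrix
```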
The problem is given by:
$$\begin{aligned}
\arg \min_{x} \quad & \frac{1}{2} {\left\| x - y \right\|}_{2}^{2} + \gamma {\left\| x \right\|}_{1} \\
\text{subject to} \quad & \boldsymbol{1}^{T} x = b
\end{aligned}$$
The Lagrangian is given by:
$$ L \left( x, \beta \right) = \frac{1}{2} {\left\| x - y \right\|}_{2}^{2} + \gamma {\left\| x \right\|}_{1} + \beta \left( \boldsymbol{1}^{T} x - b \right) $$
The solution must obey the KKT conditions (the problem is convex and Slater's condition is satisfied), which are given by:
$$\begin{align*}
\partial_{x} L \left( x, \beta \right) = x - y + \gamma \partial {\left\| x \right\|}_{1} + \beta \boldsymbol{1} & \ni 0 & \text{(1)} \\
\boldsymbol{1}^{T} x & = b & \text{(2)} \\
\end{align*}$$
From (1) it is clear that $ x = \operatorname{Prox}_{ \gamma {\left\| \cdot \right\|}_{1} } \left( y - \beta \boldsymbol{1} \right) $ (as a shifted instance of the $ {L}_{1} $ Prox). Using (2) one would get:
$$ \boldsymbol{1}^{T} \operatorname{Prox}_{ \gamma {\left\| \cdot \right\|}_{1} } \left( y - \beta \boldsymbol{1} \right) = b \Rightarrow b - \boldsymbol{1}^{T} \operatorname{Prox}_{ \gamma {\left\| \cdot \right\|}_{1} } \left( y - \beta \boldsymbol{1} \right) = 0 $$
Using the explicit solution of $ \operatorname{Prox}_{ \gamma {\left\| \cdot \right\|}_{1} } \left( \cdot \right) $ as the Soft Threshold operator yields:
$$ g \left( \beta \right) = \sum_{i = 1}^{n} \operatorname{sign} \left( {y}_{i} - \beta \right) { \left( \left| {y}_{i} - \beta \right| - \gamma \right) }_{+} - b $$
Now the problem becomes finding the root of the function $ g \left( \beta \right) $ and then plugging it into the Prox.
The function $ g \left( \beta \right) $ is a monotonically non increasing function of $ \beta $, and we're looking for its zero (which solves the problem).
One could find this root with any 1D solver over an interval which brackets it, for instance $ \left[ \min(y) - \gamma - \left| b \right|, \max(y) + \gamma + \left| b \right| \right] $, which is guaranteed to contain it.
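For instance, a minimal MATLAB sketch with a generic 1D solver (the data and variable names are assumptions):

```matlab
numElements = 10; y = randn(numElements, 1); gamma = 0.5; b = 1;   % Illustrative data
hG = @(beta) sum(sign(y - beta) .* max(abs(y - beta) - gamma, 0)) - b;
% Bracket of the root, slightly widened so the signs at the end points are strict
vRange = [min(y) - gamma - abs(b) - 1, max(y) + gamma + abs(b) + 1];
beta   = fzero(hG, vRange);                                 % Root of g
x      = sign(y - beta) .* max(abs(y - beta) - gamma, 0);   % The prox output (Soft Threshold)
```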
A more efficient approach is based on the fact that $ g \left( \beta \right) $ is a piecewise linear function of $ \beta $ with break points at $ \beta = {y}_{i} \pm \gamma $ (hence there are $ 2 n $ break points).
Hence, by utilizing a bisection over the sorted $ 2 n $ break points, one could easily find the segment on which the function has a constant slope and contains the zero crossing.
Denote this segment by $ \left[ {\beta}_{min}, {\beta}_{max} \right] $; then $ \forall {\beta}_{i}, {\beta}_{j} \in \left[ {\beta}_{min}, {\beta}_{max} \right] : \; \operatorname{sign} \left( y - {\beta}_{i} \right) = \operatorname{sign} \left( y - {\beta}_{j} \right) = e $ (element wise). This also implies that the support, $ \mathcal{S} = \left\{ i \mid { \left( \left| {y}_{i} - \beta \right| - \gamma \right) }_{+} \neq 0 \right\} $, is constant within this segment (as otherwise a break point would lie inside it).
This means the equation becomes:
$$\begin{aligned}
0 & = \sum_{i \in \mathcal{S}} {e}_{i} \left( \left| {y}_{i} - \beta \right| - \gamma \right) - b & \text{} \\
& = \sum_{i \in \mathcal{S}} \left( {y}_{i} - \beta - \gamma {e}_{i} \right) - b & \text{ $ {e}_{i} \left| {y}_{i} - \beta \right| = \operatorname{sign} \left( {y}_{i} - \beta \right) \left| {y}_{i} - \beta \right| = {y}_{i} - \beta $ } \\
\Rightarrow \left| \mathcal{S} \right| \beta & = \sum_{i \in \mathcal{S}} \left( {y}_{i} - \gamma {e}_{i} \right) - b & \text{} \\
\Rightarrow \beta & = \frac{1}{ \left| \mathcal{S} \right| } \left( \sum_{i \in \mathcal{S}} \left( {y}_{i} - \gamma {e}_{i} \right) - b \right) & \text{}
\end{aligned}$$
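A minimal MATLAB sketch of this segment based solution (the function name `ProxL1SumConstraint` and the variable names are assumptions; for simplicity it scans the sorted break points linearly instead of the bisection described above):

```matlab
function [x, beta] = ProxL1SumConstraint(y, gamma, b)
    hG      = @(beta) sum(sign(y - beta) .* max(abs(y - beta) - gamma, 0)) - b;
    vBreak  = sort([y(:) - gamma; y(:) + gamma]);   % The 2 n break points of g
    vG      = arrayfun(hG, vBreak);                 % g at the break points
    lastIdx = find(vG >= 0, 1, 'last');             % g is non increasing in beta
    if isempty(lastIdx)                 % The root lies below the smallest break point
        betaMid = vBreak(1) - 1;
    elseif lastIdx == numel(vBreak)     % The root lies above the largest break point
        betaMid = vBreak(end) + 1;
    else                                % The root lies between two consecutive break points
        betaMid = 0.5 * (vBreak(lastIdx) + vBreak(lastIdx + 1));
    end
    vS   = abs(y - betaMid) > gamma;    % The support S, constant on this segment
    vE   = sign(y - betaMid);           % The constant sign vector e on this segment
    beta = (sum(y(vS) - gamma * vE(vS)) - b) / sum(vS);        % Closed form root
    x    = sign(y - beta) .* max(abs(y - beta) - gamma, 0);    % Soft Threshold at the root
end
```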
The full MATLAB code with CVX validation is available in my StackExchange Mathematics Q2886713 GitHub Repository.
Remark
Knowing the solution to the above problem (Deriving the Prox Operator) gives us an efficient method (Proximal Gradient Descent) to solve:
$$\begin{aligned}
\arg \min_{x} \quad & \frac{1}{2} {\left\| A x - y \right\|}_{2}^{2} + \gamma {\left\| x \right\|}_{1} \\
\text{subject to} \quad & \boldsymbol{1}^{T} x = b
\end{aligned}$$
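A minimal sketch of such a Proximal Gradient Descent loop (the data, the step size and the prox routine are assumptions, reusing the hypothetical `ProxL1SumConstraint` sketched above):

```matlab
numRows = 20; numCols = 10; numIterations = 500;    % Illustrative dimensions / iterations
A = randn(numRows, numCols); y = randn(numRows, 1); gamma = 0.5; b = 1;
alpha = 1 / norm(A)^2;                  % Step size below 2 / ||A||_2^2
x = zeros(numCols, 1);
for kk = 1:numIterations
    x = x - alpha * (A.' * (A * x - y));            % Gradient step on the smooth term
    x = ProxL1SumConstraint(x, alpha * gamma, b);   % Prox of the scaled L1 term + constraint
end
```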
With the ADMM framework we can even tackle more general cases.
Reference
The solution is based on Efficient Solvers for Sparse Subspace Clustering.
Best Answer
The Problem
Stating the problem in more general form:
$$ \arg \min_{S} f \left( S \right) = \arg \min_{S} \frac{1}{2} \left\| A S {B}^{T} - C \right\|_{F}^{2} $$
The derivative is given by:
$$ \frac{d}{d S} \frac{1}{2} \left\| A S {B}^{T} - C \right\|_{F}^{2} = A^{T} \left( A S {B}^{T} - C \right) B $$
Solution to General Form
The derivative vanishes at:
$$ \hat{S} = \left( {A}^{T} A \right)^{-1} {A}^{T} C B \left( {B}^{T} B \right)^{-1} $$
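In MATLAB this is a one liner (assuming $ A $ and $ B $ have full column rank, so the Gram matrices are invertible):

```matlab
% The unconstrained minimizer of the general form above
S = ((A.' * A) \ (A.' * C * B)) / (B.' * B);
```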
Solution with Diagonal Matrix
The set of diagonal matrices $ \mathcal{D} = \left\{ D \in \mathbb{R}^{m \times n} \mid D = \operatorname{diag} \left( D \right) \right\} $ is a convex set (Easy to prove by definition as any linear combination of diagonal matrices is diagonal).
Moreover, the projection of a given matrix $ Y \in \mathbb{R}^{m \times n} $ onto the set is easy:
$$ X = \operatorname{Proj}_{\mathcal{D}} \left( Y \right) = \operatorname{diag} \left( Y \right) $$
Namely, just zeroing all off diagonal elements of $ Y $.
Hence one could solve the above problem by Projected Gradient Descent, projecting each gradient iteration onto the set of diagonal matrices.
The algorithm will be:
$$ \begin{align*} {S}^{k + \frac{1}{2}} & = {S}^{k} - \alpha A^{T} \left( A {S}^{k} {B}^{T} - C \right) B \\ {S}^{k + 1} & = \operatorname{Proj}_{\mathcal{D}} \left( {S}^{k + \frac{1}{2}} \right)\\ \end{align*} $$
The code:
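A minimal sketch (the data, the step size and the iteration count are assumptions; the full, validated code is in the repository linked at the end):

```matlab
numRows = 8; numCols = 8; numIterations = 500;      % Illustrative dimensions / iterations
A = randn(numRows, numCols); B = randn(numRows, numCols); C = randn(numRows, numRows);
alpha = 1 / (norm(A)^2 * norm(B)^2);    % Step size below 2 / (||A||_2^2 ||B||_2^2)
S = zeros(numCols, numCols);            % Initialization
for kk = 1:numIterations
    S = S - alpha * A.' * (A * S * B.' - C) * B;    % Gradient step
    S = S .* eye(size(S));              % Projection: zero the off diagonal elements
end
```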
Solution with Diagonal Structure
The problem can be written as:
$$ \arg \min_{s} f \left( s \right) = \arg \min_{s} \frac{1}{2} \left\| A \operatorname{diag} \left( s \right) {B}^{T} - C \right\|_{F}^{2} = \arg \min_{s} \frac{1}{2} \left\| \sum_{i} {s}_{i} {a}_{i} {b}_{i}^{T} - C \right\|_{F}^{2} $$
Where $ {a}_{i} $ and $ {b}_{i} $ are the $ i $ -th column of $ A $ and $ B $ respectively. The term $ {s}_{i} $ is the $ i $ -th element of the vector $ s $.
The derivative is given by:
$$ \frac{d}{d {s}_{j}} f \left( s \right) = {a}_{j}^{T} \left( \sum_{i} {s}_{i} {a}_{i} {b}_{i}^{T} - C \right) {b}_{j} $$
Note to Readers: If you know how to vectorize this structure, namely writing the derivative such that the output is a vector of the same size as $ s $, please add it.
By setting it to zero, or by using Gradient Descent, one could find the optimal solution.
The code:
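A minimal sketch of Gradient Descent over $ s $ (the data, step size and iteration count are assumptions). The element wise derivatives above can be collected into a vector as the diagonal of $ {A}^{T} \left( A \operatorname{diag} \left( s \right) {B}^{T} - C \right) B $:

```matlab
numRows = 8; numElements = 8; numIterations = 500;  % Illustrative dimensions / iterations
A = randn(numRows, numElements); B = randn(numRows, numElements); C = randn(numRows, numRows);
alpha = 1 / (norm(A)^2 * norm(B)^2);    % Step size below 2 / (||A||_2^2 ||B||_2^2)
s = zeros(numElements, 1);
for kk = 1:numIterations
    mR = A * diag(s) * B.' - C;         % Residual matrix
    vG = diag(A.' * mR * B);            % Vectorized gradient with respect to s
    s  = s - alpha * vG;                % Gradient step
end
```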
Remark
A per element update (which can be iterated in a coordinate descent fashion) is obtained by setting the derivative with respect to $ {s}_{j} $ to zero:
$$ {s}_{j} = \frac{ {a}_{j}^{T} C {b}_{j} - \sum_{i \neq j} {s}_{i} \left( {a}_{j}^{T} {a}_{i} \right) \left( {b}_{i}^{T} {b}_{j} \right) }{ { \left\| {a}_{j} \right\| }_{2}^{2} { \left\| {b}_{j} \right\| }_{2}^{2} } $$
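A minimal sketch of cyclic coordinate updates based on this expression (reusing `A`, `B`, `C` and `numIterations` from the sketch above; variable names are assumptions):

```matlab
mGA = A.' * A;  mGB = B.' * B;  mT = A.' * C * B;   % Cached inner products
s   = zeros(size(A, 2), 1);
for kk = 1:numIterations
    for jj = 1:numel(s)
        vIdx  = [1:(jj - 1), (jj + 1):numel(s)];    % All indices except jj
        s(jj) = (mT(jj, jj) - (mGA(jj, vIdx) .* mGB(jj, vIdx)) * s(vIdx)) / (mGA(jj, jj) * mGB(jj, jj));
    end
end
```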
Summary
Both methods work and converge to the optimal value (validated against CVX) as the problems above are convex.
The full MATLAB code with CVX validation is available in my StackExchange Mathematics Q2421545 GitHub Repository.