Slide to be parsed
Here's the breakdown of the text runs
Text Before: Token fun
Text Before: Sometimes a [@Token] string will get broken up.
Text Before: When spell check flags a misspelled word the
Text Before: [@
Text Before: MisspelledToken
Text Before: ] gets split between multiple text blocks
Text Before: Sometimes a [@Token] string will get broken up.
Text Before: When spell check flags a misspelled word the
Text Before: [@
Text Before: MisspelledToken
Text Before: ] gets split between multiple text blocks
As you can see, one token is spread across multiple text runs and therefore doesn't get replaced. This can be a major hang-up. A quick fix for this particular example relies on one observation: when a start-token sits at the end of a text run, the next two text runs hold the token itself and then the end-token. So I can append those three together and then replace the token as necessary.
// Usings assumed by this snippet (not shown in the original listing)
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using Drawing = DocumentFormat.OpenXml.Drawing;

void SubstituteText(SlidePart slidePart, string tokenStart, string tokenEnd, string tokenValue, string subValue)
{
    string token = tokenStart + tokenValue + tokenEnd;
    int tokenStartId = 0;
    // Collect all the paragraphs containing the token
    List<Drawing.Paragraph> paragraphList = slidePart.Slide
        .Descendants<Drawing.Paragraph>()
        .Where(t => t.InnerText.Contains(token)).ToList();
    foreach (Drawing.Paragraph p in paragraphList)
    {
        // Collect all the text elements
        List<Drawing.Text> textList = p
            .Descendants<Drawing.Text>()
            .ToList();
        // Iterate over a copy (textList shrinks below) and find the tokenStart
        // text block, or replace the text if the whole token is found
        foreach (Drawing.Text t in new List<Drawing.Text>(textList))
        {
            tokenStartId = textList.IndexOf(t);
            // Only merge if t is still in the list and two more text blocks follow it
            if (t.Text.EndsWith(tokenStart) && tokenStartId >= 0 && tokenStartId + 2 < textList.Count)
            {
                // Append the next two text segments and remove them
                Drawing.Text appendText = textList[tokenStartId + 1];
                t.Text = t.Text + appendText.Text;
                // Must remove at the Drawing.Run level
                appendText.Parent.Remove();
                appendText = textList[tokenStartId + 2];
                t.Text = t.Text + appendText.Text;
                // Must remove at the Drawing.Run level
                appendText.Parent.Remove();
                // Remove the higher index first so the lower one does not shift
                textList.RemoveAt(tokenStartId + 2);
                textList.RemoveAt(tokenStartId + 1);
            }
            // Substitute text
            t.Text = t.Text.Replace(token, subValue);
        }
    }
}
After running this, I get the desired output.
.pptx after substitution
Breakdown of the text runs after substitution
Text After: Token fun
Text After: Sometimes a TOKEN string will get broken up.
Text After: When spell check flags a misspelled word the
Text After: BROKEN TOKEN gets split between multiple text blocks
Text After: Sometimes a TOKEN string will get broken up.
Text After: When spell check flags a misspelled word the
Text After: BROKEN TOKEN gets split between multiple text blocks
Now, this doesn't cure all ills. I have also seen cases where, after going back and editing existing text, the start-token and token stay together but the end-token is separated into another text run. We also can't just take all the runs of a paragraph, smash them together into a single run, and then substitute. That would discard any text styling applied within the paragraph. I have tried parsing a paragraph so that it compares the RunProperties of consecutive runs to decide whether it's valid to append them together. I am still working on this and will present the final solution at a later time.
Hopefully this post has revealed some of the nuances of text substitution and shown that it isn't always straightforward.
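To make the RunProperties idea more concrete, here is a minimal sketch of it, not the final solution mentioned above. It deliberately avoids the Open XML SDK: RunModel and RunMerger are hypothetical names, and the Format string is an assumed stand-in for whatever comparison of two runs' RunProperties turns out to be appropriate (for example, their serialized XML). Adjacent runs are appended only when their formatting matches, and the token is substituted afterwards.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a Drawing.Run: its text plus a string that
// describes its formatting (an assumption, e.g. serialized RunProperties).
public sealed class RunModel
{
    public string Text;
    public string Format;

    public RunModel(string text, string format)
    {
        Text = text;
        Format = format;
    }
}

public static class RunMerger
{
    // Append each run to its predecessor when both have identical formatting,
    // so a token split across same-styled runs ends up in one piece.
    public static List<RunModel> MergeAdjacent(IEnumerable<RunModel> runs)
    {
        var merged = new List<RunModel>();
        foreach (RunModel run in runs)
        {
            if (merged.Count > 0 && merged[merged.Count - 1].Format == run.Format)
            {
                // Same styling as the previous run: safe to append
                merged[merged.Count - 1].Text += run.Text;
            }
            else
            {
                // Different styling: keep it as a separate run
                merged.Add(new RunModel(run.Text, run.Format));
            }
        }
        return merged;
    }

    // Merge first, then replace the token inside each merged run.
    public static List<RunModel> Substitute(IEnumerable<RunModel> runs, string token, string subValue)
    {
        List<RunModel> merged = MergeAdjacent(runs);
        foreach (RunModel run in merged)
        {
            run.Text = run.Text.Replace(token, subValue);
        }
        return merged;
    }
}
```

Given four identically formatted runs "word the ", "[@", "MisspelledToken", "] gets split" followed by one bold run, RunMerger.Substitute(runs, "[@MisspelledToken]", "BROKEN TOKEN") returns two runs, the first reading "word the BROKEN TOKEN gets split" and the second being the untouched bold run. It does not matter which part of the token was split off, since any same-styled neighbours are joined. A token whose pieces carry different formatting is still left unreplaced in this sketch.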
1 comment:
PowerPoint may also split the token if the user clicks in the middle of the word, and on some other uncontrollable events. I had to deal with that at work. I used a normaliser approach there, where I renormalize every run so that it contains either exactly the whole token or no token. If you care about formatting, you have to copy the rPr tag over to the new runs. Take care of the endParaRPr properties as well. I ended up with a relatively complex finite state machine to do it. But I had to support round-trip editing (ppt -> our app -> ppt -> ...), so you might get away with a simpler version.