Tuesday, December 16, 2008

SpreadsheetML - Convert column integer index to column alpha value(i.e. A, AA, BH)

Here's a quick snippet I found in a forum post at http://stackoverflow.com/questions/181596/how-to-convert-a-column-number-eg-127-into-an-excel-column-eg-aa. Many times in Excel I want to use a column index value instead of the letter representation(i.e A, AA, BH). I don't have control of the Excel so I can't change the representation. This method returns the letter representation from an integer column value.



protected string GetExcelColumnName(int columnNumber)
{
int dividend = columnNumber;
string columnName = String.Empty;
int modulo;

while (dividend > 0)
{
modulo = (dividend - 1) % 26;
columnName = Convert.ToChar(65 + modulo).ToString() + columnName;
dividend = (int)((dividend - modulo) / 26);
}

return columnName;
}

SpreadsheetML - Retrieving SharedString values

Retrieving cell values in Excel spreadsheets is very straight forward, especially when string values are stored inline. Other times the string values are stored in the SharedStringTable collection. If the CellType is a SharedString. then the cell value references the index in the SharedStringTable. Below is a quick snippet that will return the SharedString value if that is the case, otherwise the inline string value.

protected string GetCellValue(WorkbookPart wbPart, string targetCell)
{
string result = null;
WorksheetPart wsPart = wbPart.WorksheetParts.First();
SheetData sheetData = wsPart.Worksheet.GetFirstChild();
SharedStringTablePart sstPart = wbPart.GetPartsOfType().First();
SharedStringTable ssTable = sstPart.SharedStringTable;

try
{
Cell cell = wsPart.Worksheet.Descendants()
.Where(c => targetCell.Equals(c.CellReference))
.First();

if (cell.DataType != null && cell.DataType == CellValues.SharedString)
{
result = ssTable.ChildElements[Convert.ToInt32(cell.CellValue.Text)].InnerText;
}
else
{
if (cell.CellValue != null)
{
result = cell.CellValue.Text;
}
}
}
catch (Exception)
{

}

return result;
}

Monday, November 10, 2008

Powerpoint spell check and text substitution problem

A situtation can occur when identifying substitution text in a presentation. In my example the text token is surrounded with a start/end token(i.e. [@token]). If the token is a word in the Office dictionary then there is no problem. On the other hand if the token word isn't in the dictionary and gets the red underline from spellcheck then Powerpoint breaks the string into multiple text runs. It's put into multiple runs because the spellchecked text has a RunProperty that identifies it as being a spelling error and must be repressented by the schema as such. The following will show the issue.


Slide to be parsed


Here's the breakdown of the text runs
Text Before: Token fun
Text Before: Sometimes a [@Token] string will get broken up.
Text Before: When spell check flags a misspelled word the
Text Before: [@
Text Before: MisspelledToken
Text Before: ] gets split between multiple text blocks


As you can see there is a token that gets spanned across multiple text runs and thus doesn't get replaced. This can be a major hang up. A quick solution for this one example is that I know when there is a start-token at the end of a text run then the next couple text runs are the token itself and then the end-token. So I can append those three together and then replace the token as necessary.


void SubstituteText(SlidePart slidePart, string tokenStart, string tokenEnd, string tokenValue, string subValue)
{
string token = tokenStart + tokenValue + tokenEnd;
int tokenStartId = 0;

//Collect all the Paragraph sections containing the token
List paragraphList = slidePart.Slide
.Descendants()
.Where(t => t.InnerText.Contains(token)).ToList();

foreach (Drawing.Paragraph p in paragraphList)
{
//Collect all the Text
List textList = p
.Descendants()
.ToList();

//Iterate and find tokenStart Text block or replace text if whole token found
foreach (Drawing.Text t in new List(p.Descendants().ToList()))
{
if (t.Text.EndsWith(tokenStart))
{
tokenStartId = textList.IndexOf(t);

//append next two text segments and remove them
Drawing.Text appendText = textList[tokenStartId + 1];
t.Text = t.Text + appendText.Text;
//must remove at the Drawing.Run level
appendText.Parent.Remove();
appendText = textList[tokenStartId + 2];
t.Text = t.Text + appendText.Text;
//must remove at the Drawing.Run level
appendText.Parent.Remove();

textList.RemoveAt(tokenStartId + 1);
textList.RemoveAt(tokenStartId + 2);
}
//substitute text
t.Text = t.Text.Replace(token, subValue);
}

}
}


After running this I get the following output as desired.

.pptx after substitution


Breakdown of the text runs after substitution
Text After: Token fun
Text After: Sometimes a TOKEN string will get broken up.
Text After: When spell check flags a misspelled word the
Text After: BROKEN TOKEN gets split between multiple text blocks

Now this doesn't cure all ails. I have also seen it where after going back and making changes to existing text that sometimes the start-token and token are together but the end-token is seperated to another text run. We also can't just take all the runs of a paragraph and smash them together into a single run and then substitute. This would ignore any text styling that was done within the paragraph. I have tried parsing a paragraph such that it looks at the RunProperties between consequetive runs to see if it's valid to append them together. I am still working on this and will present the final solution at a later time.

Hopefully this post has revealed some nuances of doing text substitution and that it isn't always straight forward.

Wednesday, October 29, 2008

PSC participates at the DII Workshop in Redmond

Last week i attended the DII workshop in Redmond with John Head and David Munaretto. We had a great opportunity to present our case study for server-side Powerpoint document generation as well as participate in discussions across many topics. Doug Mahugh has posted some great details about the entire DII workshop.

Topics covered ranged from document validator tools, document test library, document taxonomy, and OOXML development. This was my first time at a DII event and it was great to meet so many members of the Office product groups as well as other developers and "standards wonks" (credit for that description to Alex Brown). Interoperability dominates most the discussions, because in the end it's the users that matter most, not us developers. In this regard Microsoft is taking a nice step in attempting to approach the community not only with their successes, but also their weaknesses. Shawn Villaron was able share some examples of how they plan to document their format implementations to the public, creating an openess that will be much appreciated going forward.

The highlight of the workshop for me was John and I presenting PSC's case study of server-side document generation (Click to view slides). I was very pleased with the all feedback we received from both the Microsoft groups as well as the other developers. It's exciting to be on the tip of an emerging technology, something that isn't always easy. Both the Powerpoint and OOXML teams listened to our frustrations and made it clear that they are there to help.

My biggest gripe doing PPTX has been the lack of documentation and content that exists out on the web. Now that I have some scenarios that work i plan to try and fill that gap and help the community get on track using OOXML. I will be trying out some new functionality within the 2.0 SDK and will be posting my results. Some topics I plan to post about are content-substitution versus content-generation and where a good mix is between the two, conditional formatting within Powerpoint table data (not Excel sheets) as well as dealing with the finer aspects of Powerpoint including themes and smart art.

Visual Studio 2010 and .NET 4.0 CTP available

Just got done moving into my new place, so I have been behind in getting my posts out about the DII workshop last week in Redmond, it's still coming. I wanted to pass this piece of news along if you haven't already seen. A Virtual PC image of VS 2010 with the 4.0 framework is available HERE. Microsoft is working hard getting feedback early and often, so go through the walkthroughs and post in the related forums. Enjoy!

Wednesday, October 22, 2008

New Case Study for OOXML

PSC Group, LLC has published a case study for using OOXML, specifically for Powerpoint, as an enterprise reporting framework, click through to read.

"Opening Access to Office Information with OOXML"