fix: rewrite transformer(OOXML->CiceroMark)#421
Conversation
Remove old transformer logic Travserse the xml nodes in DFS manner Construct the ciceroMark of a node using properties Signed-off-by: K-Kumar-01 <kushalkumargupta4@gmail.com>
|
@algomaster99 @dselman |
|
If the tests fail, there must be something wrong with the logic. Let the tests speak for your code. That's why I stress on rigorous testing. The more unique test cases you add, the better your code becomes. |
By test fail, I mean that the current transformation is not complete, so for that the test fail. For the conversion |
Add heading transformation logic Create function to construct Nodes Signed-off-by: K-Kumar-01 <kushalkumargupta4@gmail.com>
Logic to transform variable OOXML tags Functions: getName, getElementType Signed-off-by: K-Kumar-01 <kushalkumargupta4@gmail.com>
Use a function to traverse runTime nodes use function only when the node is <w:r/> Signed-off-by: K-Kumar-01 <kushalkumargupta4@gmail.com>
Signed-off-by: K-Kumar-01 <kushalkumargupta4@gmail.com>
Signed-off-by: K-Kumar-01 <kushalkumargupta4@gmail.com>
Logic fix For loop range change Signed-off-by: K-Kumar-01 <kushalkumargupta4@gmail.com>
Check for element existence before using Signed-off-by: K-Kumar-01 <kushalkumargupta4@gmail.com>
|
@algomaster99 @dselman |
Pass offline:true in model Manager Signed-off-by: K-Kumar-01 <kushalkumargupta4@gmail.com>
algomaster99
left a comment
There was a problem hiding this comment.
Left some comments. I have more questions than changes, so please address them when you get time. Just one more question - currently, you're only dealing with nesting of bold and italic elements, right?
So I couldn't properly test it using my custom inputs, probably because there is no CLI for this right now, and I don't want to log infinitely long strings in my terminal to check the output. This is an indication that we need to integrate these transformers with CLI as soon as possible.
| ...this.JSONXML[i], | ||
| properties: [...this.JSONXML[i].properties.slice(commonLength)], | ||
| }; | ||
| constructedNode = this.construtCiceroMarkNodeJSON(updatedProperties); |
There was a problem hiding this comment.
Could you explain why are you calling this method twice in the same block? Just wanted to understand the functioning.
There was a problem hiding this comment.
- Call for the 1st element in
this.JSONXML. - Call for the remaining elements present in the above array using for loop
| if (property.name === 'w:t') { | ||
| return property.elements[0].text; | ||
| } | ||
| construtCiceroMarkNodeJSON(nodeInformation) { |
There was a problem hiding this comment.
construtCiceroMarkNodeJSON and constructNodes purposes are not clear from just reading the names of the methods or even docstrings. You have even mentioned or block CiceroMark Node in docstrings of constructNodes so why was construtCiceroMarkNodeJSON necessary in the first place?
Perusing your code made a little clearer that former creates atomic nodes like text, softbreak, etc. While the latter creates blocks for heading paragraph. So just write their method names properly without relying too much on the docstrings.
There was a problem hiding this comment.
The spelling of "construct" is wrong. I have suggested this before and I will suggest again using an IDE or a VSCode plugin for spell checkers.
There was a problem hiding this comment.
construtCiceroMarkNodeJSONandconstructNodespurposes are not clear from just reading the names of the methods or even docstrings. You have even mentionedor block CiceroMark Nodein docstrings ofconstructNodesso why wasconstrutCiceroMarkNodeJSONnecessary in the first place?Perusing your code made a little clearer that former creates atomic nodes like text, softbreak, etc. While the latter creates blocks for heading paragraph. So just write their method names properly without relying too much on the docstrings.
construtCiceroMarkNodeJSON: It creates a text, softbreak or any other inline node
constructNodes:This creates all the nodes present in ap paragraph, or heading node.
This it the exact use of the above.
| } else if (nodeInformation.nodeType === 'variable') { | ||
| obj = { | ||
| $class: `${NS_PREFIX_CiceroMarkModel}Variable`, | ||
| value: nodeInformation.value || 'Not provided', |
There was a problem hiding this comment.
If the value of the variable is '', you transformer will return 'Not provided. So I think you can omit the or condition.
Moreover, I don't think so a variable entity would be without these attributes ever so you should omit all the or conditions. However, if this was possible CiceroMark should have defined a default empty value for each of these attributes.
There was a problem hiding this comment.
This was done in the case that maybe a user uses sdt in word first and then tries to convert it in a ciceroMark. Although it won't be a variable and I agree here so I will remove this.
Return empty object if certain properties not present in variable Naming and dosctrings Refactor Signed-off-by: K-Kumar-01 <kushalkumargupta4@gmail.com>
|
@algomaster99 |
| $class: `${NS_PREFIX_CommonMarkModel}Softbreak`, | ||
| }; | ||
| } else if (nodeInformation.nodeType === 'variable') { | ||
| if (nodeInformation.elementType && nodeInformation.name && nodeInformation.value) { |
There was a problem hiding this comment.
I don't think so you need this check. There won't be any case where any of them is not there if a variable is present.
There was a problem hiding this comment.
There was a case present. This is the reason i used above.
There was a problem hiding this comment.
Is this case handled in any test case? If not, please push it.
There was a problem hiding this comment.
Actually, the test case file is deleted. I will try to recreate a similar test
There was a problem hiding this comment.
@algomaster99
I have pushed a commit which involves the test for this.
There was a problem hiding this comment.
{
"$class": "org.accordproject.commonmark.Document",
"xmlns": "http://commonmark.org/xml/1.0",
"nodes": [
{
"$class": "org.accordproject.commonmark.Heading",
"level": "2",
"nodes": [
{
"$class": "org.accordproject.commonmark.Text",
"text": "Try TemplateMark"
}
]
},
{
"$class": "org.accordproject.commonmark.Paragraph",
"nodes": [
{
"$class": "org.accordproject.commonmark.Text",
"text": "Just trying "
},
{
"$class": "org.accordproject.commonmark.Strong",
"nodes": [
{
"$class": "org.accordproject.commonmark.Text",
"text": "few things "
},
{
"$class": "org.accordproject.commonmark.Emph",
"nodes": [
{
"$class": "org.accordproject.commonmark.Text",
"text": "here"
}
]
}
]
},
{
"$class": "org.accordproject.commonmark.Softbreak"
},
{
"$class": "org.accordproject.commonmark.Strong",
"nodes": [
{
"$class": "org.accordproject.commonmark.Text",
"text": "hello"
}
]
},
{
"$class": "org.accordproject.commonmark.Text",
"text": " "
},
{
"$class": "org.accordproject.commonmark.Strong",
"nodes": [
{
"$class": "org.accordproject.commonmark.Text",
"text": "there"
}
]
}
]
}
]
}This is the test case you pushed but I cannot see any variable entity. So your program shouldn't even enter this if branch.
There was a problem hiding this comment.
To further prove my point, check this step out. The else branch is not being covered also.
| this.JSONXML = []; | ||
|
|
||
| // nodes of a paragraph or heading element | ||
| this.nodes = []; |
There was a problem hiding this comment.
Any reason why these are instance variables? Their content would be cleared after each block element is traversed so I think it would be better to pass such elements as parameters to functions where you need them.
There was a problem hiding this comment.
I was not able to properly use them in a function that is the solo reason i used them as above.
Signed-off-by: K-Kumar-01 <kushalkumargupta4@gmail.com>
| $class: `${NS_PREFIX_CommonMarkModel}Softbreak`, | ||
| }; | ||
| } else if (nodeInformation.nodeType === 'variable') { | ||
| if (nodeInformation.elementType && nodeInformation.name && nodeInformation.value) { |
There was a problem hiding this comment.
{
"$class": "org.accordproject.commonmark.Document",
"xmlns": "http://commonmark.org/xml/1.0",
"nodes": [
{
"$class": "org.accordproject.commonmark.Heading",
"level": "2",
"nodes": [
{
"$class": "org.accordproject.commonmark.Text",
"text": "Try TemplateMark"
}
]
},
{
"$class": "org.accordproject.commonmark.Paragraph",
"nodes": [
{
"$class": "org.accordproject.commonmark.Text",
"text": "Just trying "
},
{
"$class": "org.accordproject.commonmark.Strong",
"nodes": [
{
"$class": "org.accordproject.commonmark.Text",
"text": "few things "
},
{
"$class": "org.accordproject.commonmark.Emph",
"nodes": [
{
"$class": "org.accordproject.commonmark.Text",
"text": "here"
}
]
}
]
},
{
"$class": "org.accordproject.commonmark.Softbreak"
},
{
"$class": "org.accordproject.commonmark.Strong",
"nodes": [
{
"$class": "org.accordproject.commonmark.Text",
"text": "hello"
}
]
},
{
"$class": "org.accordproject.commonmark.Text",
"text": " "
},
{
"$class": "org.accordproject.commonmark.Strong",
"nodes": [
{
"$class": "org.accordproject.commonmark.Text",
"text": "there"
}
]
}
]
}
]
}This is the test case you pushed but I cannot see any variable entity. So your program shouldn't even enter this if branch.
|
@algomaster99 |
Signed-off-by: K-Kumar-01 <kushalkumargupta4@gmail.com>
|
@algomaster99 |
| $class: `${NS_PREFIX_CommonMarkModel}Softbreak`, | ||
| }; | ||
| } else if (nodeInformation.nodeType === 'variable') { | ||
| if (nodeInformation.elementType && nodeInformation.name && nodeInformation.value) { |
There was a problem hiding this comment.
To further prove my point, check this step out. The else branch is not being covered also.
| if (propertiesCurrent[j] === propertiesPrevious[j]) { | ||
| commonPropertiesLength++; | ||
| } else { | ||
| break; |
There was a problem hiding this comment.
This is also not covered by the coverage tool. Either your tests are weak or this is unnecessary.
There was a problem hiding this comment.
@algomaster99
else is not used currently since I am not having more inline elements which may cause this case to be called.
There was a problem hiding this comment.
Handle it later then. If, suppose, you don't require it later, this branch would dangle in the codebase without any purpose.
There was a problem hiding this comment.
@algomaster99
Updated the test case it now covers above.
Didn't get you. Could you please be clearer? |
I mean you also checked for the name, variable and type and then only defined that as variable. case 'w:sdt': {
const name = this.getName(element.elements);
const value = this.getValue(element.elements);
const elementType = this.getElementType(element.elements);
if (name && value && elementType) {
return {
$class: 'org.accordproject.ciceromark.Variable',
value,
name,
elementType,
};
}
return this.deserializeElements(element.elements);
} |
Well, you are here to improve the code, right? If the justification of your change is "Aman has done like that", you won't be able to improve it much. I wrote a broken transformer last year but take inspiration from it and please question things if they are now necessary or not. |
Sure, but I also have a doubt regarding this. Let's say anyone who used docx first and then he wanted to convert to ciceroMark. |
That's only possible when someone is writing the entire OOXML on their own. Or generating it and purposefully deleting its attribute. In both cases, it's the client's fault and we cannot do anything about it. |
| @@ -0,0 +1 @@ | |||
| {"$class":"org.accordproject.commonmark.Document","xmlns":"http://commonmark.org/xml/1.0","nodes":[{"$class":"org.accordproject.commonmark.Heading","level":"2","nodes":[{"$class":"org.accordproject.commonmark.Text","text":"Try TemplateMark"}]},{"$class":"org.accordproject.commonmark.Paragraph","nodes":[{"$class":"org.accordproject.commonmark.Text","text":"Just trying "},{"$class":"org.accordproject.commonmark.Strong","nodes":[{"$class":"org.accordproject.commonmark.Text","text":"few things "},{"$class":"org.accordproject.commonmark.Emph","nodes":[{"$class":"org.accordproject.commonmark.Text","text":"here"}]}]},{"$class":"org.accordproject.commonmark.Softbreak"},{"$class":"org.accordproject.commonmark.Strong","nodes":[{"$class":"org.accordproject.commonmark.Text","text":"hello"}]},{"$class":"org.accordproject.commonmark.Text","text":" "},{"$class":"org.accordproject.commonmark.Strong","nodes":[{"$class":"org.accordproject.commonmark.Text","text":"there"}]}]}]} No newline at end of file | |||
There was a problem hiding this comment.
Is this test doing something different than the test you proposed in #418 ?
There was a problem hiding this comment.
@algomaster99
it can explain why ternary operator is used in line 195.
There was a problem hiding this comment.
Are you talking about this line (} else if (runTimeProperties.name === 'w:b') {)? This doesn't have a ternary operator.
There was a problem hiding this comment.
nodeInformation.value = runTimeNodes.elements ? runTimeNodes.elements[0].text : ' ';
This line. I think it got shifted a bit in the latest commit.
There was a problem hiding this comment.
There was a problem hiding this comment.
The test is a special case. It does not increase coverage, but is there to tell why ternary operator is required and not simply write the expression runTimeNodes.elements[0].text
There was a problem hiding this comment.
Okay, now it makes sense.
Signed-off-by: K-Kumar-01 <kushalkumargupta4@gmail.com>
|
@algomaster99 |
Signed-off-by: K-Kumar-01 <kushalkumargupta4@gmail.com>
|
@algomaster99 |
| @@ -0,0 +1 @@ | |||
| {"$class":"org.accordproject.commonmark.Document","xmlns":"http://commonmark.org/xml/1.0","nodes":[{"$class":"org.accordproject.commonmark.Paragraph","nodes":[{"$class":"org.accordproject.commonmark.Text","text":"just trying "},{"$class":"org.accordproject.commonmark.Strong","nodes":[{"$class":"org.accordproject.commonmark.Text","text":"strong"}]},{"$class":"org.accordproject.commonmark.Emph","nodes":[{"$class":"org.accordproject.commonmark.Text","text":"italic"}]}]}]} No newline at end of file | |||
There was a problem hiding this comment.
Sometimes I wish JSON had comments but anyway, this sounds good.
| @@ -0,0 +1 @@ | |||
| {"$class":"org.accordproject.commonmark.Document","xmlns":"http://commonmark.org/xml/1.0","nodes":[{"$class":"org.accordproject.commonmark.Heading","level":"2","nodes":[{"$class":"org.accordproject.commonmark.Text","text":"Try TemplateMark"}]},{"$class":"org.accordproject.commonmark.Paragraph","nodes":[{"$class":"org.accordproject.commonmark.Text","text":"Just trying "},{"$class":"org.accordproject.commonmark.Strong","nodes":[{"$class":"org.accordproject.commonmark.Text","text":"few things "},{"$class":"org.accordproject.commonmark.Emph","nodes":[{"$class":"org.accordproject.commonmark.Text","text":"here"}]}]},{"$class":"org.accordproject.commonmark.Softbreak"},{"$class":"org.accordproject.commonmark.Strong","nodes":[{"$class":"org.accordproject.commonmark.Text","text":"hello"}]},{"$class":"org.accordproject.commonmark.Text","text":" "},{"$class":"org.accordproject.commonmark.Strong","nodes":[{"$class":"org.accordproject.commonmark.Text","text":"there"}]}]}]} No newline at end of file | |||
There was a problem hiding this comment.
You aren't parsing the "ternary operator in text". You are parsing text nodes that only have a space. One question here - do you have an idea when MS Word interprets something as space and when as a softbreak? Shouldn't this space have been a softbreak instead? If no, could you send the markdown file you parsed this CiceroMark from?
There was a problem hiding this comment.
You aren't parsing the "ternary operator in text". You are parsing text nodes that only have a space. One question here - do you have an idea when MS Word interprets something as space and when as a softbreak? Shouldn't this space have been a softbreak instead? If no, could you send the markdown file you parsed this CiceroMark from?
I am not sure but I think xmljs might not consider space string as elements. Otherwise, it won't throw cannot read property 0 of undefined(AFAIR).
There was a problem hiding this comment.
You aren't parsing the "ternary operator in text". You are parsing text nodes that only have a space.
Said this with respect to the file name of the test case.
There was a problem hiding this comment.
Yeah, I got that. I was just explaining the reason for the space thing which might be due to xmljs. I will rename the file.
I mistakenly quoted the complete statement instead of the line
One question here - do you have an idea when MS Word interprets something as space and when as a softbreak? Shouldn't this space have been a softbreak instead?
There was a problem hiding this comment.
I am not sure but I think xmljs might not consider space string as elements. Otherwise, it won't throw cannot read property 0 of undefined(AFAIR).
I don't think so XMLJS is screwing up here. It's the template mark dingus because it should have interpreted that space as softbreak.
Anyway, if there are cases like this, then it makes sense to have that ternary operator.
There was a problem hiding this comment.
@algomaster99
Made the push. Is the current name correct?
Rename file to suit test file and not code execution Signed-off-by: K-Kumar-01 <kushalkumargupta4@gmail.com>

Signed-off-by: K-Kumar-01 kushalkumargupta4@gmail.com
This PR deals with rewriting the current OOXML transformer so that it supports nesting.
Changes
Major changes will be as follows:
Author Checklist
--signoffoption of git commit.masterfromfork:branchname