forked from mwilliamson/python-mammoth
-
Notifications
You must be signed in to change notification settings - Fork 0
/
NEWS
302 lines (163 loc) · 6.12 KB
/
NEWS
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
# 1.4.7
* Always write files as UTF-8 in the CLI.
# 1.4.6
* Fix: default style mappings caused footnotes, endnotes and comments
containing multiple paragraphs to be converted into a single paragraph.
# 1.4.5
* Read the children of v:rect elements.
# 1.4.4
* Parse paragraph indents.
* Read part paths using relationships. This improves support for documents
created by Word Online.
# 1.4.3
* Add style mapping for small caps.
* Add style mapping for tables.
# 1.4.2
* Read children of v:group elements.
# 1.4.1
* Read w:noBreakHyphen elements as non-breaking hyphen characters.
# 1.4.0
* Extract the default data URI image converter to the images module.
* Add anchor on hyperlinks as fragment if present.
* Convert target frames on hyperlinks to targets on anchors.
* Detect header rows in tables and convert to thead > tr > th.
# 1.3.5
* Handle complex fields that do not have a "separate" fldChar.
# 1.3.4
* Add transforms.run.
# 1.3.3
* Read children of w:object elements.
* Add support for document transforms.
# 1.3.2
* Handle hyperlinks created with complex fields.
# 1.3.1
* Handle absolute paths within zip files. This should fix an issue where some
images within a document couldn't be found.
# 1.3.0
* Allow style names to be mapped by prefix. For instance:
r[style-name^='Code '] => code
* Add default style mappings for Heading 5 and Heading 6.
* Allow escape sequences in style IDs, style names and CSS class names.
* Allow a separator to be specified when HTML elements are collapsed.
* Add include_embedded_style_map argument to allow embedded style maps to be
disabled.
* Include embedded styles when explicit style map is passed.
# 1.2.2
* Ignore bold, italic, underline and strikethrough elements that have a value of
false or 0.
# 1.2.1
* Ignore v:imagedata elements without relationship ID with warning.
# 1.2.0
* Use alt text title as alt text for images when the alt text description is
blank or missing.
# 1.1.1
* Handle comments without author initials.
* Change numbering of comments to be global rather than per-user to match the
behaviour of Word.
# 1.1.0
* Add support for comments.
# 1.0.4
* Add support for w:sdt elements. This allows the bodies of content controls,
such as bibliographies, to be converted.
# 1.0.3
* Add support for table cells spanning multiple rows.
# 1.0.2
* Add support for table cells spanning multiple columns.
# 1.0.1
* Improve script installation on Windows by using entry_points instead of
scripts in setup.py.
# 1.0.0
* Remove deprecated convert_underline argument.
* Officially support ID prefixes.
* Generated IDs no longer insert a hyphen after the ID prefix.
* The default ID prefix is now the empty string rather than a random number
followed by a hyphen.
* Rename mammoth.images.inline to mammoth.images.img_element to better reflect
its behaviour.
# 0.3.31
* Improve collapsing of similar non-fresh HTML elements.
# 0.3.30
* Allow bold and italic style mappings to be configured.
# 0.3.29
* Handle references to missing styles when reading documents.
# 0.3.28
* Improve support for lists made in LibreOffice. Specifically, this changes the
default style mapping for paragraphs with a style of "Normal" to have the
lowest precedence.
# 0.3.27
* Handle XML where the child nodes of an element contains text nodes.
# 0.3.26
* Always use mc:Fallback when reading mc:AlternateContent elements.
# 0.3.25
* Remove duplicate messages from results.
* Read v:imagedata with r:id attribute.
* Read children of v:roundrect.
* Ignore office-word:wrap, v:shadow and v:shapetype.
# 0.3.24
* Continue with warning if external images cannot be found.
* Add support for embedded style maps.
# 0.3.23
* Fix Python 3 support.
# 0.3.22
* Generate warnings for not-understood style mappings and continue, rather than
stopping with an error.
* Support file objects without a name attribute again (broken since 0.3.20).
# 0.3.21
* Ignore w:numPr elements without w:numId or w:ilvl children.
# 0.3.20
* Add support for linked images.
# 0.3.19
* Fix: cannot extract raw text from elements without children
# 0.3.18
* Support links and images in footnotes and endnotes.
# 0.3.17
* Add support for underlines in style map.
* Add support for strikethrough.
# 0.3.16
* Add basic support for text boxes. The contents of the text box are treated as
a separate paragraph that appears after the paragraph containing the text box.
# 0.3.15
* Support styles defined without a name
# 0.3.14
* Add ignore_empty_paragraphs option, which defaults to True.
# 0.3.13
* Always use forward slashes in ZIP paths. This should fix image handling on
Windows.
# 0.3.12
* Make style names case-insensitive in style mappings. This should make style
mappings easier to write, especially since Microsoft Word sometimes represents
style names in the UI differently from in the style definition. For instance,
the style displayed in Word as "Heading 1" has a style name of "heading 1".
This hopefully shouldn't cause an issue for anyone, but if you were relying
on case-sensitivity, please do get in touch.
# 0.3.11
* Add support for hyperlinks to bookmarks in the same document.
# 0.3.10
* Add basic support for Markdown. Not all features are currently supported.
# 0.3.9
* Add default style mappings for builtin footnote and endnote styles in
Microsoft Word and LibreOffice.
* Allow style mappings with a zero-element HTML path.
* Emit warnings when image types are unlikely to be supported by web browsers.
# 0.3.8
* Add support for endnotes.
# 0.3.7
* Add support for superscript and subscript text.
# 0.3.6
* Add support for footnotes.
# 0.3.5
* Add support for line breaks.
# 0.3.4
* Add optional underline conversion.
# 0.3.3
* Add `mammoth.images.inline`, and document custom image conversion.
# 0.3.2
* Add the function `mammoth.extract_raw_text`.
# 0.3.1
* Add support for tables
# 0.3.0
* Rename --styles CLI argument to --style-map.
* Rename styles argument in convert_to_html to style_map.
* Allow paragraphs and runs to be matched by style name. For instance, to match
a paragraph with the style name `Heading 1`:
p[style-name='Heading 1']