2004-03-10 (Wed) 15:43
I copied this sample XML file from the web, and then tried to parse it using the following Python code:

    from xml.dom import minidom
    xmldoc = minidom.parse('samplexml.xml')
    print xmldoc.toxml()

Python still says that the XML is not well-formed. See below:

    Traceback (most recent call last):
      File "C:\Python23\codes\xmltest.py", line 4, in -toplevel-
        xmldoc = minidom.parse('samplexml.xml')
      File "C:\Python23\lib\xml\dom\minidom.py", line 1919, in parse
        return expatbuilder.parse(file)
      File "C:\Python23\lib\xml\dom\expatbuilder.py", line 924, in parse
        result = builder.parseFile(fp)
      File "C:\Python23\lib\xml\dom\expatbuilder.py", line 207, in parseFile
        parser.Parse(buffer, 0)
    ExpatError: not well-formed (invalid token): line 1, column 5

How come? What can I do about this?

Sample Document Brandon Voss The XML Pages This is element text and an entity follows:&Description;
2004-03-10 (Wed) 15:55
Hello Anthony,

""
is unnecessary !!!

--
Best regards,
Zoom.Quiet

/=======================================\
]Time is unimportant, only life important![
\=======================================/
2004-03-10 (Wed) 15:57
OK, then let me give it another try.
2004-03-10 (Wed) 15:59
Man, I removed that line, but the problem remains. Watch this:

    ExpatError: not well-formed (invalid token): line 1, column 5
2004-03-10 (Wed) 16:02
I suggest you look into the source code. Whenever I run into this kind of problem, I read the code first to see how a particular result is produced.

-------
Explicit is better than implicit ...
2004-03-10 (Wed) 16:08
Hello Anthony,

It may be a ghost character!
Generate a fresh XML document with an XML editor such as xmlSpy,
and let the editor confirm first whether it is well-formed!

--
Best regards,
Zoom.Quiet
2004-03-10 (Wed) 16:11
Are you suggesting that I take a look at expatbuilder.py?
2004-03-10 (Wed) 16:18
Please look at the traceback. If this were your own script, how would you debug it? ;)
First look at expatbuilder.py, line 207:

    File "C:\Python23\lib\xml\dom\expatbuilder.py", line 207, in parseFile
        parser.Parse(buffer, 0)
    ExpatError: not well-formed (invalid token): line 1, column 5

The ExpatError is raised by parser.Parse.
We could add a print statement above parser.Parse(buffer, 0) to see which parser is actually used. Then look into that parser to see where it raises an ExpatError with the message "not well-formed (invalid token)".

But before that, maybe we can just, as Zoom.Quiet said, check whether there is a ghost character. If you have something such as UltraEdit, you can use it to see whether there are strange characters in the file. Or just use a known-good file to check.
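The "ghost character" check suggested above can also be done without a hex editor. The following is only a minimal sketch, written for Python 3 rather than the Python 2.3 used in this thread; the helper name `find_suspect_bytes` and the sample bytes are made up for illustration and do not come from the original file.

```python
def find_suspect_bytes(data):
    """Return (offset, byte value) pairs for bytes that often upset XML parsers.

    Flags control characters other than tab, LF and CR, plus every byte
    outside the 7-bit ASCII range (byte-order marks, GBK text, and so on).
    """
    suspects = []
    for offset, value in enumerate(data):
        is_control = value < 0x20 and value not in (0x09, 0x0A, 0x0D)
        if is_control or value >= 0x80:
            suspects.append((offset, value))
    return suspects


# Hypothetical sample: a UTF-8 byte-order mark in front of a tiny document.
sample = b"\xef\xbb\xbf<doc>text</doc>"
for offset, value in find_suspect_bytes(sample):
    print("offset %d: 0x%02X" % (offset, value))
```

For a real file, the bytes could be read with `open('samplexml.xml', 'rb').read()` and passed to the same helper. A non-ASCII byte is not necessarily an error; it only marks a place worth looking at.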
2004-03-10 (Wed) 16:27
I really don't know what happened to the code. I tested that code and the sample XML file on the Mandrake system, and I still get the same error message: not well-formed.

Oh my gosh, I am really fed up with it.
2004-03-10 (Wed) 16:43
Hello Anthony,
"Mandrake"??
"C:\Python23\"??
WHICH SYSTEM ARE YOU RUNNING PYTHON ON??
So first, use Python itself to test whether the file is well-formed!
"""
from xml.sax.handler import ContentHandler
from xml.sax import make_parser
from glob import glob
import sys

def parsefile(file):
    # A do-nothing ContentHandler: we only care whether parsing succeeds.
    parser = make_parser()
    parser.setContentHandler(ContentHandler())
    parser.parse(file)

# Each command-line argument may be a glob pattern such as *.xml.
for arg in sys.argv[1:]:
    for filename in glob(arg):
        try:
            parsefile(filename)
            print "%s is well-formed" % filename
        except Exception, e:
            print "%s is NOT well-formed! %s" % (filename, e)
"""
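For readers on a current interpreter, the same well-formedness check might look roughly like this in Python 3. This is a sketch, not the script from the message: the function name `check_well_formed` and the two in-memory sample documents are invented for the example.

```python
import io
from xml.sax import make_parser, SAXParseException
from xml.sax.handler import ContentHandler


def check_well_formed(source):
    """Parse `source` (a file name or a file-like object) and report the result.

    Returns (True, "well-formed") on success, or (False, description) when
    the parser rejects the document.
    """
    parser = make_parser()
    parser.setContentHandler(ContentHandler())  # handler that ignores all events
    try:
        parser.parse(source)
    except SAXParseException as exc:
        where = "line %d, column %d" % (exc.getLineNumber(), exc.getColumnNumber())
        return False, "%s: %s" % (where, exc.getMessage())
    return True, "well-formed"


# Hypothetical in-memory documents standing in for files on disk.
good = check_well_formed(io.BytesIO(b"<doc><title>Sample</title></doc>"))
bad = check_well_formed(io.BytesIO(b"<doc><title>Sample</doc>"))
print(good)
print(bad)
```

Returning a tuple instead of printing inside the function keeps the check reusable, for instance in a loop over `glob` results as in the original script.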
And why not try parsing with expat directly?
minidom is poor and slow...
"""
import xml.parsers.expat

# 3 handler functions
def start_element(name, attrs):
    print 'Start element:', name, attrs
def end_element(name):
    print 'End element:', name
def char_data(data):
    print 'Character data:', repr(data)

p = xml.parsers.expat.ParserCreate()

p.StartElementHandler = start_element
p.EndElementHandler = end_element
p.CharacterDataHandler = char_data
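
# Sample document, as in the xml.parsers.expat documentation example.
p.Parse("""<?xml version="1.0"?>
<parent id="top"><child1 name="paul">Text goes here</child1>
<child2 name="fred">More text</child2>
</parent>""")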
"""
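When expat is used directly, the exception itself says where parsing stopped, which is the same "line N, column M" information shown in the tracebacks in this thread. A small Python 3 sketch follows; the helper name `locate_error` and the sample documents are assumptions made for illustration.

```python
import xml.parsers.expat
from xml.parsers.expat import ExpatError, ErrorString


def locate_error(data):
    """Return (line, column, message) for the first parse error, or None."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True: this is the final chunk of input
    except ExpatError as exc:
        # lineno is 1-based, offset is the 0-based column within that line.
        return exc.lineno, exc.offset, ErrorString(exc.code)
    return None


# Hypothetical inputs: one broken document, one well-formed document.
print(locate_error(b"<doc><title>Sample</doc>"))
print(locate_error(b"<doc><title>Sample</title></doc>"))
```

Combined with a look at the raw bytes at the reported position, this is usually enough to find the offending character without reading the parser source.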
--
Best regards,
Zoom.Quiet
/=======================================\
]Time is unimportant, only life important![
\=======================================/
2004-03-10 (Wed) 17:05
I tested it on both Mandrake and Win2K, and it worked on neither of them.
2004年03月11日 星期四 00:47
The parse is successful if I lower-case the "xml" in the declaration of the xml document, and meanwhile remove the ampersand (&) before "Description". But if I insert some Chinese characters into the xml document, the same sample python code cannot parse it. The code got stuck whenever it hits the 1st Chinese character. Python complains: ExpatError: not well-formed (invalid token): line 3, column 7 where lin3 and column 7 pinpoints the 1st byte of the 1st Chinese character in the xml document. How can I correctly parse an xml document containing Chinese using python? Give a hint, please. --- "Zoom.Quiet" <zoomq at infopro.cn> wrote: > Hello Anthony, > > "Mandrake"?? > "C:\Python23\"?? > > WHAT SYSTEM U RUNNING PYTHON?? > > so so at frist use Py test weel-format self! > """ > from xml.sax.handler import ContentHandler > from xml.sax import make_parser > from glob import glob > import sys > > def parsefile(file): > parser = make_parser( ) > parser.setContentHandler(ContentHandler( )) > parser.parse(file) > > for arg in sys.argv[1:]: > for filename in glob(arg): > try: > parsefile(filename) > print "%s is well-formed" % filename > except Exception, e: > print "%s is NOT well-formed! %s" % > (filename, e) > """ > > and try expat to parsers ?? > minidom is poor and slow... > """ > import xml.parsers.expat > > # 3 handler functions > def start_element(name, attrs): > print 'Start element:', name, attrs > def end_element(name): > print 'End element:', name > def char_data(data): > print 'Character data:', repr(data) > > p = xml.parsers.expat.ParserCreate() > > p.StartElementHandler = start_element > p.EndElementHandler = end_element > p.CharacterDataHandler = char_data > > p.Parse(""" >> here > Text goes More text > """) > > """ > > === [ 16:27 ; 04-03-10 ] you wrote: > > AL> I really don't know what happened to the code. I > AL> tested that code and the sample xml file on the > AL> Mandrake system, and I still get the same error > AL> message: not well-formed. 
>
> AL> O, my gosh, I am really fed up with it.
>
> AL> --- Jacob Fan <jacob at exoweb.net> wrote:
> >> Please look at the traceback. If this were your own
> >> script, how would you debug it? ;)
> >> First look at expatbuilder.py, line 207:
> >> > > "C:\Python23\lib\xml\dom\expatbuilder.py", line
> >> > > AL> 207, in parseFile
> >> > > AL> parser.Parse(buffer, 0)
> >> > > AL> ExpatError: not well-formed (invalid token): line 1,
> >> > > AL> column 5
> >> The ExpatError is thrown by parser.Parse.
> >> We could add a print statement above
> >> parser.Parse(buffer, 0) to see which parser it
> >> actually uses. Then look into that parser to see
> >> where it throws an ExpatError with the message "not
> >> well-formed (invalid token)". But before that, maybe
> >> we can just, as Zoom.Quiet said, check whether there
> >> is a ghost character. If you have something such as
> >> UltraEdit, you may use it to see if there are
> >> strange characters in the file. Or just use a known
> >> good file to check.
> >>
> >> -------
> >> Explicit is better than implicit ...
> >>
> >> -----Original Message-----
> >> From: Anthony Liu [mailto:antonyliu2002 at yahoo.com]
> >> Sent: 2004-03-10 16:11
> >> To: python-chinese at lists.python.cn
> >> Subject: RE: [python-chinese] strage minidom or xml
> >>
> >> You are suggesting that I take a look at
> >> expatbuilder.py?
> >>
> >> --- Jacob Fan <jacob at exoweb.net> wrote:
> >> >
> >> > I suggest you look into the source code. Whenever I run
> >> > into this kind of problem, I first read the code to see
> >> > how a given result is produced.
> >> >
> >> > -------
> >> > Explicit is better than implicit ...
> >> >
> >> > -----Original Message-----
> >> > From: Anthony Liu [mailto:antonyliu2002 at yahoo.com]
> >> > Sent: 2004-03-10 16:00
> >> > To: pycn
> >> > Subject: Re: [python-chinese] strage minidom or xml
> >> >
> >> > Man, I removed that line, but the problem remains.
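On the question of parsing XML that contains Chinese text: an "invalid token" error at the first byte of the first Chinese character usually points to an encoding mismatch rather than to bad markup. With no encoding declaration, expat reads the bytes as UTF-8, so text saved in a Chinese legacy encoding is rejected. The sketch below is only an illustration under stated assumptions: the sample document and its element names are made up, GBK is a guess at what the editor used when saving the file, and the code is written for current Python 3 rather than the Python 2.3 shown in the tracebacks. It reproduces the failure and then re-encodes the same bytes to UTF-8 before parsing.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical sample document: Chinese text, no encoding declaration.
xml_text = '<?xml version="1.0"?>\n<doc>\n<title>中文标题</title>\n</doc>'

# Assumption: the editor saved the file as GBK (common on Chinese Windows).
gbk_bytes = xml_text.encode('gbk')

# With no declaration, expat treats the input as UTF-8,
# so the GBK byte sequences are not valid tokens.
try:
    minidom.parseString(gbk_bytes)
    failed = False
except ExpatError as err:
    failed = True
    print('GBK bytes rejected:', err)

# One possible fix: decode with the real encoding, then give the parser UTF-8.
utf8_bytes = gbk_bytes.decode('gbk').encode('utf-8')
doc = minidom.parseString(utf8_bytes)

# The text node comes back as a normal Unicode string.
title = doc.getElementsByTagName('title')[0].firstChild.data
print(title)
```

For a file on disk the same idea would apply: read it in binary mode, decode with whatever encoding the editor really used, and pass the UTF-8 result to minidom.parseString. Saving the file as UTF-8 in the first place avoids the conversion step.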