2004-03-10 (Wednesday) 15:43
I copied this sample XML file from the web:

    Sample Document
    Brandon Voss
    The XML Pages
    This is element text and an entity follows:&Description;

And then when I tried to parse it using the following Python code:

    from xml.dom import minidom
    xmldoc = minidom.parse('samplexml.xml')
    print xmldoc.toxml()

Python still says that the XML is not well-formed. See below:

    Traceback (most recent call last):
      File "C:\Python23\codes\xmltest.py", line 4, in -toplevel-
        xmldoc = minidom.parse('samplexml.xml')
      File "C:\Python23\lib\xml\dom\minidom.py", line 1919, in parse
        return expatbuilder.parse(file)
      File "C:\Python23\lib\xml\dom\expatbuilder.py", line 924, in parse
        result = builder.parseFile(fp)
      File "C:\Python23\lib\xml\dom\expatbuilder.py", line 207, in parseFile
        parser.Parse(buffer, 0)
    ExpatError: not well-formed (invalid token): line 1, column 5

How come? What can I do about this?
2004-03-10 (Wednesday) 15:55
Hello Anthony,

""
is unnecessary !!!

--
Best regards,
Zoom.Quiet

/=======================================\
]Time is unimportant, only life important![
\=======================================/
2004-03-10 (Wednesday) 15:57
OK, then let me give it another try.

--- "Zoom.Quiet" <zoomq at infopro.cn> wrote:
> Hello Anthony,
>
> ""
> is unnecessary !!!
2004-03-10 (Wednesday) 15:59
Man, I removed that line, but the problem remains. Watch this:

    ExpatError: not well-formed (invalid token): line 1, column 5

--- "Zoom.Quiet" <zoomq at infopro.cn> wrote:
> ""
> is unnecessary !!!
2004-03-10 (Wednesday) 16:02
I suggest you look into the source code. Whenever I run into this kind of problem, I read the code first to see how a given result is produced.

-------
Explicit is better than implicit ...

-----Original Message-----
From: Anthony Liu [mailto:antonyliu2002 at yahoo.com]
Sent: 2004-03-10 16:00
To: pycn
Subject: Re: [python-chinese] strage minidom or xml

Man, I removed that line, but the problem remains.
2004-03-10 (Wednesday) 16:08
Hello Anthony,

It may be a ghost character!
Regenerate the XML document with an XML editor such as xmlSpy,
and let it confirm first whether the document is well-formed!

--
Best regards,
Zoom.Quiet
2004-03-10 (Wednesday) 16:11
Are you suggesting that I take a look at expatbuilder.py?

--- Jacob Fan <jacob at exoweb.net> wrote:
> I suggest you look into the source code. Whenever I run into this
> kind of problem, I read the code first to see how a given result
> is produced.
2004-03-10 (Wednesday) 16:18
Please look at the traceback. If this were your own script, how would you debug it? ;)
First look at expatbuilder.py, line 207:

      File "C:\Python23\lib\xml\dom\expatbuilder.py", line 207, in parseFile
        parser.Parse(buffer, 0)
    ExpatError: not well-formed (invalid token): line 1, column 5

The ExpatError is thrown by parser.Parse.
We could add a print statement above parser.Parse(buffer, 0) to see which parser it actually uses. Then look into that parser to see where it throws an ExpatError with the message "not well-formed (invalid token)". But before that, maybe we can just, as Zoom.Quiet said, check whether there is a ghost character. If you have something such as UltraEdit, you may use it to see if there are strange characters in the file. Or just use a known good file to check.
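The "ghost character" check suggested above can also be done from Python instead of a hex editor. The following is only a minimal sketch, written for Python 3 rather than the Python 2.3 used in this thread; the function name `find_suspicious_bytes` and the sample bytes are hypothetical and are not taken from the poster's file.

```python
def find_suspicious_bytes(data):
    """Return (offset, byte_value) pairs for bytes outside printable ASCII.

    Tab, line feed and carriage return count as ordinary whitespace.
    """
    allowed_whitespace = {0x09, 0x0A, 0x0D}
    return [
        (offset, value)
        for offset, value in enumerate(data)
        if value not in allowed_whitespace and not 0x20 <= value <= 0x7E
    ]


# Hypothetical sample: a UTF-8 byte order mark in front of a tiny document.
sample = b"\xef\xbb\xbf<doc>text</doc>"
for offset, value in find_suspicious_bytes(sample):
    print("offset %d: 0x%02X" % (offset, value))
```

For a real file, the bytes would come from something like `open('samplexml.xml', 'rb').read()`. A hit is not automatically an error (non-ASCII text is legal in XML), but it shows where to look first.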
2004-03-10 (Wednesday) 16:27
I really don't know what happened to the code. I tested that code and the sample XML file on a Mandrake system, and I still get the same error message: not well-formed.

Oh my gosh, I am really fed up with it.

--- Jacob Fan <jacob at exoweb.net> wrote:
> If you have something such as UltraEdit, you may use it to see if
> there are strange characters in the file. Or just use a known
> good file to check.
2004-03-10 (Wednesday) 16:43
Hello Anthony,

"Mandrake"??
"C:\Python23\"??

Which system are you running Python on??

So first, use Python itself to test whether the file is well-formed!
"""
from xml.sax.handler import ContentHandler
from xml.sax import make_parser
from glob import glob
import sys

def parsefile(file):
    # Parse with a do-nothing handler; any error surfaces as an exception.
    parser = make_parser()
    parser.setContentHandler(ContentHandler())
    parser.parse(file)

# Each command-line argument may be a glob pattern such as *.xml.
for arg in sys.argv[1:]:
    for filename in glob(arg):
        try:
            parsefile(filename)
            print "%s is well-formed" % filename
        except Exception, e:
            print "%s is NOT well-formed! %s" % (filename, e)
"""

And why not try the expat parser directly?
minidom is poor and slow...
"""
import xml.parsers.expat

# 3 handler functions
def start_element(name, attrs):
    print 'Start element:', name, attrs
def end_element(name):
    print 'End element:', name
def char_data(data):
    print 'Character data:', repr(data)

p = xml.parsers.expat.ParserCreate()

p.StartElementHandler = start_element
p.EndElementHandler = end_element
p.CharacterDataHandler = char_data

p.Parse("""""")
"""

--
Best regards,
Zoom.Quiet
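The well-formedness checker above uses Python 2 syntax (`print` statements, `except Exception, e`). The following is a rough Python 3 equivalent, offered only as a sketch: the function name `is_well_formed` and the in-memory test documents are hypothetical, and the return shape (a flag plus a message) is a choice made here, not something from the thread.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler


def is_well_formed(source):
    """Parse source (a file name or file-like object) with a no-op handler.

    Returns (True, "") on success, or (False, message) with the position
    reported by the parser.
    """
    parser = xml.sax.make_parser()
    parser.setContentHandler(ContentHandler())
    try:
        parser.parse(source)
    except xml.sax.SAXParseException as err:
        message = "line %d, column %d: %s" % (
            err.getLineNumber(), err.getColumnNumber(), err.getMessage())
        return False, message
    return True, ""


# Hypothetical in-memory documents standing in for real files.
print(is_well_formed(io.BytesIO(b"<a><b/></a>")))
print(is_well_formed(io.BytesIO(b"<a><b></a>")))
```

Passing a file name such as `'samplexml.xml'` instead of a `BytesIO` object should work the same way, since `parser.parse` accepts either.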
2004-03-10 (Wednesday) 17:05
I tested it on both Mandrake and Win2K; it worked on neither of them.

--- "Zoom.Quiet" <zoomq at infopro.cn> wrote:
> "Mandrake"??
> "C:\Python23\"??
>
> WHAT SYSTEM U RUNNING PYTHON??
2004年03月11日 星期四 00:47
The parse is successful if I lower-case the "xml" in the declaration of the xml document, and meanwhile remove the ampersand (&) before "Description". But if I insert some Chinese characters into the xml document, the same sample python code cannot parse it. The code got stuck whenever it hits the 1st Chinese character. Python complains: ExpatError: not well-formed (invalid token): line 3, column 7 where lin3 and column 7 pinpoints the 1st byte of the 1st Chinese character in the xml document. How can I correctly parse an xml document containing Chinese using python? Give a hint, please. --- "Zoom.Quiet" <zoomq at infopro.cn> wrote: > Hello Anthony, > > "Mandrake"?? > "C:\Python23\"?? > > WHAT SYSTEM U RUNNING PYTHON?? > > so so at frist use Py test weel-format self! > """ > from xml.sax.handler import ContentHandler > from xml.sax import make_parser > from glob import glob > import sys > > def parsefile(file): > parser = make_parser( ) > parser.setContentHandler(ContentHandler( )) > parser.parse(file) > > for arg in sys.argv[1:]: > for filename in glob(arg): > try: > parsefile(filename) > print "%s is well-formed" % filename > except Exception, e: > print "%s is NOT well-formed! %s" % > (filename, e) > """ > > and try expat to parsers ?? > minidom is poor and slow... > """ > import xml.parsers.expat > > # 3 handler functions > def start_element(name, attrs): > print 'Start element:', name, attrs > def end_element(name): > print 'End element:', name > def char_data(data): > print 'Character data:', repr(data) > > p = xml.parsers.expat.ParserCreate() > > p.StartElementHandler = start_element > p.EndElementHandler = end_element > p.CharacterDataHandler = char_data > > p.Parse(""" >> here > Text goes More text > """) > > """ > > === [ 16:27 ; 04-03-10 ] you wrote: > > AL> I really don't know what happened to the code. I > AL> tested that code and the sample xml file on the > AL> Mandrake system, and I still get the same error > AL> message: not well-formed. 
>
> AL> O, my gosh, I am really fed up with it.
>
> AL> --- Jacob Fan <jacob at exoweb.net> wrote:
> >> Please look at the traceback. If this were your
> >> script, how would you debug it? ;)
> >> First look here, at expatbuilder.py line 207:
> >> > > "C:\Python23\lib\xml\dom\expatbuilder.py", line
> >> > > AL> 207, in parseFile
> >> > > AL> parser.Parse(buffer, 0)
> >> > > AL> ExpatError: not well-formed (invalid token): line 1,
> >> > > AL> column 5
> >> The ExpatError is thrown by parser.Parse.
> >> We could add a print statement above
> >> parser.Parse(buffer, 0) to see which parser it
> >> actually uses. Then look into that parser to see
> >> where it throws an ExpatError with the message "not
> >> well-formed (invalid token)". But before that, maybe
> >> we can just, as Zoom.Quiet said, check whether there is
> >> a ghost character. If you have something such as
> >> UltraEdit, you can use it to see whether there are
> >> strange characters in the file. Or just use a known
> >> good file to check.
> >>
> >> -------
> >> Explicit is better than implicit ...
> >>
> >> -----Original Message-----
> >> From: Anthony Liu [mailto:antonyliu2002 at yahoo.com]
> >> Sent: 2004-03-10 16:11
> >> To: python-chinese at lists.python.cn
> >> Subject: RE: [python-chinese] strage minidom or xml
> >>
> >> You are suggesting that I take a look at
> >> expatbuilder.py?
> >>
> >> --- Jacob Fan <jacob at exoweb.net> wrote:
> >> > I suggest you take a look at the source code. Whenever I
> >> > run into this kind of problem, I first read the code to
> >> > see how a given result comes about.
> >> >
> >> > -------
> >> > Explicit is better than implicit ...
> >> >
> >> > -----Original Message-----
> >> > From: Anthony Liu [mailto:antonyliu2002 at yahoo.com]
> >> > Sent: 2004-03-10 16:00
> >> > To: pycn
> >> > Subject: Re: [python-chinese] strage minidom or xml
> >> >
> >> > Man, I removed that line, but the problem remains.
> >> > Watch this:
> >> >
> >> > ExpatError: not well-formed (invalid token): line 1,
> >> > column 5
> >> >
> >> > --- "Zoom.Quiet" <zoomq at infopro.cn> wrote:
> >> > > Hello Anthony,
> >> > >
> >> > > ""
> >> > > is unnecessary !!!
> >> > >
> >> > > === [ 15:43 ; 04-03-10 ] you wrote:
> >> > >
> >> > > AL> I copied this sample xml file from the web:
> >> > >
> >> > > AL> Sample Document
> >> > > AL> Brandon Voss
> >> > > AL> The XML Pages
> >> > > AL> This is element text and an entity follows:&Description;
> >> > >
> >> > > AL> And then when I tried to parse it using the following
> >> > > AL> python code:
> >> > >
> >> > > AL> from xml.dom import minidom
> >> > > AL> xmldoc = minidom.parse('samplexml.xml')
> >> > > AL> print xmldoc.toxml()
> >> > >
> >> > > AL> Python still says that the xml is not well-formed.
> >> > > AL> See below:
> >> > >
> >> > > AL> Traceback (most recent call last):
> >> > > AL>   File "C:\Python23\codes\xmltest.py", line 4, in -toplevel-
> >> > > AL>     xmldoc = minidom.parse('samplexml.xml')
> >> > > AL>   File "C:\Python23\lib\xml\dom\minidom.py", line 1919, in parse
> >> > > AL>     return expatbuilder.parse(file)
> >> > > AL>   File "C:\Python23\lib\xml\dom\expatbuilder.py", line 924, in parse
> >> > > AL>     result = builder.parseFile(fp)
> >> > > AL>   File "C:\Python23\lib\xml\dom\expatbuilder.py", line 207, in parseFile
> >> > > AL>     parser.Parse(buffer, 0)
> >> > > AL> ExpatError: not well-formed (invalid token):
=== message truncated ===
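
A hint on the Chinese-character question, offered as a minimal sketch rather than a definitive fix. An XML document with no encoding declaration is read as UTF-8, so if the file was saved in a legacy Chinese encoding, the first Chinese byte is not valid UTF-8 and expat reports "invalid token" at that position. The sketch assumes the file was saved as GBK (the thread does not say which encoding the editor used), decodes the bytes explicitly, and hands the parser UTF-8. It is written for a current Python 3, not the Python 2.3 used in the thread; `XML_TEXT`, its element names, and `parse_legacy` are made up for illustration.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical sample document; the element names are invented for this sketch.
XML_TEXT = '<?xml version="1.0"?>\n<doc>\n<title>中文标题</title>\n</doc>'

# Assumption: the editor saved the file in GBK, a common legacy
# encoding on Chinese-language Windows systems.
gbk_bytes = XML_TEXT.encode("gbk")


def parse_legacy(raw, encoding):
    """Decode raw bytes with the given encoding, then parse them as UTF-8."""
    text = raw.decode(encoding)
    return minidom.parseString(text.encode("utf-8"))


# With no encoding declaration the parser assumes UTF-8, so the raw
# GBK bytes are rejected at the first Chinese character.
try:
    minidom.parseString(gbk_bytes)
    failed = False
except ExpatError as exc:
    failed = True
    print("raw GBK bytes:", exc)

# Decoding first and re-encoding as UTF-8 lets the same document parse.
doc = parse_legacy(gbk_bytes, "gbk")
title = doc.getElementsByTagName("title")[0].firstChild.data
print(title)
```

Saving the file as UTF-8 in the editor should have the same effect without any code. If the document carries an explicit encoding declaration, that declaration has to match the bytes actually handed to the parser, so it would need updating after re-encoding.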