<div dir="ltr"><div dir="ltr"><div style="font-family:monospace,monospace;font-size:small" class="gmail_default">This is how I do it in a url-encoding routine.<br></div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default">I take advantage of the fact that the "safe" characters all fall in a few contiguous ranges of ascii values.</div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default">So, rather than testing for a bunch of string matches (probably cpu-expensive when you're doing it sixty zillion times as you loop through and examine each and every byte of data individually...)</div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default">I just get the ascii value and test if it's either lt or gt certain values. A numerical gt/lt test should be more efficient than a string comparison.</div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default"><br></div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default">This example is pretty aggressive and checks 3 ranges to allow ONLY numbers and letters through, and EVERYTHING else gets url-encoded.</div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default">You may want something a little different, and I'll show simple modifications afterwards.<br></div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default"><br></div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default">Lines 101-end are a gosub.</div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default">Line 50 is how I use it.<br></div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default"><br></div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default">50   -------   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -<br>       ◆ If:                             
                                      ◆<br>       Then: x = raw_data ; gosub ure ; urlencoded_data = x<br><br>.......<br><br><br>101  -------   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -<br>       ◆ If: '       position    inlen       inchr     inchr_dec   outchr out  ◆<br>       Then: declare ur_p(8,.0), ur_l(8,.0), ur_ic(1), ur_d(3,.0), ur_oc, ur_o ◆<br>102  -------   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -<br>ure    ◆ If:                                        '--- URL-Encode ---        ◆<br>       Then: x = ""{x{"" ; ur_l = len(x) ; ur_p = "1" ; ur_o = ""              ◆<br>103  -------   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -<br>       ◆ If: x eq ""                                                           ◆<br>       Then: return                                                            ◆<br>104  -------   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -<br>ure1   ◆ If:                                                                   ◆<br>       Then: ur_ic = mid(x,ur_p,"1") ;ur_oc = ur_ic ;ur_d = asc(ur_ic)         ◆<br>105  -------   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -<br>       ◆ If: ur_d ge "97" and ur_d le "122"    ' a-z                           ◆<br>       Then: goto ure_                                                         ◆<br>106  -------   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -<br>       ◆ If: ur_d ge "48" and ur_d le "57"     ' 0-9                           ◆<br>       Then: goto ure_                                                         ◆<br>107  -------   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -<br>       ◆ If: ur_d ge "65" and ur_d le "90"     ' A-Z                           ◆<br>       Then: goto ure_                                                         ◆<br>108  -------   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -<br>       ◆ If:                                           
                        ◆<br>       Then: ur_oc = "%" { base(ur_d,"10","16")                                ◆<br>109  -------   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -<br>ure_   ◆ If:                                                                   ◆<br>       Then: ur_o = ur_o & ur_oc                                               ◆<br>110  -------   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -<br>       ◆ If: ur_p lt ur_l                                                      ◆<br>       Then: ur_p = ur_p + "1" ; goto ure1                                     ◆<br>111  -------   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -<br>       ◆ If:                                                                   ◆<br>       Then: x = ur_o ; return ' ure                                           ◆<br></div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default"><br></div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default"><br></div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default">Lines 105-107 are ordered so that the most likely condition is tested first, so most of the time the encoding step is skipped after as little work as possible.</div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default">I.e., most characters you run into will be lowercase letters, so most iterations of the loop skip to the next char after only a single If: test.</div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default">Next-most likely is probably numbers, so that's the next test, and least frequent is capital letters, so they're the last thing tested.</div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default">It needs to be as tight as possible like that because that block runs once for every character in "x", and the entire gosub is run 
for every field you need processed. You want to do as little work as possible in there.<br></div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default"><br></div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default">In this case, every byte that isn't in one of those 3 ranges of safe ascii values is replaced with its url-encoded equivalent.</div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default">That may be more aggressive than you want, or it might be perfect. I can't say because I don't know what you are doing with the data.</div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default">Some characters, like most punctuation and symbols, don't need to be encoded if you're just displaying them or printing them,</div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default">but they do need to be encoded if you're using them in a shell command or a url.</div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default">Some characters, like everything below 32 and above 126, you may just want to delete instead of encode.<br></div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default"><br></div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default">Encoding is good in general because it makes your data safe yet preserves the original information, just in another form.</div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default"><br></div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default"><br></div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default">If you want to strip the unsafe bytes instead of encoding them, you would replace line 108 with</div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default">108: Then: ur_oc = ""</div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default"><br></div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default">You will probably want to adjust lines 105-107 to allow some punctuation characters through.</div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default">As it's written above, it's ONLY letting 0-9, a-z, and A-Z through, and url-encoding everything else.</div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default">That includes all high ascii, control characters, and even all punctuation.</div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default">I can't say whether you want to delete things like "@", or urlencode them, or let them through untouched.</div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default">It depends on what you're doing with the data.</div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default"><br></div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default">To allow more characters through without deleting or encoding, one way is to insert one new line between 107 and 108</div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default">that tests for a bunch of punctuation that you want to allow through, like this:</div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default"><br></div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default"><div class="gmail-if">If: "$&+,/:;=?@ <>#%{}|^~[]`'"{chr("92"){chr("34") co ur_ic
               <br>
               </div><div class="gmail-linenum">Then: goto ure_ <br></div>
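<div style="font-family:monospace,monospace;font-size:small" class="gmail_default"><br></div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default">For anyone who wants to cross-check the logic outside filePro, here is a rough Python sketch of the encoder with an optional allow-list of extra characters. It is only an illustration, not filePro code: the names url_encode and extra_safe are made up, it assumes single-byte (Latin-1) data, and it pads every escape to two hex digits as percent-encoding expects. It uses a set lookup for the allow-list instead of a substring search.</div>
<pre>
```python
# Illustrative Python sketch, not filePro code; all names are hypothetical.
import string

# Letters and digits always pass through untouched.
ALWAYS_SAFE = frozenset(string.ascii_letters + string.digits)


def url_encode(text, extra_safe=""):
    """Percent-encode every byte that is not alphanumeric or in extra_safe."""
    allowed = ALWAYS_SAFE | frozenset(extra_safe)
    out = []
    # Assumption: the data is single-byte text, so Latin-1 maps 1:1 to bytes.
    for byte in text.encode("latin-1"):
        ch = chr(byte)
        if ch in allowed:
            out.append(ch)                       # safe: copy as-is
        else:
            out.append("%{:02X}".format(byte))   # unsafe: two-digit hex escape
    return "".join(out)
```
</pre>
<div style="font-family:monospace,monospace;font-size:small" class="gmail_default">With no second argument it behaves like the strict version (letters and digits only); passing something like extra_safe="@." lets those characters through unencoded.</div>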
               </div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default"><br></div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default">I think that "co" test might be unnecessarily cpu-expensive.<br></div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default">Here's another, simpler option.</div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default">If you want to simply allow all the "printable" characters and strip everything else, with no encoding, just replace lines 105-108 with these 2 lines:</div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default"><br></div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default">105  -------   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -<br>       ◆ If: ur_d ge "32" and ur_d le "126"   ' printable 7-bit ascii          ◆<br>       Then: goto ure_                                                         ◆<br>106  -------   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -<br>       ◆ If:                                                                   ◆<br>       Then: ur_oc = ""                                                        ◆</div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default"><br></div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default"><br></div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default">Google "ascii table" and look at any ascii chart to see what the ascii numbers are for the different characters, so you can figure out how to handle any special cases that these examples don't cover exactly the way you need.</div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default"><br></div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default">-- <br></div><div 
style="font-family:monospace,monospace;font-size:small" class="gmail_default">bkw<br></div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default"><br></div><div style="font-family:monospace,monospace;font-size:small" class="gmail_default"><br></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Jan 17, 2019 at 10:13 AM oldtony via Filepro-list <<a href="mailto:filepro-list@lists.celestial.com">filepro-list@lists.celestial.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">seeking coaching - a customer has some data in a stock file that is <br>
displaying non ASCII data. How do i search for non ASCII data so i can <br>
delete it? - partial screen shot below- thanks for the help - Old Tony<br>
<br>
-- <br>
<a href="mailto:tony@ynotsoftware.com" target="_blank">tony@ynotsoftware.com</a><br>
Tony Freehauf (Old Tony)<br>
YNOT Software & PC Support<br>
815.467.2179<br>
YNOT sounds like "Why Not."<br>
YNOT let us help you.<br>
<br>
-------------- next part --------------<br>
A non-text attachment was scrubbed...<br>
Name: nldachnbcbekeici.png<br>
Type: image/png<br>
Size: 6080 bytes<br>
Desc: not available<br>
URL: <<a href="http://mailman.celestial.com/pipermail/filepro-list/attachments/20190117/bbec35a7/attachment.png" rel="noreferrer" target="_blank">http://mailman.celestial.com/pipermail/filepro-list/attachments/20190117/bbec35a7/attachment.png</a>><br>
</blockquote></div>
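<div style="font-family:monospace,monospace;font-size:small" class="gmail_default">For the question quoted above (finding non-ASCII data so it can be deleted), the printable-only range test from this reply can also be sketched in Python. This is an illustration under assumptions, not filePro code: the function names are made up, and "printable" is taken to mean ascii values 32 through 126, the same range used in the 2-line replacement.</div>
<pre>
```python
# Illustrative Python sketch, not filePro code; all names are hypothetical.

# 7-bit printable ASCII: space (32) through tilde (126).
PRINTABLE = range(32, 127)


def has_nonprintable(text):
    """Return True if any character's code is outside 32..126."""
    return any(ord(ch) not in PRINTABLE for ch in text)


def strip_nonprintable(text):
    """Return text with every character outside 32..126 removed."""
    return "".join(ch for ch in text if ord(ch) in PRINTABLE)
```
</pre>
<div style="font-family:monospace,monospace;font-size:small" class="gmail_default">The first function is the "search" half (flag the records that need attention) and the second is the "delete" half, so you can look at what would be removed before actually changing any data.</div>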