Python Forum
Large Data-set Edit | Ideal language?
#1
Hello!
I am primarily a JS and DOS programmer, so I'm having some trouble working out how to approach this issue! I would love to know how best to tackle this problem and which scripting language you would recommend for it.

I have quite a bit of map data that I have to edit. I originally had a DOS script that would reverse x1, y1 coordinates in order to change the direction of a particular segment in a map file. It worked wonderfully and all was well, but today my bossman told me that the segments I have flipped are made up of a boatload of nodes.
I flipped each segment from x1, y1 - x2, y2 to x2, y2 - x1, y1 in order to change the direction of the segments contained in the .asc file I am working with. But! Now I also have a file that contains all of the node data for every segment in the map file in question (this is Edulog data, if anyone is curious). The data is read from left to right.

What I need to be able to do is essentially flip the node data (like I did with the segment data).
A segment represents a street, a street is made up of nodes, and each street has a direction: either left to right or right to left. My task is to reverse the direction of the segments (which I did in my DOS script) and also reverse the order of the nodes contained within each segment. Each node has its own X1, Y1 pair, and the node numbers increment in the direction the segment runs.
Ex:
X1-------X1(1107483)----------------X1(1107612)------------------X1(1119580)---------------------X2
------------n(1)-------------------------n(2)----------------------------n(3)------------------------------
Y1-------Y1(584932)-----------------Y1(584980)-------------------Y1(583142)----------------------Y2
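A small Python sketch of what the diagram describes: reversing a street means the old end becomes the new start and the interior nodes are walked in the opposite order, while each node keeps its own X/Y pair. The function name `flip_street` and the endpoint values are hypothetical (the diagram only labels the endpoints X1/Y1 and X2/Y2); the three node coordinates are the ones shown above.

```python
from typing import List, Tuple

Point = Tuple[int, int]

def flip_street(start: Point, nodes: List[Point], end: Point) -> Tuple[Point, List[Point], Point]:
    """Reverse a street's direction: the old end becomes the new start and
    the interior nodes are visited in the opposite order. Each node keeps
    its own (X, Y) pair; only the order changes."""
    return end, nodes[::-1], start

# Node coordinates n(1)..n(3) from the diagram. The endpoints are
# hypothetical placeholders, since the diagram gives no values for them.
start = (0, 0)
end = (1, 1)
nodes = [(1107483, 584932), (1107612, 584980), (1119580, 583142)]

new_start, new_nodes, new_end = flip_street(start, nodes, end)
print(new_nodes)  # [(1119580, 583142), (1107612, 584980), (1107483, 584932)]
```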

I would like to run a script that would reverse the direction of the nodes:

Example of what the data looks like (a more thorough example is pasted below):

23732 N 23732 N 3 Y (1) 1035678 406785 (2) 1035676 406814 (3) 1035668 406858

After the script runs, the data would (ideally) look like:

23732 N 23732 N 3 Y (3) 1035668 406858 (2) 1035676 406814 (1) 1035678 406785
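One way to do this single-line transformation in Python, as a minimal sketch: it assumes each node appears as `(k) X Y` with integer coordinates, and that everything before the first `(k)` is a header to leave untouched. `reverse_nodes` is a hypothetical helper name. Each node keeps its original `(k)` label, which matches the desired output above.

```python
import re

# One node group: "(k) X Y", e.g. "(2) 1035676 406814".
NODE_RE = re.compile(r"\((\d+)\)\s+(-?\d+)\s+(-?\d+)")

def reverse_nodes(line: str) -> str:
    """Return the line with its node groups in reverse order.

    Everything before the first "(k)" group (the segment header) is kept
    as-is, and each node keeps its original (k) label.
    """
    first = NODE_RE.search(line)
    if first is None:
        return line.rstrip("\n")  # no node data on this line: leave it alone
    header = line[:first.start()].rstrip()
    nodes = NODE_RE.findall(line)  # [(k, x, y), ...] as strings
    body = " ".join(f"({k}) {x} {y}" for k, x, y in reversed(nodes))
    return f"{header} {body}"

before = "23732 N 23732 N 3 Y (1) 1035678 406785 (2) 1035676 406814 (3) 1035668 406858"
print(reverse_nodes(before))
# 23732 N 23732 N 3 Y (3) 1035668 406858 (2) 1035676 406814 (1) 1035678 406785
```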

So far, the largest number of nodes in a single segment is 105, i.e. 105 X1, Y1 pairs that I would like to reverse.

The approach I have in mind is a script that will create 105 empty variable objects at execution:
- It will process each line of data and assign each node and its two values to corresponding objects containing the N#, X1, Y1
- It will then go through and reverse the order of the variable objects
- e.g. n(3) X1 Y1 n(2) X1 Y1 n(1) X1 Y1 ...
- It will spit the result out into a new file
- It will clear the variables
- It will move on to the next line and repeat the process until there are no more lines to process
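The steps above can be sketched in Python without pre-creating 105 variables: a list built per line grows to however many nodes that line has, and it is simply replaced when the loop moves to the next line. This sketch assumes the header is every token before the first one starting with `(`, and that each node is exactly three whitespace-separated tokens; `flip_line` and `flip_file` are hypothetical names. Note that it rewrites runs of whitespace as single spaces.

```python
from typing import List

def flip_line(line: str) -> str:
    """Reverse the node groups on one line using plain whitespace splitting."""
    tokens = line.split()
    # The header is everything before the first "(k)" token.
    first = next((i for i, t in enumerate(tokens) if t.startswith("(")), len(tokens))
    header, rest = tokens[:first], tokens[first:]
    # Assumes each node is exactly three tokens: "(k)", X, Y.
    nodes: List[List[str]] = [rest[i:i + 3] for i in range(0, len(rest), 3)]
    nodes.reverse()
    return " ".join(header + [tok for node in nodes for tok in node])

def flip_file(src_path: str, dst_path: str) -> None:
    """Read src_path line by line and write the flipped lines to dst_path."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:  # only one line is held in memory at a time
            dst.write(flip_line(line) + "\n")
```

Usage would be something like `flip_file("nodes.asc", "nodes_flipped.asc")`, where both file names are placeholders. Writing to a separate file leaves the original untouched, matching the "new file" step in the list.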
===========================================================================================
I am wondering whether Python would be a good way to approach this issue.
Are there any suggestions for which languages would be appropriate for this type of problem?
Thank you for any and all suggestions :)
===========================================================================================
Example of the data I am working with:

40 Y 40 N 1 N (1) 961884 641632
41 Y 41 N 1 N (1) 967487 627129
42 Y 42 N 1 N (1) 967424 627104
44 Y 44 N 1 N (1) 977911 620540
46 Y 46 N 2 N (1) 979073 620398 (2) 978884 620434
47 Y 47 N 1 N (1) 977997 620602
48 Y 48 N 4 N (1) 979093 620314 (2) 979004 620332 (3) 978913 620350 (4) 978640 620401
56 Y 56 N 1 N (1) 979284 568834
57 Y 57 N 5 N (1) 979494 568276 (2) 979231 568622 (3) 979210 568652 (4) 978921 569034
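Before flipping, it may be worth checking that each line's node count matches its node groups. Judging from the samples, the fifth field looks like the node count (for example `4` on line 48, which lists four nodes); that is an assumption, not a documented Edulog format. As pasted, the last sample line (57) declares 5 nodes but lists 4, possibly because it was cut off when copied, so a check like this sketch would flag it. `count_mismatch` is a hypothetical name.

```python
import re

# One node group: "(k) X Y".
NODE_RE = re.compile(r"\((\d+)\)\s+(-?\d+)\s+(-?\d+)")

def count_mismatch(line: str) -> bool:
    """Return True if the count field disagrees with the number of node groups.

    Assumes (from the samples) that the fifth field is the node count.
    """
    fields = line.split()
    if len(fields) < 5 or not fields[4].isdigit():
        return False
    return int(fields[4]) != len(NODE_RE.findall(line))

SAMPLE = """\
40 Y 40 N 1 N (1) 961884 641632
41 Y 41 N 1 N (1) 967487 627129
42 Y 42 N 1 N (1) 967424 627104
44 Y 44 N 1 N (1) 977911 620540
46 Y 46 N 2 N (1) 979073 620398 (2) 978884 620434
47 Y 47 N 1 N (1) 977997 620602
48 Y 48 N 4 N (1) 979093 620314 (2) 979004 620332 (3) 978913 620350 (4) 978640 620401
56 Y 56 N 1 N (1) 979284 568834
57 Y 57 N 5 N (1) 979494 568276 (2) 979231 568622 (3) 979210 568652 (4) 978921 569034
"""

# Collect the segment IDs (first field) of lines whose counts disagree.
bad = [line.split()[0] for line in SAMPLE.splitlines() if count_mismatch(line)]
print(bad)  # ['57']
```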
#2
Personally, I'd look at sed or awk. They were made for exactly this kind of project. It sounds like a one-liner, to be honest.

Python is very capable of performing this. It'd be only a couple of lines of code. It would run significantly slower, however.

If performance is important, sed or awk. If not, Python is fine.
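To illustrate the "couple of lines" point, here is a minimal Python sketch written as a filter, so it can sit in a pipeline the way a sed or awk one-liner would. It assumes node groups look like `(k) X Y`; `filter_lines` and `run` are hypothetical names. No speed claim is made here; if performance matters, timing it against sed/awk (or PyPy) on the real file is the way to find out.

```python
import re
import sys
from typing import Iterable, Iterator, TextIO

# One "(k) X Y" group; X and Y are taken as any non-space tokens.
GROUP = re.compile(r"\(\d+\)\s+\S+\s+\S+")

def filter_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield each line with its node groups in reverse order."""
    for line in lines:
        first = GROUP.search(line)
        if first is None:
            yield line.rstrip("\n")  # pass lines without nodes through unchanged
            continue
        head = line[:first.start()].rstrip()
        groups = GROUP.findall(line)
        yield " ".join([head, *reversed(groups)])

def run(src: TextIO = sys.stdin, dst: TextIO = sys.stdout) -> None:
    """Stream src to dst, one flipped line at a time."""
    for out in filter_lines(src):
        print(out, file=dst)
```

Saved as a script (for example a hypothetical `flip.py`) that ends with a call to `run()`, it could be used as `python flip.py < nodes.asc > nodes_flipped.asc`.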
#3
Excellent! Thank you SRG, I really appreciate the quick response. I will take a look at sed and awk and see which suits me best. Hot dog! A one-liner!? How cool would that be, huh? Yeehaw!
I'm looking for efficiency and would like it to run as quickly as possible.
Thanks so much for the response, you are a lifesaver :)
#4
If you need Python to be faster for this, I'd suggest giving PyPy a try. I would expect it to have performance comparable to awk and sed at that point. But yeah, if you need speed and awk and sed are simple enough, no need to complicate things :)
#5
Thank you so much micseydel!
I appreciate all the replies in this thread, and now that I am back from being a little sick over Christmas I will hop on this and give things a try!
Python is still on my list of things to learn at some point, as is C, so ideally (if I have the time) I will try this script in awk or sed, but once I have that finished I would like to give PyPy a TryTry just for fun!
Thank you for the replies!