Numpy random permutation

12/12/2023

I have two numpy arrays of different shapes, but with the same length (leading dimension). I want to shuffle each of them such that corresponding elements continue to correspond, i.e. shuffle them in unison with respect to their leading indices.

This code works, and illustrates my goals:

    import numpy

    def shuffle_in_unison(a, b):
        # Allocate output arrays with the same shape and dtype as the inputs.
        shuffled_a = numpy.empty(a.shape, dtype=a.dtype)
        shuffled_b = numpy.empty(b.shape, dtype=b.dtype)
        permutation = numpy.random.permutation(len(a))
        # Move each row to the same new position in both arrays.
        for old_index, new_index in enumerate(permutation):
            shuffled_a[new_index] = a[old_index]
            shuffled_b[new_index] = b[old_index]
        return shuffled_a, shuffled_b

However, this feels clunky, inefficient, and slow, and it requires making a copy of the arrays. I'd rather shuffle them in place, since they'll be quite large.

Is there a better way to go about this? Faster execution and lower memory usage are my primary goals, but elegant code would be nice, too.

One other thought I had was a second function, shuffle_in_unison_scary(a, b). This works, but it's a little scary, as I see little guarantee it'll continue to work: it doesn't look like the sort of thing that's guaranteed to survive across numpy versions, for example.
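One common way to avoid the explicit loop is to draw a single permutation and use it to index both arrays. The sketch below is only an illustration: the function name shuffle_in_unison_indexed and the rng parameter are hypothetical, and it assumes a NumPy version that provides numpy.random.default_rng.

```python
import numpy as np


def shuffle_in_unison_indexed(a, b, rng=None):
    """Return shuffled copies of a and b that share one row permutation.

    Hypothetical helper for illustration; not part of NumPy.
    """
    if len(a) != len(b):
        raise ValueError("a and b must have the same leading dimension")
    rng = np.random.default_rng() if rng is None else rng
    # One permutation of the row indices, applied to both arrays.
    permutation = rng.permutation(len(a))
    return a[permutation], b[permutation]


# Small illustrative inputs (made up for this sketch).
a = np.array([[1, 1], [2, 2], [3, 3]])
b = np.array([1, 2, 3])
shuffled_a, shuffled_b = shuffle_in_unison_indexed(
    a, b, rng=np.random.default_rng(0)
)
```

Indexing with an integer array returns copies, so this is shorter and vectorized but does not meet the in-place goal; it still needs memory for a second copy of each array.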

After doing some reading and taking into account the (many) different ways of splitting the data into train and test sets, I decided to timeit! I used 4 different methods (none of them uses the sklearn library, which I'm sure will give the best results, given that it is well-designed and tested code):

Method 1: shuffle the whole matrix arr and then split the data into train and test.
Method 2: shuffle the indices and then apply them to X and Y to split the data.
Method 3: same as method 2, but done in a more efficient way: shuffle the indices without a for loop, using np.random.permutation (random.shuffle can also be used).
Method 4: use a pandas DataFrame to split: read a CSV file with pd.read_csv(file_path, header=None) (I used a file with 3 columns) and sample the training rows with df.sample.

Method 3 won by far with the shortest time, followed by method 1; methods 2 and 4 turned out to be really inefficient.

You may also consider stratified division into training and testing sets. Stratified division also generates the training and testing sets randomly, but in such a way that the original class proportions are preserved. This makes the training and testing sets better reflect the properties of the original dataset.

A function get_train_test_inds(y, train_proportion=0.7) can generate the indices for such a random stratified split into training and testing sets, with proportions train_proportion and (1 - train_proportion) of the initial sample. Here y is any iterable indicating the class of each observation in the sample, and the initial proportions of classes inside the training and testing sets are preserved (stratified sampling). Within each class, the number of training observations is n = int(train_proportion * len(value_inds)). It is called as train_inds, test_inds = get_train_test_inds(y, train_proportion=0.5).

sklearn also includes more advanced "stratified sampling" methods that create a partition of the data that is balanced with respect to some features, for example to make sure that there is the same proportion of positive and negative examples in the training and test set.
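The stratified split described above can be sketched roughly as follows. This is not the original get_train_test_inds listing: the name stratified_split_masks, the rng parameter, and the choice to return boolean masks are assumptions made for illustration.

```python
import numpy as np


def stratified_split_masks(y, train_proportion=0.7, rng=None):
    """Return boolean (train, test) masks that preserve class proportions.

    Hypothetical helper for illustration; y is any iterable of class labels.
    """
    y = np.asarray(y)
    rng = np.random.default_rng() if rng is None else rng
    train_mask = np.zeros(len(y), dtype=bool)
    for value in np.unique(y):
        # Positions of every observation belonging to this class.
        value_inds = np.nonzero(y == value)[0]
        rng.shuffle(value_inds)
        # Take the same fraction of each class for training.
        n = int(train_proportion * len(value_inds))
        train_mask[value_inds[:n]] = True
    # Everything not selected for training goes to the test set.
    return train_mask, ~train_mask


# Made-up labels: three classes with two observations each.
y = np.array([1, 1, 2, 2, 3, 3])
train_mask, test_mask = stratified_split_masks(
    y, train_proportion=0.5, rng=np.random.default_rng(0)
)
```

With these labels and train_proportion=0.5, each class contributes one observation to each side. Because int() truncates, small classes can end up with fewer training observations than the nominal proportion suggests.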

If you want to split the data set once in two parts, you can shuffle it with numpy.random.shuffle, or use numpy.random.permutation if you need to keep track of the indices (remember to fix the random seed to make everything reproducible). With a permutation, the leading slice of the shuffled indices becomes training_idx and the remainder becomes test_idx.

There are many other ways to repeatedly partition the same data set for cross-validation. Many of those are available in the sklearn library (k-fold, leave-n-out, ...).
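The one-off split with index tracking could look like the sketch below. The data set x, the 80/20 split size, and the seed value are all placeholders assumed for illustration; only the use of a permutation of the row indices comes from the text.

```python
import numpy as np

# Fixed seed so the split is reproducible (the seed value is arbitrary).
rng = np.random.default_rng(seed=0)

# Placeholder data set: 100 observations with 5 features each.
x = rng.random((100, 5))
n_train = 80  # assumed 80/20 split

# Shuffle the row indices once, then slice them into two groups.
indices = rng.permutation(x.shape[0])
training_idx, test_idx = indices[:n_train], indices[n_train:]
training, test = x[training_idx, :], x[test_idx, :]
```

Keeping training_idx and test_idx around makes it possible to apply the same split to a separate label array, which is what shuffling x directly would not allow.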
