-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathpython-in-r.qmd
1008 lines (787 loc) · 23.5 KB
/
python-in-r.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
# Introduction to Python (in R) {#sec-python-in-r}
```{r}
#| echo: false
#| message: false
source("_common.R")
```
<!-- Note: None of the Python chunks evaluate. Outputs are all hard-coded so that rmarkdown doesn't have to call Python. This is because the page will be build with Github Actions and Python is having problems for some reason with that -->
> ### Learning Objectives {.unnumbered}
>
> * Learn how to use the [**reticulate**](https://rstudio.github.io/reticulate/) R package to work with python in R.
> * Learn basic python syntax and how it compares to R.
> * Learn how to write functions, conditional statements, loops in python.
> * Learn how to use Python methods.
> * Learn basic string manipulation in python.
> * Learn how to run a Python script fromr R.
>
> ### Suggested readings {.unnumbered}
>
> * The [**reticulate**](https://rstudio.github.io/reticulate/) R package documentation.
## Getting started with Python (in R)
Python is another very popular computing language for data analysis and general purpose computing. Since R is the main language for this course, we will not cover all the many wonderous things that Python can do. Instead, we will introduce Python through the lens of how it is used for _data analysis_, with a particular focus on comparing its similarities and differences with R.
While you can work with Python in a number of ways, we will use the [**reticulate**](https://rstudio.github.io/reticulate/) to access it directly from R!
### Installation
To get started, install the package (remember, you only need to do this once on your computer):
```{r}
#| eval: false
install.package('reticulate')
```
Once installed, load the package:
```{r}
#| eval: false
library(reticulate)
```
If you already have Python installed on your computer, you should be okay, but you may see the following message pop up in the console:
```{r}
#| eval: false
Would you like to install Miniconda? [Y/n]:
```
If so, I recommend you go ahead and install Miniconda by typing `y` and pressing enter. Miniconda is a smaller version of the larger ["Conda"](https://docs.conda.io/en/latest/) distribution that most people use to install Python, and it is the preferred setup for using Python in R.
### Starting Python
Once you've loaded the **reticulate** library, use the following command to open up a Python REPL (which stands for "**R**ead–**E**val–**P**rint-**L**oop"):
```{r}
#| eval: false
repl_python()
```
Now look at your console - you should see three `>>>` symbols. This means you're now using Python! (Remember, the R console has only one `>` symbol).
**Check your Python version!**
Above the `>>>` symbols, you should see a message indicating which version of Python you are using. **It should say "Python 3...."**. Python has two versions (2 and 3) - we'll be using Python 3. If you see Python 2, then you'll need to adjust your [configuration](https://rstudio.github.io/reticulate/articles/versions.html) to use Python 3. If you installed Miniconda, this should be Python 3.
### Exiting Python
If you want to get back to good 'ol R, just type the command `exit` into the Python console:
```{r}
#| eval: false
exit
```
Note that you should `exit` and not `exit()` with parenthesis.
## Python basics
### Operators
Python has all the same arithmetic (`+-*/`), relational (`<>=`), and logical (`&|!`) operators as R, but some of the symbols are a little different. Here's a quick comparison of these differences:
<div style="width:450px">
Arithmetic operators | R | Python
---------------------|--------|-----------
Integer division | `%/%` | `//`
Modulus | `%%` | `%`
Powers | `^` | `**`
</div>
<div style="width:450px">
Logical operators | R | Python
-------------------|-----------|-----------
And | `&` | `&` or `and`
Or | `|` | `|` or `or`
Not | `!` | `!` or `not`
</div>
Python uses the same symbols `&`, `|`, and `!` for assessing logical statements, but Python also supports the use of the English words `and`, `or`, and `not`. For example, the following statements will both return `True`
```{python, eval=FALSE}
(3 == 3) & (4 == 4)
```
```
## True
```
```{python, eval=FALSE}
(3 == 3) and (4 == 4)
```
```
## True
```
### Variable assignment
While in R you can use either `=` or `<-` to assign values to objects, in Python only the `=` symbol can be used:
```{python, eval=FALSE}
value = 3
value
```
```
## 3
```
### Data types
For the most part, Python has the same data types as R: "numeric", "string", and "logical". But they use different words to describe them:
<div style="width:450px">
Description | R | Python
---------------------|--------------|-----------
numeric (w/decimal) | `"double"` | `"float"`
integer | `"integer"` | `"int"`
character | `"character"`| `"str"`
logical | `"logical"` | `"bool"`
</div>
There are three important distinctions between the languages on data types:
1. **Logicals**: Logical statements in R use the words `TRUE` and `FALSE` (in all caps) to denote logical statements that are "True" or "False", but in Python you only capitalize the first letter: `True` or `False`
2. **Integers vs. Floats**: In R, all numbers are "floats" by default (i.e. they have decimal places), so even numbers that _look_ like integers (e.g. `3`) are technically floats. In Python, numbers are integers by default unless they have decimal values (e.g. `3` is an `int` type, but `3.14` is a `float` type).
3. **NULL**: In R, a value of "nothing" is represented by `NULL`, but in Python we use `None`.
You can check the type using `typeof()` in R or `type()` in Python:
<div class = "row">
<div class = "col-md-6">
**R**:
```{r}
typeof(3.14)
```
```{r}
typeof(3L)
```
```{r}
typeof("3")
```
```{r}
typeof(TRUE)
```
</div>
<div class = "col-md-6">
**Python**:
```{python, eval=FALSE}
type(3.14)
```
```
## <class 'float'>
```
```{python, eval=FALSE}
type(3)
```
```
## <class 'int'>
```
```{python, eval=FALSE}
type("3")
```
```
## <class 'str'>
```
```{python, eval=FALSE}
type(True)
```
```
## <class 'bool'>
```
</div>
</div>
### Coercing data types
In R, you can convert data types using the general form of `as.something()`, replacing "`something`" with a data type. In Python, you can simply use the data type name to convert types. Here's a comparison:
<div class = "row">
<div class = "col-md-4">
</div>
<div class = "col-md-4">
**R**
</div>
<div class = "col-md-4">
**Python**
</div>
</div>
<div class = "row">
<div class = "col-md-4">
**Convert to double / float**:
</div>
<div class = "col-md-4">
```{r}
as.double(3)
```
</div>
<div class = "col-md-4">
```{python, eval=FALSE}
float(3)
```
```
## 3.0
```
</div>
</div>
<div class = "row">
<div class = "col-md-4">
**Convert to integer**:
</div>
<div class = "col-md-4">
```{r}
as.integer(3.14)
```
</div>
<div class = "col-md-4">
```{python, eval=FALSE}
int(3.14)
```
```
## 3
```
</div>
</div>
<div class = "row">
<div class = "col-md-4">
**Convert to string**:
</div>
<div class = "col-md-4">
```{r}
as.character(3.14)
```
</div>
<div class = "col-md-4">
```{python, eval=FALSE}
str(3.14)
```
```
## '3.14'
```
</div>
</div>
<div class = "row">
<div class = "col-md-4">
**Convert to logical**:
</div>
<div class = "col-md-4">
```{r}
as.logical(3.14)
```
</div>
<div class = "col-md-4">
```{python, eval=FALSE}
bool(3.14)
```
```
## True
```
</div>
</div>
Remember that "logical" types convert to `TRUE` for any number other than `0`, which converts to `FALSE`.
## Loops
Perhaps the biggest syntax difference between R and Python is that **Python uses white space to define things**.
For example, to write a loop in Python, you have to indent the second line by four character spaces, otherwise you'll get an error. The benefits of this is that it forces you to use good style practices, and you don't have to use the `{}` symbols like you do in R. The downside is that if you have a single space character missing, you'll get an error, and sometimes that's hard to notice.
Here's a comparison of loops in R and Python:
<div class = "row">
<div class = "col-md-2">
</div>
<div class = "col-md-5">
**R**
</div>
<div class = "col-md-5">
**Python**
</div>
</div>
<div class = "row">
<div class = "col-md-2">
`for` loop:
</div>
<div class = "col-md-5">
```{r}
for (i in c(1,3,5)) {
print(i)
}
```
</div>
<div class = "col-md-5">
```{python, eval=FALSE}
for i in [1,3,5]:
print(i)
```
```
## 1
## 3
## 5
```
</div>
</div>
<div class = "row">
<div class = "col-md-2">
`while` loop:
</div>
<div class = "col-md-5">
```{r}
i <- 1
while (i <= 5) {
print(i)
i <- i + 2
}
```
</div>
<div class = "col-md-5">
```{python, eval=FALSE}
i = 1
while i <= 5:
print(i)
i = i + 2
```
```
## 1
## 3
## 5
```
</div>
</div>
One of the things many people love about Python is just how "clean" the syntax looks. Compared to R, the Python code above is more compact and contains less distracting elements, like the "`{}`" symbols. You also don't need to include `()` symbols on the first line.
Other than these differences in syntax, loops are essentially the same across the two languages.
## Functions
Functions use the same "spacing" format as loops, and again the Python syntax is more compact. Here's a comparison of the `isEven(n)` function:
<div class = "row">
<div class = "col-md-6">
**R**:
```{r, eval=FALSE}
isEven <- function(n) {
if (n %% 2 == 0) {
return(TRUE)
}
return(FALSE)
}
```
</div>
<div class = "col-md-6">
**Python**:
```{python, eval=FALSE}
def isEven(n):
if (n % 2 == 0):
return(True)
return(False)
```
</div>
</div>
Note the difference in the ordering of the first lines. In R, you first define the function name, then you assign to that name the function and argument(s).
In Python, you do not use any assignment to create a function. Rather, you use the command `def` followed by the function name and argument(s). Here, the Python syntax is quite natural - you use the same syntax that you would use when calling the function (e.g. `isEven(n)`).
Note also that the `if` statement in Python also uses the same general syntax of indented white space instead of using the `{}` symbols.
## Python Methods
You might have heard people (i.e. me) say that Python is more "object-oriented" whereas R is more "functional." What I mean is that in R you mostly apply _functions_ to _objects_, but in Python you often call special functions that _belong_ to certain object types. Here's an example of converting a string to upper case:
<div class = "row">
<div class = "col-md-6">
**R**: We use the string `"foo"` as an argument to the `str_to_upper()` function from the **stringr** library, which returns `"FOO"`.
```{r}
stringr::str_to_upper("foo")
```
</div>
<div class = "col-md-6">
**Python**: we use the `.upper()` _method_ that belongs to the string `"foo"`, which returns `"FOO"`. All strings in Python have this method.
```{python, eval=FALSE}
"foo".upper()
```
```
## 'FOO'
```
</div>
</div>
Methods are special functions that belong to objects of a certain _class_. You "call" methods using the name of the object followed by the `.` symbol, like this:
```{r, eval=FALSE}
object.method()
```
You can also see the different methods available for a particular object by calling the `dir` function on the object:
```{python, eval=FALSE}
s = "foo"
dir(s)
```
```
## ['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
```
Wow, strings have a lot of methods!
The concept of using methods is a major part of the "object-oriented" way of programming, since it's the _object_ that is the center of attention. The _object_ in Python is more than just a stored value - it's a source of other methods (depending on the object's class).
Now that you've seen a little about how Python methods work, we'll get to use some working with strings!
## Strings
String manipulation is one area where more substantial differences emerge between Python and R. Because R's built in functions for dealing with strings are rather unintuitive, we've relied on the **stringr** package:
```{r}
#| message: false
library(stringr)
```
In Python, many of the basic string manipulations are actually done with basic arithmatic operators, just like with numbers. Here are a few comparisons:
<div class = "row">
<div class = "col-md-2">
</div>
<div class = "col-md-5">
**R**
</div>
<div class = "col-md-5">
**Python**
</div>
</div>
<div class = "row">
<div class = "col-md-2">
**String concatenation**:
</div>
<div class = "col-md-5">
In R, we use the function `paste()` to combine strings:
```{r}
paste("foo", "bar", sep = "")
```
</div>
<div class = "col-md-5">
In Python, you can combine strings by "adding" them together. The default is to merge them with no space in between:
```{python, eval=FALSE}
"foo" + "bar"
```
```
## 'foobar'
```
</div>
</div>
<div class = "row">
<div class = "col-md-2">
**String repetition**:
</div>
<div class = "col-md-5">
Creating a repeated string is even more complicated in R. You first have to create a _vector_ of repeated strings, and then "collapse" them using the `paste()` function:
```{r}
paste(rep("foo", 3), collapse = '')
```
</div>
<div class = "col-md-5">
In Python, you can just "multiply" the string, like this:
```{python, eval=FALSE}
"foo" * 3
```
```
## 'foofoofoo'
```
</div>
</div>
<div class = "row">
<div class = "col-md-2">
**Sub-string detection**:
</div>
<div class = "col-md-5">
In R, we use the `str_detect()` function:
```{r}
#| message: false
str_detect('Apple', 'ppl')
```
</div>
<div class = "col-md-5">
In Python, you can detect sub-strings using the `in` operator:
```{python, eval=FALSE}
'ppl' in 'Apple'
```
```
## True
```
</div>
</div>
### Functions and methods
Because Python has both functions and object methods, it can sometimes be tricky to remember which to use for a specific purpose. For example, if you want to know how many characters are in a string, you use a function, just like in R:
<div class = "row">
<div class = "col-md-2">
</div>
<div class = "col-md-5">
**R**
</div>
<div class = "col-md-5">
**Python**
</div>
</div>
<div class = "row">
<div class = "col-md-2">
**String length**:
</div>
<div class = "col-md-5">
```{r}
str_length('foo')
```
</div>
<div class = "col-md-5">
```{python, eval=FALSE}
len('foo')
```
```
## 3
```
</div>
</div>
However, lots of basic string manipulations are done with string methods:
<div class = "row">
<div class = "col-md-2">
</div>
<div class = "col-md-5">
**R**
</div>
<div class = "col-md-5">
**Python**
</div>
</div>
<div class = "row">
<div class = "col-md-2">
**Case converstion**:
</div>
<div class = "col-md-5">
```{r}
s <- "A longer string"
str_to_upper(s)
str_to_lower(s)
str_to_title(s)
```
</div>
<div class = "col-md-5">
```{python, eval=FALSE}
s = "A longer string"
s.upper()
```
```
## 'A LONGER STRING'
```
```{python, eval=FALSE}
s.lower()
```
```
## 'a longer string'
```
```{python, eval=FALSE}
s.title()
```
```
## 'A Longer String'
```
</div>
</div>
<div class = "row">
<div class = "col-md-2">
**Remove excess white space**:
</div>
<div class = "col-md-5">
```{r}
s <- " A string with space "
str_trim(s)
```
</div>
<div class = "col-md-5">
```{python, eval=FALSE}
s = " A string with space "
s.strip()
```
```
## 'A string with space'
```
</div>
</div>
<div class = "row">
<div class = "col-md-2">
**Detect if string contains only numbers**:
</div>
<div class = "col-md-5">
R doesn't have a function for this, but you can convert it to a number and check if the result is not `NA`:
```{r}
s <- "42"
!is.na(as.numeric(s))
```
</div>
<div class = "col-md-5">
Python has some handy string methods!
```{python, eval=FALSE}
s = "42"
s.isnumeric()
```
```
## True
```
</div>
</div>
### Slicing
To extract a sub-string, in R we have to use the `str_sub()` function. But in Python, you can simply use the `[]` symbols. In either case, you have to provide indices of where to start and stop the "slice".
For example, here's how to extract the sub-string `"App"` from `"Apple"` in each language:
<div class = "row">
<div class = "col-md-6">
**R**:
```{r}
s <- "Apple"
str_sub(s, 1, 3)
```
</div>
<div class = "col-md-6">
**Python**:
```{python, eval=FALSE}
s = "Apple"
s[0:3]
```
```
## 'App'
```
</div>
</div>
Note that we had to use a different starting index here to get the same sub-string in each language. That's because **indexing starts at 0 in Python**.
If this seems strange, just imagine "fence posts". In Python, the elements in a sequence are like items sitting _between_ fence posts. So the index of each character in the string `"Apple"` look like this:
```{r, eval=FALSE}
index: 0 1 2 3 4 5
| | | | | |
| "A" | "p" | "p" | "l" | "e" |
| | | | | |
```
When you make a slice in Python, you slice at the _fence post_ number to get the elements _between_ the posts. So in this case, if we want to get the sub-string `"App"` from `"Apple"`, we need to slice from the post `0` to `3`.
Negative indices are also handled differently.
<div class = "row">
<div class = "col-md-6">
**R**: Negative indices start from the end of the string _inclusively_:
```{r}
str_sub(s, -1)
str_sub(s, -3)
```
</div>
<div class = "col-md-6">
**Python**: Negative indices start from the end of the string, but only return the _character at that index_:
```{python, eval=FALSE}
s[-1]
```
```
## 'e'
```
```{python, eval=FALSE}
s[-3]
```
```
## 'p'
```
To get an inclusive string, you have to provide a starting and ending index:
```{python, eval=FALSE}
s[-3:-1]
```
```
## 'pl'
```
```{python, eval=FALSE}
s[-3:5]
```
```
## 'ple'
```
</div>
</div>
You can get the index of a character or sub-string in Python using the `.index()` method:
<div class = "row">
<div class = "col-md-6">
**R**: Returns the starting and ending indices of the sub-string
```{r}
str_locate(s, "pp")
```
</div>
<div class = "col-md-6">
**Python**: Returns only the starting index of the sub-string
```{python, eval=FALSE}
s.index("pp")
```
```
## 1
```
</div>
</div>
## Splitting strings
Like in R, splitting a string returns a list of strings. Python lists are similar to R lists, but they only have single brackets. Here's an example:
<div class = "row">
<div class = "col-md-6">
**R**:
```{r}
s <- "Apple"
str_split(s, "pp")
```
</div>
<div class = "col-md-6">
**Python**:
```{python, eval=FALSE}
s = "Apple"
s.split("pp")
```
```
## ['A', 'le']
```
</div>
</div>
In both languages, the returned list contains the remaining characters after splitting the string (in this case, `"A"` and `"le"`). One main difference though is that R returns a list of _vectors_, so to access the returned vector containing `"A"` and `"le"` you have to access the first element in the list, like this:
```{r}
str_split(s, "pp")[[1]]
```
This is because in R the `str_split()` function is _vectorized_, meaning that the function can also be performed on a _vector_ of strings, like this:
```{r}
s <- c("Apple", "Snapple")
str_split(s, "pp")
```
In this example, it's easier to see that R is returning a list of vectors. In contrast, Python cannot perform a split on multiple strings:
```{python, eval=FALSE}
s = ["Apple", "Snapple"]
s.split("pp")
```
```
## AttributeError: 'list' object has no attribute 'split'
```
To handle this, you will need to import the **numpy** package, which has an "array" structure similar to R vectors (we'll cover this in more detail on week 13). Here's an example:
```{python, eval=FALSE}
import numpy as np
s = np.array(["Apple", "Snapple"])
np.char.split(s, "pp")
```
```
## array([list(['A', 'le']), list(['Sna', 'le'])], dtype=object)
```
## Running a Python script in R
While R scripts end in `.R`, Python scripts end in `.py`. You can open up and save a blank Python script in RStudio by clicking
> File -> New File -> Python Script
Save it as `foo.py` in your project folder. Now that it's saved, let's add some code to run. As a quick example, I'm going to add code defining the function `isOdd()` and then create a few values testing it:
```{python, eval=FALSE}
def isOdd(n):
if (n % 2 == 1):
return(True)
return(False)
n1 = isOdd(4)
n2 = isOdd(3)
```
Now that you have this code stored in your `foo.py` file, you can source the file from inside R, like this:
```{r, eval=FALSE}
reticulate::source_python('foo.py')
```
Magically, the function `isOdd()` and the objects we created (`n1` and `n2`) are now accessible from R!
```{r}
#| eval: false
isOdd(7)
```
```
## [1] TRUE
```
```{r}
#| eval: false
n1
```
```
## [1] FALSE
```
```{r}
#| eval: false
n2
```
```
## [1] TRUE
```
## Summary of R/Python differences
- Indexing starts at 0 in Python and 1 in R.
- Strings in Python can be manipulated with arithmetic operators.
- Python is more "object-oriented" whereas R is more "functional".
## Tips
### Making your own Python class
You can get really creative with object-oriented programming in Python by creating your own custom classes, allowing you to embed values and methods that belong only to objects of that class. For example, here's how to create a class called `Animal`, which is defined by two values: `species` and `sound`. Note the white space indentations - without them Python will error:
```{python, eval=FALSE}
class Animal:
def __init__(self, species, sound):
self.species = species
self.sound = sound
```
The first function in any custom class is the `__init__` function. This is where to define any arguments that must be input when defining an object of the custom class. The use of `self` here defines which methods and values will be stored in the object onces it's created.
Here's a example of how we could use the `Animal` class:
```{python, eval=FALSE}
riley = Animal("Dog", "Woof")
```
Here I've defined an object named `riley` (my dog's name), and it has two stored values: `"Dog"` (the species) and `"Woof"` (the sound). I can access these stored values by calling the `species` and `sound` values from the `riley` object:
```{python, eval=FALSE}
riley.species
```
```
## 'Dog'
```
```{python, eval=FALSE}
riley.sound
```
```
## 'Woof'
```
I can also ask Python what type of object `riley` is, and it will tell me it's of the `Animal` class:
```{python, eval=FALSE}
type(riley)
```
```
<class '__main__.Animal'>
```
In addition to just storing values, you can create custom methods that will only be accessible to objects of the custom class. Here I'm adding the method `introduce()` to the class `Animal`:
```{python, eval=FALSE}
class Animal:
def __init__(self, species, sound):
self.species = species
self.sound = sound
def introduce(self):
print("I'm a " + self.species + " and I say " + self.sound)
```